Distributed cache and in-memory key/value data store. It can be used both as an embedded Go library and as a language-independent service.

Overview


With Olric, you can instantly create a fast, scalable, shared pool of RAM across a cluster of computers.

See Docker and Sample Code sections to get started!

At a glance

  • Designed to share some transient, approximate, fast-changing data between servers,
  • Embeddable but can be used as a language-independent service with olricd,
  • Supports different eviction algorithms,
  • Fast binary protocol,
  • Highly available and horizontally scalable,
  • Provides best-effort consistency guarantees; it is a PA/EC system rather than a strict CP solution,
  • Supports replication by default (with sync and async options),
  • Quorum-based voting for replica control (Read/Write quorums),
  • Supports atomic operations,
  • Supports distributed queries on keys,
  • Provides a plugin interface for service discovery daemons,
  • Provides a locking primitive inspired by the SETNX command of Redis,
  • Supports distributed topic data structure,

Possible Use Cases

With this feature set, Olric is suitable for use as a distributed cache. It also provides distributed topics, data replication, failure detection and simple anti-entropy services, so it can be used as an ordinary key/value data store to scale your cloud application.

Features

  • Designed to share some transient, approximate, fast-changing data between servers,
  • Accepts arbitrary types as value,
  • Only in-memory,
  • Implements a fast and simple binary protocol,
  • Embeddable but can be used as a language-independent service with olricd,
  • GC-friendly storage engine,
  • O(1) running time for lookups,
  • Supports atomic operations,
  • Provides a lock implementation which can be used for non-critical purposes,
  • Different eviction policies: LRU, MaxIdleDuration and Time-To-Live (TTL),
  • Highly available,
  • Horizontally scalable,
  • Provides best-effort consistency guarantees; it is a PA/EC system rather than a strict CP solution,
  • Distributes load fairly among cluster members with a consistent hash function,
  • Supports replication by default (with sync and async options),
  • Quorum-based voting for replica control,
  • Thread-safe by default,
  • Supports distributed queries on keys,
  • Provides a plugin interface for service discovery daemons and cloud providers,
  • Provides a command-line interface to access the cluster directly from the terminal,
  • Supports different serialization formats. Gob, JSON and MessagePack are supported out of the box,
  • Provides a locking primitive inspired by the SETNX command of Redis,
  • Supports distributed topic data structure,

See the Architecture section for details.

Planned Features

  • Distributed queries over keys and values,
  • Database backend for persistence,
  • Anti-entropy system to repair inconsistencies in DMaps,
  • Eviction listeners by using Publish/Subscribe,
  • Memcached interface,
  • Client implementations for different languages: Java, Python and JavaScript,
  • Expose DMap API via HTTP.

Support

Feel free to ask any questions about Olric and possible integration problems.

You are also welcome to open an issue on GitHub to report bugs and share feature requests.

Installing

With a correctly configured Golang environment:

go get -u github.com/buraksezer/olric

Then, install olricd and its siblings:

go install -v ./cmd/*

Now olricd, olric-stats, olric-cli and olric-load should be available on your PATH. You can simply run olricd to start experimenting:

olricd -c cmd/olricd/olricd.yaml

See Configuration section to create your cluster properly.

Docker

You can launch an olricd Docker container by running the following command:

docker run -p 3320:3320 olricio/olricd:latest

This command will pull the olricd Docker image and run a new Olric instance. Note that the container exposes ports 3320 and 3322.

Now you can access the olricd instance by using olric-cli. You can build olric-cli with the following command:

go get -u github.com/buraksezer/olric/cmd/olric-cli

Now you are able to connect to the olricd server:

olric-cli
[127.0.0.1:3320] »

Run the help command to see available commands. Olric has a dedicated repository for Docker-related resources. Please take a look at buraksezer/olric-docker for more information.

Kubernetes

Olric is able to discover peers automatically on the Kubernetes platform via olric-cloud-plugin. We have a very simple Kubernetes setup right now; in the near future, this will be a major development/improvement area for Olric.

If you have a running Kubernetes cluster, you can use the following command to deploy a new Olric cluster with 3 nodes:

kubectl apply -f https://raw.githubusercontent.com/buraksezer/olric-kubernetes/master/olricd.yaml

If everything goes well, you should see something like this:

kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
dnsutils                  1/1     Running   0          20d
olricd-6c7f54d445-ndm8v   1/1     Running   0          34s
olricd-6c7f54d445-s6g6r   1/1     Running   0          34s
olricd-6c7f54d445-vjkhf   1/1     Running   0          34s

Now we have an Olric cluster on Kubernetes with 3 nodes. One of them is the cluster coordinator and manages the routing table for the rest of the cluster.

Deploy olric-debug to reach the cluster:

kubectl apply -f https://raw.githubusercontent.com/buraksezer/olric-kubernetes/master/olric-debug.yaml

Verify that the olric-debug pod is running:

kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
...
olric-debug               1/1     Running   0          54s
...

Get a shell to the running container:

kubectl exec -it olric-debug -- /bin/sh

Now you have a running Alpine Linux setup on Kubernetes. It includes olric-cli, olric-load and olric-stats commands.

/go/src/github.com/buraksezer/olric # olric-cli -a olricd.default.svc.cluster.local:3320
[olricd.default.svc.cluster.local:3320] » use users
use users
[olricd.default.svc.cluster.local:3320] » put buraksezer {"_id": "06054057", "name": "Burak", "surname": "Sezer", "job": "Engineer"}
[olricd.default.svc.cluster.local:3320] » get buraksezer
{"_id": "06054057", "name": "Burak", "surname": "Sezer", "job": "Engineer"}
[olricd.default.svc.cluster.local:3320] »

Congrats!

Bringing Olric into Kubernetes will be a major development area in the next releases.

Working with Docker Compose

We provide a multi-container environment to test, develop and deploy Olric clusters. Here is the documentation.

Operation Modes

Olric has two different operation modes.

Embedded Member

In Embedded Member Mode, members include both the application and Olric data and services. The advantage of Embedded Member Mode is low-latency data access and locality.

Client-Server

In the Client-Server deployment, Olric data and services are centralized in one or more server members and are accessed by the application through clients. You can have a cluster of server members that can be independently created and scaled. Your clients communicate with these members to reach Olric data and services on them.

Client-Server deployment has advantages including more predictable and reliable performance, easier identification of problem causes and, most importantly, better scalability. When you need to scale in this deployment type, just add more Olric server members. You can address client and server scalability concerns separately.

See olricd section to get started.

Currently we only have the official Golang client. A possible Python implementation is on the way. After stabilizing the Olric Binary Protocol, the others may appear quickly.

Tooling

Olric comes with some useful tools to interact with the cluster.

olricd

With olricd, you can create an Olric cluster with a few commands. This is how to install olricd:

go get -u github.com/buraksezer/olric/cmd/olricd

Let's create a cluster with the following:

olricd -c <YOUR_CONFIG_FILE_PATH>

olricd also supports the OLRICD_CONFIG environment variable to set the configuration file, like so:

OLRICD_CONFIG=<YOUR_CONFIG_FILE_PATH> olricd

Olric uses hashicorp/memberlist for failure detection and cluster membership. Currently there are different ways to discover peers in a cluster. You can use a static list of nodes in your olricd.yaml file. It's ideal for development and test environments. Olric also supports Consul, Kubernetes and well-known cloud providers for service discovery. Please take a look at Service Discovery section for further information.

You can find a sample configuration file under cmd/olricd/olricd.yaml.

See Client-Server section to get more information about this deployment scenario.

olric-cli

olric-cli is the Olric command line interface, a simple program that allows you to send commands to Olric and read the replies sent by the server, directly from the terminal.

In order to install olric-cli:

go get -u github.com/buraksezer/olric/cmd/olric-cli

olric-cli has an interactive (REPL) mode just like redis-cli:

olric-cli
[127.0.0.1:3320] >> use mydmap
use mydmap
[127.0.0.1:3320] >> get mykey
myvalue
[127.0.0.1:3320] >>

The interactive mode also keeps command history. It's possible to send protocol commands as command line arguments:

olric-cli -d mydmap -c "put mykey myvalue"

Then, retrieve the key:

olric-cli -d mydmap -c "get mykey"

It'll print myvalue.

In order to get more details about the options, call olric-cli -h in your shell.

olric-stats

olric-stats calls the Stats command on a cluster member and prints the result. The returned data includes Go runtime metrics and statistics from the hosted primary and backup partitions.

In order to install olric-stats:

go get -u github.com/buraksezer/olric/cmd/olric-stats

Statistics about a partition:

olric-stats -p 69
PartID: 69
  Owner: olric.node:3320
  Previous Owners: not found
  Backups: not found
  DMap count: 1
  DMaps:
    Name: olric-load-test
    Length: 1374
    Allocated: 1048576
    Inuse: 47946
    Garbage: 0

In order to get detailed statistics about the Go runtime, you should call olric-stats -a <ADDRESS> -r.

Without giving a partition number, it will print everything about the cluster and hosted primary/backup partitions. In order to get more details about the command, call olric-stats -h.

olric-load

olric-load simulates N clients running commands at the same time, sending M total queries. It measures response time.

In order to install olric-load:

go get -u github.com/buraksezer/olric/cmd/olric-load

The following command calls the Put command for 100000 keys on 127.0.0.1:3320 (the default) and uses msgpack for serialization.

olric-load -c put -s msgpack -k 100000
### STATS FOR COMMAND: PUT ###
Serializer is msgpack
100000 requests completed in 1.209334678s
50 parallel clients

  93%  <=  1 milliseconds
   5%  <=  2 milliseconds

In order to get more details about the command, call olric-load -h.

Usage

Olric is designed to work efficiently with the minimum amount of configuration. So the default configuration should be enough for experimenting:

db, err := olric.New(config.New("local"))

This creates an Olric object without running any server in the background. In order to run Olric, you need to call the Start method.

err := db.Start()

When you call the Start method, your process joins the cluster and becomes responsible for some portion of the data. This call blocks indefinitely, so you may need to run it in a goroutine, as in the sketch below. Of course, this is just a single-node instance, since you didn't provide any cluster configuration.
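
A minimal sketch of the goroutine pattern, assuming the embedded-member setup shown above (see the Sample Code section for a complete program):

go func() {
    // Start blocks until Shutdown is called or a fatal error occurs.
    if err := db.Start(); err != nil {
        log.Fatalf("olric.Start returned an error: %v", err)
    }
}()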

When you want to leave the cluster, you just need to call the Shutdown method:

err := db.Shutdown(context.Background())

This will stop the background tasks and servers, purge the in-memory data and quit.

Please note that this section aims to document DMap API in embedded member mode. If you prefer to use Olric in Client-Server mode, please jump to Golang Client section.

Distributed Map

Create a DMap instance:

dm, err := db.NewDMap("my-dmap")

Put

Put sets the value for the given key. It overwrites any previous value for that key and it's thread-safe.

err := dm.Put("my-key", "my-value")

The key has to be string. Value type is arbitrary. It is safe to modify the contents of the arguments after Put returns but not before.

PutIf

PutIf conditionally sets the value for the given key, depending on the given flag (see below). It's thread-safe.

err := dm.PutIf("my-key", "my-value", flags)

The key has to be string. Value type is arbitrary. It is safe to modify the contents of the arguments after PutIf returns but not before.

Flag argument currently has two different options:

  • IfNotFound: Only set the key if it does not already exist. It returns ErrFound if the key already exists.

  • IfFound: Only set the key if it already exists. It returns ErrKeyNotFound if the key does not exist.

Sample use:

err := dm.PutIf("my-key", "my-value", IfNotFound)

PutEx

PutEx sets the value for the given key with TTL. It overwrites any previous value for that key. It's thread-safe.

err := dm.PutEx("my-key", "my-value", time.Second)

The key has to be string. Value type is arbitrary. It is safe to modify the contents of the arguments after PutEx returns but not before.

PutIfEx

PutIfEx conditionally sets the value for the given key with a TTL, depending on the given flag (see below). It's thread-safe.

err := dm.PutIfEx("my-key", "my-value", time.Second, flags)

The key has to be string. Value type is arbitrary. It is safe to modify the contents of the arguments after PutIfEx returns but not before.

Flag argument currently has two different options:

  • IfNotFound: Only set the key if it does not already exist. It returns ErrFound if the key already exists.

  • IfFound: Only set the key if it already exists. It returns ErrKeyNotFound if the key does not exist.

Sample use:

err := dm.PutIfEx("my-key", "my-value", time.Second, IfNotFound)

Get

Get gets the value for the given key. It returns ErrKeyNotFound if the DB does not contain the key. It's thread-safe.

value, err := dm.Get("my-key")

It is safe to modify the contents of the returned value.

GetEntry

GetEntry gets the value for the given key along with its metadata. It returns ErrKeyNotFound if the DB does not contain the key. It's thread-safe.

entry, err := dm.GetEntry("my-key")

Definition of Entry:

type Entry struct {
	Key       string
	Value     interface{}
	TTL       int64
	Timestamp int64
}

It is safe to modify the contents of the returned value.

Expire

Expire updates the expiry for the given key. It returns ErrKeyNotFound if the DB does not contain the key. It's thread-safe.

err := dm.Expire("my-key", time.Second)

The key has to be string. The second parameter is time.Duration.

Delete

Delete deletes the value for the given key. Delete will not return error if key doesn't exist. It's thread-safe.

err := dm.Delete("my-key")

It is safe to modify the contents of the argument after Delete returns.

LockWithTimeout

LockWithTimeout sets a lock for the given key. If the lock is still unreleased at the end of the given period, it is released automatically. The acquired lock is only for the key in this DMap.

ctx, err := dm.LockWithTimeout("lock.foo", time.Millisecond, time.Second)

It returns immediately if it acquires the lock for the given key. Otherwise, it waits until the deadline. You should keep the LockContext (as ctx) value to call the Unlock method to release the lock.

Creating a separate DMap to keep locks may be a good idea.

You should know that the locks are approximate, and only to be used for non-critical purposes.

Please take a look at Lock Implementation section for implementation details.

Lock

Lock sets a lock for the given key. The acquired lock is only for the key in this DMap.

ctx, err := dm.Lock("lock.foo", time.Second)

It returns immediately if it acquires the lock for the given key. Otherwise, it waits until the deadline. You should keep the LockContext (as ctx) value to call the Unlock method to release the lock.

You should know that the locks are approximate, and only to be used for non-critical purposes.

Unlock

Unlock releases an acquired lock for the given key. It returns ErrNoSuchLock if there is no lock for the given key.

err := ctx.Unlock()
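
A minimal usage sketch that puts Lock and Unlock together, based only on the calls shown above (the second argument of Lock is the acquisition deadline):

ctx, err := dm.Lock("lock.foo", time.Second)
if err != nil {
    // the lock could not be acquired before the deadline
    return err
}
// ... critical section ...
if err := ctx.Unlock(); err != nil {
    // ErrNoSuchLock means the lock no longer exists
    log.Println(err)
}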

Destroy

Destroy flushes the given DMap on the cluster. You should know that there is no global lock on DMaps. So if you call Put/PutEx and Destroy methods concurrently on the cluster, Put/PutEx calls may set new values to the DMap.

err := dm.Destroy()

Stats

Stats exposes some useful metrics to monitor an Olric node. It includes memory allocation metrics from partitions and the Go runtime metrics.

data, err := db.Stats()

See stats/stats.go for detailed info about the metrics.

Ping

Ping sends a dummy protocol message to the given host. This is useful to measure RTT between hosts. It can also be used as a liveness check.

err := db.Ping()

Query

Query runs a distributed query on a DMap instance. Olric supports a very simple query DSL and, for now, it only scans keys. The query DSL has very few keywords:

  • $onKey: Runs the given query on keys or manages options on keys for a given query.
  • $onValue: Runs the given query on values or manages options on values for a given query.
  • $options: Useful to modify data returned from a query

Keywords for $options:

  • $ignore: Ignores a value.

A distributed query looks like the following:

  query.M{
	  "$onKey": query.M{
		  "$regexMatch": "^even:",
		  "$options": query.M{
			  "$onValue": query.M{
				  "$ignore": true,
			  },
		  },
	  },
  }

This query finds the keys that start with even:, drops the values and returns only the keys. If you also want to retrieve the values, just remove the $options directive:

  query.M{
	  "$onKey": query.M{
		  "$regexMatch": "^even:",
	  },
  }

In order to iterate over all the keys:

  query.M{
	  "$onKey": query.M{
		  "$regexMatch": "",
	  },
  }

This is how you call a distributed query over the cluster:

c, err := dm.Query(query.M{"$onKey": query.M{"$regexMatch": ""}})

The Query function returns a cursor which has Range and Close methods. Please take a look at the Range function for further info.

Here is a working query example.
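
For instance, a minimal sketch that iterates over every key in a DMap, using only the calls documented in this section:

c, err := dm.Query(query.M{"$onKey": query.M{"$regexMatch": ""}})
if err != nil {
    return err
}
defer c.Close()
err = c.Range(func(key string, value interface{}) bool {
    fmt.Printf("KEY: %s, VALUE: %v\n", key, value)
    return true
})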

Cursor

Cursor implements distributed queries in Olric. It has two methods: Range and Close.

Range

Range calls f sequentially for each key and value yielded from the cursor. If f returns false, range stops the iteration.

err := c.Range(func(key string, value interface{}) bool {
		fmt.Printf("KEY: %s, VALUE: %v\n", key, value)
		return true
})

Close

Close cancels the underlying context, and the background goroutines stop running. It's a good idea to defer Close after getting a Cursor. This way, you can ensure that there are no dangling goroutines after your distributed query execution has stopped.

c.Close()

Atomic Operations

Operations on key/value pairs are performed by the partition owner. In addition, atomic operations are guarded by a lock implementation which can be found under internal/locker. This means that Olric guarantees the consistency of atomic operations as long as there is no network partition. The basic flow for Incr:

  • Acquire the lock for the given key,
  • Call Get to retrieve the current value,
  • Calculate the new value,
  • Call Put to set the new value,
  • Release the lock.

It's important to know that if you call Put and GetPut concurrently on the same key, this will break the atomicity.

The internal/locker package is provided by Docker.

Important note about consistency:

You should know that Olric is a PA/EC product (see Consistency and Replication Model). So if your network is stable, all operations on a key/value pair are performed by a single cluster member, and you can rely on the consistency of atomic operations while the cluster is stable. However, computer networks fail occasionally, processes crash and random GC pauses may happen; many factors can lead to network partitioning. If you cannot tolerate losing strong consistency under network partitioning, you need to use a different tool for atomic operations.

See Hazelcast and the Mythical PA/EC System and Jepsen Analysis on Hazelcast 3.8.3 for more insight on this topic.

Incr

Incr atomically increments key by delta. The return value is the new value after being incremented or an error.

nr, err := dm.Incr("atomic-key", 3)

The returned value is int.

Decr

Decr atomically decrements key by delta. The return value is the new value after being decremented or an error.

nr, err := dm.Decr("atomic-key", 1)

The returned value is int.

GetPut

GetPut atomically sets key to value and returns the old value stored at key.

value, err := dm.GetPut("atomic-key", someType{})

The returned value is an arbitrary type.

Pipelining

The Olric Binary Protocol (OBP) supports pipelining. All protocol commands can be pushed to a remote Olric server through a pipeline in a single write call. A sample use looks like the following:

// Create an ordinary Olric client, not Olric node!
// ...
// Create a new pipe and call on it whatever you want.
pipe := client.NewPipeline()
for i := 0; i < 10; i++ {
    key := "key-" + strconv.Itoa(i)
    err := pipe.Put("mydmap", key, i)
    if err != nil {
        fmt.Println("returned an error: ", err)
    }
}

for i := 0; i < 10; i++ {
    key := "key-" + strconv.Itoa(i)
    err := pipe.Get("mydmap", key)
    if err != nil {
        fmt.Println("returned an error: ", err)
    }
}

// Flush messages to the server.
responses, err := pipe.Flush()
if err != nil {
    fmt.Println("returned an error: ", err)
}

// Read responses from the pipeline.
for _, resp := range responses {
    if resp.Operation() == "Get" {
        val, err := resp.Get()
        if err != nil {
            fmt.Println("returned an error: ", err)
        }
        fmt.Println("Get response: ", val)
    }
}

There is no hard limit on the number of messages in a pipeline. You should set a suitable KeepAlive for large pipelines; otherwise, you may get a timeout error.

The Flush method returns errors along with success messages. Furthermore, you need to know the command order for matching responses with requests.

Distributed Topic

Distributed topic is an asynchronous messaging service that decouples services that produce events from services that process events. It has two delivery modes:

  • olric.UnorderedDelivery: Messages are delivered in random order. It's good to distribute independent events in a distributed system.
  • olric.OrderedDelivery: Messages are delivered in some order. Not implemented yet.

You should know that:

  • Communication between parties is one-to-many (fan-out).
  • All data is in-memory, and the published messages are not stored in the cluster.
  • Fire&Forget: message delivery is not guaranteed.

Create a DTopic instance:

dt, err := db.NewDTopic("my-topic", 0, olric.UnorderedDelivery)

Publish

Publish sends a message to the given topic. It accepts any serializable type as message.

err := dt.Publish("my-message")

AddListener

AddListener adds a new listener for the topic. It returns a listener ID or a non-nil error. The callback functions for this DTopic are run in parallel.

listenerID, err := dt.AddListener(func(msg DTopicMessage) {
    fmt.Println("Message:", msg)
})

You have to store listenerID to remove the listener.

RemoveListener

RemoveListener removes a listener with the given listenerID.

err := dt.RemoveListener(listenerID)

Destroy

Destroy removes the DTopic from the cluster. It stops the background goroutines and releases the underlying data structures.

err := dt.Destroy()

Golang Client

This repo contains the official Golang client for Olric. It implements the Olric Binary Protocol (OBP). With this client, you can access Olric clusters from your Golang programs. In order to create a client instance:

var clientConfig = &client.Config{
    Addrs:       []string{"localhost:3320"},
    DialTimeout: 10 * time.Second,
    KeepAlive:   10 * time.Second,
    MaxConn:     100,
}

client, err := client.New(clientConfig)
if err != nil {
    return err
}

dm := client.NewDMap("foobar")
err = dm.Put("key", "value")
// Handle this error

This implementation supports TCP connection pooling, so it recycles opened TCP connections to avoid wasting resources. Requests are distributed among the available TCP connections using a round-robin algorithm. In order to see a detailed list of configuration parameters, see the Olric documentation on GoDoc.org.

The official Golang client has its dedicated documentation. Please take a look at this.

Configuration

Feel free to ask any questions about configuration and integration. Please see the Support section.

Embedded-Member Mode

Olric provides a function to generate default configuration to use in embedded-member mode:

import "github.com/buraksezer/olric/config"
...
c := config.New("local")

The New function takes a parameter called env. It denotes the network environment and is consumed by hashicorp/memberlist. The default configuration is good enough for a distributed caching scenario. In order to see all configuration parameters, please take a look at this.

See Sample Code section for an introduction.

Manage the configuration in YAML format

You can also import configuration from a YAML file by using the Load function:

c, err := config.Load("path/to/olric.yaml")

A sample configuration file in YAML format can be found here. This may be the most appropriate way to manage the Olric configuration.

Client-Server Mode

Olric provides olricd to implement client-server mode. olricd takes a YAML file for its configuration. The most basic functionality of olricd is translating the YAML configuration into Olric's configuration struct. A sample olricd.yaml file is provided here.

Network Configuration

In an Olric instance, there are two different TCP servers: one for Olric and one for memberlist. BindAddr is critical for deploying a healthy Olric node. There are different scenarios:

  • You can freely set a domain name or IP address as BindAddr for both Olric and memberlist. Olric will resolve and use it to bind.
  • You can freely set localhost, 127.0.0.1 or ::1 as BindAddr in development environment for both Olric and memberlist.
  • You can freely set 0.0.0.0 as BindAddr for both Olric and memberlist. Olric will pick an IP address, if there is any.
  • If you don't set BindAddr, hostname will be used, and it will be resolved to get a valid IP address.
  • You can set a network interface by using Config.Interface and Config.MemberlistInterface fields. Olric will find an appropriate IP address for the given interfaces, if there is any.
  • You can set both BindAddr and interface parameters. In this case Olric will ensure that BindAddr is available on the given interface.

You should know that Olric needs a single, stable IP address to function properly. If you don't know the IP address of the host at deployment time, you can set BindAddr to 0.0.0.0 and Olric will very likely find an IP address for you.
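
For example, in embedded-member mode the address and interface settings live on the configuration struct. A minimal sketch, assuming the fields described in this section (BindAddr, Interface, MemberlistInterface); BindPort is an additional assumption to verify against the config package:

import "github.com/buraksezer/olric/config"

c := config.New("lan")
c.BindAddr = "0.0.0.0"         // let Olric pick an IP address
c.BindPort = 3320              // assumed field name; 3320 is the port used elsewhere in this document
c.Interface = "eth0"           // or derive the address from a network interface
c.MemberlistInterface = "eth0"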

Service Discovery

Olric provides a service discovery interface which can be used to implement plugins.

We currently maintain service discovery plugins for automatic peer discovery on cloud environments and Kubernetes (for example, olric-cloud-plugin, mentioned in the Kubernetes section) as well as Consul.

In order to get more info about the installation and configuration of the plugins, see their GitHub pages.

Timeouts

Olric nodes support setting KeepAlivePeriod on TCP sockets.

Server-side:

config.KeepAlivePeriod

KeepAlivePeriod denotes the period between keep-alive messages the operating system sends on the connection.

Client-side:

config.DialTimeout

Timeout for TCP dial. The timeout includes name resolution, if required. When using TCP, and the host in the address parameter resolves to multiple IP addresses, the timeout is spread over each consecutive dial, such that each is given an appropriate fraction of the time to connect.

config.ReadTimeout

Timeout for socket reads. If reached, commands will fail with a timeout instead of blocking. Use -1 for no timeout and 0 for the default. The default is config.DefaultReadTimeout.

config.WriteTimeout

Timeout for socket writes. If reached, commands will fail with a timeout instead of blocking. The default is config.DefaultWriteTimeout.

config.KeepAlive

KeepAlive specifies the interval between keep-alive probes for an active network connection. If zero, keep-alive probes are sent with a default value (currently 15 seconds), if supported by the protocol and operating system. Network protocols or operating systems that do not support keep-alives ignore this field. If negative, keep-alive probes are disabled.
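
Putting the client-side values together, a client configuration might look like the sketch below. Addrs, DialTimeout, KeepAlive and MaxConn appear in the Golang Client example above; ReadTimeout and WriteTimeout are assumptions based on the options listed in this section and should be checked against the client package:

clientConfig := &client.Config{
    Addrs:        []string{"localhost:3320"},
    DialTimeout:  10 * time.Second,
    KeepAlive:    10 * time.Second,
    MaxConn:      100,
    ReadTimeout:  3 * time.Second, // assumed field name
    WriteTimeout: 3 * time.Second, // assumed field name
}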

Architecture

Overview

Olric uses hashicorp/memberlist for cluster membership and failure detection, and buraksezer/consistent to distribute partitions among cluster members.

Olric distributes data among partitions. Every partition is owned by a cluster member and may have one or more backups for redundancy. When you read or write a DMap entry, you transparently talk to the partition owner. Each request hits the most up-to-date version of a particular data entry in a stable cluster.

In order to find the partition which a key belongs to, Olric hashes the key and takes it modulo the number of partitions:

partID = MOD(hash result, partition count)

The partitions are distributed among cluster members using a consistent hashing algorithm. For details, please see buraksezer/consistent.
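
A conceptual sketch of the formula above (this is not Olric's internal code; the hash function here is only an illustration):

import "hash/fnv"

// partitionID maps a key to a partition, as in partID = MOD(hash result, partition count).
func partitionID(key string, partitionCount uint64) uint64 {
    h := fnv.New64a()
    h.Write([]byte(key))
    return h.Sum64() % partitionCount
}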

When a new cluster is created, one of the instances is elected as the cluster coordinator. It manages the partition table:

  • When a node joins or leaves, it distributes the partitions and their backups among the members again,
  • Removes empty previous owners from the partition owners list,
  • Pushes the new partition table to all the members,
  • Pushes the partition table to the cluster periodically.

Members propagate their birthdate (POSIX time in nanoseconds) to the cluster. The coordinator is the oldest member in the cluster. If the coordinator leaves the cluster, the second oldest member gets elected as the coordinator.

Olric has a component called rebalancer which is responsible for keeping underlying data structures consistent:

  • Works on every node,
  • When a node joins or leaves, the cluster coordinator pushes the new partition table. Then, the rebalancer runs immediately and moves the partitions and backups to their new hosts,
  • Merges fragmented partitions.

Partitions have a concept called the owners list. When a node joins or leaves the cluster, a new primary owner may be assigned by the coordinator. At any time, a partition may have one or more owners. If a partition has two or more owners, it is called a fragmented partition. The last added owner is called the primary owner. Write operations are only handled by the primary owner. The previous owners are only used for reads and deletes.

When you read a key, the primary owner tries to find the key on itself first. Then it queries the previous owners and backups, respectively. The delete operation works the same way.

The data (distributed map objects) in a fragmented partition is moved slowly to the primary owner by the rebalancer. Until the move is complete, the data remains available on the previous owners. The DMap methods use the owners list to query data on the cluster.

Please note that 'multiple partition owners' is an undesirable situation, and the rebalancer component is designed to fix it in a short time.

Consistency and Replication Model

Olric is an AP product in the context of the CAP theorem, which employs a combination of primary-copy and optimistic replication techniques. With optimistic replication, when the partition owner receives a write or delete operation for a key, it applies the operation locally and propagates it to the backup owners.

This technique enables Olric clusters to offer high throughput. However, due to temporary situations in the system, such as network failure, backup owners can miss some updates and diverge from the primary owner. If a partition owner crashes while there is an inconsistency between itself and the backups, strong consistency of the data can be lost.

Two types of backup replication are available: sync and async. Both types are still implementations of the optimistic replication model.

  • sync: Blocks until write/delete operation is applied by backup owners.
  • async: Just fire & forget.

Last-write-wins conflict resolution

Every time a piece of data is written to Olric, a timestamp is attached by the client. When Olric has to deal with conflicting data in the case of network partitioning, it simply chooses the data with the most recent timestamp. This is called the LWW (last-write-wins) conflict resolution policy.
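
A conceptual sketch of the rule, using the Entry type shown in the GetEntry section (this is not Olric's internal code):

// resolveLWW picks the winning version of a key under the last-write-wins policy.
func resolveLWW(a, b Entry) Entry {
    if a.Timestamp >= b.Timestamp {
        return a
    }
    return b
}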

PACELC Theorem

From Wikipedia:

In theoretical computer science, the PACELC theorem is an extension to the CAP theorem. It states that in case of network partitioning (P) in a distributed computer system, one has to choose between availability (A) and consistency (C) (as per the CAP theorem), but else (E), even when the system is running normally in the absence of partitions, one has to choose between latency (L) and consistency (C).

In the context of the PACELC theorem, Olric is a PA/EC product. This means Olric is considered a consistent data store when the network is stable, because the key space is divided between partitions and every partition is controlled by its primary owner. All operations on DMaps are redirected to the partition owner.

In the case of network partitioning, Olric chooses availability over consistency, so you can still access some parts of the cluster when the network is unreliable, but the cluster may return inconsistent results.

Olric implements read-repair and a quorum-based voting system to deal with inconsistencies in DMaps.

Read-Repair on DMaps

Read repair is a feature that allows for inconsistent data to be fixed at query time. Olric tracks every write operation with a timestamp value and assumes that the latest write operation is the valid one. When you want to access a key/value pair, the partition owner retrieves all available copies for that pair and compares the timestamp values. The latest one is the winner. If there is some outdated version of the requested pair, the primary owner propagates the latest version of the pair.

Read-repair is disabled by default for the sake of performance. If you have a use case that requires a more strict consistency control than a distributed caching scenario, you can enable read-repair via the configuration.

Quorum-based replica control

Olric implements Read/Write quorums to keep the data in a consistent state. When you start a write operation on the cluster and the write quorum (W) is 2, the partition owner tries to write the given key/value pair to its own data storage and to the replica nodes. If the number of successful write operations is below W, the primary owner returns ErrWriteQuorum. The read flow is the same: if you have R=2 and the owner only reaches one of the replicas, it returns ErrReadQuorum.
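
In embedded-member mode the quorum values are set on the configuration struct. A hedged sketch; treat the exact field names (ReplicaCount, WriteQuorum, ReadQuorum) as assumptions to verify against the config package documentation:

c := config.New("lan")
c.ReplicaCount = 2 // keep one backup copy of every partition
c.WriteQuorum = 2  // W: a write must succeed on at least two owners
c.ReadQuorum = 2   // R: a read must reach at least two owners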

Simple Split-Brain Protection

Olric implements a technique called majority quorum to manage split-brain conditions. If network partitioning occurs and some of the members lose their connection to the rest of the cluster, they immediately stop functioning and return an error to incoming requests. This behaviour is controlled by the MemberCountQuorum parameter. Its default is 1.

When the network heals, the stopped nodes join the cluster again, and fragmented partitions are merged by their primary owners in accordance with the LWW policy. Olric also implements an ownership report mechanism to fix inconsistencies in partition distribution after a partitioning event.
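
For example, to require at least two reachable members in a three-node cluster before serving requests (a sketch; the field name follows the parameter described above):

c := config.New("lan")
c.MemberCountQuorum = 2 // return errors if fewer than two members are reachable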

Eviction

Olric supports different policies to evict keys from distributed maps.

Expire with TTL

Olric implements TTL eviction policy. It shares the same algorithm with Redis:

Periodically Redis tests a few keys at random among keys with an expire set. All the keys that are already expired are deleted from the keyspace.

Specifically this is what Redis does 10 times per second:

  • Test 20 random keys from the set of keys with an associated expire.
  • Delete all the keys found expired.
  • If more than 25% of keys were expired, start again from step 1.

This is a trivial probabilistic algorithm, basically the assumption is that our sample is representative of the whole key space, and we continue to expire until the percentage of keys that are likely to be expired is under 25%

When a client tries to access a key, Olric returns ErrKeyNotFound if the key is found to be timed out. A background task evicts keys with the algorithm described above.

Expire with MaxIdleDuration

Maximum time for each entry to stay idle in the DMap. It limits the lifetime of entries relative to the time of the last read or write access performed on them. Entries whose idle period exceeds this limit are expired and evicted automatically. An entry is idle if no Get, Put, PutEx, Expire, PutIf or PutIfEx operation is performed on it. Configuration of the MaxIdleDuration feature varies by preferred deployment method.

Expire with LRU

Olric implements the LRU eviction method on DMaps. The approximated LRU algorithm is borrowed from Redis. The Redis authors describe it as follows:

It is important to understand that the eviction process works like this:

  • A client runs a new command, resulting in more data added.
  • Redis checks the memory usage, and if it is greater than the maxmemory limit, it evicts keys according to the policy.
  • A new command is executed, and so forth.

So we continuously cross the boundaries of the memory limit, by going over it, and then by evicting keys to return back under the limits.

If a command results in a lot of memory being used (like a big set intersection stored into a new key) for some time the memory limit can be surpassed by a noticeable amount.

Approximated LRU algorithm

The Redis LRU algorithm is not an exact implementation. This means that Redis is not able to pick the best candidate for eviction, that is, the key that was accessed the furthest in the past. Instead it will try to run an approximation of the LRU algorithm, by sampling a small number of keys, and evicting the one that is the best (with the oldest access time) among the sampled keys.

Olric tracks the access time for every key in a DMap instance. It then picks and sorts a configurable number of keys to select candidates for eviction. Every node runs this algorithm independently. The access log is moved along with the partition when network partitioning occurs.

Configuration of eviction mechanisms

Here is a simple configuration block for olricd.yaml:

cache:
  numEvictionWorkers: 1
  maxIdleDuration: ""
  ttlDuration: "100s"
  maxKeys: 100000
  maxInuse: 1000000 # in bytes
  lRUSamples: 10
  evictionPolicy: "LRU" # NONE/LRU

You can also set cache configuration per DMap. Here is a simple configuration for a DMap named foobar:

dmaps:
  foobar:
    maxIdleDuration: "60s"
    ttlDuration: "300s"
    maxKeys: 500000 # maximum number of keys
    lRUSamples: 20
    evictionPolicy: "NONE" # NONE/LRU

If you prefer embedded-member deployment scenario, please take a look at config#CacheConfig and config#DMapCacheConfig for the configuration.

Lock Implementation

The DMap implementation is already thread-safe to meet your thread safety requirements. When you want more control over concurrency, you can use the LockWithTimeout and Lock methods. Olric borrows the locking algorithm from Redis. The Redis authors propose the following approach:

The command is a simple way to implement a locking system with Redis.

A client can acquire the lock if the above command returns OK (or retry after some time if the command returns Nil), and remove the lock just using DEL.

The lock will be auto-released after the expire time is reached.

It is possible to make this system more robust modifying the unlock schema as follows:

Instead of setting a fixed string, set a non-guessable large random string, called token. Instead of releasing the lock with DEL, send a script that only removes the key if the value matches. This avoids that a client will try to release the lock after the expire time deleting the key created by another client that acquired the lock later.

The equivalent of the SETNX command in Olric is PutIf(key, value, IfNotFound). The Lock and LockWithTimeout methods properly implement the algorithm proposed above.

You should know that this implementation is subject to the clustering algorithm, so there is no guarantee about reliability in the case of network partitioning. The lock implementation is recommended for efficiency purposes in general, rather than correctness.
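
A rough SETNX-style sketch using the PutIfEx call documented above; the IfNotFound flag and the ErrFound error are taken from the PutIf section of this document, and their exact package-level names are assumptions to verify:

// Try to acquire a lock that auto-expires after ten seconds.
token := "non-guessable-random-token" // in practice, generate this randomly
err := dm.PutIfEx("lock:my-resource", token, 10*time.Second, IfNotFound)
if err == ErrFound {
    // someone else already holds the lock; retry later
} else if err != nil {
    // handle other errors
}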

Important note about consistency:

You should know that Olric is a PA/EC product (see Consistency and Replication Model). So if your network is stable, all operations on a key/value pair are performed by a single cluster member, and you can rely on consistency while the cluster is stable. However, computer networks fail occasionally, processes crash and random GC pauses may happen; many factors can lead to network partitioning. If you cannot tolerate losing strong consistency under network partitioning, you need to use a different tool for locking.

See Hazelcast and the Mythical PA/EC System and Jepsen Analysis on Hazelcast 3.8.3 for more insight on this topic.

Storage Engine

Olric implements an append-only log file, indexed with a builtin map (uint64 => uint64). It creates new tables and evacuates existing data to the new ones if it needs to shrink or expand.
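
A conceptual sketch of the idea, greatly simplified (this is not the actual storage engine code):

// Conceptual model: values are appended to a byte log and located via an index map.
type table struct {
    index  map[uint64]uint64 // hashed key -> offset into the log
    log    []byte            // append-only region holding encoded entries
    offset uint64
}

func (t *table) put(hkey uint64, value []byte) {
    t.index[hkey] = t.offset
    t.log = append(t.log, value...)
    t.offset += uint64(len(value))
}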

Sample Code

The following snippet can be run on your computer directly. It's a single-node setup, of course:

package main

import (
	"context"
	"fmt"
	"log"
	"reflect"
	"time"

	"github.com/buraksezer/olric"
	"github.com/buraksezer/olric/config"
)

func main() {
	// Deployment scenario: embedded-member
	// This creates a single-node Olric cluster. It's good enough for experimenting.

	// config.New returns a new config.Config with sane defaults. Available values for env:
	// local, lan, wan
	c := config.New("local")

	// Callback function. It's called when this node is ready to accept connections.
	ctx, cancel := context.WithCancel(context.Background())
	c.Started = func() {
		defer cancel()
		log.Println("[INFO] Olric is ready to accept connections")
	}

	db, err := olric.New(c)
	if err != nil {
		log.Fatalf("Failed to create Olric instance: %v", err)
	}

	go func() {
		// Call Start at background. It's a blocker call.
		err = db.Start()
		if err != nil {
			log.Fatalf("olric.Start returned an error: %v", err)
		}
	}()

	<-ctx.Done()

	dm, err := db.NewDMap("bucket-of-arbitrary-items")
	if err != nil {
		log.Fatalf("olric.NewDMap returned an error: %v", err)
	}

	// Magic starts here!
	fmt.Println("##")
	fmt.Println("Operations on a DMap instance:")
	err = dm.Put("string-key", "buraksezer")
	if err != nil {
		log.Fatalf("Failed to call Put: %v", err)
	}
	stringValue, err := dm.Get("string-key")
	if err != nil {
		log.Fatalf("Failed to call Get: %v", err)
	}
	fmt.Printf("Value for string-key: %v, reflect.TypeOf: %s\n", stringValue, reflect.TypeOf(stringValue))

	err = dm.Put("uint64-key", uint64(1988))
	if err != nil {
		log.Fatalf("Failed to call Put: %v", err)
	}
	uint64Value, err := dm.Get("uint64-key")
	if err != nil {
		log.Fatalf("Failed to call Get: %v", err)
	}
	fmt.Printf("Value for uint64-key: %v, reflect.TypeOf: %s\n", uint64Value, reflect.TypeOf(uint64Value))
	fmt.Println("##")

	// Don't forget to call Shutdown when you want to leave the cluster.
	ctx, _ = context.WithTimeout(context.Background(), 10*time.Second)
	err = db.Shutdown(ctx)
	if err != nil {
		log.Printf("Failed to shutdown Olric: %v", err)
	}
}

Contributions

Please don't hesitate to fork the project and send a pull request or just e-mail me to ask questions and share ideas.

License

The Apache License, Version 2.0 - see LICENSE for more details.

About the name

The name comes from the inner voice of Turgut Özben, the main character of Oğuz Atay's masterpiece, The Disconnected.

Issues
  • Refactor, test and document eviction mechanisms

    We need to refactor, test and document the existing eviction mechanisms.

    Fix #7 and #8 firstly.

    enhancement quality assurance 
    opened by buraksezer 34
  • Olric embedded in a Kubernetes StatefulSet throws network errors, rejects writes and uses too much CPU

    The intended use case for Olric in my application is to act as a replicated cache, that doesn't need transactions or disk persistence, but needs to keep values in case of node/datacenter failures and through upgrades.

    To achieve this I added olric embedded to my application which is deployed using a Kubernetes StatefulSet with 3 replicas.

    To discover the other replicas of my application for olric I use the olric-cloud-plugin, and after some guessing and trial and error I got it to discover the other peers and have olric seem to form a cluster.

    However with 3 replicas of my application, MinimumReplicaCount = 2 and ReplicationMode = AsyncReplicationMode olric seems to be unhappy:

    • writes often fail with network I/O timeouts
    • one member uses ~100% of a CPU all the time
    • memory use seems very high
    • not sure node cleanup works at all (relevant after application updates/node restarts,...)

    I still need to try this use case with olricd to see if I can reproduce the problem there.

    Now I know that my application (because it is still in its early stages) has a dumb behavior: the data that needs to be cached comes from the Kubernetes API server and all nodes receive the same data, and also all write the same values to the olric DMap. Your documentation leads me to believe this should be fine, because the last write will win.

    I guess I'll have to write a sample application to demonstrate the problem... but I don't have access to the applications code right now, so I will add that later.

    bug question 
    opened by cobexer 15
  • Failed to get final advertise address

    Hi, @buraksezer . I got lots of failures when running test cases. FYI,

    dmap_backup_test.go:26: 
    Expected nil. Got: Failed to get final advertise address: 
    No private IP address found, and explicit IP not provided
    
    bug 
    opened by asdf2014 14
  • Large allocations when value size << table size, possible memory leak

    We're hitting out of memory errors with high memory usage for only ~200 keys: 26.64 GiB! The keys are fairly small, no more than 8k max, so this was pretty surprising. Further, the total sum of all allocated slabs returned by Stats() only shows a small fraction of the actually in-use memory.

    For one sample: 40 MiB slab total, 18 GiB container total with 134 keys on [email protected].

    Of course, some of this could be our app – so I dug in more and captured profiles. 😄

    Data collection

    Here's top heap:

    Type: inuse_space
    Time: Jun 29, 2021 at 12:16pm (CDT)
    Active filters:
       focus=github\.com/buraksezer/olric/internal/storage\.newTable
    Showing nodes accounting for 7199.49MB, 99.87% of 7208.76MB total
    ----------------------------------------------------------+-------------
          flat  flat%   sum%        cum   cum%   calls calls% + context 	 	 
    ----------------------------------------------------------+-------------
                                             7156.94MB 99.59% |   github.com/buraksezer/olric/internal/storage.(*Storage).Put /go/pkg/mod/github.com/buraksezer/[email protected]/internal/storage/storage.go:98 (inline)
                                               29.56MB  0.41% |   github.com/buraksezer/olric/internal/storage.New /go/pkg/mod/github.com/buraksezer/[email protected]/internal/storage/storage.go:52 (inline)
     7186.49MB 99.69% 99.69%  7186.49MB 99.69%                | github.com/buraksezer/olric/internal/storage.newTable /go/pkg/mod/github.com/buraksezer/[email protected]/internal/storage/table.go:63
    ----------------------------------------------------------+-------------
                                                8.50MB   100% |   github.com/buraksezer/olric/internal/storage.(*Storage).Put /go/pkg/mod/github.com/buraksezer/[email protected]/internal/storage/storage.go:98 (inline)
        8.50MB  0.12% 99.81%     8.50MB  0.12%                | github.com/buraksezer/olric/internal/storage.newTable /go/pkg/mod/github.com/buraksezer/[email protected]/internal/storage/table.go:51
    ----------------------------------------------------------+-------------
                                                4.50MB   100% |   github.com/buraksezer/olric/internal/storage.(*Storage).Put /go/pkg/mod/github.com/buraksezer/[email protected]/internal/storage/storage.go:98 (inline)
        4.50MB 0.062% 99.87%     4.50MB 0.062%                | github.com/buraksezer/olric/internal/storage.newTable /go/pkg/mod/github.com/buraksezer/[email protected]/internal/storage/table.go:52
    ----------------------------------------------------------+-------------
    

    Which points to these lines: https://github.com/buraksezer/olric/blob/7e13bd0c669b83f2b1ef1716ec8564b2e0098977/internal/storage/storage.go#L98-L99

    I also ran a CPU profile for good measure, and found storage.Inuse() was taking up a large percentage of time compared to the typical HTTP traffic this app expected:

    Type: cpu
    Time: Jun 29, 2021 at 12:21pm (CDT)
    Duration: 30.11s, Total samples = 30.52s (101.37%)
    Showing nodes accounting for 30.52s, 100% of 30.52s total
    ----------------------------------------------------------+-------------
          flat  flat%   sum%        cum   cum%   calls calls% + context 	 	 
    ----------------------------------------------------------+-------------
                                                28.26s   100% |   github.com/buraksezer/olric/internal/storage.(*Storage).Put /go/pkg/mod/github.com/buraksezer/[email protected]/internal/storage/storage.go:98
        28.23s 92.50% 92.50%     28.26s 92.60%                | github.com/buraksezer/olric/internal/storage.(*Storage).Inuse /go/pkg/mod/github.com/buraksezer/[email protected]/internal/storage/storage.go:333
                                                 0.03s  0.11% |   runtime.asyncPreempt /usr/local/go/src/runtime/preempt_amd64.s:8
    ----------------------------------------------------------+-------------
                                                 1.40s   100% |   github.com/buraksezer/olric/internal/storage.(*Storage).Put /go/pkg/mod/github.com/buraksezer/[email protected]/internal/storage/storage.go:98
         1.40s  4.59% 97.08%      1.40s  4.59%                | github.com/buraksezer/olric/internal/storage.(*Storage).Inuse /go/pkg/mod/github.com/buraksezer/[email protected]/internal/storage/storage.go:332
    ----------------------------------------------------------+-------------
                                                 0.07s   100% |   runtime.netpoll /usr/local/go/src/runtime/netpoll_epoll.go:126
         0.07s  0.23% 97.31%      0.07s  0.23%                | runtime.epollwait /usr/local/go/src/runtime/sys_linux_amd64.s:725
    ----------------------------------------------------------+-------------
    

    52e1d700-d8d8-11eb-8700-28f25093f181

    Steps to reproduce

    I made a test and benchmark that cause this same issue (the heap profile shows the same lines affected).

    Each of them have several tuning knobs (constants) you can use to experiment, if you like. I found a high table size and small value byte size triggered the issue most easily.

    Click to expand for the test:

    In dmap_put_test.go:

    const mebibyte = 1 << 20
    
    func TestDMap_PutMany(t *testing.T) {
    	const (
    		keyCount  = 1000 // Number of keys to set in 1 round.
    		valueSize = 10   // Value size in bytes.
    		// Cumulative value memory = keyCount * valueSize
    		tableSize = 10 * mebibyte
    
    		sampleInterval = 5 * time.Second
    		statusInterval = 10 * time.Second
    	)
    
    	cfg := testSingleReplicaConfig()
    	cfg.TableSize = tableSize
    	db, err := newDB(cfg)
    	if err != nil {
    		t.Fatalf("Expected nil. Got: %v", err)
    	}
    	dm, err := db.NewDMap("mymap")
    	if err != nil {
    		t.Fatalf("Expected nil. Got: %v", err)
    	}
    
    	before := readMem()
    	t.Logf("Before Memory: %.2f MiB", toMiB(before.Alloc))
    	expectedMemDiff := toMiB(keyCount * valueSize)
    	t.Logf("Expected memory increase of %.2f MiB", expectedMemDiff)
    
    	ctx, cancel := context.WithCancel(context.Background())
    	t.Cleanup(cancel)
    	go runMemStatusTicker(ctx, t, sampleInterval, statusInterval)
    
    	value := make([]byte, valueSize)
    	for i := 0; i < keyCount; i++ {
    		err := dm.Put(bkey(i), value)
    		if err != nil {
    			panic(err)
    		}
    	}
    	runtime.GC()
    	after := readMem()
    	writePprofHeap(t)
    
    	t.Logf("After Memory: %.2f MiB", toMiB(after.Alloc))
    	actualMemDiff := toMiB(after.Alloc) - toMiB(before.Alloc)
    	t.Logf("Expected memory increase of %.2f MiB, got %.2f MiB. actual/expected ratio: %.2f", expectedMemDiff, actualMemDiff, actualMemDiff/expectedMemDiff)
    }
    
    func toMiB(i uint64) float64 {
    	return float64(i) / mebibyte
    }
    
    func readMem() runtime.MemStats {
    	var mem runtime.MemStats
    	runtime.ReadMemStats(&mem)
    	return mem
    }
    
    func writePprofHeap(t *testing.T) {
    	f, err := ioutil.TempFile("", "")
    	if err != nil {
    		t.Fatal(err)
    	}
    	runtime.GC()
    	if err := pprof.WriteHeapProfile(f); err != nil {
    		t.Fatal(err)
    	}
    	_ = f.Close()
    	t.Log("Pprof written to:", f.Name())
    }
    
    func max(values ...uint64) uint64 {
    	maxVal := values[0]
    	for _, val := range values[1:] {
    		if val > maxVal {
    			maxVal = val
    		}
    	}
    	return maxVal
    }
    
    func runMemStatusTicker(ctx context.Context, t *testing.T, sampleInterval, statusInterval time.Duration) {
    	sampleTick := time.NewTicker(sampleInterval)
    	defer sampleTick.Stop()
    	statusTick := time.NewTicker(statusInterval)
    	defer statusTick.Stop()
    
    	const maxSamples = 10
    	samples := make([]uint64, maxSamples)
    	i := 0
    	for {
    		select {
    		case <-ctx.Done():
    			return
    		case <-sampleTick.C:
    			mem := readMem()
    			samples[i%maxSamples] = mem.Alloc
    			i++
    		case <-statusTick.C:
    			t.Logf("Current memory: %.2f MiB", toMiB(max(samples...)))
    		}
    	}
    }
    

    And the benchmark:

    func BenchmarkDMap_PutMany(b *testing.B) {
    	b.ReportAllocs()
    	const (
    		valueSize = 10 // Value size in bytes.
    		tableSize = 10 * mebibyte
    
    		sampleInterval = 5 * time.Second
    		statusInterval = 10 * time.Second
    	)
    
    	cfg := testSingleReplicaConfig()
    	cfg.TableSize = tableSize
    	cfg.LogOutput = ioutil.Discard
    	db, err := newDB(cfg)
    	if err != nil {
    		b.Fatalf("Expected nil. Got: %v", err)
    	}
    	dm, err := db.NewDMap("mymap")
    	if err != nil {
    		b.Fatalf("Expected nil. Got: %v", err)
    	}
    
    	value := make([]byte, valueSize)
    	b.ResetTimer()
    	for i := 0; i < b.N; i++ {
    		err := dm.Put(bkey(i), value)
    		if err != nil {
    			panic(err)
    		}
    	}
    	b.StopTimer()
    	runtime.GC()
    	var mem runtime.MemStats
    	runtime.ReadMemStats(&mem)
    	b.ReportMetric(toMiB(mem.Alloc), "heap-MiB")
    }
    

    Which produced this output for me for a 1m bench time:

    go test . -run '^$' -bench BenchmarkDMap_PutMany -v -benchtime=1m
    
    goos: darwin
    goarch: amd64
    pkg: github.com/buraksezer/olric
    cpu: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
    BenchmarkDMap_PutMany
    BenchmarkDMap_PutMany-8         22833872              4389 ns/op              3410 heap-MiB         1640 B/op         26 allocs/op
    PASS
    ok      github.com/buraksezer/olric     106.409s
    

    The 3 GiB heap is telling me there's effectively a leak in there. I imagine compaction could help take care of it, but some of the memory persists long-term.

    Theories

    It seems this is most easily triggered when the table size is much larger than the average value size.

    I've explored some possibilities so far:

    • Race condition when determining if a new table should be allocated
      • I attempted to find a data race on the affected lines but I can't seem to trigger it with go test -race. It might still be a race in the protocol somehow, which the race detector isn't catching, but I tend to trust the race detector.
    • Incorrect formula for determining the next table size. This one seems most likely to me, though I don't know enough about olric's table allocation to say with any certainty.

    Hope that helps. I'll try and keep this issue updated if I discover anything new.

    bug investigation release:v0.3.x release:v0.4.x 
    opened by JohnStarich 12
  • unknown field 'Addrs' in struct literal of type client.Config

    unknown field 'Addrs' in struct literal of type client.Config

    When I try to run the olric client with the sample code, I get these errors:

    ./client_apply.go:14:3: unknown field 'Addrs' in struct literal of type client.Config
    ./client_apply.go:15:3: unknown field 'MaxConn' in struct literal of type client.Config

    I did not find any definition of Addrs in Config, so how can I solve this?

    opened by qiankun11516 11
  • Production usage

    Production usage

    I like this project and would like to use it in a heavy-duty production environment. How stable are the APIs, and is it being used in production?

    question 
    opened by saifat29 9
  • Drop persistence support

    Drop persistence support

    Drop persistence support. It was added too early, and BadgerDB is not suitable for our requirements. We may implement in-house persistence support in the future, of course only after implementing a proper fsck component.

    enhancement 
    opened by buraksezer 9
  • Consul integration for peer discovery

    Consul integration for peer discovery

    Memberlist needs a peer list to operate. Currently we have to manage that list manually, which is not ideal for a production environment. We need to integrate Consul for service discovery.

    See this https://github.com/buraksezer/olric/issues/20#issuecomment-594153082

    enhancement 
    opened by buraksezer 8
  • Keys starting with "e" can't be used

    Keys starting with "e" can't be used

    First of all: Thanks for this amazing KV store! It is tremendously helpful in building the distributed routing logic for gloeth.

    Right now though, it does not seem to be possible to use a key starting with "e":

    [127.0.0.1:3320] » use testing
    use testing
    [127.0.0.1:3320] » put a b
    [127.0.0.1:3320] » get a
    b
    [127.0.0.1:3320] » put b a
    [127.0.0.1:3320] » get b
    a
    [127.0.0.1:3320] » put e a
    [127.0.0.1:3320] » get e
    Failed to call get e on testing: key not found
    [127.0.0.1:3320] » put e1234 a
    [127.0.0.1:3320] » get e1234
    Failed to call get e1234 on testing: key not found
    [127.0.0.1:3320] » put "efss" a
    [127.0.0.1:3320] » get "efss"
    Failed to call get "efss" on testing: key not found
    [127.0.0.1:3320] » get efss
    Failed to call get efss on testing: key not found
    [127.0.0.1:3320] » put esadfjaslfjalsfjlsdjflsjdfajksdjfla a
    [127.0.0.1:3320] » get esadfjaslfjalsfjlsdjflsjdfajksdjfla
    Failed to call get esadfjaslfjalsfjlsdjflsjdfajksdjfla on testing: key not found
    [127.0.0.1:3320] » 
    
    2020/03/26 20:56:08 [INFO] Routing table has been pushed by 0.0.0.0:3320 => routing.go:488
    2020/03/26 20:56:08 [INFO] Stale DMap (backup: false) has been deleted: testing on PartID: 9 => dmap_delete.go:34
    2020/03/26 20:56:08 [INFO] Stale DMap (backup: false) has been deleted: testing on PartID: 15 => dmap_delete.go:34
    2020/03/26 20:56:08 [INFO] Stale DMap (backup: false) has been deleted: testing on PartID: 30 => dmap_delete.go:34
    

    Thanks for any help!
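
    A minimal embedded-API check (untested; modeled on the v0.3 sample code in the README) could help narrow down whether the problem lives in olric-cli's command parsing or in the server itself. If this succeeds while the CLI fails, the bug is likely on the CLI side:

    package main

    import (
    	"context"
    	"log"

    	"github.com/buraksezer/olric"
    	"github.com/buraksezer/olric/config"
    )

    func main() {
    	// Single-node instance with local defaults.
    	c := config.New("local")
    	started := make(chan struct{})
    	c.Started = func() { close(started) }

    	db, err := olric.New(c)
    	if err != nil {
    		log.Fatal(err)
    	}
    	go func() {
    		if err := db.Start(); err != nil {
    			log.Fatal(err)
    		}
    	}()
    	<-started
    	defer db.Shutdown(context.Background())

    	dm, err := db.NewDMap("testing")
    	if err != nil {
    		log.Fatal(err)
    	}
    	// Store and read back a key that starts with "e".
    	if err := dm.Put("e1234", "a"); err != nil {
    		log.Fatal(err)
    	}
    	value, err := dm.Get("e1234")
    	if err != nil {
    		// olric.ErrKeyNotFound here would reproduce the report outside the CLI.
    		log.Fatal(err)
    	}
    	log.Println("got:", value)
    }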

    bug olric-cli 
    opened by pojntfx 5
  • Add NewDefaultConfig function for hassle-free onboarding of new users

    Add NewDefaultConfig function for hassle-free onboarding of new users

    Currently we don't have proper documentation to assist users in creating their own configuration. Configuring Olric in the embedded-member deployment scenario can be very tricky. We need to add a new function to the config package that returns a config with sane defaults.

    Check this: https://github.com/buraksezer/olric/issues/9#issuecomment-588569127
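
    As a rough illustration of the idea only (not olric's actual API; the real helper would live in the config package), such a function could build on the existing config.New profiles and pin a handful of commonly tweaked fields:

    package defaults

    import "github.com/buraksezer/olric/config"

    // NewDefaultConfig is a hypothetical helper of the kind requested here:
    // it builds on config.New("local") and applies sane defaults so a new
    // user can start an embedded member without reading every option.
    func NewDefaultConfig() *config.Config {
    	c := config.New("local") // sensible defaults for a single machine
    	c.PartitionCount = 271   // the default; use a prime number if you change it
    	c.ReplicaCount = 1
    	c.ReadQuorum = 1
    	c.WriteQuorum = 1
    	return c
    }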

    enhancement 
    opened by buraksezer 5
  • Backup to S3

    Backup to S3

    Hi, I haven't noticed anything about backups in the documentation. Somewhere I saw that dumping to disk is not planned, and I agree with that. But the product's use cases would expand if there were a backup function, say to S3, and the ability to restore the latest backup from storage when the cluster starts.

    question 
    opened by sash2222 2
  • feat: add lease method for lock

    feat: add lease method for lock

    #122

    enhancement dmap 
    opened by ShawnHsiung 2
  • suggestion for Lock

    suggestion for Lock

    type LockContext struct {
    	name  string
    	key   string
    	token []byte
    	dmap  *DMap
    }
    

    May we add a Lease method to LockContext to update the expiry of a given lock? It's not convenient for users to implement this themselves because LockContext has no exported fields.
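
    For context, a sketch of how the proposed method might be used alongside the existing LockWithTimeout and Unlock calls; the Lease signature below is only a guess at the proposal, not something olric currently exposes:

    package locks

    import (
    	"time"

    	"github.com/buraksezer/olric"
    )

    // renewLock shows how the proposed method could sit next to the existing
    // LockWithTimeout/Unlock API. Lease is the hypothetical addition this
    // issue asks for.
    func renewLock(dm *olric.DMap) error {
    	// Acquire a lock that expires after 10 seconds unless renewed;
    	// wait at most one second to acquire it.
    	lock, err := dm.LockWithTimeout("resource-key", 10*time.Second, time.Second)
    	if err != nil {
    		return err
    	}
    	defer lock.Unlock()

    	// ... long-running work ...

    	// Proposed: push the expiry out by another 10 seconds (signature is a guess).
    	return lock.Lease(10 * time.Second)
    }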

    enhancement 
    opened by ShawnHsiung 1
  • olric in k8s is always in the initial member list

    olric in k8s is always in the initial member list

    Environment: Ubuntu 18.04, Kubernetes. Description: when I update the version, the node keeps trying to synchronize with the other members, and then it gets killed after some time (20 minutes). It never reaches the "Join completed" log line.


    this is my config

    olricd:
      bindPort: 3320
      serializer: gob
      keepAlivePeriod: 300s
      requestTimeout: 5s
      partitionCount:  271
      replicaCount: 1
      writeQuorum: 1
      readQuorum: 1
      readRepair: false
      bootstrapTimeout: 1000s
      replicationMode: 0
      memberCountQuorum: 1
    
    storageEngines:
      config:
        kvstore:
          tableSize: 4194304
    
    logging:
      verbosity: 6
      level: DEBUG
      output: stderr
    
    memberlist:
      environment: "lan"
      bindPort: 3322
      enableCompression: false
      joinRetryInterval: "500ms"
      maxJoinAttempts: 100
    
    client:
      dialTimeout: -1s
      readTimeout: 2s
      writeTimeout: 2s
      keepAlive: 15s
      minConn: 1
      maxConn: 100
      poolTimeout: 2s
    
    dmaps:
      img-bucket:
        maxIdleDuration: "3600s"
        ttlDuration: "14400s"
        maxKeys: 15000
        maxInuse: 3000000000 # in bytes ;3GB
        lRUSamples: 10
        evictionPolicy: "LRU" # NONE/LRU
    
      resp-bucket:
        maxIdleDuration: "36000s"
        ttlDuration: "36000s"
        maxKeys: 150000
        maxInuse: 300000000 # in bytes ;300MB
        lRUSamples: 10
        evictionPolicy: "LRU" # NONE/LRU
    
    serviceDiscovery:
      provider: "k8s"
      path: "/usr/lib/olric-cloud-plugin.so"
      args: 'label_selector="app = img-proxy"'
    
    question 
    opened by hacktmz 5
  • Config examples

    Config examples

    Hello! I've been looking to replace the replicated-Redis caching in my Go app with an integrated solution like Olric. I would like to remove the Redis requirement, but Olric's configuration options are a bit overwhelming.

    Could you provide an example or advice on how to implement the following:

    • Multiple app instances with embedded Olric that all have the same cached items (JWT with TTL)
    • One of these instances is the master (that writes tokens to cache)
    • The other instances are slaves that replicate the tokens from the master
    • Network connections between instances running Olric should be secure
    • Some way to export/import all cached JWT to/from file on the master instance in case the app needs restarting/upgrading

    As for the last item, I do not know whether Olric can do this out of the box. If not, tokens written to the cache could perhaps also be written to a persistent database, but that would add unwanted overhead for a situation that only occurs every so often (app restart/upgrade). Perhaps you could advise on a shutdown routine that reads all items and their corresponding TTLs and writes them to a file?
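
    For the export idea, an untested sketch of a shutdown routine built on the Query and GetEntry APIs of the embedded library; the dumpDMap name and the JSON format are illustrative only:

    package export

    import (
    	"encoding/json"
    	"io/ioutil"

    	"github.com/buraksezer/olric"
    	"github.com/buraksezer/olric/query"
    )

    // record is the on-disk representation of one cached item.
    type record struct {
    	Key   string      `json:"key"`
    	Value interface{} `json:"value"`
    	TTL   int64       `json:"ttl"`
    }

    // dumpDMap writes every key, value and TTL of a DMap to a JSON file.
    func dumpDMap(dm *olric.DMap, path string) error {
    	// An empty regex matches every key in the DMap.
    	cursor, err := dm.Query(query.M{"$onKey": query.M{"$regexMatch": ""}})
    	if err != nil {
    		return err
    	}
    	defer cursor.Close()

    	var records []record
    	err = cursor.Range(func(key string, _ interface{}) bool {
    		e, err := dm.GetEntry(key) // GetEntry exposes the TTL metadata
    		if err != nil {
    			return true // skip keys that were evicted in the meantime
    		}
    		records = append(records, record{Key: e.Key, Value: e.Value, TTL: e.TTL})
    		return true
    	})
    	if err != nil {
    		return err
    	}

    	data, err := json.Marshal(records)
    	if err != nil {
    		return err
    	}
    	return ioutil.WriteFile(path, data, 0600)
    }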

    Any help is much appreciated. Thanks in advance!

    opened by sebstyle 3
  • Seeking clarity for persisting cache to disk

    Seeking clarity for persisting cache to disk

    I have a use case similar to the one a past user discussed with you in https://github.com/buraksezer/olric/issues/9. I have read over the README, some issues, and the olricd.yaml config examples.

    I've been informed that olric primarily provides an in-memory KV store (which I'll refer to as a cache), but is also capable of overflowing/persisting to disk (where a smaller in-memory pool caches the most active queries for latency, but larger disk space can be leveraged to extend the cache). Is this supported?

    Olric implements an append-only log file, indexed with a builtin map (uint64 => uint64). It creates new tables and evacuates existing data to the new ones if it needs to shrink or expand. - README - Storage Engine

    I'm not entirely sure what is being conveyed here. Is this about utilizing disk storage at all?

    There's no information on where files would be written so that persistence via a docker volume could be added.

    I think, you could use Souin with the EmbeddedOlric or Olric provider. This way, it will store in memory and save on disk when it reaches the RAM limit. This way it will support the LRU based storage. - Advice from author of Souin (caching service that leverages olric)

    Is this correct? Will it also keep a copy of the in-memory contents on disk and persist it across container restarts, so the in-memory store/cache can be refilled when a key exists on disk?


    Use case - Caching processed responses from an image service and persisting to disk

    This is new territory for me personally. We have a small site with about 20TB monthly traffic, heavy image content from user uploads.

    To reduce storage costs we're adopting an API that receives a web request for an image asset and processes the original image into a transformed variant (different image format, resolution, quality/file size, etc.).

    That approach of course benefits from caching the request responses to reduce the resource load :grinning: This is where Souin (and thus Olric) is meant to help. Presently we've only been scaling vertically on a single node (although it's great that scaling horizontally is an option!). I just need to know whether I can leverage disk storage in addition to a smaller memory cache, or whether olric is in-memory only and I need to persist to disk elsewhere.

    One benefit of persisting to disk, beyond extending the memory cache, is that the most frequently cached requests would survive service restarts/failures (we use docker-compose for now).


    Additional Questions

    A few questions about configuring and understanding the allocation size limits that the README wasn't clear on for me (in the eviction section it also has a comment describing maxKeys as being in bytes, for some reason?); I tried to understand better by reading issue threads:

    In https://github.com/buraksezer/olric/issues/9 you mentioned that when more than one node is involved, partitions are distributed evenly for the maxInuse memory limit, and later that you had implemented a new algorithm for it. I wasn't sure whether you addressed that user's concern about nodes having different amounts of RAM available to allocate (e.g. 2GB on one node, 4GB on another), where the partition allocation could adapt instead of being evenly divided (node A using 2GB, node B being able to use more than 2GB)?

    In https://github.com/buraksezer/olric/issues/106#issuecomment-876739121 I also see tableSize clarified: besides the 1 MiB default size and the default of 271 partitions (where you advise using a prime number if modifying), you mention that this is the size per partition (a portion of a DMap?), so the actual total size to keep in mind is 271 MiB, at least when configuring a single node?

    You then later mention in #106 inserting 1M keys with 10-byte values using 350Mb of memory. Is that 350 megabits (43.75 MB) or megabytes (upper-case M)? 1M * 10B == 10MB of raw values, so I assume it was around 40MB with the other overhead?
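
    For reference, a back-of-the-envelope calculation of the numbers discussed above, assuming the 1 MiB default tableSize, the default 271 partitions, and 1M keys with 10-byte values (keys and per-entry metadata add further overhead):

    package main

    import "fmt"

    func main() {
    	const (
    		tableSize      = 1 << 20   // 1 MiB default table size per partition
    		partitionCount = 271       // default partition count
    		keys           = 1_000_000 // number of inserted keys
    		valueSize      = 10        // bytes per value
    	)

    	// Minimum footprint if every partition allocates one table: 271 MiB.
    	fmt.Printf("tables: %d x 1 MiB = %d MiB\n", partitionCount, partitionCount*tableSize/(1<<20))
    	// Raw payload of the values alone, ignoring keys and metadata: 10 MB.
    	fmt.Printf("values: %d x %d B = %.0f MB\n", keys, valueSize, float64(keys*valueSize)/1e6)
    	// If "350Mb" meant megabits, that would be 350/8 = 43.75 megabytes.
    	fmt.Printf("350 Mb = %.2f MB\n", 350.0/8)
    }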

    question 
    opened by polarathene 3
  • Single member stats for existing metrics collector

    Single member stats for existing metrics collector

    Is there a way to programmatically identify the current embedded server's Stats? Or more simply, collect only the current member's stats?

    I'm attempting to connect Prometheus metrics to Olric so it only reports on the local instance – but not all members. Since Prometheus is typically set up to scrape all of my Kubernetes pods, I don't want the extra noise of each pod reporting the state of the world.

    Perhaps we can expose only the current member's stats in a new method on the server instance? Alternatively, could the current member's auto-generated ID or name be exposed somewhere to filter the results of Stats()?

    Thanks for such a great project, by the way! My team and I look forward to using this in the coming days. 😄

    Related https://github.com/buraksezer/olric/issues/32, https://github.com/buraksezer/olric/issues/57

    enhancement question 
    opened by JohnStarich 11
  • Limit max client connections

    Limit max client connections

    PR for handling max client connections

    opened by yashschandra 5
  • Provide stats in Prometheus compatible exposition format to make integration into Kubernetes environments simpler

    Provide stats in Prometheus compatible exposition format to make integration into Kubernetes environments simpler

    While there is an effort underway to standardize on https://openmetrics.io/, the current industry standard is still Prometheus metrics.

    Having olric metrics available in a Prometheus-compatible text format would allow very simple integration into a Kubernetes-based service.
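
    A minimal sketch of the requested integration using the official Prometheus Go client; the olricHeapBytes helper is purely hypothetical and stands in for whatever per-node figure olric ends up exposing:

    package main

    import (
    	"log"
    	"net/http"

    	"github.com/prometheus/client_golang/prometheus"
    	"github.com/prometheus/client_golang/prometheus/promhttp"
    )

    // olricHeapBytes is a hypothetical stand-in for a per-node statistic
    // that an Olric member could report.
    func olricHeapBytes() float64 { return 0 }

    func main() {
    	// Gauge whose value is read on every scrape.
    	heap := prometheus.NewGaugeFunc(prometheus.GaugeOpts{
    		Name: "olric_heap_bytes",
    		Help: "Heap usage reported by the local Olric member.",
    	}, olricHeapBytes)
    	prometheus.MustRegister(heap)

    	// Serve the Prometheus text exposition format on /metrics.
    	http.Handle("/metrics", promhttp.Handler())
    	log.Fatal(http.ListenAndServe(":2112", nil))
    }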

    enhancement 
    opened by cobexer 1
  • Provide an API to introspect the cluster state

    Provide an API to introspect the cluster state

    In order to communicate liveness and readiness to the Kubernetes cluster, I would like to have some visibility into the olric cluster health.

    For example, if the current node can't communicate with the number of nodes required to satisfy WriteQuorum, I want to mark the current node as not ready in Kubernetes terms so it stops receiving traffic from clients, or even as not live so that the pod gets restarted.
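
    To illustrate how such an API would be consumed, a minimal readiness endpoint; clusterHealthy is a hypothetical stand-in for whatever introspection call olric would provide:

    package main

    import (
    	"log"
    	"net/http"
    )

    // clusterHealthy is a hypothetical stand-in for an olric API that reports
    // whether the local member can currently satisfy its read/write quorums.
    func clusterHealthy() bool { return true }

    func main() {
    	// Kubernetes readiness probe: return 200 only while the member is usable.
    	http.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
    		if !clusterHealthy() {
    			http.Error(w, "quorum not satisfied", http.StatusServiceUnavailable)
    			return
    		}
    		w.WriteHeader(http.StatusOK)
    	})
    	log.Fatal(http.ListenAndServe(":8080", nil))
    }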

    opened by cobexer 0
Releases(v0.4.0)
  • v0.4.0(Aug 17, 2021)

    What is Olric?

    Distributed cache and in-memory key/value data store. It can be used both as an embedded Go library and as a language-independent service.

    With Olric, you can instantly create a fast, scalable, shared pool of RAM across a cluster of computers.

    Install

    Learn how to install Olric and see Sample Code to get started!

    Support

    Join our Discord server!

    Changes

    This release includes the following fixes and improvements:

    • Validate the configuration before running an Olric node #68
    • Design an interface for different storage engine implementations #46
    • Move data structure implementations to their own packages #70
    • Add configuration directives for dead member tracker #101
    • fatal error: concurrent map writes #105
    • Potential race condition in fragment creation #99
    • Read repair feature doesn't work properly #97
    • Consistency only works with two members... #92
    • Collect more metrics about the current state of a node #87
    • Data race in various tests #85
    • panic when set replica count and async replica #93
    • unknown field 'Addrs' in struct literal of type client.Config #110
    • Custom logger doesn't work due to usage of c.Logger.SetOutput #117
    • Single member stats for existing metrics collector #82
  • v0.3.12(Aug 15, 2021)

    This release includes the following fixes and improvements:

    • Custom logger doesn't work due to usage of c.Logger.SetOutput #117
    • Expose DMap.name #116
  • v0.4.0-rc.1(Jul 31, 2021)

  • v0.3.11(Jul 19, 2021)

  • v0.4.0-beta.10(Jul 16, 2021)

    This release includes the following fixes and improvements:

    • Add --keep-going to olric-benchmark
    • Create local DMap fragment to run a query on the cluster
  • v0.4.0-beta.9(Jul 11, 2021)

    This release includes the following fixes and improvements:

    • Remove stale tables more effectively. See #106 for details.
    • Initialize DMap every time to run the operation on the cluster.
    • Add RoutingTablePushInterval config directive to control push interval programmatically.
  • v0.3.9(Jul 11, 2021)

  • v0.4.0-beta.8(Jun 27, 2021)

    This release includes the following fixes and improvements:

    • Remove dead member tracker. Service discovery plugins will take over its responsibility.
  • v0.4.0-beta.7(Jun 24, 2021)

  • v0.4.0-beta.6(Jun 20, 2021)

  • v0.4.0-beta.5(Jun 16, 2021)

    This release includes the following fixes and improvements:

    • Consistency only works with two members... #92 Commit: 549d87c7c3456df1ec9e35c97970c60a1121783f
  • v0.3.8(Jun 16, 2021)

  • v0.4.0-beta.4(Jun 13, 2021)

  • v0.4.0-beta.3(Jun 8, 2021)

    This beta includes the following fixes and improvements:

    • Single member stats for existing metrics collector #82
    • Collect more metrics about the current state of a node #87
  • v0.4.0-beta.2(May 1, 2021)

    This beta includes the following fixes and improvements:

    • Move data structure implementations to their own packages #70

    Bug fixes from v0.3.x tree:

    • Fix 64-bit alignment of 64-bit words accessed atomically on ARM #81
    • olric-load output is useless #80
    • Single member stats for existing metrics collector #82
    • List members of cluster #77
  • v0.3.7(Apr 25, 2021)

  • v0.3.6(Feb 21, 2021)

  • v0.3.5(Feb 17, 2021)

  • v0.3.4(Jan 12, 2021)

  • v0.4.0-beta.1(Dec 16, 2020)

  • v0.3.3(Dec 11, 2020)

  • v0.3.2(Dec 9, 2020)

    This release includes the following fixes and improvements:

    • Add get-entry support to olric-cli #71
    • Go client cannot form Put message properly #72
    • Integration tests of olric-cli package fails due to wrong configuration #73
  • v0.3.1(Dec 5, 2020)

  • v0.3.0(Nov 26, 2020)

    What is Olric?

    Distributed cache and in-memory key/value data store. It can be used both as an embedded Go library and as a language-independent service.

    With Olric, you can instantly create a fast, scalable, shared pool of RAM across a cluster of computers.

    This version includes many new features and bug fixes:

    • Initial implementation of DTopic data structure,
    • Olric Binary Protocol reimplemented. Now it's very easy to define new message types.
    • Improved Kubernetes integration: https://github.com/buraksezer/olric-kubernetes

    See the milestone for more details.

    Thanks to @dunglas and the Caddy community for their support.

  • v0.3.0-rc.2(Nov 19, 2020)

  • v0.3.0-rc.1(Nov 10, 2020)

    This release candidate includes the following fixes and improvements:

    • Fix wrong unmarshal method in getPrevOperation https://github.com/buraksezer/olric/commit/195a65e6be16525d0214b8f48dd49f508826f3dc
  • v0.3.0-beta.7(Oct 20, 2020)

    This beta includes the following fixes and improvements:

    • Properly handle memberlist.NodeUpdate events #54
    • Add GetEntry method to DMap API. It exposes key/value pair with the metadata #63
    • Compile and run on Windows #64
    • Use GitHub Actions to also run the tests on Windows and Mac #65
  • v0.3.0-beta.6(Oct 12, 2020)

    What is Olric?

    Distributed cache and in-memory key/value data store. It can be used both as an embedded Go library and as a language-independent service.

    With Olric, you can instantly create a fast, scalable, shared pool of RAM across a cluster of computers.

    Try with Docker:

    docker pull olricio/olricd:v0.3.0-beta.6
    

    v0.3.x tree includes many new features:

    • Initial implementation of DTopic data structure,
    • Olric Binary Protocol reimplemented. Now it's very easy to define new message types.
    • Improved Kubernetes integration: https://github.com/buraksezer/olric-kubernetes

    This version includes the following fixes and improvements:

    • Add config.Load to load configuration from YAML files in embedded-member mode #62
    • Scoping problem in read-repair code #61
    • Improves redirection logic #54
    • Seamlessly scale up or down on Kubernetes
  • v0.3.0-beta.5(Oct 8, 2020)

  • v0.3.0-beta.4(Sep 30, 2020)

    This beta includes the following fixes and improvements:

    • Prevent data race in replication functions: asyncPutOnCluster and syncPutOnCluster,
    • Prevent redirect loops and warn the user,
    • More integration tests.
Owner
Burak Sezer
Distributed systems and DevOps things
Simple, ordered, key-value persistence library for the Go Language

gkvlite gkvlite is a simple, ordered, ACID, key-value persistence library for Go. Overview gkvlite is a library that provides a simple key-value persi

Steve Yen 245 Aug 20, 2021
Key-value database stored in memory with option of persistence

Easy and intuitive command line tool allows you to spin up a database available from the web or locally in a few seconds. Server can be run over a custom TCP protocol or over HTTP.

Mario Petričko 7 Jul 11, 2021
High-performance, columnar, in-memory store with bitmap indexing in Go

This package contains a high-performance, columnar, in-memory storage engine that supports fast querying, update and iteration with zero-allocations and bitmap indexing.

Roman Atachiants 817 Oct 17, 2021
Distributed, fault-tolerant key-value storage written in go.

A simple, distributed, fault-tolerant key-value storage inspired by Redis. It uses the Raft protocol as its consensus algorithm. It supports the following data structures: String, Bitmap, Map, List.

Igor German 354 Sep 13, 2021
Fault tolerant, sharded key value storage written in GoLang

Ravel is a sharded, fault-tolerant key-value store built using BadgerDB and hashicorp/raft. You can shard your data across multiple clusters with mult

Aditya Meharia 71 Oct 8, 2021
A disk-backed key-value store.

What is diskv? Diskv (disk-vee) is a simple, persistent key-value store written in the Go language. It starts with an incredibly simple API for storin

Peter Bourgon 1.1k Oct 21, 2021
A key-value db api with multiple storage engines and key generation

Jet is a deadly-simple key-value api. The main goals of this project are : Making a simple KV tool for our other projects. Learn tests writing and git

null 12 May 9, 2021
yakv is a simple, in-memory, concurrency-safe key-value store for hobbyists.

yakv (yak-v. (originally intended to be "yet-another-key-value store")) is a simple, in-memory, concurrency-safe key-value store for hobbyists. yakv provides persistence by appending transactions to a transaction log and restoring data from the transaction log on startup.

Aadhav Vignesh 5 Oct 11, 2021
Distributed reliable key-value store for the most critical data of a distributed system

etcd Note: The master branch may be in an unstable or even broken state during development. Please use releases instead of the master branch in order

etcd-io 37.6k Oct 23, 2021
decentralized kv store based on pubsub in libp2p

pointers decentralized kv store based on pubsub in libp2p Protocol Specification Over View The pointer would receive updates from pubsub network and b

Rorical 4 Jul 11, 2021
a persistent real-time key-value store, with the same redis protocol with powerful features

a fast NoSQL DB, that uses the same RESP protocol and capable to store terabytes of data, also it integrates with your mobile/web apps to add real-time features, soon you can use it as a document store cause it should become a multi-model db. Redix is used in production, you can use it in your apps with no worries.

Mohammed Al Ashaal 954 Oct 24, 2021
Pogreb is an embedded key-value store for read-heavy workloads written in Go.

Embedded key-value store for read-heavy workloads written in Go

Artem Krylysov 822 Oct 19, 2021
GhostDB is a distributed, in-memory, general purpose key-value data store that delivers microsecond performance at any scale.

GhostDB is designed to speed up dynamic database or API driven websites by storing data in RAM in order to reduce the number of times an external data source such as a database or API must be read. GhostDB provides a very large hash table that is distributed across multiple machines and stores large numbers of key-value pairs within the hash table.

Jake Grogan 707 Oct 6, 2021
moss - a simple, fast, ordered, persistable, key-val storage library for golang

moss provides a simple, fast, persistable, ordered key-val collection implementation as a 100% golang library.

null 830 Oct 25, 2021
Multithreaded key value pair store using thread safe locking mechanism allowing concurrent reads

Project Amnesia A Multi-threaded key-value pair store using thread safe locking mechanism allowing concurrent reads. Curious to Try it out?? Check out

Nikhil Nayak 5 Oct 7, 2021
RocksDB/LevelDB inspired key-value database in Go

Pebble Nightly benchmarks Pebble is a LevelDB/RocksDB inspired key-value store focused on performance and internal usage by CockroachDB. Pebble inheri

CockroachDB 2.3k Oct 14, 2021
ShockV is a simple key-value store with RESTful API

ShockV is a simple key-value store based on badgerDB with RESTful API. It's best suited for experimental project which you need a lightweight data store.

delihiros 2 Sep 26, 2021
LevelDB key/value database in Go.

This is an implementation of the LevelDB key/value database in the Go programming language. Installation go get github.com/syndtr/goleveldb/leveldb R

Suryandaru Triandana 4.6k Oct 15, 2021
Turn any key/value index into a high-performance two-dimensional spatial index

modular-spatial-index For the demo that this animated gif was generated from

SequentialRead.com 4 Jul 16, 2021