A distributed key-value store. On Disk. Able to grow or shrink without service interruption.

Overview

Vasto

Build Status GoDoc Go Report Card codecov

A distributed high-performance key-value store. On Disk. Eventual consistent. HA. Able to grow or shrink without service interruption.

Vasto scales embedded RocksDB into a distributed key-value store, adding sharding, replication, and support operations to

  1. create a new keyspace
  2. delele an existing keyspace
  3. grow a keyspace
  4. shrink a keyspace
  5. replace a node in a keyspace

Why

A key-value store is often re-invented. Why there is another one?

Vasto enables developers to setup a distributed key-value store as simple as creating a map object.

The operations, such as creating/deleting the store, partitioning, replications, seamlessly adding/removing servers, etc, are managed by a few commands. Client connection configurations are managed automatically.

In a sense, Vasto is an in-house cloud providing distributed key-value stores as a service, minus the need to balance performance and cloud service costs, plus consistent and low latency.

Architecture

There are one Vasto master and N number of Vasto stores, plus Vasto clients or Vasto proxies/gateways.

  1. The Vasto stores are basically simple wrapper of RocksDB.
  2. The Vasto master manages all the Vasto stores and Vasto clients.
  3. Vasto clients rely on the master to connect to the right Vasto stores.
  4. Vasto gateways use Vasto client libraries to support different APIs.

The master is the brain of the system. Vasto does not use gossip protocols, or other consensus algorithms. Instead, Vasto uses a single master for simple setup, fast failure detection, fast topology changes, and precise coordinations. The master only contains soft states and is only required when topology changes. So even if it ever crashes, a simple restart will recover everything.

The Vasto stores simply pass get/put/delete/scan requests to RocksDB. One Vasto store can host multiple db instances.

Go applications can use the client library directly.

Applications in other languages can talk to the Vasto gateway, which uses the client library and reverse proxy the requests to the Vasto stores. The number of Vasto gateways are unlimited. They can be installed on any application machines to reduce one network hop. Or can be on its dedicated machine to reduce number of connections to the Vasto stores if both the number of stores and the number of clients are very high.

Life cycle

One Vasto cluster has one master and multiple Vasto stores. When the store joins the cluster, it is just empty.

When the master receives a request to create a keyspace with x shards and replication factor = y, the master would

  1. find x stores that meet the requirement and assign it to the keyspace
  2. ask the stores to create the shards, including replicas.
  3. inform the clients of the store locations

When the master receives a request to resize the keyspace from m shards to n shards, the master would

  1. if size increased, find n-m stores that meet the requirement and assign it to the keyspace
  2. ask the stores to create the shards, including replicas.
  3. prepare the data to the new stores
  4. direct the clients traffic to the new stores
  5. remove retiring shards

Hashing algorithm

Vasto used Jumping Consistent Hash to allocate data. This algorithm

  1. requires no storage. The master only need soft state to manage all store servers. It is OK to restart master.
  2. evenly distribute the data into buckets.
  3. when the number of bucket changes, it can also evenly dividing the workload.

With this jumping hash, the cluster resizing is rather simple, flexible, and efficient:

  1. Cluster can resize up or down freely.
  2. Resizing is well coordinated.
  3. Data can be moved via the most efficient SSTable writes.
  4. Clients aware of the cluster change and can redirect traffic only when the new whole new server are ready.

Eventual Consistency and Active-Active Replication

All Vasto stores can be used to read and write. The changes will be propagated to other replicas within a few milliseconds. Only the primary replica participate in the normal operations. The replica are participating when the primary replica is down, or in a different data center.

Vasto assumes the data already has the event time. It should be the time when the event really happens, not the time when the data is feed into Vasto system. If the system fails over to the replica partition, and there are multiple changes to one key, the one with latest event times will win.

Client APIs

See https://godoc.org/github.com/chrislusf/vasto/goclient/vs

Example

    // create a vasto client talking to master at localhost:8278
    vc := vs.NewVastoClient(context.Background(), "client_name", "localhost:8278")
    
    // create a cluster for keyspace ks1, with one server, and one copy of data.
    vc.CreateCluster("ks1", 1, 1)
    
    // get a cluster client for ks1
    cc := vc.NewClusterClient("ks1")

    // operate with the cluster client
    var key, value []byte
    cc.Put(key, value)    
    cc.Get(vs.Key(key))
    ...

    // change cluster size to 3 servers
    vc.ResizeCluster("ks1", 3)

    // operate with the existing cluster client
    cc.Put(key, value)    
    cc.Get(vs.Key(key))
    ...

Currently only basic go library is provided. The gateway is not ready yet.

Comments
  • some wiki guard mistake,benchmark result. possibilities of other KV engine replacement?

    some wiki guard mistake,benchmark result. possibilities of other KV engine replacement?

    // start one vasto shell vasto shell create cluster ks1 dc1 3 1 use keyspace ks1 desc ks1 dc1

    delete cluster ks1 dc1

    the last line should be delete

    I get my bechmark on a i7 7550/16G laptop time ./vasto bench --keyspace=ks1 -n 1024000 -c 16 benchmarking on cluster with master localhost:8278 put : 35s [===================================================================>] 100% 29145.12 ops/sec put,get : 29058.5 op/s Microseconds per op: Count: 1024000 Average: 527.6445 StdDev: 371.17 Min: 0.0000 Median: 464.7338 Max: 26443.0000 put : 35s [====================================================================] 100% 29142.38 ops/sec get : 31s [===================================================================>] 100% 32221.78 ops/sec put,get : 32110.0 op/s Microseconds per op: Count: 1024000 Average: 470.7078 StdDev: 201.14 Min: 0.0000 Median: 464.2912 Max: 15578.0000 put : 35s [====================================================================] 100% 29142.38 ops/sec get : 31s [====================================================================] 100% 32218.41 ops/sec real 1m7.219s user 0m43.063s sys 1m17.438s

    By the way,is it possible to replace the embed rocksdb with other KV engine? For most of other KY engines,there is no equivalent Merge operation as Roscksdb‘s. Only some simple put/get/delete/iterate/batch write operations

    opened by xvman 5
  • Vasto primary/replica failure restore problem

    Vasto primary/replica failure restore problem

    I tried to test vasto primary/replica failure restore latency with create cluster ks1 dc1 2 2. However,the created primary/replica store is both on the same store/process even I had added more than one store to the cluster,both primary/replica shard will shutdown in machine failure situation. Is it intentional setting? In that case,how can I test the primary/replica failure restore latency?

    I manage to read the code,it seems that vasto client is only send its rocksDB requests to primary shard by default. In case of primary hitch,vasto will update primary/replica using the PeriodicTask to synchronize the new rocksDB requests and replica router information to the client. So the minimum failure restore latency will be time for primary/replica sync plus the PeriodicTask gap? Could you explain the primary/replica sync mechanism if I miss some thing?

    opened by xvman 4
  • Hash sharding scale strategy question

    Hash sharding scale strategy question

    Using consistent hash for sharding will take more time to migrate between shards,and range key queries would be hard to implement. Would you consider key range sharding for another sharding strategy?

    opened by xvman 3
  • go get vasto&chrislusf/gorocksdb fail

    go get vasto&chrislusf/gorocksdb fail

    OS:ubuntu ,go version 1.10.4 go get github.com/chrislusf/vasto

    github.com/chrislusf/gorocksdb

    gopath/src/github.com/chrislusf/gorocksdb/options.go:1045:2: could not determine kind of name for C.rocksdb_options_set_allow_ingest_behind

    I have successful go get&build tecbot/gorocksdb, however I fail to go get chrislusf/gorocksdb and chrislusf/vasto

    opened by xvman 3
  • compile error~~how can I compile and benchmark vasto?

    compile error~~how can I compile and benchmark vasto?

    github.com/chrislusf/gorocksdb

    exec: "gcc": executable file not found in %PATH%

    github.com/chrislusf/vasto/cmd/master

    cmd\master\master_server.go:56: cannot use ms (type *masterServer) as type pb.VastoMasterServer in argument to pb.RegisterVastoMasterServer: *masterServer does not implement pb.VastoMasterServer (wrong type for CompactCluster method) have CompactCluster("context".Context, *pb.CompactClusterRequest) (*pb.CompactClusterResponse, error) want CompactCluster("golang.org/x/net/context".Context, *pb.CompactClusterRequest) (*pb.CompactClusterResponse, error)

    opened by xvman 3
  • vasto vs cassandra

    vasto vs cassandra

    @chrislusf I would be obliged if you could compare between vasto and cassandra -- why Vasto will be needed when we can run Cassandra with RocksDB (as engine)

    opened by YumcoderCom 1
  • Crashed with nil pointer when putting a key to a cluster of 2 replicas and one replica is down. (not reproducible)

    Crashed with nil pointer when putting a key to a cluster of 2 replicas and one replica is down. (not reproducible)

    I0129 14:47:31.514926   23375 master_event_processor.go:56] [master] + client  from 127.0.0.1:62432 keyspace(clst)
    E0129 14:50:26.104690   23375 retry.go:22] shard clst.1.0 normal follow 0.0 failed: pull changes: rpc error: code = Unavailable desc = transport is closing
    E0129 14:50:26.105056   23375 retry.go:19] shard clst.1.1 normal follow 0.1 still failed: pull changes: rpc error: code = Unavailable desc = transport is closing
    I0129 14:50:26.105249   23375 master_event_processor.go:61] [master] - client [[email protected]:8379] from 127.0.0.1:62477 keyspace(clst)
    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x44642af]
    
    goroutine 218 [running]:
    github.com/chrislusf/vasto/cmd/store.(*storeServer).processPut(0xc000169500, 0xc00020ca00, 0xc000454820, 0x0)
    	/vasto/cmd/store/process_put.go:31 +0x15f
    github.com/chrislusf/vasto/cmd/store.(*storeServer).processRequest(0xc000169500, 0xc000328208, 0x4, 0xc0002f4f60, 0xc0002f4f30)
    	/vasto/cmd/store/store_tcp_server.go:153 +0x2ab
    github.com/chrislusf/vasto/cmd/store.(*storeServer).handleInputOutput(0xc000169500, 0xc0000ee440, 0x1e, 0x1e, 0x1e, 0x0, 0x0, 0x6c1e838, 0x0)
    	/vasto/cmd/store/store_tcp_server.go:91 +0x121
    github.com/chrislusf/vasto/cmd/store.(*storeServer).handleRequest(0xc000169500, 0x46a2440, 0xc000191980, 0x6c1e838, 0xc000196020, 0x0, 0x0)
    	/vasto/cmd/store/store_tcp_server.go:71 +0x117
    github.com/chrislusf/vasto/cmd/store.(*storeServer).handleConnection(0xc000169500, 0x46abbc0, 0xc000196020)
    	/vasto/cmd/store/store_tcp_server.go:47 +0xea
    github.com/chrislusf/vasto/cmd/store.(*storeServer).serveTcp.func1(0x46abbc0, 0xc000196020, 0xc000304050, 0xc000169500)
    	/vasto/cmd/store/store_tcp_server.go:36 +0xf7
    created by github.com/chrislusf/vasto/cmd/store.(*storeServer).serveTcp
    	/vasto/cmd/store/store_tcp_server.go:27 +0x14b
    
    opened by lming 1
  • vasto bench fail

    vasto bench fail

    I tried to use the vasto bench to test it. first open one terminal for vasto master or server,then open the other for vasto bench,but vasto bench always showed as below and won't make any progress again:

    benchmarking on cluster with master localhost:8278 put : --- [--------------------------------------------------------------------] 0% NaN ops/sec

    The vasto master or server side just printed out the message such as: I1013 10:05:21.397870 8502 master_server.go:41] Vasto master starts on :8278 I1013 10:05:30.134602 8502 master_grpc_server_for_client.go:30] + client 127.0.0.1:54859 E1013 10:05:30.135152 8502 master_grpc_server_for_client.go:96] master add client benchmarker: client key is already in use: benchmark:dc1:127.0.0.1:54859 I1013 10:05:30.135119 8502 master_grpc_server_for_client.go:30] + client 127.0.0.1:54861 E1013 10:05:30.152217 8502 master_grpc_server_for_client.go:96] master add client benchmarker: client key is already in use: benchmark:dc1:127.0.0.1:54861 I1013 10:06:31.644583 8502 master_grpc_server_for_client.go:41] - client 127.0.0.1:54859 I1013 10:06:31.645002 8502 master_grpc_server_for_client.go:41] - client 127.0.0.1:54861 I1013 10:06:39.134168 8502 master_grpc_server_for_client.go:30] + client 127.0.0.1:54864 E1013 10:06:39.134886 8502 master_grpc_server_for_client.go:96] master add client benchmarker: client key is already in use: benchmark:dc1:127.0.0.1:54864 I1013 10:06:53.839102 8502 master_grpc_server_for_client.go:41] - client 127.0.0.1:54864

    I tried to use vasto bench -c1 parameter, but it did not change anything for the bench result. Is there any settings I miss? In my knowledge, vasto server include both vasto store and master, so it should be fine to open a vasto server for vasto bench to test. I just want to make a benchmark test about the vasto performance,could you provide detail instructions to benchmark vasto ?

    opened by xvman 1
  • vasto start server or store fail

    vasto start server or store fail

    go.test.sh run successful now. when start vasto sever or store,I encounter below errors:

    E1013 00:06:29.768819 8411 store_in_cluster.go:30] read file /tmp/go-build169079065/cluster.config: open /tmp/go-build169079065/cluster.config: no such file or directory

    does it need certrain cluster.config files?

    opened by xvman 1
  • ./go.test.sh error

    ./go.test.sh error

    OS:ubuntu; gcc version 7.3.0; go 1.10.4; and rocksdb master branch with shared lib: librocksdb.so.5.17 after export LD_LIBRARY_PATH=/usr/local/lib, the go.test.sh is running now, but still fail for the below message:

    ok github.com/chrislusf/vasto/storage/binlog 0.097s coverage: 82.4% of statements ok github.com/chrislusf/vasto/storage/codec 0.027s coverage: 94.4% of statements free(): invalid pointer FAIL github.com/chrislusf/vasto/storage/rocks 0.083s

    opened by xvman 1
  • go build error

    go build error

    my enviroment is ubuntu, go 10.4. vasto use the current master branch. go build shows the below message:

    google.golang.org/genproto/googleapis/rpc/status

    ../../../google.golang.org/genproto/googleapis/rpc/status/status.pb.go:111:28: undefined: proto.InternalMessageInfo

    opened by xvman 1
  • Distributed key-value store over RDMA

    Distributed key-value store over RDMA

    Recently I have been thinking about what to do with my Master's degree graduation project. Today, I came across this idea as mentioned in the title. Is it valuable? If so, is it feasible?

    opened by BruceWangNo1 1
  • Understanding rebalancing of keys when new node is added or existing node is removed

    Understanding rebalancing of keys when new node is added or existing node is removed

    This is a question for understanding data movement during adding or removing a node to cluster.

    In my understanding, jump consistent hashing does not hash node on the ring. It seems like their is no ring concept. The output of hash is a number in [0, nodes) range which maps to a node.

    If a new node is added to cluster, how will we know which all keys need to move to this node ? Will we have to go to all nodes, read all keys and see if they belong to this new node ?

    Can you point me to code which does this ?

    In ring based consistent hashing, new node will be in ring and will copy keys from next node to itself. So, it may require going over all keys on only one node. However, in real world, one node maps to many buckets on the ring, so we may have to go to multiple nodes in ring based consistent hashing as well.

    opened by ashishnegi 3
  • what you think about adding ceph crush support?

    what you think about adding ceph crush support?

    I think about distributed kv, but i want to use ceph crushmap to ave ability to specify how much replicas needed and provide failure domains. what do you think? (i'm port some parts of ceph crushmap parsing code to go - https://github.com/unistack-org/go-crush )

    opened by vtolstov 3
Owner
Chris Lu
https://github.com/chrislusf/seaweedfs SeaweedFS the distributed file system and object store for billions of small files ...
Chris Lu
A disk-backed key-value store.

What is diskv? Diskv (disk-vee) is a simple, persistent key-value store written in the Go language. It starts with an incredibly simple API for storin

Peter Bourgon 1.2k Nov 26, 2022
Membin is an in-memory database that can be stored on disk. Data model smiliar to key-value but values store as JSON byte array.

Membin Docs | Contributing | License What is Membin? The Membin database system is in-memory database smiliar to key-value databases, target to effici

Membin 3 Jun 3, 2021
Distributed reliable key-value store for the most critical data of a distributed system

etcd Note: The master branch may be in an unstable or even broken state during development. Please use releases instead of the master branch in order

etcd-io 41.9k Nov 23, 2022
A distributed key value store in under 1000 lines. Used in production at comma.ai

minikeyvalue Fed up with the complexity of distributed filesystems? minikeyvalue is a ~1000 line distributed key value store, with support for replica

George Hotz 2.5k Nov 27, 2022
Distributed cache and in-memory key/value data store.

Distributed cache and in-memory key/value data store. It can be used both as an embedded Go library and as a language-independent service.

Burak Sezer 2.6k Nov 28, 2022
An in-memory key:value store/cache (similar to Memcached) library for Go, suitable for single-machine applications.

go-cache go-cache is an in-memory key:value store/cache similar to memcached that is suitable for applications running on a single machine. Its major

Patrick Mylund Nielsen 6.7k Nov 25, 2022
A simple, fast, embeddable, persistent key/value store written in pure Go. It supports fully serializable transactions and many data structures such as list, set, sorted set.

NutsDB English | 简体中文 NutsDB is a simple, fast, embeddable and persistent key/value store written in pure Go. It supports fully serializable transacti

徐佳军 2.6k Nov 28, 2022
Embedded key-value store for read-heavy workloads written in Go

Pogreb Pogreb is an embedded key-value store for read-heavy workloads written in Go. Key characteristics 100% Go. Optimized for fast random lookups an

Artem Krylysov 962 Nov 15, 2022
Fast and simple key/value store written using Go's standard library

Table of Contents Description Usage Cookbook Disadvantages Motivation Benchmarks Test 1 Test 4 Description Package pudge is a fast and simple key/valu

Vadim Kulibaba 331 Nov 17, 2022
Low-level key/value store in pure Go.

Description Package slowpoke is a simple key/value store written using Go's standard library only. Keys are stored in memory (with persistence), value

Vadim Kulibaba 100 Oct 14, 2022
Key-value store for temporary items :memo:

Tempdb TempDB is Redis-backed temporary key-value store for Go. Useful for storing temporary data such as login codes, authentication tokens, and temp

Rafael Jesus 17 Sep 26, 2022
a key-value store with multiple backends including leveldb, badgerdb, postgresql

Overview goukv is an abstraction layer for golang based key-value stores, it is easy to add any backend provider. Available Providers badgerdb: Badger

Mohammed Al Ashaal 52 Jun 7, 2022
A minimalistic in-memory key value store.

A minimalistic in-memory key value store. Overview You can think of Kiwi as thread safe global variables. This kind of library comes in helpful when y

SDSLabs 159 Dec 6, 2021
A simple Git Notes Key Value store

Gino Keva - Git Notes Key Values Gino Keva works as a simple Key Value store built on top of Git Notes, using an event sourcing architecture. Events a

Philips Software 24 Aug 14, 2022
Simple key value database that use json files to store the database

KValDB Simple key value database that use json files to store the database, the key and the respective value. This simple database have two gRPC metho

Francisco Santos 0 Nov 13, 2021
A rest-api that works with golang as an in-memory key value store

Rest API Service in GOLANG A rest-api that works with golang as an in-memory key value store Usage Run command below in terminal in project directory.

sercan aydın 0 Dec 6, 2021
Eagle - Eagle is a fast and strongly encrypted key-value store written in pure Golang.

EagleDB EagleDB is a fast and simple key-value store written in Golang. It has been designed for handling an exaggerated read/write workload, which su

null 9 Sep 28, 2022
A SQLite-based hierarchical key-value store written in Go

camellia ?? A lightweight hierarchical key-value store camellia is a Go library that implements a simple, hierarchical, persistent key-value store, ba

Valerio De Benedetto 32 Nov 9, 2022
Nipo is a powerful, fast, multi-thread, clustered and in-memory key-value database, with ability to configure token and acl on commands and key-regexes written by GO

Welcome to NIPO Nipo is a powerful, fast, multi-thread, clustered and in-memory key-value database, with ability to configure token and acl on command

Morteza Bashsiz 16 Jun 13, 2022