A disk-backed key-value store.

Related tags

Database diskv
Overview

What is diskv?

Diskv (disk-vee) is a simple, persistent key-value store written in the Go language. It starts with an incredibly simple API for storing arbitrary data on a filesystem by key, and builds several layers of performance-enhancing abstraction on top. The end result is a conceptually simple, but highly performant, disk-backed storage system.

Build Status

Installing

Install Go 1, either from source or with a prepackaged binary. Then,

$ go get github.com/peterbourgon/diskv

Usage

package main

import (
	"fmt"
	"github.com/peterbourgon/diskv"
)

func main() {
	// Simplest transform function: put all the data files into the base dir.
	flatTransform := func(s string) []string { return []string{} }

	// Initialize a new diskv store, rooted at "my-data-dir", with a 1MB cache.
	d := diskv.New(diskv.Options{
		BasePath:     "my-data-dir",
		Transform:    flatTransform,
		CacheSizeMax: 1024 * 1024,
	})

	// Write three bytes to the key "alpha".
	key := "alpha"
	d.Write(key, []byte{'1', '2', '3'})

	// Read the value back out of the store.
	value, _ := d.Read(key)
	fmt.Printf("%v\n", value)

	// Erase the key+value from the store (and the disk).
	d.Erase(key)
}

More complex examples can be found in the "examples" subdirectory.

Theory

Basic idea

At its core, diskv is a map of a key (string) to arbitrary data ([]byte). The data is written to a single file on disk, with the same name as the key. The key determines where that file will be stored, via a user-provided TransformFunc, which takes a key and returns a slice ([]string) corresponding to a path list where the key file will be stored. The simplest TransformFunc,

func SimpleTransform (key string) []string {
    return []string{}
}

will place all keys in the same, base directory. The design is inspired by Redis diskstore; a TransformFunc which emulates the default diskstore behavior is available in the content-addressable-storage example.

Note that your TransformFunc should ensure that one valid key doesn't transform to a subset of another valid key. That is, it shouldn't be possible to construct valid keys that resolve to directory names. As a concrete example, if your TransformFunc splits on every 3 characters, then

d.Write("abcabc", val) // OK: written to <base>/abc/abc/abcabc
d.Write("abc", val)    // Error: attempted write to <base>/abc/abc, but it's a directory

This will be addressed in an upcoming version of diskv.

Probably the most important design principle behind diskv is that your data is always flatly available on the disk. diskv will never do anything that would prevent you from accessing, copying, backing up, or otherwise interacting with your data via common UNIX commandline tools.

Advanced path transformation

If you need more control over the file name written to disk or if you want to support slashes in your key name or special characters in the keys, you can use the AdvancedTransform property. You must supply a function that returns a special PathKey structure, which is a breakdown of a path and a file name. Strings returned must be clean of any slashes or special characters:

func AdvancedTransformExample(key string) *diskv.PathKey {
	path := strings.Split(key, "/")
	last := len(path) - 1
	return &diskv.PathKey{
		Path:     path[:last],
		FileName: path[last] + ".txt",
	}
}

// If you provide an AdvancedTransform, you must also provide its
// inverse:

func InverseTransformExample(pathKey *diskv.PathKey) (key string) {
	txt := pathKey.FileName[len(pathKey.FileName)-4:]
	if txt != ".txt" {
		panic("Invalid file found in storage folder!")
	}
	return strings.Join(pathKey.Path, "/") + pathKey.FileName[:len(pathKey.FileName)-4]
}

func main() {
	d := diskv.New(diskv.Options{
		BasePath:          "my-data-dir",
		AdvancedTransform: AdvancedTransformExample,
		InverseTransform:  InverseTransformExample,
		CacheSizeMax:      1024 * 1024,
	})
	// Write some text to the key "alpha/beta/gamma".
	key := "alpha/beta/gamma"
	d.WriteString(key, "¡Hola!") // will be stored in "<basedir>/alpha/beta/gamma.txt"
	fmt.Println(d.ReadString("alpha/beta/gamma"))
}

Adding a cache

An in-memory caching layer is provided by combining the BasicStore functionality with a simple map structure, and keeping it up-to-date as appropriate. Since the map structure in Go is not threadsafe, it's combined with a RWMutex to provide safe concurrent access.

Adding order

diskv is a key-value store and therefore inherently unordered. An ordering system can be injected into the store by passing something which satisfies the diskv.Index interface. (A default implementation, using Google's btree package, is provided.) Basically, diskv keeps an ordered (by a user-provided Less function) index of the keys, which can be queried.

Adding compression

Something which implements the diskv.Compression interface may be passed during store creation, so that all Writes and Reads are filtered through a compression/decompression pipeline. Several default implementations, using stdlib compression algorithms, are provided. Note that data is cached compressed; the cost of decompression is borne with each Read.

Streaming

diskv also now provides ReadStream and WriteStream methods, to allow very large data to be handled efficiently.

Future plans

  • Needs plenty of robust testing: huge datasets, etc...
  • More thorough benchmarking
  • Your suggestions for use-cases I haven't thought of

Credits and contributions

Original idea, design and implementation: Peter Bourgon Other collaborations: Javier Peletier (Epic Labs)

Issues
  • Switch to actively maintained Red-Black tree implementation?

    Switch to actively maintained Red-Black tree implementation?

    opened by onlyjob 15
  • completeFilename causes invalid memory address or nil pointer dereference

    completeFilename causes invalid memory address or nil pointer dereference

    Hi

    We're running a component in Kubernetes that uses diskv under the hood. The problem is that the process occasionally crashes when it attempts to remove the key from the store. Here is the relevant stack trace:

    github.com/org/vendor/github.com/peterbourgon/diskv.(*Diskv).Erase(0xc42041c2d0, 0x0, 0x1b, 0x0, 0x0) /home/rabbit/org/vendor/github.com/peterbourgon/diskv/diskv.go:409 +0xe7
    github.com/org/vendor/github.com/peterbourgon/diskv.(*Diskv).completeFilename(0xc42041c2d0, 0x0, 0x1b, 0x1b, 0x27fdf00) /home/rabbit/org/vendor/github.com/peterbourgon/diskv/diskv.go:525 +0x98
    path/filepath.Join(0xc42110b970, 0x2, 0x2, 0xc4208e18c0, 0x11) /usr/lib/go/src/path/filepath/path.go:210
    path/filepath.join(0xc42110b970, 0x2, 0x2, 0x0, 0x0) /usr/lib/go/src/path/filepath/path_unix.go:45 +0x96
    strings.Join(0xc42110b970, 0x2, 0x2, 0x18d6236, 0x1, 0xc42110b918, 0x2) /usr/lib/go/src/strings/strings.go:424
    

    Data directory is mounted as a regular host path (/opt/spm/agent) and file names are ksuid-compatible identifiers.

    diskv is initialized with following configuration:

    d := diskv.New(diskv.Options{
    		BasePath:     c.Dir,
    		Transform:    func(s string) []string { return []string{} },
    		CacheSizeMax: 1024 * 1024,
    })
    

    Do you have any pointers or ideas why this would happen?

    bug 
    opened by rabbitstack 11
  • (*File).Chmod on Windows

    (*File).Chmod on Windows

    https://github.com/golang/go/blob/bba88467f86472764a656e61f5f3265ed6853692/src/syscall/syscall_windows.go#L1094

    os.Chmod always return syscall.EWINDOWS error on Windows.

    opened by mchtech 8
  • Suggestion: Improve performance by maintaining structs

    Suggestion: Improve performance by maintaining structs

    Currently diskv maintains in-memory values in []byte format. It would be excellent from a performance standpoint if arbitrary structs could be maintained instead as this would eliminate the overhead of deserialization. I suspect that the performance improvement would be 1-2 orders of magnitude depending on the application.

    enhancement helpwanted 
    opened by WinstonPrivacy 7
  • ensureCacheSpaceWithLock panics with variable sized keys

    ensureCacheSpaceWithLock panics with variable sized keys

    If variable sized keys are being used, then ensureCacheSpaceWithLock() can panic during the cache cleanup routine which loops through safe(). This occurs if the key being inserted is larger than the last key removed.

    It would be better if either this key were inserted anyway (and thus exceeding the cache threshold) or the routine was modified so that it cleared some percent of the total cache size before proceeding.

    bug helpwanted 
    opened by WinstonPrivacy 7
  • Cache expiration

    Cache expiration

    I've used this awesome library in a couple of projects now and I think it is a great tool. One thing I'm wondering about is a way to set a cache expiration policy. The policy would dictate when cache keys become stale and should expire and perhaps be erased. Have you considered any additional features to this library, or perhaps strategies to employ regarding this topic?

    Thanks!

    opened by adampresley 6
  • CAS feature request/offer?

    CAS feature request/offer?

    I was looking for a CAS[1].

    Seems like it could be a pretty small change to diskv. You would need a cryptographic hash function, like say skein or sha256. Then something like:

    d := diskv.New(diskv.Options{
            BasePath:     "my-data-dir",
            Transform:    flatTransform,
            CryptoHash: sha256,
            CacheSizeMax: 1024 * 1024,
        })
    (key, err) := d.put([]byte{'1', '2', '3'}))
    ...
    (value,err) := d.get(key)
    

    I use put/get because they imply (to me anyways) atomic operations, that read/write do not.

    Opinions? Alternatives?

    If I did something like the above and send a pull request, would you consider it?

    [1] http://en.wikipedia.org/wiki/Content-addressable_storage

    opened by spikebike 6
  • Implement atomic write to disk

    Implement atomic write to disk

    Implement atomic write by writing into a temporary file (within the same directory to guarantee that it won't move accross disks), with the same permission and then rename at the last moment to make it available.

    opened by apelisse 5
  • please tag formal releases

    please tag formal releases

    Please consider assigning version numbers and tagging releases. Tags/releases are useful for downstream package maintainers (in Debian and other distributions) to export source tarballs, automatically track new releases and to declare dependencies between packages. Read more in the Debian Upstream Guide.

    Versioning provides additional benefits to encourage vendoring of a particular (e.g. latest stable) release contrary to random unreleased snapshots.

    Versioning provides safety margin for situations like "ops, we've made a mistake but reverted problematic commit in master". Presumably tagged version/release is better tested/reviewed than random snapshot of "master" branch.

    Thank you.

    See also

    • http://semver.org/
    • https://en.wikipedia.org/wiki/Software_versioning
    opened by onlyjob 5
  • Canceling a Keys() walk

    Canceling a Keys() walk

    We need a done channel on the Keys and KeysPrefix iterator. That will be a breaking API to the library however.

    I am happy to send a PR but want to get agreement on the API breakage problem first. Based on discussion from https://github.com/coreos/rocket/pull/209#issuecomment-66217292

    opened by philips 5
  • concurrent access

    concurrent access

    I've a use-case where I have two applications reading/writing to the disk cache. When reading from the disk cache, an application will first read from the in-memory cache, which in some scenario's will be dirty. Is it possible to always read from the physical disk first?

    opened by asanami 5
  • Implement a fix for #66, excessive memory use in siphon.

    Implement a fix for #66, excessive memory use in siphon.

    The siphon will now stop writing to its internal buffer once the size of the buffer exceeds the maximum cache size. Because we write until we exceed the max cache size, we're safe to attempt the cache update even if the buffer only contains partial data, because it's still over the limit & will be rejected.

    opened by floren 1
  • ReadStream with a very large value results in excessive memory use when cache is enabled

    ReadStream with a very large value results in excessive memory use when cache is enabled

    If the cache is enabled, readWithRLock always reads the file using a siphon.

    The siphon code copies every byte it reads into a bytes.Buffer. When the full file has been read, that bytes.Buffer is used to update the cache.

    However, if the underlying file is e.g. a gigabyte in size, the siphon will end up with a bytes.Buffer containing that entire gigabyte. Unless you've set your cache size to over a gigabyte, this gets thrown away as soon as the ReadStream is done.

    The main reason we use ReadStream is so we can deal with very large items without having to stick the entire thing in memory at once. Having discovered this, we'll probably disable the cache, but there are cases where people may wish to have a cache enabled without blowing up their memory!

    opened by floren 2
  • ReadStream/WriteStream can lead to data races

    ReadStream/WriteStream can lead to data races

    If someone calls ReadStream, then proceeds to read from it slowly, then someone else calls WriteStream, the reader will start to get the new resource contents part-way through. This is because createKeyFileWithNoLock calls os.OpenFile with O_TRUNC set when updating a file, and the existing reader ends up pointing at the new data.

    opened by floren 0
  • ReadStream copies values into memory too soon

    ReadStream copies values into memory too soon

    As currently implemented, Diskv.ReadStream is not a purely streaming reader, because it buffers an entire copy of the value (in the siphon) before attempting to put the value into the cache. Unfortunately with large values this is a recipe for memory exhaustion.

    Ideally we would stream the value directly into the cache as the io.ReadCloser that ReadStream returns is consumed, checking the cache size as we go. I started to go down this path, but it creates another race condition because then writing into the cache is not atomic: we cannot know when the ReadCloser will be finished consuming the entry, and it's very possible for others to begin Reading the same key-value pair as we're still writing it. So the next step down that road is for readers to actually take a lock per cache entry (which would then get released once the caller Closes the ReadCloser). This quickly became a web of mutexes and synchronisation hacks which felt very unidiomatic golang.

    Various simple "solutions" exist (e.g. prechecking the size of the file against the cache size before starting to siphon), but they are all inherently racy and could still lead to memory exhaustion under stressful conditions. (We could also just take a global write lock during reads, but that wouldn't be very nice to other readers, would it?)

    @peterbourgon @philips thoughts?

    bug helpwanted 
    opened by jonboulle 7
Releases(v3.0.0)
Owner
Peter Bourgon
The official GitHub account of Dwayne 'The Rock' Johnson
Peter Bourgon
A distributed key-value store. On Disk. Able to grow or shrink without service interruption.

Vasto A distributed high-performance key-value store. On Disk. Eventual consistent. HA. Able to grow or shrink without service interruption. Vasto sca

Chris Lu 238 Aug 7, 2022
Membin is an in-memory database that can be stored on disk. Data model smiliar to key-value but values store as JSON byte array.

Membin Docs | Contributing | License What is Membin? The Membin database system is in-memory database smiliar to key-value databases, target to effici

Membin 3 Jun 3, 2021
An in-memory key:value store/cache (similar to Memcached) library for Go, suitable for single-machine applications.

go-cache go-cache is an in-memory key:value store/cache similar to memcached that is suitable for applications running on a single machine. Its major

Patrick Mylund Nielsen 6.4k Aug 9, 2022
A simple, fast, embeddable, persistent key/value store written in pure Go. It supports fully serializable transactions and many data structures such as list, set, sorted set.

NutsDB English | 简体中文 NutsDB is a simple, fast, embeddable and persistent key/value store written in pure Go. It supports fully serializable transacti

徐佳军 2.3k Aug 8, 2022
Embedded key-value store for read-heavy workloads written in Go

Pogreb Pogreb is an embedded key-value store for read-heavy workloads written in Go. Key characteristics 100% Go. Optimized for fast random lookups an

Artem Krylysov 914 Aug 2, 2022
Fast and simple key/value store written using Go's standard library

Table of Contents Description Usage Cookbook Disadvantages Motivation Benchmarks Test 1 Test 4 Description Package pudge is a fast and simple key/valu

Vadim Kulibaba 327 Aug 2, 2022
Low-level key/value store in pure Go.

Description Package slowpoke is a simple key/value store written using Go's standard library only. Keys are stored in memory (with persistence), value

Vadim Kulibaba 99 Mar 13, 2022
Key-value store for temporary items :memo:

Tempdb TempDB is Redis-backed temporary key-value store for Go. Useful for storing temporary data such as login codes, authentication tokens, and temp

Rafael Jesus 16 Jan 23, 2022
Distributed reliable key-value store for the most critical data of a distributed system

etcd Note: The master branch may be in an unstable or even broken state during development. Please use releases instead of the master branch in order

etcd-io 40.8k Aug 9, 2022
a key-value store with multiple backends including leveldb, badgerdb, postgresql

Overview goukv is an abstraction layer for golang based key-value stores, it is easy to add any backend provider. Available Providers badgerdb: Badger

Mohammed Al Ashaal 52 Jun 7, 2022
A minimalistic in-memory key value store.

A minimalistic in-memory key value store. Overview You can think of Kiwi as thread safe global variables. This kind of library comes in helpful when y

SDSLabs 159 Dec 6, 2021
A simple Git Notes Key Value store

Gino Keva - Git Notes Key Values Gino Keva works as a simple Key Value store built on top of Git Notes, using an event sourcing architecture. Events a

Philips Software 23 Jan 19, 2022
A distributed key value store in under 1000 lines. Used in production at comma.ai

minikeyvalue Fed up with the complexity of distributed filesystems? minikeyvalue is a ~1000 line distributed key value store, with support for replica

George Hotz 2.3k Aug 5, 2022
Distributed cache and in-memory key/value data store.

Distributed cache and in-memory key/value data store. It can be used both as an embedded Go library and as a language-independent service.

Burak Sezer 2.3k Aug 14, 2022
Simple key value database that use json files to store the database

KValDB Simple key value database that use json files to store the database, the key and the respective value. This simple database have two gRPC metho

Francisco Santos 0 Nov 13, 2021
A rest-api that works with golang as an in-memory key value store

Rest API Service in GOLANG A rest-api that works with golang as an in-memory key value store Usage Run command below in terminal in project directory.

sercan aydın 0 Dec 6, 2021
Eagle - Eagle is a fast and strongly encrypted key-value store written in pure Golang.

EagleDB EagleDB is a fast and simple key-value store written in Golang. It has been designed for handling an exaggerated read/write workload, which su

null 6 Jul 9, 2022
A SQLite-based hierarchical key-value store written in Go

camellia ?? A lightweight hierarchical key-value store camellia is a Go library that implements a simple, hierarchical, persistent key-value store, ba

Valerio De Benedetto 31 Aug 4, 2022
Nipo is a powerful, fast, multi-thread, clustered and in-memory key-value database, with ability to configure token and acl on commands and key-regexes written by GO

Welcome to NIPO Nipo is a powerful, fast, multi-thread, clustered and in-memory key-value database, with ability to configure token and acl on command

Morteza Bashsiz 16 Jun 13, 2022