Embedded key-value store for read-heavy workloads written in Go




Pogreb is an embedded key-value store for read-heavy workloads written in Go.

Key characteristics

  • 100% Go.
  • Optimized for fast random lookups and infrequent bulk inserts.
  • Can store larger-than-memory data sets.
  • Low memory usage.
  • All DB methods are safe for concurrent use by multiple goroutines.


Installation

$ go get -u github.com/akrylysov/pogreb


Opening a database

To open or create a new database, use the pogreb.Open() function:

package main

import (
	"log"

	"github.com/akrylysov/pogreb"
)

func main() {
	db, err := pogreb.Open("pogreb.test", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}

Writing to a database

Use the DB.Put() function to insert a new key-value pair:

err := db.Put([]byte("testKey"), []byte("testValue"))
if err != nil {
	log.Fatal(err)
}

Reading from a database

To retrieve the inserted value, use the DB.Get() function:

val, err := db.Get([]byte("testKey"))
if err != nil {
	log.Fatal(err)
}
log.Printf("%s", val)

Iterating over items

To iterate over items, use ItemIterator returned by DB.Items():

it := db.Items()
for {
	key, val, err := it.Next()
	if err == pogreb.ErrIterationDone {
		break
	}
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("%s %s", key, val)
}


Performance

The benchmarking code can be found in the pogreb-bench repository.

Results of a read performance benchmark of pogreb, goleveldb, bolt and badgerdb on DigitalOcean (8 CPUs / 16 GB RAM / 160 GB SSD, Ubuntu 16.04.3); higher is better.


Design document.

  • High disk space utilization


    Details https://github.com/ethereum/go-ethereum/pull/20029.

    When storing small keys/values, Pogreb wastes too much space by making all writes 512-byte aligned.

    opened by akrylysov 20
  • Some explanation of the internals ?


    Hello, I am trying to understand the internals of pogreb, but unfortunately I cannot seem to understand the semantics of certain aspects of the database: namely the data storage aspects, how they provide for ACID semantics (if and to the extent supported by the database), and of course the very impressive performance :) Could you please write a few words on the internals of pogreb? I am sure that such information would be well received. Thank you.

    opened by suprafun 15
  • Slice out of bounds


    I wanted to test this db but I got this error:

    panic: runtime error: slice bounds out of range [:1073742336] with length 1073741824
    goroutine 1 [running]:
    github.com/akrylysov/pogreb/fs.mmap(0xc00008c038, 0x40000200, 0x80000000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
            .../github.com/akrylysov/pogreb/fs/os_windows.go:32 +0x259
    github.com/akrylysov/pogreb/fs.(*osfile).Mmap(0xc000068c90, 0x40000200, 0x200, 0x200)
            .../github.com/akrylysov/pogreb/fs/os.go:100 +0x6e
    github.com/akrylysov/pogreb.(*file).append(0xc00004f140, 0xc0001b6800, 0x200, 0x200, 0x0, 0x0, 0x0)
            .../github.com/akrylysov/pogreb/file.go:45 +0xc7
    github.com/akrylysov/pogreb.(*dataFile).writeKeyValue(0xc00004f140, 0xc000089eb0, 0x8, 0x8, 0xc000089eb0, 0x8, 0x8, 0x3ffffe00, 0x0, 0x0)
            .../github.com/akrylysov/pogreb/datafile.go:44 +0x1a7
    github.com/akrylysov/pogreb.(*DB).put(0xc00004f110, 0xc95a802f, 0xc000089eb0, 0x8, 0x8, 0xc000089eb0, 0x8, 0x8, 0x0, 0x0)
            .../github.com/akrylysov/pogreb/db.go:432 +0x260
    github.com/akrylysov/pogreb.(*DB).Put(0xc00004f110, 0xc000089eb0, 0x8, 0x8, 0xc000089eb0, 0x8, 0x8, 0x0, 0x0)
            .../github.com/akrylysov/pogreb/db.go:366 +0x171
            .../main.go:27 +0x1b3
    exit status 2


    package main

    import (
    	"encoding/binary"
    	"log"
    	"time"

    	"github.com/akrylysov/pogreb"
    )

    func main() {
    	db, err := pogreb.Open("pogreb.test", nil)
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer db.Close()
    	start := time.Now()
    	var pk [8]byte
    	for i := uint64(1); i <= 10000000; i++ {
    		binary.BigEndian.PutUint64(pk[:], i)
    		if err := db.Put(pk[:], pk[:]); err != nil {
    			log.Fatal(err)
    		}
    	}
    	log.Println("put 10M: ", time.Since(start).String())
    }

    I think the DB needs to do an automatic fsync when a file reaches 1 GB?

    opened by ghost 9
  • panic after restart


    After restart

    panic: runtime error: slice bounds out of range [:8511984455920089209] with capacity 1073741824

    goroutine 1 [running]:
    github.com/akrylysov/pogreb/fs.(*osfile).Slice(0xc0002ea3f0, 0x7620a4c3a4c37679, 0x7620a4c3a4c37879, 0xc0000b7b58, 0xc0000b7af8, 0xc0000b7b48, 0xc0009a9340, 0xc0000b7b50)
    	/exwindoz/home/juno/gowork/pkg/mod/github.com/akrylysov/[email protected]/fs/os.go:68 +0xa8
    github.com/akrylysov/pogreb.(*bucketHandle).read(0xc0000b77d8, 0x20616c6c, 0x20616c6c61766174)
    	/exwindoz/home/juno/gowork/pkg/mod/github.com/akrylysov/[email protected]/bucket.go:76 +0x56
    github.com/akrylysov/pogreb.(*DB).forEachBucket(0xc0002f01a0, 0xc000000009, 0xc0000b7b58, 0x8928a1, 0x419b36)
    	/exwindoz/home/juno/gowork/pkg/mod/github.com/akrylysov/[email protected]/db.go:178 +0xc4
    github.com/akrylysov/pogreb.(*DB).put(0xc0002f01a0, 0x9d3cc9e9, 0xc00039c4b0, 0x10, 0x10, 0xc00068f000, 0x2927, 0x4b09, 0x0, 0x0)
    	/exwindoz/home/juno/gowork/pkg/mod/github.com/akrylysov/[email protected]/db.go:384 +0x161
    github.com/akrylysov/pogreb.(*DB).Put(0xc0002f01a0, 0xc00039c4b0, 0x10, 0x10, 0xc00068f000, 0x2927, 0x4b09, 0x0, 0x0)
    	/exwindoz/home/juno/gowork/pkg/mod/github.com/akrylysov/[email protected]/db.go:366 +0x16a
    gitlab.com/remotejob/mlfactory-feederv4/pkg/pogrebhandler.InsertAllQue(0xc0001481c0, 0xc000586000, 0x63, 0x80, 0xc000aae000, 0x9c4)
    	/exwindoz/home/juno/gowork/src/gitlab.com/remotejob/mlfactory-feederv4/pkg/pogrebhandler/pogrebhandler.go:25 +0x14e
    main.main()
    	/exwindoz/home/juno/gowork/src/gitlab.com/remotejob/mlfactory-feederv4/cmd/rpcfeeder/main.go:274 +0x456
    exit status 2

    opened by remotejob 8
  • Make db.sync public


    Use case

    When using multiple databases at once, enabling the background sync feature in all of them would be redundant for most filesystems.

    The current workaround, deciding which of them to enable background sync on, can get needlessly complicated.


    • add a helper for the multi-db use case
    opened by 0joshuaolson1 7
  • Fix data corruption (issue #20)


    Fixes a race condition that could lead to data corruption. See https://github.com/akrylysov/pogreb/issues/20 for more details.

    Adding an extra heap allocation and a copy made the read performance worse. I'll consider adding a new option ReadOnly which eliminates the copy for read-only use cases.


    opened by akrylysov 4
  • Memory mapping all segment files causes memory exhaustion


    We are storing billions of records using Pogreb. It creates many 4 GB segment files (.PSG). It is my understanding that those files represent the write-ahead log (WAL), which is only used in case of recovery?

    If that is indeed the case, then only the last WAL file needs to be open (for writing)? Currently those files are literally exhausting our memory, using about 80 GB of RAM.


    Using RamMap we found the culprit: memory-mapped PSG files.

    opened by Kleissner 3
  • Open/read does not fail on invalid file


    Recently I realized I was opening the wrong database, and it took me an hour to figure it out because (*DB).FileSize() was returning non-zero, (*DB).Count() was returning zero, and no errors were reported on open. Is there no standard way to figure out whether the DB is invalid?

    As a bonus, doing this will also change the target file even if it wasn't a correct/working database file to begin with.

    opened by dsoprea 3
  • murmur hash functions fail on non-Windows machines due to unsafe pointers on go 1.14


    The Sum32WithSeed function in /hash/murmur32.go fails with "checkptr: unsafe pointer arithmetic" from Go 1.14 onwards, due to the flag -race now being applied automatically.

    This prevents pogreb from working on any non-Windows platform running Go 1.14.

    An example of more correct code can be found here.

    opened by maximinus 2
  • Documentation Clarification: Rebuilding Daily


    In the documentation, you say:

    I needed to rebuild the mapping once a day and then access it in read-only mode.

    From this it makes me wonder whether pogreb is intended to be used that way, or if it was intended to solve the problem of having to do that.

    opened by AusIV 2
  • Data corruption due to slice internals exposed


    Hi, I tested pogreb out with a very simple fuzzer that I initially wrote for bigCache, with very small adaptations (which explains why the test is a bit wonky, calling it "cache", for example). Here's the program:

    package main

    import (
    	"bytes"
    	"context"
    	"fmt"
    	"math"
    	"math/rand"
    	"os"
    	"os/signal"
    	"sync"
    	"syscall"

    	"github.com/akrylysov/pogreb"
    )

    const (
    	slotsPerBucket = 28
    	loadFactor     = 0.7
    	indexPostfix   = ".index"
    	lockPostfix    = ".lock"
    	version        = 1 // file format version
    	// MaxKeyLength is the maximum size of a key in bytes.
    	MaxKeyLength = 1 << 16
    	// MaxValueLength is the maximum size of a value in bytes.
    	MaxValueLength = 1 << 30
    	// MaxKeys is the maximum numbers of keys in the DB.
    	MaxKeys = math.MaxUint32
    )

    func removeAndOpen(path string, opts *pogreb.Options) (*pogreb.DB, error) {
    	os.Remove(path + indexPostfix)
    	os.Remove(path + lockPostfix)
    	return pogreb.Open(path, opts)
    }

    func fuzzDeletePutGet(ctx context.Context) {
    	cache, err := removeAndOpen("test.db", nil)
    	if err != nil {
    		panic(err)
    	}
    	var wg sync.WaitGroup
    	wg.Add(3)
    	// Deleter
    	go func() {
    		defer wg.Done()
    		for {
    			select {
    			case <-ctx.Done():
    				return
    			default:
    				r := uint8(rand.Int())
    				key := fmt.Sprintf("thekey%d", r)
    				cache.Delete([]byte(key))
    			}
    		}
    	}()
    	// Setter
    	go func() {
    		defer wg.Done()
    		val := make([]byte, 1024)
    		for {
    			select {
    			case <-ctx.Done():
    				return
    			default:
    				r := byte(rand.Int())
    				key := fmt.Sprintf("thekey%d", r)
    				for j := 0; j < len(val); j++ {
    					val[j] = r
    				}
    				cache.Put([]byte(key), []byte(val))
    			}
    		}
    	}()
    	// Getter
    	go func() {
    		defer wg.Done()
    		var (
    			val    = make([]byte, 1024)
    			hits   = uint64(0)
    			misses = uint64(0)
    		)
    		for {
    			select {
    			case <-ctx.Done():
    				return
    			default:
    				r := byte(rand.Int())
    				key := fmt.Sprintf("thekey%d", r)
    				for j := 0; j < len(val); j++ {
    					val[j] = r
    				}
    				if got, err := cache.Get([]byte(key)); got != nil && !bytes.Equal(got, val) {
    					errStr := fmt.Sprintf("got %s ->\n %x\n expected:\n %x\n ", key, got, val)
    					panic(errStr)
    				} else {
    					if err == nil {
    						hits++
    					} else {
    						misses++
    					}
    				}
    				if total := hits + misses; total%1000000 == 0 {
    					percentage := float64(100) * float64(hits) / float64(total)
    					fmt.Printf("Hits %d (%.2f%%) misses %d \n", hits, percentage, misses)
    				}
    			}
    		}
    	}()
    	wg.Wait()
    }

    func main() {
    	sigs := make(chan os.Signal, 1)
    	ctx, cancel := context.WithCancel(context.Background())
    	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
    	fmt.Println("Press ctrl-c to exit")
    	go fuzzDeletePutGet(ctx)
    	<-sigs
    	cancel()
    }

    The program has three workers:

    • One that randomly deletes a key
    • One that randomly writes a key, where there's a well-defined correlation between key and value.
    • One that randomly checks if a key/value mapping is consistent.

    When I ran it, it errored out after about 4M or 5M tests:

    GOROOT=/rw/usrlocal/go #gosetup
    GOPATH=/home/user/go #gosetup
    /rw/usrlocal/go/bin/go build -o /tmp/___go_build_fuzzer_go /home/user/go/src/github.com/akrylysov/pogreb/fuzz/fuzzer.go #gosetup
    /tmp/___go_build_fuzzer_go #gosetup
    Press ctrl-c to exit
    Hits 1000000 (100.00%) misses 0 
    Hits 2000000 (100.00%) misses 0 
    Hits 3000000 (100.00%) misses 0 
    Hits 4000000 (100.00%) misses 0 
    Hits 5000000 (100.00%) misses 0 
    panic: got thekey112 ->
    goroutine 10 [running]:
    main.fuzzDeletePutGet.func3(0xc00001a650, 0x6ee480, 0xc0000601c0, 0xc00008b110)
    	/home/user/go/src/github.com/akrylysov/pogreb/fuzz/fuzzer.go:108 +0x656
    created by main.fuzzDeletePutGet
    	/home/user/go/src/github.com/akrylysov/pogreb/fuzz/fuzzer.go:88 +0x17a

    Looking into it a bit, I found that although the Get method is properly protected by a mutex, the returned value in fact aliases an internal slice and is not copied out into a new buffer.

    I hacked on a little fix:

    diff --git a/db.go b/db.go
    index 967bbf0..961add9 100644
    --- a/db.go
    +++ b/db.go
    @@ -288,7 +288,12 @@ func (db *DB) Get(key []byte) ([]byte, error) {
             if err != nil {
                     return nil, err
             }
    -       return retValue, nil
    +       var safeRetValue []byte
    +       if retValue != nil {
    +               safeRetValue = make([]byte, len(retValue))
    +               copy(safeRetValue, retValue)
    +       }
    +       return safeRetValue, nil
     // Has returns true if the DB contains the given key.

    And with the attached fix, I couldn't reproduce it any longer (at least not for 10M+ tests).

    The benchmarks without and with the hacky fix are:

    BenchmarkGet-6   	10000000	       166 ns/op
    BenchmarkGet-6   	10000000	       182 ns/op

    Now, I'm not totally sure the test case is fair, as I'm not 100% sure what concurrency guarantees pogreb has. My test has both a setter and a deleter, so basically two writers and one reader, which might not be a supported setup? (On the other hand, I'm guessing this flaw should be reproducible even with only one writer.)

    opened by holiman 2
  • How to omit pogreb output before the result of Get?

    How can I omit this pogreb log output before getting my result?

    ❯ go run main.go getkv prm2
    pogreb: moving non-segment files...
    pogreb: moved 00000-1.psg.pmt to 00000-1.psg.pmt.bac
    pogreb: moved db.pmt to db.pmt.bac
    pogreb: moved index.pmt to index.pmt.bac
    pogreb: moved main.pix to main.pix.bac
    pogreb: moved overflow.pix to overflow.pix.bac
    pogreb: error reading segment meta 0: EOF
    pogreb: started recovery
    pogreb: rebuilding index...
    pogreb: removing recovery backup files...
    pogreb: removed 00000-1.psg.pmt.bac
    pogreb: removed db.pmt.bac
    pogreb: removed index.pmt.bac
    pogreb: removed main.pix.bac
    pogreb: removed overflow.pix.bac
    pogreb: successfully recovered database conten123Test

    opened by waldirborbajr 0
  • Need information on a few points about pogreb

    Hi Team,

    Need to know whether pogreb can support TBs of data in the data store, and retrieve more than 500k values for a specific hash key using prefix iteration?

    Thanks, Vishal

    opened by vishaljangid1729 0
  • Is it safe for multiple Go instances to write?

    Hi, from the documentation it is clear that the storage can work with multiple goroutines inside one singleton application. But can it work in scaled applications?

    For example, I have N instances of a Go application, each with X goroutines. N * X functions will write data to the db file in parallel; is that safe?

    opened by fe3dback 1
  • 4 billion records max?

    I just realized that index.numKeys is a 32-bit uint, and there's MaxKeys = math.MaxUint32 😲

    I think it would make sense to change it to 64-bit (any reason why we wouldn't support a max 64-bit number of records)? I assume it would break existing dbs (but is still necessary)?

    At least it should be clearly stated as a limitation in the readme, I would suggest.

    Our use case is to store billions of records. We've already reached 2 billion records with Pogreb, which means in a matter of weeks we'll hit the current upper limit 😢

    opened by Kleissner 8
  • v0.10.1(May 1, 2021)

  • v0.10.0(Feb 10, 2021)


    • Memory-mapped file access can now be disabled by setting Options.FileSystem to fs.OS.


    • The default file system implementation is changed to fs.OSMMap.
  • v0.9.2(Jan 1, 2021)


    • Write-ahead log doesn't rely on wall-clock time anymore. It prevents potential race conditions during compaction and recovery.


    • Fix recovery writing extra delete records.
  • v0.9.1(Apr 9, 2020)

  • v0.9.0(Mar 8, 2020)

    This release replaces the unstructured data file for storing key-value pairs with a write-ahead log.

    • In the event of a crash or a power loss the database is automatically recovered.
    • Optional background compaction allows reclaiming disk space occupied by overwritten or deleted keys.
    • Fix disk space overhead when storing small keys and values.
Artem Krylysov