Pogreb is an embedded key-value store for read-heavy workloads written in Go.



Key characteristics

  • 100% Go.
  • Optimized for fast random lookups and infrequent bulk inserts.
  • Can store larger-than-memory data sets.
  • Low memory usage.
  • All DB methods are safe for concurrent use by multiple goroutines.


Installation

$ go get -u github.com/akrylysov/pogreb


Opening a database

To open or create a new database, use the pogreb.Open() function:

package main

import (
    "log"

    "github.com/akrylysov/pogreb"
)

func main() {
    db, err := pogreb.Open("pogreb.test", nil)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()
}

Writing to a database

Use the DB.Put() function to insert a new key-value pair:

err := db.Put([]byte("testKey"), []byte("testValue"))
if err != nil {
    log.Fatal(err)
}
Reading from a database

To retrieve the inserted value, use the DB.Get() function:

val, err := db.Get([]byte("testKey"))
if err != nil {
    log.Fatal(err)
}
log.Printf("%s", val)

Iterating over items

To iterate over items, use the ItemIterator returned by DB.Items():

it := db.Items()
for {
    key, val, err := it.Next()
    if err == pogreb.ErrIterationDone {
        break
    }
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("%s %s", key, val)
}


Performance

The benchmarking code can be found in the pogreb-bench repository.

Results of a read performance benchmark of pogreb, goleveldb, bolt and badgerdb on DigitalOcean (8 CPUs / 16 GB RAM / 160 GB SSD, Ubuntu 16.04.3), higher is better.


Internals

Design document.

  • High disk space utilization

    Details https://github.com/ethereum/go-ethereum/pull/20029.

    When storing small keys/values Pogreb wastes too much space by making all writes 512-byte aligned.

    opened by akrylysov 20
  • Some explanation of the internals ?

    Hello, I am trying to understand the internals of pogreb, but unfortunately I cannot seem to understand the semantics of certain aspects of the database, namely the data storage aspects, how they provide for ACID semantics (if and to the extent supported by the database), and of course the very impressive performance :) Could you please write a few words on the internals of pogreb? I am sure that such information would be well received. Thank you.

    opened by suprafun 15
  • Slice out of bounds

    I wanted to test this db but I got this error:

    panic: runtime error: slice bounds out of range [:1073742336] with length 1073741824
    goroutine 1 [running]:
    github.com/akrylysov/pogreb/fs.mmap(0xc00008c038, 0x40000200, 0x80000000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
            .../github.com/akrylysov/pogreb/fs/os_windows.go:32 +0x259
    github.com/akrylysov/pogreb/fs.(*osfile).Mmap(0xc000068c90, 0x40000200, 0x200, 0x200)
            .../github.com/akrylysov/pogreb/fs/os.go:100 +0x6e
    github.com/akrylysov/pogreb.(*file).append(0xc00004f140, 0xc0001b6800, 0x200, 0x200, 0x0, 0x0, 0x0)
            .../github.com/akrylysov/pogreb/file.go:45 +0xc7
    github.com/akrylysov/pogreb.(*dataFile).writeKeyValue(0xc00004f140, 0xc000089eb0, 0x8, 0x8, 0xc000089eb0, 0x8, 0x8, 0x3ffffe00, 0x0, 0x0)
            .../github.com/akrylysov/pogreb/datafile.go:44 +0x1a7
    github.com/akrylysov/pogreb.(*DB).put(0xc00004f110, 0xc95a802f, 0xc000089eb0, 0x8, 0x8, 0xc000089eb0, 0x8, 0x8, 0x0, 0x0)
            .../github.com/akrylysov/pogreb/db.go:432 +0x260
    github.com/akrylysov/pogreb.(*DB).Put(0xc00004f110, 0xc000089eb0, 0x8, 0x8, 0xc000089eb0, 0x8, 0x8, 0x0, 0x0)
            .../github.com/akrylysov/pogreb/db.go:366 +0x171
            .../main.go:27 +0x1b3
    exit status 2


    package main

    import (
    	"encoding/binary"
    	"log"
    	"time"

    	"github.com/akrylysov/pogreb"
    )

    func main() {
    	db, err := pogreb.Open("pogreb.test", nil)
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer db.Close()
    	start := time.Now()
    	var pk [8]byte
    	for i := uint64(1); i <= 10000000; i++ {
    		binary.BigEndian.PutUint64(pk[:], i)
    		if err := db.Put(pk[:], pk[:]); err != nil {
    			log.Fatal(err)
    		}
    	}
    	log.Println("put 10M: ", time.Now().Sub(start).String())
    }

    I think the db needs to do an automatic fsync when it reaches a 1 GB file?

    opened by ghost 9
  • panic after restart

    After restart

    panic: runtime error: slice bounds out of range [:8511984455920089209] with capacity 1073741824

    goroutine 1 [running]:
    github.com/akrylysov/pogreb/fs.(*osfile).Slice(0xc0002ea3f0, 0x7620a4c3a4c37679, 0x7620a4c3a4c37879, 0xc0000b7b58, 0xc0000b7af8, 0xc0000b7b48, 0xc0009a9340, 0xc0000b7b50)
            /exwindoz/home/juno/gowork/pkg/mod/github.com/akrylysov/[email protected]/fs/os.go:68 +0xa8
    github.com/akrylysov/pogreb.(*bucketHandle).read(0xc0000b77d8, 0x20616c6c, 0x20616c6c61766174)
            /exwindoz/home/juno/gowork/pkg/mod/github.com/akrylysov/[email protected]/bucket.go:76 +0x56
    github.com/akrylysov/pogreb.(*DB).forEachBucket(0xc0002f01a0, 0xc000000009, 0xc0000b7b58, 0x8928a1, 0x419b36)
            /exwindoz/home/juno/gowork/pkg/mod/github.com/akrylysov/[email protected]/db.go:178 +0xc4
    github.com/akrylysov/pogreb.(*DB).put(0xc0002f01a0, 0x9d3cc9e9, 0xc00039c4b0, 0x10, 0x10, 0xc00068f000, 0x2927, 0x4b09, 0x0, 0x0)
            /exwindoz/home/juno/gowork/pkg/mod/github.com/akrylysov/[email protected]/db.go:384 +0x161
    github.com/akrylysov/pogreb.(*DB).Put(0xc0002f01a0, 0xc00039c4b0, 0x10, 0x10, 0xc00068f000, 0x2927, 0x4b09, 0x0, 0x0)
            /exwindoz/home/juno/gowork/pkg/mod/github.com/akrylysov/[email protected]/db.go:366 +0x16a
    gitlab.com/remotejob/mlfactory-feederv4/pkg/pogrebhandler.InsertAllQue(0xc0001481c0, 0xc000586000, 0x63, 0x80, 0xc000aae000, 0x9c4)
            /exwindoz/home/juno/gowork/src/gitlab.com/remotejob/mlfactory-feederv4/pkg/pogrebhandler/pogrebhandler.go:25 +0x14e
    main.main()
            /exwindoz/home/juno/gowork/src/gitlab.com/remotejob/mlfactory-feederv4/cmd/rpcfeeder/main.go:274 +0x456
    exit status 2

    opened by remotejob 8
  • Make db.sync public

    Use case

    When using multiple databases at once, enabling the background sync feature in all of them would be redundant for most filesystems.

    The current workaround, deciding which of them to enable background sync on, can get needlessly complicated.


    • add a helper for the multi-db use case
    opened by 0joshuaolson1 7
  • Fix data corruption (issue #20)

    Fixes a race condition that could lead to data corruption. See https://github.com/akrylysov/pogreb/issues/20 for more details.

    Adding an extra heap allocation and a copy made the read performance worse. I'll consider adding a new option ReadOnly which eliminates the copy for read-only use cases.


    opened by akrylysov 4
  • Memory mapping all segment files causes memory exhaustion

    We are storing billions of records using Pogreb. It creates many 4GB segment files (.PSG). It is my understanding that those files represent the write-ahead log (WAL), which is only used in case of recovery?

    If that is indeed the case, then only the last WAL file needs to be open (for writing)? Currently those files are literally exhausting our memory and use about 80 GB of RAM.


    Using RamMap we found the culprit: memory-mapped PSG files.

    opened by Kleissner 3
  • Open/read does not fail on invalid file

    Recently I realized I was opening the wrong database, and it took me an hour to figure it out because (*DB).FileSize() was returning non-zero, (*DB).Count() was returning zero, and there were no errors reported by Open(). Is there no standard way to figure out whether the DB is invalid?

    As a bonus, doing this will also change the target file even if it wasn't a correct/working database file to begin with.

    opened by dsoprea 3
  • murmur hash functions fail on non-Windows machines due to unsafe pointers on go 1.14

    The Sum32WithSeed function in /hash/murmur32.go fails with "checkptr: unsafe pointer arithmetic" from Go 1.14 onwards, due to the flag -race now being applied automatically.

    This prevents pogreb from working on any non-Windows version running on Go 1.14

    An example of more correct code can be found here

    opened by maximinus 2
  • Documentation Clarification: Rebuilding Daily

    In the documentation, you say:

    I needed to rebuild the mapping once a day and then access it in read-only mode.

    From this it makes me wonder whether pogreb is intended to be used that way, or if it was intended to solve the problem of having to do that.

    opened by AusIV 2
  • Data corruption due to slice internals exposed

    Hi, I tested pogreb out with a very simple fuzzer that I initially wrote for bigCache, with very small adaptations (which explains why the test is a bit wonky, calling it "cache", for example). Here's the program:

    package main

    import (
    	"bytes"
    	"context"
    	"fmt"
    	"math"
    	"math/rand"
    	"os"
    	"os/signal"
    	"sync"
    	"syscall"

    	"github.com/akrylysov/pogreb"
    )

    const (
    	slotsPerBucket = 28
    	loadFactor     = 0.7
    	indexPostfix   = ".index"
    	lockPostfix    = ".lock"
    	version        = 1 // file format version
    	// MaxKeyLength is the maximum size of a key in bytes.
    	MaxKeyLength = 1 << 16
    	// MaxValueLength is the maximum size of a value in bytes.
    	MaxValueLength = 1 << 30
    	// MaxKeys is the maximum numbers of keys in the DB.
    	MaxKeys = math.MaxUint32
    )

    func removeAndOpen(path string, opts *pogreb.Options) (*pogreb.DB, error) {
    	os.Remove(path + indexPostfix)
    	os.Remove(path + lockPostfix)
    	return pogreb.Open(path, opts)
    }

    func fuzzDeletePutGet(ctx context.Context) {
    	cache, err := removeAndOpen("test.db", nil)
    	if err != nil {
    		panic(err)
    	}
    	var wg sync.WaitGroup
    	wg.Add(3)
    	// Deleter
    	go func() {
    		defer wg.Done()
    		for {
    			select {
    			case <-ctx.Done():
    				return
    			default:
    				r := uint8(rand.Int())
    				key := fmt.Sprintf("thekey%d", r)
    				cache.Delete([]byte(key))
    			}
    		}
    	}()
    	// Setter
    	go func() {
    		defer wg.Done()
    		val := make([]byte, 1024)
    		for {
    			select {
    			case <-ctx.Done():
    				return
    			default:
    				r := byte(rand.Int())
    				key := fmt.Sprintf("thekey%d", r)
    				for j := 0; j < len(val); j++ {
    					val[j] = r
    				}
    				cache.Put([]byte(key), []byte(val))
    			}
    		}
    	}()
    	// Getter
    	go func() {
    		defer wg.Done()
    		var (
    			val    = make([]byte, 1024)
    			hits   = uint64(0)
    			misses = uint64(0)
    		)
    		for {
    			select {
    			case <-ctx.Done():
    				return
    			default:
    				r := byte(rand.Int())
    				key := fmt.Sprintf("thekey%d", r)
    				for j := 0; j < len(val); j++ {
    					val[j] = r
    				}
    				if got, err := cache.Get([]byte(key)); got != nil && !bytes.Equal(got, val) {
    					errStr := fmt.Sprintf("got %s ->\n %x\n expected:\n %x\n ", key, got, val)
    					panic(errStr)
    				} else {
    					if err == nil {
    						hits++
    					} else {
    						misses++
    					}
    				}
    				if total := hits + misses; total%1000000 == 0 {
    					percentage := float64(100) * float64(hits) / float64(total)
    					fmt.Printf("Hits %d (%.2f%%) misses %d \n", hits, percentage, misses)
    				}
    			}
    		}
    	}()
    	wg.Wait()
    }

    func main() {
    	sigs := make(chan os.Signal, 1)
    	ctx, cancel := context.WithCancel(context.Background())
    	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
    	fmt.Println("Press ctrl-c to exit")
    	go fuzzDeletePutGet(ctx)
    	<-sigs
    	cancel()
    }

    The program has three workers:

    • One that randomly deletes a key
    • One that randomly writes a key, where there's a well-defined correlation between key and value.
    • One that randomly checks if a key/value mapping is consistent.

    When I ran it, it errored out after about 4M or 5M tests:

    GOROOT=/rw/usrlocal/go #gosetup
    GOPATH=/home/user/go #gosetup
    /rw/usrlocal/go/bin/go build -o /tmp/___go_build_fuzzer_go /home/user/go/src/github.com/akrylysov/pogreb/fuzz/fuzzer.go #gosetup
    /tmp/___go_build_fuzzer_go #gosetup
    Press ctrl-c to exit
    Hits 1000000 (100.00%) misses 0 
    Hits 2000000 (100.00%) misses 0 
    Hits 3000000 (100.00%) misses 0 
    Hits 4000000 (100.00%) misses 0 
    Hits 5000000 (100.00%) misses 0 
    panic: got thekey112 ->
    goroutine 10 [running]:
    main.fuzzDeletePutGet.func3(0xc00001a650, 0x6ee480, 0xc0000601c0, 0xc00008b110)
    	/home/user/go/src/github.com/akrylysov/pogreb/fuzz/fuzzer.go:108 +0x656
    created by main.fuzzDeletePutGet
    	/home/user/go/src/github.com/akrylysov/pogreb/fuzz/fuzzer.go:88 +0x17a

    Looking into it a bit, I found that although the Get method is properly mutexed, the returned value is in fact a slice aliasing internal storage, not a copy into a new buffer.

    I hacked on a little fix:

    diff --git a/db.go b/db.go
    index 967bbf0..961add9 100644
    --- a/db.go
    +++ b/db.go
    @@ -288,7 +288,12 @@ func (db *DB) Get(key []byte) ([]byte, error) {
            if err != nil {
                    return nil, err
            }
    -       return retValue, nil
    +       var safeRetValue []byte
    +       if retValue != nil {
    +               safeRetValue = make([]byte, len(retValue))
    +               copy(safeRetValue, retValue)
    +       }
    +       return safeRetValue, nil
     }

     // Has returns true if the DB contains the given key.

    And with the attached fix, I couldn't reproduce it any longer (at least not for 10M+ tests).

    The benchmarks without and with the hacky fix are:

    BenchmarkGet-6   	10000000	       166 ns/op
    BenchmarkGet-6   	10000000	       182 ns/op

    Now, I'm not totally sure the test case is fair, as I'm not 100% sure what concurrency guarantees pogreb has. My test has both a setter and a deleter, so basically two writers and one reader, which might not be a supported setup? (On the other hand, I'm guessing this flaw should be reproducible even with only one writer.)

    opened by holiman 2
  • Extremely slow read speed while put speed is fine on Debian Machine

    Hi there, I am currently testing whether pogreb fits my needs and am very impressed by its speed. However, I recently ran some benchmarks (pogreb-bench) on a Debian server (./pogreb-bench -n 10_000_000 -p ./pogreb_test/) and am experiencing extremely slow read speed.

    put: 503.882s 19845 ops/s. I don't have a full duration for reads since it would take too long to finish, but it read about 630000 keys in 1500s.

    I also ran the same test on a Macbook, where everything works great. Any idea how this is possible? What can I do to pinpoint the issue?

    Edit: I tried it without mmap: put: 65.852s 151855 ops/s, get: 25.389s 393876 ops/s.

    However the issue persists at n=100_000_000.

    Any idea why this is faster?

    Thanks a lot!

    opened by realsirjoe 2
  • add ReadOnly config option for read-only filesystems

    This PR adds a ReadOnly config option to be able to put the database on a read-only filesystem. Enabling this config option disables the lockfile mechanism and sets all file access flags to O_RDONLY.

    opened by maitai 0
  • Its safe for multiple go instance writes?

    Hi, from the documentation it's clear that the storage can be used by multiple goroutines inside one application. But can it work in scaled applications?

    For example, I have N instances of a Go application, each with X goroutines. N * X functions will write data to the db file in parallel. Is that safe?

    opened by fe3dback 1
  • 4 billion records max?

    I just realized that index.numKeys is a 32-bit uint, and there's MaxKeys = math.MaxUint32 😲

    I think it would make sense to change it to 64-bit (is there any reason not to support a 64-bit number of records?). I assume it would break existing dbs (but it is still necessary)?

    At least it should be clearly stated as a limitation in the readme, I would suggest.

    Our use case is to store billions of records. We've reached already 2 billion records with Pogreb - which means in a matter of weeks we'll hit the current upper limit 😢

    opened by Kleissner 8
  • v0.10.1 (May 1, 2021)

  • v0.10.0 (Feb 10, 2021)


    • Memory-mapped file access can now be disabled by setting Options.FileSystem to fs.OS.


    • The default file system implementation is changed to fs.OSMMap.
  • v0.9.2 (Jan 1, 2021)


    • Write-ahead log doesn't rely on wall-clock time anymore. It prevents potential race conditions during compaction and recovery.


    • Fix recovery writing extra delete records.
  • v0.9.1 (Apr 9, 2020)

  • v0.9.0 (Mar 8, 2020)

    This release replaces the unstructured data file for storing key-value pairs with a write-ahead log.

    • In the event of a crash or a power loss the database is automatically recovered.
    • Optional background compaction allows reclaiming disk space occupied by overwritten or deleted keys.
    • Fix disk space overhead when storing small keys and values.
Artem Krylysov

Olric: Distributed cache and in-memory key/value data store. It can be used both as an embedded Go library and as a language-independent service.

Burak Sezer 2.7k Jan 4, 2023

Bolt: An embedded key/value database for Go. Bolt is a pure Go key/value store inspired by Howard Chu's LMDB project. The goal of the project is to provide a simple, fast, and reliable database.

BoltDB 13.3k Dec 30, 2022

rosedb: A fast, stable, embedded key-value storage engine based on bitcask. Its on-disk files are organized as a WAL (Write Ahead Log) in LSM trees, optimizing for write throughput.

roseduan 3.4k Dec 28, 2022

NutsDB: A simple, fast, embeddable, persistent key/value store written in pure Go. It supports fully serializable transactions and many data structures such as list, set, and sorted set.

徐佳军 2.7k Jan 9, 2023

diskv: A disk-backed key-value store. Diskv (disk-vee) is a simple, persistent key-value store written in the Go language.

Peter Bourgon 1.2k Jan 1, 2023

etcd: Distributed reliable key-value store for the most critical data of a distributed system. Note: the master branch may be in an unstable or even broken state during development; please use releases instead of the master branch.

etcd-io 42.2k Dec 28, 2022

Redix: A persistent real-time key-value store that speaks the Redis (RESP) protocol and is capable of storing terabytes of data; it also integrates with your mobile/web apps to add real-time features. Redix is used in production.

Mohammed Al Ashaal 1.1k Dec 25, 2022

GhostDB: A distributed, in-memory, general-purpose key-value data store that delivers microsecond performance at any scale. GhostDB is designed to speed up dynamic database- or API-driven websites by storing data in RAM, reducing the number of times an external data source such as a database or API must be read. It provides a very large hash table distributed across multiple machines.

Jake Grogan 734 Jan 6, 2023

CrankDB: An ultra fast and very lightweight key-value based document store.

Shrey Batra 30 Apr 12, 2022

yakv: yakv (originally intended to be "yet another key-value store") is a simple, in-memory, concurrency-safe key-value store for hobbyists. It provides persistence by appending transactions to a transaction log and restoring data from the log on startup.

Aadhav Vignesh 5 Feb 24, 2022

Project Amnesia: A multi-threaded key-value pair store using a thread-safe locking mechanism, allowing concurrent reads.

Nikhil Nayak 7 Oct 29, 2022

ShockV: A simple key-value store based on BadgerDB with a RESTful API. It's best suited for experimental projects that need a lightweight data store.

delihiros 2 Sep 26, 2021

In Store: A REST API in Go that works as an in-memory key-value store.

Eyüp Arslan 0 Oct 24, 2021

Keva: A distributed key-value store.

Yaroslav Gaponov 0 Nov 15, 2021

A simple in-memory key-value store written in Go (version 1.17 or higher required).

Mustafa Navruz 0 Nov 6, 2021

vtec: A simple in-memory key-value store application. vtec provides persistence by appending transactions to a JSON file and restoring data from it on startup.

Ahmet Tek 3 Jun 22, 2022

Biscuit: A multi-region HA key-value store for your AWS infrastructure secrets. Biscuit is most useful to teams already using AWS.

Doug 557 Nov 10, 2022

go-cache: An in-memory key:value store/cache (similar to Memcached) suitable for applications running on a single machine.

Patrick Mylund Nielsen 6.8k Jan 3, 2023

KV: A toy in-memory key-value store built primarily in an effort to write more Go and check out gRPC. Still a work in progress.

Ali Mir 0 Dec 30, 2021