Ristretto is a fast, concurrent cache library built with a focus on performance and correctness.

Overview

Ristretto

Go Doc Go Report Card Coverage Tests

Ristretto is a fast, concurrent cache library built with a focus on performance and correctness.

The motivation to build Ristretto comes from the need for a contention-free cache in Dgraph.

Use Discuss Issues for reporting issues about this repository.

Features

  • High Hit Ratios - with our unique admission/eviction policy pairing, Ristretto's performance is best in class.
    • Eviction: SampledLFU - on par with exact LRU and better performance on Search and Database traces.
    • Admission: TinyLFU - extra performance with little memory overhead (12 bits per counter).
  • Fast Throughput - we use a variety of techniques for managing contention and the result is excellent throughput.
  • Cost-Based Eviction - any large new item deemed valuable can evict multiple smaller items (cost could be anything).
  • Fully Concurrent - you can use as many goroutines as you want with little throughput degradation.
  • Metrics - optional performance metrics for throughput, hit ratios, and other stats.
  • Simple API - just figure out your ideal Config values and you're off and running.

Status

Ristretto is usable but still under active development. We expect it to be production ready in the near future.

Table of Contents

Usage

Example

func main() {
	cache, err := ristretto.NewCache(&ristretto.Config{
		NumCounters: 1e7,     // number of keys to track frequency of (10M).
		MaxCost:     1 << 30, // maximum cost of cache (1GB).
		BufferItems: 64,      // number of keys per Get buffer.
	})
	if err != nil {
		panic(err)
	}

	// set a value with a cost of 1
	cache.Set("key", "value", 1)
	
	// wait for value to pass through buffers
	time.Sleep(10 * time.Millisecond)

	value, found := cache.Get("key")
	if !found {
		panic("missing value")
	}
	fmt.Println(value)
	cache.Del("key")
}

Config

The Config struct is passed to NewCache when creating Ristretto instances (see the example above).

NumCounters int64

NumCounters is the number of 4-bit access counters to keep for admission and eviction. We've seen good performance in setting this to 10x the number of items you expect to keep in the cache when full.

For example, if you expect each item to have a cost of 1 and MaxCost is 100, set NumCounters to 1,000. Or, if you use variable cost values but expect the cache to hold around 10,000 items when full, set NumCounters to 100,000. The important thing is the number of unique items in the full cache, not necessarily the MaxCost value.

MaxCost int64

MaxCost is how eviction decisions are made. For example, if MaxCost is 100 and a new item with a cost of 1 increases total cache cost to 101, 1 item will be evicted.

MaxCost can also be used to denote the max size in bytes. For example, if MaxCost is 1,000,000 (1MB) and the cache is full with 1,000 1KB items, a new item (that's accepted) would cause 5 1KB items to be evicted.

MaxCost could be anything as long as it matches how you're using the cost values when calling Set.

BufferItems int64

BufferItems is the size of the Get buffers. The best value we've found for this is 64.

If for some reason you see Get performance decreasing with lots of contention (you shouldn't), try increasing this value in increments of 64. This is a fine-tuning mechanism and you probably won't have to touch this.

Metrics bool

Metrics is true when you want real-time logging of a variety of stats. The reason this is a Config flag is because there's a 10% throughput performance overhead.

OnEvict func(hashes [2]uint64, value interface{}, cost int64)

OnEvict is called for every eviction.

KeyToHash func(key interface{}) [2]uint64

KeyToHash is the hashing algorithm used for every key. If this is nil, Ristretto has a variety of defaults depending on the underlying interface type.

Note that if you want 128bit hashes you should use the full [2]uint64, otherwise just fill the uint64 at the 0 position and it will behave like any 64bit hash.

Cost func(value interface{}) int64

Cost is an optional function you can pass to the Config in order to evaluate item cost at runtime, and only for the Set calls that aren't dropped (this is useful if calculating item cost is particularly expensive and you don't want to waste time on items that will be dropped anyways).

To signal to Ristretto that you'd like to use this Cost function:

  1. Set the Cost field to a non-nil function.
  2. When calling Set for new items or item updates, use a cost of 0.

Benchmarks

The benchmarks can be found in https://github.com/dgraph-io/benchmarks/tree/master/cachebench/ristretto.

Hit Ratios

Search

This trace is described as "disk read accesses initiated by a large commercial search engine in response to various web search requests."

Database

This trace is described as "a database server running at a commercial site running an ERP application on top of a commercial database."

Looping

This trace demonstrates a looping access pattern.

CODASYL

This trace is described as "references to a CODASYL database for a one hour period."

Throughput

All throughput benchmarks were ran on an Intel Core i7-8700K (3.7GHz) with 16gb of RAM.

Mixed

Read

Write

FAQ

How are you achieving this performance? What shortcuts are you taking?

We go into detail in the Ristretto blog post, but in short: our throughput performance can be attributed to a mix of batching and eventual consistency. Our hit ratio performance is mostly due to an excellent admission policy and SampledLFU eviction policy.

As for "shortcuts," the only thing Ristretto does that could be construed as one is dropping some Set calls. That means a Set call for a new item (updates are guaranteed) isn't guaranteed to make it into the cache. The new item could be dropped at two points: when passing through the Set buffer or when passing through the admission policy. However, this doesn't affect hit ratios much at all as we expect the most popular items to be Set multiple times and eventually make it in the cache.

Is Ristretto distributed?

No, it's just like any other Go library that you can import into your project and use in a single process.

Issues
  • cmSketch not benefitting from four rows

    cmSketch not benefitting from four rows

    I believe the commit f823dc4a broke the purpose of the cm sketch to have four rows. The bitwise-xor and mask are not creating the mix of indexes expected by the design. The same Increment/Estimate results, the same values, are achieved from a single row with or without the bitwise-xor. The earlier implementation seems to have been a good and very fast approximation to a distinct hash for each row except when the high 32 bits were all zero. One solution to fixing the earlier version could be to bitwise-or a single bit into the top 32 half. I can provide my testing and benchmarking on this offline if interested.

    For small values of row length as used in the current unit tests, this doesn't matter. As the row length gets larger, the gap between the earlier algorithm and the current one widens and I believe becomes significant.

    optimization priority/P1 status/accepted 
    opened by FrankReh 19
  • When using `string` as keys, there could be hash collisions, and thus `Get` could return the wrong data

    When using `string` as keys, there could be hash collisions, and thus `Get` could return the wrong data

    This is more a "design question" than an actual "issue", however given its implications I think it is an important question which impacts either the design or the usage constraints.

    Given that the KeyToHash (1) function supports string (and []byte), and it returns uint64, it is possible that there are hash collisions.

    However the cache.Get (2) method doesn't check if the found entry (if any) actually has the given key. (In fact the store doesn't even support storing the key.)

    (1) https://github.com/dgraph-io/ristretto/blob/master/z/z.go#L20 (2) https://github.com/dgraph-io/ristretto/blob/master/cache.go#L83

    kind/bug priority/P1 status/accepted 
    opened by cipriancraciun 14
  • MaxCost sometimes ignored

    MaxCost sometimes ignored

    I have written a simple program that Gets and Sets entries in Ristretto. The program sets a MaxCost (in bytes) and passes a value for the cost to Set based on the size in bytes of the value.

    Code can be found here: https://gist.github.com/erikdubbelboer/bd904c51f00079421fd3eb2a061a50c0

    As you can see from the source Ristretto should be limited to 1GB of memory. The program doesn't do anything strange.

    In most cases this program works fine and Ristretto limits the memory usage correctly. But in the case of the exact code as above the memory usage keeps growing.

    Also keep in mind that runtime.GC() is run every second so it's not just uncollected allocations we seeing here. Not running runtime.GC() has the same result just more variation in the amount allocated.

    This is the output when run for a couple of seconds (just go run ristretto-bug.go):

    time: 1s alloc: 1033.72 MiB hits: 43 misses: 2614
    time: 2s alloc: 1035.74 MiB hits: 71 misses: 3230
    time: 3s alloc: 1038.76 MiB hits: 98 misses: 3210
    time: 4s alloc: 1048.77 MiB hits: 87 misses: 3236
    time: 5s alloc: 1065.79 MiB hits: 109 misses: 3229
    ...
    time: 45s alloc: 5893.52 MiB hits: 642 misses: 2773
    time: 46s alloc: 6049.54 MiB hits: 704 misses: 2689
    

    The output of http://localhost:6060/debug/pprof/heap?debug=1 shows that all memory is allocated inside strings.Repeat.

    My guess is that in some cases Ristretto keeps references to memory for entries that aren't part of the cache anymore.

    kind/bug 
    opened by erikdubbelboer 13
  • Port TinyLFU eviction policy from Caffeine

    Port TinyLFU eviction policy from Caffeine

    The implementation included in this change is non-concurrent. It allows injection of admission policies, but provides none yet. Further, storage of key/value is left as a separate exercise; this tracks only whether a key should be cached or evicted. Some small improvements have been made over Caffeine to reduce allocation of list nodes.


    This change is Reviewable

    opened by ckarenz 10
  • Question - Keys added and Keys evicted graphs are similar??

    Question - Keys added and Keys evicted graphs are similar??

    Hey Folks,

    I am testing Ristretto in QA environment. I am still trying to understand few things. Once thing that I noticed, line for Keys Added and Keys Evicted are super close to each other.

    Screen Shot 2020-06-26 at 10 05 49 AM

    I have a feeling that I am doing something wrong here.

    Case size = 10gig

    kind/question 
    opened by varun06 9
  • Improve memory performance

    Improve memory performance

    • Use an int64 instead of a time.Time struct to represent the time.
    • By default, include the cost of the storeItem in the cost calculation.

    Related to DGRAPH-1378


    This change is Reviewable

    opened by martinmr 8
  • New release?

    New release?

    The last release was 112 days ago. Some useful commits have been merged into master since then. Maybe it's time for a new release?

    Changes since the last version:

    • 49dc42c Add Anurag as codeowner (#158) (Anurag)
    • 7a3f2d3 z: use MemHashString and xxhash.Sum64String (#153) (Koichi Shiraishi)
    • 9c31bb2 Check conflict key before updating expiration map. (#154) (Martin Martinez Rivera)
    • 62cb731 Fix race condition in Cache.Clear (#133) (Martin Martinez Rivera)
    • 9af1934 Docs and whitespace changes for readability. (#143) (Martin Martinez Rivera)
    status/accepted 
    opened by erikdubbelboer 8
  • Why is ristretto using so much memory?

    Why is ristretto using so much memory?

    package main
    
    import (
    	"fmt"
    	"math/rand"
    	"strconv"
    
    	"github.com/dgraph-io/ristretto"
    )
    
    const (
    	maxCnt  = 1e7
    	maxLoop = 1e8
    )
    
    func main() {
    	cahchePool, err := ristretto.NewCache(&ristretto.Config{
    		NumCounters: 1 * 1e7,
    		MaxCost:     200 * (1 << 20), // allocate 200M, but running it need 2GB, why?(when i run this program, it kill by OOM)
    		BufferItems: 64,
    		Metrics:     true,
    	})
    
    	if err != nil {
    		panic(err)
    	}
    
    	key := "def48b5abb6388d3cbbb18e550f47b4cYB6eRI3cK1VN2qCEHp8kvuMuH20dq10cYDcG2e"
    	for i := 0; i < maxLoop; i++ {
    		suffix := strconv.Itoa(rand.Intn(maxCnt))
    		cahchePool.Set(key+suffix, suffix, int64(len(key+suffix)))
    		if (i % maxCnt) == 0 {
    			fmt.Println(cahchePool.Metrics)
    		}
    	}
    
    	cnt := 0
    	for i := 0; i < maxCnt; i++ {
    		if _, found := cahchePool.Get(key + strconv.Itoa(i)); found {
    			cnt++
    		}
    	}
    	fmt.Println("cnt:", cnt, "\n", cahchePool.Metrics)
    }
    
    
    investigate priority/P2 status/needs-attention 
    opened by caofengl 8
  • Ristretto causing memory corruption

    Ristretto causing memory corruption

    After adding Ristretto to our micro-service orchestrator written in go, we have been having data corruption issues. The cached data has randomly been inaccurate, forcing me to remove this in-memory cache implementation. Not sure how I would go about logging this information. I can try and take some time over the weekend to create an example repo to exhibit the issue.

    kind/bug 
    opened by apoggi-carecloud 8
  • Add ability to hook into the

    Add ability to hook into the "eviction" event

    Please provide another member onEvict in the Config struct which would be a function that would be called on eviction if it is present. This is needed so that users would be able to do some more, extra house-keeping from their side.

    kind/enhancement 
    opened by GiedriusS 8
  • feat: support msync for windows

    feat: support msync for windows

    Hi, I encountered a windows power loss caused problem while using badger, and see that msync on windows is not supported.

    I add the implementation and test it in a windows10 VM, it just works.

    see: https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-flushviewoffile


    This change is Reviewable

    opened by chux0519 7
  • remove glog dependency

    remove glog dependency

    updated go mod deps

    glog adds global flags so it is not a good library (adding to an existing program causes a panic if a flag name conflicts)


    This change is Reviewable

    opened by jhawk28 3
  • fix: remove polluting google log library

    fix: remove polluting google log library

    For more context please see discussion https://discuss.dgraph.io/t/ristretto-0-1-0-pollutes-the-flags-namespace/14561

    By reverting #263 not only is the number of dependencies this library has greatly reduced, but projects using this library will not unavoidably panic.


    This change is Reviewable

    opened by aeneasr 5
  • Fixes closing of Cache and Policy to be concurrent safe

    Fixes closing of Cache and Policy to be concurrent safe

    Before this change, calling Close on the cache while a Set call is in progress could occasionally result in a panic when the Set method tried to write to the (now closed) setBuf channel. This commit changes Cache and Policy to keep references to the written-to channels in an internals struct, which is nil-ed out on close and checked in each call site, ensuring that nothing is attempting to write to a closed channel.


    This change is Reviewable

    opened by josephschorr 9
Releases(v0.1.0)
  • v0.1.0(Jun 3, 2021)

    0.1.0 - 2021-06-03

    This release contains bug fixes and improvements to Ristretto. It also contains major updates to the z package. The z package contains types such as Tree (B+ tree), Buffer, Mmap file, etc. All these types are used in Badger and Dgraph to improve performance and reduce memory requirements.

    Changed

    • Make item public. Add a new onReject call for rejected items. (#180)

    Added

    • Use z.Buffer backing for B+ tree (#268)
    • expose GetTTL function (#270)
    • docs(README): Ristretto is production-ready. (#267)
    • Add IterateKV (#265)
    • feat(super-flags): Add GetPath method in superflags (#258)
    • add GetDuration to SuperFlag (#248)
    • add Has, GetFloat64, and GetInt64 to SuperFlag (#247)
    • move SuperFlag to Ristretto (#246)
    • add SuperFlagHelp tool to generate flag help text (#251)
    • allow empty defaults in SuperFlag (#254)
    • add mmaped b+ tree (#207)
    • Add API to allow the MaxCost of an existing cache to be updated. (#200)
    • Add OnExit handler which can be used for manual memory management (#183)
    • Add life expectancy histogram (#182)
    • Add mechanism to wait for items to be processed. (#184)

    Fixed

    • change expiration type from int64 to time.Time (#277)
    • fix(buffer): make buffer capacity atleast defaultCapacity (#273)
    • Fixes for z.PersistentTree (#272)
    • Initialize persistent tree correctly (#271)
    • use xxhash v2 (#266)
    • update comments to correctly reflect counter space usage (#189)
    • enable riscv64 builds (#264)
    • Switch from log to glog (#263)
    • Use Fibonacci for latency numbers
    • cache: fix race when clearning a cache (#261)
    • Check for keys without values in superflags (#259)
    • chore(perf): using tags instead of runtime callers to improve the performance of leak detection (#255)
    • fix(Flags): panic on user errors (#256)
    • fix SuperFlagHelp newline (#252)
    • fix(arm): Fix crashing under ARMv6 due to memory mis-alignment (#239)
    • Fix incorrect unit test coverage depiction (#245)
    • chore(histogram): adding percentile in histogram (#241)
    • fix(windows): use filepath instead of path (#244)
    • fix(MmapFile): Close the fd before deleting the file (#242)
    • Fixes CGO_ENABLED=0 compilation error (#240)
    • fix(build): fix build on non-amd64 architectures (#238)
    • fix(b+tree): Do not double the size of btree (#237)
    • fix(jemalloc): Fix the stats of jemalloc (#236)
    • Don't print stuff, only return strings.
    • Bring memclrNoHeapPointers to z (#235)
    • increase number of buffers from 32 to 64 in allocator (#234)
    • Set minSize to 1MB.
    • Opt(btree): Use Go memory instead of mmap files
    • Opt(btree): Lightweight stats calculation
    • Put padding internally to z.Buffer
    • Chore(z): Add SetTmpDir API to set the temp directory (#233)
    • Add a BufferFrom
    • Bring z.Allocator and z.AllocatorPool back
    • Fix(z.Allocator): Make Allocator use Go memory
    • Updated ZeroOut to use a simple for loop. (#231)
    • Add concurrency back
    • Add a test to check concurrency of Allocator.
    • Fix(buffer): Expose padding by z.Buffer's APIs and fix test (#222)
    • AllocateSlice should Truncate if the file is not big enough (#226)
    • Zero out allocations for structs now that we're reusing Allocators.
    • Fix the ristretto substring
    • Deal with nil z.AllocatorPool
    • Create an AllocatorPool class.
    • chore(btree): clean NewTree API (#225)
    • fix(MmapFile): Don't error out if fileSize > sz (#224)
    • feat(btree): allow option to reset btree and mmaping it to specified file. (#223)
    • Use mremap on Linux instead of munmap+mmap (#221)
    • Reuse pages in B+ tree (#220)
    • fix(allocator): make nil allocator return go byte slice (#217)
    • fix(buffer): Make padding internal to z.buffer (#216)
    • chore(buffer): add a parent directory field in z.Buffer (#215)
    • Make Allocator concurrent
    • Fix infinite loop in allocator (#214)
    • Add trim func
    • Use allocator pool. Turn off freelist.
    • Add freelists to Allocator to reuse.
    • make DeleteBelow delete values that are less than lo (#211)
    • Avoid an unnecessary Load procedure in IncrementOffset.
    • Add Stats method in Btree.
    • chore(script): fix local test script (#210)
    • fix(btree): Increase buffer size if needed. (#209)
    • chore(btree): add occupancy ratio, search benchmark and compact bug fix (#208)
    • Add licenses, remove prints, and fix a bug in compact
    • Add IncrementOffset API for z.buffers (#206)
    • Show count when printing histogram (#201)
    • Zbuffer: Add LenNoPadding and make padding 8 bytes (#204)
    • Allocate Go memory in case allocator is nil.
    • Add leak detection via leak build flag and fix a leak during cache.Close.
    • Add some APIs for allocator and buffer
    • Sync before truncation or close.
    • Handle nil MmapFile for Sync.
    • Public methods must not panic after Close() (#202)
    • Check for RD_ONLY correctly.
    • Modify MmapFile APIs
    • Add a bunch of APIs around MmapFile
    • Move APIs for mmapfile creation over to z package.
    • Add ZeroOut func
    • Add SliceOffsets
    • z: Add TotalSize method on bloom filter (#197)
    • Add Msync func
    • Buffer: Use 256 GB mmap size instead of MaxInt64 (#198)
    • Add a simple test to check next2Pow
    • Improve memory performance (#195)
    • Have a way to automatically mmap a growing buffer (#196)
    • Introduce Mmapped buffers and Merge Sort (#194)
    • Add a way to access an allocator via reference.
    • Use jemalloc.a to ensure compilation with the Go binary
    • Fix up a build issue with ReadMemStats
    • Add ReadMemStats function (#193)
    • Allocator helps allocate memory to be used by unsafe structs (#192)
    • Improve histogram output
    • Move Closer from y to z (#191)
    • Add histogram.Mean() method (#188)
    • Introduce Calloc: Manual Memory Management via jemalloc (#186)
    Source code(tar.gz)
    Source code(zip)
  • v0.0.3(Jul 6, 2020)

    Changed

    Added

    Fixed

    • z: use MemHashString and xxhash.Sum64String ([#153][])
    • Check conflict key before updating expiration map. ([#154][])
    • Fix race condition in Cache.Clear ([#133][])
    • Improve handling of updated items ([#168][])
    • Fix droppedSets count while updating the item ([#171][])
    Source code(tar.gz)
    Source code(zip)
  • v0.0.2(Feb 24, 2020)

    Added

    • Sets with TTL. (#122)

    Fixed

    • Fix the way metrics are handled for deletions. (#111)
    • Support nil *Cache values in Clear and Close. (#119)
    • Delete item immediately. (#113)
    • Remove key from policy after TTL eviction. (#130)
    Source code(tar.gz)
    Source code(zip)
Owner
Dgraph
The Only Native GraphQL Database With A Graph Backend.
Dgraph
Ristretto - A high performance memory-bound Go cache

Ristretto Ristretto is a fast, concurrent cache library built with a focus on pe

Outcaste, Inc. 63 Jul 29, 2022
cyhone 112 Aug 2, 2022
Cache library for golang. It supports expirable Cache, LFU, LRU and ARC.

GCache Cache library for golang. It supports expirable Cache, LFU, LRU and ARC. Features Supports expirable Cache, LFU, LRU and ARC. Goroutine safe. S

Jun Kimura 2k Aug 4, 2022
Package cache is a middleware that provides the cache management for Flamego.

cache Package cache is a middleware that provides the cache management for Flamego. Installation The minimum requirement of Go is 1.16. go get github.

Flamego 11 Jul 14, 2022
A mem cache base on other populator cache, add following feacture

memcache a mem cache base on other populator cache, add following feacture add lazy load(using expired data, and load it asynchronous) add singlefligh

zhq 1 Oct 28, 2021
Cache - A simple cache implementation

Cache A simple cache implementation LRU Cache An in memory cache implementation

Stanislav Petrashov 1 Jan 25, 2022
Gin-cache - Gin cache middleware with golang

Gin-cache - Gin cache middleware with golang

Anson 38 Mar 16, 2022
fastcache - fast thread-safe inmemory cache for big number of entries in Go

Fast thread-safe inmemory cache for big number of entries in Go. Minimizes GC overhead

VictoriaMetrics 1.5k Aug 3, 2022
Fast key-value cache written on pure golang

GoCache Simple in-memory key-value cache with default or specific expiration time. Install go get github.com/DylanMrr/GoCache Features Key-value stor

Dmitriy Eginov 1 Nov 16, 2021
groupcache is a caching and cache-filling library, intended as a replacement for memcached in many cases.

groupcache Summary groupcache is a distributed caching and cache-filling library, intended as a replacement for a pool of memcached nodes in many case

Go 11.6k Aug 8, 2022
☔️ A complete Go cache library that brings you multiple ways of managing your caches

Gocache Guess what is Gocache? a Go cache library. This is an extendable cache library that brings you a lot of features for caching data. Overview He

Vincent Composieux 1.4k Aug 4, 2022
An in-memory cache library for golang. It supports multiple eviction policies: LRU, LFU, ARC

GCache Cache library for golang. It supports expirable Cache, LFU, LRU and ARC. Features Supports expirable Cache, LFU, LRU and ARC. Goroutine safe. S

Jun Kimura 318 May 31, 2021
An in-memory key:value store/cache (similar to Memcached) library for Go, suitable for single-machine applications.

go-cache go-cache is an in-memory key:value store/cache similar to memcached that is suitable for applications running on a single machine. Its major

Patrick Mylund Nielsen 6.4k Aug 9, 2022
gdcache is a pure non-intrusive distributed cache library implemented by golang

gdcache is a pure non-intrusive distributed cache library implemented by golang, you can use it to implement your own distributed cache

Jovan 9 Mar 24, 2022
A REST-API service that works as an in memory key-value store with go-minimal-cache library.

A REST-API service that works as an in memory key-value store with go-minimal-cache library.

HarunBuyuktepe 1 Oct 25, 2021
A multi-level cache library with stampede prevention for Go

HybridCache A multi-level cache library with cache stampede prevention for Go import "github.com/cshum/hybridcache" // Redis cache adapter based on R

Adrian Shum 116 May 17, 2022
An in-memory key:value store/cache library written in Go 1.18 generics

go-generics-cache go-generics-cache is an in-memory key:value store/cache that is suitable for applications running on a single machine. This in-memor

Kei Kamikawa 202 Aug 6, 2022
Gocodecache - An in-memory cache library for code value master in Golang

gocodecache An in-memory cache library for code master in Golang. Installation g

Takuo Oki 3 Jun 23, 2022
A zero-dependency cache library for storing data in memory with generics.

Memory Cache A zero-dependency cache library for storing data in memory with generics. Requirements Golang 1.18+ Installation go get -u github.com/rod

Rodrigo Brito 11 May 26, 2022