A Go implementation of an in-memory bloom filter, with support for boltdb and badgerdb as optional data persistent storage.

Overview

Sprout

A bloom filter is a probabilistic data structure that is used to determine if an element is present in a set. Bloom filters are fast and space efficient. They allow for false positives, but mitigate the probability with an expected false positive rate. An error rate of 0.001 implies that the probability of a false positive is 1 in 1000. Bloom filters don't store the elements themselves, but instead use a set of hash functions to determine the presence of an element.

To fulfil the false positive rate, bloom filters can be initialized with a capacity. The capacity is the number of elements that can be inserted into the bloom filter, and this cannot be changed.

Sprout implements a bloom filter in Go. The bits of the filter are stored in a memory-mapped file. Sprout also allows attaching a persistent storage (boltdb and badgerdb) to store the key value pairs.

Sprout also implement a scalable bloom filter described in a paper written by P. Almeida, C.Baquero, N. Preguiça, D. Hutchison.

A scalable bloom filter allows you to grow the filter beyond the initial filter capacity, while preserving the desired false positive rate.

Storage Size

Bloom filters are space efficient, as they only store the bits returned from the hash function. For a filter with a capacity of 2,000,000 and a error rate of 0.001, the storage size is approximately 3.4MB. That implies that there are approximately 1.8 bytes (~14 bits) per element. The number of bits per element is as a result of the number of hash functions, which is derived from the capacity and the error rate.

Scalable Bloom Filters: A scalable bloom filter initialized with a capacity of 200,000 and an error rate of 0.001, when grown to a capacity of 2,000,000, the total storage size is approximately 4.3MB.

Comparison to Key-Value stores

Adding 2 million elements (with a single byte value)

Database Size
BoltDB 108MB
BadgerDB 128MB
Sprout 3.4MB
Sprout (Sbf) 4.3MB

Installation

go get github.com/dsa0x/sprout

Usage

Sprout contains implementation of both the normal and scalable bloom filter via the methods below:

opts := &sprout.BloomOptions{
		Err_rate: 0.001,
		Capacity: 100000,
	}
// Normal Bloom Filter
bf := sprout.NewBloom(opts)

// Scalable Bloom Filter
sbf := sprout.NewScalableBloom(opts)

With a persistent store

Sprout supports boltdb and badgerdb as persistent storage. Using them is very simple. Sprout exposes methods that initializes the database and then they can be attached to the bloom filter.

Using Boltdb

// initialize boltdb
db := sprout.NewBolt("/tmp/test.db", 0600)

// the bolt store can be configured as defined in the boltdb documentations
opts := bbolt.Options{
		Timeout: 10 * time.Second,
	}
db = sprout.NewBolt("/tmp/test.db", 0600, opts)
defer db.Close()

opts := &sprout.BloomOptions{
		Err_rate: 0.01,
		Path:     "bloom.db",
		Capacity: 100,
	}
bf := sprout.NewBloom(opts)

Example

package main

import (
	"fmt"

	"github.com/dgraph-io/badger/v3"
	"github.com/dsa0x/sprout"
)

func main() {

	opts := &sprout.BloomOptions{
		Err_rate: 0.001,
		Path:     "bloom.db",
		Capacity: 100000,
	}
	bf := sprout.NewBloom(opts)
	defer bf.Close()
	bf.Add([]byte("foo"))
	fmt.Println(bf.Contains([]byte("foo")))

	// with a persistent store
	badgerOpts := badger.DefaultOptions("/tmp/store.db")
	db := sprout.NewBadger(badgerOpts)
	opts.Database = db
	bf := sprout.NewScalableBloom(opts)

	bf.Put([]byte("key"), []byte("bar"))
	fmt.Printf("%s\n", bf.Get([]byte("key")))
}

References

  1. P. Almeida, C.Baquero, N. Preguiça, D. Hutchison
  2. Austin Appleby Murmur hash Source Code
Releases(v0.0.3)
Owner
Samsondeen
Samsondeen
A simple Bloom Filter implementation in Go

This is a simple Bloom filter implementation written in Go. For the theory behind Bloom filters, read http://en.wikipedia.org/wiki/Bloom_filter This

Damian Gryski 16 Apr 26, 2018
A bloom filter is a probabilistic data structure that is used to determine if an element is present in a set

A Go implementation of a bloom filter, with support for boltdb and badgerdb as optional in-memory persistent storage.

Samsondeen 26 Jul 4, 2022
Cuckoo Filter: Practically Better Than Bloom

Cuckoo Filter Cuckoo filter is a Bloom filter replacement for approximated set-membership queries. While Bloom filters are well-known space-efficient

Seif Lotfy 917 Aug 1, 2022
go/golang: fast bit set Bloom filter

package implements a fast bloom filter with real 'bitset' and JSONMarshal/JSONUnmarshal to store/reload the Bloom filter.

Andreas Briese 121 Aug 2, 2022
Leaked password check library with bloom filter

Leaked password check With this library you can check the password is probably leaked or not. Pre generated bitset DB includes 6 Million leaked passwo

Kaan Karakaya 41 Dec 2, 2021
A versioned, snapshottable (immutable) AVL+ tree for persistent data.

IAVL+ Tree Note: Requires Go 1.13+ A versioned, snapshottable (immutable) AVL+ tree for persistent data. The purpose of this data structure is to prov

null 0 Nov 24, 2021
Go package implementing Bloom filters

Bloom filters A Bloom filter is a representation of a set of n items, where the main requirement is to make membership queries; i.e., whether an item

Will Fitzgerald 1.6k Aug 6, 2022
Go package implementing Bloom filters

Bloom filters A Bloom filter is a concise/compressed representation of a set, where the main requirement is to make membership queries; i.e., whether

Go language BitSet and Bloom filter 1.6k Aug 1, 2022
Sync distributed sets using bloom filters

goSetReconciliation An implementation to sync distributed sets using bloom filters. Based on the paper "Low complexity set reconciliation using Bloom

Elton SV 25 Jan 4, 2022
A parser generator where rules defined as go structs and code generation is optional

A parser generator where rules defined as go structs and code generation is optional. The concepts are introduced in the simple example below.

Arnaud Delobelle 6 Jul 1, 2022
BTree provides a simple, ordered, in-memory data structure for Go programs.

BTree implementation for Go This package provides an in-memory B-Tree implementation for Go, useful as an ordered, mutable data structure. The API is

Google 2.9k Aug 10, 2022
Fast in-memory key:value store/cache with TTL

MCache library go-mcache - this is a fast key:value storage. Its major advantage is that, being essentially a thread-safe . map[string]interface{} wit

O.J 83 Jul 27, 2022
An in-memory string-interface{} map with various expiration options for golang

TTLCache - an in-memory cache with expiration TTLCache is a simple key/value cache in golang with the following functions: Expiration of items based o

Rene Kroon 469 Aug 7, 2022
Package mafsa implements Minimal Acyclic Finite State Automata in Go, essentially a high-speed, memory-efficient, Unicode-friendly set of strings.

MA-FSA for Go Package mafsa implements Minimal Acyclic Finite State Automata (MA-FSA) with Minimal Perfect Hashing (MPH). Basically, it's a set of str

SmartyStreets (Archives) 291 Jul 2, 2022
An in-memory string-interface{} map with various expiration options for golang

TTLCache - an in-memory cache with expiration TTLCache is a simple key/value cache in golang with the following functions: Expiration of items based o

Rene Kroon 470 Aug 10, 2022
dagger is a fast, concurrency safe, mutable, in-memory directed graph library with zero dependencies

dagger is a blazing fast, concurrency safe, mutable, in-memory directed graph implementation with zero dependencies

Coleman Word 261 Jul 23, 2022
My clean Go solution that's faster than 100%, takes up less memory than 100%.

Remove-element My very clean Go solution that's faster than 100% of all solutions on Leetcode. Leetcode Challenge: "Given an integer array nums and an

Daniel Pickens 0 Dec 24, 2021
A memory-efficient trie for testing the existence/prefixes of string only(for now).

Succinct Trie A memory-efficient trie for testing the existence/prefixes of string only(for now). Install go get -u github.com/nobekanai/sutrie Docume

野辺かない 2 Mar 10, 2022
ZSet is an in-memory Redis like sorted set datastructure

zset Getting Started Installing To start using hash, install Go and run go get: $ go get -u github.com/arriqaaq/zset This will retrieve the library. U

Farhan 5 Jul 19, 2022