cuckoo-filter go implement. config by you 布谷鸟过滤器的Go实现,可以定制化过滤器参数

Overview

cuckoo-filter

cuckoo-filter go implement. Config by you

transplant from efficient/cuckoofilter

中文文档

Overview

Cuckoo filter is a Bloom filter replacement for approximated set-membership queries. While Bloom filters are well-known space-efficient data structures to serve queries like "if item x is in a set?", they do not support deletion. Their variances to enable deletion (like counting Bloom filters) usually require much more space.

Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom filters, for applications that require low false positive rates (< 3%).

For details about the algorithm and citations please use:

"Cuckoo Filter: Practically Better Than Bloom" in proceedings of ACM CoNEXT 2014 by Bin Fan, Dave Andersen and Michael Kaminsky

Implementation details

The paper cited above leaves several parameters to choose.

  1. Bucket size(b): Number of fingerprints per bucket
  2. Fingerprints size(f): Fingerprints bits size of hashtag

In other implementation:

In this implementation, you can adjust b and f to any value you want, and the Semi-sorting Buckets mentioned in paper is also avaliable, which can save 1 bit per item.

Why custom is important?

According to paper

  • Different bucket size result in different filter loadfactor, which means occupancy rate of filter
  • Different bucket size is suitable for different target false positive rate
  • To keep a false positive rate, bigger bucket size, bigger fingerprint size

Given a target false positive rate of r

when r > 0.002, having two entries per bucket yields slightly better results than using four entries per bucket; when decreases to 0.00001 < r ≤ 0.002, four entries per bucket minimizes space.

with a bucket size b, they suggest choosing the fingerprint size f using

f >= log2(2b/r) bits

as the same time, notice that we got loadfactor 84%, 95% or 98% when using bucket size b = 2, 4 or 8

To know more about parameter choosing, refer to paper's section 5

Note: generally b = 8 is enough, without more data support, we suggest you choosing b from 2, 4 or 8. And f is max 32 bits

Example usage:

package main

import (
	"fmt"
	"github.com/linvon/cuckoo-filter"
)

func main() {
	cf := cuckoo.NewFilter(4, 9, 3900, cuckoo.TableTypePacked)
	fmt.Println(cf.Info())
	fmt.Println(cf.FalsePositiveRate())

	a := []byte("A")
	cf.Add(a)
	fmt.Println(cf.Contain(a))
	fmt.Println(cf.Size())

	b := cf.Encode()
	ncf, _ := cuckoo.Decode(b)
	fmt.Println(ncf.Contain(a))

	cf.Delete(a)
	fmt.Println(cf.Size())
}
You might also like...
Golang code-generators used to implement Kubernetes-style API types.

code-generator Golang code-generators used to implement Kubernetes-style API types. Purpose These code-generators can be used in the context of Custom

EGo lets you build, debug und run Go apps on Intel SGX - as simple as conventional Go programming!

EGo lets you build, debug und run Go apps on Intel SGX - as simple as conventional Go programming!

Procswap is a simple application that allows you to prioritize processes on a Windows machine.
Procswap is a simple application that allows you to prioritize processes on a Windows machine.

Procswap is a simple application that allows you to prioritize processes on a Windows machine.

go-i18n is a Go package and a command that helps you translate Go programs into multiple languages.

go-i18n is a Go package and a command that helps you translate Go programs into multiple languages.

this is an api that execute your deno code and send you the output

this a simple api that execute your deno code and send you the output, has not limit per request example request: in deno: const rawResponse = await f

Clean-Swift source and test code auto-generator. It can save you time typing 500-600 lines of code.
Clean-Swift source and test code auto-generator. It can save you time typing 500-600 lines of code.

Clean-Swift source & test code auto generator Overview Run Output Basic Usage make config.yaml target_project_name: Miro // target project name copyri

this allows you to get the real link of bit.ly
this allows you to get the real link of bit.ly

check the real url from a url shortener (bit.ly) Also you can use it as an API example with deno const rawResponse = await fetch("https://anti-url-s

With Pasteback you can store common terms and paste them back to clipboard

Pasteback With Pasteback you can store common terms and paste them back to clipboard when needed. There is one central spot to put every string into y

JIN the coolest fighting game ever made that uses the M.U.G.E.N engine so heres how you can build it

JIN the coolest fighting game ever made that uses the M.U.G.E.N engine so heres how you can build it

Comments
  • Introduce DecodeFrom and EncodeReader

    Introduce DecodeFrom and EncodeReader

    The buckets byte slice is the biggest part of the the memory used by the filter, and might be several of GBs.

    Common usage of a filter is in an environment with limited RAM size based on the filter size, load it to memory on startup and dump it to disk on teardown. Currently the Encode and Decode methods duplicates the byte slice, which makes the memory usage at the loading and dumping time to be (at least) twice the filter size.

    This commit introduces a new method for dumping the filter using a reader of the internal byte slice, and a method for loading the filter based on already fetched encoded bytes (from disk, network) and use them internaly instead of making a copy.

    • formatting.
    opened by EladGabay 10
  • 是否有选择配置项的金手指

    是否有选择配置项的金手指

    我通过穷举找出了200w数据性能最优的配置 Size: 200w, TableType: 1, TagsPreBucket: 2, BisPreIterm: 11, Space: 11mb x轴坐标是没一点代表1w个key,y轴坐标代表时间消耗/纳秒 image

    咨询下是否可以根据数据量得出配置,或者配置的范围公式,也有可能是我文档没有看透,希望作者指导一下

    opened by GuoxinL 3
  • readme: Make it clear that b=4 in TableTypePacked

    readme: Make it clear that b=4 in TableTypePacked

    When creating a filter it's not clear that if the type isTableTypePacked the provided tagsPerBucket is ignored and always b=4. Therefore clarify it in the README.

    Should consider making it more clear in the code (e.g. 2 different new filter functions per type with no option to adjust b for "packed" type)

    opened by EladGabay 0
Releases(v0.4.0)
Owner
Linvon
When you get older, your wild heart will live for younger days.
Linvon
How much you spend for glovo. Make config file and launch yourself

how_much_you_spend How much you spend for glovo. Make config file and launch yourself, you are welcome! Put config file in the same folder as executab

null 0 Nov 9, 2021
A tool to filter URLs by parameter count or size

GoFilter A tool to filter URLs by parameter count or size. This tool requires unique sorted URL list. For example: cat hosts.txt | sort -u > sorted &&

Ayberk ESER 7 Sep 10, 2021
Goterators - A util library that Supports aggregate & transforms functions Go. Such as filter, map, reduce, find, exist

Goterators Goterators is util library that Supports aggregate & transforms functions Go, including: for-each find exist reduce filter map API and func

Thuc Le 97 Nov 28, 2022
Slice - provides generic Map, Reduce and Filter functions for Go.

slice slice is a simple Go package to provide generic versions of Map, Reduce and Filter on slices. I mainly wrote it as an exercise to get more famil

Andreas Krennmair 23 Sep 19, 2022
Utility to add network config file in apk

Utility to add network config file in apk. Which bypass the proxy intercept restriction for user installed burpsuit CA certificate.

dwij patel 4 Aug 19, 2022
Envoy utility to process envoy config for fast development and debugging.

envoyconf-tools Envoy is a proxy, really awesome and we are devs who often use it, face errors and struggle to debug it, when envoy config's source is

VOrishirne 3 Oct 31, 2021
Use Golang to implement PHP's common built-in functions.

PHP2Go Use Golang to implement PHP's common built-in functions. About 140+ functions have been implemented. Install go get github.com/syyongx/php2go R

syyong.x 1.5k Nov 25, 2022
golang implement gossip algorithm

golang implement gossip algorithm

Gabriella Munger 4 Dec 1, 2022
Go implement D*Lite with SDL2

dstarlite-gosdl Go implement D*Lite with SDL2 使用Go SDL2库实现的D*Lite算法 注意,尽可能使用 go mod ,自动下载依赖库(确保已经安装了SDL2库) 推荐使用Linux或者macOS运行 git clone https://github

joxrays 2 Apr 2, 2022
An implement of CNI depending on neutron API

An implement of CNI depending on neutron API

null 2 Dec 27, 2021