Adaptive Radix Trees implemented in Go

Overview

An Adaptive Radix Tree Implementation in Go

Build Status Coverage Status Go Report Card GoDoc

This library provides a Go implementation of the Adaptive Radix Tree (ART).

Features:

  • Lookup performance surpasses highly tuned alternatives
  • Support for highly efficient insertions and deletions
  • Space efficient
  • Performance is comparable to hash tables
  • Maintains the data in sorted order, which enables additional operations like range scan and prefix lookup
  • O(k) search/insert/delete operations, where k is the length of the key
  • Minimum / Maximum value lookups
  • Ordered iteration
  • Prefix-based iteration
  • Support for keys with null bytes, any byte array could be a key

Usage

package main

import (
    "fmt"
    "github.com/plar/go-adaptive-radix-tree"
)

func main() {

    tree := art.New()

    tree.Insert(art.Key("Hi, I'm Key"), "Nice to meet you, I'm Value")
    value, found := tree.Search(art.Key("Hi, I'm Key"))
    if found {
        fmt.Printf("Search value=%v\n", value)
    }

    tree.ForEach(func(node art.Node) bool {
        fmt.Printf("Callback value=%v\n", node.Value())
        return true
    })

    for it := tree.Iterator(); it.HasNext(); {
        value, _ := it.Next()
        fmt.Printf("Iterator value=%v\n", value.Value())
    }
}

// Output:
// Search value=Nice to meet you, I'm Value
// Callback value=Nice to meet you, I'm Value
// Iterator value=Nice to meet you, I'm Value

Documentation

Check out the documentation on godoc.org

Performance

plar/go-adaptive-radix-tree outperforms kellydunn/go-art by avoiding memory allocations during search operations. It also provides prefix based iteration over the tree.

Benchmarks were performed on datasets extracted from different projects:

  • The "Words" dataset contains a list of 235,886 english words. [2]
  • The "UUIDs" dataset contains 100,000 uuids. [2]
  • The "HSK Words" dataset contains 4,995 words. [4]
go-adaptive-radix-tree # Average time Bytes per operation Allocs per operation
Tree Insert Words 9 117,888,698 ns/op 37,942,744 B/op 1,214,541 allocs/op
Tree Search Words 26 44,555,608 ns/op 0 B/op 0 allocs/op
Tree Insert UUIDs 18 59,360,135 ns/op 18,375,723 B/op 485,057 allocs/op
Tree Search UUIDs 54 21,265,931 ns/op 0 B/op 0 allocs/op
go-art
Tree Insert Words 5 272,047,975 ns/op 81,628,987 B/op 2,547,316 allocs/op
Tree Search Words 10 129,011,177 ns/op 13,272,278 B/op 1,659,033 allocs/op
Tree Insert UUIDs 10 140,309,246 ns/op 33,678,160 B/op 874,561 allocs/op
Tree Search UUIDs 20 82,120,943 ns/op 3,883,131 B/op 485,391 allocs/op

To see more benchmarks just run

$ make benchmark

References

[1] The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases (Specification)

[2] C99 implementation of the Adaptive Radix Tree

[3] Another Adaptive Radix Tree implementation in Go

[4] HSK Words. HSK(Hanyu Shuiping Kaoshi) - Standardized test of Standard Mandarin Chinese proficiency.

Issues
  • Range Scan?

    Range Scan?

    is there support for doing range scans?

    For example:

    start := []byte("foo_10")
    end := []byte("foo_20")
    trie.ForEachRange(start, end, func(node art.Node) bool {
        // do something with node
        return true
    })
    

    I think only prefix iteration is support currently? Would it be hard to add range iteration?

    opened by prologic 9
  • Optimization: proposal for reducing memory usage

    Optimization: proposal for reducing memory usage

    • as the base part of node, numChildren and prefixLen could be modified to uint8 to save 6 bytes each node except leaf;

    type node struct { numChildren int prefixLen int prefix prefix }

    • as the Kind type, int is also unnecessary, uint8 is enough too, this optimization will save 3 bytes for each art-node type Kind int
    opened by CodingCrush 7
  • Support for deletion of node inside of ForEach

    Support for deletion of node inside of ForEach

    Hello,

    I've been working with @prologic to try to bring bitcask up to version 1.0.

    One of the things we've been struggling with is allowing a user to iterate over all keys in the tree and allow deletion by value, rather than by key (i.e. delete a key from within the ForEach callback function). The current implementation causes the ForEach to skip a node that would have been iterated over at some point after the deleted node (sometimes the next node, sometimes the one after that, etc.)

    Would you consider letting us modify the iterator to pull the "next" value before the callback so that no nodes get skipped? Or would this be better done in a fork?

    Sorry if my ask here is not clear, let me know if you need any further clarification.

    opened by taigrr 6
  • possible to limit this to

    possible to limit this to

    1. possible to make this into an lru cache with fixed size? so that it can automatically remove items based on fifo

    2. how does this compare with patricia trie? https://github.com/tchap/go-patricia any benchmarks on speed and mem storage size used against patricia?

    great work by the way

    opened by hiqsociety 6
  • Feature: Storage persistence

    Feature: Storage persistence

    Would it be possible to use something like flatbuffers or cap'n'proto as backing for the tree which would allow it to be persisted into disk, hence it would be loadable from disk to memory without the need to rebuild it?

    opened by ghost 6
  • Keys don't support null bytes

    Keys don't support null bytes

    Great job on building this library!

    I was looking at the example in dump_tree.go and puzzled by the use of null byte "keys" for leaf nodes (since there is no extra byte in their key to pivot on). I'm not sure how that can result in a correct outcome.

    I may still be missing something but this example shows why I think it's wrong unless it's a conscious decision not to allow null bytes in keys which isn't documented.

    package main
    
    import (
    	"fmt"
    
    	art "github.com/plar/go-adaptive-radix-tree"
    )
    
    func main() {
    	tree := art.New()
    	terms := []string{"A", "a", "aa", "aa\x00"}
    	for _, term := range terms {
    		tree.Insert(art.Key(term), term)
    	}
    	//fmt.Println(tree)
    
    	// Should find "aa"
    	fmt.Println(tree.Search(art.Key("aa")))
    	// Should find "aa\x00"
    	fmt.Println(tree.Search(art.Key("aa\x00")))
    
    	// Expected Output (note the null byte doesn't print):
    	// aa true
    	// aa true
    	//
    	// Actual Output:
    	// aa true
    	// <nil> false
    
    	// Dump output shows both the leaf "aa" stored with "key" of 0 as well as the
    	// child with key 0:
    	// ...
    	//    │   ├── Node4 (0xc00000e2c0)
    	//    │   │   prefix(0): [0 0 0 0 0 0 0 0 0 0] [··········]
    	//    │   │   keys: [0 0 97 0] [··a·]
    	//    │   │   children(3): [0xc00000e280 0xc00000e2b0 0xc00000e2e0 <nil>]
    	//    │   │   ├── Leaf (0xc00000e280)
    	//    │   │   │   key: [97 97] [aa]
    	//    │   │   │   val: aa
    	//    │   │   │
    	//    │   │   ├── Leaf (0xc00000e2b0)
    	//    │   │   │   key: [97 97 0] [aa·]
    	//    │   │   │   val: aa
    	// ...
    }
    

    Perhaps I'm missing something about the design that makes this expected behaviour? If leaves are going to be stored in this way (with an implicit null "key" indicating end of key/leaf). Then it seems to rule out null bytes in keys entirely? That seems like a limitation that's at least worth documenting.

    It seems to stem from charAt returning 0 if the index is out of range of the key: https://github.com/plar/go-adaptive-radix-tree/blob/ae4b22ebc44e591df19fbc197c60b1405d52b526/node.go#L68-L70

    I wonder if this is a relic of the fact this code is based on libart - in C strings are generally null-terminated. I've not tested but it would seem that libart has a bug here too: https://github.com/armon/libart/blob/73c46356e6752cc8b0b774a5d0f4d2a6832a5ed3/src/art.c#L616

    In the case where the key being inserted is a prefix of the existing internal node, this will attempt to create a new leaf and insert it at the key[offset+prefix_diff] index in the internal node. In the case described that it's reading off the end of the array since the leaf key is a prefix of the current node which means either a random char is used from some other bit of memory, a segfault occurs, or (possibly why it's not been noticed) if the string was allocated with a null terminator then the off-by-one read just always returns a null byte which is equivalent behaviour to this Go implementation.

    I don't know how actively this library is maintained and I don't intend to use it directly but was referring to the implementation as I'm writing an immutable ART in Go and was confused by this.

    If the answer is "don't use null byte" that's cool, just wanted to make sure I understood the intention!

    Thanks!

    bug 
    opened by banks 4
  • what does the benchmark really mean?

    what does the benchmark really mean?

    1. insert words are all words? how many total?
    2. search words (26) what does that mean? just search for 1 word or many many words?

    | Tree Insert Words | 9 | 117,888,698 ns/op | 37,942,744 B/op | 1,214,541 allocs/op | | Tree Search Words | 26 | 44,555,608 ns/op | 0 B/op | 0 allocs/op | | Tree Insert UUIDs | 18 | 59,360,135 ns/op | 18,375,723 B/op | 485,057 allocs/op | | Tree Search UUIDs | 54 | 21,265,931 ns/op | 0 B/op | 0 allocs/op |

    opened by hiqsociety 1
  • Wildcard support

    Wildcard support

    ART trees are useful for using with HTTP Routers, but this implementation lacks wildcards.

    http.Router.Register("foo/{bar}/quz", "custom value")
    

    That above should match a path like foo/good-day/quz. I could have made a fork which implements wildcards adding a node exclusively for the wildcards, and adding an member to identify which branch is a wildcard to fallback if none matches, but there is too much verbose code I don't know what the fuck what does what and why.

    opened by DarkGhostHunter 1
  • Possible mistake in

    Possible mistake in "type node48 struct "

    In node.go, will be keys 48 bytes long?

    type node48 struct {
    	node
    	children [node48Max]*artNode
    	// keys     [node256Max]byte
    	keys    [node48Max]byte
    	present [4]uint64 // need 256 bits for keys
    }
    
    invalid 
    opened by candrade2020 1
  • SIMD support?

    SIMD support?

    Does this library use SIMD instructions as per the original ART paper? It seems Go does in fact support SIMD instructions; See for example this package

    opened by prologic 1
  • [Question]: Interpreting the benchmark results

    [Question]: Interpreting the benchmark results

    Quick question; How do you interpret the benchmark results? It's not your typical go test -bench output and the numbers look (at first glance) absurd :)

    opened by prologic 1
Releases(v1.0.4)
Owner
Pavel Larkin
Pavel Larkin
A Go implementation of a radix tree, that uses binary searches to speed up insert, retrieve and delete operations on dense trees

radixs A Go implementation of a radix tree, that uses binary searches to speed up insert, retrieve and delete operations on dense trees. This implemen

Bruno Moura 0 Feb 14, 2022
A generic Go library for implementations of tries (radix trees), state commitments and proofs of inclusion

trie.go Go library for implementations of tries (radix trees), state commitments and proof of inclusion for large data sets. It implements a generic 2

IOTA 4 Apr 27, 2022
A radix sorting library for Go (golang)

zermelo A radix sorting library for Go. Trade memory for speed! import "github.com/shawnsmithdev/zermelo" func foo(large []uint64) zermelo.Sort(l

Shawn Smith 48 Jul 30, 2022
An immutable radix tree implementation in Golang

go-immutable-radix Provides the iradix package that implements an immutable radix tree. The package only provides a single Tree implementation, optimi

HashiCorp 825 Jul 29, 2022
A Left-Leaning Red-Black (LLRB) implementation of balanced binary search trees for Google Go

GoLLRB GoLLRB is a Left-Leaning Red-Black (LLRB) implementation of 2-3 balanced binary search trees in Go Language. Overview As of this writing and to

Petar Maymounkov 727 Jul 24, 2022
A go implementation of Verkle trees

go-verkle A very experimental implementation of Verkle trees. When production-ready, the code is to be split between go-kzg and go-ethereum. Notes Nod

Guillaume Ballet 86 Jul 31, 2022
Generic types that are missing from Go, including sets, trees, sorted lists, etc.

go-typ Generic types that are missing from Go, including sets, trees, sorted lists, etc. All code is implemented with 0 dependencies and in pure Go co

null 14 Jul 26, 2022
High-performance minimalist queue implemented using a stripped-down lock-free ringbuffer, written in Go (golang.org)

This project is no longer maintained - feel free to fork the project! gringo A high-performance minimalist queue implemented using a stripped-down loc

Darren Elwood 127 Jul 18, 2022
fim is a collection of some popular frequent itemset mining algorithms implemented in Go.

fim fim is a collection of some popular frequent itemset mining algorithms implemented in Go. fim contains the implementations of the following algori

Paul Fedorow 7 Jul 14, 2022
Data structure,Algorithms implemented in Go (for education)

Data structure,Algorithms implemented in Go (for education) List of Content : 1. Math - 2. String - 3. Conversions - 4. Sort - 5. Search - 6. Data str

SkShahriarAhmedRaka 4 Jun 10, 2022
radix: a go radix tree with nearest matching

radix implements a radix tree. The package only provides a single Tree implementation, optimized for sparse nodes.

Ahmed W. 6 Feb 6, 2022
Provides the radix package that implements a radix tree.

go-radix Provides the radix package that implements a radix tree. The package only provides a single Tree implementation, optimized for sparse nodes.

null 0 Oct 26, 2021
Golang implementation of Radix trees

go-radix Provides the radix package that implements a radix tree. The package only provides a single Tree implementation, optimized for sparse nodes.

Armon Dadgar 740 Aug 5, 2022
Golang in-memory database built on immutable radix trees

go-memdb Provides the memdb package that implements a simple in-memory database built on immutable radix trees. The database provides Atomicity, Consi

HashiCorp 2.5k Aug 6, 2022
A Go implementation of a radix tree, that uses binary searches to speed up insert, retrieve and delete operations on dense trees

radixs A Go implementation of a radix tree, that uses binary searches to speed up insert, retrieve and delete operations on dense trees. This implemen

Bruno Moura 0 Feb 14, 2022
A generic Go library for implementations of tries (radix trees), state commitments and proofs of inclusion

trie.go Go library for implementations of tries (radix trees), state commitments and proof of inclusion for large data sets. It implements a generic 2

IOTA 4 Apr 27, 2022
A fast string sorting algorithm (MSD radix sort)

Your basic radix sort A fast string sorting algorithm This is an optimized sorting algorithm equivalent to sort.Strings in the Go standard library. Fo

Algorithms to Go 177 Jul 27, 2022
Pure is a fast radix-tree based HTTP router

package pure Pure is a fast radix-tree based HTTP router that sticks to the native implementations of Go's "net/http" package; in essence, keeping the

Go Playgound 128 Aug 8, 2022
A radix sorting library for Go (golang)

zermelo A radix sorting library for Go. Trade memory for speed! import "github.com/shawnsmithdev/zermelo" func foo(large []uint64) zermelo.Sort(l

Shawn Smith 48 Jul 30, 2022
An immutable radix tree implementation in Golang

go-immutable-radix Provides the iradix package that implements an immutable radix tree. The package only provides a single Tree implementation, optimi

HashiCorp 825 Jul 29, 2022
✨ #PTerm is a modern go module to beautify console output. Featuring charts, progressbars, tables, trees, and many more 🚀 It's completely configurable and 100% cross-platform compatible.

?? PTerm | Pretty Terminal Printer A golang module to print pretty text Show Demo Code PTerm.sh | Installation | Documentation | Quick Start | Example

null 2.9k Jul 29, 2022
A Left-Leaning Red-Black (LLRB) implementation of balanced binary search trees for Google Go

GoLLRB GoLLRB is a Left-Leaning Red-Black (LLRB) implementation of 2-3 balanced binary search trees in Go Language. Overview As of this writing and to

Petar Maymounkov 727 Jul 24, 2022
Ensembles of decision trees in go/golang.

CloudForest Google Group Fast, flexible, multi-threaded ensembles of decision trees for machine learning in pure Go (golang). CloudForest allows for a

Ryan Bressler 713 Jul 21, 2022
depth is tool to retrieve and visualize Go source code dependency trees.

depth is tool to retrieve and visualize Go source code dependency trees. Install Download the appropriate binary for your platform from the Rele

Kyle Banks 767 Jul 30, 2022
A go implementation of Verkle trees

go-verkle A very experimental implementation of Verkle trees. When production-ready, the code is to be split between go-kzg and go-ethereum. Notes Nod

Guillaume Ballet 86 Jul 31, 2022
Search for Go code using syntax trees

gogrep GO111MODULE=on go get mvdan.cc/gogrep Search for Go code using syntax trees. Work in progress. gogrep -x 'if $x != nil { return $x, $*_ }' In

Daniel Martí 471 Jul 27, 2022
📸 Clean your image folder using perceptual hashing and BK-trees using Go!

Image Cleaner ?? ?? ➡ ?? This tool can take your image gallery and create a new folder with image-alike-cluster folders. It uses a perceptual image ha

lord_santanna 7 Jun 15, 2022