BTree provides a simple, ordered, in-memory data structure for Go programs.

Overview
Issues
  • Incorrect deletion result from current B-Tree implementation

    Incorrect deletion result from current B-Tree implementation

    Description

    Deletion process produces unexpected result based on current implementation.

    Steps to reproduce the issue: 1 Prepare the following test case in btree_test.go

    package btree
    
    import (
        "flag"
        "fmt"
        "testing"
        "os"
    )
    
    var btreeDegree = flag.Int("degree", 3, "B-Tree degree")
    
    func TestForBTreeDeletion(t *testing.T) {
        tree := New(*btreeDegree)
        fmt.Println(tree)
    
        for i := 0; i < 21; i++ {
            if x := tree.ReplaceOrInsert(Int(i)); x != nil {
                t.Fatal("insert found item", i)
            }
        }
        fmt.Println("Before deletion")
        tree.Print(os.Stdout)
    
        for i := 0; i < 3; i++ {
            if x := tree.Delete(Int(i)); x == nil {
                t.Fatal("insert didn't find item", i)
            }
            fmt.Println("After deleting ", i)
            tree.Print(os.Stdout)
        }
    
    }
    

    2 Add a tree.Print method for debugging purpose in btree.go

    // Print is for debugging purpose on CLI
    func (t *BTree) Print(w io.Writer) {
        t.root.print(w, 0)
    }
    

    3 Run the test suite

    go test -v
    

    Describe the results you received:

    Before deletion  *[<===== This is the original tree with degree of 3 after inserting 0 to 20]*
    NODE:[8]
      NODE:[2 5]
        NODE:[0 1]
        NODE:[3 4]
        NODE:[6 7]
      NODE:[11 14 17]
        NODE:[9 10]
        NODE:[12 13]
        NODE:[15 16]
        NODE:[18 19 20]
    
    After deleting  0 *[<====== After deleting 0, `NODE[0 1]` underflows and causes redistribution ]*
    NODE:[11]
      NODE:[5 8]
        NODE:[1 2 3 4]     **[<====== This NODE has enough items so that deleting 1 shouldn't cause an underflow]**
        NODE:[6 7]
        NODE:[9 10]
      NODE:[14 17]
        NODE:[12 13]
        NODE:[15 16]
        NODE:[18 19 20]
    
    After deleting  1  [<===== WRONG: The underlying B-tree is redistributed]
    NODE:[5 8 11 14 17]
      NODE:[2 3 4]
      NODE:[6 7]
      NODE:[9 10]
      NODE:[12 13]
      NODE:[15 16]
      NODE:[18 19 20]
    
    After deleting  2
    NODE:[5 8 11 14 17]
      NODE:[3 4]
      NODE:[6 7]
      NODE:[9 10]
      NODE:[12 13]
      NODE:[15 16]
      NODE:[18 19 20]
    

    Describe the results you expected:

    See my comment inline in the above section.

    To double check the behavior, try to run a visualized b-tree insertion from this website. 1 Selecting max degree as 6 2 Manually insert 0 to 20 (inclusively). [Same underlying tree as above] 3 Manually delete 0 [Correct: Underflow causes redistribution] 4 Delete 1 [Correct: No redistribution. Whereas in the issue reported, the tree is redistributed]

    Additional information you deem important (e.g. issue happens only occasionally):

    Output of go version: go version go1.6.2 linux/amd64

    Additional environment details (AWS, VirtualBox, physical, etc.):

    opened by cookieisaac 9
  • backwards iteration

    backwards iteration

    Added DescendRange, DescendLessOrEqual, DescendGreaterThan, and Descend functions. These are modeled after the existing Ascend.. methods.

    Also modified the iteration function to remove the from() and to() in favor of adding a start and stop parameters instead. There is a modest performance ~10% boost by avoiding superfluous function calls.

    before:

    BenchmarkAscend-8          10000        216529 ns/op
    BenchmarkAscendRange-8      3000        433896 ns/op
    

    after:

    BenchmarkAscend-8          10000        196143 ns/op
    BenchmarkAscendRange-8      3000        396469 ns/op
    

    The benchmark code is not included in the source, but I did create a gist at iteration_test.go

    I hope you find this update useful and thanks a ton for such a fantastic library!

    opened by tidwall 6
  • Small patch to allow resetting the btree.

    Small patch to allow resetting the btree.

    I am serializing snapshots of the btree to be restored later. For saving I only need a read lock, for restoring I obtain a write lock on the tree. I was able to amortize all the costs of serialization and cheapen the restore process with a generous free list capacity. For larger restores the process of removing all the items from an existing tree is taxing. You have to visit N nodes to store in a temporary list (I cached this and reset[0:0] each restore), then iterate the temporary list to make N calls to delete.

    I tried some other possibility such as creating an entirely new tree and replace the existing one, but without first deleting all the items in the tree my FreeList is empty so the allocations kill me instead. I could probably work around this with rotating between two trees and deleting in the background, but this would be much simpler if accepted. It provides around 2.5-3x speedup for removing all items from a btree and lowers my latency below the downstream tick rate.

    Small note: I included "Inserts" in the benchmarks because I was running into that issue that pops up sometimes with the benchmark timers where it runs forever on the Reset() test otherwise and I nothing I tried fixed it.

    BenchmarkDeleteAll-24 100 11680090 ns/op 523832 B/op 709 allocs/op BenchmarkReset-24 300 6648212 ns/op 367997 B/op 723 allocs/op

    This patch provides around a 3 time speed up for my restore process

    opened by cstockton 5
  • Added GetAt(i), which returns the ith item.

    Added GetAt(i), which returns the ith item.

    Added a method that returns the ith item in the btree. In order to do this, each node has an idea of the number of items in the tree of which it is the root. I've updated all relevant insert/delete functions to maintain each node's length. The performance is unchanged.

    Looking forward to suggestions that can improve this feature. Cheers.

    opened by darfire 5
  •  Improve the iterate seek to O(logN)

    Improve the iterate seek to O(logN)

    Hi, In our scenario, we frequently use the iterate, and the Less compare is expenses when key is large. This PR mainly improve the iterate location. uses binary search to quickly locates the start item. before:

    BenchmarkSeek-4          3000000               452 ns/op
    BenchmarkSeek-4          3000000               513 ns/op
    BenchmarkSeek-4          3000000               454 ns/op
    BenchmarkSeek-4          3000000               450 ns/op
    

    after:

    BenchmarkSeek-4         10000000               225 ns/op
    BenchmarkSeek-4         10000000               222 ns/op
    BenchmarkSeek-4         10000000               223 ns/op
    BenchmarkSeek-4         10000000               224 ns/op
    
    opened by nolouch 4
  • set removed items to nil

    set removed items to nil

    This fixes an issue where the Go garbage collector does not release some items which have been removed from the BTree because a reference to the item still exists in the items and children slices.

    opened by tidwall 4
  • Expose FreeList

    Expose FreeList

    This change allows multiple btrees to share the same node freelist rather than each btree having its own node freelist.

    We are developing an application that uses 32 GB of memory. Because of the amount of memory that we use we want the memory footprint of our application to change very little. Therefore, we make special effort to pre-allocate what we need at application startup and use free lists to reuse memory rather than allocating memory as we go and relying on the garbage collector to free memory.

    I was delighted to discover that this btree package uses freelists to provide some relief for the GC. However, our application consists of two btrees. As our application runs, we are constantly moving thousands of btree items at a time between the two btrees. At the beginning, one of the btrees has all the items while the second btree has none. Sometime later, the second btree may have all the items while the first btree has none. Sometimes, the items may be split up evenly between the two btrees. The sum of the sizes of the 2 btrees always remains constant.

    If each btree has its own free list, we would still have to allocate large amounts of space off the heap for the second btree as it acquires items for the first time. Moreover, the current freelists are limited in size to 32 btree nodes which is too small for our application.

    By having our two btrees share the same freelist we put even less strain on the GC. As the first btree gives up items, it places nodes on the common freelist. The second btree can reuse the old nodes from the first btree rather than allocating new ones which means fewer heap allocations and less strain on the GC.

    We understand that a separate freelist for each btree suffices for most applications, but in our particular case, we benefit from having our two btrees share the same freelist which is what this pull request allows.

    Thank you in advance for considering this pull request.

    opened by keep94 4
  • Performance improvements to iteration

    Performance improvements to iteration

    Created a few more iterators to be used by the public interface for performance improvements. I did this in two commits, the second one changes the private interface signatures from funcs (technically same signature as ItemIterator) to Item, but since the public interface never actually exposes funcs for ascending I figured this would be okay.

    Before:

    ~/.../github.com/cstockton/gbtree$ go test -test.bench=.*
    1
    PASS
    BenchmarkInsert  3000000           411 ns/op
    BenchmarkDelete  3000000           424 ns/op
    BenchmarkGet     5000000           363 ns/op
    BenchmarkIterateAscend      3000        474049 ns/op
    BenchmarkIterateAscendLessThan      5000        274409 ns/op
    BenchmarkIterateAscendGreaterOrEqual        5000        277208 ns/op
    BenchmarkIterateAscendRange    10000        216873 ns/op
    ok      github.com/cstockton/gbtree 16.429s
    

    After:

    ~/.../github.com/cstockton/btree$ go test -test.bench=.*
    1
    PASS
    BenchmarkInsert  3000000           408 ns/op
    BenchmarkDelete  3000000           422 ns/op
    BenchmarkGet     5000000           360 ns/op
    BenchmarkIterateAscend      3000        431754 ns/op
    BenchmarkIterateAscendLessThan      5000        257723 ns/op
    BenchmarkIterateAscendGreaterOrEqual        5000        257560 ns/op
    BenchmarkIterateAscendRange    10000        205764 ns/op
    ok      github.com/cstockton/btree  15.984s
    
    opened by cstockton 4
  • Bounds issue in insertAt

    Bounds issue in insertAt

    https://github.com/google/btree/blob/be84af90a1f71c9eeac820a4cdacb863122396d6/btree.go#L204

    I'm curious if this bounds issue is important? I'm using this implementation as a learning opportunity for Go and ran into this while studying the code.

    The check whether the index is less than the length seems like it would prevent an unnecessary copy. But instead, this would result in a bounds error when attempting to set the item.

    opened by dynajoe 3
  • add mod file

    add mod file

    • go modules will be enabled in 1.13.
    • after merge it would be nice if you would tag this repo: e.g. v1.0.0
      • this will allow consumers something more consistent and readable than v0.0.0-20181229203832-0af3f3b09a0a.
    opened by arnottcr 3
  • panic: runtime error: index out of range using Ascend iterator

    panic: runtime error: index out of range using Ascend iterator

    I'm hitting a Index error using Ascend to walk the tree.

    panic: runtime error: index out of range
    
    goroutine 20 [running]:
    github.com/google/btree.(*node).iterate(0xc420115d00, 0x1, 0x0, 0x0, 0x0, 0x0, 0x440000, 0xc42008ef00, 0x300000002)
            /media/scratch/demo/go/src/github.com/google/btree/btree.go:524 +0x7df
    github.com/google/btree.(*node).iterate(0xc4201780c0, 0x1, 0x0, 0x0, 0x0, 0x0, 0xc420030000, 0xc42008ef00, 0xc4200b00d8)
            /media/scratch/demo/go/src/github.com/google/btree/btree.go:512 +0x519
    github.com/google/btree.(*node).iterate(0xc4201f4e00, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc42008ef00, 0xc42008ed08)
            /media/scratch/demo/go/src/github.com/google/btree/btree.go:512 +0x519
    github.com/google/btree.(*node).iterate(0xc420382900, 0x1, 0x0, 0x0, 0x0, 0x0, 0xc420080000, 0xc42008ef00, 0xc4200100a0)
            /media/scratch/demo/go/src/github.com/google/btree/btree.go:512 +0x519
    github.com/google/btree.(*node).iterate(0xc4207c14c0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc42008ef00, 0xc420103200)
            /media/scratch/demo/go/src/github.com/google/btree/btree.go:512 +0x519
    github.com/google/btree.(*node).iterate(0xc420d11e40, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc42008ef00, 0x0)
            /media/scratch/demo/go/src/github.com/google/btree/btree.go:512 +0x519
    github.com/google/btree.(*node).iterate(0xc423456040, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc42008ef00, 0x0)
            /media/scratch/demo/go/src/github.com/google/btree/btree.go:512 +0x519
    github.com/google/btree.(*BTree).Ascend(0xc420132020, 0xc42008ef00)
            /media/scratch/demo/go/src/github.com/google/btree/btree.go:777 +0x5b
    main.stage.func1(0xc420018330, 0xc4201280c0, 0x5a6520, 0xc4201180c0, 0xc420128060, 0xc420116100)
            /media/scratch/demo/demo.go:166 +0x28a
    created by main.stage
            /media/scratch/demo/demo.go:130 +0xd3
    

    My stage function is putting O(0.5M) object pointers into the BTree:

    Sketched implementation:

    const (
        BTREE_DEGREE = 7
    )
    
    type myTreeItem struct {
        Object *myObject
        Order int
    }
    
    func (a myTreeItem) Less(b btree.Item) bool {
        return a.Order < b.Order
    }
    
    processChannel := make(chan *myObject, 1000)
    bt := btree.New(BTREE_DEGREE)
    
    // Insert Objects
    order := 0
    for obj := range objects {
        ci := &myTreeItem{
            Object: obj,
            Order: order,
        }
        i := bt.ReplaceOrInsert(ci)
        order += 1
    }
    
    // Traverse
    bt.Ascend(func (i btree.Item) bool {
        processChannel <- i.(*myTreeItem).Object
        return true
    })
    

    Any ideas?

    opened by missionsix 3
  • should call Less() fewer times when iterating

    should call Less() fewer times when iterating

    when calling DescendRange methods, an iteration is performed.

    All items between [lessOrEqual, greaterThan ) are iterated.

    Suppose there're n items in the range, I though Less function is called log(n) times.

    However, Less is called more than n times as a matter of

    if stop != nil && !n.items[i].Less(stop) {
    				return hit, false
    			}
    

    Why make comparisons this way? I though if parent node can decide all its children are in the range, most Less(stop) call can be optimized out.

    Are there anything wrong with this idea?

    opened by caterchong 0
  • Generic approach

    Generic approach

    With support for generics, we can replace interfaces with generic types. The use of generics avoids variables escaping to heap when calling the B-tree functions yielding 20-30% improvement.

    Below are some comparisons between benchmarks (taken from original repository) done with benchstat:

    name                                 old time/op    new time/op    delta
    Insert-8                                121ns ± 1%      89ns ± 1%   -27.04%  (p=0.008 n=5+5)
    Seek-8                                  115ns ± 1%      78ns ± 0%   -31.56%  (p=0.008 n=5+5)
    DeleteInsert-8                          248ns ± 0%     185ns ± 1%   -25.51%  (p=0.008 n=5+5)
    DeleteInsertCloneEachTime-8            1.10µs ± 5%    0.61µs ± 1%   -44.45%  (p=0.008 n=5+5)
    Delete-8                                138ns ± 1%     101ns ± 1%   -26.62%  (p=0.008 n=5+5)
    Get-8                                   102ns ± 1%      71ns ± 0%   -30.46%  (p=0.008 n=5+5)
    Ascend-8                               40.2µs ± 1%    31.7µs ± 0%   -21.18%  (p=0.008 n=5+5)
    Descend-8                              39.3µs ± 1%    30.7µs ± 1%   -21.91%  (p=0.008 n=5+5)
    AscendRange-8                          72.3µs ± 1%    57.6µs ± 1%   -20.39%  (p=0.008 n=5+5)
    DescendRange-8                         92.9µs ± 1%    77.6µs ± 1%   -16.45%  (p=0.008 n=5+5)
    
    name                                 old alloc/op   new alloc/op   delta
    Insert-8                                35.6B ± 4%     18.4B ± 3%   -48.31%  (p=0.008 n=5+5)
    Seek-8                                  7.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
    DeleteInsert-8                          0.00B          0.00B           ~     (all equal)
    DeleteInsertCloneEachTime-8            2.98kB ± 5%    1.91kB ± 1%   -36.16%  (p=0.008 n=5+5)
    Delete-8                                0.00B          0.00B           ~     (all equal)
    Get-8                                   0.00B          0.00B           ~     (all equal)
    Ascend-8                                0.00B          0.00B           ~     (all equal)
    Descend-8                               0.00B          0.00B           ~     (all equal)
    AscendRange-8                           0.00B          0.00B           ~     (all equal)
    DescendRange-8                          0.00B          0.00B           ~     (all equal)
    
    name                                 old allocs/op  new allocs/op  delta
    Insert-8                                 0.00           0.00           ~     (all equal)
    Seek-8                                   0.00           0.00           ~     (all equal)
    DeleteInsert-8                           0.00           0.00           ~     (all equal)
    DeleteInsertCloneEachTime-8              11.0 ± 0%      11.0 ± 0%      ~     (all equal)
    Delete-8                                 0.00           0.00           ~     (all equal)
    Get-8                                    0.00           0.00           ~     (all equal)
    Ascend-8                                 0.00           0.00           ~     (all equal)
    Descend-8                                0.00           0.00           ~     (all equal)
    AscendRange-8                            0.00           0.00           ~     (all equal)
    DescendRange-8                           0.00           0.00           ~     (all equal)
    
    

    In case of plain Insert Benchmark, the results are even better:

    name      old time/op    new time/op    delta
    Insert-8     200ns ± 2%     137ns ± 1%   -31.20%  (p=0.008 n=5+5)
    
    name      old alloc/op   new alloc/op   delta
    Insert-8     60.0B ± 0%     27.0B ± 0%   -55.00%  (p=0.008 n=5+5)
    
    name      old allocs/op  new allocs/op  delta
    Insert-8      1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
    
    func BenchmarkInsert(b *testing.B) {
        b.StopTimer()
        tr := btree.New(32)
        b.StartTimer()
        for i := 0; i < b.N; i++ {
            tr.ReplaceOrInsert(btree.Int(i))
        }
    }
    

    Unfortunately the functions that return nil do not work ex. func (t *BTree) Delete(item Item) Item. We can't simply return zero value for generic type, because it's still a valid value, so I decided to change the API, so that:

    func (t *BTree[T]) Delete(item T) (T, bool) { ... }
    

    we also return bool which indicates whether anything was really deleted. Of course, we can implement B-tree on pointers to generic types, but it harms performance in many cases (like having B-tree for numeric types).

    Changes of API destroy backward compatibility, but because of reasons mentioned above, I believe that this approach is justified.

    Here is link to forked repository with generic B-tree: https://github.com/Michal-Leszczynski/btree

    How do you suggest we further proceed with this effort.

    opened by Michal-Leszczynski 3
  • Added support for Linux on power

    Added support for Linux on power

    Hi, I had added ppc64le(Linux on Power) support on travis-ci in the branch and looks like its been successfully added. I believe it is ready for the final review and merge. The travis ci build logs can be verified from the link below.

    https://travis-ci.com/github/ujjwalsh/btree https://travis-ci.com/github/ujjwalsh/btree/builds/191584661 Please have a look.

    Regards, ujjwal

    opened by ujjwalsh 1
  • Question on iteration

    Question on iteration

    I'm using btree to provide an in-memory key-value store for use as a non-persistent substitute for other key-value databases in a cacheing application. The interface that I need to implement using this package requires that I provide an iterator which exposes Next() and Prev() methods.

    I have done this using the AscendGreaterOrEqual and DescendLessOrEqual methods as follows: I store a pointer to the current item, and use it as a reference to ascend or descend from, when Next() or Prev() are called, respectively.

    While, I believe that internally, iteration within the provided method calls is O(1), in my case, I believe that the lookup of the current item is always O(logN). Since there is not another obvious, O(1) way (to me at least) to restart iteration from a given Item, and move forward or backward one position in the manner I need, and which is common in most key-value store interfaces, I was wondering if the authors thought it might be useful to expose another way to control iteration in a step-by-step manner - O(1) - which keeps track of the current item.

    opened by jvsteiner 8
  • Marshaling btree on disk

    Marshaling btree on disk

    Hey there Google Team! Are there any plans to support the BinaryMarshaler interface? This would be super useful if someone want's to store B-Tree's to disk.

    opened by Fohlen 0
Releases(v1.36.0)
Owner
Google
Google ❤️ Open Source
Google
Recursively searches a map[string]interface{} structure for another map[string]interface{} structure

msirecurse Recursively searches a map[string]interface{} structure for existence of a map[string]interface{} structure Motivation I wrote this package

Fred Moyer 1 Mar 3, 2022
Go package implementing an indexable ordered multimap

PACKAGE package skiplist import "github.com/glenn-brown/skiplist" Package skiplist implements fast indexable ordered multimaps. This sk

Glenn Brown 24 May 14, 2021
A simple Set data structure implementation in Go (Golang) using LinkedHashMap.

Set Set is a simple Set data structure implementation in Go (Golang) using LinkedHashMap. This library allow you to get a set of int64 or string witho

Studio Sol Comunicação Digital Ltda 21 Apr 6, 2022
Bitset data structure

Your basic bit Set data structure for positive numbers A bit array, or bit set, is an efficient set data structure. It consists of an array that compa

Algorithms to Go 120 May 11, 2022
Probabilistic set data structure

Your basic Bloom filter Golang probabilistic set data structure A Bloom filter is a fast and space-efficient probabilistic data structure used to test

Algorithms to Go 71 May 11, 2022
Data structure and algorithm library for go, designed to provide functions similar to C++ STL

GoSTL English | 简体中文 Introduction GoSTL is a data structure and algorithm library for go, designed to provide functions similar to C++ STL, but more p

stirlingx 651 May 14, 2022
Data structure and relevant algorithms for extremely fast prefix/fuzzy string searching.

Trie Data structure and relevant algorithms for extremely fast prefix/fuzzy string searching. Usage Create a Trie with: t := trie.New() Add Keys with:

Derek Parker 584 May 10, 2022
Set data structure for Go

Archived project. No maintenance. This project is not maintained anymore and is archived.. Please create your own map[string]Type or use one of the ot

Fatih Arslan 645 May 6, 2022
Set data structure for Go

Archived project. No maintenance. This project is not maintained anymore and is archived.. Please create your own map[string]Type or use one of the ot

Fatih Arslan 645 May 6, 2022
Generates data structure definitions from JSON files for any kind of programming language

Overview Archivist generates data structure definitions from JSON files for any kind of programming language. It also provides a library for golang to

Kingsgroup 44 May 4, 2022
A data structure for storing points.

ptree This package provides an in-memory data structure for storing points. Under the hood it stores points in a tree structure where nodes are spatia

Josh 17 Apr 18, 2022
Data Structure Libraries and Algorithms implementation

Algorithms Data Structure Libraries and Algorithms implementation in C++ Disclaimer This repository is meant to be used as a reference to learn data s

Priyank Chheda 635 May 5, 2022
Trie data structure implementation in Golang 🌳

Gotri Gotri is an Unicode character based Trie/prefix tree implementation in Go, with the suggestion/auto-complete feature for character searching. Si

Monir Zaman 9 Mar 3, 2022
Disjoint Set data structure implementation in Go

dsu Implementation of the Disjoint-Set data structure. The Disjoint-Set, Also called a Union-Find or Merge-Find set, is a data structure that stores a

Iheb Haboubi 6 Jan 29, 2022
Data structure,Algorithms implemented in Go (for education)

Data structure,Algorithms implemented in Go (for education) List of Content : 1. Math - 2. String - 3. Conversions - 4. Sort - 5. Search - 6. Data str

SkShahriarAhmedRaka 4 Feb 5, 2022
Go implementation of the van Emde Boas tree data structure: Priority queue for positive whole numbers in O(log log u) time.

vEB Go implementation of the van Emde Boas tree data structure: Priority queue for positive whole numbers in O(log log u) time. Supports the following

null 4 Mar 7, 2022
Implementation of the MaxStack Data Structure in Go

MaxStack-Golang Implementation of the MaxStack Data Structure in Go This repository contains the design of a max stack data structure that supports th

Girish P Mallya 0 Nov 10, 2021
publish a tree-like data structure from a backend to a front-end

tree-publish publish a tree-like data structure from a backend to a front-end. It needs to be a tree in order to publish the data as JSON document. If

Lutz Behnke 0 Dec 20, 2021
A bloom filter is a probabilistic data structure that is used to determine if an element is present in a set

A Go implementation of a bloom filter, with support for boltdb and badgerdb as optional in-memory persistent storage.

Samsondeen 24 Feb 23, 2022