Roaring bitmaps in Go (golang)

Overview

roaring Build Status GoDoc GoDoc Go Report Card Build Status Go-CI Go-ARM-CI Go-Windows-CI

This is a go version of the Roaring bitmap data structure.

Roaring bitmaps are used by several major systems such as Apache Lucene and derivative systems such as Solr and Elasticsearch, Apache Druid (Incubating), LinkedIn Pinot, Netflix Atlas, Apache Spark, OpenSearchServer, Cloud Torrent, Whoosh, Pilosa, Microsoft Visual Studio Team Services (VSTS), and eBay's Apache Kylin. The YouTube SQL Engine, Google Procella, uses Roaring bitmaps for indexing.

Roaring bitmaps are found to work well in many important applications:

Use Roaring for bitmap compression whenever possible. Do not use other bitmap compression methods (Wang et al., SIGMOD 2017)

The roaring Go library is used by

This library is used in production in several systems, it is part of the Awesome Go collection.

There are also Java and C/C++ versions. The Java, C, C++ and Go version are binary compatible: e.g, you can save bitmaps from a Java program and load them back in Go, and vice versa. We have a format specification.

This code is licensed under Apache License, Version 2.0 (ASL2.0).

Copyright 2016-... by the authors.

When should you use a bitmap?

Sets are a fundamental abstraction in software. They can be implemented in various ways, as hash sets, as trees, and so forth. In databases and search engines, sets are often an integral part of indexes. For example, we may need to maintain a set of all documents or rows (represented by numerical identifier) that satisfy some property. Besides adding or removing elements from the set, we need fast functions to compute the intersection, the union, the difference between sets, and so on.

To implement a set of integers, a particularly appealing strategy is the bitmap (also called bitset or bit vector). Using n bits, we can represent any set made of the integers from the range [0,n): the ith bit is set to one if integer i is present in the set. Commodity processors use words of W=32 or W=64 bits. By combining many such words, we can support large values of n. Intersections, unions and differences can then be implemented as bitwise AND, OR and ANDNOT operations. More complicated set functions can also be implemented as bitwise operations.

When the bitset approach is applicable, it can be orders of magnitude faster than other possible implementation of a set (e.g., as a hash set) while using several times less memory.

However, a bitset, even a compressed one is not always applicable. For example, if the you have 1000 random-looking integers, then a simple array might be the best representation. We refer to this case as the "sparse" scenario.

When should you use compressed bitmaps?

An uncompressed BitSet can use a lot of memory. For example, if you take a BitSet and set the bit at position 1,000,000 to true and you have just over 100kB. That is over 100kB to store the position of one bit. This is wasteful even if you do not care about memory: suppose that you need to compute the intersection between this BitSet and another one that has a bit at position 1,000,001 to true, then you need to go through all these zeroes, whether you like it or not. That can become very wasteful.

This being said, there are definitively cases where attempting to use compressed bitmaps is wasteful. For example, if you have a small universe size. E.g., your bitmaps represent sets of integers from [0,n) where n is small (e.g., n=64 or n=128). If you are able to uncompressed BitSet and it does not blow up your memory usage, then compressed bitmaps are probably not useful to you. In fact, if you do not need compression, then a BitSet offers remarkable speed.

The sparse scenario is another use case where compressed bitmaps should not be used. Keep in mind that random-looking data is usually not compressible. E.g., if you have a small set of 32-bit random integers, it is not mathematically possible to use far less than 32 bits per integer, and attempts at compression can be counterproductive.

How does Roaring compares with the alternatives?

Most alternatives to Roaring are part of a larger family of compressed bitmaps that are run-length-encoded bitmaps. They identify long runs of 1s or 0s and they represent them with a marker word. If you have a local mix of 1s and 0, you use an uncompressed word.

There are many formats in this family:

  • Oracle's BBC is an obsolete format at this point: though it may provide good compression, it is likely much slower than more recent alternatives due to excessive branching.
  • WAH is a patented variation on BBC that provides better performance.
  • Concise is a variation on the patented WAH. It some specific instances, it can compress much better than WAH (up to 2x better), but it is generally slower.
  • EWAH is both free of patent, and it is faster than all the above. On the downside, it does not compress quite as well. It is faster because it allows some form of "skipping" over uncompressed words. So though none of these formats are great at random access, EWAH is better than the alternatives.

There is a big problem with these formats however that can hurt you badly in some cases: there is no random access. If you want to check whether a given value is present in the set, you have to start from the beginning and "uncompress" the whole thing. This means that if you want to intersect a big set with a large set, you still have to uncompress the whole big set in the worst case...

Roaring solves this problem. It works in the following manner. It divides the data into chunks of 216 integers (e.g., [0, 216), [216, 2 x 216), ...). Within a chunk, it can use an uncompressed bitmap, a simple list of integers, or a list of runs. Whatever format it uses, they all allow you to check for the present of any one value quickly (e.g., with a binary search). The net result is that Roaring can compute many operations much faster than run-length-encoded formats like WAH, EWAH, Concise... Maybe surprisingly, Roaring also generally offers better compression ratios.

References

  • Daniel Lemire, Owen Kaser, Nathan Kurz, Luca Deri, Chris O'Hara, François Saint-Jacques, Gregory Ssi-Yan-Kai, Roaring Bitmaps: Implementation of an Optimized Software Library, Software: Practice and Experience 48 (4), 2018 arXiv:1709.07821
  • Samy Chambi, Daniel Lemire, Owen Kaser, Robert Godin, Better bitmap performance with Roaring bitmaps, Software: Practice and Experience 46 (5), 2016. http://arxiv.org/abs/1402.6407 This paper used data from http://lemire.me/data/realroaring2014.html
  • Daniel Lemire, Gregory Ssi-Yan-Kai, Owen Kaser, Consistently faster and smaller compressed bitmaps with Roaring, Software: Practice and Experience 46 (11), 2016. http://arxiv.org/abs/1603.06549

Dependencies

Dependencies are fetched automatically by giving the -t flag to go get.

they include

  • github.com/willf/bitset
  • github.com/mschoch/smat
  • github.com/glycerine/go-unsnap-stream
  • github.com/philhofer/fwd
  • github.com/jtolds/gls

Note that the smat library requires Go 1.6 or better.

Installation

  • go get -t github.com/RoaringBitmap/roaring

Example

Here is a simplified but complete example:

package main

import (
    "fmt"
    "github.com/RoaringBitmap/roaring"
    "bytes"
)


func main() {
    // example inspired by https://github.com/fzandona/goroar
    fmt.Println("==roaring==")
    rb1 := roaring.BitmapOf(1, 2, 3, 4, 5, 100, 1000)
    fmt.Println(rb1.String())

    rb2 := roaring.BitmapOf(3, 4, 1000)
    fmt.Println(rb2.String())

    rb3 := roaring.New()
    fmt.Println(rb3.String())

    fmt.Println("Cardinality: ", rb1.GetCardinality())

    fmt.Println("Contains 3? ", rb1.Contains(3))

    rb1.And(rb2)

    rb3.Add(1)
    rb3.Add(5)

    rb3.Or(rb1)

    // computes union of the three bitmaps in parallel using 4 workers  
    roaring.ParOr(4, rb1, rb2, rb3)
    // computes intersection of the three bitmaps in parallel using 4 workers  
    roaring.ParAnd(4, rb1, rb2, rb3)


    // prints 1, 3, 4, 5, 1000
    i := rb3.Iterator()
    for i.HasNext() {
        fmt.Println(i.Next())
    }
    fmt.Println()

    // next we include an example of serialization
    buf := new(bytes.Buffer)
    rb1.WriteTo(buf) // we omit error handling
    newrb:= roaring.New()
    newrb.ReadFrom(buf)
    if rb1.Equals(newrb) {
    	fmt.Println("I wrote the content to a byte stream and read it back.")
    }
    // you can iterate over bitmaps using ReverseIterator(), Iterator, ManyIterator()
}

If you wish to use serialization and handle errors, you might want to consider the following sample of code:

	rb := BitmapOf(1, 2, 3, 4, 5, 100, 1000)
	buf := new(bytes.Buffer)
	size,err:=rb.WriteTo(buf)
	if err != nil {
		t.Errorf("Failed writing")
	}
	newrb:= New()
	size,err=newrb.ReadFrom(buf)
	if err != nil {
		t.Errorf("Failed reading")
	}
	if ! rb.Equals(newrb) {
		t.Errorf("Cannot retrieve serialized version")
	}

Given N integers in [0,x), then the serialized size in bytes of a Roaring bitmap should never exceed this bound:

8 + 9 * ((long)x+65535)/65536 + 2 * N

That is, given a fixed overhead for the universe size (x), Roaring bitmaps never use more than 2 bytes per integer. You can call BoundSerializedSizeInBytes for a more precise estimate.

64-bit Roaring

By default, roaring is used to stored unsigned 32-bit integers. However, we also offer an extension dedicated to 64-bit integers. It supports roughly the same functions:

package main

import (
    "fmt"
    "github.com/RoaringBitmap/roaring/roaring64"
    "bytes"
)


func main() {
    // example inspired by https://github.com/fzandona/goroar
    fmt.Println("==roaring64==")
    rb1 := roaring64.BitmapOf(1, 2, 3, 4, 5, 100, 1000)
    fmt.Println(rb1.String())

    rb2 := roaring64.BitmapOf(3, 4, 1000)
    fmt.Println(rb2.String())

    rb3 := roaring64.New()
    fmt.Println(rb3.String())

    fmt.Println("Cardinality: ", rb1.GetCardinality())

    fmt.Println("Contains 3? ", rb1.Contains(3))

    rb1.And(rb2)

    rb3.Add(1)
    rb3.Add(5)

    rb3.Or(rb1)



    // prints 1, 3, 4, 5, 1000
    i := rb3.Iterator()
    for i.HasNext() {
        fmt.Println(i.Next())
    }
    fmt.Println()

    // next we include an example of serialization
    buf := new(bytes.Buffer)
    rb1.WriteTo(buf) // we omit error handling
    newrb:= roaring64.New()
    newrb.ReadFrom(buf)
    if rb1.Equals(newrb) {
    	fmt.Println("I wrote the content to a byte stream and read it back.")
    }
    // you can iterate over bitmaps using ReverseIterator(), Iterator, ManyIterator()
}

Only the 32-bit roaring format is standard and cross-operable between Java, C++, C and Go. There is no guarantee that the 64-bit versions are compatible.

Documentation

Current documentation is available at http://godoc.org/github.com/RoaringBitmap/roaring and http://godoc.org/github.com/RoaringBitmap/roaring64

Goroutine safety

In general, it should not generally be considered safe to access the same bitmaps using different goroutines--they are left unsynchronized for performance. Should you want to access a Bitmap from more than one goroutine, you should provide synchronization. Typically this is done by using channels to pass the *Bitmap around (in Go style; so there is only ever one owner), or by using sync.Mutex to serialize operations on Bitmaps.

Coverage

We test our software. For a report on our test coverage, see

https://coveralls.io/github/RoaringBitmap/roaring?branch=master

Benchmark

Type

     go test -bench Benchmark -run -

To run benchmarks on Real Roaring Datasets run the following:

go get github.com/RoaringBitmap/real-roaring-datasets
BENCH_REAL_DATA=1 go test -bench BenchmarkRealData -run -

Iterative use

You can use roaring with gore:

  • go get -u github.com/motemen/gore
  • Make sure that $GOPATH/bin is in your $PATH.
  • go get github.com/RoaringBitmap/roaring
$ gore
gore version 0.2.6  :help for help
gore> :import github.com/RoaringBitmap/roaring
gore> x:=roaring.New()
gore> x.Add(1)
gore> x.String()
"{1}"

Fuzzy testing

You can help us test further the library with fuzzy testing:

     go get github.com/dvyukov/go-fuzz/go-fuzz
     go get github.com/dvyukov/go-fuzz/go-fuzz-build
     go test -tags=gofuzz -run=TestGenerateSmatCorpus
     go-fuzz-build github.com/RoaringBitmap/roaring
     go-fuzz -bin=./roaring-fuzz.zip -workdir=workdir/ -timeout=200

Let it run, and if the # of crashers is > 0, check out the reports in the workdir where you should be able to find the panic goroutine stack traces.

Alternative in Go

There is a Go version wrapping the C/C++ implementation https://github.com/RoaringBitmap/gocroaring

For an alternative implementation in Go, see https://github.com/fzandona/goroar The two versions were written independently.

Mailing list/discussion group

https://groups.google.com/forum/#!forum/roaring-bitmaps

Issues
  • RunContainer32 implements a run-length

    RunContainer32 implements a run-length

    (interval based) uint32 set. Intersect and Union are complete and tested.

    There is more to do to integrate with roaring.Bitmap, but this is a nice chunk of functionality. A good start.

    enhancement help wanted 
    opened by glycerine 88
  • Add allocation-free iterators

    Add allocation-free iterators

    Currently, iterators are implemented by instantiating sub-iterators that run over the containers. This approach requires us to create many more temporary objects than necessary.

    A much simpler approach is the one taken by the CRoaring library where no memory allocation is done... https://github.com/RoaringBitmap/CRoaring/blob/master/src/roaring.c#L1156

    This should reduce the amount of code and reduce the pressure on the garbage collector, which should improve performance in real-life systems.

    performance 
    opened by lemire 45
  • Implement hashing (#198)

    Implement hashing (#198)

    This is a proof of concept for hashing roaring bitmaps.
    It works by writing to a customizable hash.Hash, for each key, a sequence composed of the given key as an LE 16 bit number followed by the values represented by its matching container in the same format, followed by a 0x0000 delimiter.
    Since no empty containers are kept in the bitmap there's no conflict with a container starting with the 0 value, because a separator following a key is simply impossible.
    Because we store the key and the the values following it is impossible for two containers corresponding to different sets but with the same lower parts to end up passing the same input for the hasher, so any collisions that could happen are caused by the hashing algorithm itself, rather than by two different sets creating the same data.

    opened by Oppen 44
  • Add 64 bit implementation of roaring.Bitmap

    Add 64 bit implementation of roaring.Bitmap

    Fixes #136

    This PR is very much a work-in-progress. I've tried to stay with the original 32 bit method signatures as much as possible. Serialization is explicitly not part of this PR.

    • [x] fix failing tests
    • [x] augment tests with 64 bit values
    • [x] verify copy-on-write
    opened by jacobmarble 33
  • some perf tuning

    some perf tuning

    faster newRunContainer32FromBitmapContainer; countTrailingZeros is done in assembly, replacing numberOfTrailingZeros.

    there's more too do, but this is worthwhile to merge; it speeds up the newRunContainer constructor by 4x.

    opened by glycerine 28
  • Unification of bitmap unserialization by using auxiliary byteInput approach

    Unification of bitmap unserialization by using auxiliary byteInput approach

    This pull request is not aimed to be merged, it should be treated as a proposal for enhancement.

    Currently, bitmap package has two cases that I would like to consider:

    • Methods readFrom/fromBuffer are responsible for restoring bitmap's state from io.Reader and byte slice. They have pretty the same code base and each of them has own cons.

    • fromBuffer makes a user keep (to hold reference) the provided byte slice and also to leave it constant.

    • readFrom loses the performance due to the necessity to create objects from the read byte data.

    This pull request tries to solve these issues. For unification was implemented the auxiliary class byteInput for io utils. Also, in roaringArray was added special field sentry that holds the references of allocated/type converted slices (here I am not sure about this approach, in theory, it should work and all we need to ask a user to keep the provided buf immutable).

    opened by alldroll 27
  • Implement run containers

    Implement run containers

    For better compression when there are long runs of consecutive values, the Java version has run containers (as of version 0.5). They should be implemented :

    https://github.com/lemire/RoaringBitmap/blob/master/src/main/java/org/roaringbitmap/RunContainer.java

    These can be created by the "runOptimize" function...

    https://github.com/lemire/RoaringBitmap/blob/master/src/main/java/org/roaringbitmap/RoaringBitmap.java#L1286

    These containers can improve the compression ratio in some cases.

    enhancement help wanted Priority 
    opened by lemire 24
  • Faster ParOr

    Faster ParOr

    This commit introduces a faster ParOr. The algorithm is described informally in #140.

    Benchmark data shows significant improvement compared to previous algorithm.

    Benchmarks run on 15' 2017 MacBook Pro Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz

    benchmark                                           old ns/op     new ns/op     delta
    BenchmarkRealDataParOr/census-income_srt-8          269673        194486        -27.88%
    BenchmarkRealDataParOr/census-income-8              395797        310505        -21.55%
    BenchmarkRealDataParOr/census1881_srt-8             723821        402359        -44.41%
    BenchmarkRealDataParOr/census1881-8                 718247        573550        -20.15%
    BenchmarkRealDataParOr/dimension_003-8              7466247       2197374       -70.57%
    BenchmarkRealDataParOr/dimension_008-8              3062717       833368        -72.79%
    BenchmarkRealDataParOr/dimension_033-8              474489        302080        -36.34%
    BenchmarkRealDataParOr/uscensus2000-8               1912128       1092854       -42.85%
    BenchmarkRealDataParOr/weather_sept_85_srt-8        370585        178577        -51.81%
    BenchmarkRealDataParOr/weather_sept_85-8            1272372       1142709       -10.19%
    BenchmarkRealDataParOr/wikileaks-noquotes_srt-8     290843        123167        -57.65%
    BenchmarkRealDataParOr/wikileaks-noquotes-8         330745        152330        -53.94%
    

    The previous algorithm is exposed in the public API as ParHeapOr somehow resembling the convention in fastaggregation.go.

    Resolves #140

    It's the second revision of the pull request. It fixed bugs found in #164

    I've compared the cardinalities on /Real data/ dataset https://gist.github.com/maciej/add052d10c5edad2eb74eceda7484f07 semin-manually. They are the same, so I'm leaving #169 for later

    opened by maciej 22
  • Support CRoaring's frozen format for interoperability

    Support CRoaring's frozen format for interoperability

    Doing this would mean the pure Go implementation can be used to create bitmaps to later be used by a more restricted environment, such as one using Python or C, while still being able to memory map the contents. Using the pure Go version introduces several advantages versus using GoCRoaring, such as better memory accounting and parallel operations, as well as the ability to use scratch-images when deploying in Docker. Is there any interest in this functionality? I could try to implement it, I just want to know if it'll be rejected in principle.

    Note this isn't about performance and there will be a penalty; there would be a comment in the documentation noting that this should only be used for interoperability, never where performance matters.

    opened by Oppen 18
  • runContainer16/32 and msgpack2/snappy serialization

    runContainer16/32 and msgpack2/snappy serialization

    https://github.com/RoaringBitmap/roaring/pull/70 had the problem that the tests wouldn't all pass because the serialization wasn't done for runContainer16. In this serialization_msgp branch I finished it off using the serialization methods (msgpack2 for serialization and snappy for compression) discussed in https://github.com/RoaringBitmap/roaring/issues/76.

    Fixes #37 Fixes #76

    All tests are green. From the checklist on #70:

    • [x] serialization of Bitmaps including the runContainer is done in this PR.
    • [x] rangeOfOnes should generate a run container (Done: https://github.com/glycerine/roaring/pull/1)
    • [x] Implement RunOptimize
    • [x] Add isFull method; and
    • [ ] convert full bitmaps to run containers when they are full
    • [ ] Upgrade unit tests so that they call RunOptimize, including the smat tests
    • [ ] Make sure we pass the updated unit tests

    these routines still need tests for the runContainer implementation

    • [ ] fillLeastSignificant16bits()
    • [ ] getShortIterator()
    • [ ] getSizeInBytes()
    • [ ] iaddRange()
    • [ ] iremoveRange()

    I open this as a distinct branch in case nobody wants the msgpack2/snappy serialization.

    Of course I think it's great. But if there are back-wards compatibility concerns then it may be a non-starter.

    opened by glycerine 18
  • Use Go 1.9 math/bits package

    Use Go 1.9 math/bits package

    This is a quick PR replacing bodies of popcount and countTrailingZeros functions to use Go 1.9 math/bits. Approaches fixing #101.

    @lemire are there any other functions you had in mind to replace?

    I'll post results of running go test -bench Benchmark -run - on relevant (of the Roaring functions that use the popcnt and countTrailingZeros) benchmarks from the current master and the PR branch soon. Do you think it's sufficient or should I write some specific benchmarks?

    I have not inlined the popcount and countTrailingZeros functions themselves.

    popcount gets inlined in popcntSlice:

    TEXT %22%22.popcntSlice(SB) gofile../Users/maciejb/gowork/src/github.com/RoaringBitmap/roaring/popcnt.go
      popcnt.go:10		0x4b630			65488b0c2500000000	MOVQ GS:0, CX			[5:9]R_TLS_LE
      popcnt.go:10		0x4b639			483b6110		CMPQ 0x10(CX), SP		
      popcnt.go:10		0x4b63d			0f8685000000		JBE 0x4b6c8			
      popcnt.go:10		0x4b643			4883ec30		SUBQ $0x30, SP			
      popcnt.go:10		0x4b647			48896c2428		MOVQ BP, 0x28(SP)		
      popcnt.go:10		0x4b64c			488d6c2428		LEAQ 0x28(SP), BP		
      popcnt.go:10		0x4b651			31c0			XORL AX, AX			
      popcnt.go:10		0x4b653			488b4c2438		MOVQ 0x38(SP), CX		
      popcnt.go:10		0x4b658			31d2			XORL DX, DX			
      popcnt.go:12		0x4b65a			eb0a			JMP 0x4b666			
      popcnt.go:12		0x4b65c			4883c108		ADDQ $0x8, CX			
      popcnt.go:12		0x4b660			48ffc0			INCQ AX				
      popcnt.go:13		0x4b663			4801f2			ADDQ SI, DX			
      popcnt.go:12		0x4b666			488b5c2440		MOVQ 0x40(SP), BX		
      popcnt.go:12		0x4b66b			4839d8			CMPQ BX, AX			
      popcnt.go:12		0x4b66e			7d49			JGE 0x4b6b9			
      popcnt.go:12		0x4b670			488b31			MOVQ 0(CX), SI			
      popcnt.go:7		0x4b673			0fb63d00000000		MOVZX 0(IP), DI			[3:7]R_PCREL:runtime.support_popcnt
      popcnt.go:7		0x4b67a			4084ff			TESTL DI, DI			
      popcnt.go:7		0x4b67d			7407			JE 0x4b686			
      popcnt.go:7		0x4b67f			f3480fb8f6		POPCNT SI, SI			
      popcnt.go:7		0x4b684			ebd6			JMP 0x4b65c			
      popcnt.go:7		0x4b686			4889442418		MOVQ AX, 0x18(SP)		
      popcnt.go:7		0x4b68b			4889542410		MOVQ DX, 0x10(SP)		
      popcnt.go:7		0x4b690			48894c2420		MOVQ CX, 0x20(SP)		
      popcnt.go:7		0x4b695			48893424		MOVQ SI, 0(SP)			
      popcnt.go:7		0x4b699			e800000000		CALL 0x4b69e			[1:5]R_CALL:math/bits.OnesCount64
      popcnt.go:7		0x4b69e			488b742408		MOVQ 0x8(SP), SI		
      popcnt.go:7		0x4b6a3			488b442418		MOVQ 0x18(SP), AX		
      popcnt.go:7		0x4b6a8			488b4c2420		MOVQ 0x20(SP), CX		
      popcnt.go:7		0x4b6ad			488b542410		MOVQ 0x10(SP), DX		
      popcnt.go:7		0x4b6b2			488b5c2440		MOVQ 0x40(SP), BX		
      popcnt.go:7		0x4b6b7			eba3			JMP 0x4b65c			
      popcnt.go:15		0x4b6b9			4889542450		MOVQ DX, 0x50(SP)		
      popcnt.go:15		0x4b6be			488b6c2428		MOVQ 0x28(SP), BP		
      popcnt.go:15		0x4b6c3			4883c430		ADDQ $0x30, SP			
      popcnt.go:15		0x4b6c7			c3			RET				
      popcnt.go:10		0x4b6c8			e800000000		CALL 0x4b6cd			[1:5]R_CALL:runtime.morestack_noctxt
      popcnt.go:10		0x4b6cd			e95effffff		JMP %22%22.popcntSlice(SB)	
    

    countTrailingZeros gets inlined even "deeper" with no calls to either countTrailingZeros or bits.TrailingZeros64 in the output assembly:

    TEXT %22%22.(*bitmapContainer).minimum(SB) gofile../Users/maciejb/gowork/src/github.com/RoaringBitmap/roaring/bitmapcontainer.go
      bitmapcontainer.go:50	0x3f291			488b442408		MOVQ 0x8(SP), AX	
      bitmapcontainer.go:50	0x3f296			31c9			XORL CX, CX		
      bitmapcontainer.go:51	0x3f298			eb03			JMP 0x3f29d		
      bitmapcontainer.go:51	0x3f29a			48ffc1			INCQ CX			
      bitmapcontainer.go:51	0x3f29d			488b5008		MOVQ 0x8(AX), DX	
      bitmapcontainer.go:51	0x3f2a1			488b5810		MOVQ 0x10(AX), BX	
      bitmapcontainer.go:51	0x3f2a5			4839d9			CMPQ BX, CX		
      bitmapcontainer.go:51	0x3f2a8			7d27			JGE 0x3f2d1		
      bitmapcontainer.go:52	0x3f2aa			488b14ca		MOVQ 0(DX)(CX*8), DX	
      bitmapcontainer.go:53	0x3f2ae			4885d2			TESTQ DX, DX		
      bitmapcontainer.go:53	0x3f2b1			74e7			JE 0x3f29a		
      ctz.go:40		0x3f2b3			480fbcc2		BSFQ DX, AX		
      bitmapcontainer.go:55	0x3f2b7			48c1e106		SHLQ $0x6, CX		
      ctz.go:40		0x3f2bb			480fbcd2		BSFQ DX, DX		
      ctz.go:40		0x3f2bf			ba40000000		MOVL $0x40, DX		
      ctz.go:40		0x3f2c4			480f44c2		CMOVE DX, AX		
      bitmapcontainer.go:55	0x3f2c8			4801c8			ADDQ CX, AX		
      bitmapcontainer.go:55	0x3f2cb			6689442410		MOVW AX, 0x10(SP)	
      bitmapcontainer.go:55	0x3f2d0			c3			RET			
      bitmapcontainer.go:58	0x3f2d1			66c7442410ffff		MOVW $-0x1, 0x10(SP)	
      bitmapcontainer.go:58	0x3f2d8			c3			RET			
    
    opened by maciej 17
  • Fault When Creating Frozen Bitmap From MMapped

    Fault When Creating Frozen Bitmap From MMapped

    We have been trying to use Frozen Bitmaps for our internal system and were met with unanticipated faults. After working with @a392n328 we were able to write up this minimal failure, which is in #362. The actual error is

    === RUN   TestFrozenCases
    unexpected fault address 0x15fb008
    fatal error: fault
    [signal SIGBUS: bus error code=0x2 addr=0x15fb008 pc=0x11ed56a]
    

    I can't figure out what exactly could be causing this.

    opened by jacksonrnewhouse 2
  • Example test for Frozen Bitmap fault when using Frozen Bitmaps as regular/mutable bitmaps with a read-only buffer

    Example test for Frozen Bitmap fault when using Frozen Bitmaps as regular/mutable bitmaps with a read-only buffer

    This is a test that should not pass but does, as it asserts a panic exactly when FrozenBitmaps and MMap is used. @a392n328 helped me diagnose these issues.

    opened by jacksonrnewhouse 4
  • What to do about big endian architectures' support?

    What to do about big endian architectures' support?

    During evaluation of #198 the issue of big endian code not being tested by the CI pipelines was raised.
    This means, despite code being in place to support it, the architectures should be considered unsupported because they're untested.

    There are a few options to deal with this:

    • Properly considering those archs unsupported, in which case the reasonable way to deal with it is to remove the code and just assume little endian;
    • Add pipelines to run on at least one such architecture.

    The Go toolchain supports a few BE archs, in particular PowerPC64 with GOARCH=ppc64, and qemu can be used to run commands on the emulated architecture, and I successfully cross-compiled and run the tests for ppc64 that way locally:

    $ GOARCH=ppc64 go test -c -o ppc64_test
    $ qemu-ppc64 ./ppc64_test 
    ==roaring==
    {1,2,3,4,5,100,1000}
    {3,4,1000}
    {}
    Cardinality:  7
    Contains 3?  true
    1
    3
    4
    5
    1000
    
    Wrote  22  bytes
    I wrote the content to a byte stream and read it back.
    size before run optimize: 1810 bytes, and after: 38 bytes.
    2022/01/13 19:39:53 rtest N= 15
    2022/01/13 19:39:53 rtest N= 1024
    2022/01/13 19:39:54 rtest N= 4096
    2022/01/13 19:39:54 rtest N= 65536
    2022/01/13 19:39:55 rtest N= 1048576
    2022/01/13 19:40:04 rtest N= 15
    2022/01/13 19:40:04 rtest N= 100
    2022/01/13 19:40:04 rtest N= 512
    2022/01/13 19:40:04 rtest N= 1023
    2022/01/13 19:40:04 rtest N= 1025
    2022/01/13 19:40:04 rtest N= 4095
    2022/01/13 19:40:04 rtest N= 4096
    2022/01/13 19:40:04 rtest N= 4097
    2022/01/13 19:40:04 rtest N= 65536
    2022/01/13 19:40:05 rtest N= 1048576
    2022/01/13 19:41:26 rtest N= 15
    2022/01/13 19:41:26 rtest N= 1024
    2022/01/13 19:41:26 rtest N= 4096
    2022/01/13 19:41:26 rtest N= 65536
    2022/01/13 19:41:26 rtest N= 1048576
    2022/01/13 19:41:35 rtest N= 15
    2022/01/13 19:41:35 rtest N= 100
    2022/01/13 19:41:35 rtest N= 512
    2022/01/13 19:41:35 rtest N= 1023
    2022/01/13 19:41:35 rtest N= 1025
    2022/01/13 19:41:35 rtest N= 4095
    2022/01/13 19:41:35 rtest N= 4096
    2022/01/13 19:41:35 rtest N= 4097
    2022/01/13 19:41:35 rtest N= 65536
    2022/01/13 19:41:36 rtest N= 1048576
    200100
    PASS
    

    In case we choose to support those by adding pipelines to check them, we should decide whether we want to make them first or second class citizens with regards to performance. Most mainstream architectures being little endian, it may make sense to trade off performance in BE if this improves performance for the LE case.
    This question is practical rather than hypothetical, as it was suggested for hashing to cast slices in a way that would mean extra allocations for BE, in exchange for the ability to pass bigger blocks to the hasher.

    For a start, it would be interesting to know whether there are currently any users of BE roaring.

    question discussion 
    opened by Oppen 6
  • Running unit tests downstream fails

    Running unit tests downstream fails

    Running tests using a downstream project with go test all gives:

    ==roaring==
    {1,2,3,4,5,100,1000}
    {3,4,1000}
    {}
    Cardinality:  7
    Contains 3?  true
    1
    3
    4
    5
    1000
    
    Wrote  22  bytes
    I wrote the content to a byte stream and read it back.
    size before run optimize: 1810 bytes, and after: 38 bytes.
    2021/09/11 16:24:26 rtest N= 15
    2021/09/11 16:24:26 rtest N= 1024
    2021/09/11 16:24:26 rtest N= 4096
    2021/09/11 16:24:26 rtest N= 65536
    2021/09/11 16:24:26 rtest N= 1048576
    2021/09/11 16:24:27 rtest N= 15
    2021/09/11 16:24:27 rtest N= 100
    2021/09/11 16:24:27 rtest N= 512
    2021/09/11 16:24:27 rtest N= 1023
    2021/09/11 16:24:27 rtest N= 1025
    2021/09/11 16:24:27 rtest N= 4095
    2021/09/11 16:24:27 rtest N= 4096
    2021/09/11 16:24:27 rtest N= 4097
    2021/09/11 16:24:27 rtest N= 65536
    2021/09/11 16:24:27 rtest N= 1048576
    2021/09/11 16:24:38 rtest N= 15
    2021/09/11 16:24:38 rtest N= 1024
    2021/09/11 16:24:38 rtest N= 4096
    2021/09/11 16:24:38 rtest N= 65536
    2021/09/11 16:24:38 rtest N= 1048576
    2021/09/11 16:24:40 rtest N= 15
    2021/09/11 16:24:40 rtest N= 100
    2021/09/11 16:24:40 rtest N= 512
    2021/09/11 16:24:40 rtest N= 1023
    2021/09/11 16:24:40 rtest N= 1025
    2021/09/11 16:24:40 rtest N= 4095
    2021/09/11 16:24:40 rtest N= 4096
    2021/09/11 16:24:40 rtest N= 4097
    2021/09/11 16:24:40 rtest N= 65536
    2021/09/11 16:24:40 rtest N= 1048576
    --- FAIL: TestSerializationToFile038 (0.00s)
        serialization_test.go:70: 
            	Error Trace:	serialization_test.go:70
            	Error:      	Received unexpected error:
            	            	open myfile.bin: permission denied
            	Test:       	TestSerializationToFile038
        serialization_test.go:75: 
            	Error Trace:	serialization_test.go:75
            	Error:      	Received unexpected error:
            	            	invalid argument
            	Test:       	TestSerializationToFile038
        serialization_test.go:76: 
            	Error Trace:	serialization_test.go:76
            	Error:      	Not equal: 
            	            	expected: int64(0)
            	            	actual  : uint64(0x1e)
            	Test:       	TestSerializationToFile038
        serialization_test.go:83: 
            	Error Trace:	serialization_test.go:83
            	Error:      	Received unexpected error:
            	            	open myfile.bin: no such file or directory
            	Test:       	TestSerializationToFile038
        serialization_test.go:91: 
            	Error Trace:	serialization_test.go:91
            	Error:      	Should be true
            	Test:       	TestSerializationToFile038
        serialization_test.go:87: 
            	Error Trace:	serialization_test.go:87
            	            				serialization_test.go:92
            	Error:      	Received unexpected error:
            	            	remove myfile.bin: no such file or directory
            	Test:       	TestSerializationToFile038
    --- FAIL: TestSerializationBasic4WriteAndReadFile040 (0.00s)
        serialization_test.go:123: 
            	Error Trace:	serialization_test.go:123
            	Error:      	Received unexpected error:
            	            	open testdata/all3.classic: permission denied
            	Test:       	TestSerializationBasic4WriteAndReadFile040
        serialization_test.go:129: 
            	Error Trace:	serialization_test.go:129
            	Error:      	Received unexpected error:
            	            	invalid argument
            	Test:       	TestSerializationBasic4WriteAndReadFile040
        serialization_test.go:130: 
            	Error Trace:	serialization_test.go:130
            	Error:      	Not equal: 
            	            	expected: int64(0)
            	            	actual  : uint64(0xbbb8)
            	Test:       	TestSerializationBasic4WriteAndReadFile040
        serialization_test.go:135: 
            	Error Trace:	serialization_test.go:135
            	Error:      	Received unexpected error:
            	            	open testdata/all3.classic: no such file or directory
            	Test:       	TestSerializationBasic4WriteAndReadFile040
        serialization_test.go:142: 
            	Error Trace:	serialization_test.go:142
            	Error:      	Received unexpected error:
            	            	error in roaringArray.readFrom: could not read initial cookie: invalid argument
            	Test:       	TestSerializationBasic4WriteAndReadFile040
        serialization_test.go:143: 
            	Error Trace:	serialization_test.go:143
            	Error:      	Should be true
            	Test:       	TestSerializationBasic4WriteAndReadFile040
    200100
    --- FAIL: TestBitmap_FromBuffer (0.02s)
        --- FAIL: TestBitmap_FromBuffer/all3.classic_bitmap (0.00s)
            serialization_test.go:461: 
                	Error Trace:	serialization_test.go:461
                	Error:      	Received unexpected error:
                	            	open testdata/all3.classic: no such file or directory
                	Test:       	TestBitmap_FromBuffer/all3.classic_bitmap
            serialization_test.go:466: 
                	Error Trace:	serialization_test.go:466
                	Error:      	Received unexpected error:
                	            	error in roaringArray.readFrom: could not read initial cookie: unexpected EOF
                	Test:       	TestBitmap_FromBuffer/all3.classic_bitmap
    FAIL
    FAIL	github.com/RoaringBitmap/roaring	29.203s
    

    I expect there's some way to tell unit tests to more reliably find their test data. There may also be a special directory name that will be included in packages/modules.

    opened by anacrolix 1
Releases(v1.2.1)
Owner
Roaring bitmaps: A better compressed bitset
Roaring bitmaps are compressed bitmaps. They can be hundreds of times faster. (Picture credit: tambako)
Roaring bitmaps: A better compressed bitset
Sroar - 64-bit Roaring Bitmaps in Go

sroar: Serialized Roaring Bitmaps This is a fork of dgraph-io/sroar, being maint

Outcaste, Inc. 26 Jun 6, 2022
CLRS study. Codes are written with golang.

algorithms CLRS study. Codes are written with golang. go version: 1.11 Heap BinaryHeap on array BinaryHeap on linkedlist LeftistHeap FibonacciHeap Tre

Apollo Li 647 Jun 30, 2022
Golang string comparison and edit distance algorithms library, featuring : Levenshtein, LCS, Hamming, Damerau levenshtein (OSA and Adjacent transpositions algorithms), Jaro-Winkler, Cosine, etc...

Go-edlib : Edit distance and string comparison library Golang string comparison and edit distance algorithms library featuring : Levenshtein, LCS, Ham

Hugo Bollon 331 Jun 29, 2022
Gota: DataFrames and data wrangling in Go (Golang)

Gota: DataFrames, Series and Data Wrangling for Go This is an implementation of DataFrames, Series and data wrangling methods for the Go programming l

null 2.2k Jun 26, 2022
A simple Set data structure implementation in Go (Golang) using LinkedHashMap.

Set Set is a simple Set data structure implementation in Go (Golang) using LinkedHashMap. This library allow you to get a set of int64 or string witho

Studio Sol Comunicação Digital Ltda 21 Apr 6, 2022
skiplist for golang

skiplist reference from redis zskiplist Usage package main import ( "fmt" "github.com/gansidui/skiplist" "log" ) type User struct { score float6

gansidui 79 May 7, 2022
An in-memory string-interface{} map with various expiration options for golang

TTLCache - an in-memory cache with expiration TTLCache is a simple key/value cache in golang with the following functions: Expiration of items based o

Rene Kroon 442 Jun 25, 2022
High-performance minimalist queue implemented using a stripped-down lock-free ringbuffer, written in Go (golang.org)

This project is no longer maintained - feel free to fork the project! gringo A high-performance minimalist queue implemented using a stripped-down loc

Darren Elwood 126 Jun 8, 2022
Double-ARray Trie System for golang

Darts This is a GO implementation of Double-ARray Trie System. It's a clone of the C++ version Darts can be used as simple hash dictionary. You can al

Andy Song 93 Apr 10, 2022
Golang implementation of Radix trees

go-radix Provides the radix package that implements a radix tree. The package only provides a single Tree implementation, optimized for sparse nodes.

Armon Dadgar 734 Jun 19, 2022
Golang library for reading and writing Microsoft Excel™ (XLSX) files.

Excelize Introduction Excelize is a library written in pure Go providing a set of functions that allow you to write to and read from XLSX / XLSM / XLT

360 Enterprise Security Group, Endpoint Security, inc. 12.1k Jun 29, 2022
HyperLogLog and HyperLogLog++ implementation in Go/Golang.

HyperLogLog and HyperLogLog++ Implements the HyperLogLog and HyperLogLog++ algorithms. HyperLogLog paper: http://algo.inria.fr/flajolet/Publications/F

Clark DuVall 408 Jun 24, 2022
Golang library for querying and parsing OFX

OFXGo OFXGo is a library for querying OFX servers and/or parsing the responses. It also provides an example command-line client to demonstrate the use

Aaron Lindsay 102 Jun 22, 2022
Go (golang) library for reading and writing XLSX files.

XLSX Introduction xlsx is a library to simplify reading and writing the XML format used by recent version of Microsoft Excel in Go programs. Tutorial

Geoffrey J. Teale 5.3k Jun 26, 2022
Golang FrodoKEM implementation

FrodoKEM in Golang Golang implementation of FrodoKEM: a Practical quantum-secure key encapsulation from generic lattices (https://frodokem.org). This

Ed Riccardi 39 Mar 3, 2022
Golang LRU cache

golang-lru This provides the lru package which implements a fixed-size thread safe LRU cache. It is based on the cache in Groupcache. Documentation Fu

HashiCorp 2.8k Jun 30, 2022
Meow hash for Golang

meow Golang implementation of the Meow hash, an extremely fast non-cryptographic hash. Warning The official implemention is in flux, therefore this on

Michael McLoughlin 122 May 9, 2022
A Golang lock-free thread-safe HashMap optimized for fastest read access.

hashmap Overview A Golang lock-free thread-safe HashMap optimized for fastest read access. Usage Set a value for a key in the map: m := &HashMap{} m.S

Cornel 1.2k Jun 30, 2022
Trie data structure implementation in Golang 🌳

Gotri Gotri is an Unicode character based Trie/prefix tree implementation in Go, with the suggestion/auto-complete feature for character searching. Si

Monir Zaman 9 Jun 17, 2022