indexing library for Go

Overview

Bluge

modern text indexing in go - blugelabs.com

Features

  • Supported field types:
    • Text, Numeric, Date, Geo Point
  • Supported query types:
    • Term, Phrase, Match, Match Phrase, Prefix
    • Conjunction, Disjunction, Boolean
    • Numeric Range, Date Range
  • BM25 Similarity/Scoring with pluggable interfaces
  • Search result match highlighting
  • Extendable Aggregations:
    • Bucketing
      • Terms
      • Numeric Range
      • Date Range
    • Metrics
      • Min/Max/Count/Sum
      • Avg/Weighted Avg
      • Cardinality Estimation (HyperLogLog++)
      • Quantile Approximation (T-Digest)

Indexing

    config := bluge.DefaultConfig(path)
    writer, err := bluge.OpenWriter(config)
    if err != nil {
        log.Fatalf("error opening writer: %v", err)
    }
    defer writer.Close()

    doc := bluge.NewDocument("example").
        AddField(bluge.NewTextField("name", "bluge"))

    err = writer.Update(doc.ID(), doc)
    if err != nil {
        log.Fatalf("error updating document: %v", err)
    }

Querying

    reader, err := writer.Reader()
    if err != nil {
        log.Fatalf("error getting index reader: %v", err)
    }
    defer reader.Close()

    query := bluge.NewMatchQuery("bluge").SetField("name")
    request := bluge.NewTopNSearch(10, query).
        WithStandardAggregations()
    documentMatchIterator, err := reader.Search(context.Background(), request)
    if err != nil {
        log.Fatalf("error executing search: %v", err)
    }
    match, err := documentMatchIterator.Next()
    for err == nil && match != nil {
        err = match.VisitStoredFields(func(field string, value []byte) bool {
            if field == "_id" {
                fmt.Printf("match: %s\n", string(value))
            }
            return true
        })
        if err != nil {
            log.Fatalf("error loading stored fields: %v", err)
        }
        match, err = documentMatchIterator.Next()
    }
    if err != nil {
        log.Fatalf("error iterating document matches: %v", err)
    }

License

Apache License Version 2.0

Comments
  • Install fails due to willf/bitset

    $ go get -u github.com/blugelabs/bluge
    go get: github.com/willf/[email protected] updating to
    	github.com/willf/[email protected]: parsing go.mod:
    	module declares its path as: github.com/bits-and-blooms/bitset
    	        but was required as: github.com/willf/bitset
    
    opened by tmm1 8
  • Fix possible runtime panic on DecRef call

    The following error was hit when porting from bleve to bluge

    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x17000ce]
    
    goroutine 16 [running]:
    github.com/blugelabs/bluge/index.(*closeOnLastRefCounter).DecRef(0xc0004b59e0, 0x10ca11a, 0x8000000000000000)
    	/gocode/pkg/mod/github.com/blugelabs/[email protected]/index/segment_plugin.go:113 +0xae
    github.com/blugelabs/bluge/index.(*Snapshot).decRef(0xc0005a0780, 0x3, 0x8ffd3)
    	/gocode/pkg/mod/github.com/blugelabs/[email protected]/index/snapshot.go:77 +0xb3
    github.com/blugelabs/bluge/index.(*Snapshot).Close(...)
    	/gocode/pkg/mod/github.com/blugelabs/[email protected]/index/snapshot.go:89
    github.com/blugelabs/bluge/index.(*Writer).mergerLoop(0xc000069200, 0xc000606660, 0xc0000402a0)
    	/gocode/pkg/mod/github.com/blugelabs/[email protected]/index/merge.go:75 +0x4c5
    created by github.com/blugelabs/bluge/index.OpenWriter
    	/gocode/pkg/mod/github.com/blugelabs/[email protected]/index/writer.go:131 +0x8cd
    

    It seems possible that https://github.com/blugelabs/bluge/blob/fe1f453e701a72cb3ee01b13b8a5e5f6e2b0f6cc/index/writer.go#L523 could also panic, as I don't see any other place where segmentWrapper is initialized. It looks like a path exists where err is nil and the returned closer is nil as well.

    I am using the bluge.InMemoryOnlyConfig.

    opened by michaeljs1990 7
  • bleve vs bluge question

    Hello,

    I hope you don't mind me asking these questions. :-)

    My understanding is that bluge is the replacement for bleve. Could you let me know why you chose to stop development of bleve and start bluge (sorry if this explanation exists elsewhere, I haven't been able to find it). How is the design or implementation of bluge an improvement over bleve? I am asking as a long time user of Lucene, and wanting a performant Go replacement for certain projects. I would appreciate you sharing some of the design direction regarding bluge, and perhaps the use cases where bluge is/will be an improvement over bleve.

    constructively, :-) Glen

    opened by gnewton 6
  • panic while merging in unit test

     panic: runtime error: slice bounds out of range [:1153] with capacity 1152
    
    goroutine 95 [running]:
    github.com/blugelabs/ice/v2.(*Segment).copyStoredDocs(0xc000b40780, 0x200, 0xc000fc0000, 0x7d0, 0x7d0, 0xc000fad3f0, 0x0, 0x1)
    	C:/Users/runneradmin/go/pkg/mod/github.com/blugelabs/ice/[email protected]/merge.go:787 +0x745
    github.com/blugelabs/ice/v2.mergeStoredAndRemap(0xc000621f40, 0x3, 0x3, 0xc000621f20, 0x3, 0x3, 0xc0006b88d0, 0xc0006b88a0, 0x3, 0x3, ...)
    	C:/Users/runneradmin/go/pkg/mod/github.com/blugelabs/ice/[email protected]/merge.go:659 +0x853
    github.com/blugelabs/ice/v2.mergeToWriter(0xc000621f40, 0x3, 0x3, 0xc000621f20, 0x3, 0x3, 0x401, 0xc000621f60, 0xc000798fc0, 0xc000621f40, ...)
    	C:/Users/runneradmin/go/pkg/mod/github.com/blugelabs/ice/[email protected]/merge.go:130 +0x21c
    github.com/blugelabs/ice/v2.mergeSegmentBasesWriter(0xc000621f40, 0x3, 0x3, 0xc000621f20, 0x3, 0x3, 0x9c2580, 0xc0009fe300, 0x401, 0xc000798fc0, ...)
    	C:/Users/runneradmin/go/pkg/mod/github.com/blugelabs/ice/[email protected]/merge.go:96 +0x15a
    github.com/blugelabs/ice/v2.merge(0xc0006b8870, 0x3, 0x3, 0xc000621f20, 0x3, 0x3, 0x9c2580, 0xc0009fe300, 0xc000798fc0, 0x0, ...)
    	C:/Users/runneradmin/go/pkg/mod/github.com/blugelabs/ice/[email protected]/merge.go:85 +0x1d4
    github.com/blugelabs/ice/v2.(*Merger).WriteTo(0xc000cd4910, 0x9c2cc0, 0xc0009d8110, 0xc000798fc0, 0x9c61e0, 0xc000044950, 0x0)
    	C:/Users/runneradmin/go/pkg/mod/github.com/blugelabs/ice/[email protected]/merge.go:48 +0x191
    github.com/blugelabs/bluge/index.(*FileSystemDirectory).Persist(0xc0000ba780, 0x943751, 0x4, 0xd, 0x1d124eabf58, 0xc000cd4910, 0xc000798fc0, 0x9c4e60, 0xc000cd4910)
    	D:/a/bluge/bluge/index/directory_fs.go:125 +0x2d5
    github.com/blugelabs/bluge/index.(*Writer).merge(0xc0002f4480, 0xc0006b8870, 0x3, 0x3, 0xc000621f20, 0x3, 0x3, 0xd, 0x3, 0xc0009fe100, ...)
    	D:/a/bluge/bluge/index/merge.go:368 +0x22b
    github.com/blugelabs/bluge/index.(*Writer).executeMergeTask(0xc0002f4480, 0xc0007990e0, 0xc000621f00, 0xc000cd48c0, 0xc000621ea0)
    	D:/a/bluge/bluge/index/merge.go:144 +0x88b
    github.com/blugelabs/bluge/index.(*Writer).planMergeAtSnapshot(0xc0002f4480, 0xc0007990e0, 0xc0000fa980, 0xa, 0x4c4b40, 0x4024000000000000, 0xa, 0x7d0, 0x4000000000000000, 0x0, ...)
    	D:/a/bluge/bluge/index/merge.go:118 +0x47c
    github.com/blugelabs/bluge/index.(*Writer).mergerLoop(0xc0002f4480, 0xc0007990e0, 0xc000895380)
    	D:/a/bluge/bluge/index/merge.go:56 +0x4e5
    created by github.com/blugelabs/bluge/index.OpenWriter
    	D:/a/bluge/bluge/index/writer.go:131 +0xf8e
    FAIL	github.com/blugelabs/bluge	1.570s
    
    bug 
    opened by mschoch 5
  • Exposed sloppiness parameter in Query & Searcher

    Changes

    • Exposed slop parameter to (Multi/Match)PhraseQuery
    • Added a factory method MultiPhraseSearcher to take in slop as a parameter
    • Added tests for the above
    • In integ tests, added a utility function to create match results, to reduce verbosity
    opened by voldyman 5
  • Made in memory directory abstraction thread-safe, fixes #41

    Added a RW lock to avoid concurrent writes to segment map.

    I couldn't create a reliable test for this. I was hitting this race condition roughly every third run and haven't encountered it since the fix.

    (I am already in the AUTHORS file)

    opened by voldyman 5
  • Question: is it possible to add additional values like recency to the score?

    I would like to configure how search results are scored based on other additional parameters that are not direct matches. For example I would want the recency of a document to be weighted into the score so they appear in front of other documents that may satisfy the query but are older and thus not that relevant. Is that possible or even the right thing to do to achieve the intended result?

    Another use case: I have a like-to-dislike ratio and would like that ratio to affect the score/ranking while keeping the documents searchable.

    I hope this is the right place to ask; I wasn't sure.

    opened by enex 4
  • quick fix to sort issue

    It has been observed that in some cases the computed sort value for a DocumentMatch becomes corrupted. The problem has been traced back to the doc values uncompressed slices, but for now a quick fix is proposed: copy the bytes associated with the sort key, ensuring that no other doc values operations can corrupt them.
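
The kind of corruption described here can be illustrated with a small, self-contained sketch. The reader type below is a hypothetical decoder that reuses one buffer between calls; it is not Bluge's doc values code, only a demonstration of why holding on to a returned slice is unsafe and why a defensive copy helps.

```go
package main

import "fmt"

// reader is a hypothetical decoder that reuses one buffer across calls, so
// any slice it returns is only valid until the next call.
type reader struct{ buf []byte }

func (r *reader) value(s string) []byte {
	r.buf = append(r.buf[:0], s...)
	return r.buf
}

func main() {
	r := &reader{}

	aliased := r.value("apple")               // shares the reader's buffer
	copied := append([]byte(nil), aliased...) // private copy of the bytes

	r.value("zebra") // the next read overwrites the shared buffer

	fmt.Printf("aliased=%s copied=%s\n", aliased, copied)
}
```

The aliased slice now reads "zebra" while the copy still reads "apple", which is the effect a copied sort key is meant to guarantee.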

    opened by mschoch 4
  • In memory directory causes nil pointer dereference

    The in-memory directory implementation causes a nil pointer dereference because the Load method returns a nil closer.

    stacktrace:

    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x114c2ee]
    
    goroutine 12 [running]:
    github.com/blugelabs/bluge/index.(*closeOnLastRefCounter).DecRef(0xc0002e3ec0, 0x109145a, 0x8000000000000000)
    	/Users/voldyman/go/pkg/mod/github.com/blugelabs/[email protected]/index/segment_plugin.go:113 +0xae
    github.com/blugelabs/bluge/index.(*Snapshot).decRef(0xc00049e200, 0x3, 0x1bd995)
    	/Users/voldyman/go/pkg/mod/github.com/blugelabs/[email protected]/index/snapshot.go:77 +0xb3
    github.com/blugelabs/bluge/index.(*Snapshot).Close(...)
    	/Users/voldyman/go/pkg/mod/github.com/blugelabs/[email protected]/index/snapshot.go:89
    github.com/blugelabs/bluge/index.(*Writer).mergerLoop(0xc00003cd80, 0xc0002ec9c0, 0xc0000503c0)
    	/Users/voldyman/go/pkg/mod/github.com/blugelabs/[email protected]/index/merge.go:75 +0x4c5
    created by github.com/blugelabs/bluge/index.OpenWriter
    	/Users/voldyman/go/pkg/mod/github.com/blugelabs/[email protected]/index/writer.go:131 +0x8cd
    exit status 2
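
A defensive pattern for this class of bug is to treat a nil closer as "nothing to close". The sketch below assumes a reference counter that owns an io.Closer; the refCounter type is hypothetical and only mirrors the shape suggested by the stack trace, not the actual fix.

```go
package main

import (
	"fmt"
	"io"
	"sync/atomic"
)

// refCounter is a hypothetical reference counter that closes its resource
// when the last reference is released.
type refCounter struct {
	refs   int64
	closer io.Closer // may be nil when there is nothing to close
}

// DecRef drops one reference and closes the resource on the last one,
// skipping the Close call when no closer was provided.
func (r *refCounter) DecRef() error {
	if atomic.AddInt64(&r.refs, -1) == 0 && r.closer != nil {
		return r.closer.Close()
	}
	return nil
}

func main() {
	r := &refCounter{refs: 1} // closer left nil, as an in-memory load might do
	fmt.Println(r.DecRef())
}
```

The alternative is to have the loader always return a no-op closer, which keeps the nil check out of the hot path.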
    
    opened by voldyman 4
  • Why does a numeric field store so many tokens?

    I created a document that includes a field Year with the value 1896. Then I checked the doc values of the field; the result looks like this:

    ice docvalues 000000000008.seg 2 Year
    Year 0x2001404e68000000000000  | 1896.000000 <nil>
    Year 0x240c0476400000000000  | 1896.000000 <nil>
    Year 0x286027340000000000  | 1896.000000 <nil>
    Year 0x2c06023b2000000000  | 1896.000000 <nil>
    Year 0x3030135a00000000  | 1896.000000 <nil>
    Year 0x3403011d50000000  | 1896.000000 <nil>
    Year 0x3818096d000000  | 1896.000000 <nil>
    Year 0x3c01404e680000  | 1896.000000 <nil>
    Year 0x400c04764000  | 1896.000000 <nil>
    Year 0x4460273400  | 1896.000000 <nil>
    Year 0x4806023b20  | 1896.000000 <nil>
    Year 0x4c30135a  | 1896.000000 <nil>
    Year 0x5003011d  | 1856.000000 <nil>
    Year 0x541809  | 1024.000000 <nil>
    Year 0x580140  | 2.000000 <nil>
    Year 0x5c0c  | 2.000000 <nil>
    

    The "| 1896.000000" column is output I added for debugging.

    We can see there are 16 values for this field. I don't understand why a numeric field is designed like this.

    It also uses too much space to store this field, which is just a single number.

    opened by hengfeiyang 3
  • How to convert []byte to float?

    As I understand it, the one way to get document field values is VisitStoredFields. The callback receives each value as []byte. Text fields are simply converted via string(value), but how do I convert Numeric (float64) fields?

    I tried

    func Float64frombytes(bytes []byte) float64 {
    	bits := binary.BigEndian.Uint32(bytes)
    	float := math.Float32frombits(bits)
    	return float64(float)
    }
    

    and

    func Float64frombytes(bytes []byte) float64 {
    	bits := binary.BigEndian.Uint64(bytes)
    	float := math.Float64frombits(bits)
    	return float
    }
    

    But neither of them works correctly. I also tried LittleEndian, still with no result.

    opened by IAkumaI 3
  • Support illumos and Solaris

    This fixes build failure in recent versions of Grafana, e.g.:

    # github.com/blugelabs/bluge/index/lock
    ../.gopath/pkg/mod/github.com/blugelabs/[email protected]/index/lock/lock.go:33:9: undefined: open
    ../.gopath/pkg/mod/github.com/blugelabs/[email protected]/index/lock/lock.go:37:9: undefined: open
    ../.gopath/pkg/mod/github.com/blugelabs/[email protected]/index/lock/lock.go:49:11: e.unlock undefined (type *DefaultLockedFile has no field or method unlock)
    

    I've tested this patch against grafana 9.2.4 and it now builds correctly.

    opened by jperkin 1
  • Is there a way to use this library more as a caching layer?

    Is there a way to use this library more as a caching layer (preferably using mmap so it is not limited by RAM)? For example, I already store data in an external database and would like to use this library as a cache, with custom handlers for both eviction and restoration.

    opened by kant777 0
  • Indexing/Analyzing URLs, Email Addresses, etc?

    Been trying to figure out how to index/analyze things like:

    I don't think any of the built-in analyzers will index these strings properly?

    opened by prologic 0
  • Sorting by ascending order of _score

    I think I've found a bug.

    Calling SortBy([]string{"_score"}) doesn't seem to actually give me the results in ascending order of the scores.

    Is this supposed to work? (It is, according to the docs.) Happy to write a reproducer to confirm and file a bug report.

    opened by prologic 0
  • Difference between a NewTextField() and NewKeywordField()

    Can you give some explanation (and add some doc strings) as to the differences between the different field types?

    Near as I can tell:

    • Keyword Field: single word, case sensitive
    • Text Field: multiple words split by whitespace, case insensitive

    But I'm also not completely sure, "read the source" just doesn't cut it sorry 😅

    Thanks! 🙏

    opened by prologic 0
Releases(v0.2.2)
  • v0.2.2(Jul 4, 2022)

  • v0.2.1(May 26, 2022)

  • v0.2.0(May 26, 2022)

    Enhancements:

    • support ice v2 (index size reduction https://github.com/blugelabs/ice/pull/8)

    NOTE: this release has a known crash: https://github.com/blugelabs/bluge/issues/119

  • v0.1.9(Jan 4, 2022)

  • v0.1.8(Nov 11, 2021)

  • v0.1.7(Jul 13, 2021)

    Some incantations of go get were still returning error messages about the renamed willf/bitset repository. This release updates all of bluge's dependencies that also indirectly depended on this repository:

    • roaring bitmaps
    • bluge_segment_api
    • ice
    • vellum (also switched from the couchbase repo to blevesearch, as this is now the officially maintained version)

    We hope this nightmare is over.

  • v0.1.6(Jul 12, 2021)

    Enhancements:

    • support overriding mmap behavior (#62)

    Bug fixes:

    • fix upstream import path (#65)
    • fix build issue on 32-bit systems (#68)
    • fix issue with snapshots containing 0 segments (#55)
  • v0.1.5(Feb 12, 2021)

    Enhancements:

    • support sloppy phrase query
    • query fields have accessor methods

    Bug fixes:

    • fix panic in ASCII folding filter
    • HTML highlighter does proper escaping
  • v0.1.4(Dec 29, 2020)

    Bug fixes:

    • fix panic using in-memory indexes
    • fix data race with in-memory indexes
    • fix typo in error message

    Documentation Improvements:

    • Go docs added for TopNSearch
  • v0.1.3(Oct 8, 2020)

    Expose a public API to use a custom directory implementation. This is important functionality to demonstrate ways that applications can extend Bluge themselves.

  • v0.1.2(Sep 29, 2020)

  • v0.1.1(Sep 29, 2020)

  • v0.1.0(Sep 29, 2020)

    Initial Release

    Major Changes from Bleve:

    • Removed IndexMapping
    • Moved query string to separate module
    • Removed JSON marshaling/unmarshaling
    • Removed HTTP handlers
    • Removed registry
    • Removed array positions
    • Removed internal key/value storage
    • Reorganized into fewer, larger packages
    • Strongly typed configuration
    • Static analysis via golangci-lint
    • New directory abstraction
    • Support for multi-process access to indexes (read-only)
    • Support online backup
    • Scoring via BM25 similarity model
    • Ability to customize scoring
    • Search, sort and aggregate over virtual fields
    • Completely overhauled aggregation framework
Owner
Bluge Labs