Double-ARray Trie System for golang

Overview

Darts

This is a GO implementation of Double-ARray Trie System. It's a clone of the C++ version

Darts can be used as simple hash dictionary. You can also do very fast Common Prefix Search which is essential for morphological analysis, such as word split for CJK text indexing/searching.

Reference

What is Trie
An Implementation of Double-Array Trie

NEWS

  • Support building Double-Array from DAWG有向无环图, reduce the on-disk dict half as Trie. Lookup performance increases 25%.

TO DO list

  • Documentation/comments
  • Benchmark

Switch from unicode to byte version

gofmt -tabs=false -tabwidth=4 -r='rune /*Key_type*/ -> byte /*Key_type*/' -w darts.go
gofmt -tabs=false -tabwidth=4 -r='rune /*Key_type*/ -> byte /*Key_type*/' -w dawg.go

Usage

Input Dictionary Format

Key\tFreq

Each key occupies one line. The file should be utf-8 encoded

Code example (unicode version)

package main

import (
    "darts"
    "fmt"
)

func main() {
    d, err:= darts.Import("darts.txt", "darts.lib", true)
    if err == nil {
        if d.ExactMatchSearch([]rune("考察队员", 0)) {
            fmt.Println("考察队员 is in dictionary")
        }
    }
}

Code example (byte version)

package main

import (
    "darts"
    "fmt"
)

func main() {
    d, err := darts.Import("darts.txt", "darts.lib", true)
    if err == nil {
        key := []byte("考察队员")
        r := d.CommonPrefixSearch(key, 0)
        for i := 0; i < len(r); i++ {
            fmt.Println(string(key[:r[i].PrefixLen]))
        }
    }
}

Performance

Using a 100K item dictionary, a simple search on eath key takes go map 46 ms, takes byte_key version of darts 14 ms, and for unicode_key version of darts 9.5 ms.

LICENSE

Apache License 2.0

You might also like...
Gota: DataFrames and data wrangling in Go (Golang)

Gota: DataFrames, Series and Data Wrangling for Go This is an implementation of DataFrames, Series and data wrangling methods for the Go programming l

Roaring bitmaps in Go (golang)

roaring This is a go version of the Roaring bitmap data structure. Roaring bitmaps are used by several major systems such as Apache Lucene and derivat

A simple Set data structure implementation in Go (Golang) using LinkedHashMap.

Set Set is a simple Set data structure implementation in Go (Golang) using LinkedHashMap. This library allow you to get a set of int64 or string witho

skiplist for golang

skiplist reference from redis zskiplist Usage package main import ( "fmt" "github.com/gansidui/skiplist" "log" ) type User struct { score float6

An in-memory string-interface{} map with various expiration options for golang

TTLCache - an in-memory cache with expiration TTLCache is a simple key/value cache in golang with the following functions: Expiration of items based o

High-performance minimalist queue implemented using a stripped-down lock-free ringbuffer, written in Go (golang.org)

This project is no longer maintained - feel free to fork the project! gringo A high-performance minimalist queue implemented using a stripped-down loc

Golang implementation of Radix trees

go-radix Provides the radix package that implements a radix tree. The package only provides a single Tree implementation, optimized for sparse nodes.

Golang library for reading and writing Microsoft Excel™ (XLSX) files.
Golang library for reading and writing Microsoft Excel™ (XLSX) files.

Excelize Introduction Excelize is a library written in pure Go providing a set of functions that allow you to write to and read from XLSX / XLSM / XLT

HyperLogLog and HyperLogLog++ implementation in Go/Golang.
HyperLogLog and HyperLogLog++ implementation in Go/Golang.

HyperLogLog and HyperLogLog++ Implements the HyperLogLog and HyperLogLog++ algorithms. HyperLogLog paper: http://algo.inria.fr/flajolet/Publications/F

Comments
  • DAWG - usage issues.

    DAWG - usage issues.

    func main() {
    d, err := darts.Import("../src/darts/darts.txt", "darts.lib", true) // fmt.Println(err) if err == nil {

        key := []byte("考察队员")
        r := d.CommonPrefixSearch(key, 0)
        fmt.Println(r)
        for i := 0; i < len(r); i++ {
            fmt.Println("kk ", string(key[:r[i].PrefixLen]))
        }
    
    }
    

    Produces:

    input dict length: 109135 build out length 373050 18.472ms [] [Finished in 2.5s]

    Without DAWG = True- it works just fine. ??

    Just playing with it, interested in Double-Array Trie and DAWG specifically. Thank you.

    opened by tbytes 1
  • Fix broken headings in Markdown files

    Fix broken headings in Markdown files

    GitHub changed the way Markdown headings are parsed, so this change fixes it.

    See bryant1410/readmesfix for more information.

    Tackles bryant1410/readmesfix#1

    opened by bryant1410 0
  • a question about begin and character encoding

    a question about begin and character encoding

    Hello, I haven't been able to figure out a few questions. Can you help me?

    If it's just to satisfy:

    for i := 0; i < len(siblings); i++ {
    
    d. darts.Check [begin+int(siblings[i].code)] = begin
    
    }
    

    Why is the value of begin begin = pos - int (siblings [0]. Code'), based on what consideration? Why not start with 0

    And why code = Unicode + 1

    Thank you

    opened by TXYH1 0
Owner
Andy Song
Andy Song
A highly optimized double-ended queue

Overview Deque is a highly optimized double-ended queue. Benchmark Benchmark_PushBack/Deque<harden> 100000000 10.3 ns/op 9 B/op

Edwin 66 Sep 15, 2022
Fast ring-buffer deque (double-ended queue)

deque Fast ring-buffer deque (double-ended queue) implementation. For a pictorial description, see the Deque diagram Installation $ go get github.com/

Andrew Gillis 383 Sep 27, 2022
Trie data structure implementation in Golang 🌳

Gotri Gotri is an Unicode character based Trie/prefix tree implementation in Go, with the suggestion/auto-complete feature for character searching. Si

Monir Zaman 9 Jun 17, 2022
Surprisingly space efficient trie in Golang(11 bits/key; 100 ns/get).

Slim - surprisingly space efficient data types in Golang Slim is collection of surprisingly space efficient data types, with corresponding serializati

null 1.8k Sep 20, 2022
A memory-efficient trie for testing the existence/prefixes of string only(for now).

Succinct Trie A memory-efficient trie for testing the existence/prefixes of string only(for now). Install go get -u github.com/nobekanai/sutrie Docume

野辺かない 2 Mar 10, 2022
An app with Trie tree and Breve search Implementation CLI and HTTP both 🥳

Introduction LifeLongLearner project consists of two different parts. My English Vocabulary My Technical Book Notes All of them provided by me within

A.Samet İleri 15 Jul 1, 2022
💯 Go Struct and Field validation, including Cross Field, Cross Struct, Map, Slice and Array diving

Package validator implements value validations for structs and individual fields based on tags.

Flamego 11 Aug 8, 2022
A serverless cluster computing system for the Go programming language

Bigslice Bigslice is a serverless cluster data processing system for Go. Bigslice exposes composable API that lets the user express data processing ta

GRAIL 508 Aug 31, 2022
CLRS study. Codes are written with golang.

algorithms CLRS study. Codes are written with golang. go version: 1.11 Heap BinaryHeap on array BinaryHeap on linkedlist LeftistHeap FibonacciHeap Tre

Apollo Li 671 Sep 23, 2022
Golang string comparison and edit distance algorithms library, featuring : Levenshtein, LCS, Hamming, Damerau levenshtein (OSA and Adjacent transpositions algorithms), Jaro-Winkler, Cosine, etc...

Go-edlib : Edit distance and string comparison library Golang string comparison and edit distance algorithms library featuring : Levenshtein, LCS, Ham

Hugo Bollon 361 Sep 27, 2022