efficient string matching in Golang via the aho-corasick algorithm.

Overview

aho-corasick

Efficient string matching in Golang via the aho-corasick algorithm.

x20 faster than https://github.com/cloudflare/ahocorasick and x3 faster than https://github.com/anknown/ahocorasick

Memory consuption is a eigth of https://github.com/cloudflare/ahocorasick and half of https://github.com/anknown/ahocorasick

This library is heavily inspired by https://github.com/BurntSushi/aho-corasick

Usage

builder := NewAhoCorasickBuilder(Opts{
  AsciiCaseInsensitive: true,
  MatchOnlyWholeWords:  true,
  MatchKind:            LeftMostLongestMatch,
  DFA:                  true,
})

ac := builder.Build([]string{"bear", "masha"})
haystack := "The Bear and Masha"
matches := ac.FindAll(haystack)

for _, match := range matches {
  println(haystack[match.Start():match.End()])
}

Matching can be done via NFA or DFA. NFA has runtime complexity O(N + M) in relation to the haystack and number of matches. DFA has runtime complexity O(N), but it uses more memory.

Replacing of matches in the haystack.

replaceWith needs to be the same length as the patterns

r := NewReplacer(ac)
replaced := r.ReplaceAll(haystack, replaceWith)

ReplaceAllFunc is useful, for example, if you want to use the original text cassing but you are matching case insensitively. You can replace partially by return false and from that point, the original string will be preserved.

replaced := r.ReplaceAllFunc(haystack, func(match Match) (string, bool) {
    return `<a>` + haystack[match.Start():match.End()] + `<\a>`, true
})

Search for matches one at a time via the iterator

iter := ac.Iter(haystack)

for next := iter.Next(); next != nil; next = iter.Next() {
    ...
}

It's plenty fast but if you want to use it in parallel, that is also possible.

Memory consumption won't increase because the read-only automaton is not actually copied, only the counters are.

The magic line is ac := ac

var w sync.WaitGroup

w.Add(50)
for i := 0; i < 50; i++ {
    go func() {
        ac := ac
        matches := ac.FindAll(haystack)
        println(len(matches))
        w.Done()
    }()
}
w.Wait()
Issues
  • Provide methods that accept `[]byte`.

    Provide methods that accept `[]byte`.

    Hi,

    I really like the implementation but would love to have a FindAll(haystack []byte) and also a Build(patterns [][]byte). I saw that all patterns and haystacks are converted to []byte internally. What do you think about accepting []byte by default and have the string methods call these? E.g. FindAllByte(haystack []byte) and then FindAll(haystack string) { FindAllByte([]byte(haystack)) }.

    opened by jeschkies 2
  • Should Opts structs fields be exported ?

    Should Opts structs fields be exported ?

    Hi, thanks for the work on this. I'm having a few issues with the Opts struct (I'm relatively new to Go). I seem to get an exported issue with it, so wondering if the fields should be in caps ?

    opened by ibrierley 2
  • Provide byte functions.

    Provide byte functions.

    This patch adds FindAllByte and BuildeByte so that []byte patterns and haystacks can be passed directly without any conversion. The original string to []byte conversions are push up to the constructor methods.

    Closes #6

    opened by jeschkies 1
  • High memory usage compared to other implementation

    High memory usage compared to other implementation

    Hi, I found this while looking for a lower memory usage alternative to anknown/ahocorasick.

    I have a dataset of around 6 million strings. The total memory usage, as shown by pprof, after building the automaton is just over 30GB, compared to 6.5GB for the anknown version.

    Do you have any tips for working out why it's using so much more RAM?

    Thanks in advance.

    opened by jonrscott 0
Owner
Petar Dambovaliev
(Go/Rust) developer/Musician
Petar Dambovaliev
Json-match - Command line util for matching values in a JSON input

json-match Match JSON input by specifying key and value > json-match -json '{\"p

Trond Boksasp 0 Jan 12, 2022
Fast, secure, efficient backup program

Introduction restic is a backup program that is fast, efficient and secure. It supports the three major operating systems (Linux, macOS, Windows) and

The restic backup program 17.1k Jun 19, 2022
Procmon is a Linux reimagining of the classic Procmon tool from the Sysinternals suite of tools for Windows. Procmon provides a convenient and efficient way for Linux developers to trace the syscall activity on the system.

Process Monitor for Linux (Preview) Process Monitor (Procmon) is a Linux reimagining of the classic Procmon tool from the Sysinternals suite of tools

Windows Sysinternals 3.3k Jun 25, 2022
Robust, flexible and resource-efficient pipelines using Go and the commandline

Robust, flexible and resource-efficient pipelines using Go and the commandline Project links: Documentation & Main Website | Issue Tracker | Chat Why

SciPipe 928 Jun 18, 2022
argv - Go library to split command line string as arguments array using the bash syntax.

Argv Argv is a library for Go to split command line string into arguments array. Documentation Documentation can be found at Godoc Example func TestAr

null 33 May 7, 2022
A simple go program which checks if your websites are running and runs forever (stop it with ctrl+c). It takes two optional arguments, comma separated string with urls and an interval.

uptime A simple go program which checks if your websites are running and runs forever (stop it with ctrl+c). It takes two optional arguments: -interva

Markus Tenghamn 6 Jan 6, 2022
cross-platform, cli app to perform various operations on string

sttr is command line software that allows you to quickly run various transformation operations on the string.

Abhimanyu Sharma 383 Jun 20, 2022
sttr is command line software that allows you to quickly run various transformation operations on the string.

sttr is command line software that allows you to quickly run various transformation operations on the string.

Abhimanyu Sharma 60 Sep 21, 2021
This is a Go Cli app that receives an string path to a log file, and based on it generates and prints in console an encoded polyline with the locations found in the log file.

GEOENCODE GO CLI APP DESCRIPTION This is a Go Cli app that receives an string path to a log file, and based on it generates and prints in console an e

Jose Luis Ojeda 1 Oct 1, 2021
Code examples for Algorithm Analysis and design (CS311) [School project]

Introduction Algorithm Analysis and design 2021/2022 Code examples implemeneted using golang Why Golang? Low Level programming language Awesome garbag

Mohammad Salah 0 Dec 5, 2021
Cli-algorithm - A cli program with A&DS in go!

cli-algorithm Objectives The objective of this cli is to implement 4 basic algorithms to sort arrays been Merge Sort Insertion Sort Bubble Sort Quick

Leonardo Brombilla Antunes 0 Jan 2, 2022
File backup via GoLang

SnakeSync Simple File Backup Tool Getting Started Requirements Go >= 1.17 CLI Usage Scan How to use the program via scan go run ./cmd -scan With Fla

Sadi Hakan 4 Nov 9, 2021
Listen podcast via CLI for golang

goplay Listen podcast via CLI.

Rick 16 Jun 17, 2022
:mag: Search the Go packages via command-line

GoSearch Search the Go packages for pkg.go.dev via command-line. It supports all search options in Search Help. Installation go get github.com/mingram

MinJae Kwon 53 Jun 23, 2022
Upterm is an open-source solution for sharing terminal sessions instantly over the public internet via secure tunnels.

Upterm is an open-source solution for sharing terminal sessions instantly over the public internet via secure tunnels.

Owen Ou 482 Jun 26, 2022
A multiple reverse shell sessions/clients manager via terminal written in go

A multiple reverse shell sessions/clients manager via terminal written in go

Krisna Pranav 12 Feb 23, 2022
A set of Go scripts to monitor YAGPDB status via the command-line.

A set of Go scripts to monitor YAGPDB status by making GET requests to the YAGPDB status endpoint.

Joe 2 Apr 20, 2022
A CLI to execute AT Commands via serial port connections.

AT Command CLI A CLI to execute AT Commands via serial port connections. Development Install Go Run go run main.go

Daniel Khaapamyaki 22 Jun 19, 2022
A command line http test tool. Maintain the case via git and pure text

httptest A command line http test tool Maintain the api test cases via git and pure text We want to test the APIs via http requests and assert the res

wklken 11 Jan 6, 2022