efficient string matching in Golang via the aho-corasick algorithm.

Overview

aho-corasick

Efficient string matching in Golang via the aho-corasick algorithm.

x20 faster than https://github.com/cloudflare/ahocorasick and x3 faster than https://github.com/anknown/ahocorasick

Memory consuption is a eigth of https://github.com/cloudflare/ahocorasick and half of https://github.com/anknown/ahocorasick

This library is heavily inspired by https://github.com/BurntSushi/aho-corasick

Usage

builder := NewAhoCorasickBuilder(Opts{
  AsciiCaseInsensitive: true,
  MatchOnlyWholeWords:  true,
  MatchKind:            LeftMostLongestMatch,
  DFA:                  true,
})

ac := builder.Build([]string{"bear", "masha"})
haystack := "The Bear and Masha"
matches := ac.FindAll(haystack)

for _, match := range matches {
  println(haystack[match.Start():match.End()])
}

Matching can be done via NFA or DFA. NFA has runtime complexity O(N + M) in relation to the haystack and number of matches. DFA has runtime complexity O(N), but it uses more memory.

Replacing of matches in the haystack.

replaceWith needs to be the same length as the patterns

r := NewReplacer(ac)
replaced := r.ReplaceAll(haystack, replaceWith)

ReplaceAllFunc is useful, for example, if you want to use the original text cassing but you are matching case insensitively. You can replace partially by return false and from that point, the original string will be preserved.

replaced := r.ReplaceAllFunc(haystack, func(match Match) (string, bool) {
    return `<a>` + haystack[match.Start():match.End()] + `<\a>`, true
})

Search for matches one at a time via the iterator

iter := ac.Iter(haystack)

for next := iter.Next(); next != nil; next = iter.Next() {
    ...
}

It's plenty fast but if you want to use it in parallel, that is also possible.

Memory consumption won't increase because the read-only automaton is not actually copied, only the counters are.

The magic line is ac := ac

var w sync.WaitGroup

w.Add(50)
for i := 0; i < 50; i++ {
    go func() {
        ac := ac
        matches := ac.FindAll(haystack)
        println(len(matches))
        w.Done()
    }()
}
w.Wait()
Issues
  • Should Opts structs fields be exported ?

    Should Opts structs fields be exported ?

    Hi, thanks for the work on this. I'm having a few issues with the Opts struct (I'm relatively new to Go). I seem to get an exported issue with it, so wondering if the fields should be in caps ?

    opened by ibrierley 2
  • export DFA options

    export DFA options

    opened by petar-dambovaliev 0
  • DFA implementation

    DFA implementation

    solves https://github.com/petar-dambovaliev/aho-corasick/issues/2

    opened by petar-dambovaliev 0
Owner
Petar Dambovaliev
(Go/Rust) developer/Musician
Petar Dambovaliev
Simplistic interactive filtering tool

peco Simplistic interactive filtering tool NOTE: If you are viewing this on GitHub, this document refers to the state of peco in whatever current bran

null 6.4k Jun 15, 2021
A command-line tool and library for generating regular expressions from user-provided test cases

Table of Contents What does this tool do? Do I still need to learn to write regexes then? Current features How to install? 4.1 The command-line tool 4

Peter M. Stahl 4k Jun 12, 2021
Query, update and convert data structures from the command line. Comparable to jq/yq but supports JSON, TOML, YAML, XML and CSV with zero runtime dependencies.

dasel Dasel (short for data-selector) allows you to query and modify data structures using selector strings. Comparable to jq / yq, but supports JSON,

Tom Wright 919 Jun 13, 2021
Fully featured Go (golang) command line option parser with built-in auto-completion support.

go-getoptions Go option parser inspired on the flexibility of Perl’s GetOpt::Long. Table of Contents Quick overview Examples Simple script Program wit

David Gamba 35 Jun 8, 2021
ASCII table in golang

ASCII Table Writer Generate ASCII table on the fly ... Installation is simple as go get github.com/olekukonko/tablewriter Features Automatic Padding

Oleku Konko 2.8k Jun 6, 2021
The personal information dashboard for your terminal

WTF (aka 'wtfutil') is the personal information dashboard for your terminal, providing at-a-glance access to your very important but infrequently-need

WTFUtil 12.5k Jun 19, 2021
Secure, private and feature-rich CLI password manager

Kure Kure is a free and open-source password manager for the command-line. This project aims to offer the most secure and private way of operating wit

Gastón Palomeque 105 Jun 10, 2021
A fast diff tool for comparing csv files

csvdiff A fast diff tool for comparing csv files. What is csvdiff? Csvdiff is a difftool to compute changes between two csv files. It is not a traditi

Aswin Karthik 320 May 30, 2021
git-xargs is a command-line tool (CLI) for making updates across multiple Github repositories with a single command.

Table of contents Introduction Reference Contributing Introduction Overview git-xargs is a command-line tool (CLI) for making updates across multiple

Gruntwork 433 Jun 13, 2021
CLI for Shamir's Secret Sharing and AES key generation, encryption, and decryption.

Shush ?? This simple program will help you run Shamir's Secret Sharing algorithm on any file using the split and merge commands.

null 22 May 27, 2021
Automatically generate Go (golang) struct definitions from example JSON

gojson gojson generates go struct definitions from json or yaml documents. Example $ curl -s https://api.github.com/repos/chimeracoder/gojson | gojson

Aditya Mukerjee 2.4k Jun 13, 2021
Source code editor in pure Go.

Editor Source code editor in pure Go. About This is a simple but advanced source code editor As the editor is being developed, the rules of how the UI

Jorge Miguel Pinto 224 Jun 5, 2021
vgrep - a user-friendly pager for grep

vgrep is a pager for grep, git-grep, ripgrep and similar grep implementations, and allows for opening the indexed file locations in a user-specified e

Valentin Rothberg 499 Jun 7, 2021
A versatile library for building CLI applications in Go

mow.cli Package cli provides a framework to build command line applications in Go with most of the burden of arguments parsing and validation placed o

Jawher Moussa 739 Jun 14, 2021