String-matching in Golang using the Knuth–Morris–Pratt algorithm (KMP)

Overview

gokmp

String-matching in Golang using the Knuth–Morris–Pratt algorithm (KMP).

Disclaimer

This library was written as part of my Master's Thesis and should be used as a helpful implementation reference for people interested in the Knuth-Morris-Pratt algorithm than as a performance string searching library.

I believe the compiler has since caught up to most of the gains that this library bought me back in the day.

See Documentation on GoDoc.

Example:

package main

import (
  "fmt"
	"github.com/paddie/gokmp"
)

const str = "aabaabaaaabbaabaabaaabbaabaabb"
//          "        _          _      _   "
//                   8          19     26
const pattern = "aabb"

func main() {
	kmp, _ := gokmp.NewKMP(pattern)
	ints := kmp.FindAllStringIndex(str)

	fmt.Println(ints)
}

Output:

[8 19 26]

Tests and Benchmarks:

go test -v -bench .

Output:

=== RUN TestFindAllStringIndex
--- PASS: TestFindAllStringIndex (0.00 seconds)
=== RUN TestFindStringIndex
--- PASS: TestFindStringIndex (0.00 seconds)
=== RUN TestContainedIn
--- PASS: TestContainedIn (0.00 seconds)
=== RUN TestOccurrences
--- PASS: TestOccurrences (0.00 seconds)
=== RUN TestOccurrencesFail
--- PASS: TestOccurrencesFail (0.00 seconds)
PASS
BenchmarkKMPIndexComparison	10000000	       178 ns/op
BenchmarkStringsIndexComparison	10000000	       359 ns/op
ok  	github.com/paddie/gokmp	5.854s

Comparison:

gokmp.FindStringIndex / strings.Index = 178/359 = 0.4958

Almost a 2x improvement over the naive built-in method.

You might also like...
A simple json parser built using golang

jsonparser A simple json parser built using golang Installation: go get -u githu

[Crawler/Scraper for Golang]🕷A lightweight distributed friendly Golang crawler framework.一个轻量的分布式友好的 Golang 爬虫框架。

Goribot 一个分布式友好的轻量的 Golang 爬虫框架。 完整文档 | Document !! Warning !! Goribot 已经被迁移到 Gospider|github.com/zhshch2002/gospider。修复了一些调度问题并分离了网络请求部分到另一个仓库。此仓库会继续

Match regex group into go struct using struct tags and automatic parsing

regroup Simple library to match regex expression named groups into go struct using struct tags and automatic parsing Installing go get github.com/oris

Take screenshots of websites and create PDF from HTML pages using chromium and docker

gochro is a small docker image with chromium installed and a golang based webserver to interact wit it. It can be used to take screenshots of w

:triangular_ruler:gofmtmd formats go source code block in Markdown. detects fenced code & formats code using gofmt.
:triangular_ruler:gofmtmd formats go source code block in Markdown. detects fenced code & formats code using gofmt.

gofmtmd gofmtmd formats go source code block in Markdown. detects fenced code & formats code using gofmt. Installation $ go get github.com/po3rin/gofm

Search for Go code using syntax trees

gogrep GO111MODULE=on go get mvdan.cc/gogrep Search for Go code using syntax trees. Work in progress. gogrep -x 'if $x != nil { return $x, $*_ }' In

gofontrender renders text with different parameters using the go font renderer

gofontrender This simple program renders text using the go font render. It computes the anti-aliasing by computing the exact pixel coverage with the a

golang 在线预览word,excel,pdf,MarkDown(Online Preview Word,Excel,PPT,PDF,Image by Golang)
golang 在线预览word,excel,pdf,MarkDown(Online Preview Word,Excel,PPT,PDF,Image by Golang)

Go View File 在线体验地址 http://39.97.98.75:8082/view/upload (不会经常更新,保留最基本的预览功能。服务器配置较低,如果出现链接超时请等待几秒刷新重试,或者换Chrome) 目前已经完成 docker部署 (不用为运行环境烦恼) Wor

bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user generated content of XSS

bluemonday bluemonday is a HTML sanitizer implemented in Go. It is fast and highly configurable. bluemonday takes untrusted user generated content as

Comments
  • why TEN times slower than strings.contains

    why TEN times slower than strings.contains

    contains( 100001 times) * str len( 1193 ) run time: 482.834µs kmp-containedIn( 100001 times) * str len( 1193 ) run time: 376.550657ms

    strings.contains:

            str := "735224100,1152888111,1112227112,1113337982,111777642..."
            var temp bool // prevent compiler optimization
    
            //strings.containsIn
            containsCount := 1
    	containsRunTime := time.Now()
    CONTAINSLOOP:
    	if strings.Contains("111111111", str) {
    		temp = true
    	}
    
    	containsCount++
    	if containsCount <= MaxLoop {
    		goto CONTAINSLOOP
    	}
    
    	fmt.Println("contains(", MaxLoop, "times) * str len(", len(slice), ") run time:\n", time.Since(containsRunTime))
    
            //kmp.containsIn
    
            kmpCount := 1
    	kmpRunTime := time.Now()
    	kmp, _ := NewKMP("1111111111")
    KMPLOOP:
    	if kmp.ContainedIn(str) {
    		temp = true
    	}
    
    	kmpCount++
    	if kmpCount <= MaxLoop {
    		goto KMPLOOP
    	}
    	fmt.Println("kmp-containedIn(", MaxLoop, "times) * str len(", len(slice), ") run time:\n", time.Since(kmpRunTime))
    
    opened by XxYyKk 1
  • Support searching in streams

    Support searching in streams

    Awesome library and thank you for making it public!

    I'd like to be able to search inside streams (basically any Reader, or maybe ReaderAt), which should probably not be a hard change! I'll try to implement it myself over the next week, but filing this issue to keep track...

    opened by yuvipanda 1
agrep-like fuzzy matching, but made faster using Golang and precomputation.

goagrep There are situations where you want to take the user's input and match a primary key in a database. But, immediately a problem is introduced:

Zack 41 Nov 30, 2021
A fast string sorting algorithm (MSD radix sort)

Your basic radix sort A fast string sorting algorithm This is an optimized sorting algorithm equivalent to sort.Strings in the Go standard library. Fo

Algorithms to Go 178 Aug 17, 2022
User agent string parser in golang

User agent parsing useragent is a library written in golang to parse user agent strings. Usage First install the library with: go get xojoc.pw/userage

Alexandru Cojocaru 71 Aug 2, 2021
Decode / encode XML to/from map[string]interface{} (or JSON); extract values with dot-notation paths and wildcards. Replaces x2j and j2x packages.

mxj - to/from maps, XML and JSON Decode/encode XML to/from map[string]interface{} (or JSON) values, and extract/modify values from maps by key or key-

Charles Banning 525 Sep 16, 2022
A Go slugify application that handles string

slugify A Go slugify application that handles string Example: package main import ( "fmt" "github.com/avelino/slugify" ) func main() { text := "E

Avelino 32 Sep 27, 2022
A collection of well-known string hash functions, implemented in Go

This library is a collection of "well-known" 32-bit string hashes, implemented in Go. It includes: Java string hash ELF-32 Jenkins' One-A

Damian Gryski 64 Mar 3, 2022
String i18n utilities for the Go Programming Language

About polyglot polyglot is a String translation package and tool for Go. Setup Make sure you have a working Go installation. See Getting Started Now r

Alexander Neumann 39 Sep 14, 2022
Package strit introduces a new type of string iterator, along with a number of iterator constructors, wrappers and combinators.

strit Package strit (STRing ITerator) assists in development of string processing pipelines by providing a simple iteration model that allows for easy

Maxim 84 Jun 21, 2022
Go implementation of the bspatch algorithm

binarydist Package binarydist implements binary diff and patch as described on http://www.daemonology.net/bsdiff/. It reads and writes files compatibl

Keith Rarick 281 Sep 7, 2022
Frongo is a Golang package to create HTML/CSS components using only the Go language.

Frongo Frongo is a Go tool to make HTML/CSS document out of Golang code. It was designed with readability and usability in mind, so HTML objects are c

Rewan_ 21 Jul 29, 2021