Useful Go String methods

Overview

Go-string

Useful string utility functions for Go projects. They are included either because they are faster than the common Go approach or because they do not exist in the standard library.

Full details are at https://pkg.go.dev/github.com/boyter/go-string

Probably the most useful functions are IndexAll and IndexAllIgnoreCase. For string literal searches they should be drop-in replacements for regexp.FindAllIndex, and because they avoid the regular expression engine entirely they are much faster.

The quick benchmarks below come from a simple program that opens a 550MB file and searches over it in memory. Each search is run three times, first using regexp.FindAllIndex and then using IndexAllIgnoreCase.

For this specific example the wall clock time is at least 10x lower, with the same matching results.

$ ./csperf ſecret 550MB
File length 576683100

FindAllIndex (regex ignore case)
Scan took 25.403231773s 16680
Scan took 25.39742299s 16680
Scan took 25.227218738s 16680

IndexAllIgnoreCase (custom)
Scan took 2.04013314s 16680
Scan took 2.019360935s 16680
Scan took 1.996732171s 16680

The code for the above example, for you to copy:

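package main

import (
	"fmt"
	"os"
	"regexp"
	"time"

	str "github.com/boyter/go-string"
)

// Simple test comparison between various search methods
func main() {
	if len(os.Args) < 3 {
		fmt.Println("usage: csperf <term> <file>")
		return
	}
	arg1 := os.Args[1] // search term
	arg2 := os.Args[2] // path of the file to search

	b, err := os.ReadFile(arg2)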
	if err != nil {
		fmt.Print(err)
		return
	}

	fmt.Println("File length", len(b))

	haystack := string(b)

	var start time.Time
	var elapsed time.Duration

	fmt.Println("\nFindAllIndex (regex)")
	r := regexp.MustCompile(regexp.QuoteMeta(arg1))
	for i := 0; i < 3; i++ {
		start = time.Now()
		all := r.FindAllIndex(b, -1)
		elapsed = time.Since(start)
		fmt.Println("Scan took", elapsed, len(all))
	}

	fmt.Println("\nIndexAll (custom)")
	for i := 0; i < 3; i++ {
		start = time.Now()
		all := str.IndexAll(haystack, arg1, -1)
		elapsed = time.Since(start)
		fmt.Println("Scan took", elapsed, len(all))
	}

	r = regexp.MustCompile(`(?i)` + regexp.QuoteMeta(arg1))
	fmt.Println("\nFindAllIndex (regex ignore case)")
	for i := 0; i < 3; i++ {
		start = time.Now()
		all := r.FindAllIndex(b, -1)
		elapsed = time.Since(start)
		fmt.Println("Scan took", elapsed, len(all))
	}

	fmt.Println("\nIndexAllIgnoreCase (custom)")
	for i := 0; i < 3; i++ {
		start = time.Now()
		all := str.IndexAllIgnoreCase(haystack, arg1, -1)
		elapsed = time.Since(start)
		fmt.Println("Scan took", elapsed, len(all))
	}
}

Note that it performs best with real documents and worst when searching over random data. Depending on what you are searching, you may see a similar speed up or only a marginal one.

IndexAll shows a similar speed up over FindAllIndex:

// BenchmarkFindAllIndex-8                         2458844	       480.0 ns/op
// BenchmarkIndexAll-8                            14819680	        79.6 ns/op

See the benchmarks for the full details; they cover various edge cases.

The other most useful method is HighlightString. HighlightString takes in some content and locations and then inserts in/out strings which can be used for highlighting around matching terms. For example you could pass in "test" and have it return "<strong>te</strong>st". The argument locations accepts output from regexp.FindAllIndex or the included IndexAllIgnoreCase or IndexAll.
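The "test" example above can be sketched with the standard library alone. The function highlightSketch below is hypothetical and only illustrates the idea of wrapping each location with in/out strings; it is not the library's HighlightString, and the way it skips invalid or overlapping locations is a simplifying assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// highlightSketch is a hypothetical illustration, not the library's code.
// It wraps each [start, end) byte range in content with the in and out
// strings. Invalid ranges and ranges overlapping an earlier one are skipped,
// which is an assumption made to keep the sketch short.
func highlightSketch(content string, locations [][]int, in, out string) string {
	// Keep only well-formed ranges that fit inside content.
	locs := make([][]int, 0, len(locations))
	for _, l := range locations {
		if len(l) == 2 && l[0] >= 0 && l[0] < l[1] && l[1] <= len(content) {
			locs = append(locs, l)
		}
	}
	// Process ranges from left to right.
	sort.Slice(locs, func(i, j int) bool { return locs[i][0] < locs[j][0] })

	var sb strings.Builder
	pos := 0
	for _, l := range locs {
		if l[0] < pos {
			continue // overlaps a range that was already highlighted
		}
		sb.WriteString(content[pos:l[0]])
		sb.WriteString(in)
		sb.WriteString(content[l[0]:l[1]])
		sb.WriteString(out)
		pos = l[1]
	}
	sb.WriteString(content[pos:])
	return sb.String()
}

func main() {
	// The location {0, 2} covers the "te" in "test".
	fmt.Println(highlightSketch("test", [][]int{{0, 2}}, "<strong>", "</strong>"))
}
```

The locations argument uses the same [][]int shape that regexp.FindAllIndex returns, so the output of a search can be passed straight through.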

All code is dual-licenced as either MIT or Unlicence. Your choice when you use it.

Note that as an Australian I cannot put this into the public domain, hence the choice of the most liberal licences I could find.


Comments
  • Limit doesn't work how it does in re.FindAllIndex

    The way you stop searching when limit is reached:

    https://github.com/boyter/go-string/blob/2ecd9cca2b2d28df28fd5f4f3609f4acaee8ce53/index.go#L182

    ...means that you can't return the same results as re.FindAllIndex(): it returns the first N matches, whereas you return the first N matches over the first set of case permutations.

    You need to search for up to limit results for each case permutation and then return the first N matches from that set of results.

    bug 
    opened by james-antill 2