Levenshtein distance and similarity metrics with customizable edit costs and Winkler-like bonus for common prefix.

Overview

A Go package for calculating the Levenshtein distance between two strings

Release GoDoc  Build Status Coverage Status Go Report Card

This package implements distance and similarity metrics for strings, based on the Levenshtein measure, in Go.

Project Status

v1.2.3 Stable: Guaranteed no breaking changes to the API in future v1.x releases. Probably safe to use in production, though provided on "AS IS" basis.

This package is being actively maintained. If you encounter any problems or have any suggestions for improvement, please open an issue. Pull requests are welcome.

Overview

The Levenshtein Distance between two strings is the minimum total cost of edits that would convert the first string into the second. The allowed edit operations are insertions, deletions, and substitutions, all at character (one UTF-8 code point) level. Each operation has a default cost of 1, but each can be assigned its own cost equal to or greater than 0.

A Distance of 0 means the two strings are identical, and the higher the value the more different the strings. Since in practice we are interested in finding if the two strings are "close enough", it often does not make sense to continue the calculation once the result is mathematically guaranteed to exceed a desired threshold. Providing this value to the Distance function allows it to take a shortcut and return a lower bound instead of an exact cost when the threshold is exceeded.

The Similarity function calculates the distance, then converts it into a normalized metric within the range 0..1, with 1 meaning the strings are identical, and 0 that they have nothing in common. A minimum similarity threshold can be provided to speed up the calculation of the metric for strings that are far too dissimilar for the purpose at hand. All values under this threshold are rounded down to 0.

The Match function provides a similarity metric, with the same range and meaning as Similarity, but with a bonus for string pairs that share a common prefix and have a similarity above a "bonus threshold". It uses the same method as proposed by Winkler for the Jaro distance, and the reasoning behind it is that these string pairs are very likely spelling variations or errors, and they are more closely linked than the edit distance alone would suggest.

The underlying Calculate function is also exported, to allow the building of other derivative metrics, if needed.

Installation

go get github.com/agext/levenshtein

License

Package levenshtein is released under the Apache 2.0 license. See the LICENSE file for details.

Issues
  • Enable go modules

    Enable go modules

    Go modules are becoming a de-facto standard in the Go community and adding go.mod files with metadata helps projects which may consume this repository as a dependency in deciding how to pin transitive dependencies and how to resolve conflicts.

    This is a result of the following commands:

    go mod init
    go get ./...
    go mod tidy
    

    in a clean Go 1.11.5 environment.

    opened by radeksimko 4
  • panic: runtime error: index out of range

    panic: runtime error: index out of range

    Hello,

    please excuse if this is not an issue with your code but as a beginner in Golang it seems to me like it could be.

    Here is the sample code I run:

    package main
    
    import "fmt"
    import "github.com/davecgh/go-spew/spew"
    import "github.com/agext/levenshtein"
    
    func main () {
    	var myparams = levenshtein.NewParams()
    	myparams.MinScore(0.92)
    	spew.Dump(myparams)
    
    	fmt.Println(levenshtein.Match(`foo`, `foo`, myparams))
    	fmt.Println(levenshtein.Match(`foo`, `bar`, myparams))
    	fmt.Println(levenshtein.Match(`NetworkManager-glib-0.8.1-5.el6_0.1.i686.rpm`, `gdm-plugin-fingerprint-2.30.4-39.el6.i686.rpm`, myparams))
    }
    

    It produces the following output:

    (*levenshtein.Params)(0xc000094100)({
     insCost: (int) 1,
     subCost: (int) 1,
     delCost: (int) 1,
     maxCost: (int) 0,
     minScore: (float64) 0.92,
     bonusPrefix: (int) 4,
     bonusScale: (float64) 0.1,
     bonusThreshold: (float64) 0.7
    })
    1
    0
    panic: runtime error: index out of range
    
    goroutine 1 [running]:
    github.com/agext/levenshtein.Calculate(0xc0000ca000, 0x24, 0x2c, 0xc0000cc000, 0x23, 0x30, 0x5, 0x1, 0x1, 0x1, ...)
    	/Users/smeier/go/src/github.com/agext/levenshtein/levenshtein.go:112 +0x8ba
    github.com/agext/levenshtein.Match(0x11036f6, 0x2c, 0x11037ff, 0x2d, 0xc000094100, 0x2)
    	/Users/smeier/go/src/github.com/agext/levenshtein/levenshtein.go:272 +0x1a8
    main.main()
    	/Users/smeier/Documents/grabbag/levenshtein_bug.go:14 +0x271
    exit status 2
    
    

    I am running go version go1.12.5 darwin/amd64.

    Any help or comments are appreciated.

    opened by stevemeier 3
  • Getting a 500 error when trying to pull this dep

    Getting a 500 error when trying to pull this dep

    OI have the following in my Gopkg.lock file:

    [[projects]]
      name = "github.com/agext/levenshtein"
      packages = ["."]
      revision = "5f10fee965225ac1eecdc234c09daf5cd9e7f7b6"
      version = "v1.2.1"
    

    Output from running dep ensure.

    Solving failure: no valid source could be created: 
            failed to set up sources from the following URLs:
    https://github.com/agext/levenshtein
    : remote repository at https://github.com/agext/levenshtein does not exist, or is inaccessible: remote: Internal Server Error.
    remote: 
    fatal: unable to access 'https://github.com/agext/levenshtein/': The requested URL returned error: 500
    : exit status 128
            failed to set up sources from the following URLs:
    ssh://[email protected]/agext/levenshtein
    : remote repository at ssh://[email protected]/agext/levenshtein does not exist, or is inaccessible: ERROR: 
    fatal: Could not read from remote repository.
    
    Please make sure you have the correct access rights
    and the repository exists.
    : exit status 128
            failed to set up sources from the following URLs:
    git://github.com/agext/levenshtein
    : remote repository at git://github.com/agext/levenshtein does not exist, or is inaccessible: fatal: remote error: 
      
    : exit status 128
            failed to set up sources from the following URLs:
    http://github.com/agext/levenshtein
    : remote repository at http://github.com/agext/levenshtein does not exist, or is inaccessible: remote: Internal Server Error.
    remote: 
    fatal: unable to access 'http://github.com/agext/levenshtein/': The requested URL returned error: 500
    : exit status 128
    
    opened by HairyMike 2
  • Getting the issue 422  while pulling github.com/mattn/goveralls

    Getting the issue 422 while pulling github.com/mattn/goveralls

    package levenshtein failing as bwlow when it tried to access github.com/mattn/goveralls https://travis-ci.com/github/asellappen/levenshtein/jobs/398770419

    Testing with goveralls... go: downloading github.com/mattn/goveralls v0.0.7 go: github.com/mattn/goveralls upgrade => v0.0.7 go: downloading golang.org/x/tools v0.0.0-20200522201501-cb1345f3a375 Bad response status from coveralls: 422 {"message":"Couldn't find a repository matching this job.","error":true} The command "./test.sh $TEST_METHOD" exited with 1.

    have tried with adding COVERALLS_TOKEN in travis.yml as below and still its failing , travis encrypt COVERALLS_TOKEN=your_token_goes_here --add env.global env: global: secure: 0wWHj9zvAqpjPMxngeVm6+WwjynAAfOwoGy1yBnwiH7rsOskuWbf7aaaOhVC65fH+wiRbmBIpGMlVXv5wrKGrK7EXxMcoXBtie0MoK2UCWRwJcYG3+e4bZ9wLe+3eBRGK/EbT8GawHrJaZxPvhg......................................

    Please help to check and provide a fix for this?

    opened by asellappen 1
Releases(v1.2.3)
Owner
AGExt
ALRUX Go Extensions - an opensource golang library aiming to save developer time and sanity :)
AGExt
Go implementation to calculate Levenshtein Distance.

levenshtein Go package to calculate the Levenshtein Distance The library is fully capable of working with non-ascii strings. But the strings are not n

Agniva De Sarker 206 Jun 20, 2022
Some algorithms in go: maxflow(min-cuts or graph-cuts), edit-distance.

Algorithms In this repository, some algorithms are implemented in go language. GoDoc link: ed maxflow About Max-flow problem: A flow network is repres

Yi Deng 14 Apr 9, 2021
Data structure and relevant algorithms for extremely fast prefix/fuzzy string searching.

Trie Data structure and relevant algorithms for extremely fast prefix/fuzzy string searching. Usage Create a Trie with: t := trie.New() Add Keys with:

Derek Parker 589 Jun 17, 2022
A prefix tree implementation in go

Trie (Prefix tree) This library is compatible with Go 1.11+ Please refer to CHANGELOG.md if you encounter breaking changes. Motivation Introduction Us

Viant, Inc 27 Jun 15, 2022
A prefix-enhanced map in Go

PrefixMap PrefixMap is a prefix-enhanced map that eases the retrieval of values based on key prefixes. Quick Start Creating a PrefixMap // creates the

Alessandro Diaferia 30 Jun 13, 2022
Graphoscope: a solution to access multiple independent data sources from a common UI and show data relations as a graph

Graphoscope A solution to access multiple independent data sources from a common UI and show data relations as a graph: Contains a list of by default

CERT.LV 29 May 26, 2022
Common data structures for solving problems in Golang

datastructs Common data structs for solving problems in Golang. List of data structures can be found in datastructs/pkg Rules for data structures Don'

Akira Masuda 1 Nov 7, 2021
Two approach for solving common items problem using Golang

Compare Two Arrays For Common Items Given two seperate arrays of integers, create a function that take two arrays and check for common itemss. Example

null 0 Nov 28, 2021
Common algorithms written in Go.

Common Algorithms in Go This repository contains a collection of a variety of common algorithms implemented using Go. Algorithms Implemented Search Li

Broden Wanner 1 Dec 28, 2021
Grokking-algorithms-go - Solutions to common Data Structures problems

This is a repository dedicated to study, learn and solve Data Structure algorith

Gabriel Magalhães 0 Apr 4, 2022
A tree like tool help you to explore data structures in your redis server

Redis-view is a tree like tool help you explore data structures in your redis server

dreamersdw 19 Mar 17, 2022
Python-like dictionaries for Go

Dict Python dictionary data type (dict) in Go Package dict is a Go implementation of Python dict, which are hashable object maps. Dictionaries complem

Mr. Gus 25 Jun 11, 2022
An yet-another red-black tree implementation, with a C++ STL-like API.

A red-black tree with an API similar to C++ STL's. INSTALLATION go get github.com/yasushi-saito/rbtree EXAMPLE More examples can be fou

Yasushi Saito 18 Apr 25, 2022
publish a tree-like data structure from a backend to a front-end

tree-publish publish a tree-like data structure from a backend to a front-end. It needs to be a tree in order to publish the data as JSON document. If

Lutz Behnke 0 Dec 20, 2021
GoStruct2Table - format your struct like a table.

GoStruct2Table format your struct like a table. Installing $ go get -u -v github.com/runningzyp/GoStruct2Table Simple Example import parser "github.c

YunpengZhan 9 Jun 14, 2022
ZSet is an in-memory Redis like sorted set datastructure

zset Getting Started Installing To start using hash, install Go and run go get: $ go get -u github.com/arriqaaq/zset This will retrieve the library. U

Farhan 4 May 16, 2022
estruct traverses javascript projects and maps all the dependencies and relationships to a JSON. the output can be used to build network visualizations of the project and document the architecture.

EStruct traverses javascript projects and maps all the dependencies and relationships to a JSON. The output can be used to build network visualizations of the project and document the architecture.

Ray Luxembourg 11 Jan 27, 2022
Go package for mapping values to and from space-filling curves, such as Hilbert and Peano curves.

Hilbert Go package for mapping values to and from space-filling curves, such as Hilbert and Peano curves. Documentation available here This is not an

Google 252 Jun 20, 2022
Decode / encode XML to/from map[string]interface{} (or JSON); extract values with dot-notation paths and wildcards. Replaces x2j and j2x packages.

mxj - to/from maps, XML and JSON Decode/encode XML to/from map[string]interface{} (or JSON) values, and extract/modify values from maps by key or key-

Charles Banning 521 Jun 28, 2022