Unified text diffing in Go (copy of the internal diffing packages the official Go language server uses)

Overview

gotextdiff - unified text diffing in Go

This is a copy of the Go text diffing packages that the official Go language server gopls uses internally to generate unified diffs.

If you've previously tried to generate unified text diffs in Go (like the ones you see in Git and on GitHub), you may have found github.com/sergi/go-diff, a Go port of Neil Fraser's google-diff-match-patch code. However, it does not support unified diffs.

This is arguably one of the best (and best-maintained) unified text diffing packages in Go, as of at least 2020.

(All credit goes to the Go authors, I am merely re-publishing their work so others can use it.)

Example usage

Import the packages:

import (
    "fmt"

    "github.com/hexops/gotextdiff"
    "github.com/hexops/gotextdiff/myers"
    "github.com/hexops/gotextdiff/span"
)

Assuming you want to diff a.txt and b.txt, whose contents are stored in aString and bString:

edits := myers.ComputeEdits(span.URIFromPath("a.txt"), aString, bString)
diff := fmt.Sprint(gotextdiff.ToUnified("a.txt", "b.txt", aString, edits))

diff will be a string like:

--- a.txt
+++ b.txt
@@ -1,13 +1,28 @@
-foo
+bar
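
The line starting with @@ is the hunk header, which records where the change sits in each file. In the conventional unified format each side is written as start,count, and the count may be omitted when it is 1. As a small illustration that is independent of gotextdiff, the sketch below parses such a header using only the standard library. The names parseHunkHeader, hunkRange, atoi and countOrOne are hypothetical helpers introduced here; they are not part of the package's API.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// hunkRange describes one side of a unified-diff hunk header.
// (Hypothetical type for illustration; not part of gotextdiff.)
type hunkRange struct {
	Start int // 1-based first line of the range
	Count int // number of lines in the range; 1 when omitted in the header
}

// hunkHeader matches headers such as "@@ -1,13 +1,28 @@" or "@@ -3 +3 @@".
var hunkHeader = regexp.MustCompile(`^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@`)

// atoi converts a string the regexp already validated as digits.
func atoi(s string) int {
	n, _ := strconv.Atoi(s)
	return n
}

// countOrOne returns 1 for an omitted count, otherwise the parsed number.
func countOrOne(s string) int {
	if s == "" {
		return 1
	}
	return atoi(s)
}

// parseHunkHeader extracts the old-file and new-file ranges from a header line.
func parseHunkHeader(line string) (oldR, newR hunkRange, err error) {
	m := hunkHeader.FindStringSubmatch(line)
	if m == nil {
		return oldR, newR, fmt.Errorf("not a hunk header: %q", line)
	}
	oldR = hunkRange{Start: atoi(m[1]), Count: countOrOne(m[2])}
	newR = hunkRange{Start: atoi(m[3]), Count: countOrOne(m[4])}
	return oldR, newR, nil
}

func main() {
	oldR, newR, err := parseHunkHeader("@@ -1,13 +1,28 @@")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("old file: %d lines starting at line %d\n", oldR.Count, oldR.Start)
	fmt.Printf("new file: %d lines starting at line %d\n", newR.Count, newR.Start)
}
```

Read this way, the header in the sample output says the hunk covers 13 lines of a.txt and 28 lines of b.txt, both starting at line 1.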

API compatibility

We will publish a new major version any time the API changes in a backwards-incompatible way. Because the upstream is not developed with public use of these packages in mind, API breakages may occur more often than in other Go packages (but you can always continue using the old version thanks to Go modules).

Alternatives

Contributing

We will only accept changes made upstream, so please send any contributions to the upstream instead. Compared to the upstream, only import paths are modified (made non-internal so they are importable). The only thing we add here is this README.

License

See https://github.com/golang/tools/blob/master/LICENSE

Issues
  • used extremely large amount of memory

    Just comparing two Solaris pkginfo outputs used about 1.5Gb of memory (attached).

    	edits := myers.ComputeEdits(span.URIFromPath(""), PrevContent, Content)
    	unifiedPatch := gotextdiff.ToUnified("src", "dst", PrevContent, edits)
    	var DiffContent = fmt.Sprint(unifiedPatch)
    
    	fmt.Printf("!!!! len(DiffContent):%+v\n", len(DiffContent))
    
    /usr/bin/time -v go run scratch.go
    !!!! len(DiffContent):1152378
    	Command being timed: "go run scratch.go"
    	User time (seconds): 2.07
    	System time (seconds): 2.51
    	Percent of CPU this job got: 102%
    	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.46
    	Average shared text size (kbytes): 0
    	Average unshared data size (kbytes): 0
    	Average stack size (kbytes): 0
    	Average total size (kbytes): 0
    	Maximum resident set size (kbytes): 15095320
    	Average resident set size (kbytes): 0
    	Major (requiring I/O) page faults: 5
    	Minor (reclaiming a frame) page faults: 3793204
    	Voluntary context switches: 3058
    	Involuntary context switches: 434
    	Swaps: 0
    	File system inputs: 0
    	File system outputs: 5640
    	Socket messages sent: 0
    	Socket messages received: 0
    	Signals delivered: 0
    	Page size (bytes): 4096
    	Exit status: 0
    

    solaris-sw-pkginfo.txt prev-pkginfo.txt

    opened by siff-duke 0
  • Merge algorithm is inadequate

    Hello,

    I'm curious if a better merge algorithm would be considered in-scope. Right now the patch always applies to the same line position regardless of context. When I found this package I was looking for something with more similar behaviour to Git, but I wasn't able to find such a thing and will need to implement it myself.

    opened by KernelDeimos 0
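
For readers wondering what "applying regardless of context" means in the issue above: patch tools that tolerate drift typically look for the hunk's old-side lines near the recorded position instead of trusting the line number alone. The following is a minimal sketch of that search, assuming a simple exact-match comparison of lines. It is not part of gotextdiff and is not how Git is implemented; findHunk, matchesAt and maxOffset are hypothetical names chosen for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesAt reports whether want occurs in lines starting at index pos.
func matchesAt(lines, want []string, pos int) bool {
	if pos < 0 || pos+len(want) > len(lines) {
		return false
	}
	for i, w := range want {
		if lines[pos+i] != w {
			return false
		}
	}
	return true
}

// findHunk returns the 0-based index at which the hunk's old-side lines
// occur in lines. It tries the expected position first, then positions at
// increasing distance (before, then after) up to maxOffset. It returns -1
// when no position within that window matches.
func findHunk(lines, oldSide []string, expected, maxOffset int) int {
	for d := 0; d <= maxOffset; d++ {
		if matchesAt(lines, oldSide, expected-d) {
			return expected - d
		}
		if matchesAt(lines, oldSide, expected+d) {
			return expected + d
		}
	}
	return -1
}

func main() {
	// The target gained two lines at the top after the hunk was produced,
	// so lines the hunk recorded at index 1 now sit further down.
	target := strings.Split("new1\nnew2\na\nb\nc\nd", "\n")
	oldSide := []string{"b", "c"}

	pos := findHunk(target, oldSide, 1, 3)
	if pos < 0 {
		fmt.Println("hunk does not apply within the search window")
		return
	}
	fmt.Println("hunk applies at index", pos)
}
```

A caller would run this once per hunk and then splice in the new-side lines at the returned index, rejecting the hunk when the result is -1.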
Owner
Hexops