Small and fast FTS (full text search)

Overview

Microfts

A small full text indexing and search tool focusing on speed and space. Initial tests seem to indicate that the database takes about twice as much space as the files it indexes.

Microfts implements a trigram GIN (generalized inverted index) and stores it in LMDB, an open source, embedded NoSQL key-value store library (so it’s linked into microfts, not an external service). It uses AskAlexSharov’s fork of bmatsuo’s lmdb-go package to connect to LMDB.

LICENSE

Microfts is MIT licensed, (c) 2020 Bill Burdick. All rights reserved.

Building

Note that building may generate warning messages from lmdb-go’s compilation of the LMDB C code.

go build -o microfts

Examples

Creating a database

./microfts create /tmp/bubba

Adding Text

This recreates the database in /tmp/bubba and adds /tmp/tst to it:

rm -rf /tmp/bubba
./microfts create /tmp/bubba
cat > /tmp/tst <<here
one
two three
four
four five
one two three
one three two
here
./microfts input -file /tmp/bubba /tmp/tst

Getting Info

./microfts info /tmp/bubba

Searching

./microfts search /tmp/bubba "one two"

Deleting a file’s information

./microfts delete /tmp/bubba /tmp/tst

Reclaiming space in the database (only really matters after deleting a large file)

./microfts compact /tmp/bubba

Finding grams for a string

./microfts grams "this is a test"
./microfts grams -gx "this is a test"

Finding candidates for grams

./microfts search -candidates -grams /tmp/bubba thi tes est

Usage

Exit Codes

  1. misc error
  2. file is missing
  3. file has changed
  4. file is unreadable
  5. no entry for file in database
  6. database missing
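
A program that shells out to microfts can turn these codes into messages. The Go sketch below assumes that the list numbers above are the process exit codes; the names `exitMessages`, `describeExit`, and `exitCodeOf` are illustrative and are not part of microfts.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// exitMessages mirrors the exit code list above.
// Assumption: the list numbers are the process exit codes.
var exitMessages = map[int]string{
	1: "misc error",
	2: "file is missing",
	3: "file has changed",
	4: "file is unreadable",
	5: "no entry for file in database",
	6: "database missing",
}

// describeExit returns a readable message for a microfts exit code.
func describeExit(code int) string {
	if code == 0 {
		return "success"
	}
	if msg, ok := exitMessages[code]; ok {
		return msg
	}
	return fmt.Sprintf("unknown exit code %d", code)
}

// exitCodeOf extracts the exit code from the error returned by (*exec.Cmd).Run.
// It returns 0 for a nil error and -1 if the process could not be started.
func exitCodeOf(err error) int {
	if err == nil {
		return 0
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode()
	}
	return -1
}

func main() {
	// Hypothetical invocation; adjust the binary and database paths.
	err := exec.Command("./microfts", "info", "/tmp/bubba").Run()
	code := exitCodeOf(err)
	if code < 0 {
		fmt.Println("could not run microfts:", err)
		return
	}
	fmt.Println(describeExit(code))
}
```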

Help text

Usage:
   microfts info -groups DB
                   print information about each group in the database,
                   whether it is missing or changed
                   whether it is an org-mode entry
   microfts info [-chunks] DB GROUP
                   print info for a GROUP
                   -chunks also prints the chunks in GROUP if it has a corresponding file
   microfts info [-grams] DB
                   print info for database
                   displays any groups which do not exist as files
                   displays any groups which refer to files that have changed
                   -grams displays distribution information about the trigram index
   microfts create [-s GRAMSIZE] DB
                   create DATABASE if it does not exist
   microfts chunk [-nx | -data D | -dx] -d DELIM DB GROUP GRAMS
   microfts chunk [-nx | -data D | -dx] -gx DB GROUP GRAMS
                   ADD a chunk to GROUP with GRAMS.
                   -d means use DELIM to split GRAMS.
                   -gx means GRAMS is hex encoded with two bytes for each gram using base 37.
   microfts grams [-gx] CHUNK
                   output grams for CHUNK
   microfts input [-nx | -dx | -org] DB FILE...
                   For each FILE, create a group with its name and add a CHUNK for each chunk of input.
                   Chunk data is the line number, offset, and length for each chunk (starting at 1).
                   -org means chunks are org elements, otherwise chunks are lines
   microfts delete [-nx] DB GROUP
                   delete GROUP, its chunks, and tag entries.
                   NOTE: THIS DOES NOT RECLAIM SPACE! USE COMPACT FOR THAT
   microfts compact DB
                   Reclaim space for deleted groups
   microfts search [-n | -partial | -f | -limit N | -filter REGEXP | -u] DB TEXT
                   query with TEXT for objects
                   -f force search to skip changed and missing files instead of exiting
                   -filter makes search only return chunks that match the REGEXP
                   REGEXP syntax is here: https://golang.org/pkg/regexp/syntax/
   microfts search -candidates [-grams | -gx | -gd | -n | -f | -limit N | -dx | -u] DB TERM1 ...
                   display all candidates with the grams for TERMS without filtering
                   -grams indicates TERMS are grams, otherwise extract grams from TERMS
                   -gx: grams are in hex, -gd: grams are in decimal, otherwise they are 3-char strings
   microfts data [-nx | -dx] DB GROUP
                   get data for each doc in GROUP
   microfts update [-t] DB
                   reinput files that have changed
                   delete files that have been removed
                   -t means do a test run, printing what would have happened
   microfts empty DB GROUP...
                   Create empty GROUPs, ignoring existing ones

   microfts is targeted for groups of small documents, like lines in a file.

  -candidates
        return docs with grams for search
  -chunks
        info DB GROUP: display all of a group's chunks
  -comp string
        compression type to use when creating a database
  -d string
        delimiter for unicode tags (default ",")
  -data string
        data to define for object
  -dx
        use hex instead of unicode for object data
  -end-format string
        search: Go format string for the end of a group
        Arg to printf is the FILE
        The default value is ""
        if -sexp is provided and -end-format is not, the default is "\n"
        Not used with search -fuzzy -sort
  -f    search: skip changed and missing files instead of exiting
  -file
        search: display files rather than chunks
  -filter string
        search: filter results that match REGEXP
  -format string
        search: Go format string for each result
        Args to printf are FILE POSITION LINE OFFSET PERCENTAGE CHUNK
        FILE (string) is the name of the file
        POSITION (int) is the 1-based character position of the chunk in the file
        LINE (int) is the 1-based line of the chunk in the file
        OFFSET (int) is the 0-based offset of the first match in the chunk
        PERCENTAGE (float) is the percentage of a fuzzy match
        Note that you can place [ARGNUM] after the % to pick a particular arg to format
        The default format is %s:%[2]s:%[5]s\n
        -sexp sets format to (:filename "%s" :line %[3]d :offset %[4]d :text "%[6]s" :percent %[5]f)
          Note that this will cause all matches to be on one (potentially large) line of output (default "%[6]s:%[2]d:%[5]s\n")
  -fuzzy float
        search: specify a percentage fuzzy match
  -gd
        use decimal instead of unicode for grams
  -grams
        get: specify tags instead of text
        info: print gram coverage
        search: specify grams instead of search terms
  -groups
        info: display information for each group
  -gx
        use hex instead of unicode for grams
  -limit int
        search: limit the number of results (default 9223372036854775807)
  -n    only print line numbers for search
  -org
        index org-mode chunks instead of lines
  -partial
        search: allow partial matches in search
  -prof
        profile cpu
  -s int
        gram size
  -sep
        print candidates on separate lines
  -sexp
        search: output matches as an s-expression ((FILE (POS LINE OFFSET chunk) ... ) ... )
        POS is the 1-based character position of the chunk in the file
        LINE is the 1-based line of the chunk in the file
        OFFSET is the 0-based offset of the first match in the chunk
  -sort
        search -fuzzy: sort all matches
        This ignores start-format and end-format because it sorts all matches, regardless of
        which file they come from.
  -start-format string
        search: Go format string for the start of a group
        Arg to printf is the FILE
        The default value is ""
        Not used with search -fuzzy -sort
  -t    update: do a test run, printing what would have happened
  -u    search: update the database before searching
  -v    verbose
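
The -format, -start-format, and -end-format values are ordinary Go format strings, so the [ARGNUM] notation mentioned under -format is Go's explicit argument index. The sketch below applies the -sexp format string quoted in the help text to made-up values, passing the six arguments in the documented order; `formatResult` is a hypothetical helper, not a microfts function.

```go
package main

import "fmt"

// sexpFormat is the format string that the help text lists for -sexp.
const sexpFormat = `(:filename "%s" :line %[3]d :offset %[4]d :text "%[6]s" :percent %[5]f)`

// formatResult applies a -format style string to the six documented
// arguments: FILE POSITION LINE OFFSET PERCENTAGE CHUNK.
func formatResult(format, file string, position, line, offset int, percentage float64, chunk string) string {
	return fmt.Sprintf(format, file, position, line, offset, percentage, chunk)
}

func main() {
	// Made-up values, for illustration only.
	fmt.Println(formatResult(sexpFormat, "/tmp/tst", 5, 2, 0, 100, "two three"))
}
```

An index such as %[3]d selects the third argument no matter where the verb appears, which is why a format can reorder or skip fields.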

Notes

Grams

Only alphanumeric characters are represented faithfully in grams; all other characters are treated as whitespace and display as ‘.’. Each position in a gram therefore has 37 possible values (whitespace, 0-9, and A-Z), which makes a gram a base-37 triple. There are 37^3 = 50653 such triples, which just fits into 2 bytes (65536 values), and that is a big deal spacewise. Grams for starts of words begin with two whitespaces and grams for ends of words end with one whitespace. There are no grams that end with two whitespaces.
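
As a rough illustration of the packing, here is a minimal Go sketch. It rests on assumptions that the text above does not confirm: whitespace is digit 0, '0'-'9' are digits 1-10, letters are digits 11-36 regardless of case, and a word is padded with two whitespaces in front and one behind before a 3-character window slides over it. The function names are hypothetical, and the sketch handles ASCII input only.

```go
package main

import "fmt"

// gramDigit maps a byte to a base-37 digit.
// Assumption: any non-alphanumeric byte counts as whitespace (0), '0'-'9'
// are 1-10, and letters are 11-36 regardless of case. The ordering that
// microfts really uses may differ.
func gramDigit(c byte) uint16 {
	switch {
	case c >= '0' && c <= '9':
		return uint16(c-'0') + 1
	case c >= 'a' && c <= 'z':
		return uint16(c-'a') + 11
	case c >= 'A' && c <= 'Z':
		return uint16(c-'A') + 11
	default:
		return 0
	}
}

// encodeGram packs three characters into one 16-bit value.
// The largest value is 36*37*37 + 36*37 + 36 = 50652, so it fits in 2 bytes.
func encodeGram(a, b, c byte) uint16 {
	return (gramDigit(a)*37+gramDigit(b))*37 + gramDigit(c)
}

// wordGrams lists the trigrams of a single ASCII word, assuming two
// whitespaces of padding at the start and one at the end.
func wordGrams(word string) []string {
	if word == "" {
		return nil
	}
	padded := "  " + word + " "
	grams := make([]string, 0, len(padded)-2)
	for i := 0; i+3 <= len(padded); i++ {
		grams = append(grams, padded[i:i+3])
	}
	return grams
}

func main() {
	for _, g := range wordGrams("one") {
		fmt.Printf("%q -> %d\n", g, encodeGram(g[0], g[1], g[2]))
	}
}
```

Under these assumptions a gram of three whitespaces would encode to 0, which never occurs because no gram ends with two whitespaces.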

Groups and chunks

The index consists of grams for chunks that belong to groups. Groups have names and the default is to use file names as group names.

Supported groups and chunks

Microfts supports using file names as groups and splitting files into chunks either by line or by org-mode element, with the chunk data being a triple of line, offset, chunk-length. Searching finds candidate chunks by intersecting gram entries and then consults the files named by the groups for the actual content.
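
The candidate step can be pictured as an intersection of sorted ID lists. The Go sketch below is not the microfts implementation: the in-memory `index` map stands in for the gram tree, and the gram keys and chunk IDs are invented for illustration.

```go
package main

import "fmt"

// intersect returns the values present in both sorted, duplicate-free lists.
func intersect(a, b []uint64) []uint64 {
	var out []uint64
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

// candidates intersects the chunk ID lists of every gram in the query.
// index is a hypothetical in-memory stand-in for the gram tree.
func candidates(index map[string][]uint64, grams []string) []uint64 {
	if len(grams) == 0 {
		return nil
	}
	result := index[grams[0]]
	for _, g := range grams[1:] {
		result = intersect(result, index[g])
		if len(result) == 0 {
			return nil
		}
	}
	return result
}

func main() {
	// Invented data: each gram maps to the IDs of the chunks that contain it.
	index := map[string][]uint64{
		"one": {1, 5, 6},
		"two": {2, 5, 6},
	}
	fmt.Println(candidates(index, []string{"one", "two"})) // prints [5 6]
}
```

A chunk that contains every gram is still only a candidate, which is why the search then reads the file to check the actual text.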

Custom groups and chunks

If this is not sufficient, the command also supports custom usage: you can add chunks to a group, specifying data and grams. Searching can return candidate chunks for a set of grams.

Compressed representation for unsigned integers (lexicographically orderable)

Bits     Range                                        Encoding
7 bits   0 - 127                                      0xxxxxxx
12 bits  128 - 4095                                   1000xxxx X
20 bits  4096 - 1048575                               1001xxxx X X
28 bits  1048576 - 268435455                          1010xxxx X X X
36 bits  268435456 - 68719476735                      1011xxxx X X X X
44 bits  68719476736 - 17592186044415                 1100xxxx X X X X X
52 bits  17592186044416 - 4503599627370495            1101xxxx X X X X X X
60 bits  4503599627370496 - 1152921504606846975       1110xxxx X X X X X X X
64 bits  1152921504606846976 - 18446744073709551615   1111---- X X X X X X X X

LMDB Trees

Grams: GRAM -> BLOCK

GRAM is a 2-byte value

OID LIST

OID LISTS

9 lists of oids: [9][]byte.

Note – this is probably too ornate and a simple byte array and a count might have the same performance and space.

# 1-byte OIDS
# 2-byte OIDS
# 3-byte OIDS
# 4-byte OIDS
# 5-byte OIDS
# 6-byte OIDS
# 7-byte OIDS
# 8-byte OIDS
# 9-byte OIDS
OIDS

Gram 0 holds the info since 0 is not a legal gram

next unused oid
next unused gid
free oids
free gids

Chunks: OID -> BLOCK

OIDS are compressed integers

GID
data (e.g. line number)
gram count

Groups: GID -> BLOCK

GIDS are compressed integers

NAME
oid count
last changed timestamp
validity (valid = 0, deleted = 1)
org flag (whether -org was used)

Group Names: NAME -> GID

Comments
  • runtime error on certain org files (minimal example included)

    I get a runtime error on certain org files. A minimal org file that shows the error is the following:

    #+STARTUP: hidestars

    • Reading ** Q
      • a

    And here is the backtrace

    $ ./microfts input -org /tmp/db.db /tmp/a.org
    panic: runtime error: slice bounds out of range [1:0]

    goroutine 1 [running]:
    main.orgPart(0x24, 0xc000020180, 0x2b, 0x23, 0xc00002019f, 0x4)
            /home/dicosmo/code/microfts/fulltext.go:144 +0x5cf
    main.forParts(0xc000020180, 0x2b, 0xc000143bc0)
            /home/dicosmo/code/microfts/fulltext.go:108 +0xf7
    main.(*lmdbConfigStruct).indexOrg(0x665de0, 0x7ffceb819254, 0xa)
            /home/dicosmo/code/microfts/fts-lmdb.go:554 +0x18f
    main.(*lmdbConfigStruct).index(0x665de0, 0x7ffceb819254, 0xa)
            /home/dicosmo/code/microfts/fts-lmdb.go:527 +0x48
    main.cmdInput.func1()
            /home/dicosmo/code/microfts/fts-lmdb.go:518 +0x4f
    main.(*lmdbConfigStruct).update.func1.1()
            /home/dicosmo/code/microfts/fts-lmdb.go:1682 +0x2f
    main.(*lmdbConfigStruct).runTxn(0x665de0, 0xc00002cc40, 0x0, 0xc000143d08)
            /home/dicosmo/code/microfts/fts-lmdb.go:1720 +0x1d0
    main.(*lmdbConfigStruct).update.func1(0xc00002cc40, 0xc00002cc40, 0x0)
            /home/dicosmo/code/microfts/fts-lmdb.go:1681 +0x72
    github.com/AskAlexSharov/lmdb-go/lmdb.(*Txn).runOpTerm(0xc00002cc40, 0xc000143df0, 0x0, 0x0)
            /home/dicosmo/go/pkg/mod/github.com/!ask!alex!sharov/[email protected]/lmdb/txn.go:158 +0x6e
    github.com/AskAlexSharov/lmdb-go/lmdb.(*Env).run(0xc00007a6f0, 0x1, 0x0, 0xc000143df0, 0x0, 0x0)
            /home/dicosmo/go/pkg/mod/github.com/!ask!alex!sharov/[email protected]/lmdb/env.go:515 +0xbb
    github.com/AskAlexSharov/lmdb-go/lmdb.(*Env).Update(...)
            /home/dicosmo/go/pkg/mod/github.com/!ask!alex!sharov/[email protected]/lmdb/env.go:482
    main.(*lmdbConfigStruct).update(0x665de0, 0xc000143e70)
            /home/dicosmo/code/microfts/fts-lmdb.go:1680 +0x77
    main.cmdInput(0x665de0)
            /home/dicosmo/code/microfts/fts-lmdb.go:516 +0x19e
    main.runLmdb(0xc000070180)
            /home/dicosmo/code/microfts/fts-lmdb.go:210 +0x3be
    main.main()
            /home/dicosmo/code/microfts/fulltext.go:357 +0xc6f

    opened by rdicosmo 5
  • Move Ivy support to its own package.

    It's better to implement as much as possible using Emacs's default completion interface. Code specific to Ivy (or any other completion framework) should be its own package, e.g. ivy-microfts or helm-microfts.

    opened by progfolio 3
  • multiple search terms only match when they are in the same line

    This may not be a bug. I use the default microfts input with no flags to index the files. The file one.org contains the words "example" and "hashtag", but not on the same line. The first two searches below work because both words are on a single line. The last one returns nothing, which was a surprise to me.

    ./microfts search -u ../cache/fts.db one example
    

    #+RESULTS: : /Users/jkitchin/Dropbox/emacs/microfts/examples/one.org:3:This is the first example with a one in it.

    ./microfts search -u ../cache/fts.db one hashtag
    

    #+RESULTS: : /Users/jkitchin/Dropbox/emacs/microfts/examples/one.org:7:And #one hashtag.

    ./microfts search -u ../cache/fts.db example hashtag
    
    enhancement 
    opened by jkitchin 3
  • does search -partial work?

    I have a db setup so that this command

    ./microfts search ../cache/fts.db hashtag
    

    yields this result : /Users/jkitchin/Dropbox/emacs/microfts/examples/one.org:7:And #one hashtag.

    But this command yields no result

    ./microfts search ../cache/fts.db -partial hash
    

    I thought it would give me the same match. Is this expected for -partial?

    bug 
    opened by jkitchin 3
  • Run checkdoc/package-lint

    Both of these tools will help bring your library in line with Elisp conventions.

    checkdoc reports 20 errors:
     org-fts.el    25     info            White space found at end of line (emacs-lisp-checkdoc)
     org-fts.el    40     info            First sentence should end with punctuation (emacs-lisp-checkdoc)
     org-fts.el    45     info            First sentence should end with punctuation (emacs-lisp-checkdoc)
     org-fts.el    50     info            First sentence should end with punctuation (emacs-lisp-checkdoc)
     org-fts.el    60     info            First sentence should end with punctuation (emacs-lisp-checkdoc)
     org-fts.el    60     info            First line should be capitalized (emacs-lisp-checkdoc)
     org-fts.el    69     info            First sentence should end with punctuation (emacs-lisp-checkdoc)
     org-fts.el    69     info            First line should be capitalized (emacs-lisp-checkdoc)
     org-fts.el    69     info            Argument ‘item’ should appear (as ITEM) in the doc string (emacs-lisp-checkdoc)
     org-fts.el    76     info            First sentence should end with punctuation (emacs-lisp-checkdoc)
     org-fts.el    76     info            Lisp symbol ‘org-mode’ should appear in quotes (emacs-lisp-checkdoc)
     org-fts.el    86     info            First sentence should end with punctuation (emacs-lisp-checkdoc)
     org-fts.el    86     info            Lisp symbol ‘org-mode’ should appear in quotes (emacs-lisp-checkdoc)
     org-fts.el   103     info            First sentence should end with punctuation (emacs-lisp-checkdoc)
     org-fts.el   115     info            All variables and subroutines might as well have a documentation string (emacs-lisp-checkdoc)
     org-fts.el   136     info            All variables and subroutines might as well have a documentation string (emacs-lisp-checkdoc)
     org-fts.el   146     info            All variables and subroutines might as well have a documentation string (emacs-lisp-checkdoc)
     org-fts.el   163     info            First sentence should end with punctuation (emacs-lisp-checkdoc)
     org-fts.el   175     info            First sentence should end with punctuation (emacs-lisp-checkdoc)
     org-fts.el   192     info            First sentence should end with punctuation (emacs-lisp-checkdoc)
    
    46 package-lint issues found:
    46 issues found:
    
    1:71: warning: You should depend on (emacs "24.1") if you need lexical-binding.
    8:0: error: Expected (package-name "version-num"), but found cl-lib.
    8:0: error: Expected (package-name "version-num"), but found executable.
    8:0: error: Expected (package-name "version-num"), but found ivy.
    8:0: error: Expected (package-name "version-num"), but found org.
    8:0: error: Expected (package-name "version-num"), but found package.
    11:0: error: Package should have a non-empty ;;; Commentary section.
    15:10: error: You should depend on (emacs "24.3") or the cl-lib package if you need `cl-lib'.
    18:10: error: You should depend on (emacs "24.1") if you need `package'.
    25:0: error: `org-fts/microfts-url-alist' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    30:0: error: `org-fts/baseprogram-alist' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    35:0: error: `org-fts/baseprogram' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    39:0: error: `org-fts/program' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    44:0: error: `org-fts/db' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    49:0: error: `org-fts/search-args' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    54:0: error: `org-fts/hits' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    55:0: error: `org-fts/args' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    56:0: error: `org-fts/timer' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    57:0: error: `org-fts/actual-program' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    59:0: error: `org-fts/check-db' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    68:0: error: `org-fts/test' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    75:0: error: `org-fts/save-hook' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    85:0: error: `org-fts/open-hook' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    102:0: error: `org-fts/idle-task' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    114:0: error: `org-fts/microfts-search' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    131:47: error: You should depend on (emacs "24.3") if you need `file-name-base'.
    133:14: warning: Closing parens should not be wrapped onto new lines.
    135:0: error: `org-fts/found' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    139:38: error: You should depend on (emacs "27.1") if you need `org-show-all'.
    142:0: error: `org-fts/history' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    143:0: error: `org-fts/file-history' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    145:0: error: `org-fts/display' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    146:18: error: You should depend on (emacs "24.3") or the cl-lib package if you need `cl-search'.
    147:18: error: You should depend on (emacs "24.3") or the cl-lib package if you need `cl-search'.
    152:5: error: You should depend on (emacs "24.3") or the cl-lib package if you need `cl-do'.
    152:18: error: You should depend on (emacs "24.3") or the cl-lib package if you need `cl-incf'.
    154:9: error: You should depend on (emacs "24.3") or the cl-lib package if you need `cl-do'.
    154:20: error: You should depend on (emacs "24.3") or the cl-lib package if you need `cl-search'.
    154:55: error: You should depend on (emacs "24.3") or the cl-lib package if you need `cl-search'.
    162:0: error: `org-fts/search' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    174:0: error: `org-fts/find-org-file' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    184:20: error: You should depend on (emacs "25.1") or the seq package if you need `seq-filter'.
    185:37: error: You should depend on (emacs "25.1") or the seq package if you need `seq-sort'.
    186:49: error: You should depend on (emacs "25.1") if you need `string-collate-lessp'.
    191:0: error: `org-fts/ensure-binary' contains a non-standard separator `/', use hyphens instead (see Elisp Coding Conventions).
    217:20: error: You should depend on (emacs "24.4") if you need `zlib-decompress-region'.
    
    opened by progfolio 2
  • Feature request - one line per search result for completion tools in Emacs

    To use asynchronous search with completion tools like ivy/helm, we need the search command to output one complete line per result. I think you want something like:

    (:filename path :line line-number :offset integer :text "matched chunk" :percent float)

    I used a plist there, but other forms would work too, as long as you can "read" them in emacs.

    for each match. Then you can use a transformer function in ivy to convert that to what you want to do completion on.

    Note this is already supported in the vanilla search output which does output a line per result, but you have to parse that line to use it, and that isn't as reliable as using read in emacs. Also, it only outputs the line number so you can't jump to the match with offset.

    enhancement 
    opened by jkitchin 1
  • DB never updated

    I'm having the problem, that the DB is never automatically updated. I can create and update the DB from the command line, but the function never gets called when I open and/or save an org-file in emacs.

    opened by Aquan1412 0
  • missing input files generate a panic

    Hi, If /tmp/tst is missing (I had created /tmp/test by mistake, following your README), I get a panic. The panic message explains the error, but then it is followed by a long backtrace (truncated below).

    ./microfts input -file /tmp/bubba /tmp/tst
    panic: Error: stat /tmp/tst: no such file or directory, args: [/tmp/bubba /tmp/tst]
    
    goroutine 1 [running]:
    main.check(0x59fe60, 0xc00007a720)
            /home/bill/work/microfts/fulltext.go:205 +0x150
    main.(*lmdbConfigStruct).openInputFile(0x686980, 0x7fff9d4de550, 0x8, 0xc00013baa0, 0x4e740c, 0x7f9cec003ffb, 0x4)
            /home/bill/work/microfts/fts-lmdb.go:535 +0x79
    main.(*lmdbConfigStruct).indexLines(0x686980, 0x7fff9d4de550, 0x8)
            /home/bill/work/microfts/fts-lmdb.go:579 +0x5d
    main.(*lmdbConfigStruct).index(0x686980, 0x7fff9d4de550, 0x8)
            /home/bill/work/microfts/fts-lmdb.go:529 +0x6f
    main.cmdInput.func1()
    
    opened by sje30 1
  • Searching just for org headings?

    Hi, thanks for this neat project.

    Is there a way to restrict a search just for org mode headings rather than the entire file? I saw in #6 mention of searching by headlines?

    Thanks.

    opened by sje30 2
  • microfts doesn't handle org source blocks

    The org parser/chunker in microfts seems to get confused by source blocks. When searching for a term that appears in a file with source blocks, the search UI displays just one huge line with the first source block for the org file (and none of the other matches). The screen shot shows a situation, where many lines in the file match "sphinx". In this case, the source block doesn't even have a match for the search term (it looks like the org parser squeezed all text into the source block chunk).

    [Screenshot: Bildschirmfoto 2021-01-16 um 14 44 14]

    A simple workaround obviously is to remove -org from org-fts-input-args, which I did (and it's still very useful then, but not quite what you intended, I guess).

    Here is a shell transcript of reproducing this from the command line:

    $ THIS IS WRONG
    $ rm org-fts.db
    $ ./microfts create org-fts.db
    $ ./microfts input -org org-fts.db ~/org/links.org
    $ ./microfts search org-fts.db sphinx | cut -c -80
    ~/org/links.org:29:    #+begin_src python\n      import sys, 
    
    $ WITHOUT ORG PARSING (same input file)
    $ rm org-fts.db
    $ ./microfts create org-fts.db
    $ ./microfts input org-fts.db ~/org/links.org
    $ ./microfts search org-fts.db sphinx | cut -c -80
    ~/org/links.org:682:** Sphinx
    ~/org/links.org:684:*** Requirements, Bugs, Test cases, … ins
    ~/org/links.org:688:*** Why use reStructuredText and Sphinx s
    ~/org/links.org:695:    documents, then Sphinx (or any of Mar
    ~/org/links.org:698:    sphinx-static-site-generator-for-main
    ...
    

    I looked into the code, but my Go fu is moot, so no patch, sorry ...

    opened by fpatz 5
  • Non Latin scripts

    Hi. I wanted to ask whether this should work with non-Latin scripts. I've installed and quickly tested it, and it seems it will not find anything when searching for text in Hebrew or Arabic. I only spent a short time testing, so sorry if I'm missing something here.

    opened by oatmealm 4
Releases(v1.0.1)
Owner
Bill Burdick
I've been programming since 1978 and I love learning and teaching new things.