go-fasttld is a high-performance top-level domain (TLD) extraction module.

Overview

go-fasttld

go-fasttld is a high-performance top-level domain (TLD) extraction module implemented with compressed tries.

This module is a port of the Python fasttld module, with additional modifications to support extraction of subcomponents from full URLs, IPv4 addresses, and IPv6 addresses.

Background

go-fasttld efficiently extracts subcomponents such as top-level domains (TLDs), subdomains and hostnames from URLs by using the regularly updated Mozilla Public Suffix List and the compressed trie data structure.

For example, it extracts the com TLD, maps subdomain, and google domain from https://maps.google.com:8080/a/long/path/?query=42.

go-fasttld also supports extraction of private domains listed in the Mozilla Public Suffix List like 'blogspot.co.uk' and 'sinaapp.com', extraction of IPv4 addresses, and extraction of IPv6 addresses.

Why not split on "." and take the last element instead?

Splitting on "." and taking the last element works for simple TLDs like .com, but not for multi-label ones like oseto.nagasaki.jp.
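A minimal sketch of the naive approach makes the problem concrete. The helper name `naiveSuffix` is hypothetical and is not part of go-fasttld; it only uses the Go standard library.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveSuffix returns whatever follows the last "." in host.
// Hypothetical helper for illustration; this is NOT how go-fasttld works.
func naiveSuffix(host string) string {
	labels := strings.Split(host, ".")
	return labels[len(labels)-1]
}

func main() {
	// Correct for a single-label TLD.
	fmt.Println(naiveSuffix("maps.google.com")) // com

	// Wrong for a multi-label TLD: the Public Suffix List entry is
	// oseto.nagasaki.jp, but the naive split only sees "jp".
	fmt.Println(naiveSuffix("example.oseto.nagasaki.jp")) // jp
}
```

Without a list of valid suffixes there is no way to tell that `oseto.nagasaki` belongs to the suffix rather than to the registered domain.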

Compressed trie example

Valid TLDs from the Mozilla Public Suffix List are inserted into the compressed trie with their labels in reverse order.

Given the following TLDs
au
nsw.edu.au
com.ac
edu.ac
gov.ac

and the example URL host `example.nsw.edu.au`, the compressed trie is structured as follows:

START
 ╠═ au 🚩 ✅
 ║  ╚═ edu ✅
 ║     ╚═ nsw 🚩 ✅
 ╚═ ac
    ╠═ com 🚩
    ╠═ edu 🚩
    ╚═ gov 🚩

=== Symbol meanings ===
🚩 : path to this node is a valid TLD
✅ : path to this node found in example URL host `example.nsw.edu.au`

The URL host subcomponents are parsed from right to left until no more matching nodes can be found. In this example, the path of matching nodes is au -> edu -> nsw. Reversing the nodes gives the extracted TLD nsw.edu.au.
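The right-to-left lookup can be sketched with a plain map-based trie. This is an illustration only: it is an uncompressed trie, and the names `node`, `insert` and `longestSuffix` are assumptions made for this sketch rather than go-fasttld's internal types.

```go
package main

import (
	"fmt"
	"strings"
)

// node is one label in a suffix trie whose paths store labels in reverse order.
type node struct {
	children map[string]*node
	isSuffix bool // true when the path from the root to this node is a valid TLD
}

func newNode() *node { return &node{children: map[string]*node{}} }

// insert stores a suffix such as "nsw.edu.au" as the path au -> edu -> nsw.
func (n *node) insert(suffix string) {
	labels := strings.Split(suffix, ".")
	cur := n
	for i := len(labels) - 1; i >= 0; i-- {
		child, ok := cur.children[labels[i]]
		if !ok {
			child = newNode()
			cur.children[labels[i]] = child
		}
		cur = child
	}
	cur.isSuffix = true
}

// longestSuffix walks the host labels from right to left and returns the
// longest stored suffix that matches, or "" when none does.
func (n *node) longestSuffix(host string) string {
	labels := strings.Split(host, ".")
	cur := n
	best := -1 // index of the leftmost label of the longest match so far
	for i := len(labels) - 1; i >= 0; i-- {
		child, ok := cur.children[labels[i]]
		if !ok {
			break // no more matching nodes
		}
		cur = child
		if cur.isSuffix {
			best = i
		}
	}
	if best < 0 {
		return ""
	}
	return strings.Join(labels[best:], ".")
}

func main() {
	root := newNode()
	for _, s := range []string{"au", "nsw.edu.au", "com.ac", "edu.ac", "gov.ac"} {
		root.insert(s)
	}
	fmt.Println(root.longestSuffix("example.nsw.edu.au")) // nsw.edu.au
}
```

Note that `edu` under `au` is visited but not flagged as a suffix, so a host such as `example.edu.au` would resolve to `au` with this sample list.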

Installation

go get github.com/elliotwutingfeng/go-fasttld

Quick Start

Full demo available in the examples folder

Domain

// Initialise fasttld extractor
extractor, _ := fasttld.New(fasttld.SuffixListParams{})

// Extract URL subcomponents
url := "https://some-user@a.long.subdomain.ox.ac.uk:5000/a/b/c/d/e/f/g/h/i?id=42"
res := extractor.Extract(fasttld.URLParams{URL: url})

// Display results
fmt.Println(res.Scheme)           // https://
fmt.Println(res.UserInfo)         // some-user
fmt.Println(res.SubDomain)        // a.long.subdomain
fmt.Println(res.Domain)           // ox
fmt.Println(res.Suffix)           // ac.uk
fmt.Println(res.RegisteredDomain) // ox.ac.uk
fmt.Println(res.Port) // 5000
fmt.Println(res.Path) // a/b/c/d/e/f/g/h/i?id=42

IPv4 Address

extractor, _ := fasttld.New(fasttld.SuffixListParams{})

url := "https://127.0.0.1:5000"
res := extractor.Extract(fasttld.URLParams{URL: url})

// res.Scheme = https://
// res.UserInfo = <no output>
// res.SubDomain = <no output>
// res.Domain = 127.0.0.1
// res.Suffix = <no output>
// res.RegisteredDomain = 127.0.0.1
// res.Port = 5000
// res.Path = <no output>

IPv6 Address

extractor, _ := fasttld.New(fasttld.SuffixListParams{})

url := "https://[aBcD:ef01:2345:6789:aBcD:ef01:2345:6789]:5000"
res := extractor.Extract(fasttld.URLParams{URL: url})

// res.Scheme = https://
// res.UserInfo = <no output>
// res.SubDomain = <no output>
// res.Domain = aBcD:ef01:2345:6789:aBcD:ef01:2345:6789
// res.Suffix = <no output>
// res.RegisteredDomain = aBcD:ef01:2345:6789:aBcD:ef01:2345:6789
// res.Port = 5000
// res.Path = <no output>

Internationalised label separators

go-fasttld supports the following internationalised label separators (IETF RFC 3490):

  • U+002E (full stop)
  • U+3002 (ideographic full stop)
  • U+FF0E (fullwidth full stop)
  • U+FF61 (halfwidth ideographic full stop)

extractor, _ := fasttld.New(fasttld.SuffixListParams{})

url := "https://brb\u002ei\u3002am\uff0egoing\uff61to\uff0ebe\u3002a\uff61fk"
res := extractor.Extract(fasttld.URLParams{URL: url})

// res.Scheme = https://
// res.UserInfo = <no output>
// res.SubDomain = brb\u002ei\u3002am\uff0egoing\uff61to
// res.Domain = be
// res.Suffix = a\uff61fk
// res.RegisteredDomain = be\u3002a\uff61fk
// res.Port = <no output>
// res.Path = <no output>
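As a rough illustration of how the four separators listed above can be matched in one pass, the sketch below uses `strings.LastIndexAny` from the Go standard library. The helper `lastLabel` and the constant `labelSeparators` are hypothetical names chosen for this example, not go-fasttld's API.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// labelSeparators holds the four label separators from RFC 3490.
const labelSeparators = "\u002e\u3002\uff0e\uff61"

// lastLabel returns the text after the final label separator in host,
// or host itself when it contains no separator.
// Hypothetical helper for illustration.
func lastLabel(host string) string {
	idx := strings.LastIndexAny(host, labelSeparators)
	if idx < 0 {
		return host
	}
	// The separator may be 1 or 3 bytes long in UTF-8, so decode it
	// to find out how many bytes to skip.
	_, size := utf8.DecodeRuneInString(host[idx:])
	return host[idx+size:]
}

func main() {
	fmt.Println(lastLabel("be\u3002a\uff61fk")) // fk
	fmt.Println(lastLabel("example.com"))       // com
}
```

Searching from the right with a set of separators avoids allocating a slice for every label, which matters when only the rightmost labels are of interest.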

Public Suffix List options

Specify custom public suffix list file

You can use a custom public suffix list file by setting CacheFilePath in fasttld.SuffixListParams{} to its absolute path.

cacheFilePath := "/absolute/path/to/file.dat"
extractor, _ := fasttld.New(fasttld.SuffixListParams{CacheFilePath: cacheFilePath})

Updating the default Public Suffix List cache

Whenever fasttld.New is called without specifying CacheFilePath in fasttld.SuffixListParams{}, the local cache of the default Public Suffix List is updated automatically if it is more than 3 days old. You can also manually update the cache by using Update().

// Automatic update performed if `CacheFilePath` is not specified
// and local cache is more than 3 days old
extractor, _ := fasttld.New(fasttld.SuffixListParams{})

// Manually update local cache
if err := extractor.Update(); err != nil {
    log.Println(err)
}

Private domains

According to the Mozilla.org wiki, the Mozilla Public Suffix List contains private domains like blogspot.com and sinaapp.com.

By default, go-fasttld excludes these private domains (i.e. IncludePrivateSuffix = false):

extractor, _ := fasttld.New(fasttld.SuffixListParams{})

url := "https://google.blogspot.com"
res := extractor.Extract(fasttld.URLParams{URL: url})

// res.Scheme = https://
// res.UserInfo = <no output>
// res.SubDomain = google
// res.Domain = blogspot
// res.Suffix = com
// res.RegisteredDomain = blogspot.com
// res.Port = <no output>
// res.Path = <no output>

You can include private domains by setting IncludePrivateSuffix = true:

extractor, _ := fasttld.New(fasttld.SuffixListParams{IncludePrivateSuffix: true})

url := "https://google.blogspot.com"
res := extractor.Extract(fasttld.URLParams{URL: url})

// res.Scheme = https://
// res.UserInfo = <no output>
// res.SubDomain = <no output>
// res.Domain = google
// res.Suffix = blogspot.com
// res.RegisteredDomain = google.blogspot.com
// res.Port = <no output>
// res.Path = <no output>
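The Public Suffix List file separates its ICANN and private sections with `===BEGIN PRIVATE DOMAINS===` and `===END PRIVATE DOMAINS===` comment markers. The sketch below shows one way a loader might honour a flag like IncludePrivateSuffix when reading that format. The function `parseSuffixes` and the shortened `sample` list are assumptions made for this example and do not reflect go-fasttld's actual loader.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sample is a heavily shortened stand-in for the real Public Suffix List.
const sample = `// ===BEGIN ICANN DOMAINS===
com
co.uk
// ===END ICANN DOMAINS===
// ===BEGIN PRIVATE DOMAINS===
blogspot.com
sinaapp.com
// ===END PRIVATE DOMAINS===
`

// parseSuffixes returns the rules found in psl. Rules inside the private
// section are kept only when includePrivate is true.
// Hypothetical helper for illustration.
func parseSuffixes(psl string, includePrivate bool) []string {
	var rules []string
	inPrivate := false
	sc := bufio.NewScanner(strings.NewReader(psl))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.Contains(line, "===BEGIN PRIVATE DOMAINS==="):
			inPrivate = true
		case strings.Contains(line, "===END PRIVATE DOMAINS==="):
			inPrivate = false
		case line == "" || strings.HasPrefix(line, "//"):
			// Skip blank lines and comments.
		case inPrivate && !includePrivate:
			// Skip private rules when they are not requested.
		default:
			rules = append(rules, line)
		}
	}
	return rules
}

func main() {
	fmt.Println(parseSuffixes(sample, false)) // [com co.uk]
	fmt.Println(parseSuffixes(sample, true))  // [com co.uk blogspot.com sinaapp.com]
}
```

With private rules excluded, `google.blogspot.com` can only match `com`, which is why `blogspot` is reported as the domain in the default case.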

Extraction options

Ignore Subdomains

You can ignore subdomains by setting IgnoreSubDomains = true. By default, subdomains are extracted.

extractor, _ := fasttld.New(fasttld.SuffixListParams{})

url := "https://maps.google.com"
res := extractor.Extract(fasttld.URLParams{URL: url, IgnoreSubDomains: true})

// res.Scheme = https://
// res.UserInfo = <no output>
// res.SubDomain = <no output>
// res.Domain = google
// res.Suffix = com
// res.RegisteredDomain = google.com
// res.Port = <no output>
// res.Path = <no output>

Punycode

Convert internationalised URLs to punycode before extraction by setting ConvertURLToPunyCode = true. By default, URLs are not converted to punycode.

extractor, _ := fasttld.New(fasttld.SuffixListParams{})

url := "https://hello.世界.com"
res := extractor.Extract(fasttld.URLParams{URL: url, ConvertURLToPunyCode: true})

// res.Scheme = https://
// res.UserInfo = <no output>
// res.SubDomain = hello
// res.Domain = xn--rhqv96g
// res.Suffix = com
// res.RegisteredDomain = xn--rhqv96g.com
// res.Port = <no output>
// res.Path = <no output>

res = extractor.Extract(fasttld.URLParams{URL: url, ConvertURLToPunyCode: false})

// res.Scheme = https://
// res.UserInfo = <no output>
// res.SubDomain = hello
// res.Domain = 世界
// res.Suffix = com
// res.RegisteredDomain = 世界.com
// res.Port = <no output>
// res.Path = <no output>

Testing

go test -v -coverprofile=test_coverage.out && go tool cover -html=test_coverage.out -o test_coverage.html

Benchmarks

go test -bench=. -benchmem -cpu 1

Modules used

| Benchmark Name       | Source                           |
|----------------------|----------------------------------|
| GoFastTld            | go-fasttld (this module)         |
| JPilloraGoTld        | github.com/jpillora/go-tld       |
| JoeGuoTldExtract     | github.com/joeguo/tldextract     |
| Mjd2021USATldExtract | github.com/mjd2021usa/tldextract |
| M507Tlde             | github.com/M507/tlde             |

Results

Benchmarks performed on AMD Ryzen 7 5800X, Manjaro Linux.

go-fasttld performs especially well on longer URLs.


#1

https://news.google.com

| Benchmark Name       | Iterations | ns/op | B/op | allocs/op | Fastest |
|----------------------|------------|-------|------|-----------|---------|
| GoFastTld            | 2389614    | 496.8 | 176  | 4         | ✔️      |
| JPilloraGoTld        | 2300103    | 521.2 | 224  | 2         |         |
| JoeGuoTldExtract     | 1480351    | 822.2 | 208  | 7         |         |
| Mjd2021USATldExtract | 1336317    | 876.7 | 208  | 7         |         |
| M507Tlde             | 2276070    | 513.1 | 160  | 5         |         |

#2

https://iupac.org/iupac-announces-the-2021-top-ten-emerging-technologies-in-chemistry/

| Benchmark Name       | Iterations | ns/op | B/op | allocs/op | Fastest |
|----------------------|------------|-------|------|-----------|---------|
| GoFastTld            | 2254648    | 537.6 | 304  | 4         | ✔️      |
| JPilloraGoTld        | 1633924    | 737.0 | 224  | 2         |         |
| JoeGuoTldExtract     | 1532829    | 781.0 | 288  | 6         |         |
| Mjd2021USATldExtract | 1444665    | 832.5 | 288  | 6         |         |
| M507Tlde             | 2032639    | 584.8 | 272  | 5         |         |

#3

https://www.google.com/maps/dir/Parliament+Place,+Parliament+House+Of+Singapore,+Singapore/Parliament+St,+London,+UK/@25.2440033,33.6721455,4z/data=!3m1!4b1!4m14!4m13!1m5!1m1!1s0x31da19a0abd4d71d:0xeda26636dc4ea1dc!2m2!1d103.8504863!2d1.2891543!1m5!1m1!1s0x487604c5aaa7da5b:0xf13a2197d7e7dd26!2m2!1d-0.1260826!2d51.5017061!3e4

| Benchmark Name       | Iterations | ns/op | B/op | allocs/op | Fastest |
|----------------------|------------|-------|------|-----------|---------|
| GoFastTld            | 1519119    | 785.9 | 784  | 4         | ✔️      |
| JPilloraGoTld        | 399526     | 2848  | 928  | 4         |         |
| JoeGuoTldExtract     | 778827     | 1420  | 1120 | 6         |         |
| Mjd2021USATldExtract | 755976     | 1523  | 1120 | 6         |         |
| M507Tlde             | 806964     | 1584  | 1120 | 6         |         |

Releases (v0.1.3)
  • v0.1.3 (May 2, 2022)

    What's Changed

    • go-fasttld is now in Awesome Go! :tada:
    • Enhancement: Refactored test case suite (removed redundant tests)
    • Fixed: Wildcard Suffix exception rules should be obeyed
    • Fixed: Path should not be modified when converting URL to punycode
    • Fixed: SubDomain or Domain with invalid punycode should be rejected
    • Fixed: SubDomain and/or Domain should be parsed even if Suffix is missing

    Full Changelog: https://github.com/elliotwutingfeng/go-fasttld/compare/v0.1.2...v0.1.3

  • v0.1.2(Apr 30, 2022)

    What's Changed

    • go-fasttld is now in Awesome Go! :tada:
    • Fix unhandled edge case for some urls of format domain+singleTLD

    Full Changelog: https://github.com/elliotwutingfeng/go-fasttld/compare/v0.1.1...v0.1.2

  • v0.1.1(Apr 29, 2022)

    What's Changed

    • go-fasttld is now in Awesome Go! :tada:
    • Trim excess whitespace and period delimiters before converting URL to punycode
    • Fix wrong conversion of internationalised period delimiters when converting URL to punycode
    • Add compressed trie example to README

    Full Changelog: https://github.com/elliotwutingfeng/go-fasttld/compare/v0.1.0...v0.1.1

  • v0.1.0(Apr 29, 2022)

    What's Changed

    • go-fasttld is now in Awesome Go! :tada:
    • Handle internationalised period delimiters. Addresses #6
    • Reduce execution time & memory usage by using strings.LastIndexAny instead of strings.Split to extract TLD suffix, domain, and subdomain

    Full Changelog: https://github.com/elliotwutingfeng/go-fasttld/compare/v0.0.3...v0.1.0

  • v0.0.3(Apr 21, 2022)

Owner
Wu Tingfeng