A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.

Overview


prose is a natural language processing library (English only, at the moment) in pure Go. It supports tokenization, segmentation, part-of-speech tagging, and named-entity extraction.

You can find a more detailed summary of the library's performance here: Introducing prose v2.0.0: Bringing NLP to Go.

Installation

$ go get github.com/jdkato/prose/v2

Usage

Overview

package main

import (
    "fmt"
    "log"

    "github.com/jdkato/prose/v2"
)

func main() {
    // Create a new document with the default configuration:
    doc, err := prose.NewDocument("Go is an open-source programming language created at Google.")
    if err != nil {
        log.Fatal(err)
    }

    // Iterate over the doc's tokens:
    for _, tok := range doc.Tokens() {
        fmt.Println(tok.Text, tok.Tag, tok.Label)
        // Go NNP B-GPE
        // is VBZ O
        // an DT O
        // ...
    }

    // Iterate over the doc's named-entities:
    for _, ent := range doc.Entities() {
        fmt.Println(ent.Text, ent.Label)
        // Go GPE
        // Google GPE
    }

    // Iterate over the doc's sentences:
    for _, sent := range doc.Sentences() {
        fmt.Println(sent.Text)
        // Go is an open-source programming language created at Google.
    }
}

The document-creation process adheres to the following sequence of steps:

tokenization -> POS tagging -> NE extraction
            \
             segmentation

Each step may be disabled (assuming later steps aren't required) by passing the appropriate functional option. To disable named-entity extraction, for example, you'd do the following:

doc, err := prose.NewDocument(
        "Go is an open-source programming language created at Google.",
        prose.WithExtraction(false))
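
For instance, if you only need tokens, you can skip tagging, extraction, and segmentation entirely. The following is a minimal sketch built from the same functional options used throughout this README:

// Tokenization-only document; every later pipeline step is disabled.
doc, err := prose.NewDocument(
        "Go is an open-source programming language created at Google.",
        prose.WithTagging(false),
        prose.WithExtraction(false),
        prose.WithSegmentation(false))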

Tokenizing

prose includes a tokenizer capable of processing modern text, including the non-word character spans shown below.

Type             Example
Email addresses  jane.doe@example.com
Hashtags         #trending
Mentions         @jdkato
URLs             https://github.com/jdkato/prose
Emoticons        :-), >:(, o_0, etc.

package main

import (
    "fmt"
    "log"

    "github.com/jdkato/prose/v2"
)

func main() {
    // Create a new document with the default configuration:
    doc, err := prose.NewDocument("@jdkato, go to http://example.com thanks :).")
    if err != nil {
        log.Fatal(err)
    }

    // Iterate over the doc's tokens:
    for _, tok := range doc.Tokens() {
        fmt.Println(tok.Text, tok.Tag)
        // @jdkato NN
        // , ,
        // go VB
        // to TO
        // http://example.com NN
        // thanks NNS
        // :) SYM
        // . .
    }
}

Segmenting

prose includes one of the most accurate sentence segmenters available, according to the Golden Rules created by the developers of the pragmatic_segmenter.

Name                 Language  License    GRS (English)   GRS (Other)  Speed†
Pragmatic Segmenter  Ruby      MIT        98.08% (51/52)  100.00%      3.84 s
prose                Go        MIT        75.00% (39/52)  N/A          0.96 s
TactfulTokenizer     Ruby      GNU GPLv3  65.38% (34/52)  48.57%       46.32 s
OpenNLP              Java      APLv2      59.62% (31/52)  45.71%       1.27 s
Stanford CoreNLP     Java      GNU GPLv3  59.62% (31/52)  31.43%       0.92 s
Splitta              Python    APLv2      55.77% (29/52)  37.14%      N/A
Punkt                Python    APLv2      46.15% (24/52)  48.57%       1.79 s
SRX English          Ruby      GNU GPLv3  30.77% (16/52)  28.57%       6.19 s
Scalpel              Ruby      GNU GPLv3  28.85% (15/52)  20.00%       0.13 s

† The original tests were performed on a MacBook Pro with a 3.7 GHz Quad-Core Intel Xeon E5 running OS X 10.9.5, while prose was timed on a MacBook Pro with a 2.9 GHz Intel Core i7 running macOS 10.13.3.

package main

import (
    "fmt"
    "strings"

    "github.com/jdkato/prose/v2"
)

func main() {
    // Create a new document with the default configuration:
    doc, _ := prose.NewDocument(strings.Join([]string{
        "I can see Mt. Fuji from here.",
        "St. Michael's Church is on 5th st. near the light."}, " "))

    // Iterate over the doc's sentences:
    sents := doc.Sentences()
    fmt.Println(len(sents)) // 2
    for _, sent := range sents {
        fmt.Println(sent.Text)
        // I can see Mt. Fuji from here.
        // St. Michael's Church is on 5th st. near the light.
    }
}

Tagging

prose includes a tagger based on TextBlob's "fast and accurate" POS tagger. Below is a comparison of its performance against NLTK's implementation of the same tagger on the Treebank corpus:

Library  Accuracy  5-Run Average (sec)
NLTK     0.893     7.224
prose    0.961     2.538

(See scripts/test_model.py for more information.)

The full list of supported POS tags is given below.

TAG    DESCRIPTION
(      left round bracket
)      right round bracket
,      comma
:      colon
.      period
''     closing quotation mark
``     opening quotation mark
#      number sign
$      currency
CC     conjunction, coordinating
CD     cardinal number
DT     determiner
EX     existential there
FW     foreign word
IN     conjunction, subordinating or preposition
JJ     adjective
JJR    adjective, comparative
JJS    adjective, superlative
LS     list item marker
MD     verb, modal auxiliary
NN     noun, singular or mass
NNP    noun, proper singular
NNPS   noun, proper plural
NNS    noun, plural
PDT    predeterminer
POS    possessive ending
PRP    pronoun, personal
PRP$   pronoun, possessive
RB     adverb
RBR    adverb, comparative
RBS    adverb, superlative
RP     adverb, particle
SYM    symbol
TO     infinitival to
UH     interjection
VB     verb, base form
VBD    verb, past tense
VBG    verb, gerund or present participle
VBN    verb, past participle
VBP    verb, non-3rd person singular present
VBZ    verb, 3rd person singular present
WDT    wh-determiner
WP     wh-pronoun, personal
WP$    wh-pronoun, possessive
WRB    wh-adverb
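
These tags can drive simple filters. As an illustration (not part of the original README), the sketch below collects the noun-family tokens, i.e., those whose tags begin with "NN":

package main

import (
    "fmt"
    "log"
    "strings"

    "github.com/jdkato/prose/v2"
)

func main() {
    doc, err := prose.NewDocument("Go is an open-source programming language created at Google.")
    if err != nil {
        log.Fatal(err)
    }

    // Keep tokens tagged as nouns (NN, NNS, NNP, NNPS).
    var nouns []string
    for _, tok := range doc.Tokens() {
        if strings.HasPrefix(tok.Tag, "NN") {
            nouns = append(nouns, tok.Text)
        }
    }
    fmt.Println(nouns) // expected to include Go, language, and Google
}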

NER

prose v2.0.0 includes a much improved version of v1.0.0's chunk package, which can identify people (PERSON) and geographical/political entities (GPE) by default.

package main

import (
    "fmt"

    "github.com/jdkato/prose/v2"
)

func main() {
    doc, _ := prose.NewDocument("Lebron James plays basketball in Los Angeles.")
    for _, ent := range doc.Entities() {
        fmt.Println(ent.Text, ent.Label)
        // Lebron James PERSON
        // Los Angeles GPE
    }
}

However, in an attempt to make this feature more useful, we've made it straightforward to train your own models for specific use cases. See Prodigy + prose: Radically efficient machine teaching in Go for a tutorial.
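
As a rough sketch of what training looks like (the type and function names below come from the v2 API and the linked tutorial, but treat the details, including the example spans, as assumptions to verify against the documentation):

// A hedged sketch: train a custom "ORG" model from labeled spans, then
// use it in place of the default model. Verify the field names and the
// span semantics (start/end offsets) against the prose v2 docs.
train := []prose.EntityContext{
    {
        Text:   "Acme Corp. released a new product.",
        Spans:  []prose.LabeledEntity{{Start: 0, End: 10, Label: "ORG"}},
        Accept: true,
    },
    // ... more labeled examples ...
}

model := prose.ModelFromData("ORG", prose.UsingEntities(train))
doc, err := prose.NewDocument("Acme Corp. is hiring.", prose.UsingModel(model))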

Comments
  • The example in readme does not compile

    I am referring to this:

    package main
    import "gopkg.in/jdkato/prose.v2"
    func main() { prose.NewDocument("Go is ...") }
    

    The NewDocument is actually in gopkg.in/jdkato/prose.v2/summarize now.

    However, go get gopkg.in/jdkato/prose.v2/summarize does not work either. The package does not compile due to its usage of an internal package. This is a typical error when using the gopkg.in service. gopkg.in only redirects the git URI, but does not rewrite package import paths. As a result, the referenced imports actually point back to the master branch, nullifying the essential purpose of versioning. To use gopkg.in properly, you need to manually rewrite the import paths in the entire repo in the release tags/branches (or just stop using gopkg.in for multi-package repos).

    I saw your repo from Hacker News, but your repo fails to build on smallrepo. Detailed build log here:

    https://smallrepo.com/builds/20180717-175536-bc73d63d

    Thanks.

    Status: Fixed Type: Bug 
    opened by h8liu 8
  • Wrap location parsing in a function

    This moves the logic that was in chunk_test.go into a new exported function named Chunk. The primary difference is that, instead of returning a slice of locations, we're now returning a slice of strings (i.e., the actual chunks).

    This changes the usage from

    words := tokenize.TextToWords(text)
    tagger := tag.NewPerceptronTagger()
    tagged := tagger.Tag(words)
    rs := Locate(tagged, TreebankNamedEntities)
    
    for r, loc := range rs {
        res := ""
        for t, tt := range tagged[loc[0]:loc[1]] {
            if t != 0 {
                res += " "
            }
            res += tt.Text
        }
    
        if r >= len(expected) {
            t.Error("ERROR unexpected result: " + res)
        } else {
            if res != expected[r] {
                t.Error("ERROR", res, "!=", expected[r])
            }
        }
    }
    

    to

    words := tokenize.TextToWords(text)
    tagger := tag.NewPerceptronTagger()
    tagged := tagger.Tag(words)
    
    for i, chunk := range Chunk(tagged, TreebankNamedEntities) {
        if i >= len(expected) {
            t.Error("ERROR unexpected result: " + chunk)
        } else {
            if chunk != expected[i] {
                t.Error("ERROR", chunk, "!=", expected[i])
            }
        }
    }
    

    /cc @elliott5

    opened by jdkato 7
  • Possible enhancements to the "summarize" package

    Have you considered adding the Coleman–Liau index for completeness? Even though "opinion varies on its accuracy": https://en.wikipedia.org/wiki/Coleman%E2%80%93Liau_index

    On the subject of suspect measures, a composite "years-of-education" metric, taking the average of scores together (and their standard deviation) may be of use: https://github.com/elliott5/readability/blob/master/assess.go

    Finally, for giving feedback to users on how to change their prose to be easier to read, it would be great if your analysis could store:

    • sentences with their word length; and
    • words with their syllable length and frequency (the product of the two ranking non-readability).

    Keep up the good work!

    Status: Resolved Type: Enhancement Difficulty: Easy 
    opened by elliott5 7
  • go get not working

    I tried to go get the hugo repo. I am using Go 1.14 (with Go modules).

    go get github.com/gohugoio/hugo

    package github.com/jdkato/prose/transform: cannot find package "github.com/jdkato/prose/transform" in any of:
    	/usr/local/go/src/github.com/jdkato/prose/transform (from $GOROOT)
    	/Users/x/GOPATH/src/github.com/jdkato/prose/transform (from $GOPATH)
    

    I'm wondering why it is saying transform can't be found?

    Status: Resolved Type: Question 
    opened by pjebs 5
  • Make it possible to use vendored

    Very useful project.

    I wanted to test the title case functionality for use in Hugo, but we vendor our libraries, and I get a ../vendor/github.com/jdkato/prose/transform/title.go:9:2: use of internal package not allowed error when importing github.com/jdkato/prose/transform.

    See https://github.com/gohugoio/hugo/pull/3753

    I have been Googling this, and it seems there is no (simple) workaround other than avoiding the use of internal packages in libraries.

    Status: Resolved Type: Question 
    opened by bep 5
  • Roadmap

    This is a rough outline of some improvements I'd like to make.

    Documentation

    • [x] Improve and update README.
    • [ ] Add .github files.

    tokenize

    • [x] Port the pragmatic_segmenter.
    • [x] Get PunktSentenceTokenizer passing the Golden Rules and possibly submit a PR upstream.

    tag

    • [x] Finish porting the PerceptronTagger (just training functionality left).
    • [x] ~~Port the TnT Tagger~~ (not going to make v1.0).
    • [ ] Improve testing strategy (we currently rely on NLTK).

    transform

    • [x] Improve Title and add support for variations (e.g., AP style).
    • [ ] Port Change Case.

    summarize

    • [x] Finish working on Syllables.
    • [x] Add a composite "years-of-education" metric.
    • [ ] Add the ability to update a Document's content without having to recalculate all of its statistics.
    • [x] Expand test suite.

    Status: Resolved Type: Enhancement
    opened by jdkato 5
  • Allow overridable tokenizer parameters.

    This PR changes the iterTokenizer struct to contain the parameters that can be overridden. This will allow future changes to NewDocument that use a custom tokenizer.

    Will help with issues #41 and #32.

    Test: existing unit test

    opened by Titousensei 4
  • Prose installation issues

    I tried installing prose using the command provided go get gopkg.in/jdkato/prose.v2 and this is the error log printed on my terminal.

    # gopkg.in/jdkato/prose.v2
    ../gopkg.in/jdkato/prose.v2/extract.go:323:23: est.AtVec undefined (type *mat.VecDense has no field or method AtVec)
    ../gopkg.in/jdkato/prose.v2/extract.go:335:25: est.AtVec undefined (type *mat.VecDense has no field or method AtVec)
    ../gopkg.in/jdkato/prose.v2/extract.go:355:17: count.AtVec undefined (type *mat.VecDense has no field or method AtVec)
    ../gopkg.in/jdkato/prose.v2/extract.go:525:27: count.AtVec undefined (type *mat.VecDense has no field or method AtVec)
    

    Go version: go1.9.7 darwin/amd64

    Status: Fixed Type: Bug 
    opened by Hasil-Sharma 4
  • slice bounds out of range, title.go:49

    I'm trying to use TitleConverter.Title(), but it panics when the string contains certain multibyte characters.

    Crashing testcases:

    type tc struct {
        name  string
        input string
        want  string
    }
    tests := []tc{
        tc{"panic", "This Agreement, dated [DATE] (the “Effective Date”) for Design Services (the “Agreement”) is between [DESIGNER NAME], of [DESIGNER COMPANY](“Designer”), and [CLIENT NAME], of [CLIENT COMPANY] (“Client”) (together known as the “Parties”), for the performance of said Design Services and the production of Deliverables, as described in Schedule A, attached hereto and incorporated herein by reference.", "panic"},
        tc{"panic", "Crash,”“us,” “our” or “we” means Crash Network, Inc. (d/b/a Crash) and its subsidiaries and affiliates.", "panic"},
        tc{"panic", "a “[“New Entity”],” an [Institution] and [Institution].", "panic"},
    }
    
    Status: Fixed Type: Bug Priority: High 
    opened by aaaton 4
  • List of Label?

    Hi, I can see that we have a comprehensive list of tags, but I can't find anything for labels (for example, PERSON, GPE, etc.). It would be nice if someone could point me to the list, even if it is somewhere in the source code.

    opened by warrenbocphet 3
  • Introduce Tokenizer interface

    This PR allows the user to provide a different tokenizer.

    Users can specify their own Tokenizer in the DocOpts. This replaces the boolean Tokenize option (set Tokenizer to nil to disable.)

    Currently only IterTokenizer is provided, which can be customized with its own Using options. func Tokenize becomes public to allow users to provide their own implementation and completely replace IterTokenizer.

    Model and Extractor need to use the same Tokenizer as Document, so this PR modifies those APIs to be consistent.

    (Also separating makeCorpus from extracterFromData to simplify parameter passing.)

    This solves issues #41 and #32.

    opened by Titousensei 3
  • sentences first, then words?

    I'm a bit surprised to see this:

    type Document struct {
            Model *Model
            Text  string
    
            // TODO: Store offsets (begin, end) instead of `text` field.
            entities  []Entity
            sentences []Sentence
            tokens    []*Token
    }
    
    // A Sentence represents a segmented portion of text.
    type Sentence struct {
            Text string // The sentence's text.
    }
    

    If I care about finding sentences first, then the words within them, I need to take two passes, right?

    // First we will do only segmentation, to break up sentences:
    doc, err := prose.NewDocument(string(content),
        prose.WithTagging(false),
        prose.WithTokenization(false),
        prose.WithExtraction(false))
    if err != nil {
        log.Fatal(err)
    }

    // Iterate over the doc's sentences, and the words within them:
    sents := doc.Sentences()
    fmt.Println(len(sents))
    for _, sent := range sents {
        fmt.Println(sent.Text)

        sdoc, err := prose.NewDocument(sent.Text,
            prose.WithTagging(false),
            prose.WithExtraction(false),
            prose.WithSegmentation(false))
        if err != nil {
            log.Fatal(err)
        }

        // Iterate over the sentence's tokens:
        for _, tok := range sdoc.Tokens() {
            fmt.Println(tok.Text, tok.Tag, tok.Label)
        }

        // Iterate over the sentence's named entities:
        for _, ent := range sdoc.Entities() {
            fmt.Println(ent.Text, ent.Label)
        }
    }
    

    I suspect it's less efficient that way.

    Likewise, tokens include named entities, but it might make more sense to be able to iterate tokens in a sentence in such a way that each token is either a named entity or a regular token?

    If you intend to store offsets, like the TODO comment says, maybe this can be worked around, by finding overlapping offset ranges (e.g. sentence 2 goes from character position 240 to 267; word 10 goes from 240 to 247; named entity 2 goes from 248 to 257; etc... then I can see that word 10 and named entity 2 are both part of sentence 2, even if you don't offer a hierarchical model).

    opened by ec1oud 0
  • Seeing ~100ms overhead per doc. Did performance break, or am I using the API incorrectly?

    I am seeing ~100ms of overhead to process any document (text), which seems like it can't be correct given the performance data listed for large corpora. I've been porting a large Python/NLTK app to Go, but my legacy Python/NLTK text parsing runs ~250x faster than my new Go/prose implementation, which suggests I must be doing something wrong.

    Did some external dependency break the performance of prose, or am I using the API incorrectly?

    Simple performance test (Go 1.18)

    Here's a simple test that processes the same short sentence twice. Both executions take ~100ms.

    Code:

    package main
    
    import (
    	"fmt"
    	"time"
    
    	"github.com/jdkato/prose/v2"
    )
    
    var text = "This is a simple test."
    
    func main() {
    	for i := 0; i < 2; i++ {
    		start := time.Now()
    		doc, err := prose.NewDocument(
    			text,
    			prose.WithExtraction(false),
    			prose.WithSegmentation(false))
    		duration := time.Since(start)
    		fmt.Println(duration)
    		if err != nil {
    			panic(err)
    		}
    
    		// Iterate over the doc's tokens:
    		fmt.Print("   ")
    		for _, tok := range doc.Tokens() {
    			fmt.Printf("(%v, %v)  ", tok.Text, tok.Tag)
    		}
    		fmt.Println()
    	}
    }
    

    Output:

    $ go run .
    118.549243ms
       (This, DT)  (is, VBZ)  (a, DT)  (simple, JJ)  (test, NN)  (., .)  
    117.214746ms
       (This, DT)  (is, VBZ)  (a, DT)  (simple, JJ)  (test, NN)  (., .)  
    $
    

    Comparison test using NLTK in Python (3.8)

    When I run the same test using NLTK in Python, the first document processed also has ~100ms of overhead, but all subsequent documents are processed very quickly (~400usec in the example below):

    Sample code:

    #!/usr/bin/env python
    import nltk
    from datetime import datetime
    
    text = "This is a simple test."
    
    for _ in range(2):
        start = datetime.now()
        raw_tokens = nltk.word_tokenize(text)
        pos_tokens = nltk.pos_tag(raw_tokens)
        duration = datetime.now() - start
        print(duration)
        print(f'   {pos_tokens}')
    

    Output:

    $ ./test-nltk.py 
    0:00:00.092738
       [('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('simple', 'JJ'), ('test', 'NN'), ('.', '.')]
    0:00:00.000415
       [('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('simple', 'JJ'), ('test', 'NN'), ('.', '.')]
    $
    
    opened by epmoyer 2
  • Update dependency neurosnap/sentences

    I'd recommend updating the dependency neurosnap/sentences to use its new location (github.com/neurosnap/sentences) and the latest version (v1.0.9). This version fixes a very minor security problem where the readme linked to some Amazon S3 buckets for downloading binaries. When repositories vendor jdkato/prose they will also get neurosnap/sentences, and it is better if they get this updated version.

    opened by owen-mc 0
  • Cannot reproduce tokenization example

    Installed prose with go get github.com/jdkato/prose/v2 and copied the code from the tokenization example in the readme.

    The output I get is

    @jdkato NN
    , ,
    go VB
    to TO
    http://example.com NN
    thanks NNS
    : :
    ) )
    . .
    

    Instead of the expected one reported in the example:

            // @jdkato NN
            // , ,
            // go VB
            // to TO
            // http://example.com NN
            // thanks NNS
            // :) SYM
            // . .
    

    The difference is that the ":)" emoticon is not recognized.

    $ go version
    go version go1.16.5 linux/amd64

    opened by ClonedOne 0
  • Help: Unable to run on AWS lambda

    Hi, I have a function that utilizes prose/v2.

    It works fine, as it passes my tests (just the logic, though).

    When I deploy it on AWS Lambda, it doesn't run. I have logs right up to the point where I call

    prose.NewDocument(stringVal)
    

    Nothing logs out after that, no error, nothing. I'm admittedly a noob with AWS Lambda, so maybe there are logs I'm not looking at.

    What is it doing internally? I wonder why it would just end.

    opened by jonyeezs 1