Self-contained Machine Learning and Natural Language Processing library in Go


If you like the project, please ★ star this repository to show your support! 🤩

A Machine Learning library written in pure Go designed to support relevant neural architectures in Natural Language Processing.

spaGO is self-contained: it uses its own lightweight computational graph framework for both training and inference, and it is easy to understand from start to finish.


Natural Language Processing

Internal Machine Learning Framework

  • Automatic differentiation:

    • Define-by-Run (default, just like PyTorch does)
    • Define-and-Run (similar to the static graph of TensorFlow)
  • Optimization methods:

    • Gradient descent (Adam, RAdam, RMS-Prop, AdaGrad, SGD)
    • Differential Evolution
  • Neural networks:

    • Feed-forward models (Linear, Highway, Convolution, ...)
    • Recurrent models (LSTM, GRU, BiLSTM...)
    • Attention mechanisms (Self-Attention, Multi-Head Attention, ...)
    • Recursive auto-encoders

Additional features

spaGO is compatible with pre-trained state-of-the-art neural models:



Clone this repo or get the library:

go get -u

spaGO supports two main use cases, which are explained in more detail below.

CLI mode

Several programs can be leveraged to tour the current NLP capabilities in spaGO. A list of the demos now follows.

The Docker image can be built like this.

docker build -t spago:main . -f Dockerfile

Library mode

You can access the core functionality of spaGO, i.e. optimizing mathematical expressions by back-propagating gradients through a computational graph, in your own code by using spaGO in library mode.

At a high level, it comprises four main modules:

  1. Matrix
  2. Graph
  3. Model
  4. Optimizer

To get started, look at the implementation of built-in neural models, such as the LSTM. Don't be afraid, it is straightforward Go code. The idea is that you could have written spaGO :)

You may find a Feature Source Tree useful for a quick overview of the library's package organization.

There is also a repo with handy examples, such as MNIST classification.

Current Status

We're not at v1.0.0 yet, so spaGO is still a work in progress.

However, it has been running smoothly for quite a few months now in a system that analyzes thousands of news items a day!

Besides, it's pretty easy to get your hands on, so you might want to use it in your real applications.

Early adopters may make use of it for production use today as long as they understand and accept that spaGO is not fully tested and that APIs might change.

Known Limits

Sadly, at the moment, spaGO is not GPU friendly by design.


We're glad you're thinking about contributing to spaGO! If you think something is missing or could be improved, please open issues and pull requests. If you'd like to help this project grow, we'd love to have you!

To start contributing, check the Contributing Guidelines.


We encourage you to open an issue. This would help the community grow.

If you really want to write to us privately, please email Matteo Grella with your questions or comments.

Projects Using spaGO

Below is a list of known projects that use spaGO:

Other Links


spaGO is a personal project that is part of the open-source NLP Odyssey initiative started by members of the EXOP team. I would therefore like to thank EXOP GmbH, which is providing full support for development by promoting the project and giving it increasing importance.


We appreciate contributions of all kinds. We especially want to thank spaGO fiscal sponsors who contribute to ongoing project maintenance.

See our Open Collective page if you too are interested in becoming a sponsor.

  • Help with German zero shot

    I would like to run the Sahajtomar/German_Zeroshot model in spago.

    The import was successful: ./huggingface-importer --model=Sahajtomar/German_Zeroshot --repo=./models -> BERT has been converted successfully!

    Can I now run the model with the Bart server (as I believe it is the Bart server, not the BERT server, that supports zero shot)?

    I receive:

    bassea@AP15557 spago % ./bart-server server --repo=./models --model=Sahajtomar/German_Zeroshot --tls-disable
    Start loading pre-trained model from "models/Sahajtomar/German_Zeroshot"
    [1/2] Loading configuration... ok
    panic: bart: unsupported architecture BertForSequenceClassification

    goroutine 1 [running]:, 0x21, 0x2, 0xc000038660, 0x21, 0x492f960) /Users/bassea/go/src/spago/pkg/nlp/transformers/bart/loader/loader.go:43 +0x819, 0x0, 0x0) /Users/bassea/go/src/spago/cmd/bart/app/server.go:106 +0x105*Command).Run(0xc000222ea0, 0xc00022b440, 0x0, 0x0) /Users/bassea/go/pkg/mod/[email protected]/command.go:163 +0x4e0*App).RunContext(0xc0000351e0, 0x4ae0aa0, 0xc000036068, 0xc0000320a0, 0x5, 0x5, 0x0, 0x0) /Users/bassea/go/pkg/mod/[email protected]/app.go:313 +0x814*App).Run(...) /Users/bassea/go/pkg/mod/[email protected]/app.go:224 main.main() /Users/bassea/go/src/spago/cmd/bart/main.go:15 +0x72

    opened by abasse 16
  • Is it possible to pre-load passages from csv?

    Is it currently possible to preload, let's say, the Go FAQ and run semantic search on the passages by only providing a question to spago serving squad2? I would like to create behavior like that shown in this video: semantic-search

    If not yet, could you give me some pointers to dig into it?

    Thank you

    opened by go-dockly 8
  • float32 data type

    This is just a question out of curiosity but: Do you have any plans to support the float32 data type (or any other types, like integers actually)?

    • It is very common to train a neural network with a float32 precision, as it reduces the computation cost without having any significant impact on the accuracy, and I was wondering what would be the speed gain for spago?
    • I was thinking about something like an Enum type given to the matrix creation function and maybe the possibility to convert an existing matrix from a type to another
    • Supporting types like uint8 could make it easier to implement some quantization schemes, which seems like a good fit for Go since it shines in distributed applications (e.g. sending quantized weights over the network is definitely more bandwidth friendly)

    For now I just find myself "fighting the matrix" by extracting the underlying data, converting it to the desired type, doing some work with it, and finally reloading it into a matrix to run some computation.

    opened by xiorcale 8
  • Other integration / Telegram Bot

    Hi guys,

    Hope you are all well !

    I was looking for some implementation of AI chatbot golang and found your project spago.

    And, I am developing a multibot for telegram based on go plugins:

    So I was wondering how can I integrate spago as a QA bot plugin.

    Do you have any insights or advice for such an integration?

    Thanks in advance.

    Cheers, X

    opened by ghost 7
  • Running hugging_face_importer from docker container causes strange behavior

    I was following instructions to test the question answering demo and noticed that the container never completed by outputting the spago model and left a zombie python process running on my machine (a mid-2013 MacBook Pro 2.3 ghz quad core i7, 16 gb ddr3). Here are steps to reproduce:

    # after cloning the repo in its latest form
    git rev-parse --short HEAD
    docker build -t spago:main . -f Dockerfile
    # that completes successfully
    mkdir ~/.spago
    # then i run the hugging face import step via the container
    docker run --rm -it -v ~/.spago:/tmp/spago spago:main ./hugging_face_importer --model=deepset/bert-base-cased-squad2 --repo=/tmp/spago
    Running command: './hugging_face_importer --model=deepset/bert-base-cased-squad2 --repo=/tmp/spago'
    Downloading dataset...
    Start downloading 🤗 `deepset/bert-base-cased-squad2`
    2020/06/27 18:55:30 Fetch the model configuration from ``
    Downloading... 508 B complete
    2020/06/27 18:55:30 Fetch the model vocabulary from ``
    Downloading... 214 kB complete
    2020/06/27 18:55:30 Fetch the model weights from `` (it might take a while...)
    # this process runs for a _really_ long time - ive actually never seen it finish successfully (have let it run for over 60 minutes)

    In another shell session I was inspecting what Python processes are operating because I noted some cpu hogging after the pytorch model was fully downloaded ...

    $ ls -hal ~/.spago/deepset/bert-base-cased-squad2/
    total 417M
    drwxr-xr-x 6 anthcor staff  192 Jun 27 14:56 ./
    drwxr-xr-x 3 anthcor staff   96 Jun 27 14:55 ../
    -rw-r--r-- 1 anthcor staff  508 Jun 27 14:55 config.json
    drwx------ 6 anthcor staff  192 Jun 27 14:57 embeddings_storage/
    -rw-r--r-- 1 anthcor staff 414M Jun 27 14:56 pytorch_model.bin
    -rw-r--r-- 1 anthcor staff 209K Jun 27 14:55 vocab.txt
    $ ps aux | grep spago
    anthcor          29694   6.1  0.1  4444408  22784 s003  S+    2:55PM   0:03.44 docker run --rm -it -v /Users/anthcor/.spago:/tmp/spago spago:main ./hugging_face_importer --model=deepset/bert-base-cased-squad2 --repo=/tmp/spago
    anthcor          29705   0.0  0.0  4268300    700 s004  S+    2:56PM   0:00.00 grep spago
    anthcor          29693   0.0  0.0  4280612   6932 s003  S+    2:55PM   0:00.08 /usr/local/Cellar/[email protected]/3.8.3/Frameworks/Python.framework/Versions/3.8/Resources/ /usr/local/bin/grc -es --colour=auto docker run --rm -it -v /Users/anthcor/.spago:/tmp/spago spago:main ./hugging_face_importer --model=deepset/bert-base-cased-squad2 --repo=/tmp/spago

    When running this workflow not with docker everything works as expected.

    In order to stop everything I just kill the docker container and the zombie local python process.

    ps aux | grep -i spago | awk '{print $2}' | xargs kill -9 $1

    Not really sure why this happens – looks like the container is using a local system binary and that causes hangups in the flow of things but I could totally be wrong as I haven't really spent too much time diving in. Hope this gives enough insight into my issue – lmk if you would like any more details. Cheers 🍻

    opened by anthonycorletti 6
  • BERT Server Prediction Doesn't Seem To Work

    I'm attempting to use the BERT server, and have successfully gotten the /answer API call to work. I can't seem to find much information on BERT prediction in general, but I'm guessing it's used to predict what the next sentence will be?

    Based on this I tried sending a JSON request to the /predict route like so, but it gives an empty response:

    $> curl -d '{"text": "BERT is a technique for NLP developed by Google. BERT was created and published in 2018 by Jacob Devlin and his colleagues from Google."}' -H Content-Type=application/json http://localhost:1987/predict

    Contrast this with discriminate which works:

    $> curl -d '{"text": "BERT is a technique for NLP developed by Google. BERT was created and published in 2018 by Jacob Devlin and his colleagues from Google."}' -H Content-Type=application/json http://localhost:1987/discriminate

    Also, what does discriminate do exactly? I found an article explaining all the details of BERT but can't seem to find out what discriminate does.

    opened by bonedaddy 5
  • Cannot import project as library


    I am trying to import this project as a library by running the command go get -u (as instructed in the readme)

    I am getting the following error: cannot find package ""

    Anyone else got this error?

    opened by csaade 4
  • Better BERT Server Capabilities

    (apologies if the many issues are annoying, really liking the library so far)


    Right now when using the BERT server, a set of defaults is used without any control from the user:

    • HTTP server lacks customizability
    • HTTP server listens on
    • No ability to enable TLS

    Additionally it's not possible to build "external servers" as the functions used by the BERT http router (discriminateHandler, predictionHandler, qaHandler are private functions). If this was changed to instead export the handler functions (DiscriminateHandler, PredictionHandler, QaHandler) it would allow people to have more control over the BERT server, better middleware capabilities, TLS, etc...

    By using public handler functions, users would be able to define their own routers, say using chi, and overall have more control of the BERT server.

    I'd be more than happy to open a PR that implements these suggestions

    opened by bonedaddy 4
  • Multi-label BERT classifier from PyTorch

    So I can convert and then load my BERT model, but I am having trouble working out how to operate it from Spago.

    It is a multi-label model and to use it in Python I do this:

        text_enc = bert_tokenizer.encode_plus(
        # mymodel implements pl.LightningModule
        outputs = mymodel(text_enc['input_ids'], text_enc['attention_mask'])
        pred_out = outputs[0].detach().numpy()

    And then process the pred_out array. This model has 5 outputs and all works as you expect in Python.

    So, how do I perform the equivalent in Spago? Borrowing code from the classifier server, I have got this far, but it just isn't obvious what I need to modify to cater for a 5-label output layer.

    func getTokenized(vocab *vocabulary.Vocabulary, text string) []string {
    	cls := wordpiecetokenizer.DefaultClassToken
    	sep := wordpiecetokenizer.DefaultSequenceSeparator
    	tokenizer := wordpiecetokenizer.New(vocab)
    	tokenized := append([]string{cls}, append(tokenizers.GetStrings(tokenizer.Tokenize(text)), sep)...)
    	return tokenized
    }
    // ....
    	model, err := bert.LoadModel(dir)
    	if err != nil {
    		log.Fatalf("error during model loading (%v)\n", err)
    	}
    	defer model.Close()
    	// We need a classifier that matches the output layer of our model.
    	var bc = bert.ClassifierConfig{
    		InputSize: 768,
    		Labels:    []string{"A", "B", "C", "D", "E"},
    	}
    	model.Classifier = bert.NewTokenClassifier(bc)
    	tokenized := getTokenized(model.Vocabulary, s)
    	g := ag.NewGraph(ag.ConcurrentComputations(runtime.NumCPU()))
    	proc := nn.ReifyForInference(model, g).(*bert.Model)
    	encoded := proc.Encode(tokenized)
    	logits := proc.SequenceClassification(encoded)
    	probs := floatutils.SoftMax(logits.Value().Data())

    However, this just gives me 0.2 for each, so I seem to be miles off. Is there an example, or can a short code sequence be provided? Is the wordpiecetokenizer even the correct thing to use?

    opened by jimidle 3
  • Accelerators


    There is a ton of work being done with RISC-V and machine learning accelerators.

    I feel TinyGo makes it possible to leverage spago on these accelerators.

    Just wanted to point this out, as I saw in your Q&A that you felt it was not possible to accelerate spago.

    opened by gedw99 3
  • This is the best thing since sliced bread

    Hello I do not have any issue so feel free to close this but I just wanted to say that spago is fantastic. Love what has been done here in native go!!! Thank you everybody involved. FORZA ITALIA

    opened by go-dockly 3
  • QA Chinese model result does not match python version

    Using this Chinese model: it runs on Python locally and the output is correct, but the output from spaGO is not.

    Similar to #101, but I cannot find the bool parameter for QA. How do I turn off forcing the output to be a distribution (sum must be 1)? With Python, the output is free.

    server := bert.NewServer(model)
    answers := s.model.Answer(body.Question, body.Passage)

    Translated QA: Context: My name is Clara, I live in Berkeley Q: what is my name? A: Clara

    Output is supposed to be 克拉拉 but got

    {
        "answers": [
            {
                "text": "我叫克拉拉,我住在伯克利。",
                "start": 0,
                "end": 13,
                "confidence": 0.2547743
            },
            {
                "text": "住在伯克利。",
                "start": 7,
                "end": 13,
                "confidence": 0.22960596
            },
            {
                "text": "我叫克拉拉,我住",
                "start": 0,
                "end": 8,
                "confidence": 0.1548344
            }
        ],
        "took": 1075
    }

    ./bert-server server --repo=~/.spago --model=luhua/chinese_pretrain_mrc_roberta_wwm_ext_large --tls-disable

    PASSAGE="我叫克拉拉,我住在伯克利。"                                                                                                                                                 
    curl -k -d '{"question": "'"$QUESTION1"'", "passage": "'"$PASSAGE"'"}' -H "Content-Type: application/json" ""
    opened by Tonghua-Li 2
  • Gorgonia tensors

    Hi Matteo, in your GopherCon deck you mentioned having GPU-friendly Gorgonia tensors on spago's roadmap. I am curious about how this might work. Could you give any pointers? I suppose it is because the tensors are more or less just a slice of floats? I read on their git that, with regard to CUDA, the API is expected to change quite a bit before hitting v1.0. They are currently on 0.9.17, so I guess not before Gorgonia's CUDA interface gets to a stable first release?

    opened by go-dockly 0
  • Nearly 4 times the memory usage when compared to python for the same model

    I ran memory profiling for the code and the spago version uses 3.9 GB compared to 1.2 GB for Python. The model sizes are similar: valhalla/distilbart-mnli-12-3 is 2.5 GB after transforming (hf-importer) to spago, whereas the upstream Python version is 2.1 GB.

    Memory profiling in spago


    Memory profiling in Python

    Line #    Mem usage    Increment  Occurences   Line Contents
         7    217.3 MiB    217.3 MiB           1   @profile
         8                                         def classify():
         9   1227.3 MiB   1010.0 MiB           1       classifier = pipeline("zero-shot-classification", model="models/distilbart-mnli-12-3")
        11   1227.3 MiB      0.0 MiB           1       sequence = "PalmOS on Raspberry Pi"
        12   1227.3 MiB      0.0 MiB           1       candidate_labels = ["startup", "business", "legal", "tech"]
        15   1235.1 MiB      7.8 MiB           1       res = classifier(sequence, candidate_labels, multi_label=True, truncation=False)
        17   1235.1 MiB      0.0 MiB           5       for i, label in enumerate(candidate_labels):
        18   1235.1 MiB      0.0 MiB           4           print("%d. %s [%.2f]\n" % (i, res['labels'][i], res['scores'][i]))

    Is this expected? Spago can be very useful in low-memory environments like ARM SBCs to conduct CPU-bound inference, but the memory usage needs to be optimized.

    The Python version seems to be faster in overall operation timing as well, because loading the configuration and model weights takes a variable amount of time in spago.

    opened by abishekmuthian 2
  • Differences in the output of zero shot classification between python & spago for the same model

    I appreciate everyone involved with the spago project for developing a proper Machine Learning framework for Go.

    I'm in the process of exploring spago and found that the output for valhalla/distilbart-mnli-12-3 differs for zero shot classification when using python vs spago.

    	model, err := zsc.LoadModel("spago/valhalla/distilbart-mnli-12-3")
    	if err != nil {
    		// ...
    	}
    	defer model.Close()
    	sequence := "PalmOS on Raspberry Pi"
    	// arbitrary list of topics
    	candidateLables := []string{"startup", "business", "legal", "tech"}
    	result, err := model.Classify(sequence, "", candidateLables, true)
    	if err != nil {
    		// ...
    	}
    	for i, item := range result.Distribution {
    		fmt.Printf("%d. %s [%.2f]\n", i, item.Class, item.Confidence)
    	}

    0. tech [0.89]
    1. startup [0.02]
    2. legal [0.01]
    3. business [0.00]

        classifier = pipeline("zero-shot-classification", model="models/distilbart-mnli-12-3")
        sequence = "PalmOS on Raspberry Pi"
        candidate_labels = ["startup", "business", "legal", "tech"]
        res = classifier(sequence, candidate_labels, multi_label=True, truncation=False)
        for i, label in enumerate(candidate_labels):
            print("%d. %s [%.2f]\n" % (i, res['labels'][i], res['scores'][i]))

    0. tech [0.99]
    1. legal [0.77]
    2. startup [0.05]
    3. business [0.00]

    Is this expected behavior? If so, why?

    opened by abishekmuthian 1
  • bart-large-mnli multi_class does not agree with Python version

    If you convert facebook/bart-large-mnli and use it to evaluate the demo text at huggingface and compare against a local Python setup for verification, we find that:

    • the online demo card and the local Python agree on the label score
    • the label probabilities given back are vastly different
    • the Python version takes roughly 16 seconds on my local machine, but the Spago version takes 37 seconds - this is a MAC and there is no GPU available

    Python code is

        text = "A new model offers an explanation for how the Galilean satellites formed around the solar system’s " \
               "largest world. Konstantin Batygin did not set out to solve one of the solar system’s most puzzling " \
               "mysteries when he went for a run up a hill in Nice, France. Dr. Batygin, a Caltech researcher, " \
               "best known for his contributions to the search for the solar system’s missing “Planet Nine,” spotted a " \
               "beer bottle. At a steep, 20 degree grade, he wondered why it wasn’t rolling down the hill. He realized " \
               "there was a breeze at his back holding the bottle in place. Then he had a thought that would only pop " \
               "into the mind of a theoretical astrophysicist: “Oh! This is how Europa formed.” Europa is one of " \
               "Jupiter’s four large Galilean moons. And in a paper published Monday in the Astrophysical Journal, " \
               "Dr. Batygin and a co-author, Alessandro Morbidelli, a planetary scientist at the Côte d’Azur Observatory " \
               "in France, present a theory explaining how some moons form around gas giants like Jupiter and Saturn, " \
               "suggesting that millimeter-sized grains of hail produced during the solar system’s formation became " \
               "trapped around these massive worlds, taking shape one at a time into the potentially habitable moons we " \
               "know today. "
        cc = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
        labels = ['space & cosmos', 'scientific discovery', 'microbiology', 'robots', 'archeology']
        r = cc(text, labels, multi_class=True)

    Go code, with same text and classes, is:

    bartModel, err = zsc.LoadModel(bartDir)
    // ... check err
    result, err := bartModel.Classify(c.InputText, "", classes, true)

    Similarly using the model valhalla/distilbart-mnli-12-3 also gives wildly different results to the online huggingface demo, using the same text and label set as above.

    So, is there something else I need to do, or is the zsc code not working? My go code is essentially just like the zsc demo code.

    opened by jimidle 7
  • v1.0.1(Sep 16, 2022)

  • v1.0.0(Sep 14, 2022)

    First stable release!


    • Fix bug preventing the embeddings model from being traversed on nn.Apply.
    • Fix incorrect use of self-attention cache when used for cross-attention.


    • Optimize implementation of some Dense matrix functions, especially on amd64 with AVX.
  • v1.0.0-alpha(Jun 14, 2022)

    With this release we introduce breaking changes that bring significant improvements to the project's structure, API and performance.

    It would be difficult and confusing to list every single API change. Instead, the following sections will broadly describe the most relevant changes, arranged by topic.

    Project structure

    Until this release, the project was essentially a monorepo in disguise: the core packages for handling matrices and computational graphs were accompanied by many model implementations (from the very simple up to the most sophisticated ones) and commands (model management utilities and servers).

    We now prefer to keep in this very repository only the core components of spaGO, only enriched with an (opinionated) set of popular models and functionalities. Bigger sub-packages and related commands are moved to separate repositories. The moved content includes, most notably, code related to Transformers and Flair. Please refer to the section Projects using Spago from the README for an updated list of references to separate projects (note: some of them are still work in progress). If you have the feeling that something big is missing in spaGO, chances are it was moved to one of these separate projects: just have a look there first.

    The arrangement of packages has been simplified: there's no need anymore to distinguish between cmd and pkg; all the main subpackages are located in the project's root path. Similarly, many packages, previously nested under pkg/ml, can now be found at root level too.

    Go version and dependencies

    The minimum required Go version is 1.18, primarily needed for the introduction of type parameters (generics).

    Thanks to the creation of separate projects, discussed above, and further refactoring, the main set of required dependencies is limited to the ones for testing.

    Only the subpackage embeddings/store/diskstore requires something more, so we defined it as "opt-in" submodule, with its own dependencies.

    float32 vs. float64

    Instead of separate packages mat32 and mat64, there is now a single unified package mat. Many parts of the implementation make use of type parameters (generics), however the package's public API makes a rather narrow use of them.

    In particular, we abstained from adding type parameters to widely-used types, such as the Matrix interface. Where suitable, we are simply favoring float64 values, the de-facto preferred floating point type in Go (just think about Go's math package). For other situations, we introduced a new subpackage mat/float. It provides simple types, holding either float32 or float64 values, as scalars or slices, and makes it easy to convert values between different precisions, all without making explicit use of generics. This design prevents the excessive spreading of type arguments to tons of other types that need to manipulate matrices, both from other spaGO packages and from your own code.


    • The type mat.Matrix is the primary interface for matrices and vectors throughout the project.
    • The type mat.Dense is the concrete implementation for a dense matrix. Unlike the interface, it has a type argument to distinguish between float32 and float64.
    • We removed implementation and support for sparse matrices, since their efficacy and utility were marginal. A better implementation might come back in the future.
    • A new dense matrix can be created "from scratch" by calling one of the several functions mat.New*** (NewDense, NewVecDense, ...). Here you must choose which data type to use, specifying it as type parameter (unless implicit).
    • Once you have an existing matrix, you can create new instances preserving the same data type of the initial one: simply use one of the New*** methods on the matrix instance itself, rather than their top-level function counterparts.
    • Any other operation performed on a matrix that creates a new instance will operate with the same type of the receiver, and returns an instance of that type too.
    • Operations with matrices of different underlying data types are allowed, just beware the memory and computation overheads introduced by the necessary conversions.
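
The design described above (a non-generic interface in front of a generic concrete type) can be sketched in a few lines. This is only an illustration under stated assumptions: the lowercase `matrix`, `dense`, `float`, and `newDense` names are hypothetical stand-ins that mirror the idea, not the actual definitions in the mat package.

```go
package main

import "fmt"

// float constrains the two supported precisions (hypothetical name).
type float interface{ ~float32 | ~float64 }

// matrix is a non-generic interface: callers never mention a type argument.
type matrix interface {
	Rows() int
	Cols() int
	At(i, j int) float64
	Add(other matrix) matrix
}

// dense is the concrete, generic implementation.
type dense[T float] struct {
	rows, cols int
	data       []T
}

func newDense[T float](rows, cols int, data []T) *dense[T] {
	if len(data) != rows*cols {
		panic("newDense: data length does not match shape")
	}
	return &dense[T]{rows: rows, cols: cols, data: data}
}

func (d *dense[T]) Rows() int           { return d.rows }
func (d *dense[T]) Cols() int           { return d.cols }
func (d *dense[T]) At(i, j int) float64 { return float64(d.data[i*d.cols+j]) }

// Add returns a new matrix with the receiver's element type. Values of
// other are converted one by one, which is where the overhead of mixing
// precisions comes from in this sketch.
func (d *dense[T]) Add(other matrix) matrix {
	if other.Rows() != d.rows || other.Cols() != d.cols {
		panic("Add: shape mismatch")
	}
	out := make([]T, len(d.data))
	for i := 0; i < d.rows; i++ {
		for j := 0; j < d.cols; j++ {
			k := i*d.cols + j
			out[k] = d.data[k] + T(other.At(i, j))
		}
	}
	return newDense(d.rows, d.cols, out)
}

func main() {
	a := newDense(1, 2, []float32{1, 2})      // T inferred as float32
	b := newDense(1, 2, []float64{0.5, 0.25}) // T inferred as float64
	sum := a.Add(b)
	_, keepsFloat32 := sum.(*dense[float32])
	fmt.Println(sum.At(0, 0), sum.At(0, 1), keepsFloat32)
}
```

Code that only consumes `matrix` values stays free of type arguments, while the choice of precision is made once, at construction time.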

    Auto-grad package

    • The package ag now implicitly works in "define-by-run" mode only. It's way more performant compared to the previous releases, and there would be no significant advantage in re-using a pre-defined graph ("define-and-run").
    • There is no Graph anymore! At least, not as a first-class citizen: an implicit "virtual" graph is progressively formed each time an operation over some nodes is applied. The virtual graph can be observed by simply walking the tree of operations. Most methods of the former Graph are now simple functions in the ag package.
    • We still provide a way to explicitly "free" some resources after use, both for helping the garbage collector and for returning some objects to their sync.Pool. The function ag.ReleaseGraph operates on the virtual graph described above, usually starting from the given output nodes.
    • Forward operations are executed concurrently. As soon as an Operator is created (usually by calling one of the functions in ag, such as Add, Prod, etc.), the related Function's Forward procedure is performed on a new goroutine. Nevertheless, it's always safe to ask for the Operator's Value without worries: if it's called too soon, the function will lock until the result is computed, and only then return the value.
    • To maximize performance, we removed the possibility to set a custom limit for concurrent computations. Thanks to the new design, we now let the Go runtime itself manage this problem for us, so that you can still limit and finetune concurrency with the GOMAXPROCS variable.
    • The implementation of backpropagation is also redesigned and improved. Instead of invoking the backward procedure on an explicit Graph, you can call ag.Backward or ag.BackwardMany, specifying the output node (or nodes) of your computation (such as loss values, in traditional scenarios). The backward functions traverse the virtual graph and propagate the gradients, leveraging concurrency and making use of goroutines and locks in a way that's very similar to the forward procedure. The backward functions will lock and wait until the whole gradients propagation is complete before returning. The locking mechanism implemented in the nodes' Grad methods, will still prevent troubles in case your own code reads the gradients concurrently (that would be very uncommon).
    • We also modified the implementation of time-steps handling and truncated backpropagation. Since we don't have the support of a concrete Graph structure anymore, we introduced a new dedicated type ag.TimeStepHandler, and related functions, such as NodeTimeStep. For performing a truncated backpropagation, we provide the functions ag.BackwardT and ag.BackwardManyT: they work similarly to the normal backpropagation functions described above, only additionally requiring a time-step handler and the desired amount of back steps.
    • We simplified and polished the API for creating new node-variables. Instead of having multiple functions for simple variables, scalars, constants, with/without name or grads, and various combinations of those, you can now create any new variable with ag.Var, which accepts a Matrix value and creates a new node-variable with gradients accumulation disabled by default. To enable gradient propagation, or to set an explicit name (useful for model params or constants), you can use the Variable's chainable methods WithGrad and WithName. As a shortcut to create a scalar-matrix variable you can use ag.Scalar.
    • The package ag/encoding provides generic structures and functions to obtain a view of a virtual graph, with the goal of facilitating the encoding/marshaling of a graph in various formats. The package ag/encoding/dot is a rewrite of the former pkg/ml/graphviz that uses the ag/encoding structures to represent a virtual graph in Graphviz DOT format.


    • As before, package nn provides types and functions for defining and handling models. Its subpackages are implementations of the most common models. The set of built-in models has been substantially revised, moving some of them to separate projects, as previously explained.
    • The Model interface has been greatly simplified: it only requires the special empty struct Module to be embedded in a model type. This is necessary only to distinguish an actual model from any other struct, which is especially useful for parameter traversal, or other similar operations.
    • Since the Graph has been removed from ag, models no longer need to hold a reference to it. Similarly, there is no need for any other model-specific field, like the ones available from the former BaseModel. This implies the elimination of some seldom-used properties. Notable examples are the "processing mode" (from the old Graph) and the time step (from the old BaseModel). In situations where a removed value or feature is still needed, we suggest either reintroducing the missing elements on the models that need them, or extracting them to separate types and functions. An example of extracted behavior is the handling of time steps, already mentioned in the previous section.
    • There is no distinction anymore between "pure" models and processors, making "reification" no longer necessary: once a model is created (or loaded), it can be immediately used, even for multiple concurrent inferences.
    • A side effect of removing processor instances is that it's not possible to hold any sort of state related to a specific inference inside the structure of a model (or, at least, it's discouraged in most situations). Keeping track of a state is quite common for models that work with a running "memory" or cache. The recommended approach is to represent the state as a separate type, so that the "old" state can be passed as argument to the model's forward function (along with any other input), and the "new" or updated state can be returned from the same function (along with any other output). Some good examples can be observed in the implementation of recurrent networks (RNNs), located at nn/recurrent/...: each model has a single-step forward function (usually called Next) that accepts a previous state and returns a new one.
    • We removed the Stack Model, in favor of a new simple function nn.Forward, that operates on a slice of StandardModel interfaces, connecting outputs to inputs sequentially for each module.
    • We introduced the new type nn.Buffer: it's a Node implementation that does not require gradients, but can be serialized just like any other parameter. This is useful, for example, to store constants, to track the mean and std in batch norm layers, etc. As a shortcut to create a Buffer with a scalar-matrix value you can use nn.Const.
    • We refactored the arguments of the parameter-traversal functions ForEachParam and ForEachParamStrict. Furthermore, the new interface ParamsTraverser makes it possible to traverse a model's parameters that are not automatically discovered by the traversal functions via reflection. If a model implements this interface, the function TraverseParams will take precedence over the regular parameter visit.
    • We introduced the function Apply, which visits all sub-models of any Model. Typical usages of this function include parameter initialization.
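
The recommendation above to keep inference state outside the model can be sketched as follows. This is a toy, standard-library-only illustration, not spaGO code: the types State and Cell and their scalar arithmetic are hypothetical; only the method name Next mirrors the convention mentioned above.

```go
package main

import "fmt"

// State is a hypothetical per-inference state. It lives outside the model,
// so several inferences can share one model without interfering.
type State struct {
	Hidden float64
}

// Cell is a toy recurrent model: it holds parameters only, never
// inference-specific state.
type Cell struct {
	W, U float64
}

// Next performs a single step: it receives the previous state together with
// the input and returns the updated state, leaving the model untouched.
func (c *Cell) Next(prev State, x float64) State {
	return State{Hidden: c.W*x + c.U*prev.Hidden}
}

// Forward threads the state through a whole sequence, starting from a zero state.
func (c *Cell) Forward(xs []float64) []State {
	states := make([]State, 0, len(xs))
	prev := State{}
	for _, x := range xs {
		prev = c.Next(prev, x)
		states = append(states, prev)
	}
	return states
}

func main() {
	cell := &Cell{W: 1, U: 0.5}
	for i, s := range cell.Forward([]float64{1, 2, 3}) {
		fmt.Printf("step %d: hidden=%v\n", i, s.Hidden)
	}
}
```

Because Next only reads the model's fields, the same Cell value can be used from multiple goroutines at once, each with its own State.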


    • The embeddings model has been refactored and made more flexible by splitting the new implementation into three main concerns: stores, the actual model, and the model's parameters.
    • Raw embeddings data can be read from, and possibly written to, virtually any suitable medium, be it in-memory, on-disk, local or remote services or databases, etc. The Store interface, defined in package embeddings/store, only requires an implementation to provide a handful of read/write functions for key/value pairs. Both keys and values are just slices of bytes. For example, in a typical scenario involving word embeddings, a key might be a string word converted to []byte, and the value the byte-marshaled representation of a vector (or a more complex struct also holding other properties).
    • It's not uncommon for a complex model, or application, to make use of more than one store. For a more convenient handling, multiple independent Stores can be organized together in a Repository, another interface defined in embeddings/store. A Repository is simply a provider for Stores, where each Store is identified by a string name. For example, if we are going to use a relational database for storing embeddings data, the Repository might establish the connection to the database, whereas each Store might identify a separate table by name, used for reading/writing data.
    • We provide two built-in implementations of Repository/Store pairs. The package embeddings/store/diskstore is a Go submodule that stores data on disk, using BadgerDB; this is comparable to the implementation from previous releases. The package embeddings/store/memstore is a simple volatile in-memory implementation; among other usages, it might be especially convenient for testing.
    • The package embeddings implements the main embeddings Model. One Model can read and write data to a single Store, obtained from a Repository by the configured name. The model delegates to the embeddings Store the responsibility to actually store the data; for this reason, the Store value on a Model is prevented from being serialized (this is done with the utility type embeddings/store.PreventStoreMarshaling).
    • To facilitate different use cases, the Model allows a limited set of possible key types, using the constraint Key as type argument.
    • The type Embedding represents a single embedding value that can be handled by a Model. It satisfies the interface nn.Param, allowing seamless integration with operations involving any other model. Under the hood, the implementation takes care of reading/writing data against a Store, efficiently handling marshaling/unmarshaling and preventing race conditions. The Value and the Payload (if any) are read/written against the Store; the Grad is only kept in memory. All properties of different Embedding instances for the same key are kept synchronized upon changes.
    • A Model keeps track of all Embedding parameters with associated gradients. The method TraverseParams allows these parameters to be discovered and seen as if they were any other regular type of parameter. This is especially important for operations such as embeddings optimization.
    • It is common practice to share the same embeddings among multiple models. In this case it is important that the serialized (and deserialized) instance is the very same one. Therefore, we introduced the Shared structure, which prevents binary marshaling.
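
To make the key/value idea above concrete, here is a minimal sketch of a volatile in-memory store whose keys and values are plain byte slices. It uses only the standard library. The Store interface shown here, its method set, and the helpers encodeVector and decodeVector are assumptions made for illustration; they are not the actual definitions from embeddings/store.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
	"sync"
)

// Store is a hypothetical, minimal key/value contract over byte slices.
type Store interface {
	Put(key, value []byte) error
	Get(key []byte) (value []byte, found bool, err error)
}

// memStore is a volatile in-memory implementation, guarded by a mutex.
type memStore struct {
	mu   sync.RWMutex
	data map[string][]byte
}

func newMemStore() *memStore {
	return &memStore{data: make(map[string][]byte)}
}

func (s *memStore) Put(key, value []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[string(key)] = append([]byte(nil), value...) // store a private copy
	return nil
}

func (s *memStore) Get(key []byte) ([]byte, bool, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[string(key)]
	if !ok {
		return nil, false, nil
	}
	return append([]byte(nil), v...), true, nil // return a copy
}

// encodeVector marshals a float32 vector using 4 bytes per value.
func encodeVector(v []float32) []byte {
	buf := make([]byte, 4*len(v))
	for i, x := range v {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(x))
	}
	return buf
}

// decodeVector is the inverse of encodeVector.
func decodeVector(b []byte) ([]float32, error) {
	if len(b)%4 != 0 {
		return nil, errors.New("invalid vector length")
	}
	out := make([]float32, len(b)/4)
	for i := range out {
		out[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[4*i:]))
	}
	return out, nil
}

func main() {
	var st Store = newMemStore()
	// The key is a word converted to []byte; the value is a marshaled vector.
	if err := st.Put([]byte("hello"), encodeVector([]float32{0.1, 0.2, 0.3})); err != nil {
		panic(err)
	}
	raw, found, err := st.Get([]byte("hello"))
	if err != nil || !found {
		panic("embedding not found")
	}
	vec, err := decodeVector(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(raw), vec)
}
```

Keeping the contract this narrow is what lets the same model code run against a disk-backed or a memory-backed implementation.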


    • Gradient descent optimization algorithms are available under the package gd, with minor API changes.
    • We removed other methods, such as differential evolution, planning to re-implement them on separate forthcoming projects.


    • We removed the former package pkg/utils. Some of its content was related to functionalities now moved to separate projects. Any remaining useful code has been refactored and moved to more appropriate places.
  • v0.7.0 (May 24, 2021)


    • New package ml/ag/encoding/dot, for simple serialization of a Graph to DOT (Graphviz) format.
    • New package ml/nn/sgu, implementing a Spatial Gating Unit (SGU) model.
    • New package ml/nn/conv1x1, implementing a simple 1-dimensional 1-sized-kernel convolution model.
    • New package ml/nn/gmlp, implementing a gMLP model.


    • ml/nn/activation/Model.Forward now simply returns the input as it is if the activation function is the identity.
  • v0.6.0 (May 13, 2021)


    • ml/losses.WeightedCrossEntropy()
    • ml/losses.FocalLoss()
    • ml/losses.WeightedFocalLoss()
    • nlp/sequencelabeler.LoadModel() (it replaces Load() and LoadEmbeddings())
    • nlp/charlm.LoadModel()
    • nlp/transformers/bert.Model.PredictMLM()
    • nlp/transformers/bart/tasks package
    • nlp/transformers/bert.Model.Vectorize()
    • ml/ag.Graph.Nodes() and ml/ag.Nodes()
    • ml/nn.Model.Close()
    • ml/nn.ReifyForTraining() and ml/nn.ReifyForInference()
    • ml/ag.Graph.Backward() now panics if it is executed with nodes belonging to different graphs.
    • The new ml/graphviz package allows exporting a Graph to Graphviz DOT format. To make it possible, we introduced a new go-mod dependency gographviz.
    • A custom name can optionally be set on a Graph's Variables. This can be useful for debugging purposes and visual graph representation. You can now use Graph.NewVariableWithName() and Graph.NewScalarWithName() to create named Variables, and get the name of a Variable with Variable.Name().


    • All UnaryElementwise functions provided by the package ag/fn have been promoted to separate dedicated structs. This improves debuggability and you can get appropriate function names when using reflection. Here is the full list of the modified functions: Tan, Tanh, Sigmoid, HardSigmoid, HardTanh, ReLU, Softsign, Cos, Sin, Exp, Log, Neg, Reciprocal, Abs, Mish, GELU, Sqrt, Swish. For the same reason, a dedicated Square function is introduced, replacing Prod with both operands set to the same value.
    • ml/ag types Operator, Variable, Wrapper are now public.
    • ml/nn.Reify() now expects Graph and Processing Mode arguments instead of a Context object (removed).
    • ml/nn.BaseModel has been modified, replacing the field Ctx Context with a direct reference to the model's Graph and the Processing Mode (fields G and ProcessingMode).
    • Refactor the server implementations of nlp/sequencelabeler, nlp/transformers/bert, and nlp/transformers/bart.
    • Upgrade various dependencies.
    • Regenerate protocol buffers files (with protoc-gen-go v1.26.0 and protoc v3.16.0).


    • nlp/sequencelabeler.Load() and LoadEmbeddings() (now replaced by nlp/sequencelabeler.LoadModel())
    • ml/nn.Context (see related changes on Reify() and BaseModel)
  • v0.5.2 (Mar 16, 2021)

  • v0.5.1 (Mar 7, 2021)



    • Improve nlp.transformer.generation algorithms:
      • optimize Generator.getTopKScoredTokens().
      • optimize Generator.updateTokensScores().
    • Simplify mat32.Dense.Mul when doing Matrix-Vector multiplication.
    • Refactor math32 functions using chewxy/math32 functions.
    • Improve ag.Graph efficiency:
      • Use a pre-computed cache in ag.Graph.groupNodesByHeight().
      • Use sync.Pool to reduce allocations of graph's operators.
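
The allocation-saving technique mentioned above can be sketched with the standard library's sync.Pool. This is an illustrative example only: the operator type and the acquireOperator and releaseOperator helpers are hypothetical and do not reproduce how ag.Graph manages its operators.

```go
package main

import (
	"fmt"
	"sync"
)

// operator is a hypothetical stand-in for a graph operator node.
type operator struct {
	name     string
	operands []float64
}

// operatorPool recycles operator values instead of allocating a new one
// for every node created during a computation.
var operatorPool = sync.Pool{
	New: func() any { return new(operator) },
}

// acquireOperator takes an operator from the pool and initializes it.
func acquireOperator(name string, operands ...float64) *operator {
	op := operatorPool.Get().(*operator)
	op.name = name
	op.operands = append(op.operands[:0], operands...) // reuse the backing array
	return op
}

// releaseOperator clears the operator and hands it back to the pool.
// The caller must not use op after this call.
func releaseOperator(op *operator) {
	op.name = ""
	op.operands = op.operands[:0]
	operatorPool.Put(op)
}

func main() {
	op := acquireOperator("add", 1, 2)
	fmt.Println(op.name, op.operands)
	releaseOperator(op)
}
```

A pool gives no guarantee that a released value will be reused, so correctness must never depend on it; it only reduces pressure on the garbage collector.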


    • Fix past key-values usage on self-attention and cross-attention
  • v0.5.0 (Feb 15, 2021)


    • Implement a beam-search algorithm for conditional generation:
      • nlp.transformer.generation package.
    • Add implementation of the Sentence-Piece tokenizer:
      • nlp.tokenizers.sentencepiece package.
    • BART improvements:
      • gRPC and HTTP API to perform Text Generation.
      • Add support for "Marian" architecture (used for translation tasks).
      • Add sinusoidal positional encoder (used by Marian).
      • Add "head" for conditional generation:
        • nlp.transformers.bart.head.conditionalgeneration package.
    • Add nn.Closer interface (e.g. embeddings.Model needs to close the underlying key-value store).
    • Add Swish act. function without trainable parameters.
    • Add SiLU act. function (it is just an alias for Swish).
    • New pe.SinusoidalPositionalEncoder (this implementation replaces unused pe.PositionalEncoder and pe.AxialPositionalEncoder)


    • Update urfave/cli to v2.
    • Update dgraph-io/badger to v3.
    • Make the BART positional encoder an interface to support various encodings (e.g. trainable vs. static).
    • Rename fn.NewSwish to fn.NewSwishB, as this was the Swish variant with trainable parameters (B).
    • Relax ag.GetOpName to match operator names in lower-case.
    • Allow arbitrary activation function on BART encoder/decoder layers.
    • Use precomputed "keys" and "values" in self-attention, multi-head attention and BART decoder.


    • In relation to the aforementioned positional encoding changes:
      • pe.PositionalEncoder and related functions
      • pe.AxialPositionalEncoder and related functions


    • Fix causal-mask used by nn.ScaledDotProductAttention
  • v0.4.1 (Jan 22, 2021)


    • New function ReleaseMatrix to packages mat32 and mat64.
    • New methods to Matrix interface, from mat32 and mat64: Minimum, Maximum, MulT, Inverse, DoNonZero. However, these methods are not implemented yet for sparse matrices (they always panic).


    • Prefer handling Matrix interface values over specific Dense or Sparse matrices, also avoiding unnecessary type casts. Relevant changes to the public API are listed below.
      • mat(32|64).Stack function's arguments and returned value are now Matrix interfaces, instead of explicit Dense matrices.
      • Dense.Minimum and Dense.Maximum, from packages mat32 and mat64, return a Matrix interface, instead of a specific Dense type.
      • The return values of fofe.EncodeDense, fofe.Encode, and fofe.BiEncode are slices of Matrix values, instead of Dense or Sparse.
      • The z argument of the function fofe.Decode is of type Matrix, instead of Dense.
      • The Differential Evolution optimizer API now handles Matrix values, instead of specific Dense matrices. Changes include: Member.TargetVector, Member.DonorVector, ScoredVector.Vector, the vector argument of the NewMember function, and the solution argument of the score and validate functions passed to NewOptimizer.
      • PositionalEncoder.Cache and AxialPositionalEncoder.Cache are slices of Matrix, instead of slices of Dense.
      • AxialPositionalEncoder.EncodingAt returns a Matrix value, instead of Dense.
      • nn.DumpParamsVector returns a Matrix value, instead of Dense.
      • The vector argument of the function nn.LoadParamsVector is a Matrix, instead of Dense.
      • The value argument of the method embeddings.Model.SetEmbedding is of type Matrix, instead of Dense.
      • The type of the struct field evolvingembeddings.WordVectorPair.Vector is Matrix, instead of Dense.
  • v0.4.0 (Jan 17, 2021)


    • Various new test cases (improving the coverage).
    • nlp.embeddings.syncmap package.
    • ml.nn.recurrent.srnn.BiModel which implements a bidirectional variant of the Shuffling Recurrent Neural Networks (SRNN).
    • Configurable timeout and request limit to all HTTP and gRPC servers (see also commands help).


    • The implementation of all CLI commands has been refactored, so that the docker-entrypoint can reuse all other cli.App objects, instead of just running separate executables. By extension, the Dockerfile now builds a single executable file, and the final image is much smaller.
    • All dependencies have been upgraded to the latest version.
    • Simplify custom error definitions using fmt.Errorf instead of functions from
    • Custom binary data serialization of matrices and models is now achieved with Go's encoding/gob. Many specific functions and methods are now replaced by fewer and simpler encoding/decoding methods compatible with gob. A list of important related changes follows.
      • utils.kvdb.KeyValueDB is no longer an interface, but a struct which directly implements the former "badger backend".
      • utils.SerializeToFile and utils.DeserializeFromFile now handle generic interface{} objects, instead of values implementing Serializer and Deserializer.
      • mat32 and mat64 custom serialization functions (e.g. MarshalBinarySlice, MarshalBinaryTo, ...) are replaced by implementations of BinaryMarshaler and BinaryUnmarshaler interfaces on Dense and Sparse matrix types.
      • PositionalEncoder.Cache and AxialPositionalEncoder.Cache fields (from package) are now public.
      • All types implementing nn.Model interface are registered for gob serialization (in init functions).
      • embeddings.Model.UsedEmbeddings type is now nlp.embeddings.syncmap.Map.
      • As a consequence, you will have to re-serialize all your models.
    • Flair converter now sets the vocabulary directly in the model, instead of creating a separate file.
    • sequencelabeler.Model.LoadParams has been renamed to Load.
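
The gob-related changes above rely on a standard pattern: concrete types that travel behind an interface must be registered, typically in an init function, before they can be encoded or decoded. The sketch below shows that pattern with the standard library only. The Model interface, the Linear type, and the encode and decode helpers are hypothetical and much simpler than spaGO's own types.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Model is a hypothetical interface standing in for a model abstraction.
type Model interface {
	Name() string
}

// Linear is a toy model with exported fields, as gob requires.
type Linear struct {
	W []float32
	B []float32
}

func (l *Linear) Name() string { return "linear" }

func init() {
	// Register the concrete type so that gob can encode and decode it
	// when it is stored in an interface value.
	gob.Register(&Linear{})
}

// encode serializes a Model. A pointer to the interface is passed so that
// gob records the concrete type alongside the data.
func encode(m Model) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(&m); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode restores a Model; the concrete type is resolved through the registry.
func decode(data []byte) (Model, error) {
	var m Model
	if err := gob.NewDecoder(bytes.NewReader(data)).Decode(&m); err != nil {
		return nil, err
	}
	return m, nil
}

func main() {
	data, err := encode(&Linear{W: []float32{1, 2}, B: []float32{0.5}})
	if err != nil {
		panic(err)
	}
	m, err := decode(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(m.Name(), m.(*Linear).W)
}
```

Data written with a different serialization scheme cannot be read this way, which is why a format change of this kind requires re-serializing existing models.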


    • In relation to the aforementioned gob serialization changes:
      • nn.ParamSerializer and related functions
      • nn.ParamsSerializer and related functions
      • utils.Serializer and utils.Deserializer interfaces
      • utils.ReadFull function
    • sequencelabeler.Model.LoadVocabulary


    • docker-entrypoint sub-command hugging-face-importer has been renamed to huggingface-importer, just like the main command itself.
    • docker-entrypoint sub-command can be correctly specified without leading ./ or / when run from a Docker container.
    • BREAKING: mat32.Matrix serialization has been fixed, now serializing single values to chunks of 4 bytes (instead of 8, like float64). Serialized 32-bit models will now be half the size! Unfortunately you will have to re-serialize your models (sorry!).
  • v0.3.0 (Jan 10, 2021)


    • Static analysis job (golint and gocyclo) to Go GitHub workflow.
    • You can set a limit for concurrent heavyweight Graph computations (e.g. forward and backward steps) - see (GraphOption) and If no option is specified, by default the limit is set to runtime.NumCPU().
    • You can set a limit for concurrent heavyweight computations of (e.g. params update step).
    • New package utils.processingqueue.
    • mat32 package, which operates on float32 data type.
    • It's possible to switch between float32 and float64 as default floating-point data type, using the script
    • Go GitHub workflow has been adapted to run tests using both float32 and float64 as main floating-point data type.
    • This CHANGELOG file.
    • Pull and convert Hugging Face models automatically if not found locally when starting BERT or BART server.
    • Move content from GitHub Wiki to README in related package folders.


    • (GraphOption) expects the maximum number of concurrent computations handled by heavyweight Graph operations (e.g. forward and backward steps).
    • ml.nn.linear.Model and ml.nn.convolution.Model read the concurrent computations limit set on the model's Graph, thus SetConcurrentComputations() methods have been removed.
    • mat has been renamed to mat64 and some functions have been renamed.
    • The whole project now works with float32 floating-point data type by default, by using the package mat32.
    • When imported, the new package mat32 is always aliased as mat. Then, explicit usages of float64 type have been replaced with mat.Float. Moreover, bitsize-specific functions have been made more generic (i.e. operating with mat.Float type) or split into separate implementation, in mat32 and mat64. In this way, switching the whole project between float32 and float64 is just a matter of changing all imports, from mat32 to mat64, or vice-versa (see also the new file
    • Update internal links to pre-trained NER models to float32 versions.
    • nlp.sequencelabeler.Convert() now loads and converts original Flair models, instead of pre-processed dumps.
    • Change command line arguments to make them more consistent; please refer to the help messages of each command.
    • Update Dockerfile using a new base building image and adding bart server.


    • Added dedicated package names to different protocol buffers definition files to avoid name conflicts.
  • v0.2.0 (Dec 30, 2020)

    Notable changes:

    • Added support for BART model (tested on Natural Language Inference task);
    • Added BART API to perform Zero-Shot Text Classification;
    • Significant reduction of boilerplate code through the unification of nn.Model and nn.Processor interfaces:
      • there is now a single nn.Model interface that can be reified to become a neural processor. See nn.Reify();
      • there was no compelling reason to have a Forward method in the nn.Model interface so it has been removed, gracefully increasing flexibility in the implementation of a model.