Machine Learning for Go

GoLearn is a 'batteries included' machine learning library for Go. Simplicity, paired with customisability, is the goal. We are in active development, and would love comments from users out in the wild. Drop us a line on Twitter.

twitter: @golearn_ml


See here for installation instructions.

Getting Started

Data are loaded in as Instances. You can then perform matrix-like operations on them and pass them to estimators. GoLearn implements the scikit-learn Fit/Predict interface, so you can easily swap estimators in and out for trial and error. GoLearn also includes helper functions for data, such as cross-validation and train-test splitting.

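package main

import (
	"fmt"

	"github.com/sjwhitworth/golearn/base"
	"github.com/sjwhitworth/golearn/evaluation"
	"github.com/sjwhitworth/golearn/knn"
)

func main() {
	// Load in a dataset. The second argument says whether the CSV file has a
	// header row; when it does, the header names are stored as attribute names.
	// Think of instances as a Data Frame structure in R or Pandas.
	// You can also create instances from scratch.
	rawData, err := base.ParseCSVToInstances("datasets/iris.csv", false)
	if err != nil {
		panic(err)
	}

	// Print a pleasant summary of your data.
	fmt.Println(rawData)

	// Initialise a new KNN classifier (Euclidean distance, linear search, 2 neighbours).
	cls := knn.NewKnnClassifier("euclidean", "linear", 2)

	// Do a 50/50 training-test split, then train on the training half.
	trainData, testData := base.InstancesTrainTestSplit(rawData, 0.50)
	cls.Fit(trainData)

	// Calculate the Euclidean distance and return the most popular label.
	predictions, err := cls.Predict(testData)
	if err != nil {
		panic(err)
	}

	// Print precision/recall metrics.
	confusionMat, err := evaluation.GetConfusionMatrix(testData, predictions)
	if err != nil {
		panic(fmt.Sprintf("Unable to get confusion matrix: %s", err.Error()))
	}
	fmt.Println(evaluation.GetSummary(confusionMat))
}

Sample output:

Reference Class	True Positives	False Positives	True Negatives	Precision	Recall	F1 Score
---------------	--------------	---------------	--------------	---------	------	--------
Iris-virginica	28	2	56	0.9333	0.9333	0.9333
Iris-setosa	29	0	59	1.0000	1.0000	1.0000
Iris-versicolor	27	2	57	0.9310	0.9310	0.9310
Overall accuracy: 0.9545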
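Because every estimator exposes the same Fit/Predict pair, estimators can be swapped without touching the surrounding evaluation code. The sketch below illustrates that pattern in plain Go; the Estimator interface, the MajorityClass and NearestNeighbour types and the toy data are all hypothetical and are not GoLearn's API (GoLearn's own estimators operate on its Instances types rather than plain slices).

```go
package main

import (
	"fmt"
	"math"
)

// Estimator is a hypothetical Fit/Predict interface over plain slices.
type Estimator interface {
	Fit(X [][]float64, y []string)
	Predict(X [][]float64) []string
}

// MajorityClass always predicts the most frequent training label.
type MajorityClass struct{ label string }

func (m *MajorityClass) Fit(X [][]float64, y []string) {
	counts := map[string]int{}
	best := 0
	for _, lbl := range y {
		counts[lbl]++
		if counts[lbl] > best {
			best, m.label = counts[lbl], lbl
		}
	}
}

func (m *MajorityClass) Predict(X [][]float64) []string {
	out := make([]string, len(X))
	for i := range out {
		out[i] = m.label
	}
	return out
}

// NearestNeighbour is a 1-NN classifier; it compares squared Euclidean
// distances, which gives the same ranking as the distances themselves.
type NearestNeighbour struct {
	X [][]float64
	y []string
}

func (n *NearestNeighbour) Fit(X [][]float64, y []string) { n.X, n.y = X, y }

func (n *NearestNeighbour) Predict(X [][]float64) []string {
	out := make([]string, len(X))
	for i, q := range X {
		bestDist := math.Inf(1)
		for j, p := range n.X {
			d := 0.0
			for k := range q {
				d += (q[k] - p[k]) * (q[k] - p[k])
			}
			if d < bestDist {
				bestDist, out[i] = d, n.y[j]
			}
		}
	}
	return out
}

// accuracy is the fraction of predictions that match the true labels.
func accuracy(pred, truth []string) float64 {
	correct := 0
	for i := range pred {
		if pred[i] == truth[i] {
			correct++
		}
	}
	return float64(correct) / float64(len(truth))
}

func main() {
	// Illustrative toy data (two features per row).
	trainX := [][]float64{{5.1, 3.5}, {4.9, 3.0}, {6.0, 2.2}, {6.1, 2.9}, {5.0, 3.4}}
	trainY := []string{"setosa", "setosa", "versicolor", "versicolor", "setosa"}
	testX := [][]float64{{5.0, 3.3}, {6.2, 2.8}}
	testY := []string{"setosa", "versicolor"}

	// Both types satisfy Estimator, so they can be swapped freely.
	for _, est := range []Estimator{&MajorityClass{}, &NearestNeighbour{}} {
		est.Fit(trainX, trainY)
		fmt.Printf("%T accuracy: %.2f\n", est, accuracy(est.Predict(testX), testY))
	}
}
```

The evaluation loop only depends on the interface, which is the property that makes trial-and-error model comparison cheap.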


GoLearn comes with practical examples. Dive in and see what is going on.

cd $GOPATH/src/
go run knnclassifier_iris.go
cd $GOPATH/src/
go run instances.go
cd $GOPATH/src/
go run trees.go


Join the team

Please send me an email at [email protected]

  • Logistic Regression

    Logistic Regression based on liblinear. More features (SVC, SVR) will be available soon.

    opened by npbool 46
  • Adds Attributes and Instances

    A possible solution for issue #24.

    • Adds an Attribute interface and the CategoricalAttribute and FloatAttribute implementations:
        ◦ CategoricalAttribute is for discrete string values
        ◦ FloatAttribute is for numeric values
    • Instances combines an Attributes slice and storage for a self-contained dataset representation.
    • KNNClassifier has been modified to use the Attribute and Instance types (see the test file for use of the API).
    • CSV handling needed to be moved back into base due to a circular dependency; this also adds the datasets used to test CSV handling.
    • Doesn't get rid of skelterjohn/go.matrix dependency yet because the SwapRows() method has no equivalent in the new library.
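    To make the Attribute/Instances split concrete, here is a simplified sketch of the idea: each attribute knows how to turn a raw CSV string into a stored numeric value and back, and Instances pairs an attribute slice with row storage. All names and signatures below are hypothetical illustrations; GoLearn's actual Attribute interface differs.

```go
package main

import (
	"fmt"
	"strconv"
)

// Attribute is a hypothetical, simplified interface: it converts between a
// raw CSV string and a float64 value stored in a row.
type Attribute interface {
	Name() string
	Encode(raw string) (float64, error)
	Decode(v float64) string
}

// FloatAttribute stores numeric values as-is.
type FloatAttribute struct{ name string }

func (f *FloatAttribute) Name() string { return f.name }

func (f *FloatAttribute) Encode(raw string) (float64, error) {
	return strconv.ParseFloat(raw, 64)
}

func (f *FloatAttribute) Decode(v float64) string {
	return strconv.FormatFloat(v, 'f', -1, 64)
}

// CategoricalAttribute maps each distinct string to an integer index,
// registering new values the first time they are seen.
type CategoricalAttribute struct {
	name   string
	values []string
}

func (c *CategoricalAttribute) Name() string { return c.name }

func (c *CategoricalAttribute) Encode(raw string) (float64, error) {
	for i, v := range c.values {
		if v == raw {
			return float64(i), nil
		}
	}
	c.values = append(c.values, raw)
	return float64(len(c.values) - 1), nil
}

func (c *CategoricalAttribute) Decode(v float64) string { return c.values[int(v)] }

// Instances pairs an Attribute slice with row storage.
type Instances struct {
	attrs []Attribute
	rows  [][]float64
}

// AddRow encodes one raw record; on error, no row is stored.
func (in *Instances) AddRow(raw []string) error {
	row := make([]float64, len(in.attrs))
	for i, a := range in.attrs {
		v, err := a.Encode(raw[i])
		if err != nil {
			return fmt.Errorf("attribute %s: %w", a.Name(), err)
		}
		row[i] = v
	}
	in.rows = append(in.rows, row)
	return nil
}

func main() {
	inst := &Instances{attrs: []Attribute{
		&FloatAttribute{name: "petal_length"},
		&CategoricalAttribute{name: "species"},
	}}
	_ = inst.AddRow([]string{"1.4", "Iris-setosa"})
	_ = inst.AddRow([]string{"4.2", "Iris-versicolor"})
	fmt.Println(inst.rows)                             // [[1.4 0] [4.2 1]]
	fmt.Println(inst.attrs[1].Decode(inst.rows[1][1])) // Iris-versicolor
}
```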
    opened by Sentimentron 37
  • Implement neural networks

    Not sure if we want to stick with plain vanilla neural nets, or if we want to do deep learning. Do you have any experience with this @macmania ?

    opened by sjwhitworth 22
  • Problems with EDF and DenseInstances when using large datasets

    Trying to benchmark the k-NN Classifier with a large dataset with a large number of features. I'm using this in particular:

    Had to do a fair bit of yak shaving to get edf to work:

    • The math here might be off? When the numbers are small there aren't likely to be problems, but when rowsPerPage is about 3.96 and you have 108,000 rows, the lack-of-rounding error compounds and you end up not asking for enough pages. I think rowsPerPage needs to be math.Floored.
    • extend in alloc.go doesn't appear to consider edfAnonMode. I get a panic on line 21 because e.f is nil. I extracted a fileSize variable which I set to os.Getpagesize() in case we're in edfAnonMode. Few issues here: (a) f isn't a great name for a struct field, (b) my solution was hacky, not sure where else various modes aren't being considered, but also don't want to riddle the package with switches on mode, (c) not sure why os.Getpagesize() is the correct value, I copied it from map.go, but this value should probably be extracted as a constant somewhere.
    • The purpose of the startBlock == 0 guard in AllocPages isn't clear, but I found I needed to move e.extend(pagesRequested) up before the if block.
    • I found the String function in fixed.go blowing up when I tried to use it for some debugging output. It assumes f.alloc has non-zero length and that f.alloc[0] has length at least 61, and I got index-out-of-range panics. I'm not sure whether these assumptions are supposed to hold (and were broken by something else), so I worked around it by removing everything related to alloc from the Sprintf arguments, which let me get rid of the if altogether. The source of the 61 number wasn't clear.
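    The rounding problem from the first point can be reproduced with the numbers reported above (rowsPerPage ≈ 3.96, 108,000 rows). This is a hypothetical sketch, not EDF's code: it assumes a row cannot straddle a page boundary, which is what flooring rowsPerPage implies, and the helper names are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// naivePages divides by the unrounded rowsPerPage, which undercounts pages
// when rows cannot be split across pages.
func naivePages(rows int, rowsPerPage float64) int {
	return int(math.Ceil(float64(rows) / rowsPerPage))
}

// pagesNeeded assumes only floor(rowsPerPage) whole rows fit on a page.
func pagesNeeded(rows int, rowsPerPage float64) int {
	whole := int(math.Floor(rowsPerPage))
	if whole < 1 {
		panic("rows larger than a page are not handled in this sketch")
	}
	return (rows + whole - 1) / whole // ceiling division
}

func main() {
	fmt.Println(naivePages(108000, 3.96))  // 27273
	fmt.Println(pagesNeeded(108000, 3.96)) // 36000
}
```

    Under that assumption the unrounded calculation asks for roughly 8,700 too few pages.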

    I didn't want to submit a PR with these fixes because (a) the tests don't give me a lot of confidence that the changes haven't broken something else, and (b) a lot of my changes felt hacky, and it feels like the right solution would involve heavier refactoring.

    Couple big-picture questions about EDF:

    • What's its origin story? What problem is it trying to solve? My vague guess is to give some sort of common interface for the internal data structures so that one can have an in-memory backing or a file-system backing, presumably to facilitate something like an HDFS-backed distributed computing scenario? Why else would file-system backing be desirable?
    • Why is it in the golearn repo? Seems like it's its own project.
    • It's incredibly sloooow. My dataset is 108,000 rows, 128 float attributes, and a single label class which can take on one of about 1000 labels. I randomly split the dataset roughly 60/40 into a training and validation set, and saved off those CSVs ahead of time (so I'm not using the testTrainSplit function). For each of those files, ParseCSVToInstances takes about 30s, never mind actually doing the kNN classification (literally on the order of a few years; I made some tweaks but it's still on the order of an hour). For comparison, I have code that parses both CSVs and does the kNN classification in about 30s total. It's hard to pinpoint where all the slowness is coming from, but it seems scary.
    opened by amitkgupta 16
  • Random Forest: Vastly different results to scikit learn

    When training a random forest, I see vastly different, and poorer results when using golearn, than when using Python's scikit-learn. Unfortunately the dataset is confidential, so I can't share it here online. However, I'm using the same train/test split, and have ensured that data is represented the same way (they're all floats in both).

    When using scikit-learn:

    Auc: 0.943867958106
    Confusion Matrix
    [[35878  1876]
     [ 5402 16388]]
                 precision    recall  f1-score   support
              0       0.87      0.95      0.91     37754
              1       0.90      0.75      0.82     21790
    avg / total       0.88      0.88      0.88     59544

    When using golearn, with the same number of estimators:

    Reference Class True Positives  False Positives True Negatives  Precision   Recall  F1 Score
    --------------- --------------  --------------- --------------  ---------   ------  --------
    1.00        4199        1366        15858       0.7545      0.4263  0.5448
    0.00        15858       5651        4199        0.7373      0.9207  0.8188
    Overall accuracy: 0.7408

    As you can see, there's a big drop in precision and recall, on both outcomes. Any ideas as to what could be the problem @Sentimentron ?
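    To compare the two outputs on the same footing, the per-class metrics can be recomputed from raw counts. In this sketch (classMetrics is a hypothetical helper), the scikit-learn counts come from its confusion matrix, and the golearn false-negative count for class 1.00 is taken from class 0.00's false positives, which holds for a binary problem. Summing each output's cells also shows different totals (59,544 rows for scikit-learn versus 27,074 for golearn), which may be worth checking given that the same split was intended.

```go
package main

import "fmt"

// classMetrics computes precision, recall and F1 for one class from its
// true-positive, false-positive and false-negative counts.
// It assumes tp+fp and tp+fn are non-zero.
func classMetrics(tp, fp, fn int) (precision, recall, f1 float64) {
	precision = float64(tp) / float64(tp+fp)
	recall = float64(tp) / float64(tp+fn)
	f1 = 2 * precision * recall / (precision + recall)
	return
}

func main() {
	// scikit-learn, class 1: TP=16388, FP=1876, FN=5402.
	p, r, f := classMetrics(16388, 1876, 5402)
	fmt.Printf("sklearn class 1: P=%.4f R=%.4f F1=%.4f\n", p, r, f)

	// golearn, class 1.00: TP=4199, FP=1366, FN=5651.
	p, r, f = classMetrics(4199, 1366, 5651)
	fmt.Printf("golearn class 1: P=%.4f R=%.4f F1=%.4f\n", p, r, f)
}
```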

    opened by sjwhitworth 16
  • Unable to allocate memory running `TestRandomForest1` in a loop

    If I wrap the code inside TestRandomForest1 inside a 10-iteration for loop, I get the following panic:

    panic: cannot allocate memory

    I'm running this on an m3.2xlarge EC2 instance (Linux 3.13.0-32-generic x86_64). The first 8 or so iterations usually succeed; the problem almost always occurs when iterating 10 or more times.

    Other things to note: If I increase the forest size to something like 50, then 2-3 iterations suffice to cause the panic. Also, if I remove the call to rf.Predict(testData) (and all subsequent code depending on the result of that Predict), then the panics do not occur.

    Here's the full output, starting from the panic (not including the output of the first few iterations that succeed):

    panic: cannot allocate memory
    goroutine 184 [running]:
    runtime.panic(0x5851e0, 0xc)
        /home/ubuntu/.gvm/gos/go1.3/src/pkg/runtime/panic.c:279 +0xf5
        /home/ubuntu/.gvm/pkgsets/go1.3/global/src/ +0x67, 0xc2083f6b00, 0x0, 0x0)
        /home/ubuntu/.gvm/pkgsets/go1.3/global/src/ +0xa3*DecisionTreeNode).Predict(0xc2082189b0, 0x7fd6cf13a9f8, 0xc2083f6b00, 0x0, 0x0)
        /home/ubuntu/.gvm/pkgsets/go1.3/global/src/ +0xad*ID3DecisionTree).Predict(0xc20807bc40, 0x7fd6cf13a9f8, 0xc2083f6b00, 0x0, 0x0)
        /home/ubuntu/.gvm/pkgsets/go1.3/global/src/ +0x51·004()
        /home/ubuntu/.gvm/pkgsets/go1.3/global/src/ +0x140
    created by*BaggedModel).Predict
        /home/ubuntu/.gvm/pkgsets/go1.3/global/src/ +0x25b
    goroutine 16 [chan receive]:
    testing.RunTests(0x601658, 0x69c010, 0x1, 0x1, 0x575401)
        /home/ubuntu/.gvm/gos/go1.3/src/pkg/testing/testing.go:505 +0x923
    testing.Main(0x601658, 0x69c010, 0x1, 0x1, 0x6a8700, 0x0, 0x0, 0x6a8700, 0x0, 0x0)
        /home/ubuntu/.gvm/gos/go1.3/src/pkg/testing/testing.go:435 +0x84
    main.main() +0x9c
    goroutine 19 [finalizer wait]:
    runtime.park(0x413090, 0x6a4ef8, 0x6a3a09)
        /home/ubuntu/.gvm/gos/go1.3/src/pkg/runtime/proc.c:1369 +0x89
    runtime.parkunlock(0x6a4ef8, 0x6a3a09)
        /home/ubuntu/.gvm/gos/go1.3/src/pkg/runtime/proc.c:1385 +0x3b
        /home/ubuntu/.gvm/gos/go1.3/src/pkg/runtime/mgc0.c:2644 +0xcf
    goroutine 20 [runnable]:*BaggedModel).Predict(0xc208818380, 0x7fd6cf13a9f8, 0xc208818340, 0x0, 0x0)
        /home/ubuntu/.gvm/pkgsets/go1.3/global/src/ +0x2cf*RandomForest).Predict(0xc20807bc20, 0x7fd6cf13a9f8, 0xc208818340, 0x0, 0x0)
        /home/ubuntu/.gvm/pkgsets/go1.3/global/src/ +0x51
        /home/ubuntu/.gvm/pkgsets/go1.3/global/src/ +0x1a3
    testing.tRunner(0xc208050090, 0x69c010)
        /home/ubuntu/.gvm/gos/go1.3/src/pkg/testing/testing.go:422 +0x8b
    created by testing.RunTests
        /home/ubuntu/.gvm/gos/go1.3/src/pkg/testing/testing.go:504 +0x8db
    goroutine 191 [runnable]:·004()
    created by*BaggedModel).Predict
        /home/ubuntu/.gvm/pkgsets/go1.3/global/src/ +0x25b
    goroutine 190 [runnable]:·004()
    created by*BaggedModel).Predict
        /home/ubuntu/.gvm/pkgsets/go1.3/global/src/ +0x25b
    goroutine 189 [runnable]:·004()
    created by*BaggedModel).Predict
        /home/ubuntu/.gvm/pkgsets/go1.3/global/src/ +0x25b
    goroutine 188 [runnable]:·004()
    created by*BaggedModel).Predict
        /home/ubuntu/.gvm/pkgsets/go1.3/global/src/ +0x25b
    goroutine 187 [runnable]:·004()
    created by*BaggedModel).Predict
        /home/ubuntu/.gvm/pkgsets/go1.3/global/src/ +0x25b
    goroutine 186 [runnable]:·004()
    created by*BaggedModel).Predict
        /home/ubuntu/.gvm/pkgsets/go1.3/global/src/ +0x25b
    goroutine 185 [runnable]:·004()
    created by*BaggedModel).Predict
        /home/ubuntu/.gvm/pkgsets/go1.3/global/src/ +0x25b
    goroutine 183 [chan receive]:·003()
        /home/ubuntu/.gvm/pkgsets/go1.3/global/src/ +0x8c
    created by*BaggedModel).Predict
        /home/ubuntu/.gvm/pkgsets/go1.3/global/src/ +0x175
    exit status 2

    This of course makes it hard to write benchmarking tests which expect to be able to execute the code multiple times in order to time it.

    opened by Amit-PivotalLabs 14
  • MMap implementation error

    When installing everything from scratch on my Raspberry Pi, I get this error when trying to run the tests.

    ../riobard/go-mmap/mmap_linux.go:8: undefined: syscall.MAP_32BIT
    ../riobard/go-mmap/mmap_linux.go:8: const initializer (<T>)(syscall.MAP_32BIT) is not a constant
    ../riobard/go-mmap/mmap_linux.go:17: undefined: syscall.MAP_STACK
    ../riobard/go-mmap/mmap_linux.go:17: const initializer (<T>)(syscall.MAP_STACK) is not a constant
    ../riobard/go-mmap/mmap_linux.go:18: undefined: syscall.MAP_HUGETLB
    ../riobard/go-mmap/mmap_linux.go:18: const initializer (<T>)(syscall.MAP_HUGETLB) is not a constant

    Any ideas @Sentimentron?

    opened by sjwhitworth 14
  • Equal-width and Chi-merge discretisation

    This patch implements equal-width histogram binning and Chi-merge discretisation for converting FloatAttributes into CategoricalAttributes for use with ID3 decision trees, naive Bayes, etc.
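    Equal-width binning splits the range [min, max] of a numeric column into k intervals of equal width and replaces each value with its interval index. A minimal sketch follows; equalWidthBins is a hypothetical helper (not the patch's code), Chi-merge is not shown, and the petal-length values are illustrative.

```go
package main

import "fmt"

// equalWidthBins assigns each value to one of k equal-width bins spanning
// [min, max]; the maximum value is clamped into the last bin.
func equalWidthBins(values []float64, k int) []int {
	if len(values) == 0 || k < 1 {
		return nil
	}
	lo, hi := values[0], values[0]
	for _, v := range values {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	width := (hi - lo) / float64(k)
	bins := make([]int, len(values))
	if width == 0 {
		return bins // all values identical: everything stays in bin 0
	}
	for i, v := range values {
		b := int((v - lo) / width)
		if b >= k {
			b = k - 1
		}
		bins[i] = b
	}
	return bins
}

func main() {
	petalLength := []float64{1.4, 1.3, 4.2, 4.7, 5.9, 6.9}
	fmt.Println(equalWidthBins(petalLength, 3)) // [0 0 1 1 2 2]
}
```

    Each bin index can then serve as a category label, which is what lets a categorical learner such as ID3 consume the column.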

    opened by Sentimentron 13
  • ID3 and random decision trees and random forests

    This patch

    • Adds the meta sub-package which implements bagging
    • Adds RandomForest in the ensemble sub-package
    • Adds ID3DecisionTree and RandomTree to the trees sub-package

    ID3DecisionTree performs similarly to the reference; RandomForest performance is competitive with (but a little lower than) the equivalent WEKA classifier.

    opened by Sentimentron 12
  • Metrics


    Haven't finished yet; just open for viewing the difference between branches.

    • [x] Implement polynomial kernel
    • [x] Implement euclidean distance (2 norm)
    • [x] Implement L1 norm
    • [x] Implement RBF kernel


    Some more TODOs:

    • [ ] Extend pairwise metrics to accept matrix input
    • [x] Maybe integrate all kernels into one file if the code isn't too big
    • [x] Doc the kernel formulas, so others can know which kernel we are using.
    opened by lazywei 11
  • Close file

    opened by mattn 0
  • Example about how to query the model after being trained using KNN ?

    Hi! Could you please give an example of how to query this model after it has been trained? For example, when testData is passed here: cls.Predict(testData), is it possible to do something like this?

    CSV example:

    Sepal length,Sepal width,Petal length,Petal width,Species
    5.1,3.5,1.4,0.2,Iris-setosa
    4.9,3.0,1.4,0.2,Iris-setosa
    4.7,3.2,1.3,0.2,Iris-setosa
    5.9,3.0,4.2,1.5,Iris-versicolor
    6.0,2.2,4.0,1.0,Iris-versicolor
    6.1,2.9,4.7,1.4,Iris-versicolor

    input := "5.1,3.5,1.4,0.2"
    predictions, err := cls.Predict(input)
    if err != nil {
        panic(err)
    }

    and then print/get the species type?

    I managed to do this in Python but am quite lost about how to do it in Go.

    Cheers! Chris.

    opened by ch-rigu 1
  • Merge pull request #1 from sjwhitworth/master

    latest changes

    opened by sandeepmishraSSR 0
  • Arch linux support / installation?

    I noticed that the golearn Wiki currently only lists Ubuntu/OpenSUSE as being supported linux platforms.

    Is this still the case?

    I started following the installation instructions, just to see if it would work, but I ran into issues related to the handling of GOROOT / GOPATH.

    Whereas golearn suggests setting GOROOT=$HOME/go and $GOPATH=$HOME/go/bin, the arch wiki suggests that GOPATH should be ~/go instead.

    When I followed the conventions from the golearn install docs and called go get, I ran into the following issue:

    go tool: no such tool "compile"

    (compile lives in /usr/lib/go/pkg/tool/linux_amd64/ on Arch Linux, where $GOROOT is typically set to /usr/lib/go.)

    When attempting to use the Arch conventions, the go get command succeeds, but the source code for golearn doesn't appear to be pulled, so the $GOPATH/src/ path does not exist.

    It seems likely that I'm simply misunderstanding something about how GOPATH and GOROOT work, and how they interact with go get.

    Has anyone had luck working with golearn on arch linux?

    Any advice would be greatly appreciated.

    opened by khughitt 2
  • Any replacement for model saving and loading part for golearn.

    I'm trying to do some research on model serving options. Can you point me to any resources for this?

    Thanks Shrinidhi

    opened by shrinidhisuresha 2
  • Integrating DataFrame-go with goLearn

    There is a Go version of pandas (the biggest data-processing library in Python) being developed here:

    This library allows a much easier pipeline for data handling. Using this, it would be much easier to do data cleaning and feature engineering inside golang, before using golearn (since Golearn Fixed Data Grid isn't designed for handling so much data processing).

    Since Fixed Data Grid is already very deeply integrated into golearn, it would not be feasible to change everything to support dataframe-go. Instead, could we build a function that converts the dataframe-go object into a golearn Fixed Data Grid? That way, the two libraries would be easily integrated, with minimal changes.

    opened by Yushgoel 2
  • Error resolving Attribute CategoricalAttribute

    I'm trying to use Random Forest to classify an item's category (either Drink or Food) based on the item's name, but I get this error: Error resolving Attribute CategoricalAttribute

    Here is my example:

    Please help me to resolve it. Thanks

    opened by thaitanloi365 1
  • linear_models: fix cgo issues, upgrade to liblinear 2.14

    Requires an additional step to install:

    • cd /tmp &&
    • wget
    • tar xvf v241.tar.gz
    • cd liblinear-241
    • make lib
    • sudo install -vm644 linear.h /usr/include
    • sudo install -vm755 /usr/lib
    • sudo ln -sfv /usr/lib/
    opened by Sentimentron 0
  • How to construct custom dataset with csv files?

    I have read this doc, but something confused me: when I use base.ParseCSVToInstances to read CSV files, how can I tell the API which column is a feature and which is the label? Not every CSV file has the label as its last column. Could you please show more examples? Thanks a million @AlekSi @defcube @linkerlin

    opened by Alucardmini 0
  • cannot find package

    When running the Iris example with go run main.go, I get the following error:

            c:\go\src\\gonum\matrix (from $GOROOT)
            ~\go\src\\gonum\matrix (from $GOPATH)

    I think this is because gonum has archived the matrix repository and moved its code elsewhere. I can open a pull request to update it if needed.

    opened by garyhlusko 0
Stephen Whitworth
Deploy, manage, and scale machine learning models in production

Deploy, manage, and scale machine learning models in production. Cortex is a cloud native model serving platform for machine learning engineering teams.

Cortex Labs 7.6k Jul 21, 2021
Gorgonia is a library that helps facilitate machine learning in Go.

Gorgonia is a library that helps facilitate machine learning in Go. Write and evaluate mathematical equations involving multidimensional arrays easily

Gorgonia 4.1k Jul 27, 2021
Bigmachine is a library for self-managing serverless computing in Go

Bigmachine Bigmachine is a toolkit for building self-managing serverless applications in Go. Bigmachine provides an API that lets a driver process for

GRAIL 170 Jun 18, 2021
Go Machine Learning Benchmarks

Benchmarks of machine learning inference for Go

Nikolay Dubina 15 Jun 17, 2021
On-line Machine Learning in Go (and so much more)

goml Golang Machine Learning, On The Wire goml is a machine learning library written entirely in Golang which lets the average developer include machi

Conner DiPaolo 1.2k Jul 20, 2021
Standard machine learning models

Cog: Standard machine learning models Define your models in a standard format, store them in a central place, run them anywhere. Standard interface fo

Replicate 70 Jul 27, 2021
Reinforcement Learning in Go

Overview Gold is a reinforcement learning library for Go. It provides a set of agents that can be used to solve challenges in various environments. Th

AUNUM 228 Jul 21, 2021
Machine Learning libraries for Go Lang - Linear regression, Logistic regression, etc.

package ml - Machine Learning Libraries. Package ml provides some implementations of useful machine learnin

Alonso Vidales 191 Apr 17, 2021
Ensembles of decision trees in go/golang.

CloudForest Google Group Fast, flexible, multi-threaded ensembles of decision trees for machine learning in pure Go (golang). CloudForest allows for a

Ryan Bressler 687 Jul 17, 2021
A High-level Machine Learning Library for Go

Overview Goro is a high-level machine learning library for Go built on Gorgonia. It aims to have the same feel as Keras. Usage import ( . "github.

AUNUM 282 Jul 10, 2021
Generative Adversarial Network in Go via Gorgonia

Generative adversarial networks Recipe for simple GAN in Golang ecosystem via Gorgonia library Table of Contents About Why Instruments Usage Code expl

Dimitrii Lopanov 59 Jul 19, 2021
A Go idiomatic binding to the C++ core of PyTorch

GoTorch GoTorch reimplements PyTorch high-level APIs, including modules and functionals, in idiomatic Go. Thus enables deep learning programming in Go

Yi Wang 53 Jul 17, 2021
A Kubernetes Native Batch System (Project under CNCF)

Volcano is a batch system built on Kubernetes. It provides a suite of mechanisms that are commonly required by many classes of batch & elastic workloa

Volcano 1.8k Jul 27, 2021