Random Forest implementation in golang

Overview

GoDoc: https://godoc.org/github.com/malaschitz/randomForest

Test:

go test ./... -cover -coverpkg=.  

randomForest

Simple Random Forest

	xData := [][]float64{}
	yData := []int{}
	for i := 0; i < 1000; i++ {
		x := []float64{rand.Float64(), rand.Float64(), rand.Float64(), rand.Float64()}
		y := int(x[0] + x[1] + x[2] + x[3])
		xData = append(xData, x)
		yData = append(yData, y)
	}
	forest := randomforest.Forest{}
	forest.Data = randomforest.ForestData{X: xData, Class: yData}
	forest.Train(1000)
	//test
	fmt.Println("Vote", forest.Vote([]float64{0.1, 0.1, 0.1, 0.1})) 
	fmt.Println("Vote", forest.Vote([]float64{0.9, 0.9, 0.9, 0.9}))
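
Vote returns one score per class, indexed by class number (the issue example further down iterates over it that way). A minimal sketch of picking the winning class from such a slice follows. `argmax` is a hypothetical helper, not part of the library, and the vote values are made up for illustration:

```go
package main

import "fmt"

// argmax returns the index of the largest value in votes,
// or -1 if votes is empty. Ties go to the lowest index.
func argmax(votes []float64) int {
	best := -1
	for i, v := range votes {
		if best == -1 || v > votes[best] {
			best = i
		}
	}
	return best
}

func main() {
	// Made-up vote vector for a 4-class problem (illustrative only).
	votes := []float64{0.05, 0.15, 0.70, 0.10}
	fmt.Println("predicted class:", argmax(votes)) // prints "predicted class: 2"
}
```

Starting from -1 rather than 0 makes the empty case explicit, so a caller can tell "no votes" apart from "class 0 won".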

Extremely Randomized Trees

	forest.TrainX(1000)	

Deep Forest

Deep forest inspired by https://arxiv.org/abs/1705.07366

    dForest := forest.BuildDeepForest()
    dForest.Train(20, 100, 1000) // 20 small forests of 100 trees each help to build a deep forest of 1000 trees

Continuous Random Forest

A continuous random forest is meant for data that keeps arriving (forex, weather, user logs, ...). New data creates new trees, and the oldest trees are removed.

	forest := randomforest.Forest{}
	data := []float64{rand.Float64(), rand.Float64()}
	res := 1 // result
	forest.AddDataRow(data, res, 1000, 10, 2000)
	// AddDataRow adds the new row, trims the oldest rows if there are more than 1000 rows,
	// trains 10 new trees, and removes the oldest trees if there are more than 2000 trees.
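
The trimming described in the AddDataRow comment can be sketched without the library. The `window` type and its `add` method are hypothetical names, and trees are represented only by an ID. The sketch illustrates the bookkeeping (drop the oldest rows and the oldest trees beyond the limits), not how AddDataRow is actually implemented:

```go
package main

import "fmt"

// window is a hypothetical stand-in for a continuously updated forest.
// Trees are represented only by an ID so the example stays self-contained.
type window struct {
	rows   [][]float64
	labels []int
	trees  []int // tree IDs, oldest first
	nextID int
}

// add appends one row, trims rows beyond maxRows, creates newTrees
// placeholder trees, and trims trees beyond maxTrees. Oldest entries go first.
func (w *window) add(x []float64, y, maxRows, newTrees, maxTrees int) {
	w.rows = append(w.rows, x)
	w.labels = append(w.labels, y)
	if extra := len(w.rows) - maxRows; extra > 0 {
		w.rows = w.rows[extra:]
		w.labels = w.labels[extra:]
	}
	for i := 0; i < newTrees; i++ {
		w.trees = append(w.trees, w.nextID) // a real forest would train a tree here
		w.nextID++
	}
	if extra := len(w.trees) - maxTrees; extra > 0 {
		w.trees = w.trees[extra:]
	}
}

func main() {
	w := &window{}
	for i := 0; i < 5; i++ {
		// Keep at most 3 rows, add 2 trees per row, keep at most 4 trees.
		w.add([]float64{float64(i)}, i%2, 3, 2, 4)
	}
	fmt.Println(len(w.rows), len(w.trees), w.trees[0]) // prints "3 4 6"
}
```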

Boruta Algorithm for feature selection

The Boruta algorithm was developed as a package for the R language. It is one of the most effective feature selection algorithms. It is described in a paper in the Journal of Statistical Software.

The Boruta algorithm uses a random forest to select important features.

	xData := ... //data
	yData := ... //labels
	selectedFeatures := randomforest.BorutaDefault(xData, yData)
	// or randomforest.BorutaDefault(xData, yData, 100, 20, 0.05, true, true)
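
As described for the original R package, Boruta judges each real feature against "shadow" features: copies of the columns whose values are shuffled across rows, so any importance they receive is due to chance. A minimal sketch of building that extended data set follows. `addShadows` is a hypothetical helper, it assumes every row has the same length, and it says nothing about how this package implements the step internally:

```go
package main

import (
	"fmt"
	"math/rand"
)

// addShadows returns a copy of x in which every row is extended with
// "shadow" features: for each original column, a copy whose values are
// permuted across rows. Each result row has twice as many values as the input row.
// Assumes x is rectangular (all rows have the same length).
func addShadows(x [][]float64, seed int64) [][]float64 {
	if len(x) == 0 {
		return nil
	}
	rng := rand.New(rand.NewSource(seed))
	nRows, nCols := len(x), len(x[0])
	out := make([][]float64, nRows)
	for i := range out {
		out[i] = make([]float64, 2*nCols)
		copy(out[i], x[i]) // original features stay in place
	}
	for c := 0; c < nCols; c++ {
		perm := rng.Perm(nRows) // independent shuffle per column
		for i := 0; i < nRows; i++ {
			out[i][nCols+c] = x[perm[i]][c]
		}
	}
	return out
}

func main() {
	x := [][]float64{{1, 10}, {2, 20}, {3, 30}}
	for _, row := range addShadows(x, 1) {
		fmt.Println(row) // first two values are original, last two are shadows
	}
}
```

A forest trained on the extended data then gives an importance to every column, and a real feature is only worth keeping if it scores above the best shadow feature.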

The /examples directory contains an example that uses the MNIST database. The picture shows the features selected from the images (495 of 784).

[Image: boruta 05]

Issues
  • Stack overflow when training a random forest

    I extracted features from real samples and used randomforest.BorutaDefault to get the selected features, but when I created the random forest, the program crashed with a stack overflow. I saw that the computer's memory was used up. The data I use, including the labels, is in data.txt, and the code follows.

    package main
    
    import (
    	"bufio"
    	"fmt"
    	"io"
    	"math/rand"
    	"os"
    	"strings"
    
    	randomforest "github.com/malaschitz/randomForest"
    )
    
    func FileLines(fileName string) ([]string, error) {
    	fi, err := os.Open(fileName)
    	if err != nil {
    		return nil, err
    	}
    	defer fi.Close()
    	ret := make([]string, 0, 1000)
    	br := bufio.NewReader(fi)
    	for {
    		a, _, c := br.ReadLine()
    		if c == io.EOF {
    			break
    		}
    		ret = append(ret, string(a))
    	}
    	return ret, nil
    }
    
    func main() {
    	X := make([][]float64, 0)
    	Y := make([]int, 0)
    
    	// get data: the last number of each line indicates the category,
    	// and the others indicate whether it has characteristics
    	lines, _ := FileLines(`data.txt`)
    	for _, line := range lines {
    		if strings.HasSuffix(line, "1") {
    			Y = append(Y, 1)
    		} else {
    			Y = append(Y, 0)
    		}
    		numbers := strings.Split(line, ",")
    		numbers = numbers[:len(numbers)-1]
    		x := make([]float64, 0, len(numbers))
    		for _, n := range numbers {
    			if n == "1" {
    				x = append(x, 0.9)
    			} else {
    				x = append(x, 0.1)
    			}
    		}
    		X = append(X, x)
    	}
    	// shuffle rows and labels together (three passes of random swaps)
    	max := len(X)
    	for k := 0; k < 3; k++ {
    		for i := 0; i < max; i++ {
    			pos := rand.Intn(max)
    			tmpx := X[i]
    			X[i] = X[pos]
    			X[pos] = tmpx
    			tmpy := Y[i]
    			Y[i] = Y[pos]
    			Y[pos] = tmpy
    		}
    	}
    
    	forest := randomforest.Forest{}
    	forestData := randomforest.ForestData{X: X, Class: Y}
    	forest.Data = forestData
    	forest.Train(100)
    	N := 100
    	s := 0
    	sw := 0
    	for i := 0; i < N; i++ {
    		vote := forest.Vote(X[i])
    		bestV := 0.0
    		bestI := -1
    		for j, v := range vote {
    			if v > bestV {
    				bestV = v
    				bestI = j
    			}
    		}
    		if bestI == Y[i] {
    			s++
    		}
    		// repeat the check with weighted voting
    		vote = forest.WeightVote(X[i])
    		bestV = 0.0
    		bestI = -1
    		for j, v := range vote {
    			if v > bestV {
    				bestV = v
    				bestI = j
    			}
    		}
    		if bestI == Y[i] {
    			sw++
    		}
    
    	}
    	fmt.Println("try", N, "times")
    	fmt.Printf("Correct:        %5.2f %%\n", float64(s)*100/float64(N))
    	fmt.Printf("Weight Correct: %5.2f %%\n", float64(sw)*100/float64(N))
    	forest.PrintFeatureImportance()
    }
    
    
    opened by 751620780 2
  • Limit number of goroutines during training to prevent crashes

    This adds a limit on the number of concurrent goroutines spawned during training of forests. It defaults to the number of logical cores available. Without this, training big forests can crash.

    opened by gnawybol 1
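
One common way to bound concurrency in Go, as the pull request above sets out to do, is a buffered channel used as a counting semaphore. The sketch below is not the code from that pull request. `trainAll` and `trainTree` are hypothetical names, and the tree training is replaced by a placeholder:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// trainAll runs trainTree for indices 0..n-1 with at most limit goroutines
// active at once. A limit <= 0 falls back to the number of logical CPUs.
func trainAll(n, limit int, trainTree func(i int)) {
	if limit <= 0 {
		limit = runtime.NumCPU()
	}
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		sem <- struct{}{} // blocks while limit goroutines are running
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this tree is done
			trainTree(i)
		}(i)
	}
	wg.Wait()
}

func main() {
	results := make([]int, 1000)
	trainAll(len(results), 0, func(i int) {
		results[i] = i * i // placeholder for training tree i
	})
	fmt.Println(results[10], results[999]) // prints "100 998001"
}
```

Acquiring the semaphore slot before starting the goroutine means the loop itself stalls, so no more than `limit` goroutines (and their stacks) exist at any moment.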
Releases(v1.1)