Naive Bayesian Classification for Golang.

Naive Bayesian Classification

Perform naive Bayesian classification into an arbitrary number of classes on sets of strings. bayesian also supports term frequency-inverse document frequency calculations (TF-IDF).

Copyright (c) 2011-2017. Jake Brukhman. All rights reserved. See the LICENSE file for the BSD-style license.


Background

This is meant to be a low-barrier-to-entry Go library for basic Bayesian classification. See the code comments for a refresher on naive Bayesian classifiers, and please take some time to understand the underflow edge cases, as they may otherwise result in inaccurate classifications.


Installation

Using the go command:

go get github.com/navossoc/bayesian
go install !$

Documentation

See the GoPkgDoc documentation here.


Features

  • Conditional probability and "log-likelihood"-like scoring.
  • Underflow detection.
  • Simple persistence of classifiers.
  • Statistics.
  • TF-IDF support.
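
As a rough picture of what simple persistence can look like, the sketch below round-trips a small struct through encoding/gob, the format the issue reports further down mention as the default. The wordCounts type and the roundTrip helper are hypothetical stand-ins invented for this illustration; they are not the library's types or methods.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// wordCounts is a hypothetical, simplified stand-in for trained state.
// It is not the library's Classifier type.
type wordCounts struct {
	Classes []string
	Freqs   map[string]map[string]float64
}

// roundTrip encodes the value with gob into memory and decodes it again.
func roundTrip(in wordCounts) (wordCounts, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return wordCounts{}, err
	}
	var out wordCounts
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	in := wordCounts{
		Classes: []string{"Good", "Bad"},
		Freqs: map[string]map[string]float64{
			"Good": {"tall": 1},
			"Bad":  {"ugly": 1},
		},
	}
	out, err := roundTrip(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(out.Classes, out.Freqs["Good"]["tall"])
}
```

Writing to a file instead of a bytes.Buffer only changes the io.Writer and io.Reader passed to the encoder and decoder.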

Example 1 (Simple Classification)

To use the classifier, first you must create some classes and train it:

import "github.com/navossoc/bayesian"

const (
    Good bayesian.Class = "Good"
    Bad  bayesian.Class = "Bad"
)

classifier := bayesian.NewClassifier(Good, Bad)
goodStuff := []string{"tall", "rich", "handsome"}
badStuff  := []string{"poor", "smelly", "ugly"}
classifier.Learn(goodStuff, Good)
classifier.Learn(badStuff,  Bad)

Then you can ascertain the scores of each class and the most likely class your data belongs to:

scores, likely, _ := classifier.LogScores(
    []string{"tall", "girl"},
)

Magnitude of the score indicates likelihood. Alternatively (but with some risk of float underflow), you can obtain actual probabilities:

probs, likely, _ := classifier.ProbScores(
    []string{"tall", "girl"},
)

Example 2 (TF-IDF Support)

To use the TF-IDF classifier, first create some classes and train it. You must then call ConvertTermsFreqToTfIdf() after training and before calling any classification method such as LogScores, SafeProbScores, or ProbScores:

import "github.com/navossoc/bayesian"

const (
    Good bayesian.Class = "Good"
    Bad  bayesian.Class = "Bad"
)

// Create a classifier with TF-IDF support.
classifier := bayesian.NewClassifierTfIdf(Good, Bad)

goodStuff := []string{"tall", "rich", "handsome"}
badStuff  := []string{"poor", "smelly", "ugly"}

classifier.Learn(goodStuff, Good)
classifier.Learn(badStuff,  Bad)

// Required
classifier.ConvertTermsFreqToTfIdf()

Then you can ascertain the scores of each class and the most likely class your data belongs to:

scores, likely, _ := classifier.LogScores(
    []string{"tall", "girl"},
)

Magnitude of the score indicates likelihood. Alternatively (but with some risk of float underflow), you can obtain actual probabilities:

probs, likely, _ := classifier.ProbScores(
    []string{"tall", "girl"},
)

Use wisely.

Issues
  • remove extra float64 slice

    using simple benchmark over ProbScores:

    func BenchmarkProbScores(b *testing.B) {
        c := NewClassifier(Good, Bad)
        c.Learn([]string{"tall", "handsome", "rich"}, Good)
    
        for n := 0; n < b.N; n++ {
            c.ProbScores([]string{"the", "tall", "man"})
        }
    }
    

    old code: BenchmarkProbScores-4 5000000 271 ns/op 32 B/op 2 allocs/op

    new code: BenchmarkProbScores-4 10000000 199 ns/op 16 B/op 1 allocs/op

    of course this will be more obvious with more classes in the classifier

    PS: because it makes the code less obvious, I am not sure it is worth it to be merged, I just needed the 1 less allocation.

    opened by jackdoe 7
  • Add classes after classifier creation

    Apologies in advance as my knowledge of Go is still somewhat limited, so this may be a naive question.

    I want to expose the naive bayes classifying as a HTTP Web service, with both train and classify endpoints. I have no trouble with that, but I want the train endpoint to be able to accept new labels (labels that aren't currently in the classifier). Right now the labels are simply specified as consts and passed into the constructor. Can you think of the best way to add the ability to add labels at run-time?

    opened by JakeAustwick 7
  • Release 1.0 is really old - make a new release

    1.0 is still importing things like "gob" instead of "encoding/gob" etc. Can you make a new release? I can also help co-maintain the project if that helps.

    Tools like dep will pick up release versions for most people and they will get code that won't work for newer versions of go.

    Thanks!

    opened by urjitbhatia 5
  • Panic if underflow is detected in `SafeProbScores`

    SafeProbScores ... If an underflow is detected, this method panics

    Source

    I am a bit confused by the comment on the method above: according to the doc, this method is supposed to panic, but the code returns an error instead.

    Am I missing something?

    opened by yml 3
  • Changed SafeProbScores to return an error instead of a panic

    I know this is not a backwards compatible change, but this is a mathematical error and not really a runtime error, so: a) a panic causes functions to unwind outside of this package, which is not good for long running applications and b) there's little need to fill the log with stack traces given this is a known and reasonably common outcome.

    PS. Great package, really useful, thanks!

    opened by mish15 2
  • Fix function comments based on best practices from Effective Go

    Every exported function in a program should have a doc comment. The first sentence should be a summary that starts with the name being declared. From effective go.

    I generated this with CodeLingo and I'm keen to get some feedback, but this is automated so feel free to close it and just say opt out to opt out of future CodeLingo outreach PRs.

    opened by BlakeMScurr 1
  • fix data race issue for Classifier.seen

    When running the LogScores method in a highly concurrent situation, I noticed that Go's data race detector would complain about a data race regarding Classifier.seen. So that's why I changed any read and write operation to that particular member of the struct to only use atomic load and increment functions.

    opened by akrennmair 1
  • add Observe method to support externally learned word frequencies

    External methods to learn word frequencies might be things like distributed word-count in spark. For online classification, however, it might still be desirable to use go.

    opened by sweigert 1
  • Use CodeLingo to Address Further Issues

    Hi @jbrukh!

    Thanks for merging the fixes from our earlier pull request. They were generated by CodeLingo which we've used to find a further 30 issues in the repo. This PR adds a set of CodeLingo Tenets which catch any new cases of the found issues in PRs to your repo.

    CodeLingo will also send follow-up PRs to fix the existing repos in the codebase. Install CodeLingo GitHub app after merging this PR. It will always be free for open source.

    We're most interested to see if we can help with project specific bugs. Tell us about more interesting issues and we'll see if our tech can help - free of charge.

    Thanks, Blake and the CodeLingo Team

    opened by CodeLingoTeam 0
  • Modernized go fmt, lint etc fixes. Simple code cleanup

    @jbrukh hey, I did some simple code cleanup here

    • Moved the package doc to doc.go
    • Fixed some range syntax
    • Added some function docs
    • Other minor changes recommended according to golint
    opened by urjitbhatia 0
  • added WordsByClass to help with debugging learned classifiers

    In order to assess the quality of a training set I found it useful to know which words are most prominent in a given class. Of course this could be done by a separate wordcount as well but the classifier did that already - so why not use it.

    opened by sweigert 0
  • JSON serialization

    Hi. My edits:

    • JSON serialization as an option (gob remains the default, so it is fully backward compatible). The JSON output is also around 25% smaller
    • Minor codestyle fixes - error check in defer, spread operator in test
    • Go mod file
    2020/07/24 20:32:04 gob [One Two Three Four Five Six Seven Eight Nine Ten]
    2020/07/24 20:32:04 gob size 816
    2020/07/24 20:32:04 json [One Two Three Four Five Six Seven Eight Nine Ten]
    2020/07/24 20:32:04 json size 611
    
    package main
    
    import (
    	"github.com/jbrukh/bayesian"
    	"log"
    	"os"
    	"path"
    )
    
    func write(ser bayesian.Serializer) {
    	const (
    		One   bayesian.Class = "One"
    		Two   bayesian.Class = "Two"
    		Three bayesian.Class = "Three"
    		Four  bayesian.Class = "Four"
    		Five  bayesian.Class = "Five"
    		Six   bayesian.Class = "Six"
    		Seven bayesian.Class = "Seven"
    		Eight bayesian.Class = "Eight"
    		Nine  bayesian.Class = "Nine"
    		Ten   bayesian.Class = "Ten"
    	)
    
    	classifier := bayesian.NewClassifier(One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten)
    	oneStuff := []string{"lorem", "ipsum", "dolor"}
    	twoStuff := []string{"sit", "amet", "consectetur"}
    	threeStuff := []string{"adipiscing", "elit", "sed"}
    	fourStuff := []string{"do", "eiusmod", "tempor"}
    	fiveStuff := []string{"incididunt", "ut", "labore"}
    	sixStuff := []string{"et", "dolore", "magna"}
    	sevenStuff := []string{"aliqua", "ut", "enim"}
    	eightStuff := []string{"ad", "minim", "veniam"}
    	nineStuff := []string{"quis", "nostrud", "exercitation"}
    	tenStuff := []string{"ullamco", "laboris", "nisi"}
    
    	classifier.Learn(oneStuff, One)
    	classifier.Learn(twoStuff, Two)
    	classifier.Learn(threeStuff, Three)
    	classifier.Learn(fourStuff, Four)
    	classifier.Learn(fiveStuff, Five)
    	classifier.Learn(sixStuff, Six)
    	classifier.Learn(sevenStuff, Seven)
    	classifier.Learn(eightStuff, Eight)
    	classifier.Learn(nineStuff, Nine)
    	classifier.Learn(tenStuff, Ten)
    
    	wd, err := os.Getwd()
    	if err != nil {
    		panic(err)
    	}
    
    	err = classifier.WriteToFile(path.Join(wd, "out_"+string(ser)), ser)
    	if err != nil {
    		panic(err)
    	}
    }
    
    func read(ser bayesian.Serializer) {
    	wd, err := os.Getwd()
    	if err != nil {
    		panic(err)
    	}
    
    	file := path.Join(wd, "out_"+string(ser))
    
    	classifier, err := bayesian.NewClassifierFromFile(file, ser)
    	if err != nil {
    		panic(err)
    	}
    
    	f, err := os.Open(file)
    	if err != nil {
    		panic(err)
    	}
    	defer f.Close() // release the file handle once the size has been read
    	info, err := f.Stat()
    	if err != nil {
    		panic(err)
    	}
    
    	log.Println(ser, classifier.Classes)
    	log.Println(ser, "size", info.Size())
    }
    
    func main() {
    	write(bayesian.Gob)
    	read(bayesian.Gob)
    
    	write(bayesian.JSON)
    	read(bayesian.JSON)
    }
    
    opened by nnqq 0
  • Request for a new function that will enable adding of new class to an existing classifier

    Hi,

    I found this library very useful. Since it uses a supervised learning mechanism, it would be good to be able to add a new class for learned items that do not fit any of the existing classes.

    Thanks

    opened by tonyStreet 0
  • request for a tag of an older commit

    git tag -a 1.1 35eb93528ee -m "tag a specific older version that was built against"
    git push --tags
    

    In addition, it would be nice if current versions were tagged as well...

    opened by jeff-knurek 0
  • Allow classifier to initialise with only one class

    Current code panics in case the classifier is initialised with just 1 class. However I have an edge case where there might only be a single class. So I was thinking maybe classifier could be allowed to initialise with just 1 class. I made changes in the code and tested for my use case. It worked. However, the unit tests fail in this case. Is there any specific reason to keep this limitation? What would be a good way to solve this, if any?

    opened by SanketSKasar 1
  • Seen() is always 0?

    package main
    
    import (
    	"log"
    
    	"github.com/jbrukh/bayesian"
    )
    
    const (
    	Arabic  bayesian.Class = "Arabic"
    	Malay   bayesian.Class = "Malay"
    	Yiddish bayesian.Class = "Yiddish"
    )
    
    func main() {
    
    	nbClassifier := bayesian.NewClassifier(Arabic, Malay, Yiddish)
    	arabicStuff := []string{"algeria", "bahrain", "comoros"}
    	malaysianStuff := []string{"malaysians", "bahasa"}
    	yiddishStuff := []string{"jewish", "jews", "israel"}
    	nbClassifier.Learn(arabicStuff, Arabic)
    	nbClassifier.Learn(malaysianStuff, Malay)
    	nbClassifier.Learn(yiddishStuff, Yiddish)
    
    	log.Println(nbClassifier.Learned()) // 3
    	log.Printf(`SEEN: %d`, nbClassifier.Seen()) // 0
    }
    
    opened by uccmen 1
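
The data race report on Classifier.seen above describes switching to atomic load and increment operations. A minimal sketch of that general pattern follows; the counter type and its methods are hypothetical and only mirror the idea, not the library's actual struct or the patch itself.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// counter is a hypothetical stand-in for a struct with a "seen" field
// that many goroutines update while classifying.
type counter struct {
	seen int64
}

// observe increments the counter without a data race.
func (c *counter) observe() { atomic.AddInt64(&c.seen, 1) }

// Seen reads the counter without a data race.
func (c *counter) Seen() int64 { return atomic.LoadInt64(&c.seen) }

// countConcurrently runs the given number of workers, each calling
// observe perWorker times, and returns the final count.
func countConcurrently(workers, perWorker int) int64 {
	var c counter
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				c.observe()
			}
		}()
	}
	wg.Wait()
	return c.Seen()
}

func main() {
	fmt.Println(countConcurrently(8, 1000)) // prints 8000
}
```

Running a program like this with `go run -race` is a quick way to confirm that the detector no longer reports the counter.
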
Owner
Jake Brukhman