# Naive Bayesian Classification

Perform naive Bayesian classification into an arbitrary number of classes on sets of strings. `bayesian` also supports term frequency-inverse document frequency calculations (TF-IDF).

Copyright (c) 2011-2017. Jake Brukhman. All rights reserved. See the LICENSE file for the BSD-style license.

## Background

This is meant to be a low-barrier-to-entry Go library for basic Bayesian classification. See the code comments for a refresher on naive Bayesian classifiers, and please take some time to understand the floating-point underflow edge cases, because ignoring them can result in inaccurate classifications.
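To make the underflow issue concrete, here is a minimal, stdlib-only sketch (not code from this library; `productOfProbs` and `sumOfLogProbs` are hypothetical helpers). A naive Bayes score multiplies one conditional probability per word, and for long documents that product drops below the smallest representable `float64` and becomes exactly 0. Summing logarithms instead keeps the value in range.

```go
package main

import (
	"fmt"
	"math"
)

// productOfProbs multiplies n copies of p directly in probability space.
func productOfProbs(p float64, n int) float64 {
	prod := 1.0
	for i := 0; i < n; i++ {
		prod *= p
	}
	return prod
}

// sumOfLogProbs adds n copies of log(p): the log-space equivalent of the product.
func sumOfLogProbs(p float64, n int) float64 {
	sum := 0.0
	for i := 0; i < n; i++ {
		sum += math.Log(p)
	}
	return sum
}

func main() {
	// A 400-word document where every word has conditional probability 0.001.
	fmt.Println(productOfProbs(0.001, 400)) // underflows to 0
	fmt.Println(sumOfLogProbs(0.001, 400))  // about -2763.1, still usable
}
```

This is why log-space scoring is the safer default, and why probability-space results should be treated with care.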

## Installation

Using the go command:

```
go get github.com/navossoc/bayesian
go install github.com/navossoc/bayesian
```

## Documentation

See the GoPkgDoc documentation for the full API reference.

## Features

- Conditional probability and "log-likelihood"-like scoring.
- Underflow detection.
- Simple persistence of classifiers.
- Statistics.
- TF-IDF support.

## Example 1 (Simple Classification)

To use the classifier, first you must create some classes and train it:

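```go
package main

import "github.com/navossoc/bayesian"

// Class labels are string constants of type bayesian.Class.
const (
	Good bayesian.Class = "Good"
	Bad  bayesian.Class = "Bad"
)

func main() {
	// Create a classifier over the two classes.
	classifier := bayesian.NewClassifier(Good, Bad)

	// Train each class on a sample document (a slice of words).
	goodStuff := []string{"tall", "rich", "handsome"}
	badStuff := []string{"poor", "smelly", "ugly"}
	classifier.Learn(goodStuff, Good)
	classifier.Learn(badStuff, Bad)
}
```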
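Conceptually, training a naive Bayes classifier mostly amounts to counting how often each word appears in each class. The stdlib-only sketch below illustrates that idea with a hypothetical `model` type and `learn` method; it is a simplification for illustration, not how `bayesian.Classifier` actually stores its data.

```go
package main

import "fmt"

// model is a hypothetical, simplified stand-in for a trained classifier:
// per-class word counts plus per-class word totals.
type model struct {
	counts map[string]map[string]int // class -> word -> count
	totals map[string]int            // class -> total words seen
}

func newModel() *model {
	return &model{counts: map[string]map[string]int{}, totals: map[string]int{}}
}

// learn records every word of doc as one observation for class.
func (m *model) learn(doc []string, class string) {
	if m.counts[class] == nil {
		m.counts[class] = map[string]int{}
	}
	for _, w := range doc {
		m.counts[class][w]++
		m.totals[class]++
	}
}

func main() {
	m := newModel()
	m.learn([]string{"tall", "rich", "handsome"}, "Good")
	m.learn([]string{"poor", "smelly", "ugly"}, "Bad")
	m.learn([]string{"tall", "kind"}, "Good")
	fmt.Println(m.counts["Good"]["tall"], m.totals["Good"]) // 2 5
}
```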

Then you can ascertain the scores of each class and the most likely class your data belongs to:

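```go
// Continuing from the training code above.
// scores holds one log score per class; likely is the index of the
// most likely class, in the order the classes were passed to NewClassifier.
scores, likely, _ := classifier.LogScores(
	[]string{"tall", "girl"},
)
```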
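For intuition about what a log score contains, the sketch below computes `log P(class) + Σ log P(word | class)` for each class and picks the highest. It is a standalone illustration, not the library's implementation: `logScores` here is a hypothetical helper, and it assumes uniform class priors and add-one (Laplace) smoothing for unseen words, which may differ from what `bayesian` does.

```go
package main

import (
	"fmt"
	"math"
)

// logScores returns, for each class, log(prior) + sum of log P(word|class),
// plus the index of the highest score. Assumes uniform priors and add-one
// (Laplace) smoothing; the library's own smoothing may differ.
func logScores(counts []map[string]int, doc []string) ([]float64, int) {
	// Vocabulary size across all classes, needed by Laplace smoothing.
	vocab := map[string]bool{}
	for _, c := range counts {
		for w := range c {
			vocab[w] = true
		}
	}
	v := float64(len(vocab))

	scores := make([]float64, len(counts))
	best := 0
	for i, c := range counts {
		total := 0
		for _, n := range c {
			total += n
		}
		score := math.Log(1 / float64(len(counts))) // uniform prior
		for _, w := range doc {
			score += math.Log((float64(c[w]) + 1) / (float64(total) + v))
		}
		scores[i] = score
		if score > scores[best] {
			best = i
		}
	}
	return scores, best
}

func main() {
	good := map[string]int{"tall": 1, "rich": 1, "handsome": 1}
	bad := map[string]int{"poor": 1, "smelly": 1, "ugly": 1}
	scores, likely := logScores([]map[string]int{good, bad}, []string{"tall", "girl"})
	fmt.Println(scores, likely) // likely == 0 (Good)
}
```

Both scores are negative; the winner is the one closest to zero.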

Scores are log probabilities, so a higher (less negative) score indicates a more likely class. Alternatively (but with some risk of float underflow), you can obtain actual probabilities:

```go
// probs holds one probability per class, computed outside log space.
probs, likely, _ := classifier.ProbScores(
	[]string{"tall", "girl"},
)
```
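If you already have log scores, a common way to turn them into probabilities without underflow is the log-sum-exp trick: subtract the maximum score before exponentiating, then normalize. The sketch below is a stdlib-only illustration with a hypothetical `logScoresToProbs` helper; it is an alternative technique, not a description of how `ProbScores` works internally.

```go
package main

import (
	"fmt"
	"math"
)

// logScoresToProbs normalizes log scores into probabilities that sum to 1.
// Subtracting the maximum first (log-sum-exp trick) makes the largest term
// exp(0) = 1, so the normalizing sum can never underflow to 0.
func logScoresToProbs(logScores []float64) []float64 {
	maxScore := math.Inf(-1)
	for _, s := range logScores {
		maxScore = math.Max(maxScore, s)
	}
	probs := make([]float64, len(logScores))
	sum := 0.0
	for i, s := range logScores {
		probs[i] = math.Exp(s - maxScore)
		sum += probs[i]
	}
	for i := range probs {
		probs[i] /= sum
	}
	return probs
}

func main() {
	// exp(-2000) and exp(-2001) both underflow to 0 in float64,
	// but their relative weight is still recoverable in log space.
	fmt.Println(logScoresToProbs([]float64{-2000, -2001})) // roughly [0.731 0.269]
}
```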

## Example 2 (TF-IDF Support)

To use the TF-IDF classifier, first create some classes and train it. You must then call `ConvertTermsFreqToTfIdf()` AFTER training and before calling any classification method (such as `LogScores`, `SafeProbScores`, and `ProbScores`):

```go
package main

import "github.com/navossoc/bayesian"

const (
	Good bayesian.Class = "Good"
	Bad  bayesian.Class = "Bad"
)

func main() {
	// Create a classifier with TF-IDF support.
	classifier := bayesian.NewClassifierTfIdf(Good, Bad)

	// Train each class on a sample document.
	goodStuff := []string{"tall", "rich", "handsome"}
	badStuff := []string{"poor", "smelly", "ugly"}
	classifier.Learn(goodStuff, Good)
	classifier.Learn(badStuff, Bad)

	// Required: convert term frequencies to TF-IDF weights after
	// training and before calling any classification method.
	classifier.ConvertTermsFreqToTfIdf()
}
```
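As a refresher on what TF-IDF weighting does, here is a stdlib-only sketch of one common textbook variant: `tf = count / len(doc)` and `idf = log(N / df)`, where `df` is the number of documents containing the term. The `tfidf` helper is hypothetical, and `ConvertTermsFreqToTfIdf` may use a different formula; the point is only that terms appearing everywhere get down-weighted.

```go
package main

import (
	"fmt"
	"math"
)

// tfidf computes tf(t,d) * idf(t) for every term of every document, using
// tf = count/len(d) and idf = log(N/df). One common variant, not
// necessarily the formula used by ConvertTermsFreqToTfIdf.
func tfidf(docs [][]string) []map[string]float64 {
	// df counts how many documents contain each term at least once.
	df := map[string]int{}
	for _, doc := range docs {
		seen := map[string]bool{}
		for _, t := range doc {
			if !seen[t] {
				seen[t] = true
				df[t]++
			}
		}
	}
	n := float64(len(docs))
	weights := make([]map[string]float64, len(docs))
	for i, doc := range docs {
		counts := map[string]int{}
		for _, t := range doc {
			counts[t]++
		}
		weights[i] = map[string]float64{}
		for t, c := range counts {
			tf := float64(c) / float64(len(doc))
			weights[i][t] = tf * math.Log(n/float64(df[t]))
		}
	}
	return weights
}

func main() {
	w := tfidf([][]string{
		{"tall", "rich", "handsome"},
		{"tall", "poor", "ugly"},
	})
	// "tall" occurs in every document, so its idf (and weight) is 0;
	// "rich" occurs in only one, so it keeps a positive weight.
	fmt.Println(w[0]["tall"], w[0]["rich"])
}
```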

Then you can ascertain the scores of each class and the most likely class your data belongs to:

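```go
// Continuing from the training code above.
// scores holds one log score per class; likely is the index of the
// most likely class, in the order the classes were passed to NewClassifierTfIdf.
scores, likely, _ := classifier.LogScores(
	[]string{"tall", "girl"},
)
```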

Scores are log probabilities, so a higher (less negative) score indicates a more likely class. Alternatively (but with some risk of float underflow), you can obtain actual probabilities:

```go
// probs holds one probability per class, computed outside log space.
probs, likely, _ := classifier.ProbScores(
	[]string{"tall", "girl"},
)
```
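Because probability-space scoring is where underflow bites, it helps to detect it rather than silently trust a result of 0. The sketch below shows one hypothetical way such a check could work: multiply per-word probabilities for each class and return an error if any product collapses to 0. The names `safeProbs` and `errUnderflow` are illustrative assumptions, not this library's API; the sketch also assumes all inputs are already smoothed (strictly positive) and that class priors are uniform, so they cancel during normalization.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnderflow is a hypothetical stand-in for an underflow error.
var errUnderflow = errors.New("possible underflow detected")

// safeProbs multiplies each class's per-word probabilities in probability
// space and normalizes the results. Inputs are assumed to be > 0, so a
// product of exactly 0 can only mean underflow, and an error is returned.
func safeProbs(wordProbs [][]float64) ([]float64, error) {
	probs := make([]float64, len(wordProbs))
	sum := 0.0
	for i, ps := range wordProbs {
		prod := 1.0
		for _, p := range ps {
			prod *= p
		}
		if prod == 0 {
			return nil, errUnderflow
		}
		probs[i] = prod
		sum += prod
	}
	for i := range probs {
		probs[i] /= sum
	}
	return probs, nil
}

func main() {
	// A two-word document: both products stay comfortably in range.
	fmt.Println(safeProbs([][]float64{{0.2, 0.1}, {0.1, 0.1}}))

	// A 400-word document at probability 0.001 per word: products underflow.
	long := [][]float64{make([]float64, 400), make([]float64, 400)}
	for i := range long {
		for j := range long[i] {
			long[i][j] = 0.001
		}
	}
	_, err := safeProbs(long)
	fmt.Println(err) // possible underflow detected
}
```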

Use wisely.