Bayesian text classifier with flexible tokenizers and storage backends for Go

Overview

Shield is a Bayesian text classifier with flexible tokenizer and backend store support.

Currently implemented:

  • Redis backend
  • English tokenizer
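
To make the tokenizer role concrete, here is a minimal sketch of what a pluggable English tokenizer could look like. The `Tokenizer` interface, the `simpleEnglish` type, and the stop-word list are assumptions made for illustration only; they are not taken from Shield's source, and the library's actual tokenizer may behave differently.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Tokenizer is a hypothetical interface for this sketch; Shield's real
// interface may differ.
type Tokenizer interface {
	Tokenize(text string) []string
}

// simpleEnglish lowercases text, splits on non-letters and drops stop words.
type simpleEnglish struct {
	stop map[string]bool
}

// newSimpleEnglish builds a tokenizer with a tiny, illustrative stop-word list.
func newSimpleEnglish() *simpleEnglish {
	stop := map[string]bool{}
	for _, w := range []string{"a", "an", "and", "are", "i", "so", "the", "them"} {
		stop[w] = true
	}
	return &simpleEnglish{stop: stop}
}

// Tokenize returns the lowercase words of text that are not stop words.
func (t *simpleEnglish) Tokenize(text string) []string {
	fields := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	var out []string
	for _, f := range fields {
		if !t.stop[f] {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	var tok Tokenizer = newSimpleEnglish()
	fmt.Println(tok.Tokenize("Sloths are so cute, I love them!")) // [sloths cute love]
}
```

Keeping tokenization behind a small interface is what would let another language or a stemming tokenizer be swapped in without touching the classifier or the store.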

Example

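package main

import (
  "github.com/eaigner/shield"
)

func main() {
  // Combine the English tokenizer with a Redis store at 127.0.0.1:6379.
  sh := shield.New(
    shield.NewEnglishTokenizer(),
    shield.NewRedisStore("127.0.0.1:6379", "", 0),
  )

  // Train two classes with a few sample words each.
  sh.Learn("good", "sunshine drugs love sex lobster sloth")
  sh.Learn("bad", "fear death horror government zombie god")

  // Classify returns the best-matching class; check the second
  // return value instead of discarding it.
  c, err := sh.Classify("sloths are so cute i love them")
  if err != nil {
    panic(err)
  }
  if c != "good" {
    panic(c)
  }

  c, err = sh.Classify("i fear god and love the government")
  if err != nil {
    panic(err)
  }
  if c != "bad" {
    panic(c)
  }
}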
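
For readers who want to see the idea behind the example without a Redis server, the following is a self-contained sketch of naive Bayes classification kept entirely in memory. It is not Shield's implementation: the `memoryBayes` type, the whitespace tokenization, the add-one smoothing, and the uniform class prior are all assumptions chosen to keep the sketch short.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// memoryBayes is a hypothetical in-memory classifier, not part of Shield.
type memoryBayes struct {
	wordCounts map[string]map[string]int // class -> word -> count
	totals     map[string]int            // class -> number of words learned
	vocab      map[string]bool           // every word seen in any class
}

func newMemoryBayes() *memoryBayes {
	return &memoryBayes{
		wordCounts: map[string]map[string]int{},
		totals:     map[string]int{},
		vocab:      map[string]bool{},
	}
}

// Learn adds the whitespace-separated, lowercased words of text to class.
func (b *memoryBayes) Learn(class, text string) {
	if b.wordCounts[class] == nil {
		b.wordCounts[class] = map[string]int{}
	}
	for _, w := range strings.Fields(strings.ToLower(text)) {
		b.wordCounts[class][w]++
		b.totals[class]++
		b.vocab[w] = true
	}
}

// Classify returns the class with the highest summed log-probability,
// using add-one smoothing and a uniform prior (both assumptions).
// It returns "" if nothing has been learned; ties are broken arbitrarily.
func (b *memoryBayes) Classify(text string) string {
	words := strings.Fields(strings.ToLower(text))
	best, bestScore := "", math.Inf(-1)
	for class, counts := range b.wordCounts {
		score := 0.0
		for _, w := range words {
			p := float64(counts[w]+1) / float64(b.totals[class]+len(b.vocab))
			score += math.Log(p)
		}
		if score > bestScore {
			best, bestScore = class, score
		}
	}
	return best
}

func main() {
	b := newMemoryBayes()
	b.Learn("good", "sunshine drugs love sex lobster sloth")
	b.Learn("bad", "fear death horror government zombie god")

	fmt.Println(b.Classify("sloths are so cute i love them"))     // good
	fmt.Println(b.Classify("i fear god and love the government")) // bad
}
```

Summing logarithms instead of multiplying probabilities avoids floating-point underflow on long texts, and the smoothing term keeps a single unseen word from forcing a class score to zero.
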
You might also like...
k-modes and k-prototypes clustering algorithms implementation in Go

go-cluster GO implementation of clustering algorithms: k-modes and k-prototypes. K-modes algorithm is very similar to well-known clustering algorithm

Probability distributions and associated methods in Go

godist godist provides some Go implementations of useful continuous and discrete probability distributions, as well as some handy methods for working

On-line Machine Learning in Go (and so much more)

goml Golang Machine Learning, On The Wire goml is a machine learning library written entirely in Golang which lets the average developer include machi

Training materials and labs for a "Getting Started" level course on COBOL

COBOL Programming Course This project is a set of training materials and labs for COBOL on z/OS. The following books are available within this reposit

A curated list of Awesome Go performance libraries and tools

Awesome Go performance Collection of the Awesome™ Go libraries, tools, project around performance. Contents Algorithm Assembly Benchmarks Compiling Co

Deploy, manage, and scale machine learning models in production

Deploy, manage, and scale machine learning models in production. Cortex is a cloud native model serving platform for machine learning engineering teams.

The Go kernel for Jupyter notebooks and nteract.

gophernotes - Use Go in Jupyter notebooks and nteract gophernotes is a Go kernel for Jupyter notebooks and nteract. It lets you use Go interactively i

Library for multi-armed bandit selection strategies, including efficient deterministic implementations of Thompson sampling and epsilon-greedy.

Mab Multi-Armed Bandits Go Library Description Installation Usage Creating a bandit and selecting arms Numerical integration with numint Documentation

A program that generates a folder structure with challenges and projects for mastering a programming language.

Challenge Generator A program that generates a folder structure with challenges and projects for mastering a programming language. Explore the docs »

Comments
  • Add Ability to Test and Reconnect to Data Store

    I've been using this library as part of a service and I've seen it lose the Redis connection, so I added a simple method on the store (and the related interfaces) to query and reset the connection as needed.

    This could apply to any other data store over the network, but could also just be a nil function for stores that don't need a connection reset.

    opened by mikeflynn 2
  • index out of range shield.(*RedisStore).ClassWordCounts

    Hi,

    I'm getting this error quite often. Any ideas on what may cause this behaviour?

    It seems to happen after running shieldInstance.Classify(text):

    panic: runtime error: index out of range
    
    goroutine 90 [running]:
    github.com/eaigner/shield.(*RedisStore).ClassWordCounts(0xc20806c000, 0xc20927a078, 0x2, 0xc20927e000, 0x2, 0x2, 0xc2092373b0, 0x0, 0x0)
        /opt/gocode/src/moody/Godeps/_workspace/src/github.com/eaigner/shield/redis.go:113 +0x85c
    github.com/eaigner/shield.(*shield).Score(0xc20814d6c0, 0xc209fb0460, 0x1a, 0x0, 0x0, 0x0)
        /opt/gocode/src/moody/Godeps/_workspace/src/github.com/eaigner/shield/shield.go:92 +0x27e
    github.com/eaigner/shield.(*shield).Classify(0xc20814d6c0, 0xc209fb0460, 0x1a, 0x0, 0x0, 0x0, 0x0)
        /opt/gocode/src/moody/Godeps/_workspace/src/github.com/eaigner/shield/shield.go:143 +0x7d
    
    opened by thomasmodeneis 0
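
The connection check proposed in the first comment above could look roughly like the sketch below. The `HealthChecker` interface, its `Ping` and `Reconnect` methods, and both store types are hypothetical names invented for illustration; the comment does not show the method that was actually added, so none of this should be read as Shield's API.

```go
package main

import (
	"errors"
	"fmt"
)

// HealthChecker is a hypothetical interface for stores that can test and
// reset their connection.
type HealthChecker interface {
	Ping() error
	Reconnect() error
}

// memoryStore needs no connection, so both methods are no-ops.
type memoryStore struct{}

func (memoryStore) Ping() error      { return nil }
func (memoryStore) Reconnect() error { return nil }

// flakyStore simulates a networked store whose connection has dropped.
type flakyStore struct {
	connected  bool
	reconnects int
}

func (s *flakyStore) Ping() error {
	if !s.connected {
		return errors.New("connection lost")
	}
	return nil
}

func (s *flakyStore) Reconnect() error {
	s.connected = true
	s.reconnects++
	return nil
}

// ensureConnected pings the store and reconnects only if the ping fails.
func ensureConnected(s HealthChecker) error {
	if err := s.Ping(); err != nil {
		if err := s.Reconnect(); err != nil {
			return err
		}
		return s.Ping()
	}
	return nil
}

func main() {
	s := &flakyStore{}
	fmt.Println(ensureConnected(s), s.reconnects) // <nil> 1
	fmt.Println(ensureConnected(memoryStore{}))   // <nil>
}
```

A service could call a helper like `ensureConnected` before each classification, so a dropped connection is repaired on demand while stores without a connection pay almost nothing.
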
Owner
Erik Aigner
Tpu-traffic-classifier - This small program creates ipsets and iptables rules for nodes in the Solana network

TPU traffic classifier This small program creates ipsets and iptables rules for

Triton One 10 Nov 23, 2022
A Naive Bayes SMS spam classifier written in Go.

Ham (SMS spam classifier) Summary The purpose of this project is to demonstrate a simple probabilistic SMS spam classifier in Go. This supervised lear

Dan Wolf 13 Sep 9, 2022
A License Classifier

License Classifier Introduction The license classifier is a library and set of tools that can analyze text to determine what type of license it contai

Google 266 Dec 22, 2022
Versioned model registry suitable for temporary in-training storage and permanent storage

Cogment Model Registry Cogment is an innovative open source AI platform designed to leverage the advent of AI to benefit humankind through human-AI co

Cogment 2 May 26, 2022
A highly flexible blockchain architecture with great transaction performance.

XuperChain What is XuperChain XuperChain, the first open source project of XuperChain Lab, introduces an underlying solution to build the super al

null 1.6k Jan 2, 2023
The MapReduce pattern with Goroutines and channels to count n-grams in a directory of text files

MapReduce Ngram This Golang program implements the MapReduce pattern with Goroutines and channels to count n-grams in a directory of text files. Usage

Zachary Ashen 0 Dec 16, 2021
PaddleDTX is a solution that focused on distributed machine learning technology based on decentralized storage.

PaddleDTX PaddleDTX is a solution that focused on distributed machine learning technology based on decentralized storage. It solves the d

null 82 Dec 14, 2022
The open source, end-to-end computer vision platform. Label, build, train, tune, deploy and automate in a unified platform that runs on any cloud and on-premises.

End-to-end computer vision platform Label, build, train, tune, deploy and automate in a unified platform that runs on any cloud and on-premises. onepa

Onepanel, Inc. 642 Dec 12, 2022
Go types, funcs, and utilities for working with cards, decks, and evaluating poker hands (Holdem, Omaha, Stud, more)

cardrank.io/cardrank Package cardrank.io/cardrank provides a library of types, funcs, and utilities for working with playing cards, decks, and evaluat

null 62 Dec 25, 2022
Genetic Algorithm and Particle Swarm Optimization

evoli Genetic Algorithm and Particle Swarm Optimization written in Go Example Problem Given f(x,y) = cos(x^2 * y^2) * 1/(x^2 * y^2 + 1) Find (x,y) suc

Guillaume Simonneau 26 Dec 22, 2022