Bayesian text classifier with flexible tokenizers and storage backends for Go

Overview

Shield is a Bayesian text classifier for Go with support for pluggable tokenizers and storage backends.

Currently implemented:

  • Redis backend
  • English tokenizer
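
The snippet below makes the tokenizer side of that design concrete. It is a minimal sketch of what a tokenizer typically does: lowercase the text, split it on non-letter characters, skip stop words, and count what remains. The `Tokenizer` interface, the `simpleTokenizer` type and the stop-word list are hypothetical illustrations. Shield's own English tokenizer and interfaces may differ, for example in which stop words they drop or whether they stem words.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Tokenizer is a hypothetical interface that maps a text to word counts.
// It is an illustration only and may not match Shield's real interface.
type Tokenizer interface {
	Tokenize(text string) map[string]int64
}

// simpleTokenizer lowercases the input, splits on anything that is not a
// letter, and skips a small, illustrative stop-word list.
type simpleTokenizer struct {
	stop map[string]bool
}

func newSimpleTokenizer() *simpleTokenizer {
	stop := map[string]bool{}
	for _, w := range []string{"a", "an", "and", "are", "i", "so", "the", "them"} {
		stop[w] = true
	}
	return &simpleTokenizer{stop: stop}
}

// Tokenize returns how often each non-stop word occurs in text.
func (t *simpleTokenizer) Tokenize(text string) map[string]int64 {
	counts := make(map[string]int64)
	words := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	for _, w := range words {
		if t.stop[w] {
			continue
		}
		counts[w]++
	}
	return counts
}

func main() {
	var tok Tokenizer = newSimpleTokenizer()
	fmt.Println(tok.Tokenize("Sloths are so cute, I love them. Love!"))
	// map[cute:1 love:2 sloths:1]
}
```

Returning counts rather than a plain word list lets a store increment each word's frequency in one step. That is a design assumption of this sketch, not a documented property of Shield.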

Example

package main

import (
  "github.com/eaigner/shield"
)

func main() {
  sh := shield.New(
    shield.NewEnglishTokenizer(),
    shield.NewRedisStore("127.0.0.1:6379", "", 0),
  )

  sh.Learn("good", "sunshine drugs love sex lobster sloth")
  sh.Learn("bad", "fear death horror government zombie god")

  c, _ := sh.Classify("sloths are so cute i love them")
  if c != "good" {
    panic(c)
  }

  c, _ = sh.Classify("i fear god and love the government")
  if c != "bad" {
    panic(c)
  }
}
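
The example above relies on the classifier picking the class whose learned words best match the input. A common way to do this is multinomial naive Bayes. For each class, it adds log P(class) to the sum of log P(word | class) over the input words. Add-one (Laplace) smoothing keeps an unseen word from zeroing out a class. The sketch below implements that scheme over in-memory maps and replays the same training and test sentences as the example. `memClassifier` and its methods are hypothetical, and Shield's actual scoring formula and Redis key layout may differ.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"sort"
	"strings"
)

// memClassifier is a hypothetical in-memory multinomial naive Bayes
// classifier; plain maps stand in for a storage backend such as Redis.
type memClassifier struct {
	wordCounts map[string]map[string]int64 // class -> word -> count
	totalWords map[string]int64            // class -> number of words learned
	docCounts  map[string]int64            // class -> number of texts learned
	vocabulary map[string]bool             // every word seen in any class
	totalDocs  int64
}

func newMemClassifier() *memClassifier {
	return &memClassifier{
		wordCounts: make(map[string]map[string]int64),
		totalWords: make(map[string]int64),
		docCounts:  make(map[string]int64),
		vocabulary: make(map[string]bool),
	}
}

// tokenize is a deliberately naive lowercase/whitespace tokenizer.
func tokenize(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// Learn adds the words of text to the counts for class.
func (c *memClassifier) Learn(class, text string) {
	if c.wordCounts[class] == nil {
		c.wordCounts[class] = make(map[string]int64)
	}
	for _, w := range tokenize(text) {
		c.wordCounts[class][w]++
		c.totalWords[class]++
		c.vocabulary[w] = true
	}
	c.docCounts[class]++
	c.totalDocs++
}

// Classify returns the class maximising
// log P(class) + sum of log P(word | class), with add-one smoothing.
func (c *memClassifier) Classify(text string) (string, error) {
	if c.totalDocs == 0 {
		return "", errors.New("nothing learned yet")
	}
	classes := make([]string, 0, len(c.wordCounts))
	for class := range c.wordCounts {
		classes = append(classes, class)
	}
	sort.Strings(classes) // deterministic order for tie-breaking

	words := tokenize(text)
	vocab := float64(len(c.vocabulary))
	best, bestScore := "", math.Inf(-1)
	for _, class := range classes {
		score := math.Log(float64(c.docCounts[class]) / float64(c.totalDocs))
		denom := float64(c.totalWords[class]) + vocab
		for _, w := range words {
			score += math.Log((float64(c.wordCounts[class][w]) + 1) / denom)
		}
		if score > bestScore {
			best, bestScore = class, score
		}
	}
	return best, nil
}

func main() {
	c := newMemClassifier()
	c.Learn("good", "sunshine drugs love sex lobster sloth")
	c.Learn("bad", "fear death horror government zombie god")

	class, _ := c.Classify("sloths are so cute i love them")
	fmt.Println(class) // good
	class, _ = c.Classify("i fear god and love the government")
	fmt.Println(class) // bad
}
```

Working in log space avoids floating-point underflow when many small probabilities are multiplied together. Classes are visited in sorted order so that ties resolve the same way on every run.
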
Issues
  • Add Ability to Test and Reconnect to Data Store

    I've been using this library as part of a service and I've seen it lose the Redis connection, so I added a simple method on the store (and the related interfaces) to query and reset the connection as needed.

    This could apply to any other network-based data store; stores that don't need a connection reset could simply implement it as a no-op.

    opened by mikeflynn 2
  • index out of range shield.(*RedisStore).ClassWordCounts

    Hi,

    I'm getting this error quite often. Any ideas on what may cause this behaviour?

    It seems to be triggered by calling shieldInstance.Classify(text):

    panic: runtime error: index out of range
    
    goroutine 90 [running]:
    github.com/eaigner/shield.(*RedisStore).ClassWordCounts(0xc20806c000, 0xc20927a078, 0x2, 0xc20927e000, 0x2, 0x2, 0xc2092373b0, 0x0, 0x0)
        /opt/gocode/src/moody/Godeps/_workspace/src/github.com/eaigner/shield/redis.go:113 +0x85c
    github.com/eaigner/shield.(*shield).Score(0xc20814d6c0, 0xc209fb0460, 0x1a, 0x0, 0x0, 0x0)
        /opt/gocode/src/moody/Godeps/_workspace/src/github.com/eaigner/shield/shield.go:92 +0x27e
    github.com/eaigner/shield.(*shield).Classify(0xc20814d6c0, 0xc209fb0460, 0x1a, 0x0, 0x0, 0x0, 0x0)
        /opt/gocode/src/moody/Godeps/_workspace/src/github.com/eaigner/shield/shield.go:143 +0x7d
    
    opened by thomasmodeneis 0
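
The first issue proposes a way to test and reset a store's connection. In Go, one way to express this is a small optional interface that network-backed stores implement and in-process stores satisfy with no-ops. The `Reconnector` interface, the `withReconnect` helper and the `flakyStore` type below are hypothetical and are not part of Shield's API. The fake store only simulates a dropped connection, so the sketch runs without Redis.

```go
package main

import (
	"errors"
	"fmt"
)

// Reconnector is a hypothetical optional interface for stores that talk to a
// server over the network: Ping checks the connection, Reconnect resets it.
// Stores that need no connection can implement both as no-ops.
type Reconnector interface {
	Ping() error
	Reconnect() error
}

// withReconnect pings the store first; if the ping fails it reconnects once
// and then runs op, returning op's error or the reconnect error.
func withReconnect(r Reconnector, op func() error) error {
	if err := r.Ping(); err != nil {
		if rerr := r.Reconnect(); rerr != nil {
			return fmt.Errorf("reconnect failed after ping error %v: %w", err, rerr)
		}
	}
	return op()
}

var errNotConnected = errors.New("not connected")

// flakyStore simulates a store whose connection has been dropped.
type flakyStore struct {
	connected  bool
	reconnects int
}

func (s *flakyStore) Ping() error {
	if !s.connected {
		return errNotConnected
	}
	return nil
}

func (s *flakyStore) Reconnect() error {
	s.connected = true
	s.reconnects++
	return nil
}

func main() {
	s := &flakyStore{connected: false}
	err := withReconnect(s, func() error {
		if !s.connected {
			return errNotConnected
		}
		fmt.Println("store operation ran")
		return nil
	})
	fmt.Println(err, s.reconnects) // <nil> 1
}
```

Pinging before every operation costs an extra round trip. An alternative is to run the operation first and reconnect only when it fails with a connection error.
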
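
The second issue reports a panic inside `ClassWordCounts` (redis.go:113). The exact cause can't be confirmed from the stack trace alone. However, an "index out of range" panic generally means a slice was indexed by position without checking its length. A defensive pattern is to verify that a batched reply has exactly one entry per requested word and to return an error otherwise. `zipCounts` below is a hypothetical helper, not code from Shield. It assumes the reply values arrive as strings, as an `MGET`-style reply might deliver them, with empty strings standing in for missing keys.

```go
package main

import (
	"fmt"
	"strconv"
)

// zipCounts pairs each requested word with its value from a batched reply.
// It returns an error instead of panicking when the lengths do not match.
// Empty strings are treated as missing keys (count 0) in this sketch.
func zipCounts(words []string, reply []string) (map[string]int64, error) {
	if len(reply) != len(words) {
		return nil, fmt.Errorf("got %d values for %d words", len(reply), len(words))
	}
	counts := make(map[string]int64, len(words))
	for i, w := range words {
		if reply[i] == "" {
			counts[w] = 0
			continue
		}
		n, err := strconv.ParseInt(reply[i], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("word %q: %w", w, err)
		}
		counts[w] = n
	}
	return counts, nil
}

func main() {
	m, err := zipCounts([]string{"love", "sloth"}, []string{"3", ""})
	fmt.Println(m, err) // map[love:3 sloth:0] <nil>

	_, err = zipCounts([]string{"love", "sloth"}, []string{"3"})
	fmt.Println(err) // got 1 values for 2 words
}
```
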
Owner
Erik Aigner
Tpu-traffic-classifier - This small program creates ipsets and iptables rules for nodes in the Solana network

TPU traffic classifier This small program creates ipsets and iptables rules for

Triton One 7 May 16, 2022
A Naive Bayes SMS spam classifier written in Go.

Ham (SMS spam classifier) Summary The purpose of this project is to demonstrate a simple probabilistic SMS spam classifier in Go. This supervised lear

Dan Wolf 11 Aug 17, 2021
A License Classifier

License Classifier Introduction The license classifier is a library and set of tools that can analyze text to determine what type of license it contai

Google 239 Jun 24, 2022
Versioned model registry suitable for temporary in-training storage and permanent storage

Cogment Model Registry Cogment is an innovative open source AI platform designed to leverage the advent of AI to benefit humankind through human-AI co

Cogment 2 May 26, 2022
A highly flexible blockchain architecture with great transaction performance.

XuperChain What is XuperChain XuperChain, the first open source project of XuperChain Lab, introduces an underlying solution to build the super al

1.6k Jun 29, 2022
The MapReduce pattern with Goroutines and channels to count n-grams in a directory of text files

MapReduce Ngram This Golang program implements the MapReduce pattern with Goroutines and channels to count n-grams in a directory of text files. Usage

Zachary Ashen 0 Dec 16, 2021
PaddleDTX is a solution that focuses on distributed machine learning technology based on decentralized storage.

PaddleDTX PaddleDTX is a solution that focuses on distributed machine learning technology based on decentralized storage. It solves the d

58 Jun 23, 2022
The open source, end-to-end computer vision platform. Label, build, train, tune, deploy and automate in a unified platform that runs on any cloud and on-premises.

End-to-end computer vision platform Label, build, train, tune, deploy and automate in a unified platform that runs on any cloud and on-premises. onepa

Onepanel, Inc. 605 Jun 24, 2022
Go types, funcs, and utilities for working with cards, decks, and evaluating poker hands (Holdem, Omaha, Stud, more)

cardrank.io/cardrank Package cardrank.io/cardrank provides a library of types, funcs, and utilities for working with playing cards, decks, and evaluat

50 Jun 21, 2022
Genetic Algorithm and Particle Swarm Optimization

evoli Genetic Algorithm and Particle Swarm Optimization written in Go Example Problem Given f(x,y) = cos(x^2 * y^2) * 1/(x^2 * y^2 + 1) Find (x,y) suc

Guillaume Simonneau 23 Jun 12, 2022
k-modes and k-prototypes clustering algorithms implementation in Go

go-cluster GO implementation of clustering algorithms: k-modes and k-prototypes. K-modes algorithm is very similar to well-known clustering algorithm

e-Xpert Solutions 31 Mar 14, 2022
Probability distributions and associated methods in Go

godist godist provides some Go implementations of useful continuous and discrete probability distributions, as well as some handy methods for working

Edd Robinson 33 Apr 1, 2022
On-line Machine Learning in Go (and so much more)

goml Golang Machine Learning, On The Wire goml is a machine learning library written entirely in Golang which lets the average developer include machi

Conner DiPaolo 1.3k Jun 30, 2022
Training materials and labs for a "Getting Started" level course on COBOL

COBOL Programming Course This project is a set of training materials and labs for COBOL on z/OS. The following books are available within this reposit

Open Mainframe Project 2.2k Jun 27, 2022
A curated list of Awesome Go performance libraries and tools

Awesome Go performance Collection of the Awesome™ Go libraries, tools, project around performance. Contents Algorithm Assembly Benchmarks Compiling Co

Oleg Kovalov 271 Jun 29, 2022
Deploy, manage, and scale machine learning models in production

Deploy, manage, and scale machine learning models in production. Cortex is a cloud native model serving platform for machine learning engineering teams.

Cortex Labs 7.8k Jul 3, 2022
The Go kernel for Jupyter notebooks and nteract.

gophernotes - Use Go in Jupyter notebooks and nteract gophernotes is a Go kernel for Jupyter notebooks and nteract. It lets you use Go interactively i

GopherData 3.3k Jun 27, 2022
Library for multi-armed bandit selection strategies, including efficient deterministic implementations of Thompson sampling and epsilon-greedy.

Mab Multi-Armed Bandits Go Library Description Installation Usage Creating a bandit and selecting arms Numerical integration with numint Documentation

Stitch Fix Technology 28 May 15, 2022
A program that generates a folder structure with challenges and projects for mastering a programming language.

Challenge Generator A program that generates a folder structure with challenges and projects for mastering a programming language.

João Freitas 68 Jun 25, 2022