A recommender system service based on collaborative filtering written in Go

Overview

gorse: Go Recommender System Engine

gorse is an offline recommender system backend based on collaborative filtering written in Go.

This project aims to provide a high-performance, easy-to-use, language-agnostic recommender microservice based on collaborative filtering. You can build a simple recommender system on it, or set up a more sophisticated recommender system using the candidates it generates. It features:

  • Implements 7 rating based recommenders and 4 ranking based recommenders.
  • Supports data loading, data splitting, model training, model evaluation and model selection.
  • Provides a data import/export tool, a model evaluation tool and a RESTful recommender server.
  • Accelerates computations by SIMD instructions and multi-threading.

For more information:

  • Visit GoDoc for detailed documentation of the code.
  • Visit ReadTheDocs for tutorials, examples and usages.
  • Visit SteamLens for a Steam games recommender system based on gorse.

Install

  • Download from release.
  • Build from source:

Install Golang and run go get:

$ go get github.com/zhenghaoz/gorse/...

It will download all packages and build the gorse command line into your $GOBIN path.

If your CPU supports AVX2 and FMA3 instructions, use the avx2 build tag to enable AVX2 and FMA3 instructions.

$ go get -tags='avx2' github.com/zhenghaoz/gorse/...

Usage

gorse is an offline recommender system backend based on collaborative filtering written in Go.

Usage:
  gorse [flags]
  gorse [command]

Available Commands:
  export-feedback Export feedback to CSV
  export-items    Export items to CSV
  help            Help about any command
  import-feedback Import feedback from CSV
  import-items    Import items from CSV
  serve           Start a recommender sever
  test            Test a model by cross validation
  version         Check the version

Flags:
  -h, --help   help for gorse

Use "gorse [command] --help" for more information about a command.

Evaluate a Recommendation Model

gorse provides a tool for evaluating models. Run gorse test -h or check the online documentation to learn its usage. For example:

$ gorse test bpr --load-csv u.data --csv-sep $'\t' --eval-precision --eval-recall --eval-ndcg --eval-map --eval-mrr
...
+--------------+----------+----------+----------+----------+----------+----------------------+
|              |  FOLD 1  |  FOLD 2  |  FOLD 3  |  FOLD 4  |  FOLD 5  |         MEAN         |
+--------------+----------+----------+----------+----------+----------+----------------------+
| Precision@10 | 0.321041 | 0.327128 | 0.321951 | 0.318664 | 0.317197 | 0.321196(±0.005931)  |
| Recall@10    | 0.212509 | 0.213825 | 0.213336 | 0.206255 | 0.210764 | 0.211338(±0.005083)  |
| NDCG@10      | 0.380665 | 0.385125 | 0.380003 | 0.369115 | 0.375538 | 0.378089(±0.008974)  |
| MAP@10       | 0.122098 | 0.123345 | 0.119723 | 0.116305 | 0.119468 | 0.120188(±0.003883)  |
| MRR@10       | 0.605354 | 0.601110 | 0.600359 | 0.577333 | 0.599930 | 0.596817(±0.019484)  |
+--------------+----------+----------+----------+----------+----------+----------------------+

u.data is the CSV file of ratings in the MovieLens 100K dataset and u.item is the CSV file of items in the same dataset. All CLI tools are listed in the CLI-Tools section of the Wiki.

Set Up a Recommender Server

It's easy to set up a recommendation service with gorse.

  • Step 1: Import feedback and items.
$ gorse import-feedback ~/.gorse/gorse.db u.data --sep $'\t' --timestamp 2
$ gorse import-items ~/.gorse/gorse.db u.item --sep '|'

It imports feedback and items from CSV files into the database file ~/.gorse/gorse.db. The low-level storage engine is implemented with BoltDB.

  • Step 2: Start a server.
$ gorse serve -c config.toml

It loads the configuration from config.toml and starts a recommendation server. It may take a while to generate all recommendations. Detailed information about configuration is in the Configuration section of the Wiki. Before setting hyper-parameters for the model, it is useful to test the performance of the chosen hyper-parameters with the model evaluation tool.

  • Step 3: Get recommendations.
$ curl 127.0.0.1:8080/recommends/1?number=5

It requests 5 recommended items for user 1. The response might be:

[
    {
        "ItemId": "919",
        "Popularity": 96,
        "Timestamp": "1995-01-01T00:00:00Z",
        "Score": 1
    },
    {
        "ItemId": "474",
        "Popularity": 194,
        "Timestamp": "1963-01-01T00:00:00Z",
        "Score": 0.9486470268850127
    },
    ...
]

"ItemId" is the ID of the item and "Score" is the score generated by the recommendation model used to rank. See RESTful APIs in Wiki for more information about RESTful APIs.

Use gorse in Go

gorse can also be imported and used in a Go application. The following example fits a recommender and generates recommended items:

package main

import (
	"fmt"
	"github.com/zhenghaoz/gorse/base"
	"github.com/zhenghaoz/gorse/core"
	"github.com/zhenghaoz/gorse/model"
)

func main() {
	// Load dataset
	data := core.LoadDataFromBuiltIn("ml-100k")
	// Split dataset
	train, test := core.Split(data, 0.2)
	// Create model
	bpr := model.NewBPR(base.Params{
		base.NFactors:   10,
		base.Reg:        0.01,
		base.Lr:         0.05,
		base.NEpochs:    100,
		base.InitMean:   0,
		base.InitStdDev: 0.001,
	})
	// Fit model
	bpr.Fit(train, nil)
	// Evaluate model
	scores := core.EvaluateRank(bpr, test, train, 10, core.Precision, core.Recall, core.NDCG)
	fmt.Printf("Precision@10 = %.5f\n", scores[0])
	fmt.Printf("Recall@10 = %.5f\n", scores[1])
	fmt.Printf("NDCG@10 = %.5f\n", scores[2])
	// Generate recommendations for user(4):
	// Get all items in the full dataset
	items := core.Items(data)
	// Get user(4)'s ratings in the training dataset
	excludeItems := train.User("4")
	// Get top 10 recommended items (excluding rated items) for user(4) using BPR
	recommendItems, _ := core.Top(items, "4", 10, excludeItems, bpr)
	fmt.Printf("Recommend for user(4) = %v\n", recommendItems)
}

The output should be:

2019/11/14 08:07:45 Fit BPR with hyper-parameters: n_factors = 10, n_epochs = 100, lr = 0.05, reg = 0.01, init_mean = 0, init_stddev = 0.001
2019/11/14 08:07:45 epoch = 1/100, loss = 55451.70899118173
...
2019/11/14 08:07:49 epoch = 100/100, loss = 10093.29427682404
Precision@10 = 0.31699
Recall@10 = 0.20516
NDCG@10 = ...
Recommend for user(4) = [288 313 245 307 328 332 327 682 346 879]

Recommenders

There are 11 recommendation models implemented by gorse.

Model Data Task Multi-threading Fit
explicit implicit weight rating ranking
BaseLine ✔️ ✔️ ✔️
NMF ✔️ ✔️ ✔️
SVD ✔️ ✔️ ✔️
SVD++ ✔️ ✔️ ✔️ ✔️
KNN ✔️ ✔️ ✔️ ✔️
CoClustering ✔️ ✔️ ✔️ ✔️
SlopeOne ✔️ ✔️ ✔️ ✔️
ItemPop ✔️ ✔️ ✔️
KNN (Implicit) ✔️ ✔️ ✔️ ✔️ ✔️
WRMF ✔️ ✔️ ✔️ ✔️
BPR ✔️ ✔️ ✔️
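  • Cross-validation of rating models on MovieLens 1M [Source].
+--------------+---------+---------+---------+-------------+
|    Model     |  RMSE   |   MAE   |  Time   | Time (AVX2) |
+--------------+---------+---------+---------+-------------+
| SlopeOne     | 0.90683 | 0.71541 | 0:00:26 |             |
| CoClustering | 0.90701 | 0.71212 | 0:00:08 |             |
| KNN          | 0.86462 | 0.67663 | 0:02:07 |             |
| SVD          | 0.84252 | 0.66189 | 0:02:21 | 0:01:48     |
| SVD++        | 0.84194 | 0.66156 | 0:03:39 | 0:02:47     |
+--------------+---------+---------+---------+-------------+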
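  • Cross-validation of ranking models on MovieLens 100K [Source].
+---------+--------------+-----------+---------+---------+---------+---------+
|  Model  | Precision@10 | Recall@10 | MAP@10  | NDCG@10 | MRR@10  |  Time   |
+---------+--------------+-----------+---------+---------+---------+---------+
| ItemPop | 0.19081      | 0.11584   | 0.05364 | 0.21785 | 0.40991 | 0:00:03 |
| KNN     | 0.28584      | 0.19328   | 0.11358 | 0.34746 | 0.57766 | 0:00:41 |
| BPR     | 0.32083      | 0.20906   | 0.11848 | 0.37643 | 0.59818 | 0:00:13 |
| WRMF    | 0.34727      | 0.23665   | 0.14550 | 0.41614 | 0.65439 | 0:00:14 |
+---------+--------------+-----------+---------+---------+---------+---------+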

Performance

gorse is much faster than Surprise, and comparable to librec while using less memory space than both of them. The memory efficiency is achieved by sophisticated data structures.

  • Cross-validation of SVD on MovieLens 100K [Source]:

  • Cross-validation of SVD on MovieLens 1M [Source]:

Contributors

Contributions of any kind are welcome: report a bug, offer advice, or create a pull request.

Acknowledgments

gorse is inspired by the following projects:

Limitations

gorse has limitations and might not be applicable to some scenarios:

  • No Scalability: gorse is a recommendation service on a single host, so it's unable to handle large data.
  • No Features: gorse exploits interactions between items and users while features of items and users are ignored.
Issues
  • Running gorse-cli cluster adds another server node every time

    Why does a server node get added every time I run gorse-cli cluster? It was started with docker-compose on Windows 10.

    opened by ouyangzhongmin 9
  • Prevent recommending an item for a user who provided feedback

    I have been trying gorse with the BPR model to recommend content to users, who have the option to give feedback that is fed into gorse.

    I'm using gorse as a standalone web server and accessing it through its own APIs.

    What I'm trying to accomplish is preventing the engine from recommending content the user has already provided feedback on. I have looked into both the docs and the code, but nothing really looked like what I wanted. There is the somewhat related #33, but it recommends an entry only once and never again.

    Is there any way to accomplish what I want?

    opened by abdullahcanakci 8
  • Can the system be used on Windows?

    As the title says.

    opened by dotcool 7
  • Accessing http://127.0.0.1:8087/apidocs/ fails

    Started with Docker.

    docker-compose configuration:

    bug 
    opened by ouyangzhongmin 6
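  • Some questions about the RESTful server

    Suppose I use this for content recommendation, similar to Twitter. The desired behavior: every time the user refreshes, one recommendation request is made, and the recommended content is different each time. Items are scored from users' views, clicks, and likes, and the actual data is synced to gorse (new tweets, view counts, score updates from likes, and so on). 1. Can every request return different recommendations (content that has already been recommended is not recommended again)? 2. Can the maximum number of recommendations per request be controlled (for example, at most 20 tweets at a time)?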

    feature/major 
    opened by davewang 5
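  • How are users' rating values recorded in feedback records?

    According to the database table schema, a feedback record can only store user behavior, so how are rating values entered?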

    opened by ouyangzhongmin 5
  • docker-compose and dummy dataset of products/items

    Hi @zhenghaoz ,

    Hope you are all well !

    I am posting this issue as I was wondering if it could be possible to add a docker-compose file for bootstrapping gorse with all its components ?

    Also, I was wondering if you could provide a dummy dataset of products (e-commerce) to show case how to import and query for collaborative filtering scores. If you want, I can create such dataset if you give me the columns definition of the csv or jsonl file. :-)

    Thanks for any insights or inputs on these questions.

    Cheers, Luc Michalski

    opened by lucmichalski 5
  • worker error

    worker error: "failed to decode click model"

    error log info:

    {"level":"error","ts":1627382137.876585,"caller":"worker/worker.go:256","msg":"failed to decode click model","error":"EOF","stacktrace":"github.com/zhenghaoz/gorse/worker.(*Worker).Pull\n\t/home/runner/work/gorse/gorse/worker/worker.go:256"}

    opened by lbw114007 0
  • Question about multi-tenancy

    I am curious how you would set this up to separate multiple clients under users and interaction data.

    feature/future 
    opened by cboothe 1
  • Clickhouse support ?

    Any chance for clickhouse database support?

    duplicate 
    opened by kingslyChen 3
  • Metrics

    I see that metrics are a bit basic. I dealt with production ML models, DS, BA, and policy makers might want to use more advanced metrics.

    For example, if it is a classification problem then:

    Discrimination Power:

    • Area Under Curve (AUC ROC).
    • Kolmogorov-Smirnov Statistic (KS)
    • Accuracy Ratio (AR) based on Accuracy Profile
    • Pietra index
    • Precision at Recall

    Distribution and Skew:

    • Brier score
    • Hosmer-Lemeshow (HL)
    • Jensen-Shannon distance (JSD)
    • Herfindahl-Hirschman Index (HHI)

    Arguably, for ranking you would use other metrics. The best approach would be to survey the literature and the recommendations of consulting companies like BCG or Accenture on what they advise for enterprises that develop recommendation systems.

    Also, would be good if you show how your models/system compare to other solutions.

    feature/minor 
    opened by nikolaydubina 1
  • XGBoost models

    Since you guys deal with tabular data XGBoost would be a nice fit.

    Also, you can avoid IPC by implementing it in native Go say with go-featureprocessing + leaves.

    Cheers,

    -- Nikolay

    wontfix 
    opened by nikolaydubina 5
  • does gorse keep track of items recommended to each user?

    In order to recommend new items to a user each time, recommendation history needs to be created for all users.

    Does gorse keep track of what items have been recommended to a user?

    question 
    opened by outranker 2
  • clickhouse as a datasource

    Hello,

    It would be great to integrate clickhouse as a datasource for recommendation data.

    feature/major 
    opened by talelk 4
  • Recommend new relevant products based on historical transaction records

    In a business scenario, for user historical transactions, or sold out items, to recommend new items on the shelves can be based on feedback or labels.

    In other words, for some items, participate in the calculation, but not participate in the recommendation.

    feature/major 
    opened by hetao29 0
  • Different weights for types of feedback

    Hi,

    This is more of a feature request than an issue. Would it be possible to add weights to feedback?

    Usage: Our webshop has 2 possible feedbacks at the moment: "add to cart" and "placed order". I would like to add "view product" as a feedback but this would skew the current feedback in favour of "view product" while this is less important than "placed order".

    Thanks Maarten

    feature/minor 
    opened by maartendebal 0
  • Is item similarity computation implemented?

    For example, computing a list of similar movies to recommend based on a movie's attributes and tags.

    feature/major 
    opened by ouyangzhongmin 5
Releases(v0.2.3)
Owner
Zhenghao Zhang
Knowledge is power, France is bacon.
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.

English ∙ 日本語 ∙ 简体中文 ∙ 繁體中文 | العَرَبِيَّة‎ ∙ বাংলা ∙ Português do Brasil ∙ Deutsch ∙ ελληνικά ∙ עברית ∙ Italiano ∙ 한국어 ∙ فارسی ∙ Polski ∙ русский язы

Donne Martin 139.2k Jul 23, 2021
Collaborative Filtering (CF) Algorithms in Go!

Go Recommend Recommendation algorithms (Collaborative Filtering) in Go! Background Collaborative Filtering (CF) is oftentimes used for item recommenda

Tim Kaye 174 Jul 1, 2021
Gota: DataFrames and data wrangling in Go (Golang)

Gota: DataFrames, Series and Data Wrangling for Go This is an implementation of DataFrames, Series and data wrangling methods for the Go programming l

null 1.7k Jul 17, 2021
Machine Learning libraries for Go Lang - Linear regression, Logistic regression, etc.

package ml - Machine Learning Libraries ###import "github.com/alonsovidales/go_ml" Package ml provides some implementations of usefull machine learnin

Alonso Vidales 191 Apr 17, 2021
Ensembles of decision trees in go/golang.

CloudForest Google Group Fast, flexible, multi-threaded ensembles of decision trees for machine learning in pure Go (golang). CloudForest allows for a

Ryan Bressler 687 Jul 17, 2021
Path to a Software Architect

Contents What is a Software Architect? Levels of Architecture Typical Activities Important Skills (1) Design (2) Decide (3) Simplify (4) Code (5) Docu

Justin Miller 7.1k Jul 27, 2021
Bigmachine is a library for self-managing serverless computing in Go

Bigmachine Bigmachine is a toolkit for building self-managing serverless applications in Go. Bigmachine provides an API that lets a driver process for

GRAIL 170 Jun 18, 2021
A Naive Bayes SMS spam classifier written in Go.

Ham (SMS spam classifier) Summary The purpose of this project is to demonstrate a simple probabilistic SMS spam classifier in Go. This supervised lear

Dan Wolf 10 Apr 28, 2021
A Kubernetes Native Batch System (Project under CNCF)

Volcano is a batch system built on Kubernetes. It provides a suite of mechanisms that are commonly required by many classes of batch & elastic workloa

Volcano 1.8k Jul 27, 2021
Vald. A Highly Scalable Distributed Vector Search Engine

Vald is a highly scalable distributed fast approximate nearest neighbor dense vector search engine.

Vector Data as a Service 749 Jul 21, 2021
a cheat-sheet for mathematical notation in code form

math-as-code Chinese translation (中文版) Python version (English) This is a reference to ease developers into mathematical notation by showing compariso

Jam3 11.5k Jul 24, 2021
Standard machine learning models

Cog: Standard machine learning models Define your models in a standard format, store them in a central place, run them anywhere. Standard interface fo

Replicate 70 Jul 27, 2021
Library for multi-armed bandit selection strategies, including efficient deterministic implementations of Thompson sampling and epsilon-greedy.

Mab Multi-Armed Bandits Go Library Description Installation Usage Creating a bandit and selecting arms Numerical integration with numint Documentation

Stitch Fix Technology 18 Jun 23, 2021
Prophecis is a one-stop machine learning platform developed by WeBank

Prophecis is a one-stop machine learning platform developed by WeBank. It integrates multiple open-source machine learning frameworks, has the multi tenant management capability of machine learning compute cluster, and provides full stack container deployment and management services for production environment.

WeBankFinTech 198 Jul 26, 2021