A Go implementation of the core algorithm in paper

Overview

Boolean Expression Indexer Go library

A Go implementation of the core algorithm in paper <Indexing Boolean Expression>, which already supports the following features mentioned in paper:

  • DNF algorithm
  • Simple match
  • TopN match (ranking based)

usage

import 
	"github.com/csimplestring/bool-expr-indexer/api/dnf/expr"
	"github.com/csimplestring/bool-expr-indexer/api/dnf/indexer"
	"github.com/stretchr/testify/assert"
)

    	k := indexer.NewMemIndexer()

	k.Add(expr.NewConjunction(
		1,
		[]*expr.Attribute{
			{Name: "age", Values: []string{"3"}, Contains: true, Weights: []uint32{1}},
			{Name: "state", Values: []string{"NY"}, Contains: true, Weights: []uint32{40}},
		},
	))

	k.Add(expr.NewConjunction(
		2,
		[]*expr.Attribute{
			{Name: "age", Values: []string{"3"}, Contains: true, Weights: []uint32{1}},
			{Name: "gender", Values: []string{"F"}, Contains: true, Weights: []uint32{3}},
		},
	))

	k.Add(expr.NewConjunction(
		3,
		[]*expr.Attribute{
			{Name: "age", Values: []string{"3"}, Contains: true, Weights: []uint32{2}},
			{Name: "gender", Values: []string{"M"}, Contains: true, Weights: []uint32{5}},
			{Name: "state", Values: []string{"CA"}, Contains: false, Weights: []uint32{0}},
		},
	))

	k.Add(expr.NewConjunction(
		4,
		[]*expr.Attribute{
			{Name: "state", Values: []string{"CA"}, Contains: true, Weights: []uint32{15}},
			{Name: "gender", Values: []string{"M"}, Contains: true, Weights: []uint32{9}},
		},
	))

	k.Add(expr.NewConjunction(
		5,
		[]*expr.Attribute{
			{Name: "age", Values: []string{"3", "4"}, Contains: true, Weights: []uint32{1, 5}},
		},
	))

	k.Add(expr.NewConjunction(
		6,
		[]*expr.Attribute{
			{Name: "state", Values: []string{"CA", "NY"}, Contains: false, Weights: []uint32{0, 0}},
		},
	))

	k.Build()

    	matcher := &allMatcher{}

	matched := matcher.Match(k, expr.Assignment{
		expr.Label{Name: "age", Value: "3"},
		expr.Label{Name: "state", Value: "CA"},
		expr.Label{Name: "gender", Value: "M"},
	})

	assert.ElementsMatch(t, []int{4, 5}, matched)

	matched = matcher.Match(k, expr.Assignment{
		expr.Label{Name: "age", Value: "3"},
		expr.Label{Name: "state", Value: "NY"},
		expr.Label{Name: "gender", Value: "F"},
	})

	assert.ElementsMatch(t, []int{1, 2, 5}, matched)

see matcher_test.go

Use Scenario

It is an important component in online computational advertising system. As mentioned in the paper, the original motivation of this library is to provide an efficient way to select ads based on pre-defined targeting condition. Given a user with some labels, it finds the best-matched ads.

In a typical advertising management system, the following hierarchy models are used:

Advertisers
|__ LineItem 
    |__ InsertionOrder 
        |__ Campaign
            |__ Creative Banners With Targeting Condition

Ad selector in RTB

In the RTB or similar environment, when a bidding request comes, usually it comes with some user-profile context(in the paper, it is called assignment). The Ad server then needs to quickly find the matched creative banners that match the targeting condition(in the paper, it is expressed as conjunctions). A naive way is to iterate all the banners and compare the user-profile with targeting condition one-by-one, which is very slow when there are millions of banners in the system, however the whole RTB workflow must be done with 100 ms.

User segmentation in DMP

In a DMP environment, all the collected user data shall be processed, enriched and aggregated in nearly real-time. One key problem is to quickly identify a user belongs to which group, based on pre-defined group condition.

This library is developed for the above scenarios. The following features are supported

  • DNF algorithm

    For example: a targeting condition can be expressed as: (age = 30 AND city = A) OR (gender = male AND age = 20 AND region = B) etc

  • Simple match

    It returns all the matched expression ids

  • TopN match (ranking based)

    It returns the TopN matched expression ids, based on an adopted WAND algorithm. Note that the caller has to provide the upper bound and score of each conjunction, usually can be calculated in a Spark job.

Benchmark

Total number of expressions - size of key operations within 1s op per ns
Benchmark_Match_10000_20-12 106482 10142 ns/op
Benchmark_Match_100000_20-12 70419 14749 ns/op
Benchmark_Match_1000000_20-12 14438 87884 ns/op
Benchmark_Match_10000_30-12 80452 15832 ns/op
Benchmark_Match_100000_30-12 61867 20770 ns/op
Benchmark_Match_1000000_30-12 10000 103594 ns/op
Benchmark_Match_10000_40-12 61221 20778 ns/op
Benchmark_Match_100000_40-12 49161 25129 ns/op
Benchmark_Match_1000000_40-12 10000 110132 ns/op

Memory usage: 1 million of expressions are indexed and it takes 100 MB on average.

Roadmap

This library only implements the core DNF algorithm in that paper. However, it is more useful to use it to build up a production-ready Ad Server. Currently, the indexer does not support online update(all the expressions shall be indexed once in the beginning). Also, the speed will be slow if the expressions number increases. Therefore I plan to support more advanced features soon

  • online index update
  • metrics monitoring
  • index partitioning
  • canary rollout deployment
  • HA support
  • http/gRPC transport
  • web UI

PR issues are welcome

Owner
wangyi
wangyi
Document Indexing and Searching Library in Go

Fehrist Fehrist is a pure Go library for indexing different types of documents. Currently it supports only CSV and JSON but flexible architecture give

Adnan Siddiqi 16 May 22, 2022
Package for indexing zip files and storing a compressed index

zipindex zipindex provides a size optimized representation of a zip file to allow decompressing the file without reading the zip file index. It will o

High Performance, Kubernetes Native Object Storage 27 May 10, 2022
A Go implementation of the 64-bit xxHash algorithm (XXH64)

xxhash xxhash is a Go implementation of the 64-bit xxHash algorithm, XXH64. This is a high-quality hashing algorithm that is much faster than anything

Caleb Spare 1.2k Jun 17, 2022
Implementation of Boyer-Moore fast string search algorithm in Go

boyermoore Implementation of Boyer-Moore fast string search algorithm in Go

sarp dağ demirel 49 Jun 7, 2022
An interesting go struct tag expression syntax for field validation, etc.

An interesting go struct tag expression syntax for field validation, etc.

Bytedance Inc. 1.2k Jun 25, 2022
Exp-tree: go library for parsing expression tree

Vinshop expression tree Exp-tree is go library for parsing expression tree Installation go get -u github.com/vinshop/exp-tree Quick start Format Expre

VinShop 6 May 11, 2022
parody of some of the basic python core features (collections package)

collections import "github.com/marcsantiago/collections" Overview Index Subdirectories Overview Index func StringEncoder(encoder *bytes.Buffer, data D

Marc Santiago 18 Jan 14, 2022
Data structure and algorithm library for go, designed to provide functions similar to C++ STL

GoSTL English | 简体中文 Introduction GoSTL is a data structure and algorithm library for go, designed to provide functions similar to C++ STL, but more p

stirlingx 667 Jun 23, 2022
Multi-String Pattern Matching Algorithm Using TrieHashNode

Multi-String Pattern Matching algorithm. This implementation is inspired from Aho-Corasick algorithm Getting Started modelA = mspm.NewModel("mspm_mode

Sujit Shakya 17 Jan 23, 2022
golang sorting algorithm and data construction.

data-structures-questions 算法和程序结构是我们学习编程的基础,但是很多的时候,我们很多都是只是在应用,而没有深入的去研究这些,所以自己也在不断的思考和探索,然后分析,学习,总结自己学习的过程,希望可以和大家一起学习和交流下算法! 目录 网络协议 数据结构 算法 数据库 Go

keke 1.7k Jun 20, 2022
记录算法学习和LeetCode、LintCode、codewars的学习路程。A record of algorithm learning.

Problem List Leetcode、LintCode、Codewars Algorithm problem solution written by golang. LeetCode id Name(Github) Name(Gitee) 00001 TwoSum TwoSum 00003 L

Yin Yang 4 Nov 3, 2021
The simplest sorting algorithm that sorts in quadratic time

Simplest sort Showcases the simplest sorting algorithm that works in quadratic time. From here. The pseudocode for this algo can be seen below (sorts

Joshua T. 6 May 8, 2022
mattn 14 May 15, 2022
Smartsort - A smart sorting algorithm for Go to sort filename containing digits that is not zero padded

smartsort A smart sorting algorithm for Go to sort filename containing digits th

Toby Chui 1 Mar 26, 2022
Randomly generated tile maps using Oskar Stålberg's wave function collapse algorithm

go-wfc Procedurally-generated tile maps using wave function collapse. Demos Live demo (wasm): https://zfedoran.github.io/go-wfc-example/ Live algorith

Zelimir Fedoran 20 Jun 21, 2022
Go implementation of Count-Min-Log

Count-Min-Log Count-Min-Log sketch: Approximately counting with approximate counters - Guillaume Pitel & Geoffroy Fouquier TL;DR: Count-Min-Log Sketch

Seif Lotfy 57 Apr 6, 2022
A Go implementation of the Elias-Fano encoding

go-ef A Go implementation of the Elias-Fano encoding Example package main import ( "fmt" "github.com/amallia/go-ef" "os" ) func main() {

Antonio Mallia 20 May 22, 2022
Set is a useful collection but there is no built-in implementation in Go lang.

goset Set is a useful collection but there is no built-in implementation in Go lang. Why? The only one pkg which provides set operations now is golang

zoumo 46 Apr 4, 2022
A skip list implementation in Go

About This is a library implementing skip lists for the Go programming language (http://golang.org/). Skip lists are a data structure that can be used

Ric (Ryszard) Szopa 236 Jun 1, 2022