Go Machine Learning Benchmarks

Overview

Go Machine Learning Benchmarks

Given a raw data in a Go service, how quickly can I get machine learning inference for it?

Typically, Go is dealing with structured single sample data. Thus, we are focusing on tabular machine learning models only, such as popular XGBoost. It is common to run Go service in a backed form and on Linux platform, thus we do not consider other deployment options. In the work bellow, we compare typical implementations on how this inference task can be performed.

diagram

host: AWS EC2 t2.xlarge shared
os: Ubuntu 20.04 LTS 
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
BenchmarkXGB_Go_GoFeatureProcessing_GoLeaves_noalloc                              491 ns/op
BenchmarkXGB_Go_GoFeatureProcessing_GoLeaves                                      575 ns/op
BenchmarkXGB_Go_GoFeatureProcessing_UDS_RawBytes_Python_XGB                    243056 ns/op
BenchmarkXGB_CGo_GoFeatureProcessing_XGB                                       244941 ns/op
BenchmarkXGB_Go_GoFeatureProcessing_UDS_gRPC_CPP_XGB                           367433 ns/op
BenchmarkXGB_Go_GoFeatureProcessing_UDS_gRPC_Python_XGB                        785147 ns/op
BenchmarkXGB_Go_UDS_gRPC_Python_sklearn_XGB                                  21699830 ns/op
BenchmarkXGB_Go_HTTP_JSON_Python_Gunicorn_Flask_sklearn_XGB                  21935237 ns/op

Abbreviations and Frameworks

Dataset and Model

We are using classic Titanic dataset. It contains numerical and categorical features, which makes it a representative of typical case. Data and notebooks to train model and preprocessor is available in /data and /notebooks.

Some numbers for reference

How fast do you need to get?

                   200ps - 4.6GHz single cycle time
                1ns      - L1 cache latency
               10ns      - L2/L3 cache SRAM latency
               20ns      - DDR4 CAS, first byte from memory latency
               20ns      - C++ raw hardcoded structs access
               80ns      - C++ FlatBuffers decode/traverse/dealloc
              150ns      - PCIe bus latency
              171ns      - cgo call boundary, 2015
              200ns      - HFT FPGA
              475ns      - 2020 MLPerf winner recommendation inference time per sample
 ---------->  500ns      - go-featureprocessing + leaves
              800ns      - Go Protocol Buffers Marshal
              837ns      - Go json-iterator/go json unmarshal
           1µs           - Go protocol buffers unmarshal
           3µs           - Go JSON Marshal
           7µs           - Go JSON Unmarshal
          10µs           - PCIe/NVLink startup time
          17µs           - Python JSON encode/decode times
          30µs           - UNIX domain socket; eventfd; fifo pipes
         100µs           - Redis intrinsic latency; KDB+; HFT direct market access
         200µs           - 1GB/s network air latency; Go garbage collector pauses interval 2018
         230µs           - San Francisco to San Jose at speed of light
         500µs           - NGINX/Kong added latency
     10ms                - AWS DynamoDB; WIFI6 "air" latency
     15ms                - AWS Sagemaker latency; "Flash Boys" 300million USD HFT drama
     30ms                - 5G "air" latency
     36ms                - San Francisco to Hong-Kong at speed of light
    100ms                - typical roundtrip from mobile to backend
    200ms                - AWS RDS MySQL/PostgreSQL; AWS Aurora
 10s                     - AWS Cloudfront 1MB transfer time

Profiling and Analysis

[491ns/575ns] Leaves — we see that most of time taken in Leaves Random Forest code. Leaves code does not have mallocs. Inplace preprocessing does not have mallocs, with non-inplace version malloc happen and takes and takes half of time of preprocessing. leaves

[243µs] UDS Raw bytes Python — we see that Python takes much longer time than preprocessing in Go, however Go is at least visible on the chart. We also note that Python spends most of the time in libgomp.so call, this library is in GNU OpenMP written in C which does parallel operations.

uds

[244µs] CGo version — similarly, we see that call to libgomp.so is being done. It is much smaller compare to rest of o CGo code, as compared to Python version above. Over overall results are not better then? Likely this is due to performance degradation from Go to CGo. We also note that malloc is done.

cgo

[367µs] gRPC over UDS to C++ — we see that Go code is around 50% of C++ version. In C++ 50% of time spend on gRPC code. Lastly, C++ also uses libgomp.so. We don't see on this chart, but likely Go code also spends considerable time on gRPC code.

cgo

[785µs] gRPC over UDS to Python wihout sklearn — we see that Go code is visible in the chart. Python spends only portion on time in libgomp.so.

cgo

[21ms] gRPC over UDS to Python with sklearn — we see that Go code (main.test) is no longer visible the chart. Python spends only small fraction of time on libgomp.so.

cgo

[22ms] REST service version with sklearn — similarly, we see that Go code (main.test) is no longer visible in the chart. Python spends more time in libgomp.so as compared to Python + gRPC + skelarn version, however it is not clear why results are worse.

cgo

Future work

  • go-featureprocessing - gRPCFlatBuffers - C++ - XGB
  • batch mode
  • UDS - gRPC - C++ - ONNX (sklearn + XGBoost)
  • UDS - gRPC - Python - ONNX (sklearn + XGBoost)
  • cgo ONNX (sklearn + XGBoost) (examples: 1)
  • native Go ONNX (sklearn + XGBoost) — no official support, https://github.com/owulveryck/onnx-go is not complete
  • text
  • images
  • videos

Reference

Releases(2.4)
Owner
Nikolay Dubina
Nikolay Dubina
On-line Machine Learning in Go (and so much more)

goml Golang Machine Learning, On The Wire goml is a machine learning library written entirely in Golang which lets the average developer include machi

Conner DiPaolo 1.3k Jun 30, 2022
Gorgonia is a library that helps facilitate machine learning in Go.

Gorgonia is a library that helps facilitate machine learning in Go. Write and evaluate mathematical equations involving multidimensional arrays easily

Gorgonia 4.5k Jun 26, 2022
Machine Learning libraries for Go Lang - Linear regression, Logistic regression, etc.

package ml - Machine Learning Libraries ###import "github.com/alonsovidales/go_ml" Package ml provides some implementations of usefull machine learnin

Alonso Vidales 193 Jun 23, 2022
Gorgonia is a library that helps facilitate machine learning in Go.

Gorgonia is a library that helps facilitate machine learning in Go. Write and evaluate mathematical equations involving multidimensional arrays easily

Gorgonia 4.5k Jun 26, 2022
Prophecis is a one-stop machine learning platform developed by WeBank

Prophecis is a one-stop machine learning platform developed by WeBank. It integrates multiple open-source machine learning frameworks, has the multi tenant management capability of machine learning compute cluster, and provides full stack container deployment and management services for production environment.

WeBankFinTech 344 Jun 22, 2022
Deploy, manage, and scale machine learning models in production

Deploy, manage, and scale machine learning models in production. Cortex is a cloud native model serving platform for machine learning engineering teams.

Cortex Labs 7.8k Jun 21, 2022
A High-level Machine Learning Library for Go

Overview Goro is a high-level machine learning library for Go built on Gorgonia. It aims to have the same feel as Keras. Usage import ( . "github.

AUNUM 300 Jun 16, 2022
Standard machine learning models

Cog: Standard machine learning models Define your models in a standard format, store them in a central place, run them anywhere. Standard interface fo

Replicate 2.6k Jun 30, 2022
Katib is a Kubernetes-native project for automated machine learning (AutoML).

Katib is a Kubernetes-native project for automated machine learning (AutoML). Katib supports Hyperparameter Tuning, Early Stopping and Neural Architec

Kubeflow 1.2k Jun 23, 2022
PaddleDTX is a solution that focused on distributed machine learning technology based on decentralized storage.

中文 | English PaddleDTX PaddleDTX is a solution that focused on distributed machine learning technology based on decentralized storage. It solves the d

null 58 Jun 23, 2022
Self-contained Machine Learning and Natural Language Processing library in Go

Self-contained Machine Learning and Natural Language Processing library in Go

NLP Odyssey 1.2k Jun 24, 2022
Reinforcement Learning in Go

Overview Gold is a reinforcement learning library for Go. It provides a set of agents that can be used to solve challenges in various environments. Th

AUNUM 272 Jun 29, 2022
Spice.ai is an open source, portable runtime for training and using deep learning on time series data.

Spice.ai Spice.ai is an open source, portable runtime for training and using deep learning on time series data. ⚠️ DEVELOPER PREVIEW ONLY Spice.ai is

Spice.ai 738 Jun 21, 2022
FlyML perfomant real time mashine learning libraryes in Go

FlyML perfomant real time mashine learning libraryes in Go simple & perfomant logistic regression (~100 LoC) Status: WIP! Validated on mushrooms datas

Vadim Kulibaba 1 May 30, 2022
Go (Golang) encrypted deep learning library; Fully homomorphic encryption over neural network graphs

DC DarkLantern A lantern is a portable case that protects light, A dark lantern is one who's light can be hidden at will. DC DarkLantern is a golang i

Raven 1 Dec 2, 2021
A tool for building identical machine images for multiple platforms from a single source configuration

Packer Packer is a tool for building identical machine images for multiple platforms from a single source configuration. Packer is lightweight, runs o

null 2 Oct 3, 2021
cmd tool for automatic storage and comparison of benchmarks results

prettybenchcmp prettybenchcmp is cmd tool for storage and comparison of benchmarks results. There is a standard tool benchcmp, but I don't think that

Petr 18 Apr 6, 2021
Benchmarks of Go serialization methods

Benchmarks of Go serialization methods This is a test suite for benchmarking various Go serialization methods. Tested serialization methods encoding/g

Alec Thomas 1.3k Jun 22, 2022
Benchmarks of common basic operations for the Go language.

gocostmodel This package was inspired by Brian W. Kernighan and Rob Pike's book "The Practice of Programming" (Addison-Wesley, 1999). In Chapter 7 on

Martin Angers 57 Jul 19, 2021
Go micro-benchmarks for calculating the speed of language constructs

== About == Gospeed is a library of micro-benchmarks for Go which evolved from the GoLightly project. It's main utility is for understanding and reas

Eleanor McHugh 110 Jun 19, 2022
benchmarks for implementation of servers which support 1 million connections

Benchmark for implementation of servers that support 1m connections inspired by handling 1M websockets connections in Go Servers 1_simple_tcp_server:

smallnest 1.6k Jun 23, 2022
Robust framework for running complex workload scenarios in isolation, using Go; for integration, e2e tests, benchmarks and more! 💪

e2e Go Module providing robust framework for running complex workload scenarios in isolation, using Go and Docker. For integration, e2e tests, benchma

null 100 Jun 5, 2022
A scanner for running security-related configuration checks such as CIS benchmarks

Localtoast Localtoast is a scanner for running security-related configuration checks such as CIS benchmarks in an easily configurable manner. The scan

Google 25 May 28, 2022
Benchmarks to compare Go Generics

This is a collection of various sorts implemnted both as []int only and as const

Jacob Alberty 3 Jan 30, 2022
Sqlbench runs benchmarks on an SQL database

sqlbench runs benchmarks on an SQL database. Right now this works for PostgreSQL

Martin Tournoij 1 Dec 20, 2021
Cloud-Z gathers information and perform benchmarks on cloud instances in multiple cloud providers.

Cloud-Z Cloud-Z gathers information and perform benchmarks on cloud instances in multiple cloud providers. Cloud type, instance id, and type CPU infor

CloudSnorkel 16 Jun 8, 2022
An operator that helps you perform benchmarks

Camunda-Benchmark-Operator ??️‍♀️ An operator that helps you perform benchmarks. Your first benchmark This requires that you know how to run the opera

Simon Zengerling 2 Mar 2, 2022
Advanced benchmarks for +15 Go ORMs.

Go ORM Benchmarks Advanced benchmarks for +10 Go ORMs. Originally forked from orm-benchmark. ORMs All package run in no-cache mode. beego/orm bun gorm

M. Efe Çetin 145 Jun 17, 2022
This is the repository for the LinkedIn Learning course Learning Go.

Learning Go This is the repository for the LinkedIn Learning course Learning Go. The full course is available from LinkedIn Learning. What is Go? Go i

Zhenguan Tang 0 Nov 2, 2021