Overview

sqlbench

sqlbench measures and compares the execution time of one or more SQL queries.

screen recording

The main use case is benchmarking simple CPU-bound query variants against each other during local development.

Only PostgreSQL is supported at this point, but pull requests for MySQL or other databases are welcome.

Install

You can download a binary from the release page.

If you have Go 1.13 or later installed, you can install or update sqlbench from source:

$ go get -u github.com/felixge/sqlbench

Windows users: you may also install the Chocolatey package maintained by picolino:

$ choco install sqlbench

Examples

Below are a few one-liners to get you started. They assume you're running sqlbench from the directory of a clone of this repo.

# Benchmark a few queries until ctrl+c is hit. Output results in realtime.
sqlbench examples/sum/*.sql

# Benchmark using client wallclock time (instead of explain) until ctrl+c.
sqlbench -m client examples/sum/*.sql

# Run for 3 seconds and only print results once at the end.
sqlbench -t 3 -s examples/sum/*.sql

# Run for 1000 iterations and only print verbose results once at the end
sqlbench -n 1000 -s -v examples/sum/*.sql

# Record the results for 1000 iterations into a csv file.
sqlbench -n 1000 -o baseline.csv examples/sum/*.sql

# Compare 1000 iterations to a baseline recording.
sqlbench -n 1000 -i baseline.csv examples/sum/*.sql

Usage

Usage of sqlbench:
  -c string
    	Connection URL or DSN for connecting to PostgreSQL as understood by pgx [1].
    	E.g.: postgres://user:pass@localhost:5432/my_db?sslmode=disable
    	
    	Alternatively you can use standard PostgreSQL environment variables [2] such as
    	PGHOST, PGPORT, PGPASSWORD, ... .
    	
    	[1] https://pkg.go.dev/github.com/jackc/pgx/v4/stdlib?tab=doc
    	[2] https://www.postgresql.org/docs/current/libpq-envars.html
    	(default "postgres://")
  -i string
    	Input path for CSV file with baseline measurements.
  -m string
    	Method for measuring the query time. One of: "client", "explain" (default "explain")
  -n int
    	Terminate after the given number of iterations. (default -1)
  -o string
    	Output path for writing individual measurements in CSV format.
  -p	Include the query planning time. For -m explain this is accomplished by adding
    	the "Planning Time" to the measurement. For -m client this is done by not using
    	prepared statements.
  -s	Silent mode for non-interactive use, only prints stats once after terminating.
  -t float
    	Terminate after the given number of seconds. (default -1)
  -v	Verbose output. Print the content of all SQL queries, as well as the
    	PostgreSQL version.
  -version
    	Print version and exit.

How It Works

sqlbench takes a list of SQL files and keeps executing them sequentially, measuring their execution times. By default the execution time is measured by prefixing the query with EXPLAIN (ANALYZE, TIMING OFF) and capturing the total Execution Time for it.
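For example, with -m explain each query is effectively wrapped and the reported total extracted from the plan output. The sketch below is illustrative only (sqlbench itself parses the JSON plan format, see query_duration.go); the sample plan text and regexes are assumptions, not the tool's actual code:

```python
import re

# Illustrative tail of a text-mode EXPLAIN (ANALYZE, TIMING OFF) plan.
plan = """\
Planning Time: 0.210 ms
Execution Time: 1.945 ms
"""

def measured_ms(plan_text, include_planning=False):
    """Extract the execution time; with -p the planning time is added too."""
    total = float(re.search(r"Execution Time: ([\d.]+) ms", plan_text).group(1))
    if include_planning:
        total += float(re.search(r"Planning Time: ([\d.]+) ms", plan_text).group(1))
    return total

print(measured_ms(plan))                  # 1.945
print(round(measured_ms(plan, True), 3))  # 2.155
```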

The query columns are ordered by mean execution time in ascending order, and the relative difference compared to the fastest query is shown in parentheses. If you provide a baseline CSV via -i, each query is instead compared to its counterpart in the baseline rather than to the other current queries.
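The factors in parentheses can be thought of as each statistic divided by its reference value: the fastest query's value normally, or the query's own baseline value when -i is given. A rough sketch of the idea, not the actual implementation:

```python
def relative_factors(stats, baseline=None):
    # Without a baseline, everything is compared to the fastest query;
    # with one, each query is compared to its own baseline value.
    if baseline is None:
        fastest = min(stats.values())
        return {name: value / fastest for name, value in stats.items()}
    return {name: value / baseline[name] for name, value in stats.items()}

means = {"gauss": 0.50, "window": 1.94, "recursive": 2.67}
factors = relative_factors(means)
print(factors["gauss"])  # 1.0 -- the fastest query is its own reference
```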

If the -m client flag is given, the time is measured using sqlbench's client-side wallclock time, which includes network overhead.

Planning time is excluded by default, but can be included using the -p flag.

The filenames init.sql and destroy.sql are special: they are executed once before and after the benchmark, respectively. They can be used to set up or tear down tables, indexes, etc.
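For instance, a benchmark directory could be set up like this (the file names and SQL here are made up for illustration):

```shell
# init.sql runs once before the benchmark, destroy.sql once after.
mkdir -p bench
cat > bench/init.sql <<'SQL'
CREATE TABLE nums AS SELECT generate_series(1, 100000) AS i;
CREATE INDEX nums_i_idx ON nums (i);
SQL
cat > bench/destroy.sql <<'SQL'
DROP TABLE nums;
SQL
echo 'SELECT count(*) FROM nums WHERE i < 1000;' > bench/count.sql
# then: sqlbench -t 3 bench/*.sql
```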

Tutorial

Let's say you want to compare three different queries for computing the running total of all numbers from 1 to 1000. Your first idea is to use a window function:

SELECT i, sum(i) OVER (ORDER BY i) AS sum
FROM generate_series(1, 1000) g(i);

Then you decide to get fancy and implement it as a recursive CTE:

WITH RECURSIVE sums AS (
	SELECT 1 AS i, 1 AS sum
	UNION
	SELECT i+1, sum+i FROM sums WHERE i <= 1000
)

SELECT * FROM sums;

And finally you become wise and remember that 9-year-old Gauss could probably beat both approaches:

SELECT i, (i * (i + 1)) / 2 AS sum
FROM generate_series(1, 1000) g(i);
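The closed form works because the running total of 1 through i is the triangular number i(i+1)/2. A quick sanity check outside the database, over the same range the queries use:

```python
# Compare the closed-form triangular number against an explicit running total.
running = 0
for i in range(1, 1001):
    running += i
    assert i * (i + 1) // 2 == running

print(running)  # 500500, the sum of 1..1000
```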

Now that you have your queries in window.sql, recursive.sql, and gauss.sql, you want to summarize the performance differences for your colleagues. However, you know they're a pedantic bunch and will ask you annoying questions such as:

  • How many times did you run each query?
  • Were you running other stuff on your laptop in the background?
  • How can I reproduce this on my local machine?
  • What version of PostgreSQL were you running on your local machine?
  • Are you sure you're not just measuring the overhead of EXPLAIN ANALYZE?

This could normally be quite annoying to deal with, but luckily there is sqlbench. The command below runs your three queries 1000 times with EXPLAIN ANALYZE and reports the statistics, the PostgreSQL version, and even the SQL of your queries:

$ sqlbench -v -s -n 1000 examples/sum/*.sql | tee explain-bench.txt
         | gauss |    window     |   recursive    
---------+-------+---------------+----------------
  n      |  1000 |          1000 |          1000  
  min    |  0.35 | 1.31 (3.79x)  | 1.80 (5.22x)   
  max    |  4.18 | 23.76 (5.68x) | 11.41 (2.73x)  
  mean   |  0.50 | 1.94 (3.85x)  | 2.67 (5.30x)   
  stddev |  0.16 | 0.81 (4.93x)  | 0.63 (3.87x)   
  median |  0.53 | 2.02 (3.80x)  | 2.91 (5.49x)   
  p90    |  0.67 | 2.53 (3.80x)  | 3.41 (5.12x)   
  p95    |  0.68 | 2.57 (3.81x)  | 3.50 (5.18x)   

Stopping after 1000 iterations as requested.

postgres version: PostgreSQL 11.6 on x86_64-apple-darwin16.7.0, compiled by Apple LLVM version 8.1.0 (clang-802.0.42), 64-bit
sqlbench -v -s -n 1000 examples/sum/gauss.sql examples/sum/recursive.sql examples/sum/window.sql

==> examples/sum/gauss.sql <==
SELECT i, (i * (i + 1)) / 2 AS sum
FROM generate_series(1, 1000) g(i);

==> examples/sum/window.sql <==
SELECT i, sum(i) OVER (ORDER BY i) AS sum
FROM generate_series(1, 1000) g(i);

==> examples/sum/recursive.sql <==
WITH RECURSIVE sums AS (
	SELECT 1 AS i, 1 AS sum
	UNION
	SELECT i+1, sum+i FROM sums WHERE i <= 1000
)

SELECT * FROM sums;

And finally, you can use the -m client flag to measure the query times without EXPLAIN ANALYZE, to see whether it adds significant overhead:

$ sqlbench -s -n 1000 -m client examples/sum/*.sql | tee client-bench.txt
         | gauss |    window    |  recursive    
---------+-------+--------------+---------------
  n      |  1000 |         1000 |         1000  
  min    |  0.66 | 1.44 (2.18x) | 2.03 (3.08x)  
  max    |  5.66 | 7.31 (1.29x) | 4.34 (0.77x)  
  mean   |  0.83 | 1.72 (2.08x) | 2.35 (2.83x)  
  stddev |  0.23 | 0.33 (1.41x) | 0.27 (1.18x)  
  median |  0.78 | 1.65 (2.11x) | 2.26 (2.89x)  
  p90    |  0.98 | 1.98 (2.03x) | 2.68 (2.75x)  
  p95    |  1.05 | 2.13 (2.03x) | 2.89 (2.76x)  

Stopping after 1000 iterations as requested.

Indeed, it appears that from the client's perspective the gauss query is a bit slower, while the others are a bit faster, when measured without EXPLAIN ANALYZE. Whether that's a rabbit hole worth exploring is up to you, but either way you now have a much better sense of the error that might be contained in your measurements.

Todos

Below are a few ideas for todos that I might implement at some point or would welcome as pull requests.

  • Dynamically adjust the unit between ms, s, etc.
  • Support specifying benchmarks using a single YAML file.
  • Support for other databases, e.g. MySQL.
  • Capture query plans for each query, ideally one close to the median execution time.
  • Provide an easy way to capture all inputs and outputs in a single tar.gz file or GitHub gist.
  • Plot query times as a histogram (a proof of concept exists, but it isn't polished enough to release yet).
  • Maybe add the database name to the verbose output, as requested in an issue.
  • Compare benchmark results between PostgreSQL versions.
  • One-liner examples for the README.
  • Warmup phase (can be done via init.sql and pg_prewarm()).
  • Use TIMING OFF to reduce EXPLAIN overhead.
  • A flag to include planning time in -m explain mode.
  • A flag to use prepared queries in -m client mode.

License

sqlbench is licensed under the MIT license.

Issues
  • Support installation via Chocolatey

    Chocolatey is a nice Windows package manager, so it would be great if this software could be distributed through it.

    I have experience distributing packages through Chocolatey, so if you don't mind, I'm willing to create and maintain that package in the future.

    opened by picolino 3
  • Some chores: CI and test fixes

    Hi Felix,

    Big fan of your work. I was about to raise an issue asking about adding support for another SQL engine, but I noticed that there wasn't yet any CI to validate that I wasn't breaking any tests - so I thought I'd add that. I also added goreleaser to publish binaries whenever you tag a release - I've included a screenshot of what it generates. Feel free to do with this PR whatever you want 😄

    screenshot of goreleaser-generated release notes

    opened by aidansteele 2
  • type reflect.Value has no field or method IsZero

    Hi, when I try to install sqlbench it fails with the following message. I'm on Debian in WSL:

    $ uname -a
    Linux krilin 4.4.0-18362-Microsoft #1049-Microsoft Thu Aug 14 12:01:00 PST 2020 x86_64 GNU/Linux
    $ go version
    go version go1.11.6 linux/amd64

    $go get -u github.com/felixge/sqlbench
    # github.com/jackc/pgtype
    /home/julien/go/src/github.com/jackc/pgtype/aclitem_array.go:90:49: reflectedValue.IsZero undefined (type reflect.Value has no field or method IsZero)
    /home/julien/go/src/github.com/jackc/pgtype/bool_array.go:92:49: reflectedValue.IsZero undefined (type reflect.Value has no field or method IsZero)
    /home/julien/go/src/github.com/jackc/pgtype/bpchar_array.go:92:49: reflectedValue.IsZero undefined (type reflect.Value has no field or method IsZero)
    /home/julien/go/src/github.com/jackc/pgtype/bytea_array.go:73:49: reflectedValue.IsZero undefined (type reflect.Value has no field or method IsZero)
    /home/julien/go/src/github.com/jackc/pgtype/cidr_array.go:112:49: reflectedValue.IsZero undefined (type reflect.Value has no field or method IsZero)
    /home/julien/go/src/github.com/jackc/pgtype/date_array.go:93:49: reflectedValue.IsZero undefined (type reflect.Value has no field or method IsZero)
    /home/julien/go/src/github.com/jackc/pgtype/enum_array.go:90:49: reflectedValue.IsZero undefined (type reflect.Value has no field or method IsZero)
    /home/julien/go/src/github.com/jackc/pgtype/float4_array.go:92:49: reflectedValue.IsZero undefined (type reflect.Value has no field or method IsZero)
    /home/julien/go/src/github.com/jackc/pgtype/float8_array.go:92:49: reflectedValue.IsZero undefined (type reflect.Value has no field or method IsZero)
    /home/julien/go/src/github.com/jackc/pgtype/hstore_array.go:73:49: reflectedValue.IsZero undefined (type reflect.Value has no field or method IsZero)
    /home/julien/go/src/github.com/jackc/pgtype/hstore_array.go:73:49: too many errors
    
    opened by julienbalas 1
  • Baseline

    Support benchmarking against a recorded baseline. Useful when comparing different PostgreSQL versions, server configurations, indexes, etc.

    Hopefully this will help with the use cases suggested by @anarazel on twitter: https://twitter.com/AndresFreundTec/status/1308146736917962752

    # realtime output, and record the results for 1000 iterations into a csv file
    sqlbench -n 1000 -o baseline.csv examples/sum/*.sql
    # realtime output, and compare 1000 iterations to a baseline recording
    sqlbench -n 1000 -i baseline.csv examples/sum/*.sql
    

    Output from a second run against the baseline (same PG version and settings, so no big diffs expected):

             |    gauss     |    window    |  recursive    
    ---------+--------------+--------------+---------------
      n      | 1000 (1.00x) | 1000 (1.00x) | 1000 (1.00x)  
      min    | 0.35 (0.99x) | 1.24 (0.98x) | 1.81 (0.98x)  
      max    | 0.80 (1.00x) | 2.20 (0.91x) | 3.26 (0.99x)  
      mean   | 0.39 (1.00x) | 1.36 (0.99x) | 2.00 (0.99x)  
      stddev | 0.04 (0.76x) | 0.09 (0.76x) | 0.14 (0.86x)  
      median | 0.37 (1.01x) | 1.32 (0.99x) | 1.95 (0.99x)  
      p90    | 0.42 (0.96x) | 1.47 (0.98x) | 2.17 (0.98x)  
      p95    | 0.45 (0.95x) | 1.54 (0.96x) | 2.28 (0.97x) 
    
    opened by felixge 1
  • add p-value

    Similar stats tools include a p-value in their output. It helps us understand whether a difference is real or we just got lucky.

    Look at https://godoc.org/golang.org/x/perf/cmd/benchstat for example.

    opened by kulti 0
  • How to approach supporting alternative drivers

    In the README (and I think I remember on Twitter) you mentioned being open to PRs for alternative SQL implementations. At $dayjob we use MS SQL Server and so I thought I might try my hand at a PR to add support for that. Before getting started, I thought it would be useful to see if you had any thoughts / opinions on how this should be done.

    Some current code to think about

    There are at least a few things worth thinking about. The first is how we would specify the driver name, e.g. currently "pgx" here: https://github.com/felixge/sqlbench/blob/a7422d928ac6fd10ee4e91e52619da3be738ca0e/main.go#L77

    And how it relates to the connection string here: https://github.com/felixge/sqlbench/blob/a7422d928ac6fd10ee4e91e52619da3be738ca0e/main.go#L35

    There's also the EXPLAIN .. FORMAT JSON duration thing, which I assume has equivalents in SQL Server and others, but to be honest I don't care so much about that right now; I think it would be fine to have client-measurement-only support, at least to begin with!

    https://github.com/felixge/sqlbench/blob/a7422d928ac6fd10ee4e91e52619da3be738ca0e/query_duration.go#L76

    Some options

    I guess the "easy" option is to add another command-line flag where the user can specify a driver name if they don't want pgx. This is probably good enough for MVP - anyone using this tool at this stage is probably happy to get their hands dirty!

    Another option could be to deduce the driver name from the scheme in the connection string URL. I'm used to this from my days of Rails, but come to think of it, I've never seen it at $dayjob for MS SQL Server... so maybe it wouldn't be obvious, and we'd have to document what schemes are available 🤔

    Any thoughts? I'm inclined to just go with the easy option and not over-engineer this thing and lock this project into unhelpful complexity on day 2 😂

    opened by aidansteele 1
Releases: v1.1.0
Owner
Felix Geisendörfer