Easy JSON Query Processor with a Lispy syntax in Go

Overview

jql

Hey there!

You're probably here cause you're fed up with other json query processors being too complicated to use for anything surpassing simple single field selection.

Well, at least that's what led me here. And that's why I've written jql, a json query processor with an even more cryptic - lispy - syntax (Maybe not that cryptic after all? I quite like it :) )

jql aims to be friendly, helpful, pat you on the back when you're faced with a monstrosity of a JSON blob. Help you mold it into something useful, step by step.

If you want to see benchmarks, they're at the end of the README.

If you want to see a cheatsheet with function types, it's right before the benchmarks.

Ok, let's check it out now, but first things first, you have to install it:

go get github.com/cube2222/jql

Ok. Done.

If you don't have the Go toolchain installed, just download one of the release binaries.

If you want an interactive window to get live output for queries, use:

echo '' | fzf --print-query --preview-window wrap --preview 'cat test.json | jql {q}'

Let's check out a few simple examples. (remember? That's explicitly not why we're here. But it aids understanding of the more complex examples, so stay with me just a little bit longer!)

We'll be working with this piece of json:

{ 
  "count": 3,
  "countries": [
    {
      "name": "Poland",
      "population": 38000000,
      "european": true,
      "eu_since": "2004"
    },
    {
      "name": "United States",
      "population": 327000000,
      "european": false
    },
    {
      "name": "Germany",
      "population": 83000000,
      "european": true,
      "eu_since": "1993"
    }
  ]
}

To start with, let's get the countries array only.

> cat test.json | jql '(elem "countries")'
[
  {
    "eu_since": "2004",
    "european": true,
    "name": "Poland",
    "population": 38000000
  },
  {
    "european": false,
    "name": "United States",
    "population": 327000000
  },
  {
    "eu_since": "1993",
    "european": true,
    "name": "Germany",
    "population": 83000000
  }
]

Whoa, whoa, whoa! What's up with the parentheses?!

Remember what I said before? Lispy syntax. In short, whenever you see (add 1 2), it's basically the same as add(1, 2). I like that syntax for big hierarchical expressions, if you really really really don't like it, then you can probably stop right here, though you'll be losing out!

I've warned you.

What if we wanted only the first country from that list?

> cat test.json | jql '(elem "countries" (elem 0))'
{
  "eu_since": "2004",
  "european": true,
  "name": "Poland",
  "population": 38000000
}

Let's break down what happened here. First we took the "countries" field. elem takes an additional argument, it says "how to transform the element" and is also an expression. Here we say we want to take the first element of the countries array. The default function is id, which stands for identity.

array

We can also pass an array of positions to elem, to get more than one country:

> cat test.json | jql '(elem "countries" (elem (array 0 2)))'
[
  {
    "eu_since": "2004",
    "european": true,
    "name": "Poland",
    "population": 38000000
  },
  {
    "eu_since": "1993",
    "european": true,
    "name": "Germany",
    "population": 83000000
  }
]

elem can work with single strings, single integers, arrays of those, and objects with them as values (but we won't cover those now to keep things simple).

keys

What if we want to get all the country names? A new friend - keys - is key in this situation. 🗝

cat test.json | jql '(elem "countries" (elem (keys) (elem "name")))'
[
  "Poland",
  "United States",
  "Germany"
]

It returns an array of all the keys of the given collection. Fields and Indices for Objects and Arrays respectively.

To illustrate, here's keys used on an object:

> cat test.json | jql '(elem "countries" (elem 0 (keys)))'
[
  "name",
  "population",
  "european",
  "eu_since"
]

Attention

Now we have to understand a very important mechanism underlying jql. All functions operate in the context of the JSON we're operating on.

Some functions, like elem, will cut down the context for expressions it evaluates. The first argument - which should evaluate to the positions we need - gets evaluated in the context of the entire array, that's why keys returns all the indices. The second one on the other hand, operates in the context of a single element.

In theory we're really just creating and composing a big function - pipeline, so to say - which gets applied to our JSON blob.

This may sound complicated, but I find it becomes intuitive quite quickly.


You can see that elem is the most used function, and in fact it's what you'll usually be using when munging data, so there's a shortcut. If you put a value in function name position, it implicitly converts it to an elem.

This way we can rewrite the previous query to be much shorter, and better match the shape of the data.

> cat test.json | jql '("countries" ((keys) ("name")))'
[
  "Poland",
  "United States",
  "Germany"
]

Little showcase

Remember when I said you can use integers, strings, arrays and objects as positions?

> cat test.json | jql '("countries" ((array (array 0 (array 0 (array 0 (array 0 2)))) 1 (object "key1" 1 "key2" (array 0 (object "key1" 1 "key2" (array 0 2))))) ("population")))'
[
  [
    38000000,
    [
      38000000,
      [
        38000000,
        [
          38000000,
          83000000
        ]
      ]
    ]
  ],
  327000000,
  {
    "key1": 327000000,
    "key2": [
      38000000,
      {
        "key1": 327000000,
        "key2": [
          38000000,
          83000000
        ]
      }
    ]
  }
]

🔥 🔥 🔥

Don't do this.


range

We can also select a range of elements, using the... you guessed it - range function.

> cat test.json | jql '("countries" ((range 1 3) ("name")))'
[
  "United States",
  "Germany"
]

You can use the array function in value position too obviously. If you want a list of name-population tuples you can just

> cat test.json | jql '("countries" ((keys) (array ("name") ("population"))))'
[
  [
    "Poland",
    38000000
  ],
  [
    "United States",
    327000000
  ],
  [
    "Germany",
    83000000
  ]
]

Here you can see that array passes the whole context given to it to each of its arguments. (Reminder: We're using "name" and "population" as elem shortcuts here.)

Most functions work like this. Only elem is the "context-cutdowner", so to say.

object

You can also use object to create objects, with arguments alternating keys and values.

> cat test.json | jql '(object
                            "names" ("countries" ((keys) ("name")))
                            "populations" ("countries" ((array 0 0 1) ("population"))))'
{
  "names": [
    "Poland",
    "United States",
    "Germany"
  ],
  "populations": [
    38000000,
    38000000,
    327000000
  ]
}

Now we're done with the core functionality of jql. The stuff so far will probably suffice for most use cases and even very complex data structures.

However, here come more functions:

String manipulation 🎻

join

If you ever need to join an array of expressions into a string, join's the mate you're looking for! join will also stringify anything it meets. Without separator:

> cat test.json | jql '("countries" ((keys) (join (array ("name") ("population") ("european")))))'
[
  "Poland3.8e+07true",
  "United States3.27e+08false",
  "Germany8.3e+07true"
]

With separator:

> cat test.json | jql '("countries" ((keys) (join (array ("name") ("population") ("european")) ", ")))'
[
  "Poland, 3.8e+07, true",
  "United States, 3.27e+08, false",
  "Germany, 8.3e+07, true"
]

sprintf

Whenever I learn a new language, I feel much more comfortable when I know there's a sprintf function. (and how to use it)

Anyways, here you go, the syntax is the same as that of the go standard library fmt.Sprintf function:

> cat test.json | jql '("countries" ((keys) (sprintf "%s population: %.0f" ("name") ("population"))))'
[
  "Poland population: 38000000",
  "United States population: 327000000",
  "Germany population: 83000000"
]

Hope you're feeling comfortable 🛋 now :)

error

There's a little helper function - error - for those times when you're debugging your queries.

It's an expression which errors on evaluation and gives you a stack trace. It can also evaluate and print any expression you want in it's context.

> cat test.json | jql '("countries" ((keys) (sprintf "%s population: %.0f" ("name") (error "test message"))))'
2019/12/26 00:17:04 error getting expression value for object: couldn't get transformed value for field countries with value [map[eu_since:2004 european:true name:Poland population:3.8e+07] map[european:false name:United States population:3.27e+08] map[eu_since:1993 european:true name:Germany population:8.3e+07]]: couldn't get element using position at array index 0: couldn't get transformed value for index 0 with value map[eu_since:2004 european:true name:Poland population:3.8e+07]: couldn't evaluate sprintf argument with index 1: Message: test message
goroutine 1 [running]:
runtime/debug.Stack(0xc0000723f0, 0x1114e00, 0xc0000744e0)
        /usr/local/Cellar/go/1.13.4/libexec/src/runtime/debug/stack.go:24 +0x9d
github.com/cube2222/jql/jql.Error.Get(0x1164680, 0xc0000723f0, 0x1114e00, 0xc0000744e0, 0x110c5c0, 0xc000072440, 0x0, 0x0)
        /Users/jakub/Projects/Go/src/github.com/cube2222/jql/jql/functions.go:715 +0x102
github.com/cube2222/jql/jql.Sprintf.Get(0x1164680, 0xc0000723b0, 0xc000074460, 0x2, 0x2, 0x1114e00, 0xc0000744e0, 0x100b136, 0xc000072490, 0x10, ...)
        /Users/jakub/Projects/Go/src/github.com/cube2222/jql/jql/functions.go:299 +0x16a
github.com/cube2222/jql/jql.GetElement(0x110bb80, 0x12479e0, 0x1109ec0, 0xc00008a1e0, 0x1164a20, 0xc000074480, 0x122c560, 0x128b6d0, 0x0, 0x100ba08)
        /Users/jakub/Projects/Go/src/github.com/cube2222/jql/jql/functions.go:101 +0xa96
github.com/cube2222/jql/jql.GetElement(0x1109ec0, 0xc00008a220, 0x1109ec0, 0xc00008a1e0, 0x1164a20, 0xc000074480, 0x0, 0x15fffff, 0xc0000afba0, 0x194)
        /Users/jakub/Projects/Go/src/github.com/cube2222/jql/jql/functions.go:71 +0x46d
github.com/cube2222/jql/jql.Element.Get(0x1164980, 0x12470c8, 0x1164a20, 0xc000074480, 0x1109ec0, 0xc00008a1e0, 0xc0000c6018, 0x1ee, 0xc0000c6000, 0x103301c)
        /Users/jakub/Projects/Go/src/github.com/cube2222/jql/jql/functions.go:135 +0x12f
github.com/cube2222/jql/jql.GetElement(0x110c5c0, 0xc0000722b0, 0x1114e00, 0xc0000744b0, 0x11648c0, 0xc00008a160, 0x0, 0x10, 0xc0000724b0, 0xa)
        /Users/jakub/Projects/Go/src/github.com/cube2222/jql/jql/functions.go:118 +0x862
github.com/cube2222/jql/jql.Element.Get(0x1164680, 0xc0000723a0, 0x11648c0, 0xc00008a160, 0x1114e00, 0xc0000744b0, 0x10, 0x1114860, 0x1, 0xc0000724b0)
        /Users/jakub/Projects/Go/src/github.com/cube2222/jql/jql/functions.go:135 +0x12f
main.main()
        /Users/jakub/Projects/Go/src/github.com/cube2222/jql/main.go:39 +0x2b4

Logic

Yey, back to the basics!

You've got eq, lt and gt, all working as you'd probably expect:

> cat test.json | jql '(eq "test" "test")'
true
> cat test.json | jql '(eq "test" "test2")'
false
> cat test.json | jql '(lt "a" "b")'
true
> cat test.json | jql '(lt "b" "a")'
false
> cat test.json | jql '(gt 5 4)'
true

In case you're wondering, eq does a reflect.DeepEqual on both arguments.

You've also got and, or, not, to cover your back when tackling those primal and primitive (some would say fundamental) problems you may encounter:

> cat test.json | jql '(and true true true)'
true
> cat test.json | jql '(and true true false)'
false
> cat test.json | jql '(and true true null)'
false
> cat test.json | jql '(or true true false)'
true
> cat test.json | jql '(or)'
false
> cat test.json | jql '(and)'
true
> cat test.json | jql '(not true)'
false
> cat test.json | jql '(not false)'
true
> cat test.json | jql '(not null)'
true
> cat test.json | jql '(not (array false))'
false

and and or are both lazy.

Truthiness

This brings us to the topic of truthiness. What does and consider to be "true"? Well, it's quite simple actually.

  • null is not truthy.
  • false is not truthy.
  • anything else is truthy.

ifte

ifte sounds kinda fluffy, but unfortunately it only stands for If Then Else.

> cat test.json | jql '(ifte true "true" "false")'
"true"

It's lazy too. If it weren't, this would error:

l> cat test.json | jql '(ifte true "true" (error ":("))'
"true"

Fluffy and lazy. Like a cat. Who doesn't like cats? Who doesn't like ifte? 🐈

filter 🍰

Sometimes you want just part of the cake, the part with no <insert disliked fruit here>.

I've got no data set on cakes though, so let's get back to our beloved countries:

> cat test.json | jql '("countries" (filter (gt ("population") 50000000)))'
[
  {
    "european": false,
    "name": "United States",
    "population": 327000000
  },
  {
    "eu_since": "1993",
    "european": true,
    "name": "Germany",
    "population": 83000000
  }
]

Also, because null is not truthy, and elem returns false if it encounters missing fields or indices, you can use filter to get rid of wrong-schema data:

> cat test.json | jql '("countries" (filter ("eu_since")))'
[
  {
    "eu_since": "2004",
    "european": true,
    "name": "Poland",
    "population": 38000000
  },
  {
    "eu_since": "1993",
    "european": true,
    "name": "Germany",
    "population": 83000000
  }
]

pipe

pipe is a fairly useless function because you can just use a bash pipe. But if for some reason you want to save cpu cycles:

> cat test.json | jql '(pipe
                           ("countries")
                           ((range 2))
                           ((keys) ("name")))'
[
  "Poland",
  "United States"
]

is equal to

> cat test.json | jql '("countries")' | jql '((range 2))' | jql '((keys) ("name"))'
[
  "Poland",
  "United States"
]

recover

Finally, the one whose name shall not be spoken out loud. 👹 Needing him means you either encountered a bug in jql, or that your dataset is seriously botched.

He'll help you catch errors and panics, nullyfing the expression, leaving only void behind.

> cat test.json | jql '("countries" ((keys) (rec*** (ifte ("european") (id) (error "not european")))))'
[
  {
    "eu_since": "2004",
    "european": true,
    "name": "Poland",
    "population": 38000000
  },
  null,
  {
    "eu_since": "1993",
    "european": true,
    "name": "Germany",
    "population": 83000000
  }
]

Combine him with filter and even the void will be gone!

In practice you obviously have to spell out his name, otherwise it won't work, but that's on you!

Summary

Hope you enjoyed this incredible journey!

Moreover, I hope it's not the end of it! Hordes of JSON blobs still await and I hope jql will become your weapon of choice for dealing with them from now on! ⚔️

Issues, ⭐️ stars ⭐️ , comments, messages, reviews, benchmarks, you name it! - all are very appreciated! 😉

Type Cheatsheet

JSON: Any value
Expression[T]: (JSON -> T)
Expression can be seen as a Continuation
elem:
    With one arg: (Expression[Position]) -> (Expression[JSON]) = (elem position (id))
    With two args: (Expression[Position] x Expression[T]) -> (Expression[T])
keys: () -> (Expression[JSON])
id: () -> (Expression[JSON])
array: (Expression[T]...) -> (Expression[Array[T]]) (T's can vary)
object: ((Expression x Expression[T])...) -> (Expression[T])
pipe: (Expression...) -> (Expression)
sprintf: (Expression[String] x Expression[T]...) -> (Expression[String])
join: (Expression[T]...) -> (Expression[String])
filter: (Expression[Bool]) -> (Expression[JSON])
eq,lt,gt: (Expression x Expression) -> (Expression[Bool])
range: 
    With one arg: (Expression[Int]) -> (Expression[Array[Int]])
    With two args: (Expression[Int] x Expression[Int]) -> (Expression[Array[Int]])
and,or: (Expression[Bool]...) -> (Expression[Bool])
not: (Expression[Bool]) -> (Expression[Bool])
ifte: (Expression[Bool] x Expression[A] x Expression[B]) -> (Expression[A|B])
error: (Expression[JSON]) -> (!)
recover: (Expression[JSON]) -> (Expression[JSON])

Benchmarks

This is a fairly synthetic benchmark (it's the first that came to my mind), so I won't say jql is faster than jq. Especially since jq has various much more advanced functionalities.

What I will say though, is that jql is definitely not slow and you can freely cut your way through gigabytes of data quickly, as in the following benchmarks it was 2-3 times faster than jq.

The benchbig.json file contains 2^20 repetitions of the json we've been using in the examples so far, and the benchmedium.json file contains 20.

Command Mean [s] Min [s] Max [s] Relative
cat benchbig.json | jql '("countries" ((keys) ("name")))' >> out.json 5.923 ± 0.034 5.890 6.003 797.90 ± 52.78
cat benchbig.json | jq '.countries[].name' >> out.json 14.960 ± 0.047 14.906 15.047 2015.24 ± 132.94
cat benchmedium.json | jql '("countries" ((keys) ("name")))' >> out.json 0.007 ± 0.000 0.007 0.010 1.00
cat benchmedium.json | jq '.countries[].name' >> out.json 0.024 ± 0.001 0.022 0.032 3.23 ± 0.26

The benchmarks were run using hyperfine on a preheated (very loud) Macbook Pro 13 mid-2019 i7 2.8GHz 16GB 256GB.

You can generate the benchmark data with benchmarks/gen.sh.

Special Thanks

A big thank you to the following people for reviewing the current or previous versions of this document:

  • Jan Chomiak ( @JasiekChomiak )
  • Hugo Dutka ( @hugodutka )
  • Katarzyna Gajos
  • Andrzej Głuszak ( @agluszak )
  • Wojciech Kuźmiński ( @woojtek )
You might also like...
Simple query builder for MongoDB

🌱 greenleaf - simple, type safe and easy to use query builder for MongoDB Installation To install use: go get github.com/slavabobik/greenleaf Quick

gosq is a parsing engine for a simplicity-focused, template-based SQL query builder for Go.

gosq is a parsing engine for a simplicity-focused, template-based SQL query builder for Go.

SQL query helper

SQL query helper

Bluge, will this document match this query?

sour Will this bluge.Document match this bluge.Query? This library allows you to efficiently answer this question. s := sour.New(bluge.InMemoryOnlyCo

Tag based url Query parameters Constructor.

taqc 🚕 Tag based url Query parameters Constructor. (This is pronounced as same as "taxi") Synopsis type Query struct { Foo string `ta

qclean lets you to clean up search query in japanese.

qclean qclean lets you to clean up search query in japanese. This is mainly used to remove wasted space. Quick Start package main var cleaner *qclean

Querydecoder - Optional query parameter decoder for Golang

Optional query parameter decoder for Golang Example import ( "github.com/ritwic

Write your SQL queries in raw files with all benefits of modern IDEs, use them in an easy way inside your application with all the profit of compile time constants

About qry is a general purpose library for storing your raw database queries in .sql files with all benefits of modern IDEs, instead of strings and co

An easy-use SQL builder.

EQL An easy-use SQL builder. Design We are not English native speaker, so we use Chinese to write the design documents. We plan to translate them to E

Comments
  • go get fails with official golang docker image

    go get fails with official golang docker image

    docker run --rm golang bash -c 'go version && go get -v github.com/cube2222/jql'
    go version go1.13.5 linux/amd64
    
    ... // lots of things
    
    github.com/alecthomas/chroma/lexers
    github.com/tj/go-termd
    # github.com/tj/go-termd
    src/github.com/tj/go-termd/termd.go:21:11: *blackfriday.Markdown is not a type
    
    opened by avishayp 2
  • Remove useless outer parens

    Remove useless outer parens

    Consider removing the redundant outer parens since all your examples have it. Why can't it just be imputed? Why do I have to remember to type two extra characters that convey zero extra information?

    opened by impredicative 4
Releases(v0.2.0)
Owner
Jacob Martin
LinkedIn: www.linkedin.com/in/jakubmartin
Jacob Martin
Go database query builder library for PostgreSQL

buildsqlx Go Database query builder library Installation Selects, Ordering, Limit & Offset GroupBy / Having Where, AndWhere, OrWhere clauses WhereIn /

Arthur Kushman 88 Dec 23, 2022
SQL builder and query library for golang

__ _ ___ __ _ _ _ / _` |/ _ \ / _` | | | | | (_| | (_) | (_| | |_| | \__, |\___/ \__, |\__,_| |___/ |_| goqu is an expressive SQL bu

Doug Martin 1.8k Dec 30, 2022
SQL query builder for Go

GoSQL Query builder with some handy utility functions. Documentation For full documentation see the pkg.go.dev or GitBook. Examples // Open database a

Travis Harmon 27 Dec 12, 2022
Type safe SQL builder with code generation and automatic query result data mapping

Jet Jet is a complete solution for efficient and high performance database access, consisting of type-safe SQL builder with code generation and automa

null 940 Jan 6, 2023
A Go (golang) package that enhances the standard database/sql package by providing powerful data retrieval methods as well as DB-agnostic query building capabilities.

ozzo-dbx Summary Description Requirements Installation Supported Databases Getting Started Connecting to Database Executing Queries Binding Parameters

Ozzo Framework 589 Dec 31, 2022
Type safe SQL query builder and struct mapper for Go

sq (Structured Query) ?? ?? sq is a code-generated, type safe query builder and struct mapper for Go. ?? ?? Documentation • Reference • Examples This

null 185 Dec 19, 2022
Fast SQL query builder for Go

sqlf A fast SQL query builder for Go. sqlf statement builder provides a way to: Combine SQL statements from fragments of raw SQL and arguments that ma

Vlad Glushchuk 92 Dec 23, 2022
Query AWS Athena and download the result as CSV.

Overview This tool can download an Athena SQL query results in CSV format. Installation Using Homebrew: $ brew tap flowerinthenight/tap $ brew install

null 5 Nov 11, 2021
Bulk query SQLite database over the network

SQLiteQueryServer Bulk query SQLite database over the network. Way faster than SQLiteProxy!

Assaf Morami 48 May 20, 2022
Query git repositories with SQL. Generate reports, perform status checks, analyze codebases. 🔍 📊

askgit askgit is a command-line tool for running SQL queries on git repositories. It's meant for ad-hoc querying of git repositories on disk through a

Augmentable 3.2k Jan 5, 2023