Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

Overview

What is Miller?

Miller is like awk, sed, cut, join, and sort for data formats such as CSV, TSV, JSON, JSON Lines, and positionally-indexed data.

What can Miller do for me?

With Miller, you get to use named fields without needing to count positional indices, using familiar formats such as CSV, TSV, JSON, JSON Lines, and positionally-indexed data. Then, on the fly, you can add new fields which are functions of existing fields, drop fields, sort, aggregate statistically, pretty-print, and more.

cover-art

  • Miller operates on key-value-pair data while the familiar Unix tools operate on integer-indexed fields: if the natural data structure for the latter is the array, then Miller's natural data structure is the insertion-ordered hash map.

  • Miller handles a variety of data formats, including but not limited to the familiar CSV, TSV, and JSON/JSON Lines. (Miller can handle positionally-indexed data too!)

In the above image you can see how Miller embraces the common themes of key-value-pair data in a variety of data formats.
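As a rough illustration of the array-versus-map contrast described above, here is a small sketch using only Python's standard library. It is not Miller itself, and the sample data and field names are invented for this example. The same CSV row is read once positionally, as a list, and once by name, as an insertion-ordered map:

```python
import csv
import io

# Hypothetical sample data, invented for illustration.
CSV_TEXT = "color,shape,quantity\nred,square,3\nblue,circle,5\n"

# Positional view: each row is an array, so fields are found by counting.
positional_rows = list(csv.reader(io.StringIO(CSV_TEXT)))
header, first_row = positional_rows[0], positional_rows[1]
quantity_by_index = first_row[2]

# Key-value view: each record is an insertion-ordered map keyed by field name.
records = list(csv.DictReader(io.StringIO(CSV_TEXT)))
quantity_by_name = records[0]["quantity"]

# A new field is appended after the existing ones; dropping a field
# needs no index arithmetic on the remaining fields.
for record in records:
    record["doubled"] = int(record["quantity"]) * 2
    del record["shape"]

print(list(records[0].keys()))
```

With the name-indexed view, reordering or inserting columns in the input does not change how `quantity` is looked up.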

Getting started

More documentation links

Installing

There's a good chance you can get Miller pre-built for your system:

Ubuntu Ubuntu 16.04 LTS Fedora Debian Gentoo

Pro-Linux Arch Linux

NetBSD FreeBSD

Anaconda Homebrew/MacOSX MacPorts/MacOSX Chocolatey

OS       Installation command
Linux    yum install miller
         apt-get install miller
Mac      brew install miller
         port install miller
Windows  choco install miller

See also README-versions.md for a full list of package versions. Note that long-term-support (LTS) releases will likely be on older versions.

See also building from source.

Building from source

  • With make:
    • To build: make. This takes just a few seconds and produces the Miller executable, which is ./mlr (or .\mlr.exe on Windows).
    • To run tests: make check.
    • To install: make install. This installs the executable /usr/local/bin/mlr and manual page /usr/local/share/man/man1/mlr.1 (so you can do man mlr).
    • You can do ./configure --prefix=/some/install/path before make install if you want to install somewhere other than /usr/local.
  • Without make:
    • To build: go build github.com/johnkerl/miller/cmd/mlr.
    • To run tests: go test github.com/johnkerl/miller/internal/pkg/... and mlr regtest.
    • To install: go install github.com/johnkerl/miller/cmd/mlr, which installs to $GOPATH/bin/mlr.
  • See also the doc page on building from source.
  • For more developer information please see README-go-port.md.

License

License: BSD2

Features

  • Miller is multi-purpose: it's useful for data cleaning, data reduction, statistical reporting, devops, system administration, log-file processing, format conversion, and database-query post-processing.

  • You can use Miller to snarf and munge log-file data, including selecting out relevant substreams, then produce CSV format and load that into all-in-memory/data-frame utilities for further statistical and/or graphical processing.

  • Miller complements data-analysis tools such as R, pandas, etc.: you can use Miller to clean and prepare your data. While you can do basic statistics entirely in Miller, its streaming-data feature and single-pass algorithms enable you to reduce very large data sets.

  • Miller complements SQL databases: you can slice, dice, and reformat data on the client side on its way into or out of a database. You can also reap some of the benefits of databases for quick, setup-free one-off tasks when you just need to query some data in disk files in a hurry.

  • Miller also goes beyond the classic Unix tools by stepping fully into our modern, NoSQL world: its essential record-heterogeneity property allows Miller to operate on data where records with different schemas (field names) are interleaved.

  • Miller is streaming: most operations need only a single record in memory at a time, rather than ingesting all input before producing any output. For those operations which require deeper retention (sort, tac, stats1), Miller retains only as much data as needed. This means that whenever functionally possible, you can operate on files which are larger than your system’s available RAM, and you can use Miller in tail -f contexts.

  • Miller is pipe-friendly and interoperates with the Unix toolkit.

  • Miller's I/O formats include tabular pretty-printing, positionally indexed (Unix-toolkit style), CSV, TSV, JSON, JSON Lines, and others.

  • Miller converts between formats.

  • Miller's processing is format-aware: e.g. CSV sort and tac keep header lines first.

  • Miller has high-throughput performance on par with the Unix toolkit.

  • Miller is written in portable, modern Go, with zero runtime dependencies. You can download or compile a single binary, scp it to a faraway machine, and expect it to work.
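The streaming and record-heterogeneity properties in the list above can be sketched in a few lines of Python. This is an illustration of the general technique, not Miller's implementation; the function names, field names, and sample input are all hypothetical. Records are yielded one at a time, records with a different schema are skipped rather than rejected, and the per-group mean keeps only a count and a sum per group:

```python
import io
import json
from typing import Dict, Iterable, Iterator

# Hypothetical JSON Lines input with interleaved schemas, invented for illustration.
JSONL_TEXT = """\
{"host": "a", "latency_ms": 10}
{"event": "restart", "host": "b"}
{"host": "a", "latency_ms": 30}
{"host": "b", "latency_ms": 20}
"""


def read_records(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one record at a time, so only the current record is held in memory."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def mean_by(records: Iterable[Dict], value_field: str, group_field: str) -> Dict[str, float]:
    """Single-pass mean per group; retains only a count and a sum per group."""
    counts: Dict[str, int] = {}
    sums: Dict[str, float] = {}
    for record in records:
        # Records lacking either field are skipped rather than treated as errors.
        if value_field not in record or group_field not in record:
            continue
        key = record[group_field]
        counts[key] = counts.get(key, 0) + 1
        sums[key] = sums.get(key, 0.0) + record[value_field]
    return {key: sums[key] / counts[key] for key in counts}


means = mean_by(read_records(io.StringIO(JSONL_TEXT)), "latency_ms", "host")
print(means)
```

Because the reader is a generator, the same loop works on a file handle or a pipe without loading the whole input first.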

What people are saying about Miller

Today I discovered Miller—it's like jq but for CSV: https://t.co/pn5Ni241KM

Also, "Miller complements data-analysis tools such as R, pandas, etc.: you can use Miller to clean and prepare your data." @GreatBlueC @nfmcclure

— Adrien Trouillaud (@adrienjt) September 24, 2020

Underappreciated swiss-army command-line chainsaw.

"Miller is like awk, sed, cut, join, and sort for [...] CSV, TSV, and [...] JSON." https://t.co/TrQqSUK3KK

— Dirk Eddelbuettel (@eddelbuettel) February 28, 2017

Miller looks like a great command line tool for working with CSV data. Sed, awk, cut, join all rolled into one: http://t.co/9BBb6VCZ6Y

— Mike Loukides (@mikeloukides) August 16, 2015

Miller is like sed, awk, cut, join, and sort for name-indexed data such as CSV: http://t.co/1zPbfg6B2W - handy tool!

— Ilya Grigorik (@igrigorik) August 22, 2015

Btw, I think Miller is the best CLI tool to deal with CSV. I used to use this when I need to preprocess too big CSVs to load into R (now we have vroom, so such cases might be rare, though...)https://t.co/kUjrSSGJoT

— Hiroaki Yutani (@yutannihilat_en) April 21, 2020

Miller: a *format-aware* data munging tool By @__jo_ker__ to overcome limitations with *line-aware* workshorses like awk, sed et al https://t.co/LCyPkhYvt9

The project website is a fantastic example of good software documentation!!

— Donny Daniel (@dnnydnl) September 9, 2018

Holy holly data swiss army knife batman! How did no one suggest Miller https://t.co/JGQpmRAZLv for solving database cleaning / ETL issues to me before

Congrats to @__jo_ker__ for amazingly intuitive tool for critical data management tasks!#DataScienceandLaw #ComputationalLaw

— James Miller (@japanlawprof) June 12, 2018

🤯 @__jo_ker__'s Miller easily reads, transforms, + writes all sorts of tabular data. It's standalone, fast, and built for streaming data (operating on one line at a time, so you can work on files larger than memory).

And the docs are dream. I've been reading them all morning! https://t.co/Be2pGPZK6t

— Benjamin Wolfe (he/him) (@BenjaminWolfe) September 9, 2021

Contributors

Thanks to all the fine people who help make Miller better (emoji key):


Andrea Borruso

🤔 🎨

Shaun Jackman

🤔

Fred Trotter

🤔 🎨

komosa

🤔

jungle-boogie

🤔

Thomas Klausner

🚇

Stephen Kitt

📦

Leah Neukirchen

🤔

Luigi Baldoni

📦

Hiroaki Yutani

🤔

Daniel M. Drucker

🤔

Nikos Alexandris

🤔

kundeng

📦

Victor Sergienko

📦

Adrian Ho

🎨

zachp

📦

David Selassie

🤔

Joel Parker Henderson

🤔

Michel Ace

🤔

Matus Goljer

🤔

Richard Patel

📦

Jakub Podlaha

🎨

Miodrag Milić

📦

Derek Mahar

🤔

spmundi

🤔

Peter Körner

🛡️

rubyFeedback

🤔

rbolsius

📦

awildturtok

🤔

agguser

🤔

jganong

🤔

Fulvio Scapin

🤔

Jordan Torbiak

🤔

Andreas Weber

🤔

vapniks

📦

Zombo

📦

Brian Fulton-Howard

📦

ChCyrill

🤔

Jauder Ho

💻

Paweł Sacawa

🐛

schragge

📖

Jordi

📖 🤔

This project follows the all-contributors specification. Contributions of any kind are welcome!

Issues
  • Golang port / Miller 6 tracking issue

    Split out from https://github.com/johnkerl/miller/issues/369. See also https://github.com/johnkerl/miller/blob/master/go/README.md.

    Pre-release/rough-draft docs are at http://johnkerl.org/miller6.

    Things which may change:

    As noted in go/README.md, I want to preserve as much user experience as possible. That said:

    • --jvstack and --jsonx will still be supported as command-line flags, but JSON output will be pretty-printed (like --jvstack) by default.
    • --csvlite will still be different from --csv, as detailed below.
    • emitf and emitp were invented before I had for-loops in the DSL. If people really want to keep these and are using these, I can keep them; but maybe we're better off leaving them behind. Please let me know.
    • LF vs CR/LF (line endings) will be platform-appropriate using Go's own portability -- Windows files will be written correctly on Windows, and likewise for Linux and macOS. That said, I don't know whether we still need to preserve CR/LF-to-CR/LF even on Linux (line endings which are non-standard for the platform) -- again, please let me know.
    • mlr put -S and mlr put -F will become unnecessary since string-conversion will be done just-in-time as suggested by @gromgit on https://github.com/johnkerl/miller/issues/151.

    Please include here any thoughts you have on the Go port.

    help wanted active go-port 
    opened by johnkerl 76
  • make failing

    Hello,

    On a gnu/linux system, make is failing with a simple make:

    make: *** No targets specified and no makefile found.  Stop.
    
    cc --version
    gcc-4.8.real (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4
    Copyright (C) 2013 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    

    Is there a new method to make mlr now?

    opened by jungle-boogie 54
  • Addition of a build system generator

    opened by elfring 36
  • filter then put (regex)

    I'd like to do some kind of regex based parsing, like that:

    mlr filter '$FIELD =~ "([A-Z]+)([0-9]+)" ' then put '$F1  = "\1"; $F2 = "\2" '
    

    how can I do that? I've succeeded with sub() like this, but it's not optimal :

    mlr filter '$FIELD =~ "([A-Z]+)([0-9]+)" ' then put '$F1  = sub($FIELD, "([A-Z]+)([0-9]+)", "\1") '
    

    is there a shorter way?

    opened by gregfr 33
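The intent of the request above, filtering on a regex and then assigning its capture groups to new fields, can be sketched outside Miller in Python. This illustrates the logic only and is not Miller DSL syntax; the function name and sample records are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator

# Same pattern as in the question: a run of capitals followed by a run of digits.
PATTERN = re.compile(r"([A-Z]+)([0-9]+)")


def filter_then_put(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Keep records whose FIELD matches, and store the two captures as F1 and F2."""
    for record in records:
        match = PATTERN.search(record.get("FIELD", ""))
        if match is None:
            continue  # corresponds to the filter step
        # Copy so the caller's record is not modified in place.
        out = dict(record)
        out["F1"], out["F2"] = match.group(1), match.group(2)
        yield out


sample = [{"FIELD": "ABC123"}, {"FIELD": "nomatch"}, {"FIELD": "x-QZ7"}]
result = list(filter_then_put(sample))
print(result)
```

Matching once and reusing the captures avoids repeating the pattern in a second substitution call, which is the redundancy the question is trying to remove.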
  • supporting double quotes

    I love the idea of Miller. It is clearly a needed tool that is missing from the standard unix toolbox.

    However, you really cannot say you have a tool that is designed to support csv, without supporting csv.

    CSV is a standard file format, and has an RFC: https://tools.ietf.org/html/rfc4180

    Not supporting double quotes is the same thing as saying that you do not support csv, since double quotes are central to the way that the standard handles other characters... comma being just one example. Your tool is young enough that supporting the standard now will make later development much simpler. This will prevent the situation years from now where you have a 'normal mode' and a 'standards mode'. If you make the change now you can just have the one correct mode.

    You have an ambitious work-list, but I would suggest taking a pause and thinking about how you will support the RFC version of the file format.

    People like me (open data advocates) spend a lot of time trying to ensure that organizations that release csv do so under the standard format, rather than releasing unparsable garbage. Having a library like yours that supported the standard too would be a huge boon. I could say things like:

    "See by using the RFC for your data output, all kinds of open tools will work out of the box on your data... like Miller (link)"

    Thank you for working on such a clever tool...

    Regards, -FT

    opened by ftrotter 33
  • Documentation of flatten and split*/join*

    More documentation details. The flatten function does not have a complete description. It explains what it does, but it does not explain what the arguments are. You have to look at and understand the examples to grasp the meaning. The usage should be clear from the description itself.

    Similarly, for join* and split*, the different arguments are not clear without looking at the examples. Some assumptions could be made, such as that the parameters for joink are the array/map keys and the string to use to make the join. But what about joinkv? Which parameter is the separator of key and value and which parameter is the separator of records? In fact, the current implementation is counterintuitive: joink and joinv use the second argument for the record separator, while joinkv uses the third argument for that and the second argument for the key-value separator.

    pending feedback to close active needs-documentation 
    opened by Poshi 29
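The argument order described in the issue above can be made concrete with Python stand-ins. These are not Miller's implementations; they are hypothetical helpers that only mirror the argument positions as the issue describes them (record separator second for joink and joinv, key-value separator second and record separator third for joinkv).

```python
from typing import Dict


def joink(m: Dict[str, object], record_sep: str) -> str:
    """Join the map's keys, using the second argument as the record separator."""
    return record_sep.join(str(k) for k in m)


def joinv(m: Dict[str, object], record_sep: str) -> str:
    """Join the map's values, using the second argument as the record separator."""
    return record_sep.join(str(v) for v in m.values())


def joinkv(m: Dict[str, object], pair_sep: str, record_sep: str) -> str:
    """Join key-value pairs: second argument separates key from value,
    third argument separates one pair from the next."""
    return record_sep.join(f"{k}{pair_sep}{v}" for k, v in m.items())


example = {"a": 1, "b": 2, "c": 3}
print(joink(example, ","))
print(joinv(example, ","))
print(joinkv(example, "=", ","))
```

Naming the parameters by role (`pair_sep`, `record_sep`) is one way documentation could remove the ambiguity the issue points out.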
  • Conda build fails with "undefined reference to `mlr_dsl_ParseTrace'"

    When I compile the latest version (5.10.2) without using anaconda, it successfully compiles and runs the unit tests, but when I try and build it as an anaconda package, I get the following error while it runs make:

    /bin/sh ../libtool  --tag=CC   --mode=link $BUILD_PREFIX/bin/x86_64-conda-linux-gnu-cc -Wall -std=gnu99 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem $PREFIX/include -fdebug-prefix-map=$SRC_DIR=/usr/local/src/conda/miller-5.10.2 -fdebug-prefix-map=$PREFIX=/usr/local/src/conda-prefix -static -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,$PREFIX/lib -Wl,-rpath-link,$PREFIX/lib -L$PREFIX/lib -o mlr mlrmain.o cli/libcli.la containers/libcontainers.la stream/libstream.la input/libinput.la dsl/libdsl.la mapping/libmapping.la output/liboutput.la lib/libmlr.la parsing/libdsl.la auxents/libauxents.la -lm
    libtool: link: $BUILD_PREFIX/bin/x86_64-conda-linux-gnu-cc -Wall -std=gnu99 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem $PREFIX/include -fdebug-prefix-map=$SRC_DIR=/usr/local/src/conda/miller-5.10.2 -fdebug-prefix-map=$PREFIX=/usr/local/src/conda-prefix -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z -Wl,relro -Wl,-z -Wl,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath -Wl,$PREFIX/lib -Wl,-rpath-link -Wl,$PREFIX/lib -o mlr mlrmain.o  -L$PREFIX/lib cli/.libs/libcli.a containers/.libs/libcontainers.a stream/.libs/libstream.a input/.libs/libinput.a dsl/.libs/libdsl.a mapping/.libs/libmapping.a output/.libs/liboutput.a lib/.libs/libmlr.a parsing/.libs/libdsl.a auxents/.libs/libauxents.a -lm
    /sc/arion/work/fultob01/conda/envs/py3.9/conda-bld/miller_1636495473849/_build_env/bin/../lib/gcc/x86_64-conda-linux-gnu/9.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: parsing/.libs/libdsl.a(mlr_dsl_wrapper.o): in function `mlr_dsl_parse':
    mlr_dsl_wrapper.c:(.text.mlr_dsl_parse+0x103): undefined reference to `mlr_dsl_ParseTrace'
    

    My build script is like so:

    #!/bin/sh
    
    ./configure --prefix=$PREFIX
    make
    make check
    make install
    

    I have made the gcc toolchain, make and flex available. It also fails when I install gcc, make and flex using Anaconda then make manually.

    Do you have any idea what's going on?

    opened by BEFH 26
  • W32/X64 release please

    Wow, Miller seems like a great command-line tool!

    I would love to use it, but there doesn't seem to be a Windows version yet... Could you make/compile one?

    duplicate 
    opened by MatrixView 24
  • Alpine Linux package

    I found miller to be very useful with SRE tasks and often use it in Docker containers.

    Sadly, there doesn't seem to be a package for Alpine available, a distribution very popular with Docker because of its small footprint. So for now I'm stuck with the large debian and ubuntu images.

    Let's fix this: https://wiki.alpinelinux.org/wiki/Creating_an_Alpine_package

    opened by terorie 22
  • clang support? / freebsd support?

    Hello,

    How would I go about compiling your program with clang?

    clang -v
    FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
    Target: i386-unknown-freebsd10.2
    Thread model: posix
    Selected GCC installation:
    

    Thanks, Sean

    opened by jungle-boogie 22
  • Discussion forum

    opened by derekmahar 21
  • Needing something more than `system`

    I may be missing it, but I'm feeling that just having system() for external calls is too limiting.

    In Node we have child_process.spawn(command[, args][, options]) I know Rust has something similar too.

    In GoLang (which I'm not familiar with) I found a struct that looks like a similar set of options.

    Maybe all those options are overkill (particularly things like managing STDIN), but I have an HTML file, with items extracted using pup and put into JSON via jq so they're still HTML encoded... I have a command-line program for decoding HTML entities but I have no sane way to call it because, well, perhaps there is anything in my data...

    Current thinking is a function like exec("htmlentities", ["decode"], { stdin_value: $the_field_i_want_to_decode }) which has the API exec(cmd, args, options)

    I might even write it myself if you think it's a good idea and I can get golang going!

    opened by forbesmyester 1
  • Use '+' as alternative for 'then'?

    Only recently discovered miller and wow does it fill a gap! It's now the monkey wrench in my toolbox, giving well-worn grep, sed, cut and awk a rest. Thanks so much for conceiving and developing this!

    My suggestion would be to offer + as an alternative for then. The plus symbol has no meaning for shells, conveys some of the 'pipe' or 'then' meaning, and above all provides a clear visual cue of separating two sides of a pipeline - including suggesting (correctly) that it has lowest binding priority.

    I'm well aware that this comes down to personal taste, so feel free to drop the issue.

    opened by zwets 6
  • segfault on joining files with comments and unexpected behavior when joining more than 2 files

    When I run the join verb on files with header comments, it segfaults like so:

    with two files

    # something
    foo	bar	a	b	c
    1	a	1	2	3
    2	b	3	2	1
    

    and

    # something
    foo	bar	d	e	f
    1	a	4	5	6
    2	b	6	5	4
    

    The following happens:

    $ mlr --tsv --pass-comments join -j 'foo,bar' -f ex1_com.tsv ex2_com.tsv
    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x670fa3]
    
    goroutine 21 [running]:
    github.com/johnkerl/miller/internal/pkg/mlrval.(*Mlrmap).findEntry(...)
            /home/conda/feedstock_root/build_artifacts/miller_1647792828362/work/internal/pkg/mlrval/mlrmap_accessors.go:196
    github.com/johnkerl/miller/internal/pkg/mlrval.(*Mlrmap).GetSelectedValuesAndJoined(0x0, {0xc0000d6840, 0x2, 0x0})
            /home/conda/feedstock_root/build_artifacts/miller_1647792828362/work/internal/pkg/mlrval/mlrmap_accessors.go:591 +0x103
    github.com/johnkerl/miller/internal/pkg/transformers.(*TransformerJoin).ingestLeftFile(0xc0000d83c0)
            /home/conda/feedstock_root/build_artifacts/miller_1647792828362/work/internal/pkg/transformers/join.go:531 +0x578
    github.com/johnkerl/miller/internal/pkg/transformers.(*TransformerJoin).transformHalfStreaming(0xc0000d83c0, 0xc00007c140, 0xc000200000, 0x28, 0xc000150070)
            /home/conda/feedstock_root/build_artifacts/miller_1647792828362/work/internal/pkg/transformers/join.go:387 +0x32
    github.com/johnkerl/miller/internal/pkg/transformers.(*TransformerJoin).Transform(0xc0000d83c0, 0xc0000bab40, 0x0, 0x0, 0x0)
            /home/conda/feedstock_root/build_artifacts/miller_1647792828362/work/internal/pkg/transformers/join.go:368 +0x5d
    github.com/johnkerl/miller/internal/pkg/transformers.runSingleTransformerBatch(0xc00007e060, {0x90d760, 0xc0000d83c0}, 0x1, 0x0, 0x0, 0x0, 0xc000083860)
            /home/conda/feedstock_root/build_artifacts/miller_1647792828362/work/internal/pkg/transformers/aaa_chain_transformer.go:269 +0x2ce
    github.com/johnkerl/miller/internal/pkg/transformers.runSingleTransformer({0x90d760, 0xc0000d83c0}, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
            /home/conda/feedstock_root/build_artifacts/miller_1647792828362/work/internal/pkg/transformers/aaa_chain_transformer.go:212 +0x92
    created by github.com/johnkerl/miller/internal/pkg/transformers.ChainTransformer
            /home/conda/feedstock_root/build_artifacts/miller_1647792828362/work/internal/pkg/transformers/aaa_chain_transformer.go:187 +0x2ef
    

    This does not occur with the same command if the comment is removed in the input files.

    I also identified an issue with multiple input files being concatenated instead of joined.

    With a third file like this:

    foo	bar	g	h	i
    1	a	7	8	9
    2	b	9	8	7
    

    The output is

    $ mlr --tsv --pass-comments join -j 'foo,bar' -f ex1.tsv ex2.tsv ex3.tsv
    foo     bar     a       b       c       d       e       f
    1       a       1       2       3       4       5       6
    2       b       3       2       1       6       5       4
    
    foo     bar     a       b       c       g       h       i
    1       a       1       2       3       7       8       9
    2       b       3       2       1       9       8       7
    

    I can actually work around the segfault issue by removing comments, but the multi-file merge is problematic.

    opened by BEFH 4
  • Miller 6: about tsv and carriage return inside cells

    Hi, if I run mlr --tsv cat tmp.tsv on this tsv

    I have

    mlr :  mlr: TSV header/data length mismatch 2 != 1 at filename tmp.tsv line  2.
    

    If I use mlr 5 I have no error and it interprets the file properly. Shouldn't version 6 handle it well too?

    In 6 I must set it as csv, running mlr --csv --fs "\t" cat tmp.tsv.

    opened by aborruso 0
  • Reshape not acting as (I) expected

    Let's say I run the following command:

    mlr --pprint reshape -s item,value << EOF
        time       item value
        2009-01-01 X    0.65473572
        2009-01-01 Y    2.4520609
        2009-01-02 X    -0.89248112
        2009-01-03 X    0.98012375
        2009-01-03 Y    1.3179287
    EOF
    

    I would expect to get out something like:

        time       X           Y
        2009-01-01 0.65473572  2.4520609
        2009-01-02 -0.89248112 (error)
        2009-01-03 0.98012375  1.3179287
    

    Instead, I get:

     time       X          Y
     2009-01-01 0.65473572 2.4520609
    
     time       X
     2009-01-02 -0.89248112
    
     time       X          Y
     2009-01-03 0.98012375 1.3179287
    

    Two questions:

    1. Is the latter output (the actual output) expected, or is this a bug?
    2. Is there a way to get output as I expected (one table with empty values handled somehow)? Ideally, I'd like to supply it with some fill value (-99999).

    Thanks!

    bug active 
    opened by holmescharles 2
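The output the issue above asks for, a single long-to-wide table with a fill value for missing cells, can be sketched in Python as follows. This is not how Miller's reshape verb is implemented; the function name and parameters are hypothetical, and unlike a streaming operation it retains all rows until the input ends, since the full set of columns is only known then.

```python
from typing import Dict, Iterable, List, Tuple


def long_to_wide(
    records: Iterable[Dict[str, str]],
    key_field: str,
    value_field: str,
    fill: str,
) -> List[Dict[str, str]]:
    """Pivot long records to wide ones, filling absent cells with `fill`."""
    cells_by_group: Dict[Tuple, Dict[str, str]] = {}
    columns: List[str] = []  # new column names, in first-seen order
    for record in records:
        # The remaining fields (here: time) identify the output row.
        group = tuple((k, v) for k, v in record.items() if k not in (key_field, value_field))
        cells_by_group.setdefault(group, {})[record[key_field]] = record[value_field]
        if record[key_field] not in columns:
            columns.append(record[key_field])
    wide = []
    for group, cells in cells_by_group.items():
        row = dict(group)
        for column in columns:
            # Every row gets every column, so the output has one schema.
            row[column] = cells.get(column, fill)
        wide.append(row)
    return wide


# The data from the issue, as long-format records.
long_records = [
    {"time": "2009-01-01", "item": "X", "value": "0.65473572"},
    {"time": "2009-01-01", "item": "Y", "value": "2.4520609"},
    {"time": "2009-01-02", "item": "X", "value": "-0.89248112"},
    {"time": "2009-01-03", "item": "X", "value": "0.98012375"},
    {"time": "2009-01-03", "item": "Y", "value": "1.3179287"},
]
wide = long_to_wide(long_records, "item", "value", "-99999")
for row in wide:
    print(row)
```

Because every output row carries the same keys, a pretty-printer would emit one table rather than starting a new block at each schema change.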
  • Can you define an "empty data" representation?

    I have a pprint formatted file where I represent NULL or EMPTY data with the string "-99999". In python if I want to treat that string as null while I read the file, I use the command pandas.read_csv(FILENAME, na_values=-99999). Is there a way to do this with miller?

    opened by holmescharles 3
Releases(v6.2.0)
  • v6.2.0(Mar 19, 2022)

    Overview

    The primary purpose of this release is to restore --tsvlite which, on its own, would merit a 6.1.1 bugfix release. But since a couple of other new features are present as well, this is a 6.2.0 minor release.

    All the "Plans for 6.2.0" listed at https://github.com/johnkerl/miller/releases/tag/v6.1.0 are still planned, but since this 6.2.0 release arrived sooner than expected, those issues are now planned for 6.3.0.

    Details

    PRs:

    • Restore --tsvlite by @johnkerl in https://github.com/johnkerl/miller/pull/984
    • Let dhms2sec accept input like "8h" by @johnkerl in https://github.com/johnkerl/miller/pull/983
    • Use fixed OFMT for multi-platform regression-testing by @johnkerl in https://github.com/johnkerl/miller/pull/988
    • Bump github.com/stretchr/testify from 1.7.0 to 1.7.1 by @dependabot in https://github.com/johnkerl/miller/pull/986
    • gssub DSL function by @johnkerl in https://github.com/johnkerl/miller/pull/989

    Full Changelog: https://github.com/johnkerl/miller/compare/v6.1.0...v6.2.0

    Source code(tar.gz)
    Source code(zip)
    miller-6.2.0-1.src.rpm(2.45 MB)
    miller-6.2.0-aix-ppc64.tar.gz(4.36 MB)
    miller-6.2.0-checksums.txt(3.01 KB)
    miller-6.2.0-freebsd-386.tar.gz(3.25 MB)
    miller-6.2.0-freebsd-amd64.tar.gz(3.59 MB)
    miller-6.2.0-freebsd-arm64.tar.gz(3.30 MB)
    miller-6.2.0-linux-386.deb(3.41 MB)
    miller-6.2.0-linux-386.rpm(3.27 MB)
    miller-6.2.0-linux-386.tar.gz(3.26 MB)
    miller-6.2.0-linux-amd64.deb(3.77 MB)
    miller-6.2.0-linux-amd64.rpm(3.57 MB)
    miller-6.2.0-linux-amd64.tar.gz(3.56 MB)
    miller-6.2.0-linux-arm64.deb(3.50 MB)
    miller-6.2.0-linux-arm64.rpm(3.32 MB)
    miller-6.2.0-linux-arm64.tar.gz(3.30 MB)
    miller-6.2.0-linux-armv6.deb(3.41 MB)
    miller-6.2.0-linux-armv6.rpm(3.28 MB)
    miller-6.2.0-linux-armv6.tar.gz(3.27 MB)
    miller-6.2.0-linux-armv7.deb(3.41 MB)
    miller-6.2.0-linux-armv7.rpm(3.27 MB)
    miller-6.2.0-linux-armv7.tar.gz(3.26 MB)
    miller-6.2.0-linux-ppc64le.deb(3.36 MB)
    miller-6.2.0-linux-ppc64le.rpm(3.17 MB)
    miller-6.2.0-linux-ppc64le.tar.gz(3.16 MB)
    miller-6.2.0-linux-riscv64.deb(3.64 MB)
    miller-6.2.0-linux-riscv64.rpm(3.44 MB)
    miller-6.2.0-linux-riscv64.tar.gz(3.43 MB)
    miller-6.2.0-linux-s390x.deb(3.79 MB)
    miller-6.2.0-linux-s390x.rpm(3.59 MB)
    miller-6.2.0-linux-s390x.tar.gz(3.54 MB)
    miller-6.2.0-macos-amd64.tar.gz(3.67 MB)
    miller-6.2.0-macos-arm64.tar.gz(4.15 MB)
    miller-6.2.0-windows-386.zip(3.49 MB)
    miller-6.2.0-windows-amd64.zip(3.70 MB)
    miller-6.2.0.tar.gz(2.69 MB)
  • v6.1.0(Mar 7, 2022)

    Please see:

    • https://miller.readthedocs.io/en/latest/ for more about Miller
    • https://miller.readthedocs.io/en/latest/installing-miller/ for installation

    Features

    Major features:

    • Natural sort by @johnkerl in https://github.com/johnkerl/miller/pull/932
    • mlr split verb by @johnkerl in https://github.com/johnkerl/miller/pull/898
    • Make TSV finally true TSV by @johnkerl in https://github.com/johnkerl/miller/pull/923
    • Sliding window averages by @johnkerl in https://github.com/johnkerl/miller/pull/894
    • Implement shift-lead option for mlr step by @johnkerl in https://github.com/johnkerl/miller/pull/893

    New DSL functions:

    • New fmtifnum DSL function; make fmtnum/fmtifnum recursive over maps and arrays by @johnkerl in https://github.com/johnkerl/miller/pull/946
    • New unformat DSL function by @johnkerl in https://github.com/johnkerl/miller/pull/871
    • New format DSL function by @johnkerl in https://github.com/johnkerl/miller/pull/869
    • New concat DSL function for arrays by @johnkerl in https://github.com/johnkerl/miller/pull/868

    DSL improvements:

    • Support more Go regex patterns, like "\d" by @johnkerl in https://github.com/johnkerl/miller/pull/974
    • Include \U support in addition to \u for DSL Unicode string literals by @johnkerl in https://github.com/johnkerl/miller/pull/917
    • Support unicode literals in the Miller DSL by @johnkerl in https://github.com/johnkerl/miller/pull/916
    • Allow 0o... octal literals in the DSL by @johnkerl in https://github.com/johnkerl/miller/pull/864

    New command-line flags:

    • Add --left-keep-fields option for mlr join by @johnkerl in https://github.com/johnkerl/miller/pull/967
    • New --lazy-quotes flag for helping with malformed CSV by @johnkerl in https://github.com/johnkerl/miller/pull/925

    REPL and on-line help:

    • Let :resetblocks/:rb in the REPL take optional begin/main/end by @johnkerl in https://github.com/johnkerl/miller/pull/924
    • Add :resetblocks / :rb to REPL by @johnkerl in https://github.com/johnkerl/miller/pull/920
    • ?foo and ??foo for :help foo / :help find foo in the REPL by @johnkerl in https://github.com/johnkerl/miller/pull/915

    Improvements and bugfixes

    • Support Latin-1 supplement a0-ff as DSL string literals by @johnkerl in https://github.com/johnkerl/miller/pull/957
    • Fix "%%" in strptime; more test cases for strptime by @johnkerl in https://github.com/johnkerl/miller/pull/951
    • Support %F, %T, and more in strptime by @johnkerl in https://github.com/johnkerl/miller/pull/944
    • Fix handling of mlr nest abbrevs by @johnkerl in https://github.com/johnkerl/miller/pull/937
    • Add Inf and NaN literals to the DSL by @johnkerl in https://github.com/johnkerl/miller/pull/933
    • Boolean inference for issue 908 by @johnkerl in https://github.com/johnkerl/miller/pull/931
    • strptime %j format for 3-digit day in year by @johnkerl in https://github.com/johnkerl/miller/pull/930
    • Fix is_non_empty for absent case by @johnkerl in https://github.com/johnkerl/miller/pull/928
    • --nidx --fs x should be the same as --fs x --nidx by @johnkerl in https://github.com/johnkerl/miller/pull/912
    • Update default colorization by @johnkerl in https://github.com/johnkerl/miller/pull/904
    • Make is_null/is_not_null DSL functions include new JSON-null type by @johnkerl in https://github.com/johnkerl/miller/pull/883
    • Fix #853 by @johnkerl in https://github.com/johnkerl/miller/pull/860

    Documentation

    • New doc page: Parsing and formatting fields by @johnkerl in https://github.com/johnkerl/miller/pull/973
    • More doc material for :context in the REPL by @johnkerl in https://github.com/johnkerl/miller/pull/966
    • Fix typo in on-line help for splitax DSL function by @johnkerl in https://github.com/johnkerl/miller/pull/964
    • More doc-sites for the funct keyword by @johnkerl in https://github.com/johnkerl/miller/pull/963
    • Doc updates for funct keyword by @johnkerl in https://github.com/johnkerl/miller/pull/961
    • FAQ entry for #351 by @johnkerl in https://github.com/johnkerl/miller/pull/958
    • docs: add Poshi as a contributor for doc by @allcontributors in https://github.com/johnkerl/miller/pull/956
    • docs: add schragge as a contributor for doc by @allcontributors in https://github.com/johnkerl/miller/pull/955
    • FAQ entry for #285: carriage returns in field names by @johnkerl in https://github.com/johnkerl/miller/pull/953
    • Add --implicit-tsv-header as alias for --implicit-csv-header, etc by @johnkerl in https://github.com/johnkerl/miller/pull/952
    • Fix: multiple documentation tweaks by @Poshi in https://github.com/johnkerl/miller/pull/949
    • fix typo in reference-verbs.md by @zachvalenta in https://github.com/johnkerl/miller/pull/945
    • Add on mouse over permalink anchor for titles by @aborruso in https://github.com/johnkerl/miller/pull/942
    • Webdoc information on Unicode string literals by @johnkerl in https://github.com/johnkerl/miller/pull/935
    • 'mlr help function nonesuch' should not be silent by @johnkerl in https://github.com/johnkerl/miller/pull/934
    • Clarify strftime on-line help by @johnkerl in https://github.com/johnkerl/miller/pull/929
    • Expand on-line help for split* DSL functions by @johnkerl in https://github.com/johnkerl/miller/pull/927
    • On-line help for -s flag by @johnkerl in https://github.com/johnkerl/miller/pull/926
    • Multiple on-line-help issues from #908 by @johnkerl in https://github.com/johnkerl/miller/pull/921
    • Multiple on-line-help issues from #908 by @johnkerl in https://github.com/johnkerl/miller/pull/913
    • Fix operator-precedence doc table to match DSL grammar by @johnkerl in https://github.com/johnkerl/miller/pull/911
    • Fix multiple on-line-help issues from #907 by @johnkerl in https://github.com/johnkerl/miller/pull/910
    • Clarify source for printf-style formatting by @johnkerl in https://github.com/johnkerl/miller/pull/895
    • Fix #891 by @johnkerl in https://github.com/johnkerl/miller/pull/892
    • Improve mlr top documentation for #861 by @johnkerl in https://github.com/johnkerl/miller/pull/875
    • Continue #856 by @johnkerl in https://github.com/johnkerl/miller/pull/865
    • misspelling by @Gary-Armstrong in https://github.com/johnkerl/miller/pull/863
    • fix typo by @vapniks in https://github.com/johnkerl/miller/pull/862
    • Update installing-miller.md by @jauderho in https://github.com/johnkerl/miller/pull/859
    • Emit notes by @johnkerl in https://github.com/johnkerl/miller/pull/858
    • Conda/Docker install notes by @johnkerl in https://github.com/johnkerl/miller/pull/857
    • Fix typo: columnn -> column by @vapniks in https://github.com/johnkerl/miller/pull/856
    • Fix typo by @vapniks in https://github.com/johnkerl/miller/pull/855
    • Fix typo by @vapniks in https://github.com/johnkerl/miller/pull/854
    • A small typo by @aborruso in https://github.com/johnkerl/miller/pull/846

    Code quality

    • Code-dedupe logic for array slices and string slices by @johnkerl in https://github.com/johnkerl/miller/pull/960
    • Let mlr repl print empty strings by @johnkerl in https://github.com/johnkerl/miller/pull/959
    • Neaten strptime.go by @johnkerl in https://github.com/johnkerl/miller/pull/950
    • More dead code removal by @skitt in https://github.com/johnkerl/miller/pull/905
    • Remove unreachable code by @skitt in https://github.com/johnkerl/miller/pull/903
    • Use int64 wherever "64-bit integer" is assumed by @skitt in https://github.com/johnkerl/miller/pull/902
    • More of #884: types in enum-consts by @johnkerl in https://github.com/johnkerl/miller/pull/887
    • Clean up file output handler error handling by @skitt in https://github.com/johnkerl/miller/pull/886
    • Use raw strings to avoid escapes by @skitt in https://github.com/johnkerl/miller/pull/885
    • Specify constant types except with iota by @skitt in https://github.com/johnkerl/miller/pull/884
    • Mlrval arrayval from []Mlrval to []*Mlrval by @johnkerl in https://github.com/johnkerl/miller/pull/880
    • Append slices directly instead of looping by @skitt in https://github.com/johnkerl/miller/pull/879
    • Fix mlrmap.Equals FieldCount comparison by @skitt in https://github.com/johnkerl/miller/pull/878
    • Ensure regression-test has a binary to test by @skitt in https://github.com/johnkerl/miller/pull/877
    • Avoid assuming ./mlr is the mlr to test by @skitt in https://github.com/johnkerl/miller/pull/876
    • Update release.yml by @jauderho in https://github.com/johnkerl/miller/pull/867
    • Update .goreleaser.yml by @jauderho in https://github.com/johnkerl/miller/pull/866
    • Goreleaser binary names by @johnkerl in https://github.com/johnkerl/miller/pull/852
    • Add CodeQL support by @jauderho in https://github.com/johnkerl/miller/pull/838

    New Contributors

    • @vapniks made their first contribution in https://github.com/johnkerl/miller/pull/854
    • @Gary-Armstrong made their first contribution in https://github.com/johnkerl/miller/pull/863
    • @zachvalenta made their first contribution in https://github.com/johnkerl/miller/pull/945
    • @Poshi made their first contribution in https://github.com/johnkerl/miller/pull/949

    Plans for 6.2.0

    Update: planned now for 6.3.0 as 6.2.0 was quick and early.

    • Extended JSON-style field accessors for verbs: https://github.com/johnkerl/miller/issues/763 and https://github.com/johnkerl/miller/issues/948
    • AWK-like exit DSL function: https://github.com/johnkerl/miller/issues/341
    • DSL strict mode: https://github.com/johnkerl/miller/issues/440
    • YAML support: https://github.com/johnkerl/miller/issues/614
    • Datediff: https://github.com/johnkerl/miller/issues/708
    • Rank: https://github.com/johnkerl/miller/issues/383

    Full Changelog: https://github.com/johnkerl/miller/compare/v6.0.0...v6.1.0

    Source code(tar.gz)
    Source code(zip)
    miller-6.1.0-1.src.rpm(2.48 MB)
    miller-6.1.0-aix-ppc64.tar.gz(4.36 MB)
    miller-6.1.0-checksums.txt(3.01 KB)
    miller-6.1.0-freebsd-386.tar.gz(3.26 MB)
    miller-6.1.0-freebsd-amd64.tar.gz(3.59 MB)
    miller-6.1.0-freebsd-arm64.tar.gz(3.30 MB)
    miller-6.1.0-linux-386.deb(3.41 MB)
    miller-6.1.0-linux-386.rpm(3.27 MB)
    miller-6.1.0-linux-386.tar.gz(3.26 MB)
    miller-6.1.0-linux-amd64.deb(3.77 MB)
    miller-6.1.0-linux-amd64.rpm(3.58 MB)
    miller-6.1.0-linux-amd64.tar.gz(3.56 MB)
    miller-6.1.0-linux-arm64.deb(3.49 MB)
    miller-6.1.0-linux-arm64.rpm(3.32 MB)
    miller-6.1.0-linux-arm64.tar.gz(3.30 MB)
    miller-6.1.0-linux-armv6.deb(3.42 MB)
    miller-6.1.0-linux-armv6.rpm(3.28 MB)
    miller-6.1.0-linux-armv6.tar.gz(3.27 MB)
    miller-6.1.0-linux-armv7.deb(3.41 MB)
    miller-6.1.0-linux-armv7.rpm(3.27 MB)
    miller-6.1.0-linux-armv7.tar.gz(3.26 MB)
    miller-6.1.0-linux-ppc64le.deb(3.36 MB)
    miller-6.1.0-linux-ppc64le.rpm(3.17 MB)
    miller-6.1.0-linux-ppc64le.tar.gz(3.16 MB)
    miller-6.1.0-linux-riscv64.deb(3.64 MB)
    miller-6.1.0-linux-riscv64.rpm(3.44 MB)
    miller-6.1.0-linux-riscv64.tar.gz(3.43 MB)
    miller-6.1.0-linux-s390x.deb(3.79 MB)
    miller-6.1.0-linux-s390x.rpm(3.59 MB)
    miller-6.1.0-linux-s390x.tar.gz(3.54 MB)
    miller-6.1.0-macos-amd64.tar.gz(3.67 MB)
    miller-6.1.0-macos-arm64.tar.gz(4.15 MB)
    miller-6.1.0-windows-386.zip(3.49 MB)
    miller-6.1.0-windows-amd64.zip(3.71 MB)
    miller-6.1.0.tar.gz(2.71 MB)
  • v6.0.0.rc1(Jan 8, 2022)

    This is an update after https://github.com/johnkerl/miller/releases/tag/v6.0.0-beta, including several performance-optimization PRs since then.

    This is a release-candidate tag -- it doesn't include https://github.com/johnkerl/miller/issues/827 or https://github.com/johnkerl/miller/discussions/755 both of which are blockers for the Miller 6.0.0 release per se.

    The main purpose is for a Conda build by @BEFH as tracked at https://github.com/johnkerl/miller/issues/372#issuecomment-1007576714.

    After #755 and #827 are resolved we will have either 6.0.0.rc2 (if other issues arise) or simply 6.0.0.

    Source code(tar.gz)
    Source code(zip)
    mlr-macos-6.0.0.rc1.zip(6.23 MB)
    mlr-ubuntu-6.0.0.rc1.zip(6.33 MB)
    mlr-windows-6.0.0.rc1.zip(6.48 MB)
  • v6.0.0-beta(Nov 27, 2021)

    This is a beta release for the upcoming 6.0.0 release of Miller.

    Update: please see https://github.com/johnkerl/miller/releases/tag/v6.0.0.rc1.

    Status

    This is marked as a pre-release -- you can get the binaries (for Linux, Mac, and Windows) by downloading them from this release page. Meanwhile tools like brew, apt, chocolatey, etc will still give you Miller 5 until the official Miller 6.0.0 release which is forthcoming.

    Release notes

    https://miller.readthedocs.io/en/latest/new-in-miller-6

    Documentation

    Please see https://miller.readthedocs.io/en/latest

    Goals for the beta

    This is a major, exciting release with lots of features, documentation improvements, full Windows support, and more. Please comment on this page, or file an issue at https://github.com/johnkerl/miller/issues, with any and all feedback, criticism, comments, etc.

    Performance updates

    • 2021/12/01: new binaries attached to this pre-release today incorporate https://github.com/johnkerl/miller/pull/765 which is a 40% reduction in runtime for large files. Two more performance PRs are in prep.
    • 2021/12/21: new binaries attached to this pre-release today incorporate several recent performance-related PRs -- see https://github.com/johnkerl/miller/pull/786 for details.
    • 2021/12/27: new binaries attached to this pre-release today incorporate the performance-related PR https://github.com/johnkerl/miller/pull/809. See also https://miller.readthedocs.io/en/latest/new-in-miller-6/#performance-benchmarks.
    • 2021/12/30: new binaries attached to this pre-release today incorporate fixes for all currently known release-blocking issues. The mlr version output now shows 6.0.0-rc to indicate this is a release candidate. I hope to release this very soon, barring any new feedback.

    Note that the source tar file attached to this pre-release predates these performance improvements -- if you want binaries, they're current on this pre-release; if you want source, please clone HEAD.

    Update: please see https://github.com/johnkerl/miller/releases/tag/v6.0.0.rc1

    Source code(tar.gz)
    Source code(zip)
    mlr-macos-latest.zip(6.24 MB)
    mlr-ubuntu-latest.zip(6.35 MB)
    mlr-windows-latest.zip(6.50 MB)
  • v5.10.3(Nov 17, 2021)

    This release exists solely to resolve a Conda-build issue as discussed on https://github.com/johnkerl/miller/issues/740. If you're not actively working on Conda packaging for Miller, this release has no added value for you above 5.10.2.

    Likewise, there's no Windows mlr.exe for this final (technical & specific) Miller 5.x release -- for Miller 6.0.0 (coming soon!) and above there will be mlr.exe as a reliably standard part of each release.

    Also note that the tarball is named miller-5.10.3.tar.gz, in contrast to mlr-5.10.2.tar.gz and likewise for all earlier releases. This is being done for forward compatibility with Miller 6.0.0 and beyond which will use names of the form miller-6.0.0.tar.gz, as proposed in https://github.com/johnkerl/miller/issues/360.

    Source code(tar.gz)
    Source code(zip)
    miller-5.10.3-1.src.rpm(1.20 MB)
    miller-5.10.3.tar.gz(1.20 MB)
    mlr.linux_x86_64(2.52 MB)
    mlr.macosx(887.11 KB)
  • v5.10.2(Mar 24, 2021)

    Between 5.9 and 5.10, in the move of docs from https://johnkerl.org/miller/doc to https://miller.readthedocs.io/, I inadvertently made a change which kept the Miller manpage (man mlr) from being included in the distribution file.

    The sole purpose of this release is to fix that.

    If your way to access Miller versions is by downloading pre-built executables from the release page, or by building from source, this release doesn't do much for you. It's most useful for OS-specific distro-build systems, so that man mlr will again work correctly.

    Source code(tar.gz)
    Source code(zip)
    miller-5.10.2-1.src.rpm(1.21 MB)
    mlr-5.10.2.tar.gz(1.21 MB)
    mlr.linux.x86_64(3.29 MB)
    mlr.macosx(887.25 KB)
  • v5.10.1(Mar 22, 2021)

    This release fixes the following:

    • https://github.com/johnkerl/miller/issues/427
    • https://github.com/johnkerl/miller/issues/431
    • https://github.com/johnkerl/miller/issues/443

    Note: The Miller Appveyor build is again broken and I find it very frustrating to keep running. Two bits of good news: (1) I am recently in possession of a local Windows machine where I hope to produce a mlr.exe; (2) for the Go port (whenever I'm done with it), building for Windows will be a breeze with no special magic.

    Source code(tar.gz)
    Source code(zip)
    miller-5.10.1-1.src.rpm(1.17 MB)
    mlr-5.10.1.tar.gz(1.18 MB)
    mlr.linux.x86_64(3.32 MB)
    mlr.macosx(887.22 KB)
  • v5.10.0(Nov 29, 2020)

    Features

    Bugfixes

    • The count -n feature was not implemented as intended. This fulfills https://github.com/johnkerl/miller/issues/370, reported by @aborruso.
    • Pretty-print format now works correctly with --headerless-csv-output as reported on https://github.com/johnkerl/miller/issues/384, reported by @agguser.
    • The seqgen verb now correctly tracks NR and FNR in the records it emits.
    • An intermittent JSON-parsing bug reported on https://github.com/johnkerl/miller/issues/394 by @sjackman has been fixed.

    Documentation

    This is the first release since the readthedocs move as requested by @pabloab on https://github.com/johnkerl/miller/issues/375. The intention is that you will be able to select documentation specific to 5.10.0 there; I may have something to fix here.

    Go-port preview

    While the mods for this 5.10.0 release are quite minor, intense development time has been spent over the last few months on the Go port, tracked here and here, which will ultimately become Miller 6.

    The completion of the port is still some months away. While most verbs, and most of the DSL, have been ported -- with many new features in place as tracked here -- significant gaps remain. These include the "big" verbs join, nest, reshape, stats1, and stats2, along with all the date-time-related DSL functions, etc.

    Nonetheless, if you wish to experiment with the Go executables for the Miller 6 beta, please find MacOS and Linux versions attached. (I don't know how to make these for Windows yet, sorry!)

    I'd love any and all advance help with the Go port including bug reports, feature requests, etc. -- both from Miller end-users as well as developers. This is exciting and fulfilling work, and I look forward to getting it completed.

    Source code(tar.gz)
    Source code(zip)
    miller-5.10.0-1.src.rpm(1.17 MB)
    mlr-5.10.0.tar.gz(1.18 MB)
    mlr.exe(4.51 MB)
    mlr.go.linux.x86_64(24.79 MB)
    mlr.go.macosx(24.93 MB)
    mlr.linux.x86_64(3.32 MB)
    mlr.macosx(887.21 KB)
    msys-2.0.dll(3.38 MB)
  • v5.9.1(Sep 3, 2020)

    As of Miller 5.9.0, you can have a .mlrrc file containing preferred flags.

    As reported in https://github.com/johnkerl/miller/issues/363, it would be possible for someone to prepare a repository or some other zipfile/tarfile, for example, containing datasets, and send it to you. They could have a line of the form prepipe do_something_bad; cat in that repository, so when you ran any mlr commands in there, it would run the do_something_bad command (whatever that might be).

    The fix is to (a) disallow prepipe within .mlrrc files and (b) as a consolation, allow new prepipe-zcat and prepipe-gunzip options, which are safe to use.

    This is published as CVE-2020-15167. Many thanks to @koernepr for the report!
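    The shape of that fix can be sketched in Python. This is not Miller's actual implementation: the function name parse_mlrrc and the simplified "key value" / "key=value" line syntax are assumptions made for illustration. Only the rule itself (refuse prepipe, accept prepipe-zcat and prepipe-gunzip) comes from the notes above.

```python
# Hypothetical sketch of the 5.9.1 rule: an rc file may not carry an arbitrary
# prepipe command, while the fixed-purpose prepipe-zcat / prepipe-gunzip pass.
# The line syntax handled here is a simplification, not Miller's real parser.


def parse_mlrrc(text):
    """Return a list of (flag, value) pairs, dropping any 'prepipe' line."""
    flags = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        # Accept either "key=value" or "key value"; drop optional leading dashes.
        key, _, value = line.replace("=", " ", 1).partition(" ")
        key = key.lstrip("-")
        if key == "prepipe":
            # An arbitrary shell command from an untrusted directory: refuse it.
            continue
        flags.append((key, value.strip()))
    return flags


rc = """
# sample rc file as it might arrive inside someone else's repository
csv
prepipe do_something_bad; cat
prepipe-gunzip
"""
print(parse_mlrrc(rc))
```

    The sample drops the prepipe line and keeps the other two entries, so running mlr inside an untrusted directory can no longer trigger do_something_bad.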

    Source code(tar.gz)
    Source code(zip)
    miller-5.9.1-1.src.rpm(1.21 MB)
    mlr-5.9.1.tar.gz(1.21 MB)
    mlr.exe(4.85 MB)
    mlr.linux.x86_64(3.31 MB)
    mlr.macosx(886.87 KB)
    msys-2.0.dll(3.38 MB)
  • v5.9.0(Aug 19, 2020)

    • You can now save common defaults in a ~/.mlrrc. For example, if you normally process CSV files, you can say that in your ~/.mlrrc and you can leave off the --csv flag from your mlr commands. You can read more about this feature here, or in man mlr, or in mlr --help. This feature was requested in https://github.com/johnkerl/miller/issues/339.
    • The AppVeyor build is now unbroken and as a result there are Windows artifacts for this build. Sorry about the delay!! :^/
    Source code(tar.gz)
    Source code(zip)
    miller-5.9.0-1.src.rpm(1.21 MB)
    mlr-5.9.0.tar.gz(1.21 MB)
    mlr.exe(4.84 MB)
    mlr.linux.x86_64(3.31 MB)
    mlr.macosx(886.87 KB)
    msys-2.0.dll(3.38 MB)
  • v5.8.0(Aug 3, 2020)

    Features

    • The new count verb is a keystroke-saver for stats1 -a count -f {some field name}.
    • --jsonx and --ojsonx are keystroke-savers for --json --jvstack and --ojson --jvstack, which is to say, multi-line pretty-printed JSON format.
    • The new -s name=value feature for mlr put and mlr filter gives you simpler access to environment variables in your Miller script, as requested in https://github.com/johnkerl/miller/issues/315.
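    To picture the difference between single-line and vertically stacked JSON, here is a small Python analogue. It uses Python's json module as a stand-in, so the exact whitespace differs from what Miller emits, and the sample record is made up for illustration.

```python
import json

# A made-up record with one nested value.
record = {"a": 1, "b": {"x": 2, "y": 3}}

single_line = json.dumps(record)          # whole record on one line
vertical = json.dumps(record, indent=2)   # same data, stacked across lines

print(single_line)
print(vertical)
```

    Both strings carry the same data; only the layout changes.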

    Bugfixes

    • mlr format-values is no longer SEGVing on CSV/TSV input. This was reported on https://github.com/johnkerl/miller/issues/330.
    • https://github.com/johnkerl/miller/issues/313 fixes a corner case when field names within command-line arguments have embedded newlines.
    • Line/column indicators for JSON-formatting error messages are now correct (previously they were showing up as 0).
    • end {print NF} no longer SEGVs. This was reported in https://github.com/johnkerl/miller/issues/330.
    • Several broken doc links were fixed up as reported on https://github.com/johnkerl/miller/issues/329.

    Windows note

    • The AppVeyor build has been broken for a while so there is no Windows executable attached to this release -- when I fix that there will be a 5.8.1 with Windows binaries. My apologies for the delay. Issue https://github.com/johnkerl/miller/issues/354 is open to track this.
    Source code(tar.gz)
    Source code(zip)
    miller-5.8.0-1.src.rpm(1.20 MB)
    mlr-5.8.0.tar.gz(1.20 MB)
    mlr.linux.x86_64(3.29 MB)
    mlr.macosx(870.64 KB)
  • v5.7.0(Mar 17, 2020)

    Ports

    • Miller is available via MacPorts thanks to @herbygillot. Miller tracking issue is https://github.com/johnkerl/miller/pull/273.

    • An Alpine Linux port is pending this release thanks to @terorie. Miller tracking issue is https://github.com/johnkerl/miller/issues/293.

    Features

    Bugfixes

    • A bug regarding optional regex-pattern groups was fixed in https://github.com/johnkerl/miller/issues/277.
    • As of https://github.com/johnkerl/miller/issues/294 you can now specify --implicit-csv-header for the join-file in mlr join.
    • A bug with spaces in XTAB-file values was fixed on https://github.com/johnkerl/miller/issues/296.
    • A bug with missing final newline for XTAB-formatted files using MMAP files was fixed on https://github.com/johnkerl/miller/issues/301.

    Documentation

    • Look-and-feel at http://johnkerl.org/miller/doc/ is (hopefully) improved, including clearer visual indication of which section/page you're currently looking at. Note that this change has been live for a few weeks, as look-and-feel-related doc-mods from post-5.6.2 were backported to http://johnkerl.org/miller/doc/.

    • https://github.com/johnkerl/miller/issues/282 improves DSL-function documentation at http://johnkerl.org/miller/doc/reference-dsl.html#Built-in_functions_for_filter_and_put,_summary

    Note

    Support for mmap mode has been entirely discontinued. This is an invisible change and should not affect you at all. For anyone interested in lower-level details, though, the summary is as follows:

    • For an incremental performance gain (perhaps 10-20% run time at most, but see below), within the C source code one can use the mmap system call to access input files via pointer arithmetic rather than malloc-and-memcopy using stdio.
    • However mmap is not available when reading from standard input -- it cannot be memory-mapped.
    • This means all file-format readers are implemented twice within the Miller source code.
    • While I try to regression-test Miller thoroughly, running all canned tests through mmap and stdio mode, I've nonetheless found my mmap implementations liable to corner-cases which I miss but users find: for example https://github.com/johnkerl/miller/issues/29, https://github.com/johnkerl/miller/issues/102, and https://github.com/johnkerl/miller/issues/296.
    • As tracked on https://github.com/johnkerl/miller/issues/160, various operating systems do not release mmapped pages after use as one might intuit, meaning that for large files and/or large numbers of files, I've for a long time now needed to have Miller opt out of mmap usage for precisely those cases which most need the performance gain: see https://github.com/johnkerl/miller/issues/160, https://github.com/johnkerl/miller/issues/181, and https://github.com/johnkerl/miller/issues/256.
    • Additionally, mmap is not used at all for Windows/MSYS2 so there is nothing to lose there.

    For these reasons, keeping mmap mode isn't worth the development overhead.
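    The "implemented twice" point can be illustrated outside of C. The Python sketch below reads the same file once through ordinary buffered I/O and once through a memory map. The function names are invented for this example and the code has nothing to do with Miller's internals; it only shows that two read paths must be kept in agreement.

```python
import mmap
import os
import tempfile


def lines_via_stdio(path):
    """Read lines through ordinary buffered file I/O."""
    with open(path, "rb") as f:
        return [line for line in f]


def lines_via_mmap(path):
    """Read the same lines by memory-mapping the file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. This raises ValueError for an empty
        # file, one example of the extra corner cases a second reader brings.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return list(iter(mm.readline, b""))


with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"a,b\n1,2\n3,4\n")
    path = tmp.name

try:
    # Both readers must agree, which is exactly what has to be regression-tested.
    assert lines_via_stdio(path) == lines_via_mmap(path)
    print(lines_via_mmap(path))
finally:
    os.remove(path)
```

    Standard input has no such second path: a pipe cannot be memory-mapped, so only the stdio-style reader applies there.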

    As of release 5.7.0, the mlr executable will still accept the --mmap and --no-mmap command-line flags as no-ops, for backward compatibility.

    The caveat for you is that for everyday small files, the default was previously mmap mode and is now stdio (except mlr ... < filename or ... | mlr ... which have always used stdio). There is the off chance that this will newly reveal an old, latent bug or two somewhere.

    I've re-run regressions in valgrind mode to aggressively catch any errors, but, please let me know ASAP via GitHub issue of any unexpected behavior in 5.7.0.

    Source code(tar.gz)
    Source code(zip)
    miller-5.7.0-1.src.rpm(1.20 MB)
    mlr-5.7.0.tar.gz(1.20 MB)
    mlr.exe(4.76 MB)
    mlr.linux.x86_64(3.28 MB)
    mlr.macosx(854.28 KB)
    msys-2.0.dll(3.17 MB)
  • v5.6.2(Sep 22, 2019)

    Bug fixes:

    • https://github.com/johnkerl/miller/issues/271 fixes a corner-case bug with more than 100 CSV/TSV files with headers of varying lengths.

    Documentation:

    • The new http://johnkerl.org/miller/doc/whyc-details.html is an elaboration on http://johnkerl.org/miller/doc/whyc.html which answers a question posed by @burntsushi on Reddit a couple years ago which I did not address in detail at the time.
    Source code(tar.gz)
    Source code(zip)
    mlr-5.6.2.tar.gz(1.22 MB)
    mlr.exe(4.98 MB)
    mlr.linux_x86_64(3.33 MB)
    mlr.macosx(879.08 KB)
    msys-2.0.dll(3.17 MB)
  • v5.6.1(Sep 17, 2019)

    The only change is that http://johnkerl.org/miller/doc is now more mobile-friendly.

    All build artifacts are the same as at https://github.com/johnkerl/miller/releases/tag/v5.6.0

    [Before and after screenshots]

    Source code(tar.gz)
    Source code(zip)
  • v5.6.0(Sep 13, 2019)

    Features:

    • The new system DSL function allows you to run arbitrary shell commands and store their output in field values. Some example usages are documented here. This is in response to issues https://github.com/johnkerl/miller/issues/246 and https://github.com/johnkerl/miller/issues/209.

    • There is now support for ASV and USV file formats. This is in response to issue https://github.com/johnkerl/miller/issues/245.

    • The new format-values verb allows you to apply numerical formatting across all record values. This is in response to issue https://github.com/johnkerl/miller/issues/252.
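    As a rough picture of what applying numerical formatting across all record values means, here is a Python sketch. The function name format_values, its parameters, the default formats, and the int-then-float detection order are all assumptions for illustration; the verb's real options and type inference are described in its own help.

```python
def format_values(record, int_fmt="%d", float_fmt="%f", str_fmt="%s"):
    """Format every value of a record by its apparent type (illustrative only)."""
    out = {}
    for key, value in record.items():
        text = str(value)
        try:
            out[key] = int_fmt % int(text)  # looks like an integer
            continue
        except ValueError:
            pass
        try:
            out[key] = float_fmt % float(text)  # looks like a float
        except ValueError:
            out[key] = str_fmt % text  # anything else is kept as a string
    return out


record = {"x": "3", "y": "0.25", "z": "hello"}
print(format_values(record, float_fmt="%.3f"))
```

    With the three-decimal float format, y becomes 0.250 while x and z pass through unchanged.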

    Documentation:

    • The new DKVP I/O in Python sample code now works for Python 2 as well as Python 3.

    • There is a new cookbook entry on doing multiple joins. This is in response to issue https://github.com/johnkerl/miller/issues/235.

    Bugfixes:

    • The toupper, tolower, and capitalize DSL functions are now UTF-8 aware, thanks to @sheredom's marvelous https://github.com/sheredom/utf8.h. The internationalization page has also been expanded. This is in response to issue https://github.com/johnkerl/miller/issues/254.

    • https://github.com/johnkerl/miller/issues/250 fixes a bug using in-place mode in conjunction with verbs (such as rename or sort) which take field-name lists as arguments.

    • https://github.com/johnkerl/miller/issues/253 fixes a bug in the label verb when one or more names are common between old and new.

    • https://github.com/johnkerl/miller/issues/251 fixes a corner-case bug when (a) input is CSV; (b) the last field ends with a comma and no newline; (c) input is from standard input and/or --no-mmap is supplied.

    Note:

    Thanks to @aborruso @davidselassie @joelparkerhenderson for the bug reports and feature requests!! :)

    Source code(tar.gz)
    Source code(zip)
    mlr-5.6.0.tar.gz(1.21 MB)
    mlr.exe(4.98 MB)
    mlr.linux.x86_64(3.33 MB)
    mlr.macosx(879.08 KB)
    msys-2.0.dll(3.17 MB)
  • v5.5.0(Sep 1, 2019)

    Features:

    • The new positional-indexing feature resolves https://github.com/johnkerl/miller/issues/236 from @aborruso. You can now get the name of the 3rd field of each record via $[[3]], and its value by $[[[3]]]. These are both usable on either the left-hand or right-hand side of assignment statements, so you can more easily do things like renaming fields programmatically within the DSL.

    • There is a new capitalize DSL function, complementing the already-existing toupper. This stems from https://github.com/johnkerl/miller/issues/236.

    • There is a new skip-trivial-records verb, resolving https://github.com/johnkerl/miller/issues/197. Similarly, there is a new remove-empty-columns verb, resolving https://github.com/johnkerl/miller/issues/206. Both are useful for data-cleaning use-cases.

    • Another pair is https://github.com/johnkerl/miller/issues/181 and https://github.com/johnkerl/miller/issues/256. While Miller uses mmap internally (and invisibly) to get approximately a 20% performance boost over not using it, this can cause out-of-memory issues when reading either large files or too many small ones. Now, Miller automatically avoids mmap in these cases. You can still use --mmap or --no-mmap if you want manual control of this.

    • There is a new --ivar option for the nest verb which complements the already-existing --evar. This is from https://github.com/johnkerl/miller/pull/260 thanks to @jgreely.

    • There is a new keystroke-saving urandrange DSL function: urandrange(low, high) is the same as low + (high - low) * urand(). This arose from https://github.com/johnkerl/miller/issues/243.

    • There is a new -v option for the cat verb which writes a low-level record-structure dump to standard error.

    • There is a new -N option for mlr which is a keystroke-saver for --implicit-csv-header --headerless-csv-output.
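    Two of the features above are easy to model in a few lines of Python: positional field access and urandrange. Records are modelled as insertion-ordered dicts, and the helper names name_at, value_at, and rename_at are invented for this sketch; random.random() stands in for urand().

```python
import random


def name_at(record, position):
    """Analogue of $[[position]]: the field name at a 1-up position."""
    if position < 1:
        raise IndexError("positions are 1-up")
    return list(record.keys())[position - 1]


def value_at(record, position):
    """Analogue of $[[[position]]]: the field value at a 1-up position."""
    if position < 1:
        raise IndexError("positions are 1-up")
    return list(record.values())[position - 1]


def rename_at(record, position, new_name):
    """Rename the field at a position while keeping the field order."""
    old_name = name_at(record, position)
    return {(new_name if k == old_name else k): v for k, v in record.items()}


def urandrange(low, high):
    """low + (high - low) * urand(), with random.random() playing urand()."""
    return low + (high - low) * random.random()


rec = {"a": 1, "b": 2, "c": 3}
print(name_at(rec, 3), value_at(rec, 3))
print(rename_at(rec, 3, "z"))
```

    The rename keeps the third field in third place, which is the point of addressing fields by position rather than by name.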

    Documentation:

    • The new FAQ entry http://johnkerl.org/miller/doc/faq.html#How_to_escape_'%3F'_in_regexes%3F resolves https://github.com/johnkerl/miller/issues/203.

    • The new FAQ entry http://johnkerl.org/miller/doc/faq.html#How_can_I_filter_by_date%3F resolves https://github.com/johnkerl/miller/issues/208.

    • https://github.com/johnkerl/miller/issues/244 fixes a documentation issue while highlighting the need for https://github.com/johnkerl/miller/issues/241.

    Bugfixes:

    • There was a SEGV using nest within then-chains, fixed in response to https://github.com/johnkerl/miller/issues/220.

    • Quotes and backslashes weren't being escaped in JSON output with --jvquoteall; reported on https://github.com/johnkerl/miller/issues/222.

    An extra thank-you:

    I've never code-named releases but if I were to code-name 5.5.0 I would call it "aborruso". Andrea has contributed many fantastic feature requests, as well as driving a huge volume of Miller-related discussions in StackExchange (https://github.com/johnkerl/miller/issues/212). Mille grazie al mio amico @aborruso!

    Source code(tar.gz)
    Source code(zip)
    mlr-5.5.0.tar.gz(1.18 MB)
    mlr.exe(4.90 MB)
    mlr.linux_x86_64(2.93 MB)
    mlr.macosx(864.87 KB)
    msys-2.0.dll(3.17 MB)
  • 5.4.0(Oct 14, 2018)

    Features:

    • The new clean-whitespace verb resolves https://github.com/johnkerl/miller/issues/190 from @aborruso. Along with the new functions strip, lstrip, rstrip, collapse_whitespace, and clean_whitespace, there is now both coarse-grained and fine-grained control over whitespace within field names and/or values. See the linked-to documentation for examples.

    • The new altkv verb resolves https://github.com/johnkerl/miller/issues/184 which was originally opened via an email request. This supports mapping value-lists such as a,b,c,d to alternating key-value pairs such as a=b,c=d.

    • The new fill-down verb resolves https://github.com/johnkerl/miller/issues/189 by @aborruso. See the linked-to documentation for examples.

    • The uniq verb now has a uniq -a which resolves https://github.com/johnkerl/miller/issues/168 from @sjackman.

    • The new regextract and regextract_or_else functions resolve https://github.com/johnkerl/miller/issues/183 by @aborruso.

    • The new ssub function arises from https://github.com/johnkerl/miller/issues/171 by @dohse, as a simplified way to avoid escaping characters which are special to regular-expression parsers.

    • There are new localtime functions in response to https://github.com/johnkerl/miller/issues/170 by @sitaramc. However note that as discussed on https://github.com/johnkerl/miller/issues/170 these do not undo one another in all circumstances. This is a non-issue for timezones which do not do DST. Otherwise, please use with disclaimers: localdate, localtime2sec, sec2localdate, sec2localtime, strftime_local, and strptime_local.
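    Three of the additions above can be modelled in Python to show what they compute. This is a sketch under assumptions, not Miller's code: altkv here handles only the even-length case given in the example, ssub is assumed to replace the first occurrence only, and clean_whitespace is assumed to mean "collapse runs of whitespace, then strip both ends".

```python
import re


def altkv(values):
    """Pair up an even-length list: [a, b, c, d] -> {a: b, c: d}."""
    if len(values) % 2 != 0:
        raise ValueError("this sketch only handles even-length lists")
    return dict(zip(values[0::2], values[1::2]))


def ssub(text, old, new):
    """Replace the first literal occurrence of old; no character is special."""
    return text.replace(old, new, 1)


def clean_whitespace(text):
    """Collapse runs of whitespace to one space, then strip both ends."""
    return re.sub(r"\s+", " ", text).strip()


print(altkv(["a", "b", "c", "d"]))
print(ssub("a.b.c", ".", "-"))               # literal dot
print(re.sub(".", "-", "a.b.c", count=1))    # regex dot matches any character
print(clean_whitespace("  hello   world "))
```

    The two substitution lines show why a non-regex variant is handy: the regex version treats the dot as "any character" and replaces the leading a instead.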

    Builds:

    • Windows build-artifacts are now available in Appveyor at https://ci.appveyor.com/project/johnkerl/miller/build/artifacts, and will be attached to this and future releases. This resolves https://github.com/johnkerl/miller/issues/167, https://github.com/johnkerl/miller/issues/148, and https://github.com/johnkerl/miller/issues/109.

    • Travis builds at https://travis-ci.org/johnkerl/miller/builds now run on OSX as well as Linux.

    • An Ubuntu 17 build issue was fixed by @singalen on https://github.com/johnkerl/miller/issues/164.

    Documentation:

    • put/filter documentation was confusing as reported by @NikosAlexandris on https://github.com/johnkerl/miller/issues/169.

    • The new FAQ entry http://johnkerl.org/miller-releases/miller-head/doc/faq.html#How_to_rectangularize_after_joins_with_unpaired? resolves https://github.com/johnkerl/miller/issues/193 by @aborruso.

    • The new cookbook entry http://johnkerl.org/miller/doc/cookbook.html#Options_for_dealing_with_duplicate_rows arises from https://github.com/johnkerl/miller/issues/168 from @sjackman.

    • The unsparsify documentation had some words missing as reported by @tst2005 on https://github.com/johnkerl/miller/issues/194.

    • There was a typo in the cookbook page http://johnkerl.org/miller/doc/cookbook.html#Full_field_renames_and_reassigns as fixed by @tst2005 in https://github.com/johnkerl/miller/pull/192.

    Bugfixes:

    • There was a memory leak for TSV-format files only as reported by @treynr on https://github.com/johnkerl/miller/issues/181.

    • Dollar signs in regular expressions were not being escaped properly as reported by @dohse on https://github.com/johnkerl/miller/issues/171.

    Source code(tar.gz)
    Source code(zip)
    mlr-5.4.0.tar.gz(1.14 MB)
    mlr.exe(4.40 MB)
    mlr.linux.x86_64(2.91 MB)
    mlr.osx(847.09 KB)
    mlr.spec(1.81 KB)
    msys-2.0.dll(3.19 MB)
  • v5.3.0(Jan 6, 2018)

    Features:

    • Comment strings in data files: mlr --skip-comments allows you to filter out input lines starting with #, for all file formats. Likewise, mlr --skip-comments-with X lets you specify the comment-string X. Comments are only supported at start of data line. mlr --pass-comments and mlr --pass-comments-with X allow you to forward comments to program output as they are read.

    • The count-similar verb lets you compute cluster sizes by cluster labels.

    • While Miller DSL arithmetic gracefully overflows from 64-bit integer to double-precision float (see also here), there are now the integer-preserving arithmetic operators .+ .- .* ./ .// for those times when you want integer overflow.

    • There is a new bitcount function: for example, echo x=0xf0000206 | mlr put '$y=bitcount($x)' produces x=0xf0000206,y=7.

    • Issue 158: mlr -T is an alias for --nidx --fs tab, and mlr -t is an alias for mlr --tsvlite.

    • The mathematical constants π and e have been renamed from PI and E to M_PI and M_E, respectively. (It's annoying to get a syntax error when you try to define a variable named E in the DSL, when A through D work just fine.) This is a backward incompatibility, but not enough of one to justify calling this release Miller 6.0.0.
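    The bitcount example and the two flavors of integer arithmetic can be checked with a short Python model. Python integers never overflow, so the 64-bit limits are simulated here; the names plus and dot_plus are invented for this sketch, and two's-complement wraparound for the dot operator is an assumption about what "integer overflow" means in this context.

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def bitcount(n):
    """Number of 1-bits in a non-negative integer."""
    return bin(n).count("1")


def plus(a, b):
    """Add two ints; fall back to float when the sum leaves the 64-bit range."""
    total = a + b
    if INT64_MIN <= total <= INT64_MAX:
        return total
    return float(a) + float(b)


def dot_plus(a, b):
    """Add two ints and keep the result an int by wrapping at 64 bits."""
    wrapped = (a + b) & 0xFFFFFFFFFFFFFFFF
    return wrapped - 2 ** 64 if wrapped > INT64_MAX else wrapped


print(bitcount(0xF0000206))      # 7, matching the example above
print(plus(INT64_MAX, 1))        # a float
print(dot_plus(INT64_MAX, 1))    # wraps to the most negative 64-bit int
```

    For sums that stay within range, both functions return the same integer; they differ only at the boundary.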

    Documentation:

    • As noted here, while Miller has its own DSL there will always be things better expressible in a general-purpose language. The new page Sharing data with other languages shows how to seamlessly share data back and forth between Miller, Ruby, and Python. SQL-input examples and SQL-output examples contain detailed information on the interplay between Miller and SQL.

    • Issue 150 raised a question about suppressing numeric conversion. This resulted in a new FAQ entry How do I suppress numeric conversion?, as well as the longer-term follow-on issue 151 which will make numeric conversion happen on a just-in-time basis.

    • To my surprise, csvlite format options weren’t listed in mlr --help or the manpage. This has been fixed.

    • Documentation for auxiliary commands has been expanded, including within the manpage.

    Bugfixes:

    • Issue 159 fixes regex-match of literal dot.

    • Issue 160 fixes out-of-memory cases for huge files. This is an old bug, as old as Miller, and is due to inadequate testing of huge-file cases. The problem is simple: Miller prefers memory-mapped I/O (using mmap) over stdio since mmap is fractionally faster. Yet as any processing (even mlr cat) steps through an input file, more and more pages are faulted in -- and, unfortunately, previous pages are not paged out once memory pressure increases. (This despite gallant attempts with madvise.) Once all processing is done, the memory is released; there is no leak per se. But the Miller process can crash before the entire file is read. The solution is equally simple: to prefer stdio over mmap for files over 4GB in size. (This 4GB threshold is tunable via the --mmap-below flag as described in the manpage.)

    • Issue 161 fixes a CSV-parse error (with error message "unwrapped double quote at line 0") when a CSV file starts with the UTF-8 byte-order-mark ("BOM") sequence 0xef 0xbb 0xbf and the header line has double-quoted fields. (Release 5.2.0 introduced handling for UTF-8 BOMs, but missed the case of double-quoted header line.)

    • Issue 162 fixes a corner case doing multi-emit of aggregate variables when the first variable name is a typo.

    • The Miller JSON parser used to error with Unable to parse JSON data: Line 1 column 0: Unexpected 0x00 when seeking value on empty input, or input with trailing whitespace; this has been fixed.
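The byte-order-mark case from issue 161 can be sketched in Python as follows. This only illustrates the idea of dropping a leading UTF-8 BOM before CSV parsing so that a double-quoted header line parses normally; the function name `read_csv_skipping_bom` is hypothetical and this is not how Miller itself is written.

```python
import codecs
import csv
import io


def read_csv_skipping_bom(raw):
    """Parse CSV bytes into a list of dicts, ignoring a leading UTF-8 BOM."""
    # codecs.BOM_UTF8 is the three-byte sequence 0xef 0xbb 0xbf.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    text = raw.decode("utf-8")
    return [dict(row) for row in csv.DictReader(io.StringIO(text, newline=""))]


# A file that starts with a BOM and has a double-quoted header line.
sample = b'\xef\xbb\xbf"a","b"\r\n1,2\r\n'
print(read_csv_skipping_bom(sample))
```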

    There is no prebuilt Windows executable for this release; my apologies.

    Source code(tar.gz)
    Source code(zip)
    mlr-5.3.0.tar.gz(1.16 MB)
    mlr.linux.x86_64(2.87 MB)
    mlr.macosx(831.97 KB)
  • v5.2.2(Jul 20, 2017)

  • v5.2.1(Jun 20, 2017)

    This bugfix release addresses https://github.com/johnkerl/miller/issues/142.

    I'm not attaching prebuilt binaries beyond those already in https://github.com/johnkerl/miller/releases/tag/v5.2.0 since the binaries there are fine for their respective architectures.

    This unblocks Miller on openSUSE.

    Source code(tar.gz)
    Source code(zip)
  • v5.2.0(Jun 13, 2017)

    This release contains mostly feature requests.

    Features:

    • The stats1 verb now lets you use regular expressions to specify which field names to compute statistics on, and/or which to group by. Full details are here.

    • The min and max DSL functions, and the min/max/percentile aggregators for the stats1 and merge-fields verbs, now support numeric as well as string field values. (For mixed string/numeric fields, numbers compare before strings.) This means in particular that order statistics -- min, max, and non-interpolated percentiles -- as well as mode, antimode, and count are now possible on string-only (or mixed) fields. (Of course, any operations requiring arithmetic on values, such as computing sums, averages, or interpolated percentiles, yield an error on string-valued input.)

    • There is a new DSL function mapexcept which returns a copy of the argument with specified key(s), if any, unset. The motivating use-case is to split records to multiple filenames depending on a particular field value, which is omitted from the output: mlr --from f.dat put 'tee > "/tmp/data-".$a, mapexcept($*, "a")'. Likewise, mapselect returns a copy of the argument with only specified key(s), if any, set. This resolves https://github.com/johnkerl/miller/issues/137.

    • A new -u option for count-distinct allows unlashed counts for multiple field names. For example, with -f a,b and without -u, count-distinct computes counts for distinct pairs of a and b field values. With -f a,b and with -u, it computes counts for distinct a field values and counts for distinct b field values separately.

    • If you build from source, you can now do ./configure without first doing autoreconf -fiv. This resolves https://github.com/johnkerl/miller/issues/131.

    • The UTF-8 BOM sequence 0xef 0xbb 0xbf is now automatically ignored from the start of CSV files. (The same is already done for JSON files.) This resolves https://github.com/johnkerl/miller/issues/138.

    • For put and filter with -S, program literals such as the 6 in $x = 6 were being parsed as strings. This is not sensible, since the -S option for put and filter is intended to suppress numeric conversion of record data, not program literals. To get string 6 one may use $x = "6".
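The difference between lashed and unlashed counts in count-distinct, described above, can be illustrated with a short Python sketch. The function `count_distinct` and the sample records are hypothetical; the sketch models only the counting logic, not Miller's output format.

```python
from collections import Counter


def count_distinct(records, fields, unlashed=False):
    """Count distinct values of the given fields across records.

    Lashed (default): one count per distinct tuple of field values.
    Unlashed: a separate set of counts for each field on its own.
    """
    if unlashed:
        return {f: Counter(r[f] for r in records if f in r) for f in fields}
    return Counter(
        tuple(r[f] for f in fields)
        for r in records
        if all(f in r for f in fields)
    )


records = [
    {"a": "pan", "b": "x"},
    {"a": "pan", "b": "y"},
    {"a": "eks", "b": "x"},
]

# Like -f a,b without -u: counts for distinct (a, b) pairs.
print(count_distinct(records, ["a", "b"]))
# Like -f a,b with -u: counts for a values and for b values separately.
print(count_distinct(records, ["a", "b"], unlashed=True))
```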

    Documentation:

    Bugfixes:

    • CRLF line-endings were not being correctly autodetected when I/O formats were specified using --c2j et al.

    • Integer division by zero was causing a fatal runtime exception, rather than computing inf or nan as in the floating-point case.

    Binaries:

    As below. Additionally, the MacOSX version is available in Homebrew. For Windows, you need the .exe file along with both .dll files, with instructions as in https://github.com/johnkerl/miller/releases/tag/v5.1.0w.

    Source code(tar.gz)
    Source code(zip)
    libpcre-1.dll(275.26 KB)
    libpcreposix-0.dll(43.50 KB)
    mlr-5.2.0.tar.gz(1.13 MB)
    mlr.exe(37.00 KB)
    mlr.i686(994.94 KB)
    mlr.linux.x86_64(2.81 MB)
    mlr.macosx(786.29 KB)
  • v5.1.0w(Apr 16, 2017)

    I'm happy to announce a Windows port of Miller. Features in this 5.1.0w release are identical to 5.1.0; the only delivery here is an executable compiled for 64-bit Windows.

    Details are here.

    One of the reasons I'm calling this a beta is that at present you need two DLLs in addition to the mlr.exe executable attached below. All three need to be somewhere in your Windows PATH.

    For example, you can do

    C:\> mkdir \mbin
    

    Then place libpcreposix-0.dll, libpcre-1.dll, and mlr.exe all into C:\mbin. Then

    C:\> set PATH=%PATH%;\mbin
    

    The Windows port is still beta: please open an issue at https://github.com/johnkerl/miller/issues if you encounter any problems.

    Update a few hours later: Due to simple fat-fingering on my part, one of the files was misnamed. The binaries have been reattached correctly.

    Information about the binaries:

    FILE SIZES
    4,379,627 mlr.exe
      281,871 libpcre-1.dll
       44,554 libpcreposix-0.dll
    
    FILE MD5SUMS
    e46a2bfcda001f3698eee4f09409fc04 *mlr.exe
    003b71bce60e63d745bac45740c277f8 *libpcre-1.dll
    d5920106bdbccf736fd8c459959fabbe *libpcreposix-0.dll
    
    Source code(tar.gz)
    Source code(zip)
    libpcre-1.dll(275.26 KB)
    libpcreposix-0.dll(43.50 KB)
    mlr.exe(4.17 MB)
  • v5.1.0(Apr 15, 2017)

    This is a relatively minor release of Miller, containing feature requests and bugfixes while I've been working on the Windows port (which is nearly complete).

    Features:

    • JSON arrays: as described here, Miller being a tabular data processor isn't well-positioned to handle arbitrary JSON. (See jq for that.) But as of 5.1.0, arrays are converted to maps with integer keys, which are then at least processable using Miller. Details are here. The short of it is that you now have three options for the main mlr executable:
    --json-map-arrays-on-input    Convert JSON array indices to Miller map keys. (This is the default.)
    --json-skip-arrays-on-input   Disregard JSON arrays.
    --json-fatal-arrays-on-input  Raise a fatal error when JSON arrays are encountered in the input.
    

    This resolves https://github.com/johnkerl/miller/issues/133.

    • The new mlr fraction verb makes possible in a few keystrokes what was only possible before using two-pass DSL logic: here you can turn numerical values down a column into their fractional/percentage contribution to column totals, optionally grouped by other key columns.

    • The DSL functions strptime and strftime now handle fractional seconds. For parsing, use %S format as always; for formatting, there are now %1S through %9S which allow you to configure a specified number of decimal places. The return value from strptime is now floating-point, not integer, which is a minor backward incompatibility not worth labeling this release as 6.0.0. (You can work around this using int(strptime(...)).) The DSL functions gmt2sec and sec2gmt, which are keystroke-savers for strptime and strftime, are similarly modified, as is the sec2gmt verb. This resolves https://github.com/johnkerl/miller/issues/125.

    • A few nearly-standalone programs -- which do not have anything to do with record streams -- are packaged within Miller. (For example, hex-dump, unhex, and show-line-endings commands.) These are described here.

    • The stats1 and merge-fields verbs now support an antimode aggregator, in addition to the existing mode aggregator.

    • The join verb now by default does not require sorted input, which is the more common use case. (Memory-parsimonious joins which require sorted input, while no longer the default, are available using -s.) This is another minor backward incompatibility not worth making a 6.0.0 over. This resolves https://github.com/johnkerl/miller/issues/134.

    • mlr nest has a keystroke-saving --evar option for a common use case, namely, exploding a field by value across records.
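The two-pass logic that the fraction verb replaces can be sketched in Python. This is an illustration under stated assumptions rather than Miller's code: the function name `fraction` is hypothetical, the output field is named by appending `_fraction` to the input field name, and groups whose total is zero are not handled.

```python
from collections import defaultdict


def fraction(records, field, group_by=()):
    """Add a '<field>_fraction' key: each value divided by its group total."""
    # Pass 1: accumulate the total of the field for each group.
    totals = defaultdict(float)
    for record in records:
        key = tuple(record[g] for g in group_by)
        totals[key] += record[field]

    # Pass 2: emit each record with its share of the group total.
    output = []
    for record in records:
        key = tuple(record[g] for g in group_by)
        enriched = dict(record)
        enriched[field + "_fraction"] = record[field] / totals[key]
        output.append(enriched)
    return output


data = [
    {"a": "x", "v": 1},
    {"a": "x", "v": 3},
    {"a": "y", "v": 4},
]
print(fraction(data, "v"))                   # share of the whole column
print(fraction(data, "v", group_by=("a",)))  # share within each value of a
```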

    Documentation:

    Bugfixes:

    • mlr join -j -l was not functioning correctly. This resolves https://github.com/johnkerl/miller/issues/136.

    • JSON escapes on output (\t and so on) were incorrect. This resolves https://github.com/johnkerl/miller/issues/135.

    Source code(tar.gz)
    Source code(zip)
    mlr-5.1.0.tar.gz(1.13 MB)
    mlr.linux.i386(1.85 MB)
    mlr.linux.x86_64(2.80 MB)
    mlr.osx(773.06 KB)
  • v5.0.1(Mar 12, 2017)

  • v5.0.0(Feb 28, 2017)

    This major release significantly expands the expressiveness of the DSL for mlr put and mlr filter. (The upcoming 5.1.0 release will add the ability to aggregate across all columns for non-DSL verbs such as mlr stats1 and mlr stats2. As well, a Windows port is underway.)

    Please also see the Miller main docs.

    Simple but impactful features:

    Major DSL features:

    • You can now define your own functions and subroutines: e.g. func f(x, y) { return x**2 + y**2 }.
    • New local variables are completely analogous to out-of-stream variables: sum retains its value for the duration of the expression it's defined in; @sum retains its value across all records in the record stream.
    • Local variables, function parameters, and function return types may be defined untyped or typed as in x = 1 or int x = 1, respectively. There are also expression-inline type-assertions available. Type-checking is up to you: omit it if you want flexibility with heterogeneous data; use it if you want to help catch misspellings in your DSL code or unexpected irregularities in your input data.
    • There are now four kinds of maps. Out-of-stream variables have always been scalars, maps, or multi-level maps: @a=1, @b[1]=2, @c[1][2]=3. The same is now true for local variables, which are new to 5.0.0. Stream records have always been single-level maps; $* is a map. And as of 5.0.0 there are now map literals, e.g. {"a":1, "b":2}, which can be defined using JSON-like syntax (with either string or integer keys) and which can be nested arbitrarily deeply.
    • You can loop over maps -- $*, out-of-stream variables, local variables, map-literals, and map-valued function return values -- using for (k, v in ...) or the new for (k in ...) (discussed next). All flavors of map may also be used in emit and dump statements.
    • User-defined functions and subroutines may take map-valued arguments, and may return map values.
    • Some built-in functions now accept map-valued input: typeof, length, depth, leafcount, haskey. There are built-in functions producing map-valued output: mapsum and mapdiff. There are now string-to-map and map-to-string functions: splitnv, splitkv, splitnvx, splitkvx, joink, joinv, and joinkv.

    Minor DSL features:

    • For iterating over maps (namely, local variables, out-of-stream variables, stream records, map literals, or return values from map-valued functions) there is now a key-only for-loop syntax: e.g. for (k in $*) { ... }. This is in addition to the already-existing for (k, v in ...) syntax.
    • There are now triple-statement for-loops (familiar from many other languages), e.g. for (int i = 0; i < 10; i += 1) { ... }.
    • mlr put and mlr filter now accept multiple -f for script files, freely intermixable with -e for expressions. The suggested use case is putting user-defined functions in script files and one-liners calling them using -e. Example: myfuncs.mlr defines the function f(...), then mlr put -f myfuncs.mlr -e '$o = f($i)' myfile.dat. More information is here.
    • mlr filter is now almost identical to mlr put: it can have multiple statements, it can use begin and/or end blocks, it can define and invoke functions. Its final expression must evaluate to boolean which is used as the filter criterion. More details are here.
    • The min and max functions are now variadic: $o = max($a, $b, $c).
    • There is now a substr function.
    • While ENV has long provided read-access to environment variables on the right-hand side of assignments (as a getenv), it now can be at the left-hand side of assignments (as a putenv). This is useful for subsidiary processes created by tee, emit, dump, or print when writing to a pipe.
    • The # in comments is now handled in the lexer, so you can now (correctly) include # in strings.
    • Separators are now available as read-only variables in the DSL: IPS, IFS, IRS, OPS, OFS, ORS. These are particularly useful with the split and join functions: e.g. with mlr --ifs tab ..., the IFS variable within a DSL expression will evaluate to a string containing a tab character.
    • Syntax errors in DSL expressions now have a little more context.
    • DSL parsing and execution are a bit more transparent. There have long been -v and -t options to mlr put and mlr filter, which print the expression's abstract syntax tree and do a low-level parser trace, respectively. There are now additionally -a which traces stack-variable allocation and -T which traces statements line by line as they execute. While -v, -t, and -a are most useful for development of Miller, the -T option gives you more visibility into what your Miller scripts are doing. See also here.

    Verbs:

    • most-frequent and least-frequent as requested in https://github.com/johnkerl/miller/issues/110.
    • seqgen makes it easy to generate data from within Miller: please also see here for a usage example.
    • unsparsify makes it easy to rectangularize data where not all records have the same fields.
    • cat -n now takes a group-by (-g) option, making it easy to number records within categories.
    • count-distinct, uniq, most-frequent, least-frequent, top, and histogram now take a -o option for specifying their output field names, as requested in https://github.com/johnkerl/miller/issues/122.
    • Median is now a synonym for p50 in stats1.
    • You can now start a then chain with an initial then, which is nice in backslashy/multiline-continuation contexts. This was requested in https://github.com/johnkerl/miller/issues/130.

    I/O options:

    • The print statement may now be used with no arguments, which prints a newline, and a no-argument printn prints nothing but creates a zero-length file in redirected-output context.
    • Pretty-print format now has a --pprint --barred option (for output only, not input). For an example, please see here.
    • There are now keystroke-savers of the form --c2p which abbreviate --icsvlite --opprint, and so on.
    • Miller's map literals are JSON-looking but allow integer keys which JSON doesn't. The --jknquoteint and --jvquoteall flags for mlr (when using JSON output) and mlr put (for dump) provide control over double-quoting behavior.

    Documents new since the previous release:

    • Miller in 10 minutes is a long-overdue addition: while Miller's detailed documentation is evident, there has been a lack of more succinct examples.
    • The cookbook has likewise been expanded, and has been split out into three parts: part 1, part 2, part 3.
    • A bit more background on C performance compared to other languages I experimented with, early on in the development of Miller, is here.

    On-line help:

    • Help for DSL built-in functions, DSL keywords, and verbs is accessible using mlr -f, mlr -k, and mlr -l respectively; name-only lists are available with mlr -F, mlr -K, and mlr -L.

    Bugfixes:

    • A corner-case bug causing a segmentation violation on two sub/gsub statements within a single put, the first one matching its pattern and the second one not matching its pattern, has been fixed.

    Backward incompatibilities: This is Miller 5.0.0, not 4.6.0, due to the following (all relatively minor):

    • The v variables bound in for-loops such as for (k, v in some_multi_level_map) { ... } can now be map-valued if the v specifies a non-terminal in the map.
    • There are new keywords such as var, int, float, num, str, bool, map, IPS, IFS, IRS, OPS, OFS, ORS which can no longer be used as variable names. See mlr -k for the complete list.
    • Unset of the last key in a map-valued variable's map level no longer removes the level: e.g. with @v[1][2]=3 and unset @v[1][2], the @v variable would previously be empty. As of 5.0.0, @v has key 1 with an empty-map value.
    • There is no longer type-inference on literals: "3"+4 no longer gives 7. (That was never a good idea.)
    • The typeof function used to say things like MT_STRING; now it says things like string.
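The change in unset behavior listed above can be modeled with nested Python dicts. This is only an analogy: the helper `unset_path` and its `prune_empty` flag are hypothetical, with `prune_empty=True` standing in for the pre-5.0.0 behavior and the default standing in for the 5.0.0 behavior.

```python
def unset_path(mapping, *keys, prune_empty=False):
    """Delete mapping[k1][k2]...[kn] in place.

    With prune_empty=True, also remove any enclosing levels that the
    deletion leaves empty (a model of the old behavior).
    """
    parents = []
    level = mapping
    for key in keys[:-1]:
        parents.append((level, key))
        level = level[key]
    del level[keys[-1]]

    if prune_empty:
        # Walk back up, removing levels that are now empty.
        for parent, key in reversed(parents):
            if parent[key]:
                break
            del parent[key]


new_style = {1: {2: 3}}
unset_path(new_style, 1, 2)
print(new_style)   # key 1 remains, holding an empty map

old_style = {1: {2: 3}}
unset_path(old_style, 1, 2, prune_empty=True)
print(old_style)   # the emptied level is removed as well
```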

    Homebrew request pending: https://github.com/Homebrew/homebrew-core/pull/10426

    Source code(tar.gz)
    Source code(zip)
    mlr-5.0.0.tar.gz(1.09 MB)
    mlr.linux.i686(1.83 MB)
    mlr.linux.x86_64(2.78 MB)
    mlr.osx(751.37 KB)
  • v4.5.0(Aug 21, 2016)

    In a natural follow-on to the 4.4.0 redirected-output feature, the 4.5.0 release allows your tap-files to be in a different output format from the main program output.

    For example, using

    mlr --icsv --opprint ... then put --ojson 'tee > "mytap-".$a.".dat", $*' then ...
    

    the input is CSV, the output is pretty-print tabular, but the tee-files output is written in JSON format. Likewise --ofs, --ors, --ops, --jvstack, and all other output-formatting options from the main help at mlr -h and/or man mlr default to the main command-line options, and may be overridden with flags supplied to mlr put and mlr tee.
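As a rough picture of what the tee redirect above does, the following Python sketch reads CSV records and appends each one, as JSON, to a file named after the record's a field. The function name `tee_by_field` is hypothetical, and writing one JSON object per line is an assumption of this sketch rather than a statement about Miller's exact JSON layout.

```python
import csv
import io
import json
import os


def tee_by_field(csv_text, field, out_dir, prefix="mytap-", suffix=".dat"):
    """Write each CSV record as a JSON line to a file chosen by one field."""
    paths = {}
    for record in csv.DictReader(io.StringIO(csv_text)):
        path = os.path.join(out_dir, prefix + record[field] + suffix)
        # Reopening per record keeps the sketch short; a real tool would
        # keep the output handles open.
        with open(path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(dict(record)) + "\n")
        paths[record[field]] = path
    return paths
```

For example, calling `tee_by_field("a,x\npan,1\neks,2\npan,3\n", "a", some_directory)` would produce `mytap-pan.dat` with two lines and `mytap-eks.dat` with one.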

    Documentation: http://johnkerl.org/miller/doc/reference.html#Redirected-output_statements_for_put

    Brew update: https://github.com/Homebrew/homebrew-core/pull/4098

    Source code(tar.gz)
    Source code(zip)
    mlr-4.5.0-1.el6.src.rpm(984.23 KB)
    mlr-4.5.0.tar.gz(986.50 KB)
    mlr.linux.x86_64(725.12 KB)
    mlr.osx(567.27 KB)
  • v4.4.0(Aug 12, 2016)

    The principal feature of Miller 4.4.0 is redirected output. Inspired by awk, Miller lets you tap/tee your data as it's processed, run output through subordinate processes such as gzip and jq, split a single file into multiple files per an account-ID column, and so on.

    Details: http://johnkerl.org/miller/doc/reference.html#Redirected-output_statements_for_put

    Other features:

    • mlr step -a shift allows you to place the previous record's values alongside the current record's values: http://johnkerl.org/miller/doc/reference.html#step
    • mlr head, when used without the group-by flag (-g), stops after the specified number of records has been output. For example, even with a multi-gigabyte data file, mlr head -n 10 hugefile.dat will complete quickly after producing the first ten records from the file.
    • The sec2gmtdate verb, and sec2gmtdate function for filter/put, is new: please see http://johnkerl.org/miller/doc/reference.html#sec2gmtdate and http://johnkerl.org/miller/doc/reference.html#Functions_for_filter_and_put.
    • sec2gmt and sec2gmtdate both leave non-numbers as-is, rather than formatting them as (error). This is particularly relevant for formatting nullable epoch-seconds columns in SQL-table output: if a column value is NULL then after sec2gmt or sec2gmtdate it will still be NULL.
    • The dot operator has been universalized to work with any data type and produce a string. For example, if the field n has integers, then instead of typing mlr put '$name = "value:".string($n)' you can now simply do mlr put '$name = "value:".$n'. This is particularly timely for creating filenames for redirected print/dump/tee/emit output.
    • The online documents now have a copy of the Miller manpage: http://johnkerl.org/miller/doc/manpage.html
    • Bugfix: inside filter/put, $x=="" was distinct from isempty($x). This was nonsensical; now both are the same.
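The pass-through behavior of sec2gmt and sec2gmtdate for non-numeric values, described above, might look like this in Python. The two functions are hypothetical stand-ins that share Miller's names for readability; the ISO 8601 output shapes are assumptions of this sketch.

```python
from datetime import datetime, timezone


def _to_utc(value):
    """Return a UTC datetime for epoch seconds, or None if not numeric."""
    try:
        seconds = int(float(value))
    except (TypeError, ValueError, OverflowError):
        return None
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def sec2gmt(value):
    """Format epoch seconds as a UTC timestamp; leave non-numbers as-is."""
    moment = _to_utc(value)
    return value if moment is None else moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def sec2gmtdate(value):
    """Format epoch seconds as a UTC date; leave non-numbers as-is."""
    moment = _to_utc(value)
    return value if moment is None else moment.strftime("%Y-%m-%d")


print(sec2gmt(0))
print(sec2gmtdate("86400"))
print(sec2gmt("NULL"))   # a nullable SQL column value passes through unchanged
```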

    Brew update: https://github.com/Homebrew/homebrew-core/pull/3820

    Source code(tar.gz)
    Source code(zip)
    mlr-4.4.0-1.el6.src.rpm(903.16 KB)
    mlr-4.4.0.tar.gz(905.43 KB)
    mlr.linux.x86_64(715.54 KB)
    mlr.osx(562.54 KB)
  • v4.3.0(Jul 3, 2016)

    Major features:

    • Interpolated percentiles are now available using mlr stats1 -i or mlr merge-fields -i. Non-interpolated percentiles are the default. The former resemble R's type=7 quantiles and the latter resemble R's type=1 quantiles. See also http://johnkerl.org/miller/doc/reference.html#stats1 and http://johnkerl.org/miller/doc/reference.html#merge-fields.
    • Markdown-tabular output format is now available using --omd: please see http://johnkerl.org/miller/doc/file-formats.html#Markdown_tabular and https://github.com/johnkerl/miller/issues/106.
    • For files using CSV input as well as CSV output, there is now a --quote-original option which outputs fields with quotes if they had them on input. The was-quoted flag isn't tracked on derived fields, e.g. if fields a and b were quoted on input, then in mlr put '$c = $a . $b' the c field won't be quoted on output. As such, this option is most useful with mlr cut, mlr filter, etc. The use-case from the original feature request https://github.com/johnkerl/miller/issues/77#issuecomment-226640596 is in trimming down a huge CSV file in order to facilitate subsequent in-memory processing using spreadsheet software.
    • The cookbook at http://johnkerl.org/miller/doc/cookbook.html has been extended significantly.
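The distinction between non-interpolated and interpolated percentiles mentioned above can be sketched in Python. The index formulas below are assumptions of this sketch, chosen so that the interpolated version matches the usual linear-interpolation definition; they are not taken from Miller's source, and both function names are hypothetical.

```python
def percentile_noninterpolated(values, p):
    """Return an element of the data: the one at index int(p/100 * n), clamped."""
    ordered = sorted(values)
    n = len(ordered)
    index = int(p / 100.0 * n)
    index = min(max(index, 0), n - 1)
    return ordered[index]


def percentile_interpolated(values, p):
    """Linearly interpolate between the two elements around p/100 * (n - 1)."""
    ordered = sorted(values)
    n = len(ordered)
    findex = min(max(p / 100.0 * (n - 1), 0.0), float(n - 1))
    lower = int(findex)
    if lower >= n - 1:
        return float(ordered[-1])
    frac = findex - lower
    return ordered[lower] + frac * (ordered[lower + 1] - ordered[lower])


data = [10, 20, 30, 40]
print(percentile_noninterpolated(data, 50))  # always one of the input values
print(percentile_interpolated(data, 50))     # may fall between input values
```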

    Minor features:

    • You can now set a MLR_CSV_DEFAULT_RS=lf environment variable if you're tired of always putting --rs lf arguments for your CSV files: http://johnkerl.org/miller/doc/file-formats.html#CSV/TSV/etc.
    • The printn and eprintn commands for mlr put are identical to print and eprint except they don't print final newlines.
    • It is now an error if boundvars in the same for-loop expression have duplicate names, e.g. for (a,a in $*) {...} results in the error message mlr: duplicate for-loop boundvars "a" and "a".
    • The strptime function would announce an internal coding error on malformed format strings; now, it correctly points out the user-level error.

    Bug fixes:

    • Percentiles in merge-fields were not working. This was fixed; also, the lacking unit-test cases which would have caught this sooner have been filled in.
    • Miller's CSV output-quoting was non-RFC-compliant: double-quotes within field names were not being duplicated. This has been fixed (https://github.com/johnkerl/miller/issues/104).
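For reference, the RFC-style rule behind the quoting fix above is that a double quote inside a quoted field is written as two double quotes. Python's standard csv module follows the same rule by default, which makes for a quick illustration; the helper name `to_csv_line` is hypothetical.

```python
import csv
import io


def to_csv_line(fields):
    """Render one CSV line, doubling any embedded double quotes."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(fields)
    return buffer.getvalue()


# The embedded quotes around hi are doubled and the field is wrapped in quotes.
print(to_csv_line(['say "hi"', "plain"]), end="")
```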

    Brew update: https://github.com/Homebrew/homebrew-core/pull/2698

    Source code(tar.gz)
    Source code(zip)
    mlr-4.3.0-1.el6.src.rpm(876.74 KB)
    mlr-4.3.0.tar.gz(879.16 KB)
    mlr.linux.x86_64(612.13 KB)
    mlr.osx(515.36 KB)
    mlr.spec(1.21 KB)
  • v4.2.0(Jun 21, 2016)

    You can now emit multiple out-of-stream variables side-by-side.

    Doc link: http://johnkerl.org/miller/doc/reference.html#Multi-emit_statements_for_put

    Example:

    $ mlr --from data/medium --opprint put -q '
      @x_count[$a][$b] += 1;
      @x_sum[$a][$b] += $x;
      end {
          for ((a, b), _ in @x_count) {
              @x_mean[a][b] = @x_sum[a][b] / @x_count[a][b]
          }
          emit (@x_sum, @x_count, @x_mean), "a", "b"
      }
    '
    a   b   x_sum      x_count x_mean
    pan pan 219.185129 427     0.513314
    pan wye 198.432931 395     0.502362
    pan eks 216.075228 429     0.503672
    pan hat 205.222776 417     0.492141
    pan zee 205.097518 413     0.496604
    eks pan 179.963030 371     0.485076
    eks wye 196.945286 407     0.483895
    eks zee 176.880365 357     0.495463
    eks eks 215.916097 413     0.522799
    eks hat 208.783171 417     0.500679
    wye wye 185.295850 377     0.491501
    wye pan 195.847900 392     0.499612
    wye hat 212.033183 426     0.497730
    wye zee 194.774048 385     0.505907
    wye eks 204.812961 386     0.530604
    zee pan 202.213804 389     0.519830
    zee wye 233.991394 455     0.514267
    zee eks 190.961778 391     0.488393
    zee zee 206.640635 403     0.512756
    zee hat 191.300006 409     0.467726
    hat wye 208.883010 423     0.493813
    hat zee 196.349450 385     0.509999
    hat eks 189.006793 389     0.485879
    hat hat 182.853532 381     0.479931
    hat pan 168.553807 363     0.464336
    

    Note that this example simply recapitulates the easier-to-type

    mlr --from ../data/medium --opprint stats1 -a sum,count,mean -f x -g a,b
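For readers more at home in a general-purpose language, the aggregation in this example (sum, count, and mean of x grouped by a and b) can be sketched in Python. The function name `stats_by_group` and the sample records are hypothetical; the output field names follow the x_sum, x_count, x_mean columns shown above.

```python
def stats_by_group(records):
    """Compute sum, count, and mean of x for each distinct (a, b) pair."""
    accumulators = {}
    for record in records:
        key = (record["a"], record["b"])
        total, count = accumulators.get(key, (0.0, 0))
        accumulators[key] = (total + record["x"], count + 1)

    # Dicts preserve insertion order, so groups appear in first-seen order.
    return [
        {"a": a, "b": b, "x_sum": total, "x_count": count, "x_mean": total / count}
        for (a, b), (total, count) in accumulators.items()
    ]


sample = [
    {"a": "pan", "b": "pan", "x": 0.5},
    {"a": "eks", "b": "wye", "x": 2.0},
    {"a": "pan", "b": "pan", "x": 1.5},
]
for row in stats_by_group(sample):
    print(row)
```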
    

    Brew update: https://github.com/Homebrew/homebrew-core/pull/2213

    Source code(tar.gz)
    Source code(zip)
    mlr-4.2.0-1.el6.src.rpm(861.84 KB)
    mlr-4.2.0.tar.gz(862.21 KB)
    mlr.linux.x86_64(603.25 KB)
    mlr.osx(510.87 KB)
Owner
John Kerl