Optimized compression packages

compress

This package provides various compression algorithms.

  • zstandard compression and decompression in pure Go.
  • S2 is a high performance replacement for Snappy.
  • Optimized deflate packages which can be used as a drop-in replacement for gzip, zip and zlib.
  • huff0 and FSE implementations for raw entropy encoding.
  • pgzip is a separate package that provides a very fast parallel gzip implementation.
  • fuzz package for fuzz testing all compressors/decompressors here.

changelog

  • Jan 14, 2021 (v1.11.7)

    • Use Bytes() interface to get bytes across packages. #309
    • s2: Add 'best' compression option. #310
    • s2: Add ReaderMaxBlockSize, changes s2.NewReader signature to include varargs. #311
    • s2: Fix crash on small better buffers. #308
    • s2: Clean up decoder. #312
  • Jan 7, 2021 (v1.11.6)

    • zstd: Make decoder allocations smaller #306
    • zstd: Free Decoder resources when Reset is called with a nil io.Reader #305
  • Dec 20, 2020 (v1.11.4)

    • zstd: Add Best compression mode #304
    • Add header decoder #299
    • s2: Add uncompressed stream option #297
    • Simplify/speed up small blocks with known max size. #300
    • zstd: Always reset literal dict encoder #303
  • Nov 15, 2020 (v1.11.3)

    • inflate: 10-15% faster decompression #293
    • zstd: Tweak DecodeAll default allocation #295
  • Oct 11, 2020 (v1.11.2)

    • s2: Fix out of bounds read in "better" block compression #291
  • Oct 1, 2020 (v1.11.1)

    • zstd: Set allLitEntropy true in default configuration #286
  • Sept 8, 2020 (v1.11.0)

    • zstd: Add experimental compression dictionaries #281
    • zstd: Fix mixed Write and ReadFrom calls #282
    • inflate/gz: Limit variable shifts, ~5% faster decompression #274
See changes prior to v1.11.0
  • July 8, 2020 (v1.10.11)

    • zstd: Fix extra block when compressing with ReadFrom. #278
    • huff0: Also populate compression table when reading decoding table. #275
  • June 23, 2020 (v1.10.10)

    • zstd: Skip entropy compression in fastest mode when no matches. #270
  • June 16, 2020 (v1.10.9):

    • zstd: API change for specifying dictionaries. See #268
    • zip: update CreateHeaderRaw to handle zip64 fields. #266
    • Fuzzit tests removed. The service has been purchased and is no longer available.
  • June 5, 2020 (v1.10.8):

    • 1.15x faster zstd block decompression. #265
  • June 1, 2020 (v1.10.7):

    • Added zstd decompression dictionary support
    • Increase zstd decompression speed up to 1.19x. #259
    • Remove internal reset call in zstd compression and reduce allocations. #263
  • May 21, 2020: (v1.10.6)

    • zstd: Reduce allocations while decoding. #258, #252
    • zstd: Stricter decompression checks.
  • April 12, 2020: (v1.10.5)

    • s2-commands: Flush output when receiving SIGINT. #239
  • Apr 8, 2020: (v1.10.4)

  • Mar 11, 2020: (v1.10.3)

    • s2: Use S2 encoder in pure Go mode for Snappy output as well. #245
    • s2: Fix pure Go block encoder. #244
    • zstd: Added "better compression" mode. #240
    • zstd: Improve speed of fastest compression mode by 5-10% #241
    • zstd: Skip creating encoders when not needed. #238
  • Feb 27, 2020: (v1.10.2)

    • Close to 50% speedup in inflate (gzip/zip decompression). #236 #234 #232
    • Reduce deflate level 1-6 memory usage up to 59%. #227
  • Feb 18, 2020: (v1.10.1)

    • Fix zstd crash when resetting multiple times without sending data. #226
    • deflate: Fix dictionary use on level 1-6. #224
    • Remove deflate writer reference when closing. #224
  • Feb 4, 2020: (v1.10.0)

    • Add optional dictionary to stateless deflate. Breaking change, send nil for previous behaviour. #216
    • Fix buffer overflow on repeated small block deflate. #218
    • Allow copying content from an existing ZIP file without decompressing+compressing. #214
    • Added S2 AMD64 assembler and various optimizations. Stream speed >10GB/s. #186
See changes prior to v1.10.0
  • Jan 20,2020 (v1.9.8) Optimize gzip/deflate with better size estimates and faster table generation. #207 by luyu6056, #206.
  • Jan 11, 2020: S2 Encode/Decode will use provided buffer if capacity is big enough. #204
  • Jan 5, 2020: (v1.9.7) Fix another zstd regression in v1.9.5 - v1.9.6 removed.
  • Jan 4, 2020: (v1.9.6) Regression in v1.9.5 fixed causing corrupt zstd encodes in rare cases.
  • Jan 4, 2020: Faster IO in s2c + s2d commandline tools compression/decompression. #192
  • Dec 29, 2019: Removed v1.9.5 since fuzz tests showed a compatibility problem with the reference zstandard decoder.
  • Dec 29, 2019: (v1.9.5) zstd: 10-20% faster block compression. #199
  • Dec 29, 2019: zip package updated with latest Go features
  • Dec 29, 2019: zstd: Single segment flag conditions tweaked. #197
  • Dec 18, 2019: s2: Faster compression when ReadFrom is used. #198
  • Dec 10, 2019: s2: Fix repeat length output when just above at 16MB limit.
  • Dec 10, 2019: zstd: Add function to get decoder as io.ReadCloser. #191
  • Dec 3, 2019: (v1.9.4) S2: limit max repeat length. #188
  • Dec 3, 2019: Add WithNoEntropyCompression to zstd #187
  • Dec 3, 2019: Reduce memory use for tests. Check for leaked goroutines.
  • Nov 28, 2019 (v1.9.3) Less allocations in stateless deflate.
  • Nov 28, 2019: 5-20% Faster huff0 decode. Impacts zstd as well. #184
  • Nov 12, 2019 (v1.9.2) Added Stateless Compression for gzip/deflate.
  • Nov 12, 2019: Fixed zstd decompression of large single blocks. #180
  • Nov 11, 2019: Set default s2c block size to 4MB.
  • Nov 11, 2019: Reduce inflate memory use by 1KB.
  • Nov 10, 2019: Less allocations in deflate bit writer.
  • Nov 10, 2019: Fix inconsistent error returned by zstd decoder.
  • Oct 28, 2019 (v1.9.1) zstd: Fix crash when compressing blocks. #174
  • Oct 24, 2019 (v1.9.0) zstd: Fix rare data corruption #173
  • Oct 24, 2019 zstd: Fix huff0 out of buffer write #171 and always return errors #172
  • Oct 10, 2019: Big deflate rewrite, 30-40% faster with better compression #105
See changes prior to v1.9.0
  • Oct 10, 2019: (v1.8.6) zstd: Allow partial reads to get flushed data. #169
  • Oct 3, 2019: Fix inconsistent results on broken zstd streams.
  • Sep 25, 2019: Added -rm (remove source files) and -q (no output except errors) to s2c and s2d commands
  • Sep 16, 2019: (v1.8.4) Add s2c and s2d commandline tools.
  • Sep 10, 2019: (v1.8.3) Fix s2 decoder Skip.
  • Sep 7, 2019: zstd: Added WithWindowSize, contributed by ianwilkes.
  • Sep 5, 2019: (v1.8.2) Add WithZeroFrames which adds full zero payload block encoding option.
  • Sep 5, 2019: Lazy initialization of zstandard predefined en/decoder tables.
  • Aug 26, 2019: (v1.8.1) S2: 1-2% compression increase in "better" compression mode.
  • Aug 26, 2019: zstd: Check maximum size of Huffman 1X compressed literals while decoding.
  • Aug 24, 2019: (v1.8.0) Added S2 compression, a high performance replacement for Snappy.
  • Aug 21, 2019: (v1.7.6) Fixed minor issues found by fuzzer. One could lead to zstd not decompressing.
  • Aug 18, 2019: Add fuzzit continuous fuzzing.
  • Aug 14, 2019: zstd: Skip incompressible data 2x faster. #147
  • Aug 4, 2019 (v1.7.5): Better literal compression. #146
  • Aug 4, 2019: Faster zstd compression. #143 #144
  • Aug 4, 2019: Faster zstd decompression. #145 #143 #142
  • July 15, 2019 (v1.7.4): Fix double EOF block in rare cases on zstd encoder.
  • July 15, 2019 (v1.7.3): Minor speedup/compression increase in default zstd encoder.
  • July 14, 2019: zstd decoder: Fix decompression error on multiple uses with mixed content.
  • July 7, 2019 (v1.7.2): Snappy update, zstd decoder potential race fix.
  • June 17, 2019: zstd decompression bugfix.
  • June 17, 2019: fix 32 bit builds.
  • June 17, 2019: Easier use in modules (less dependencies).
  • June 9, 2019: New stronger "default" zstd compression mode. Matches zstd default compression ratio.
  • June 5, 2019: 20-40% throughput in zstandard compression and better compression.
  • June 5, 2019: deflate/gzip compression: Reduce memory usage of lower compression levels.
  • June 2, 2019: Added zstandard compression!
  • May 25, 2019: deflate/gzip: 10% faster bit writer, mostly visible in lower levels.
  • Apr 22, 2019: zstd decompression added.
  • Aug 1, 2018: Added huff0 README.
  • Jul 8, 2018: Added Performance Update 2018 below.
  • Jun 23, 2018: Merged Go 1.11 inflate optimizations. Go 1.9 is now required. Backwards compatible version tagged with v1.3.0.
  • Apr 2, 2018: Added huff0 en/decoder. Experimental for now, API may change.
  • Mar 4, 2018: Added FSE Entropy en/decoder. Experimental for now, API may change.
  • Nov 3, 2017: Add compression Estimate function.
  • May 28, 2017: Reduce allocations when resetting decoder.
  • Apr 02, 2017: Change back to official crc32, since changes were merged in Go 1.7.
  • Jan 14, 2017: Reduce stack pressure due to array copies. See Issue #18625.
  • Oct 25, 2016: Level 2-4 have been rewritten and now offers significantly better performance than before.
  • Oct 20, 2016: Port zlib changes from Go 1.7 to fix zlib writer issue. Please update.
  • Oct 16, 2016: Go 1.7 changes merged. Apples to apples this package is a few percent faster, but has a significantly better balance between speed and compression per level.
  • Mar 24, 2016: Always attempt Huffman encoding on level 4-7. This improves base 64 encoded data compression.
  • Mar 24, 2016: Small speedup for level 1-3.
  • Feb 19, 2016: Faster bit writer, level -2 is 15% faster, level 1 is 4% faster.
  • Feb 19, 2016: Handle small payloads faster in level 1-3.
  • Feb 19, 2016: Added faster level 2 + 3 compression modes.
  • Feb 19, 2016: Rebalanced compression levels, so there is a more even progression in terms of compression. New default level is 5.
  • Feb 14, 2016: Snappy: Merge upstream changes.
  • Feb 14, 2016: Snappy: Fix aggressive skipping.
  • Feb 14, 2016: Snappy: Update benchmark.
  • Feb 13, 2016: Deflate: Fixed assembler problem that could lead to sub-optimal compression.
  • Feb 12, 2016: Snappy: Added AMD64 SSE 4.2 optimizations to matching, which makes easy to compress material run faster. Typical speedup is around 25%.
  • Feb 9, 2016: Added Snappy package fork. This version is 5-7% faster, much more on hard to compress content.
  • Jan 30, 2016: Optimize level 1 to 3 by not considering static dictionary or storing uncompressed. ~4-5% speedup.
  • Jan 16, 2016: Optimization on deflate level 1,2,3 compression.
  • Jan 8 2016: Merge CL 18317: fix reading, writing of zip64 archives.
  • Dec 8 2015: Make level 1 and -2 deterministic even if write size differs.
  • Dec 8 2015: Split encoding functions, so hashing and matching can potentially be inlined. 1-3% faster on AMD64. 5% faster on other platforms.
  • Dec 8 2015: Fixed rare one byte out-of bounds read. Please update!
  • Nov 23 2015: Optimization on token writer. ~2-4% faster. Contributed by @dsnet.
  • Nov 20 2015: Small optimization to bit writer on 64 bit systems.
  • Nov 17 2015: Fixed out-of-bound errors if the underlying Writer returned an error. See #15.
  • Nov 12 2015: Added io.WriterTo support to gzip/inflate.
  • Nov 11 2015: Merged CL 16669: archive/zip: enable overriding (de)compressors per file
  • Oct 15 2015: Added skipping on uncompressible data. Random data speed up >5x.

deflate usage

The packages are drop-in replacements for standard libraries. Simply replace the import path to use them:

old import        new import                             Documentation
compress/gzip     github.com/klauspost/compress/gzip     gzip
compress/zlib     github.com/klauspost/compress/zlib     zlib
archive/zip       github.com/klauspost/compress/zip      zip
compress/flate    github.com/klauspost/compress/flate    flate

You may also be interested in pgzip, a drop-in replacement for gzip that supports multithreaded compression on big files, and the optimized crc32 package used by these packages.

The packages expose the same API as the standard library, so you can use the standard library godoc for them: gzip, zip, zlib, flate.
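As a minimal sketch of what "drop-in" means in practice, the round trip below is written against the standard library so that it runs without extra dependencies. Assuming the replacement packages keep the same API, changing the marked import path should be the only edit needed. The helper name gzipRoundTrip is hypothetical and exists only for this illustration.

```go
package main

import (
	"bytes"
	"compress/gzip" // assumed swap: "github.com/klauspost/compress/gzip"
	"fmt"
	"io"
)

// gzipRoundTrip compresses data at the given level, decompresses it again,
// and returns the compressed size together with the recovered bytes.
func gzipRoundTrip(data []byte, level int) (int, []byte, error) {
	var buf bytes.Buffer
	zw, err := gzip.NewWriterLevel(&buf, level)
	if err != nil {
		return 0, nil, err
	}
	if _, err := zw.Write(data); err != nil {
		return 0, nil, err
	}
	// Close flushes pending data and writes the gzip footer.
	if err := zw.Close(); err != nil {
		return 0, nil, err
	}
	size := buf.Len()

	zr, err := gzip.NewReader(&buf)
	if err != nil {
		return 0, nil, err
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	return size, out, err
}

func main() {
	in := bytes.Repeat([]byte("hello, compress! "), 1000)
	size, out, err := gzipRoundTrip(in, gzip.DefaultCompression)
	if err != nil {
		panic(err)
	}
	fmt.Printf("in=%d compressed=%d roundtrip_ok=%v\n", len(in), size, bytes.Equal(in, out))
}
```

Only the import line names the implementation, so switching between the two packages for a benchmark is a one-line change.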

Currently there is only minor speedup on decompression (mostly CRC32 calculation).

Stateless compression

This package offers stateless compression as a special option for gzip/deflate. It will do compression but without maintaining any state between Write calls.

This means there will be no memory kept between Write calls, but compression and speed will be suboptimal.

This is only relevant in cases where you expect to run many thousands of compressors concurrently, but with very little activity. This is not intended for regular web servers serving individual requests.

Because no state is kept between calls, the size of each Write call affects the output size.

In gzip, specify level -3 / gzip.StatelessCompression to enable.

For direct deflate use, NewStatelessWriter and StatelessDeflate are available. See documentation

A bufio.Writer can of course be used to control write sizes. For example, to use a 4KB buffer:

	// replace 'ioutil.Discard' with your output.
	gzw, err := gzip.NewWriterLevel(ioutil.Discard, gzip.StatelessCompression)
	if err != nil {
		return err
	}
	defer gzw.Close()

	w := bufio.NewWriterSize(gzw, 4096)
	defer w.Flush()
	
	// Write to 'w' 

This will only use up to 4KB in memory when the writer is idle.

Compression is almost always worse than the fastest compression level and each write will allocate (a little) memory.
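To illustrate how a bufio.Writer controls the write sizes seen by the compressor, here is a small sketch that uses only the standard library. The countingWriter type is a hypothetical stand-in for the stateless gzip writer: it does no compression and only records the size of each Write call it receives.

```go
package main

import (
	"bufio"
	"fmt"
)

// countingWriter is a hypothetical stand-in for a stateless compressor.
// It records the size of every Write call it receives.
type countingWriter struct {
	sizes []int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.sizes = append(c.sizes, len(p))
	return len(p), nil
}

// bufferedWriteSizes issues `writes` writes of `chunk` bytes through a
// bufio.Writer with a bufSize-byte buffer and returns the write sizes
// observed by the underlying writer.
func bufferedWriteSizes(writes, chunk, bufSize int) ([]int, error) {
	cw := &countingWriter{}
	w := bufio.NewWriterSize(cw, bufSize)
	p := make([]byte, chunk)
	for i := 0; i < writes; i++ {
		if _, err := w.Write(p); err != nil {
			return nil, err
		}
	}
	// Flush pushes the final partial buffer to the underlying writer.
	if err := w.Flush(); err != nil {
		return nil, err
	}
	return cw.sizes, nil
}

func main() {
	sizes, err := bufferedWriteSizes(100, 100, 4096)
	if err != nil {
		panic(err)
	}
	fmt.Println("write sizes seen by the underlying writer:", sizes)
}
```

With one hundred 100-byte writes and a 4KB buffer, the underlying writer should see a handful of writes of at most 4096 bytes each instead of one hundred tiny ones, which is the effect the stateless mode benefits from.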

Performance Update 2018

It has been a while since I last looked at the speed of this package compared to the standard library, so I thought I would re-do my tests and give some overall recommendations based on the current state. All benchmarks have been performed with Go 1.10 on my Desktop Intel(R) Core(TM) i7-2600 CPU @3.40GHz. Since I last ran the tests, I have gotten more RAM, which means tests with big files are no longer limited by my SSD.

The raw results are in my updated spreadsheet. Due to cgo changes and upstream updates I could not get the cgo version of gzip to compile. Instead I included the zstd cgo implementation. If I get cgo gzip to work again, I might replace the results in the sheet.

The columns to take note of are:

  • MB/s: the throughput.
  • Reduction: the data size reduction in percent of the original.
  • Rel Speed: the speed relative to the standard library at the same level.
  • Smaller: how many percent smaller the compressed output is compared to stdlib. Negative means the output was bigger.
  • Loss: the loss (or gain) in compression as a percentage difference of the input.
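The column definitions above can be read as simple ratios. The sketch below is one interpretation of them, not the spreadsheet's actual formulas; the function names and the sample numbers are made up for illustration. Loss is left out because its exact definition is not spelled out here.

```go
package main

import "fmt"

// reductionPercent is the data size reduction as a percentage of the original size.
func reductionPercent(originalBytes, compressedBytes int64) float64 {
	return 100 * float64(originalBytes-compressedBytes) / float64(originalBytes)
}

// relSpeed is this package's throughput divided by the standard library's
// throughput at the same level (both in MB/s).
func relSpeed(pkgMBps, stdlibMBps float64) float64 {
	return pkgMBps / stdlibMBps
}

// smallerPercent is how many percent smaller this package's output is than
// the standard library's. A negative value means the output was bigger.
func smallerPercent(stdlibBytes, pkgBytes int64) float64 {
	return 100 * float64(stdlibBytes-pkgBytes) / float64(stdlibBytes)
}

func main() {
	// Illustrative numbers only; they are not taken from the spreadsheet.
	fmt.Printf("Reduction: %.1f%%\n", reductionPercent(1000, 250))
	fmt.Printf("Rel Speed: %.2fx\n", relSpeed(300, 150))
	fmt.Printf("Smaller:   %.1f%%\n", smallerPercent(200, 212))
}
```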

The gzstd (standard library gzip) and gzkp (this package's gzip) use only one CPU core. pgzip and bgzf use all 4 cores. zstd uses one core, and is a beast (but not Go, yet).

Overall differences.

There appears to be a roughly 5-10% speed advantage over the standard library when comparing at similar compression levels.

The biggest difference you will see is the result of re-balancing the compression levels. I wanted my library to give a smoother transition between the compression levels than the standard library.

This package attempts to provide a smoother transition, where "1" takes a lot of shortcuts, "5" is the reasonable trade-off, "9" is "give me the best compression", and the values in between give something reasonable in between. The standard library has big differences in levels 1-4, but levels 5-9 offer no significant gains, often spending a lot more time than can be justified by the achieved compression.

There are links to all the test data in the spreadsheet in the top left field on each tab.

Web Content

This test set aims to emulate typical use in a web server. The test-set is 4GB data in 53k files, and is a mixture of (mostly) HTML, JS, CSS.

Since level 1 and 9 are close to being the same code, they are quite close. But looking at the levels in-between the differences are quite big.

Looking at level 6, this package is 88% faster, but will output about 6% more data. For a web server, this means you can serve 88% more data, but have to pay for 6% more bandwidth. You can draw your own conclusions on what would be the most expensive for your case.

Object files

This test is for typical data files stored on a server. In this case it is a collection of Go precompiled objects. They are very compressible.

The picture is similar to the web content, but with small differences since this is very compressible. Levels 2-3 offer good speed, but sacrifice quite a bit of compression.

The standard library seems suboptimal on level 3 and 4 - offering both worse compression and speed than level 6 & 7 of this package respectively.

Highly Compressible File

This is a JSON file with very high redundancy. The reduction starts at 95% on level 1, so in real life terms we are dealing with something like a highly redundant stream of data, etc.

It is definitely visible that we are dealing with specialized content here, so the results are very scattered. This package does not do very well at levels 1-4, but picks up significantly at level 5, and levels 7 and 8 offer great speed for the achieved compression.

So if you know your content is extremely compressible you might want to go slightly higher than the defaults. The standard library has a huge gap between levels 3 and 4 in terms of speed (2.75x slowdown), so it offers little "middle ground".

Medium-High Compressible

This is a pretty common test corpus: enwik9. It contains the first 10^9 bytes of the English Wikipedia dump on Mar. 3, 2006. This is a very good test of typical text based compression and more data heavy streams.

We see a similar picture here as in "Web Content". On equal levels some compression is sacrificed for more speed. Level 5 seems to be the best trade-off between speed and size, beating stdlib level 3 in both.

Medium Compressible

I will combine two test sets, one 10GB file set and a VM disk image (~8GB). Both contain different data types and represent a typical backup scenario.

The most notable thing is how quickly the standard library drops to very low compression speeds around level 5-6 without any big gains in compression. Since this type of data is fairly common, this does not seem like good behavior.

Un-compressible Content

This is mainly a test of how good the algorithms are at detecting un-compressible input. The standard library only offers this feature with very conservative settings at level 1. Obviously there is no reason for the algorithms to try to compress input that cannot be compressed. The only downside is that it might skip some compressible data on false detections.

linear time compression (huffman only)

This compression library adds a special compression level, named HuffmanOnly, which allows near linear time compression. This is done by completely disabling matching of previous data, and only reducing the number of bits used to represent each character.

This means that often used characters, like 'e' and ' ' (space) in text, use the fewest bits to represent, and rare characters like '¤' take more bits to represent. For more information see wikipedia or this nice video.

Since this type of compression has much less variance, the compression speed is mostly unaffected by the input data, and is usually more than 180MB/s for a single core.

The downside is that the compression ratio is usually considerably worse than even the fastest conventional compression. The compression ratio can never be better than 8:1 (12.5%).

The linear time compression can be used as a "better than nothing" mode, where you cannot risk the encoder slowing down on some content. For comparison, the size of the "Twain" text is 233460 bytes (+29% vs. level 1) and encode speed is 144MB/s (4.5x level 1). So in this case you trade a 30% size increase for a 4 times speedup.

For more information see my blog post on Fast Linear Time Compression.

This is implemented in Go 1.7 as "Huffman Only" mode, though not exposed for gzip.
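A minimal sketch of the Huffman-only mode, using the flate.HuffmanOnly level from the standard library's compress/flate so that it runs without extra dependencies. The input is deliberately the best case for this mode (one repeated byte), which also illustrates the 8:1 bound: every input byte still costs at least one bit, so the output cannot shrink below one eighth of the input. The helper names are hypothetical.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// huffmanOnlyCompress deflates data with matching disabled, so only
// entropy (Huffman) coding is applied.
func huffmanOnlyCompress(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	fw, err := flate.NewWriter(&buf, flate.HuffmanOnly)
	if err != nil {
		return nil, err
	}
	if _, err := fw.Write(data); err != nil {
		return nil, err
	}
	if err := fw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// inflate decompresses a raw deflate stream.
func inflate(compressed []byte) ([]byte, error) {
	fr := flate.NewReader(bytes.NewReader(compressed))
	defer fr.Close()
	return io.ReadAll(fr)
}

// huffmanOnlyStats compresses n copies of the byte 'a' and reports the
// compressed size and whether decompression restored the input.
func huffmanOnlyStats(n int) (int, bool, error) {
	in := bytes.Repeat([]byte{'a'}, n)
	comp, err := huffmanOnlyCompress(in)
	if err != nil {
		return 0, false, err
	}
	out, err := inflate(comp)
	if err != nil {
		return 0, false, err
	}
	return len(comp), bytes.Equal(in, out), nil
}

func main() {
	const n = 80000
	size, ok, err := huffmanOnlyStats(n)
	if err != nil {
		panic(err)
	}
	fmt.Printf("in=%d out=%d ratio=%.2f:1 roundtrip_ok=%v\n", n, size, float64(n)/float64(size), ok)
}
```

A regular compression level would collapse this input to a few dozen bytes by matching previous data, which is the trade-off described above: predictable speed in exchange for a capped ratio.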

snappy package

The standard snappy package has now been improved. This repo contains a copy of the snappy repo.

I would advise using the standard package: https://github.com/golang/snappy

license

This code is licensed under the same conditions as the original Go code. See LICENSE file.

Issues
  • Translate assembly text templates into avo programs

    The assembler procedures are currently generated from text templates. As @klauspost suggested, it would be easier to maintain this quite complex code with avo (https://github.com/mmcloughlin/avo).

    opened by WojciechMula 25
  • zstd: add no-goroutine option to encoding

    To start, I appreciate that the concurrency allows for greater speed, it's great!

    Backstory

    I have a project that manages different types of encoding options for thousands of configs. As a rule, when compressing data for an individual config, we only want to compress serially; we have thousands of other concurrent compressors and we do not want one utilizing more CPU than necessary.

    Our current use of the zstd encoder is to create one per config and use the WithEncoderConcurrency(1) option. This works for the most part, but we now have to be careful about properly closing the zstd compressor in the face of write errors. For other compressors, we can just drop the writer and have it be garbage collected with no dangling resources. For the zstd compressor, unless things are closed properly, goroutines leak.

    An alternative option for me would be to use a global zstd encoder with the default encoder concurrency and just never close it. I'm not much of a fan of this approach, though, since with poor scheduling, some goroutines could sit in the compressing zstd goroutines for longer than they need to and block other configs that need to be zstd compressing. It's likely not a large risk, but it's one I'm concerned about.

    Feature Request

    I think it'd be both convenient and faster if there was an option to not spawn goroutines for encoding. Convenient in that when I am done with the compressor, I can just drop it. This would also make the zstd encoder an option to use in sync.Pool's, where it is not an option today. Faster in that there will not be goroutine synchronization and message passing overhead, especially so since I know I'm always encoding serially per encoder.

    enhancement 
    opened by twmb 22
  • zstd: x86 assembler implementation of sequenceDecs.decode

    This is plain x86 and x86 with BMI2 implementation of sequenceDecs.decode. Part of #515.

    Since the benchmarks use decodeSync I temporarily replaced its implementation with one using decode and execute, at cost of allocation of the seqVals array every time.

    There are some IMHO nice improvements and small regressions in a few cases. From my previous experience I can tell that we'll get quite a big speedup when we rewrite execute. And of course we'll get the biggest speedup when we fuse decode and execute into a single procedure.

    ~~Marking PR as a draft as just one test TestNewDecoderBad/Reader-4/6f88497edbc9059998f9e6d0ea0d0eed8d8af38d.zst fails. Have to investigate why.~~ [fixed]

    Below are benchmarks.

    • old.txt was produced by the command go generate && go test -tags noasm -run XYZ -bench BenchmarkDecoder.
    • new.txt was produced by the command go generate && go test -run XYZ -bench BenchmarkDecoder.
    • new-bmi2.txt was produced by the command go generate && GOAMD64=v3 go test -run XYZ -bench BenchmarkDecoder.

    Comparison of old.txt with new.txt

    benchmark                                                                 old MB/s     new MB/s     speedup
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-16                            237.60       238.72       1.00x
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-16                        764.22       868.49       1.14x
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-16                         192.39       197.70       1.03x
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-16                           226.37       233.74       1.03x
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-16                         211.46       216.24       1.02x
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-16                          189.03       190.01       1.01x
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-16                             1743.29      1951.14      1.12x
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-16                       3079.05      3309.04      1.07x
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-16                       6696.56      7926.76      1.18x
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-16                             337.14       365.98       1.09x
    BenchmarkDecoder_DecoderSmall/html.zst-16                                 613.59       687.53       1.12x
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-16                        345.78       374.54       1.08x
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-16                               244.27       241.55       0.99x
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-16                           785.09       912.82       1.16x
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-16                            200.18       203.32       1.02x
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-16                              237.47       239.27       1.01x
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-16                            211.75       214.06       1.01x
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-16                             190.52       188.88       0.99x
    BenchmarkDecoder_DecodeAll/html_x_4.zst-16                                1397.55      1488.07      1.06x
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-16                          3428.09      3716.01      1.08x
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-16                          9548.90      10887.83     1.14x
    BenchmarkDecoder_DecodeAll/urls.10K.zst-16                                356.48       386.03       1.08x
    BenchmarkDecoder_DecodeAll/html.zst-16                                    598.01       666.57       1.11x
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-16                           333.55       364.23       1.09x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/fastest-16      208.57       222.10       1.06x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/default-16      206.44       206.26       1.00x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/better-16       219.64       224.09       1.02x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/best-16         215.01       212.89       0.99x
    BenchmarkDecoder_DecodeAllFiles/e.txt/fastest-16                          9564.96      10855.81     1.13x
    BenchmarkDecoder_DecodeAllFiles/e.txt/default-16                          230.88       242.77       1.05x
    BenchmarkDecoder_DecodeAllFiles/e.txt/better-16                           300.04       357.40       1.19x
    BenchmarkDecoder_DecodeAllFiles/e.txt/best-16                             481.01       629.48       1.31x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/fastest-16              1153.28      1167.83      1.01x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/default-16              1178.62      1197.84      1.02x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/better-16               1061.04      1085.91      1.02x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/best-16                 431.41       434.83       1.01x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/fastest-16                 260.26       288.78       1.11x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/default-16                 178.69       182.34       1.02x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/better-16                  178.37       183.01       1.03x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/best-16                    163.05       166.78       1.02x
    BenchmarkDecoder_DecodeAllFiles/html.txt/fastest-16                       379.84       413.38       1.09x
    BenchmarkDecoder_DecodeAllFiles/html.txt/default-16                       364.05       398.18       1.09x
    BenchmarkDecoder_DecodeAllFiles/html.txt/better-16                        396.48       440.21       1.11x
    BenchmarkDecoder_DecodeAllFiles/html.txt/best-16                          346.17       375.39       1.08x
    BenchmarkDecoder_DecodeAllFiles/pi.txt/fastest-16                         9561.29      10873.23     1.14x
    BenchmarkDecoder_DecodeAllFiles/pi.txt/default-16                         224.96       240.51       1.07x
    BenchmarkDecoder_DecodeAllFiles/pi.txt/better-16                          303.94       364.00       1.20x
    BenchmarkDecoder_DecodeAllFiles/pi.txt/best-16                            476.76       635.21       1.33x
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/fastest-16                    1602.64      1693.12      1.06x
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/default-16                    1470.20      1556.52      1.06x
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/better-16                     1781.10      1894.80      1.06x
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/best-16                       1542.00      1661.43      1.08x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/fastest-16                     9556.32      10879.80     1.14x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/default-16                     9571.19      10882.04     1.14x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/better-16                      9572.19      10887.11     1.14x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/best-16                        9566.54      10886.24     1.14x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/fastest-16     1343.71      1607.61      1.20x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/default-16     1311.12      1407.07      1.07x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/better-16      1401.88      1587.61      1.13x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/best-16        1402.53      1484.78      1.06x
    BenchmarkDecoder_DecodeAllFilesP/e.txt/fastest-16                         66679.48     94849.27     1.42x
    BenchmarkDecoder_DecodeAllFilesP/e.txt/default-16                         1306.06      1502.06      1.15x
    BenchmarkDecoder_DecodeAllFilesP/e.txt/better-16                          1927.42      2336.68      1.21x
    BenchmarkDecoder_DecodeAllFilesP/e.txt/best-16                            3472.36      4863.07      1.40x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/fastest-16             6276.41      6383.51      1.02x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/default-16             5490.46      5771.14      1.05x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/better-16              6008.30      6052.66      1.01x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/best-16                3799.77      3895.76      1.03x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/fastest-16                1412.04      1441.62      1.02x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/default-16                962.89       949.33       0.99x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/better-16                 984.52       969.09       0.98x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/best-16                   801.71       795.11       0.99x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/fastest-16                      1738.74      1974.58      1.14x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/default-16                      1633.98      1841.22      1.13x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/better-16                       1817.99      2012.59      1.11x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/best-16                         1717.94      1874.86      1.09x
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/fastest-16                        66526.60     96359.49     1.45x
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/default-16                        1268.55      1490.20      1.17x
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/better-16                         1947.34      2373.92      1.22x
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/best-16                           3458.55      4850.24      1.40x
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/fastest-16                   8243.76      8724.07      1.06x
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/default-16                   8197.25      8948.34      1.09x
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/better-16                    9020.42      9939.28      1.10x
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/best-16                      10641.45     11529.73     1.08x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/fastest-16                    66560.21     95518.08     1.44x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/default-16                    66587.20     94626.59     1.42x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/better-16                     66651.43     94356.64     1.42x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/best-16                       66512.25     95444.30     1.43x
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-16                       1466.81      1604.80      1.09x
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-16                   4831.35      5497.25      1.14x
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-16                    1215.11      1374.36      1.13x
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-16                      1464.34      1623.73      1.11x
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-16                    1240.22      1406.06      1.13x
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-16                     1094.77      1206.81      1.10x
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-16                        10029.54     11377.73     1.13x
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-16                  19167.95     22324.31     1.16x
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-16                  66383.12     95910.86     1.44x
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-16                        2687.11      3090.49      1.15x
    BenchmarkDecoder_DecodeAllParallel/html.zst-16                            3748.35      4307.61      1.15x
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-16                   1959.71      2145.97      1.10x
    

    Comparison of old.txt with new-bmi2.txt

    benchmark                                                                 old MB/s     new MB/s     speedup
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-16                            237.60       238.72       1.00x
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-16                        764.22       868.49       1.14x
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-16                         192.39       197.70       1.03x
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-16                           226.37       233.74       1.03x
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-16                         211.46       216.24       1.02x
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-16                          189.03       190.01       1.01x
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-16                             1743.29      1951.14      1.12x
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-16                       3079.05      3309.04      1.07x
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-16                       6696.56      7926.76      1.18x
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-16                             337.14       365.98       1.09x
    BenchmarkDecoder_DecoderSmall/html.zst-16                                 613.59       687.53       1.12x
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-16                        345.78       374.54       1.08x
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-16                               244.27       241.55       0.99x
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-16                           785.09       912.82       1.16x
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-16                            200.18       203.32       1.02x
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-16                              237.47       239.27       1.01x
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-16                            211.75       214.06       1.01x
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-16                             190.52       188.88       0.99x
    BenchmarkDecoder_DecodeAll/html_x_4.zst-16                                1397.55      1488.07      1.06x
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-16                          3428.09      3716.01      1.08x
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-16                          9548.90      10887.83     1.14x
    BenchmarkDecoder_DecodeAll/urls.10K.zst-16                                356.48       386.03       1.08x
    BenchmarkDecoder_DecodeAll/html.zst-16                                    598.01       666.57       1.11x
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-16                           333.55       364.23       1.09x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/fastest-16      208.57       222.10       1.06x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/default-16      206.44       206.26       1.00x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/better-16       219.64       224.09       1.02x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/best-16         215.01       212.89       0.99x
    BenchmarkDecoder_DecodeAllFiles/e.txt/fastest-16                          9564.96      10855.81     1.13x
    BenchmarkDecoder_DecodeAllFiles/e.txt/default-16                          230.88       242.77       1.05x
    BenchmarkDecoder_DecodeAllFiles/e.txt/better-16                           300.04       357.40       1.19x
    BenchmarkDecoder_DecodeAllFiles/e.txt/best-16                             481.01       629.48       1.31x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/fastest-16              1153.28      1167.83      1.01x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/default-16              1178.62      1197.84      1.02x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/better-16               1061.04      1085.91      1.02x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/best-16                 431.41       434.83       1.01x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/fastest-16                 260.26       288.78       1.11x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/default-16                 178.69       182.34       1.02x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/better-16                  178.37       183.01       1.03x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/best-16                    163.05       166.78       1.02x
    BenchmarkDecoder_DecodeAllFiles/html.txt/fastest-16                       379.84       413.38       1.09x
    BenchmarkDecoder_DecodeAllFiles/html.txt/default-16                       364.05       398.18       1.09x
    BenchmarkDecoder_DecodeAllFiles/html.txt/better-16                        396.48       440.21       1.11x
    BenchmarkDecoder_DecodeAllFiles/html.txt/best-16                          346.17       375.39       1.08x
    BenchmarkDecoder_DecodeAllFiles/pi.txt/fastest-16                         9561.29      10873.23     1.14x
    BenchmarkDecoder_DecodeAllFiles/pi.txt/default-16                         224.96       240.51       1.07x
    BenchmarkDecoder_DecodeAllFiles/pi.txt/better-16                          303.94       364.00       1.20x
    BenchmarkDecoder_DecodeAllFiles/pi.txt/best-16                            476.76       635.21       1.33x
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/fastest-16                    1602.64      1693.12      1.06x
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/default-16                    1470.20      1556.52      1.06x
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/better-16                     1781.10      1894.80      1.06x
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/best-16                       1542.00      1661.43      1.08x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/fastest-16                     9556.32      10879.80     1.14x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/default-16                     9571.19      10882.04     1.14x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/better-16                      9572.19      10887.11     1.14x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/best-16                        9566.54      10886.24     1.14x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/fastest-16     1343.71      1607.61      1.20x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/default-16     1311.12      1407.07      1.07x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/better-16      1401.88      1587.61      1.13x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/best-16        1402.53      1484.78      1.06x
    BenchmarkDecoder_DecodeAllFilesP/e.txt/fastest-16                         66679.48     94849.27     1.42x
    BenchmarkDecoder_DecodeAllFilesP/e.txt/default-16                         1306.06      1502.06      1.15x
    BenchmarkDecoder_DecodeAllFilesP/e.txt/better-16                          1927.42      2336.68      1.21x
    BenchmarkDecoder_DecodeAllFilesP/e.txt/best-16                            3472.36      4863.07      1.40x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/fastest-16             6276.41      6383.51      1.02x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/default-16             5490.46      5771.14      1.05x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/better-16              6008.30      6052.66      1.01x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/best-16                3799.77      3895.76      1.03x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/fastest-16                1412.04      1441.62      1.02x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/default-16                962.89       949.33       0.99x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/better-16                 984.52       969.09       0.98x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/best-16                   801.71       795.11       0.99x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/fastest-16                      1738.74      1974.58      1.14x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/default-16                      1633.98      1841.22      1.13x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/better-16                       1817.99      2012.59      1.11x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/best-16                         1717.94      1874.86      1.09x
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/fastest-16                        66526.60     96359.49     1.45x
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/default-16                        1268.55      1490.20      1.17x
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/better-16                         1947.34      2373.92      1.22x
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/best-16                           3458.55      4850.24      1.40x
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/fastest-16                   8243.76      8724.07      1.06x
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/default-16                   8197.25      8948.34      1.09x
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/better-16                    9020.42      9939.28      1.10x
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/best-16                      10641.45     11529.73     1.08x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/fastest-16                    66560.21     95518.08     1.44x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/default-16                    66587.20     94626.59     1.42x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/better-16                     66651.43     94356.64     1.42x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/best-16                       66512.25     95444.30     1.43x
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-16                       1466.81      1604.80      1.09x
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-16                   4831.35      5497.25      1.14x
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-16                    1215.11      1374.36      1.13x
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-16                      1464.34      1623.73      1.11x
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-16                    1240.22      1406.06      1.13x
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-16                     1094.77      1206.81      1.10x
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-16                        10029.54     11377.73     1.13x
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-16                  19167.95     22324.31     1.16x
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-16                  66383.12     95910.86     1.44x
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-16                        2687.11      3090.49      1.15x
    BenchmarkDecoder_DecodeAllParallel/html.zst-16                            3748.35      4307.61      1.15x
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-16                   1959.71      2145.97      1.10x
    
    opened by WojciechMula 16
  • [huff0] Add x86 specialisation of Decode4X

    Hi, first of all, thank you for such a great library! I have been working on speeding up Zstd decompression, mainly by porting hot loops to assembly. This is the first PR; it's pretty small, and I'd like to use it as an opportunity to discuss the shape of the code and whether this approach is acceptable.

    I'm marking it as a draft because not all Zstd tests pass right now; I branched some time ago, and it seems there were some changes I have to investigate.

    Anyway, below is a comparison of Zstd decompression speed after applying the patch. Benchmarks were run on an Ice Lake machine.

    benchmark                                                                 old ns/op     new ns/op     delta
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-16                            5064292       5055128       -0.18%
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-16                        924146        889296        -3.77%
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-16                           12552928      12475253      -0.62%
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-16                         3884720       3815638       -1.78%
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-16                          5320410       5307378       -0.24%
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-16                             1458895       1419301       -2.71%
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-16                       218670        219073        +0.18%
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-16                       129945        121948        -6.15%
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-16                             14234754      13921832      -2.20%
    BenchmarkDecoder_DecoderSmall/html.zst-16                                 1028808       1002782       -2.53%
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-16                        83589         77477         -7.31%
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-16                               605687        603714        -0.33%
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-16                           111881        106144        -5.13%
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-16                              1445193       1424825       -1.41%
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-16                            481721        470827        -2.26%
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-16                             643764        641131        -0.41%
    BenchmarkDecoder_DecodeAll/html_x_4.zst-16                                234945        233164        -0.76%
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-16                          23411         23627         +0.92%
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-16                          11293         11302         +0.08%
    BenchmarkDecoder_DecodeAll/urls.10K.zst-16                                1644661       1592756       -3.16%
    BenchmarkDecoder_DecodeAll/html.zst-16                                    126047        121406        -3.68%
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-16                           10396         9758          -6.14%
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/fastest-16      1503668       1447556       -3.73%
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/default-16      1526024       1498882       -1.78%
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/better-16       1453765       1415595       -2.63%
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/best-16         1497672       1473028       -1.65%
    BenchmarkDecoder_DecodeAllFiles/e.txt/fastest-16                          9182          9186          +0.04%
    BenchmarkDecoder_DecodeAllFiles/e.txt/default-16                          355812        359097        +0.92%
    BenchmarkDecoder_DecodeAllFiles/e.txt/better-16                           271279        273078        +0.66%
    BenchmarkDecoder_DecodeAllFiles/e.txt/best-16                             186720        192951        +3.34%
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/fastest-16              3131          3181          +1.60%
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/default-16              2929          2949          +0.68%
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/better-16               3387          3451          +1.89%
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/best-16                 9063          9140          +0.85%
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/fastest-16                 5390          4876          -9.54%
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/default-16                 7683          7746          +0.82%
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/better-16                  7679          7760          +1.05%
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/best-16                    7923          7905          -0.23%
    BenchmarkDecoder_DecodeAllFiles/html.txt/fastest-16                       91261         84536         -7.37%
    BenchmarkDecoder_DecodeAllFiles/html.txt/default-16                       94548         89934         -4.88%
    BenchmarkDecoder_DecodeAllFiles/html.txt/better-16                        87788         83488         -4.90%
    BenchmarkDecoder_DecodeAllFiles/html.txt/best-16                          99753         97126         -2.63%
    BenchmarkDecoder_DecodeAllFiles/pi.txt/fastest-16                         9213          9190          -0.25%
    BenchmarkDecoder_DecodeAllFiles/pi.txt/default-16                         360974        364403        +0.95%
    BenchmarkDecoder_DecodeAllFiles/pi.txt/better-16                          268990        269969        +0.36%
    BenchmarkDecoder_DecodeAllFiles/pi.txt/best-16                            185887        192209        +3.40%
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/fastest-16                    27607         27162         -1.61%
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/default-16                    30728         30104         -2.03%
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/better-16                     25116         24681         -1.73%
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/best-16                       30037         29093         -3.14%
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/fastest-16                     9178          9183          +0.05%
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/default-16                     9170          9174          +0.04%
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/better-16                      9171          9179          +0.09%
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/best-16                        9174          9185          +0.12%
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/fastest-16     160969        171452        +6.51%
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/default-16     172487        157895        -8.46%
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/better-16      159815        145387        -9.03%
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/best-16        160579        155217        -3.34%
    BenchmarkDecoder_DecodeAllFilesP/e.txt/fastest-16                         1077          1038          -3.62%
    BenchmarkDecoder_DecodeAllFilesP/e.txt/default-16                         42039         43817         +4.23%
    BenchmarkDecoder_DecodeAllFilesP/e.txt/better-16                          33769         34422         +1.93%
    BenchmarkDecoder_DecodeAllFilesP/e.txt/best-16                            25179         25295         +0.46%
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/fastest-16             493           507           +2.94%
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/default-16             495           503           +1.68%
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/better-16              517           550           +6.41%
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/best-16                888           880           -0.89%
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/fastest-16                756           693           -8.25%
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/default-16                877           892           +1.66%
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/better-16                 834           875           +4.99%
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/best-16                   1001          941           -6.01%
    BenchmarkDecoder_DecodeAllFilesP/html.txt/fastest-16                      13288         12076         -9.12%
    BenchmarkDecoder_DecodeAllFilesP/html.txt/default-16                      14990         12745         -14.98%
    BenchmarkDecoder_DecodeAllFilesP/html.txt/better-16                       12205         11149         -8.65%
    BenchmarkDecoder_DecodeAllFilesP/html.txt/best-16                         13920         12165         -12.61%
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/fastest-16                        1039          1025          -1.35%
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/default-16                        43506         42534         -2.23%
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/better-16                         33428         33474         +0.14%
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/best-16                           25168         25283         +0.46%
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/fastest-16                   3781          3688          -2.46%
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/default-16                   3976          3873          -2.59%
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/better-16                    3204          3178          -0.81%
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/best-16                      3605          3329          -7.66%
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/fastest-16                    1031          1029          -0.19%
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/default-16                    1028          1081          +5.16%
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/better-16                     1034          1029          -0.48%
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/best-16                       1028          1038          +0.97%
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-16                       83577         71934         -13.93%
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-16                   16304         14373         -11.84%
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-16                      209034        177289        -15.19%
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-16                    67474         58313         -13.58%
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-16                     90482         78433         -13.32%
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-16                        29896         28681         -4.06%
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-16                  3488          3332          -4.47%
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-16                  1261          1246          -1.19%
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-16                        210072        182726        -13.02%
    BenchmarkDecoder_DecodeAllParallel/html.zst-16                            18952         16590         -12.46%
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-16                   1496          1332          -10.96%
    
    opened by WojciechMula 16
  • zstd: asm version of decodeSync

    A little hacking in the current generator allowed us to reuse almost all code. That's nice!

    Part of #515.

    For now, go test -run TestDecoder passes; I'm working on fixing the remaining tests. Also, this PR does not yet incorporate history support (waiting for #542).

    Benchmark results from an Ice Lake machine. There are some quite good speedups!

    benchmark                                                                 old ns/op     new ns/op     delta
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-16                            4948849       3115659       -37.04%
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-16                        884190        525662        -40.55%
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-16                         16820567      13477758      -19.87%
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-16                           12468563      9806714       -21.35%
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-16                         3829515       1815824       -52.58%
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-16                          5235045       2684060       -48.73%
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-16                             1494667       1004488       -32.80%
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-16                       226176        180516        -20.19%
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-16                       126631        125630        -0.79%
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-16                             14001037      12040286      -14.00%
    BenchmarkDecoder_DecoderSmall/html.zst-16                                 1013496       580645        -42.71%
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-16                        78160         62866         -19.57%
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-16                               592552        291325        -50.84%
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-16                           106829        62462         -41.53%
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-16                            1901163       902247        -52.54%
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-16                              1417726       673019        -52.53%
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-16                            475107        223363        -52.99%
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-16                             634311        277013        -56.33%
    BenchmarkDecoder_DecodeAll/html_x_4.zst-16                                299068        201638        -32.58%
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-16                          24184         18814         -22.20%
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-16                          11298         11294         -0.04%
    BenchmarkDecoder_DecodeAll/urls.10K.zst-16                                1568542       949901        -39.44%
    BenchmarkDecoder_DecodeAll/html.zst-16                                    124118        70389         -43.29%
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-16                           9838          7870          -20.00%
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/fastest-16      1452726       760812        -47.63%
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/default-16      1495242       710737        -52.47%
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/better-16       1407918       696247        -50.55%
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/best-16         1459001       698224        -52.14%
    BenchmarkDecoder_DecodeAllFiles/e.txt/fastest-16                          9190          9184          -0.07%
    BenchmarkDecoder_DecodeAllFiles/e.txt/default-16                          346024        169963        -50.88%
    BenchmarkDecoder_DecodeAllFiles/e.txt/better-16                           249534        148022        -40.68%
    BenchmarkDecoder_DecodeAllFiles/e.txt/best-16                             148253        132778        -10.44%
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/fastest-16              3584          3385          -5.55%
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/default-16              3293          2842          -13.70%
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/better-16               3897          3796          -2.59%
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/best-16                 9021          11001         +21.95%
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/fastest-16                 4922          4361          -11.40%
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/default-16                 7764          6750          -13.06%
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/better-16                  7758          6751          -12.98%
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/best-16                    7987          6581          -17.60%
    BenchmarkDecoder_DecodeAllFiles/html.txt/fastest-16                       86160         46936         -45.52%
    BenchmarkDecoder_DecodeAllFiles/html.txt/default-16                       92630         47010         -49.25%
    BenchmarkDecoder_DecodeAllFiles/html.txt/better-16                        85408         44996         -47.32%
    BenchmarkDecoder_DecodeAllFiles/html.txt/best-16                          102427        46479         -54.62%
    BenchmarkDecoder_DecodeAllFiles/pi.txt/fastest-16                         9195          9192          -0.03%
    BenchmarkDecoder_DecodeAllFiles/pi.txt/default-16                         351130        170379        -51.48%
    BenchmarkDecoder_DecodeAllFiles/pi.txt/better-16                          245196        147856        -39.70%
    BenchmarkDecoder_DecodeAllFiles/pi.txt/best-16                            146306        132574        -9.39%
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/fastest-16                    31964         25333         -20.75%
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/default-16                    33961         31073         -8.50%
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/better-16                     26667         24121         -9.55%
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/best-16                       34375         31844         -7.36%
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/fastest-16                     9212          9188          -0.26%
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/default-16                     9176          9176          +0.00%
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/better-16                      9179          9182          +0.03%
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/best-16                        9190          9193          +0.03%
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/fastest-16     182557        85521         -53.15%
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/default-16     160984        83577         -48.08%
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/better-16      147865        78830         -46.69%
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/best-16        157513        81901         -48.00%
    BenchmarkDecoder_DecodeAllFilesP/e.txt/fastest-16                         1034          1029          -0.48%
    BenchmarkDecoder_DecodeAllFilesP/e.txt/default-16                         45459         22683         -50.10%
    BenchmarkDecoder_DecodeAllFilesP/e.txt/better-16                          29775         18967         -36.30%
    BenchmarkDecoder_DecodeAllFilesP/e.txt/best-16                            18079         15816         -12.52%
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/fastest-16             539           502           -6.97%
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/default-16             546           524           -4.19%
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/better-16              522           527           +0.88%
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/best-16                883           854           -3.27%
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/fastest-16                730           631           -13.46%
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/default-16                846           680           -19.67%
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/better-16                 874           681           -22.07%
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/best-16                   970           707           -27.08%
    BenchmarkDecoder_DecodeAllFilesP/html.txt/fastest-16                      12545         6806          -45.75%
    BenchmarkDecoder_DecodeAllFilesP/html.txt/default-16                      13652         6875          -49.64%
    BenchmarkDecoder_DecodeAllFilesP/html.txt/better-16                       11274         6569          -41.73%
    BenchmarkDecoder_DecodeAllFilesP/html.txt/best-16                         12746         6791          -46.72%
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/fastest-16                        1044          1063          +1.82%
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/default-16                        40948         22787         -44.35%
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/better-16                         28497         18869         -33.79%
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/best-16                           17973         15798         -12.10%
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/fastest-16                   4327          2945          -31.94%
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/default-16                   4279          3075          -28.14%
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/better-16                    3562          2528          -29.03%
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/best-16                      4120          3120          -24.27%
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/fastest-16                    1045          1049          +0.38%
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/default-16                    1047          1039          -0.76%
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/better-16                     1046          1034          -1.15%
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/best-16                       1047          1044          -0.29%
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-16                       70555         35143         -50.19%
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-16                   14388         8518          -40.80%
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-16                    235269        108394        -53.93%
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-16                      176150        81790         -53.57%
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-16                    57606         27782         -51.77%
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-16                     77776         35865         -53.89%
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-16                        36842         22593         -38.68%
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-16                  3396          2428          -28.50%
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-16                  1260          1245          -1.19%
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-16                        173875        101967        -41.36%
    BenchmarkDecoder_DecodeAllParallel/html.zst-16                            16539         9267          -43.97%
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-16                   1347          1081          -19.75%
    
    benchmark                                                                 old MB/s     new MB/s     speedup
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-16                            297.96       473.27       1.59x
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-16                        1072.96      1804.78      1.68x
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-16                         229.18       286.02       1.25x
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-16                           273.81       348.13       1.27x
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-16                         261.50       551.50       2.11x
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-16                          232.42       453.31       1.95x
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-16                             2192.33      3262.16      1.49x
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-16                       3621.95      4538.10      1.25x
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-16                       7776.50      7838.44      1.01x
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-16                             401.16       466.49       1.16x
    BenchmarkDecoder_DecoderSmall/html.zst-16                                 808.29       1410.85      1.75x
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-16                        417.20       518.69       1.24x
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-16                               311.06       632.70       2.03x
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-16                           1110.07      1898.57      1.71x
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-16                            253.46       534.07       2.11x
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-16                              301.01       634.09       2.11x
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-16                            263.48       560.43       2.13x
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-16                             239.77       549.03       2.29x
    BenchmarkDecoder_DecodeAll/html_x_4.zst-16                                1369.59      2031.36      1.48x
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-16                          4234.23      5442.83      1.29x
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-16                          10894.85     10899.45     1.00x
    BenchmarkDecoder_DecodeAll/urls.10K.zst-16                                447.60       739.12       1.65x
    BenchmarkDecoder_DecodeAll/html.zst-16                                    825.02       1454.76      1.76x
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-16                           414.31       517.91       1.25x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/fastest-16      267.06       509.93       1.91x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/default-16      259.47       545.86       2.10x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/better-16       275.56       557.22       2.02x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/best-16         265.91       555.64       2.09x
    BenchmarkDecoder_DecodeAllFiles/e.txt/fastest-16                          10881.20     10888.55     1.00x
    BenchmarkDecoder_DecodeAllFiles/e.txt/default-16                          289.01       588.38       2.04x
    BenchmarkDecoder_DecodeAllFiles/e.txt/better-16                           400.76       675.60       1.69x
    BenchmarkDecoder_DecodeAllFiles/e.txt/best-16                             674.54       753.16       1.12x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/fastest-16              1148.38      1215.84      1.06x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/default-16              1249.77      1448.41      1.16x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/better-16               1056.17      1084.20      1.03x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/best-16                 456.25       374.13       0.82x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/fastest-16                 314.49       354.95       1.13x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/default-16                 199.38       229.32       1.15x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/better-16                  199.54       229.31       1.15x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/best-16                    193.81       235.24       1.21x
    BenchmarkDecoder_DecodeAllFiles/html.txt/fastest-16                       516.22       947.62       1.84x
    BenchmarkDecoder_DecodeAllFiles/html.txt/default-16                       480.16       946.12       1.97x
    BenchmarkDecoder_DecodeAllFiles/html.txt/better-16                        520.76       988.46       1.90x
    BenchmarkDecoder_DecodeAllFiles/html.txt/best-16                          434.23       956.93       2.20x
    BenchmarkDecoder_DecodeAllFiles/pi.txt/fastest-16                         10876.10     10879.70     1.00x
    BenchmarkDecoder_DecodeAllFiles/pi.txt/default-16                         284.80       586.94       2.06x
    BenchmarkDecoder_DecodeAllFiles/pi.txt/better-16                          407.85       676.35       1.66x
    BenchmarkDecoder_DecodeAllFiles/pi.txt/best-16                            683.52       754.32       1.10x
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/fastest-16                    1601.81      2021.07      1.26x
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/default-16                    1507.63      1647.72      1.09x
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/better-16                     1919.97      2122.61      1.11x
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/best-16                       1489.44      1607.82      1.08x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/fastest-16                     10856.04     10884.37     1.00x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/default-16                     10898.82     10898.93     1.00x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/better-16                      10895.15     10891.77     1.00x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/best-16                        10882.01     10878.53     1.00x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/fastest-16     2125.17      4536.47      2.13x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/default-16     2409.95      4642.02      1.93x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/better-16      2623.78      4921.51      1.88x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/best-16        2463.05      4737.00      1.92x
    BenchmarkDecoder_DecodeAllFilesP/e.txt/fastest-16                         96723.30     97181.69     1.00x
    BenchmarkDecoder_DecodeAllFilesP/e.txt/default-16                         2199.84      4408.78      2.00x
    BenchmarkDecoder_DecodeAllFilesP/e.txt/better-16                          3358.67      5272.34      1.57x
    BenchmarkDecoder_DecodeAllFilesP/e.txt/best-16                            5531.33      6322.87      1.14x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/fastest-16             7633.80      8205.11      1.07x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/default-16             7531.33      7861.34      1.04x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/better-16              7878.01      7808.28      0.99x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/best-16                4663.70      4821.59      1.03x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/fastest-16                2121.74      2451.88      1.16x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/default-16                1828.81      2276.38      1.24x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/better-16                 1770.70      2272.06      1.28x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/best-16                   1595.73      2188.35      1.37x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/fastest-16                      3545.30      6534.77      1.84x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/default-16                      3257.99      6469.33      1.99x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/better-16                       3945.06      6770.89      1.72x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/best-16                         3489.38      6549.74      1.88x
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/fastest-16                        95753.69     94059.51     0.98x
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/default-16                        2442.18      4388.66      1.80x
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/better-16                         3509.25      5299.73      1.51x
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/best-16                           5564.01      6330.20      1.14x
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/fastest-16                   11832.49     17385.80     1.47x
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/default-16                   11965.02     16651.74     1.39x
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/better-16                    14375.72     20255.73     1.41x
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/best-16                      12425.77     16411.38     1.32x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/fastest-16                    95659.72     95287.85     1.00x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/default-16                    95492.90     96289.36     1.01x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/better-16                     95576.73     96741.03     1.01x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/best-16                       95552.41     95804.44     1.00x
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-16                       2612.42      5244.90      2.01x
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-16                   8242.15      13921.55     1.69x
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-16                    2048.12      4445.45      2.17x
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-16                      2422.67      5217.70      2.15x
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-16                    2173.02      4505.70      2.07x
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-16                     1955.48      4240.60      2.17x
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-16                        11117.71     18129.71     1.63x
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-16                  30154.29     42167.57     1.40x
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-16                  97717.39     98890.90     1.01x
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-16                        4037.89      6885.44      1.71x
    BenchmarkDecoder_DecodeAllParallel/html.zst-16                            6191.33      11050.07     1.78x
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-16                   3025.76      3768.91      1.25x
    
    opened by WojciechMula 14
  • runtime error: slice bounds out of range

    My server crashes 10-20 minutes after start. Caller code:

    var gCompressPool5 sync.Pool
    
    func mustFlateCompressWithBufferToBufWPool5(inb []byte, bufW *bytes.Buffer) {
    	var flateW *flate.Writer
    	obj := gCompressPool5.Get()
    	if obj == nil {
    		var err error
    		flateW, err = flate.NewWriter(bufW, 4)
    		if err != nil {
    			panic(err)
    		}
    	} else {
    		flateW = obj.(*flate.Writer)
    		flateW.Reset(bufW)
    	}
    	_, err := flateW.Write(inb)
    	if err != nil {
    		flateW.Close()
    		flateW.Reset(nil)
    		gCompressPool5.Put(flateW)
    		panic(err)
    	}
    	err = flateW.Close()
    	flateW.Reset(nil)
    	gCompressPool5.Put(flateW)
    	if err != nil {
    		panic(err)
    	}
    }
    
    runtime error: slice bounds out of range
    
    goroutine 82 [running]:
    github.com/klauspost/compress/flate.(*fastGen).matchlenLong(0xc0288e8000, 0x5386ca610003395c, 0xc028aea000, 0x3fffc, 0x100000, 0x13)
    	github.com/klauspost/compress/flate/fast_encoder.go:208 +0xb6
    github.com/klauspost/compress/flate.(*fastEncL4).Encode(0xc0288e8000, 0xc02bcf2090, 0xc02bf88000, 0xffff, 0xffff)
    	github.com/klauspost/compress/flate/level4.go:127 +0x4db
    github.com/klauspost/compress/flate.(*compressor).storeFast(0xc02bcf2000)
    	github.com/klauspost/compress/flate/deflate.go:590 +0x1c1
    github.com/klauspost/compress/flate.(*compressor).write(0xc02bcf2000, 0xc02176bffc, 0x1c80ae, 0x18a004, 0x113d380, 0xc007352960, 0xc007352960)
    	github.com/klauspost/compress/flate/deflate.go:614 +0x81
    github.com/klauspost/compress/flate.(*Writer).Write(0xc02bcf2000, 0xc02172c000, 0x1c80ae, 0x1ca000, 0xc000d2b568, 0x44b894, 0xc015512199)
    	github.com/klauspost/compress/flate/deflate.go:781 +0x4b
    ...
    

    I also tried a single thread; it crashes the same way (after one hour):

    var gCompressWriter *flate.Writer
    var gCompressWriterLocker sync.Mutex
    
    func mustFlateCompressWithBufferToBufWPool3(inb []byte, bufW *bytes.Buffer) {
    	gCompressWriterLocker.Lock()
    	defer gCompressWriterLocker.Unlock()
    	if gCompressWriter == nil {
    		var err error
    		gCompressWriter, err = flate.NewWriter(bufW, 4)
    		if err != nil {
    			panic(err)
    		}
    	} else {
    		gCompressWriter.Reset(bufW)
    	}
    	_, err := gCompressWriter.Write(inb)
    	if err != nil {
    		gCompressWriter.Close()
    		gCompressWriter.Reset(nil)
    		panic(err)
    	}
    	err = gCompressWriter.Close()
    	gCompressWriter.Reset(nil)
    	if err != nil {
    		panic(err)
    	}
    }
    

    I tried flate from Go 1.11.6: it did not crash in 3 hours (but it uses more CPU than this library). I also tried flate from https://github.com/klauspost/compress without reusing the writer object: it did not crash in 12 hours (but it also uses more CPU).

    Workaround right now: use flate from golang 1.11.6 and sync.Pool

    opened by bronze1man 14
  • zstd: expose header decoder as public API mimicking `zstd -lv foo.zst`

    As per the subject, it would be awesome if https://github.com/klauspost/compress/blob/98b287bcd1b5d61b698ccc529a220f597f584400/zstd/framedec.go#L183-L206 were available for use. At present the only ways to learn the size are to call out to a zstd binary or to decompress the stream and count the bytes 🙀
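
    To illustrate what such an API would read, here is a small standalone sketch that pulls the content size out of a frame header. It follows the frame header layout in RFC 8878 and uses only the standard library; frameInfo and parseFrameHeader are hypothetical names, not this package's API, and the sketch ignores skippable frames and does not validate the reserved bit.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// frameInfo holds a few header fields of one Zstandard frame.
// The type and its field names are illustrative only.
type frameInfo struct {
	HeaderLen      int    // bytes taken by magic number plus frame header
	HasContentSize bool   // false when the encoder did not record the size
	ContentSize    uint64 // decompressed size, valid if HasContentSize
	HasChecksum    bool   // a 4-byte checksum follows the last block
}

var errShort = errors.New("input too short for frame header")

// parseFrameHeader decodes the start of a Zstandard frame.
func parseFrameHeader(b []byte) (frameInfo, error) {
	var fi frameInfo
	if len(b) < 5 {
		return fi, errShort
	}
	if binary.LittleEndian.Uint32(b) != 0xFD2FB528 {
		return fi, errors.New("not a zstd frame (bad magic number)")
	}
	desc := b[4] // Frame_Header_Descriptor
	singleSegment := desc&0x20 != 0
	fi.HasChecksum = desc&0x04 != 0
	dictIDLen := [4]int{0, 1, 2, 4}[desc&3]
	fcsLen := [4]int{0, 2, 4, 8}[desc>>6]
	if desc>>6 == 0 && singleSegment {
		fcsLen = 1
	}
	pos := 5
	if !singleSegment {
		pos++ // Window_Descriptor
	}
	pos += dictIDLen
	if len(b) < pos+fcsLen {
		return fi, errShort
	}
	switch fcsLen {
	case 1:
		fi.ContentSize = uint64(b[pos])
	case 2:
		// The 2-byte form stores the size minus 256.
		fi.ContentSize = uint64(binary.LittleEndian.Uint16(b[pos:])) + 256
	case 4:
		fi.ContentSize = uint64(binary.LittleEndian.Uint32(b[pos:]))
	case 8:
		fi.ContentSize = binary.LittleEndian.Uint64(b[pos:])
	}
	fi.HasContentSize = fcsLen > 0
	fi.HeaderLen = pos + fcsLen
	return fi, nil
}

func main() {
	// Hypothetical header: single-segment frame declaring 5 bytes of content.
	hdr := []byte{0x28, 0xb5, 0x2f, 0xfd, 0x20, 0x05}
	fi, err := parseFrameHeader(hdr)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", fi)
}
```

    When the encoder did not record a size (for example, when compressing from a stream of unknown length), HasContentSize is false and decompressing remains the only way to learn it.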

    enhancement 
    opened by ribasushi 13
  • Extra bytes at the end of encoded buffer (zstd)

    I have found that the current implementation of zstd (v1.10.10) adds three extra bytes (01 00 00) to the end of the encoded byte stream. These bytes make it impossible to decompress the archive with the standard GNU zstd utility.

    Example code:

    package main
    
    import (
    	"bytes"
    	"fmt"
    	"strings"
    
    	"github.com/klauspost/compress/zstd"
    )
    
    var src = `0`
    
    func main() {
    	buffer := bytes.NewBuffer(nil)
    	encoder, err := zstd.NewWriter(buffer)
    	if err != nil {
    		panic(err)
    	}
    	if _, err := encoder.ReadFrom(strings.NewReader(src)); err != nil {
    		panic(err)
    	}
    	if err := encoder.Close(); err != nil {
    		panic(err)
    	}
    	fmt.Printf("% x\n", buffer.Bytes())
    }
    

    Prints: 28 b5 2f fd 04 00 09 00 00 30 ec af 44 12 01 00 00

    At the same time with GNU zstd:

    $ echo -n 0 | zstd | hexdump -C
    00000000  28 b5 2f fd 04 58 09 00  00 30 ec af 44 12        |(./..X...0..D.|
    0000000e
    

    Note the 3 extra bytes at the end of the Go output.

    Also, version 1.9.0 behaves like GNU zstd.
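
    The difference can be checked mechanically by walking the frame and seeing where it ends. The sketch below does that for the two byte strings quoted above, following the frame and block layout in RFC 8878; frameEnd is a hypothetical helper written for this example, not part of the package, and it handles only a single non-skippable frame.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// frameEnd returns the offset just past the first Zstandard frame in b.
func frameEnd(b []byte) (int, error) {
	if len(b) < 5 || binary.LittleEndian.Uint32(b) != 0xFD2FB528 {
		return 0, errors.New("not a zstd frame")
	}
	desc := b[4] // Frame_Header_Descriptor
	singleSegment := desc&0x20 != 0
	pos := 5
	if !singleSegment {
		pos++ // Window_Descriptor
	}
	pos += [4]int{0, 1, 2, 4}[desc&3] // Dictionary_ID
	fcs := [4]int{0, 2, 4, 8}[desc>>6] // Frame_Content_Size
	if fcs == 0 && singleSegment {
		fcs = 1
	}
	pos += fcs

	for {
		if len(b) < pos+3 {
			return 0, errors.New("truncated block header")
		}
		// Block header: 3 bytes, little-endian.
		h := uint32(b[pos]) | uint32(b[pos+1])<<8 | uint32(b[pos+2])<<16
		pos += 3
		last, typ, size := h&1 != 0, (h>>1)&3, int(h>>3)
		switch typ {
		case 1: // RLE block: only one byte is stored
			size = 1
		case 3:
			return 0, errors.New("reserved block type")
		}
		pos += size
		if last {
			break
		}
	}
	if desc&0x04 != 0 {
		pos += 4 // Content_Checksum
	}
	if pos > len(b) {
		return 0, errors.New("truncated frame")
	}
	return pos, nil
}

func main() {
	goOut := []byte{0x28, 0xb5, 0x2f, 0xfd, 0x04, 0x00, 0x09, 0x00, 0x00,
		0x30, 0xec, 0xaf, 0x44, 0x12, 0x01, 0x00, 0x00}
	cliOut := []byte{0x28, 0xb5, 0x2f, 0xfd, 0x04, 0x58, 0x09, 0x00, 0x00,
		0x30, 0xec, 0xaf, 0x44, 0x12}
	for _, b := range [][]byte{goOut, cliOut} {
		end, err := frameEnd(b)
		if err != nil {
			panic(err)
		}
		fmt.Printf("frame ends at %d, trailing bytes: %d\n", end, len(b)-end)
	}
}
```

    Read this way, both outputs contain one raw last block holding the byte 0x30 followed by a 4-byte checksum, so the frame ends at offset 14. The Go output then has 3 bytes left over, while the CLI output has none.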

    opened by Feresey 12
  • zstd: x86 assembler implementation of sequenceDecs.executeSimple

    This adds plain x86 and x86 with BMI2 implementations of sequenceDecs.executeSimple. Part of #515.

    I extracted the function executeSimple to handle the cases where neither history nor a dictionary is used. A quick check showed that such cases account for 83% of all calls under go test and 99% under go test -bench ., so they are the vast majority. Of course, we may consider handling all cases in another PR (but after completing #529).
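
    For readers unfamiliar with sequence execution, the simple case can be sketched in plain Go as below. This is an illustration of the general technique only, not the code in this PR: the seq type and executeSimpleSketch are hypothetical names, and the real implementation works on the decoder's internal buffers.

```go
package main

import (
	"errors"
	"fmt"
)

// seq is one decoded sequence: copy litLen literal bytes, then copy
// matchLen bytes starting matchOff bytes back in the output.
// The type is illustrative; it is not the library's internal type.
type seq struct {
	litLen, matchOff, matchLen int
}

// executeSimpleSketch applies seqs to literals and appends the result
// to out. "Simple" here means every match offset must point into out
// itself: there is no history buffer or dictionary to fall back on.
func executeSimpleSketch(out, literals []byte, seqs []seq) ([]byte, error) {
	for _, s := range seqs {
		if s.litLen < 0 || s.litLen > len(literals) {
			return nil, errors.New("sequence reads past end of literals")
		}
		out = append(out, literals[:s.litLen]...)
		literals = literals[s.litLen:]

		if s.matchOff <= 0 || s.matchOff > len(out) {
			return nil, errors.New("match offset outside output")
		}
		// Copy byte by byte: the source may overlap the destination,
		// which is how short offsets encode repeated runs.
		start := len(out) - s.matchOff
		for i := 0; i < s.matchLen; i++ {
			out = append(out, out[start+i])
		}
	}
	// Literals left after the last sequence are copied verbatim.
	return append(out, literals...), nil
}

func main() {
	out, err := executeSimpleSketch(nil, []byte("abcXY"), []seq{{litLen: 3, matchOff: 3, matchLen: 6}})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // abcabcabcXY
}
```

    An offset larger than the output produced so far is exactly the situation that needs history or a dictionary, which is why the sketch rejects it instead of handling it.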

    ~As always, I'm marking it as a draft, as some tests fail. I will figure out what's wrong, likely as usual I missed something silly.~ [fixed (I was right, it was silly)]

    Below are preliminary benchmark results from an Ice Lake machine, comparing noasm against GOAMD64=v3. The branch is currently built on top of #528, so the numbers show the combined performance boost from using x86 BMI in both decode and execute.

    benchmark                                                                 old MB/s     new MB/s     speedup
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-16                            260.20       348.57       1.34x
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-16                         211.67       251.78       1.19x
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-16                           250.34       298.27       1.19x
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-16                         233.78       373.57       1.60x
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-16                          209.63       322.12       1.54x
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-16                       6952.38      7619.23      1.10x
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-16                             363.22       431.34       1.19x
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-16                        366.19       458.76       1.25x
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-16                               261.76       386.54       1.48x
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-16                            221.62       350.06       1.58x
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-16                              262.00       389.86       1.49x
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-16                            231.21       361.89       1.57x
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-16                             209.42       335.43       1.60x
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-16                          9356.96      10893.21     1.16x
    BenchmarkDecoder_DecodeAll/urls.10K.zst-16                                388.64       539.45       1.39x
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-16                           357.89       444.40       1.24x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/fastest-16      226.70       360.78       1.59x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/default-16      224.89       354.12       1.57x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/better-16       239.08       369.24       1.54x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/best-16         219.11       329.76       1.50x
    BenchmarkDecoder_DecodeAllFiles/e.txt/fastest-16                          9348.18      10886.74     1.16x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/fastest-16              1052.83      1292.48      1.23x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/best-16                 432.47       419.43       0.97x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/fastest-16                 264.89       324.29       1.22x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/default-16                 184.00       207.14       1.13x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/better-16                  182.83       205.27       1.12x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/best-16                    165.99       192.78       1.16x
    BenchmarkDecoder_DecodeAllFiles/html.txt/fastest-16                       401.53       562.99       1.40x
    BenchmarkDecoder_DecodeAllFiles/html.txt/default-16                       382.26       558.41       1.46x
    BenchmarkDecoder_DecodeAllFiles/html.txt/better-16                        413.76       587.31       1.42x
    BenchmarkDecoder_DecodeAllFiles/html.txt/best-16                          389.76       540.86       1.39x
    BenchmarkDecoder_DecodeAllFiles/pi.txt/fastest-16                         9334.02      10878.93     1.17x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/fastest-16                     9349.18      10882.05     1.16x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/default-16                     9357.66      10897.99     1.16x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/better-16                      9356.77      10893.79     1.16x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/best-16                        9332.25      10890.70     1.17x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/fastest-16     1502.94      2665.02      1.77x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/default-16     1431.25      2625.59      1.83x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/better-16      1443.33      2770.91      1.92x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/best-16        1518.36      2649.41      1.74x
    BenchmarkDecoder_DecodeAllFilesP/e.txt/fastest-16                         67904.31     97302.23     1.43x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/fastest-16             5855.79      5776.46      0.99x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/best-16                4075.41      4028.22      0.99x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/fastest-16                1468.79      1620.82      1.10x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/default-16                996.97       1099.55      1.10x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/better-16                 1014.07      1097.88      1.08x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/best-16                   823.45       943.41       1.15x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/fastest-16                      1845.55      2886.39      1.56x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/default-16                      1687.15      2817.55      1.67x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/better-16                       1911.29      3005.57      1.57x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/best-16                         1730.33      2881.95      1.67x
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/fastest-16                        67953.72     94562.80     1.39x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/fastest-16                    67918.38     97067.32     1.43x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/default-16                    67942.13     95589.74     1.41x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/better-16                     67922.47     96824.04     1.43x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/best-16                       67918.38     95890.17     1.41x
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-16                       1517.85      2819.81      1.86x
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-16                    1277.90      2531.84      1.98x
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-16                      1523.41      2878.54      1.89x
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-16                    1306.50      2575.44      1.97x
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-16                     1145.33      2332.57      2.04x
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-16                  68109.62     95841.69     1.41x
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-16                        2821.39      4603.53      1.63x
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-16                   2019.39      2534.19      1.25x
    
    opened by WojciechMula 11
  • enh: revert back to array for llTable/mlTable/ofTable

    The extra indirection of creating a slice with size maxTableSize is no longer needed. Removing it cuts the time per operation by about 25% on x86_64 for some workloads (especially html_x_4) and leaves performance flat on aarch64.

    go1.17/amd64:

    benchmark                                                    old ns/op     new ns/op     delta
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-12               3494468       3379259       -3.30%
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-12           687076        681387        -0.83%
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-12            11085443      10924817      -1.45%
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-12              8407670       8158085       -2.97%
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-12            2825509       2749297       -2.70%
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-12             3736775       3604104       -3.55%
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-12                1903319       1417892       -25.50%
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-12          159818        154350        -3.42%
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-12          75035         75623         +0.78%
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-12                9421182       9058897       -3.85%
    BenchmarkDecoder_DecoderSmall/html.zst-12                    795930        771416        -3.08%
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-12           66842         65826         -1.52%
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-12                  435892        424669        -2.57%
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-12              84058         90552         +7.73%
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-12               1383197       1352505       -2.22%
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-12                 1042932       1005006       -3.64%
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-12               355043        349566        -1.54%
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-12                462460        448584        -3.00%
    BenchmarkDecoder_DecodeAll/html_x_4.zst-12                   236942        178127        -24.82%
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-12             19407         18649         -3.91%
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-12             8641          8690          +0.57%
    BenchmarkDecoder_DecodeAll/urls.10K.zst-12                   1152539       1132885       -1.71%
    BenchmarkDecoder_DecodeAll/html.zst-12                       99663         95678         -4.00%
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-12              8361          8438          +0.92%
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-12          71902         68621         -4.56%
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-12      14927         14423         -3.38%
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-12       241417        229017        -5.14%
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-12         178932        169150        -5.47%
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-12       59359         56624         -4.61%
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-12        77809         74909         -3.73%
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-12           43376         32490         -25.10%
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-12     3572          3451          -3.39%
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-12     1535          1539          +0.26%
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-12           190123        185012        -2.69%
    BenchmarkDecoder_DecodeAllParallel/html.zst-12               17203         16557         -3.76%
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-12      1509          1515          +0.40%
    
    benchmark                                                    old MB/s     new MB/s     speedup
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-12               421.97       436.36       1.03x
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-12           1380.79      1392.31      1.01x
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-12            347.74       352.86       1.01x
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-12              406.06       418.48       1.03x
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-12            354.43       364.25       1.03x
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-12             325.60       337.59       1.04x
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-12                1721.62      2311.04      1.34x
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-12          5125.84      5307.43      1.04x
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-12          13123.74     13021.82     0.99x
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-12                596.18       620.02       1.04x
    BenchmarkDecoder_DecoderSmall/html.zst-12                    1029.24      1061.94      1.03x
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-12           487.83       495.36       1.02x
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-12                  422.86       434.03       1.03x
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-12              1410.79      1309.62      0.93x
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-12               348.37       356.27       1.02x
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-12                 409.19       424.63       1.04x
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-12               352.57       358.10       1.02x
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-12                328.87       339.04       1.03x
    BenchmarkDecoder_DecodeAll/html_x_4.zst-12                   1728.70      2299.49      1.33x
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-12             5276.43      5490.77      1.04x
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-12             14245.78     14165.08     0.99x
    BenchmarkDecoder_DecodeAll/urls.10K.zst-12                   609.17       619.73       1.02x
    BenchmarkDecoder_DecodeAll/html.zst-12                       1027.46      1070.26      1.04x
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-12              487.51       483.08       0.99x
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-12          2563.50      2686.05      1.05x
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-12      7944.49      8221.96      1.03x
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-12       1995.97      2104.04      1.05x
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-12         2385.01      2522.94      1.06x
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-12       2108.86      2210.70      1.05x
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-12        1954.64      2030.32      1.04x
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-12           9443.01      12606.94     1.34x
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-12     28670.75     29674.17     1.03x
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-12     80178.39     79977.77     1.00x
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-12           3692.80      3794.81      1.03x
    BenchmarkDecoder_DecodeAllParallel/html.zst-12               5952.37      6184.74      1.04x
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-12      2701.93      2691.29      1.00x
    
    benchmark                                                    old allocs     new allocs     delta
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-12               1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-12           1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-12            1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-12              1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-12            1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-12             1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-12                1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-12          1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-12          1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-12                1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/html.zst-12                    1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-12           1              1              +0.00%
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-12                  0              0              +0.00%
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-12              0              0              +0.00%
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-12               0              0              +0.00%
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-12                 0              0              +0.00%
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-12               0              0              +0.00%
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-12                0              0              +0.00%
    BenchmarkDecoder_DecodeAll/html_x_4.zst-12                   0              0              +0.00%
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-12             0              0              +0.00%
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-12             0              0              +0.00%
    BenchmarkDecoder_DecodeAll/urls.10K.zst-12                   0              0              +0.00%
    BenchmarkDecoder_DecodeAll/html.zst-12                       0              0              +0.00%
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-12              0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-12          0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-12      0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-12       0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-12         0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-12       0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-12        0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-12           0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-12     0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-12     0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-12           0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/html.zst-12               0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-12      0              0              +0.00%
    
    benchmark                                                    old bytes     new bytes     delta
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-12               19459         19347         -0.58%
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-12           2704          2594          -4.07%
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-12            165860        158323        -4.54%
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-12              108902        105083        -3.51%
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-12            8997          8892          -1.17%
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-12             17441         16595         -4.85%
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-12                24948         18030         -27.73%
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-12          48            527           +997.92%
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-12          332           326           -1.81%
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-12                199639        195355        -2.15%
    BenchmarkDecoder_DecoderSmall/html.zst-12                    48            48            +0.00%
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-12           48            48            +0.00%
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-12                  0             0             +0.00%
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-12              0             0             +0.00%
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-12               0             0             +0.00%
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-12                 0             0             +0.00%
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-12               0             0             +0.00%
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-12                0             0             +0.00%
    BenchmarkDecoder_DecodeAll/html_x_4.zst-12                   0             0             +0.00%
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-12             0             0             +0.00%
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-12             0             0             +0.00%
    BenchmarkDecoder_DecodeAll/urls.10K.zst-12                   0             0             +0.00%
    BenchmarkDecoder_DecodeAll/html.zst-12                       0             0             +0.00%
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-12              0             0             +0.00%
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-12          141           131           -7.09%
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-12      18            18            +0.00%
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-12       1302          1289          -1.00%
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-12         986           741           -24.85%
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-12       78            75            -3.85%
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-12        122           130           +6.56%
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-12           179           133           -25.70%
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-12     3             3             +0.00%
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-12     2             2             +0.00%
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-12           1539          1515          -1.56%
    BenchmarkDecoder_DecodeAllParallel/html.zst-12               18            17            -5.56%
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-12      0             0             +0.00%
    

    go1.17/arm64:

    benchmark                                                   old ns/op     new ns/op     delta
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-4               7285584       7358412       +1.00%
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-4           1948236       1954437       +0.32%
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-4            23894495      23566187      -1.37%
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-4              17781669      17587395      -1.09%
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-4            6056783       5986820       -1.16%
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-4             7595705       7503978       -1.21%
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-4                4190851       4190728       -0.00%
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-4          553375        561100        +1.40%
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-4          348653        348737        +0.02%
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-4                21020394      20865234      -0.74%
    BenchmarkDecoder_DecoderSmall/html.zst-4                    2038084       2009719       -1.39%
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-4           146524        156045        +6.50%
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-4                  900881        908068        +0.80%
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-4              240917        242009        +0.45%
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-4               2962741       2932156       -1.03%
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-4                 2203425       2174672       -1.30%
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-4               752544        743828        -1.16%
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-4                943281        931194        -1.28%
    BenchmarkDecoder_DecodeAll/html_x_4.zst-4                   515398        517646        +0.44%
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-4             67341         68499         +1.72%
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-4             41159         41151         -0.02%
    BenchmarkDecoder_DecodeAll/urls.10K.zst-4                   2594535       2577253       -0.67%
    BenchmarkDecoder_DecodeAll/html.zst-4                       252830        249818        -1.19%
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-4              18349         19569         +6.65%
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-4          226216        228030        +0.80%
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-4      60384         60714         +0.55%
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-4       748918        738658        -1.37%
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-4         555155        547405        -1.40%
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-4       188566        186724        -0.98%
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-4        236716        233740        -1.26%
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-4           129772        129420        -0.27%
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-4     16902         17164         +1.55%
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-4     10399         11602         +11.57%
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-4           652628        688743        +5.53%
    BenchmarkDecoder_DecodeAllParallel/html.zst-4               63494         62539         -1.50%
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-4      4707          5002          +6.27%
    
    benchmark                                                   old MB/s     new MB/s     speedup
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-4               202.39       200.39       0.99x
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-4           486.96       485.41       1.00x
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-4            161.33       163.58       1.01x
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-4              192.00       194.12       1.01x
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-4            165.34       167.27       1.01x
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-4             160.18       162.14       1.01x
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-4                781.89       781.92       1.00x
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-4          1480.37      1459.99      0.99x
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-4          2824.42      2823.74      1.00x
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-4                267.20       269.19       1.01x
    BenchmarkDecoder_DecoderSmall/html.zst-4                    401.95       407.62       1.01x
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-4           222.54       208.97       0.94x
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-4                  204.60       202.98       0.99x
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-4              492.24       490.01       1.00x
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-4               162.64       164.34       1.01x
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-4                 193.68       196.24       1.01x
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-4               166.34       168.29       1.01x
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-4                161.23       163.33       1.01x
    BenchmarkDecoder_DecodeAll/html_x_4.zst-4                   794.72       791.28       1.00x
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-4             1520.61      1494.92      0.98x
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-4             2990.65      2991.24      1.00x
    BenchmarkDecoder_DecodeAll/urls.10K.zst-4                   270.60       272.42       1.01x
    BenchmarkDecoder_DecodeAll/html.zst-4                       405.02       409.90       1.01x
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-4              222.14       208.29       0.94x
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-4          814.80       808.32       0.99x
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-4      1963.88      1953.22      0.99x
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-4       643.41       652.35       1.01x
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-4         768.71       779.59       1.01x
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-4       663.85       670.39       1.01x
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-4        642.49       650.68       1.01x
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-4           3156.31      3164.88      1.00x
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-4     6058.38      5966.06      0.98x
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-4     11837.16     10609.44     0.90x
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-4           1075.78      1019.37      0.95x
    BenchmarkDecoder_DecodeAllParallel/html.zst-4               1612.75      1637.37      1.02x
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-4      865.88       814.92       0.94x
    
    benchmark                                                   old allocs     new allocs     delta
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-4               1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-4           1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-4            1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-4              1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-4            1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-4             1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-4                1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-4          1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-4          1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-4                2              2              +0.00%
    BenchmarkDecoder_DecoderSmall/html.zst-4                    1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-4           1              1              +0.00%
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-4                  0              0              +0.00%
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-4              0              0              +0.00%
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-4               0              0              +0.00%
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-4                 0              0              +0.00%
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-4               0              0              +0.00%
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-4                0              0              +0.00%
    BenchmarkDecoder_DecodeAll/html_x_4.zst-4                   0              0              +0.00%
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-4             0              0              +0.00%
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-4             0              0              +0.00%
    BenchmarkDecoder_DecodeAll/urls.10K.zst-4                   0              0              +0.00%
    BenchmarkDecoder_DecodeAll/html.zst-4                       0              0              +0.00%
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-4              0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-4          0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-4      0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-4       0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-4         0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-4       0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-4        0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-4           0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-4     0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-4     0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-4           0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/html.zst-4               0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-4      0              0              +0.00%
    
    benchmark                                                   old bytes     new bytes     delta
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-4               40877         41129         +0.62%
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-4           7053          7064          +0.16%
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-4            392190        400904        +2.22%
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-4              252577        266541        +5.53%
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-4            19158         18939         -1.14%
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-4             35133         34691         -1.26%
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-4                52036         52056         +0.04%
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-4          48            48            +0.00%
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-4          1385          1397          +0.87%
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-4                498947        499662        +0.14%
    BenchmarkDecoder_DecoderSmall/html.zst-4                    48            48            +0.00%
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-4           48            48            +0.00%
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-4                  0             0             +0.00%
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-4              0             0             +0.00%
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-4               0             0             +0.00%
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-4                 0             0             +0.00%
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-4               0             0             +0.00%
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-4                0             0             +0.00%
    BenchmarkDecoder_DecodeAll/html_x_4.zst-4                   0             0             +0.00%
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-4             0             0             +0.00%
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-4             0             0             +0.00%
    BenchmarkDecoder_DecodeAll/urls.10K.zst-4                   0             0             +0.00%
    BenchmarkDecoder_DecodeAll/html.zst-4                       0             0             +0.00%
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-4              0             0             +0.00%
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-4          161           175           +8.70%
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-4      24            24            +0.00%
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-4       1277          1290          +1.02%
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-4         811           795           -1.97%
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-4       84            83            -1.19%
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-4        124           122           -1.61%
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-4           179           180           +0.56%
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-4     6             6             +0.00%
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-4     4             4             +0.00%
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-4           1621          1866          +15.11%
    BenchmarkDecoder_DecodeAllParallel/html.zst-4               22            23            +4.55%
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-4      0             0             +0.00%
    
    

    go1.18beta1/amd64:

    benchmark                                                    old ns/op     new ns/op     delta
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-12               3429133       3412857       -0.47%
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-12           693702        689294        -0.64%
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-12            11083126      10933642      -1.35%
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-12              8218440       8152884       -0.80%
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-12            2785808       2798601       +0.46%
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-12             3688584       3571942       -3.16%
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-12                1911532       1427758       -25.31%
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-12          158056        156551        -0.95%
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-12          75471         77183         +2.27%
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-12                9104808       9158698       +0.59%
    BenchmarkDecoder_DecoderSmall/html.zst-12                    782330        781526        -0.10%
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-12           65191         65823         +0.97%
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-12                  435635        426514        -2.09%
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-12              82841         83354         +0.62%
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-12               1393424       1348864       -3.20%
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-12                 1039729       1014956       -2.38%
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-12               348633        346539        -0.60%
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-12                467920        450939        -3.63%
    BenchmarkDecoder_DecodeAll/html_x_4.zst-12                   235601        177940        -24.47%
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-12             19147         18806         -1.78%
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-12             8691          8654          -0.43%
    BenchmarkDecoder_DecodeAll/urls.10K.zst-12                   1153057       1133712       -1.68%
    BenchmarkDecoder_DecodeAll/html.zst-12                       96474         96299         -0.18%
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-12              8127          8280          +1.88%
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-12          71412         69662         -2.45%
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-12      14648         14681         +0.23%
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-12       237199        230351        -2.89%
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-12         175811        171509        -2.45%
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-12       58541         57730         -1.39%
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-12        77509         75253         -2.91%
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-12           43140         32879         -23.79%
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-12     3507          3518          +0.31%
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-12     1549          1628          +5.10%
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-12           188335        186864        -0.78%
    BenchmarkDecoder_DecodeAllParallel/html.zst-12               16975         16777         -1.17%
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-12      1483          1520          +2.49%
    
    benchmark                                                    old MB/s     new MB/s     speedup
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-12               430.01       432.06       1.00x
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-12           1367.60      1376.34      1.01x
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-12            347.82       352.57       1.01x
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-12              415.41       418.75       1.01x
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-12            359.48       357.83       1.00x
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-12             329.86       340.63       1.03x
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-12                1714.23      2295.07      1.34x
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-12          5182.97      5232.79      1.01x
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-12          13048.05     12758.55     0.98x
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-12                616.89       613.26       0.99x
    BenchmarkDecoder_DecoderSmall/html.zst-12                    1047.13      1048.21      1.00x
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-12           500.19       495.39       0.99x
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-12                  423.11       432.15       1.02x
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-12              1431.52      1422.70      0.99x
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-12               345.81       357.23       1.03x
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-12                 410.45       420.47       1.02x
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-12               359.06       361.23       1.01x
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-12                325.03       337.27       1.04x
    BenchmarkDecoder_DecodeAll/html_x_4.zst-12                   1738.53      2301.90      1.32x
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-12             5347.99      5444.99      1.02x
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-12             14162.80     14223.98     1.00x
    BenchmarkDecoder_DecodeAll/urls.10K.zst-12                   608.89       619.28       1.02x
    BenchmarkDecoder_DecodeAll/html.zst-12                       1061.42      1063.35      1.00x
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-12              501.52       492.26       0.98x
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-12          2581.10      2645.93      1.03x
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-12      8095.89      8077.87      1.00x
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-12       2031.47      2091.86      1.03x
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-12         2427.35      2488.22      1.03x
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-12       2138.31      2168.34      1.01x
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-12        1962.22      2021.03      1.03x
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-12           9494.70      12457.71     1.31x
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-12     29195.75     29104.21     1.00x
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-12     79453.05     75621.54     0.95x
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-12           3727.86      3757.20      1.01x
    BenchmarkDecoder_DecodeAllParallel/html.zst-12               6032.46      6103.48      1.01x
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-12      2748.11      2681.07      0.98x
    
    benchmark                                                    old allocs     new allocs     delta
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-12               1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-12           1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-12            1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-12              1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-12            1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-12             1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-12                1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-12          1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-12          1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-12                1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/html.zst-12                    1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-12           1              1              +0.00%
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-12                  0              0              +0.00%
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-12              0              0              +0.00%
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-12               0              0              +0.00%
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-12                 0              0              +0.00%
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-12               0              0              +0.00%
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-12                0              0              +0.00%
    BenchmarkDecoder_DecodeAll/html_x_4.zst-12                   0              0              +0.00%
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-12             0              0              +0.00%
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-12             0              0              +0.00%
    BenchmarkDecoder_DecodeAll/urls.10K.zst-12                   0              0              +0.00%
    BenchmarkDecoder_DecodeAll/html.zst-12                       0              0              +0.00%
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-12              0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-12          0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-12      0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-12       0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-12         0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-12       0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-12        0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-12           0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-12     0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-12     0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-12           0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/html.zst-12               0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-12      0              0              +0.00%
    
    benchmark                                                    old bytes     new bytes     delta
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-12               19821         19297         -2.64%
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-12           2650          2568          -3.09%
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-12            165995        161008        -3.00%
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-12              108565        109515        +0.88%
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-12            9148          8733          -4.54%
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-12             17213         16911         -1.75%
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-12                23886         18246         -23.61%
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-12          48            48            +0.00%
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-12          331           336           +1.51%
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-12                195130        195411        +0.14%
    BenchmarkDecoder_DecoderSmall/html.zst-12                    48            48            +0.00%
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-12           48            48            +0.00%
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-12                  0             0             +0.00%
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-12              0             0             +0.00%
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-12               0             0             +0.00%
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-12                 0             0             +0.00%
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-12               0             0             +0.00%
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-12                0             0             +0.00%
    BenchmarkDecoder_DecodeAll/html_x_4.zst-12                   0             0             +0.00%
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-12             0             0             +0.00%
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-12             0             0             +0.00%
    BenchmarkDecoder_DecodeAll/urls.10K.zst-12                   0             0             +0.00%
    BenchmarkDecoder_DecodeAll/html.zst-12                       0             0             +0.00%
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-12              0             0             +0.00%
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-12          135           140           +3.70%
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-12      18            18            +0.00%
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-12       1359          1252          -7.87%
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-12         766           895           +16.84%
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-12       77            76            -1.30%
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-12        123           130           +5.69%
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-12           178           134           -24.72%
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-12     3             3             +0.00%
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-12     2             2             +0.00%
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-12           1502          1604          +6.79%
    BenchmarkDecoder_DecodeAllParallel/html.zst-12               18            18            +0.00%
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-12      0             0             +0.00%
    

    go1.18beta1/arm64:

    benchmark                                                   old ns/op     new ns/op     delta
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-4               7033872       7065133       +0.44%
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-4           1842957       1899113       +3.05%
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-4            23085034      22961622      -0.53%
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-4              17288191      17104125      -1.06%
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-4            5873367       5861024       -0.21%
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-4             7362972       7335302       -0.38%
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-4                4473970       4254649       -4.90%
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-4          537208        550142        +2.41%
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-4          349598        349861        +0.08%
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-4                20083735      20279622      +0.98%
    BenchmarkDecoder_DecoderSmall/html.zst-4                    1946366       1982454       +1.85%
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-4           142151        151195        +6.36%
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-4                  870742        874869        +0.47%
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-4              228264        234817        +2.87%
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-4               2870046       2846427       -0.82%
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-4                 2132110       2120306       -0.55%
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-4               728814        728754        -0.01%
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-4                915257        910217        -0.55%
    BenchmarkDecoder_DecodeAll/html_x_4.zst-4                   553045        523042        -5.43%
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-4             65675         66909         +1.88%
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-4             41037         41045         +0.02%
    BenchmarkDecoder_DecodeAll/urls.10K.zst-4                   2493022       2512490       +0.78%
    BenchmarkDecoder_DecodeAll/html.zst-4                       241560        246458        +2.03%
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-4              17804         18944         +6.40%
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-4          218806        219646        +0.38%
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-4      57340         58808         +2.56%
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-4       721637        717992        -0.51%
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-4         536653        537130        +0.09%
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-4       182853        182832        -0.01%
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-4        231020        228202        -1.22%
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-4           138214        131220        -5.06%
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-4     16427         16844         +2.54%
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-4     10339         10398         +0.57%
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-4           627512        631049        +0.56%
    BenchmarkDecoder_DecodeAllParallel/html.zst-4               60454         61648         +1.98%
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-4      4524          4808          +6.28%
    
    benchmark                                                   old MB/s     new MB/s     speedup
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-4               209.64       208.71       1.00x
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-4           514.77       499.55       0.97x
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-4            166.99       167.88       1.01x
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-4              197.48       199.60       1.01x
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-4            170.50       170.86       1.00x
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-4             165.25       165.87       1.00x
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-4                732.41       770.17       1.05x
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-4          1524.92      1489.07      0.98x
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-4          2816.79      2814.68      1.00x
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-4                279.66       276.96       0.99x
    BenchmarkDecoder_DecoderSmall/html.zst-4                    420.89       413.23       0.98x
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-4           229.39       215.67       0.94x
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-4                  211.68       210.68       1.00x
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-4              519.52       505.02       0.97x
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-4               167.89       169.29       1.01x
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-4                 200.16       201.27       1.01x
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-4               171.76       171.77       1.00x
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-4                166.17       167.09       1.01x
    BenchmarkDecoder_DecodeAll/html_x_4.zst-4                   740.63       783.11       1.06x
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-4             1559.20      1530.44      0.98x
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-4             2999.56      2998.99      1.00x
    BenchmarkDecoder_DecodeAll/urls.10K.zst-4                   281.62       279.44       0.99x
    BenchmarkDecoder_DecodeAll/html.zst-4                       423.91       415.49       0.98x
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-4              228.94       215.16       0.94x
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-4          842.39       839.17       1.00x
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-4      2068.14      2016.52      0.98x
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-4       667.73       671.12       1.01x
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-4         795.21       794.51       1.00x
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-4       684.59       684.67       1.00x
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-4        658.34       666.47       1.01x
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-4           2963.51      3121.47      1.05x
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-4     6233.72      6079.40      0.98x
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-4     11905.70     11838.39     0.99x
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-4           1118.84      1112.57      0.99x
    BenchmarkDecoder_DecodeAllParallel/html.zst-4               1693.84      1661.04      0.98x
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-4      901.07       847.82       0.94x
    
    benchmark                                                   old allocs     new allocs     delta
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-4               1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-4           1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-4            1              2              +100.00%
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-4              1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-4            1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-4             1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-4                1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-4          1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-4          1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-4                1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/html.zst-4                    1              1              +0.00%
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-4           1              1              +0.00%
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-4                  0              0              +0.00%
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-4              0              0              +0.00%
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-4               0              0              +0.00%
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-4                 0              0              +0.00%
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-4               0              0              +0.00%
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-4                0              0              +0.00%
    BenchmarkDecoder_DecodeAll/html_x_4.zst-4                   0              0              +0.00%
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-4             0              0              +0.00%
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-4             0              0              +0.00%
    BenchmarkDecoder_DecodeAll/urls.10K.zst-4                   0              0              +0.00%
    BenchmarkDecoder_DecodeAll/html.zst-4                       0              0              +0.00%
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-4              0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-4          0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-4      0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-4       0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-4         0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-4       0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-4        0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-4           0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-4     0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-4     0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-4           0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/html.zst-4               0              0              +0.00%
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-4      0              0              +0.00%
    
    benchmark                                                   old bytes     new bytes     delta
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-4               39448         39582         +0.34%
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-4           6716          6944          +3.39%
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-4            400904        401496        +0.15%
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-4              223823        223823        +0.00%
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-4            18568         18595         +0.15%
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-4             34178         33968         -0.61%
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-4                55559         52877         -4.83%
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-4          48            48            +0.00%
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-4          1397          1403          +0.43%
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-4                437238        437240        +0.00%
    BenchmarkDecoder_DecoderSmall/html.zst-4                    48            48            +0.00%
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-4           48            48            +0.00%
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-4                  0             0             +0.00%
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-4              0             0             +0.00%
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-4               0             0             +0.00%
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-4                 0             0             +0.00%
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-4               0             0             +0.00%
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-4                0             0             +0.00%
    BenchmarkDecoder_DecodeAll/html_x_4.zst-4                   0             0             +0.00%
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-4             0             0             +0.00%
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-4             0             0             +0.00%
    BenchmarkDecoder_DecodeAll/urls.10K.zst-4                   0             0             +0.00%
    BenchmarkDecoder_DecodeAll/html.zst-4                       0             0             +0.00%
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-4              0             0             +0.00%
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-4          143           161           +12.59%
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-4      23            24            +4.35%
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-4       1220          1233          +1.07%
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-4         778           797           +2.44%
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-4       81            82            +1.23%
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-4        121           121           +0.00%
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-4           192           181           -5.73%
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-4     5             6             +20.00%
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-4     4             4             +0.00%
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-4           1578          1612          +2.15%
    BenchmarkDecoder_DecodeAllParallel/html.zst-4               21            21            +0.00%
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-4      0             0             +0.00%
    

    amd64 processor details (6 core / 12 hyperthread):

    processor	: 0
    vendor_id	: AuthenticAMD
    cpu family	: 25
    model		: 33
    model name	: AMD Ryzen 5 5600X 6-Core Processor
    stepping	: 0
    microcode	: 0xa201009
    cpu MHz		: 2200.000
    cache size	: 512 KB
    physical id	: 0
    siblings	: 12
    core id		: 0
    cpu cores	: 6
    apicid		: 0
    initial apicid	: 0
    fpu		: yes
    fpu_exception	: yes
    cpuid level	: 16
    wp		: yes
    flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca
    bugs		: sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
    bogomips	: 7385.90
    TLB size	: 2560 4K pages
    clflush size	: 64
    cache_alignment	: 64
    address sizes	: 48 bits physical, 48 bits virtual
    power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
    

    arm64 processor details: graviton2, c6g.xlarge (4 core)

    processor	: 0
    BogoMIPS	: 243.75
    Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
    CPU implementer	: 0x41
    CPU architecture: 8
    CPU variant	: 0x3
    CPU part	: 0xd0c
    CPU revision	: 1
    
    opened by lizthegrey 11
  • [zip] 7z complains about "Headers Error" when large files are added to a zip archive

    Hi,

    Context

    I'm trying to generate a zip archive of huge files (something like 2GB each).

    The generated zip archive can successfully be extracted using my Ubuntu unzip command. But it raises an error when I try to extract using my graphical user interface or 7z.

    Here is the output of 7z in test integrity mode:

    $ 7z t archive.zip
    
    7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
    p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,8 CPUs Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz (806EC),ASM,AES-NI)
    
    Scanning the drive for archives:
    1 file, 4294967634 bytes (4097 MiB)
    
    Testing archive: archive.zip
    
    ERRORS:
    Headers Error
    
    --
    Path = archive.zip
    Type = zip
    ERRORS:
    Headers Error
    Physical Size = 4294967634
    64-bit = +
    
                  
    
    Archives with Errors: 1
    
    Open Errors: 1
    

    Since unzip is able to extract the files, I would say that the archive is valid. But I do not understand why 7z is complaining here.

    I have no experience with ZIP / ZIP64 archives, so I may be using your library incorrectly.

    How to reproduce

    I've used the following code to generate the archive:

    package main
    
    import (
    	"fmt"
    	"io"
    	"log"
    	"os"
    	"path/filepath"
    
    	"github.com/klauspost/compress/zip"
    )
    
    var (
    	// generated using "dd if=/dev/zero of=./file1 bs=1G count=2" or equivalent
    	fileNames = []string{
    		"./file1",
    		"./file2",
    	}
    )
    
    func main() {
    	outFileW, err := os.Create("./archive.zip")
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer outFileW.Close()
    
    	zipWriter := zip.NewWriter(outFileW)
    
    	for _, filename := range fileNames {
    
    		fileToZip, err := os.Open(filename)
    		if err != nil {
    			log.Fatal(err)
    		}
    		defer fileToZip.Close()
    
    		// Get the file information
    		info, err := fileToZip.Stat()
    		if err != nil {
    			log.Fatal(err)
    		}
    
    		header, err := zip.FileInfoHeader(info)
    		if err != nil {
    			log.Fatal(err)
    		}
    
    		// FileInfoHeader() above only sets the base name of the file. The name is
    		// set explicitly here; to preserve the folder structure, use the full path instead.
    		header.Name = filepath.Base(filename)
    
    		// Change to store to avoid compression
    		// see http://golang.org/pkg/archive/zip/#pkg-constants
    		header.Method = zip.Store
    
    		writer, err := zipWriter.CreateHeader(header)
    		if err != nil {
    			log.Fatal(err)
    		}
    		_, err = io.Copy(writer, fileToZip)
    		if err != nil {
    			log.Fatal(err)
    		}
    	}
    
    	if err := zipWriter.Close(); err != nil {
    		log.Fatal(err)
    	}
    
    	fmt.Println("archive.zip file created")
    }
    

    using

    • go 1.18.3
    • github.com/klauspost/compress v1.15.6

    Test files are filled with zeros and created using a command such as dd if=/dev/zero of=./file1 bs=1G count=2

    Thanks in advance for your time and consideration

    opened by riton 10
  • huff0: Pass a single bitReader pointer to asm

    This makes the context object smaller and frees up three registers, which we can use to replace the limitPtr and bufferOrigin stack variables.

    Benchmark results show a tiny win (Go 1.19beta, Core i7-3770K):

    name                                           old speed      new speed      delta
    Decompress1XTable/digits-8                      347MB/s ± 0%   347MB/s ± 0%    ~     (p=0.650 n=8+10)
    Decompress1XTable/gettysburg-8                  268MB/s ± 0%   268MB/s ± 0%    ~     (p=0.400 n=9+9)
    Decompress1XTable/twain-8                       327MB/s ± 0%   327MB/s ± 1%    ~     (p=0.339 n=7+9)
    Decompress1XTable/low-ent.10k-8                 385MB/s ± 0%   385MB/s ± 1%    ~     (p=0.510 n=9+10)
    Decompress1XTable/superlow-ent-10k-8            376MB/s ± 0%   376MB/s ± 0%    ~     (p=0.712 n=8+10)
    Decompress1XTable/crash2-8                     17.3MB/s ± 1%  17.3MB/s ± 1%    ~     (p=0.926 n=10+10)
    Decompress1XTable/endzerobits-8                52.9MB/s ± 1%  52.4MB/s ± 0%  -0.94%  (p=0.000 n=10+10)
    Decompress1XTable/endnonzero-8                 11.4MB/s ± 0%  11.4MB/s ± 1%    ~     (p=0.343 n=10+10)
    Decompress1XTable/case1-8                      22.0MB/s ± 0%  22.0MB/s ± 0%    ~     (p=0.618 n=9+9)
    Decompress1XTable/case2-8                      18.1MB/s ± 0%  18.1MB/s ± 0%    ~     (p=0.348 n=9+9)
    Decompress1XTable/case3-8                      19.1MB/s ± 0%  19.1MB/s ± 0%  +0.21%  (p=0.048 n=10+10)
    Decompress1XTable/pngdata.001-8                 374MB/s ± 0%   374MB/s ± 0%    ~     (p=0.861 n=9+10)
    Decompress1XTable/normcount2-8                 54.3MB/s ± 1%  54.5MB/s ± 1%    ~     (p=0.093 n=10+10)
    Decompress1XNoTable/digits/100-8                279MB/s ± 0%   280MB/s ± 0%  +0.30%  (p=0.003 n=10+9)
    Decompress1XNoTable/digits/10000-8              366MB/s ± 0%   365MB/s ± 0%    ~     (p=0.113 n=10+9)
    Decompress1XNoTable/digits/262143-8             347MB/s ± 0%   347MB/s ± 1%    ~     (p=0.739 n=10+10)
    Decompress1XNoTable/gettysburg/100-8            278MB/s ± 1%   277MB/s ± 1%    ~     (p=0.676 n=10+9)
    Decompress1XNoTable/gettysburg/10000-8          363MB/s ± 1%   362MB/s ± 0%  -0.50%  (p=0.001 n=10+9)
    Decompress1XNoTable/gettysburg/262143-8         350MB/s ± 0%   347MB/s ± 0%  -0.90%  (p=0.000 n=10+8)
    Decompress1XNoTable/twain/100-8                 268MB/s ± 0%   267MB/s ± 0%    ~     (p=0.384 n=9+8)
    Decompress1XNoTable/twain/10000-8               363MB/s ± 0%   362MB/s ± 0%  -0.32%  (p=0.000 n=9+9)
    Decompress1XNoTable/twain/262143-8              328MB/s ± 0%   329MB/s ± 0%    ~     (p=0.063 n=9+10)
    Decompress1XNoTable/low-ent.10k/100-8           180MB/s ± 0%   181MB/s ± 0%    ~     (p=0.225 n=10+10)
    Decompress1XNoTable/low-ent.10k/10000-8         385MB/s ± 0%   385MB/s ± 0%    ~     (p=0.289 n=10+10)
    Decompress1XNoTable/low-ent.10k/262143-8        389MB/s ± 1%   389MB/s ± 1%    ~     (p=0.971 n=10+10)
    Decompress1XNoTable/superlow-ent-10k/262143-8   389MB/s ± 0%   390MB/s ± 0%  +0.27%  (p=0.017 n=9+10)
    Decompress1XNoTable/crash2/100-8                278MB/s ± 0%   279MB/s ± 1%    ~     (p=0.163 n=9+10)
    Decompress1XNoTable/crash2/10000-8              373MB/s ± 1%   373MB/s ± 0%    ~     (p=0.370 n=10+8)
    Decompress1XNoTable/crash2/262143-8             375MB/s ± 0%   375MB/s ± 0%    ~     (p=0.604 n=9+10)
    Decompress1XNoTable/endzerobits/100-8           180MB/s ± 0%   181MB/s ± 0%  +0.26%  (p=0.005 n=10+9)
    Decompress1XNoTable/endzerobits/10000-8         384MB/s ± 0%   385MB/s ± 0%    ~     (p=0.914 n=8+10)
    Decompress1XNoTable/endzerobits/262143-8        389MB/s ± 0%   390MB/s ± 0%    ~     (p=0.739 n=10+10)
    Decompress1XNoTable/endnonzero/100-8            180MB/s ± 1%   180MB/s ± 1%    ~     (p=0.926 n=10+10)
    Decompress1XNoTable/endnonzero/10000-8          384MB/s ± 0%   384MB/s ± 0%    ~     (p=0.965 n=10+8)
    Decompress1XNoTable/endnonzero/262143-8         390MB/s ± 0%   390MB/s ± 0%    ~     (p=0.633 n=8+10)
    Decompress1XNoTable/case1/100-8                 282MB/s ± 0%   283MB/s ± 0%  +0.34%  (p=0.005 n=10+10)
    Decompress1XNoTable/case1/10000-8               372MB/s ± 0%   373MB/s ± 0%    ~     (p=0.113 n=9+9)
    Decompress1XNoTable/case1/262143-8              374MB/s ± 0%   374MB/s ± 0%    ~     (p=0.448 n=10+10)
    Decompress1XNoTable/case2/100-8                 274MB/s ± 1%   274MB/s ± 0%    ~     (p=0.927 n=10+10)
    Decompress1XNoTable/case2/10000-8               376MB/s ± 0%   376MB/s ± 0%    ~     (p=0.408 n=10+8)
    Decompress1XNoTable/case2/262143-8              376MB/s ± 1%   377MB/s ± 0%    ~     (p=1.000 n=10+10)
    Decompress1XNoTable/case3/100-8                 266MB/s ± 0%   265MB/s ± 0%    ~     (p=0.113 n=9+10)
    Decompress1XNoTable/case3/10000-8               372MB/s ± 0%   372MB/s ± 0%    ~     (p=0.075 n=10+9)
    Decompress1XNoTable/case3/262143-8              374MB/s ± 0%   374MB/s ± 0%    ~     (p=0.172 n=10+10)
    Decompress1XNoTable/pngdata.001/100-8           238MB/s ± 0%   238MB/s ± 0%    ~     (p=0.438 n=9+8)
    Decompress1XNoTable/pngdata.001/10000-8         384MB/s ± 0%   384MB/s ± 0%    ~     (p=0.448 n=10+10)
    Decompress1XNoTable/pngdata.001/262143-8        378MB/s ± 0%   378MB/s ± 0%    ~     (p=0.836 n=10+10)
    Decompress1XNoTable/normcount2/100-8            281MB/s ± 0%   282MB/s ± 1%    ~     (p=0.122 n=8+10)
    Decompress1XNoTable/normcount2/10000-8          369MB/s ± 1%   369MB/s ± 0%    ~     (p=0.912 n=10+10)
    Decompress1XNoTable/normcount2/262143-8         370MB/s ± 0%   370MB/s ± 1%    ~     (p=0.342 n=10+10)
    Decompress4XNoTable/digits/100-8                197MB/s ± 0%   197MB/s ± 1%    ~     (p=0.764 n=10+9)
    Decompress4XNoTable/digits/10000-8              594MB/s ± 0%   602MB/s ± 1%  +1.35%  (p=0.000 n=10+10)
    Decompress4XNoTable/digits/262143-8             570MB/s ± 1%   578MB/s ± 0%  +1.30%  (p=0.000 n=10+8)
    Decompress4XNoTable/gettysburg/100-8            258MB/s ± 1%   260MB/s ± 0%  +0.59%  (p=0.001 n=10+10)
    Decompress4XNoTable/gettysburg/10000-8          638MB/s ± 0%   641MB/s ± 0%  +0.44%  (p=0.000 n=9+9)
    Decompress4XNoTable/gettysburg/262143-8         573MB/s ± 1%   574MB/s ± 0%    ~     (p=0.353 n=10+10)
    Decompress4XNoTable/twain/100-8                 214MB/s ± 2%   214MB/s ± 2%    ~     (p=0.853 n=10+10)
    Decompress4XNoTable/twain/10000-8               634MB/s ± 1%   638MB/s ± 0%  +0.62%  (p=0.000 n=10+10)
    Decompress4XNoTable/twain/262143-8              513MB/s ± 1%   517MB/s ± 0%  +0.85%  (p=0.000 n=10+10)
    Decompress4XNoTable/low-ent.10k/100-8           195MB/s ± 0%   194MB/s ± 0%    ~     (p=0.130 n=9+9)
    Decompress4XNoTable/low-ent.10k/10000-8         635MB/s ± 0%   642MB/s ± 0%  +1.19%  (p=0.000 n=10+10)
    Decompress4XNoTable/low-ent.10k/262143-8        675MB/s ± 0%   685MB/s ± 0%  +1.51%  (p=0.000 n=10+10)
    Decompress4XNoTable/superlow-ent-10k/262143-8   673MB/s ± 1%   684MB/s ± 0%  +1.70%  (p=0.000 n=10+10)
    Decompress4XNoTable/case1/100-8                 206MB/s ± 1%   206MB/s ± 0%    ~     (p=0.189 n=10+9)
    Decompress4XNoTable/case1/10000-8               593MB/s ± 0%   601MB/s ± 0%  +1.47%  (p=0.000 n=10+10)
    Decompress4XNoTable/case1/262143-8              603MB/s ± 0%   613MB/s ± 0%  +1.64%  (p=0.000 n=10+10)
    Decompress4XNoTable/case2/100-8                 201MB/s ± 0%   202MB/s ± 1%    ~     (p=0.053 n=9+10)
    Decompress4XNoTable/case2/10000-8               610MB/s ± 0%   618MB/s ± 0%  +1.30%  (p=0.000 n=9+10)
    Decompress4XNoTable/case2/262143-8              622MB/s ± 1%   634MB/s ± 0%  +1.90%  (p=0.000 n=9+8)
    Decompress4XNoTable/case3/100-8                 197MB/s ± 1%   198MB/s ± 0%  +0.53%  (p=0.001 n=9+10)
    Decompress4XNoTable/case3/10000-8               606MB/s ± 0%   615MB/s ± 0%  +1.49%  (p=0.000 n=8+10)
    Decompress4XNoTable/case3/262143-8              613MB/s ± 1%   622MB/s ± 0%  +1.48%  (p=0.000 n=10+10)
    Decompress4XNoTable/pngdata.001/100-8           212MB/s ± 1%   211MB/s ± 0%    ~     (p=0.136 n=9+9)
    Decompress4XNoTable/pngdata.001/10000-8         645MB/s ± 1%   649MB/s ± 1%  +0.65%  (p=0.000 n=9+10)
    Decompress4XNoTable/pngdata.001/262143-8        640MB/s ± 1%   649MB/s ± 0%  +1.44%  (p=0.000 n=10+10)
    Decompress4XNoTable/normcount2/100-8            260MB/s ± 1%   261MB/s ± 1%    ~     (p=0.211 n=10+9)
    Decompress4XNoTable/normcount2/10000-8          584MB/s ± 1%   591MB/s ± 0%  +1.33%  (p=0.000 n=9+9)
    Decompress4XNoTable/normcount2/262143-8         588MB/s ± 1%   596MB/s ± 1%  +1.39%  (p=0.000 n=10+9)
    Decompress4XNoTableTableLog8/digits-8           583MB/s ± 1%   592MB/s ± 0%  +1.48%  (p=0.000 n=10+10)
    Decompress4XTable/digits-8                      580MB/s ± 0%   588MB/s ± 0%  +1.33%  (p=0.000 n=8+10)
    Decompress4XTable/gettysburg-8                  368MB/s ± 1%   370MB/s ± 0%  +0.59%  (p=0.017 n=10+9)
    Decompress4XTable/twain-8                       510MB/s ± 0%   515MB/s ± 0%  +0.99%  (p=0.000 n=9+10)
    Decompress4XTable/low-ent.10k-8                 657MB/s ± 0%   665MB/s ± 0%  +1.24%  (p=0.000 n=10+10)
    Decompress4XTable/superlow-ent-10k-8            608MB/s ± 0%   617MB/s ± 1%  +1.48%  (p=0.000 n=8+10)
    Decompress4XTable/case1-8                      21.1MB/s ± 1%  21.0MB/s ± 2%    ~     (p=0.223 n=10+10)
    Decompress4XTable/case2-8                      17.6MB/s ± 0%  17.6MB/s ± 0%    ~     (p=0.199 n=9+10)
    Decompress4XTable/case3-8                      18.7MB/s ± 0%  18.7MB/s ± 0%    ~     (p=0.557 n=10+8)
    Decompress4XTable/pngdata.001-8                 633MB/s ± 1%   645MB/s ± 0%  +1.90%  (p=0.000 n=9+10)
    Decompress4XTable/normcount2-8                 49.9MB/s ± 1%  49.5MB/s ± 1%  -0.64%  (p=0.002 n=10+10)
    [Geo mean]                                      270MB/s        271MB/s       +0.36%
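
The `[Geo mean]` row summarises the table with a geometric mean, a common way to average rates across benchmarks whose absolute speeds differ widely. As a minimal sketch, the code below computes a geometric mean for four `Decompress4XTable` rows copied from the table. The function name `geoMean` and the choice of rows are assumptions made for illustration, and the result is not expected to match the +0.36% figure, which covers every row.

```go
package main

import (
	"fmt"
	"math"
)

// geoMean returns the geometric mean of positive values. It sums logarithms
// instead of multiplying, so long inputs cannot overflow.
func geoMean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	sum := 0.0
	for _, x := range xs {
		sum += math.Log(x)
	}
	return math.Exp(sum / float64(len(xs)))
}

func main() {
	// Speeds in MB/s for digits, gettysburg, twain and low-ent.10k
	// (Decompress4XTable rows), taken from the table above.
	before := []float64{580, 368, 510, 657}
	after := []float64{588, 370, 515, 665}

	gb, ga := geoMean(before), geoMean(after)
	fmt.Printf("old %.1f MB/s, new %.1f MB/s, delta %+.2f%%\n",
		gb, ga, 100*(ga/gb-1))
}
```

Unlike an arithmetic mean, this average is not dominated by the fastest benchmarks: a 1% gain on a 20 MB/s case counts as much as a 1% gain on a 600 MB/s case.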
    
    opened by greatroar 0
  • S2 tier better matching

    Replaces #627

    enwik9	s2	2	1000000000	418353819	209	4552.13
    enwik9	s2	2	1000000000	416064079	228	4176.25
    
    silesia.tar	s2	2	211947520	87096708	50	3983.43
    silesia.tar	s2	2	211947520	86869241	56	3574.54
    
    github-ranks-backup.bin	s2	2	1862623243	563671388	239	7407.08
    github-ranks-backup.bin	s2	2	1862623243	565154367	290	6105.12
    
    gob-stream	s2	2	1911399616	308667456	173	10510.00
    gob-stream	s2	2	1911399616	301451558	201	9049.05
    
    github-june-2days-2019.json	s2	2	6273951764	954430842	528	11321.73
    github-june-2days-2019.json	s2	2	6273951764	943017937	603	9922.09
    
    nyc-taxi-data-10M.csv	s2	2	3325605752	915188199	433	7316.83
    nyc-taxi-data-10M.csv	s2	2	3325605752	883010338	459	6895.66
    
    consensus.db.10gb	s2	2	10737418240	4435349472	1396	7330.79
    consensus.db.10gb	s2	2	10737418240	4419595862	1764	5803.49
    
    rawstudio-mint14.tar	s2	2	8558382592	4134348052	1331	6127.64
    rawstudio-mint14.tar	s2	2	8558382592	4100018306	1550	5263.72
    
    hollywood-2009.tar	s2	2	808796160	267351968	110	6979.73
    hollywood-2009.tar	s2	2	808796160	268395630	129	5973.25
    
    10gb.tar	s2	2	10065157632	5512113730	1531	6267.18
    10gb.tar	s2	2	10065157632	5459406267	1709	5614.36
    
    sofia-air-quality-dataset.tar	s2	2	15464463872	4399846947	2059	7160.27
    sofia-air-quality-dataset.tar	s2	2	15464463872	4433864667	2405	6130.21
    
    sharnd.out.2gb	s2	2	2147483647	2147487753	210	9718.07
    sharnd.out.2gb	s2	2	2147483647	2147487753	234	8751.08
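
The rows above carry no header. Reading the seven columns as file, codec, level, input bytes, output bytes, milliseconds and MB/s is an interpretation that fits the arithmetic, not something the PR states. Under that assumption, the sketch below shows how two rows for the same file can be compared. The `result` type and its helpers are hypothetical names, and which row is "before" or "after" the change is likewise not stated in the source.

```go
package main

import "fmt"

// result holds one row of the table. The field meanings are an assumed
// reading of the columns: sizes in bytes, time in milliseconds.
type result struct {
	name    string
	in, out int64
	millis  int64
}

// ratio returns the compressed size as a percentage of the input size.
func (r result) ratio() float64 {
	return 100 * float64(r.out) / float64(r.in)
}

// sizeDelta returns the relative change in output size from a to b, in percent.
// A negative value means b is smaller than a.
func sizeDelta(a, b result) float64 {
	return 100 * float64(b.out-a.out) / float64(a.out)
}

// timeDelta returns the relative change in elapsed time from a to b, in percent.
func timeDelta(a, b result) float64 {
	return 100 * float64(b.millis-a.millis) / float64(a.millis)
}

func main() {
	// The two enwik9 rows, in the order they appear in the table.
	first := result{name: "enwik9", in: 1000000000, out: 418353819, millis: 209}
	second := result{name: "enwik9", in: 1000000000, out: 416064079, millis: 228}

	fmt.Printf("%s: %.2f%% -> %.2f%% of input, size %+.2f%%, time %+.2f%%\n",
		first.name, first.ratio(), second.ratio(),
		sizeDelta(first, second), timeDelta(first, second))
}
```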
    
    opened by klauspost 0
  • Incorrect decompression with WithDecoderConcurrency(1) or DecodeAll

    I'm using your zstd library in my squashfs library and have run into a weird edge case. It seems that sometimes when I decompress some data, I don't get the correct output. The weird thing is that the issue only arises when I pass WithDecoderConcurrency(1) or use DecodeAll. If I use WithDecoderConcurrency with any number other than 1 (and don't use DecodeAll), the problem goes away.

    For some context, I'm having the issue particularly when trying to read squashfs metadata blocks, which are pretty small (they have a max size of 8KiB), and not all of them are having issues. When I have some time, I'll try to capture some of the blocks that are causing the issue.

    Also, fantastic work on this library.
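
One way to capture the offending blocks is to run every block through both decode paths and record the indexes where the results differ. The sketch below shows that harness using only the standard library. The names `decodeFunc` and `mismatches` are hypothetical, and the two `compress/flate` decoders are stand-ins chosen so the example runs without extra dependencies. In a real reproduction they would be replaced by the zstd decode paths being compared.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// decodeFunc is any one-shot decoder: compressed block in, plain bytes out.
type decodeFunc func(compressed []byte) ([]byte, error)

// mismatches returns the indexes of blocks for which a and b disagree,
// either in their output bytes or in whether they report an error.
func mismatches(blocks [][]byte, a, b decodeFunc) []int {
	var bad []int
	for i, blk := range blocks {
		outA, errA := a(blk)
		outB, errB := b(blk)
		if (errA == nil) != (errB == nil) || !bytes.Equal(outA, outB) {
			bad = append(bad, i)
		}
	}
	return bad
}

// deflate compresses p; it only exists to build sample blocks for the demo.
func deflate(p []byte) []byte {
	var b bytes.Buffer
	w, err := flate.NewWriter(&b, flate.DefaultCompression)
	if err != nil {
		panic(err) // not reached for a valid compression level
	}
	w.Write(p)
	w.Close()
	return b.Bytes()
}

// inflateAll is the first stand-in decoder: it reads the stream in one call.
func inflateAll(c []byte) ([]byte, error) {
	r := flate.NewReader(bytes.NewReader(c))
	defer r.Close()
	return io.ReadAll(r)
}

// inflateChunked is the second stand-in: it reads through a 7-byte buffer.
func inflateChunked(c []byte) ([]byte, error) {
	r := flate.NewReader(bytes.NewReader(c))
	defer r.Close()
	var out bytes.Buffer
	buf := make([]byte, 7)
	for {
		n, err := r.Read(buf)
		out.Write(buf[:n])
		if err == io.EOF {
			return out.Bytes(), nil
		}
		if err != nil {
			return nil, err
		}
	}
}

func main() {
	blocks := [][]byte{
		deflate([]byte("squashfs metadata stand-in")),
		deflate(make([]byte, 8<<10)), // 8 KiB of zeros
	}
	fmt.Println("mismatching block indexes:",
		mismatches(blocks, inflateAll, inflateChunked))
}
```

Any index the harness reports points at a block worth saving to disk and attaching to the issue.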

    opened by CalebQ42 4
  • Add support for 16 bit FSE compression

    I have created a 16 bit image compression codec in Go for medical images at https://github.com/pappuks/medical-image-codec, which uses delta encoding, RLE, FSE and/or Huffman coding. I have added a 16 bit FSE implementation, which is based on the 8 bit FSE implementation from this repository and the 16 bit FSE implementation from https://github.com/Cyan4973/FiniteStateEntropy. The 16 bit FSE implementation in my repository has been updated to handle values up to 65535. This is a deviation from the implementation in https://github.com/Cyan4973/FiniteStateEntropy, where the max supported value is 4095.

    I would like the 16 bit FSE implementation to be added to this repository, so that it is available alongside the 8 bit FSE implementation.

    The files which have the 16 bit FSE implementation are:

    • https://github.com/pappuks/medical-image-codec/blob/main/fseu16.go
    • https://github.com/pappuks/medical-image-codec/blob/main/fsecompressu16.go
    • https://github.com/pappuks/medical-image-codec/blob/main/fsedecompressu16.go

    If you agree with the feature request, I can then submit a PR to this branch.
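
To illustrate why a 16 bit entropy coder needs the full 0 to 65535 symbol range, here is a minimal sketch of two of the stages named above: delta encoding of 16 bit samples, followed by the symbol histogram that an entropy coder would typically build its tables from. This is not code from the linked repository. The function names are hypothetical and the FSE stage itself is left out.

```go
package main

import "fmt"

// deltaEncode replaces each sample with its difference from the previous one.
// uint16 arithmetic wraps modulo 65536, so the transform is fully reversible.
func deltaEncode(src []uint16) []uint16 {
	dst := make([]uint16, len(src))
	var prev uint16
	for i, v := range src {
		dst[i] = v - prev
		prev = v
	}
	return dst
}

// deltaDecode reverses deltaEncode by accumulating the differences.
func deltaDecode(src []uint16) []uint16 {
	dst := make([]uint16, len(src))
	var prev uint16
	for i, d := range src {
		prev += d
		dst[i] = prev
	}
	return dst
}

// histogram counts how often each symbol occurs. It returns the counts,
// trimmed to the largest symbol seen, together with that symbol.
func histogram(src []uint16) (counts []uint32, maxSym uint16) {
	counts = make([]uint32, 1<<16)
	for _, v := range src {
		counts[v]++
		if v > maxSym {
			maxSym = v
		}
	}
	return counts[:int(maxSym)+1], maxSym
}

func main() {
	samples := []uint16{1000, 1002, 1001, 1001, 65535, 0}
	deltas := deltaEncode(samples)
	_, maxSym := histogram(deltas)

	fmt.Println("deltas:    ", deltas)
	fmt.Println("max symbol:", maxSym)
	fmt.Println("round trip:", deltaDecode(deltas))
}
```

A small negative step such as 1002 to 1001 wraps around to 65535, so even smooth data can yield delta symbols far above a 4095 limit.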

    enhancement 
    opened by pappuks 4
Releases(v1.15.7)
Owner
Klaus Post
Port of LZ4 lossless compression algorithm to Go

go-lz4 go-lz4 is a port of LZ4 lossless compression algorithm to Go. The original C code is located at: https://github.com/Cyan4973/lz4 Status Usage go

Бранимир Караџић 209 Jun 14, 2022
LZ4 compression and decompression in pure Go

lz4 : LZ4 compression in pure Go Overview This package provides a streaming interface to LZ4 data streams as well as low level compress and uncompress

Pierre Curto 670 Jul 3, 2022
Go parallel gzip (de)compression

pgzip Go parallel gzip compression/decompression. This is a fully gzip compatible drop in replacement for "compress/gzip". This will split compression

Klaus Post 938 Jun 22, 2022
Unsigned Integer 32 Byte Packing Compression

dbp32 Unsigned Integer 32 Byte Packing Compression. Inspired by lemire/FastPFor. Package bp32 is an implementation of the binary packing integer compr

Ali Josie 2 Sep 6, 2021
Bzip2 Compression Tool written in Go

Bzip2 Compression Tool written in Go

Pedro Albanese 1 Dec 28, 2021
Slipstream is a method for lossless compression of power system data.

Slipstream Slipstream is a method for lossless compression of power system data. Design principles The protocol is designed for streaming raw measurem

Synaptec Ltd 4 Apr 14, 2022
An easy-to-use CLI-based compression tool.

Easy Compression An easy-to-use CLI-based compression tool. Usage NAME: EasyCompression - A CLI-based tool for (de)compression USAGE: EasyCompr

Tei Michael 1 Jan 1, 2022
zlib compression tool for modern multi-core machines written in Go

zlib compression tool for modern multi-core machines written in Go

Pedro F. Albanese 0 Jan 21, 2022
Optimized compression packages

compress This package provides various compression algorithms. zstandard compression and decompression in pure Go. S2 is a high performance replacemen

Klaus Post 3k Jun 25, 2022
Go wrapper for LZO compression library

This is a cgo wrapper around the LZO real-time compression library. LZO is available at http://www.oberhumer.com/opensource/lzo/ lzo.go is the go pack

Damian Gryski 13 Mar 4, 2022
Integer Compression Libraries for Go

Encoding This is a set of integer compression algorithms implemented in Go. It is an (incomplete) port of the JavaFastPFOR by Dr. Daniel Lemire. For m

null 123 May 5, 2022
Using brotli compression to embed static files in Go.

Broccoli go get -u aletheia.icu/broccoli Broccoli uses brotli compression to embed a virtual file system of static files inside Go executables. A f

Aletheia 525 Jun 17, 2022
The Snappy compression format in the Go programming language.

The Snappy compression format in the Go programming language. To download and install from source: $ go get github.com/golang/snappy Unless otherwis

Go 1.3k Jun 28, 2022
Package cae implements PHP-like Compression and Archive Extensions.

Compression and Archive Extensions (Chinese documentation) Package cae implements PHP-like Compression and Archive Extensions. But this package has some modifications de

ᴜɴᴋɴᴡᴏɴ 36 Jun 16, 2022
Interfaces for LZ77-based data compression

Pack Interfaces for LZ77-based data compression. Introduction Many compression libraries have two main parts: Something that looks for repeated sequen

Andy Balholm 3 Oct 19, 2021
Simple image compression using SVD

SVD image compression An implementation of image compression using SVD decomposition in Go Built With Go 1.17 Gonum Compression examples Header Image Ori

null 4 Mar 30, 2022
An effective time-series data compression/decompression method based on Facebook's Gorilla.

Gorilla This package provides an effective time-series data compression method based on Facebook's Gorilla. In a nutshell, it uses delta-of-delta ti

Keisuke Umegaki 49 Jun 15, 2022
Image compression codec for 16 bit medical images

MIC - Medical Image Codec This library introduces a lossless medical image compression codec MIC for 16 bit images which provides compression ratio si

Kuldeep S 0 Dec 26, 2021
Seekable ZSTD compression format implemented in Golang.

ZSTD seekable compression format implementation in Go Seekable ZSTD compression format implemented in Golang. This library provides a random access re

Alexey Ivanov 17 Jun 22, 2022
Novel, efficient, and practical image compression with visually appealing results. 🤏 ✨

Tiny Thumb 🤏 ✨ A novel, efficient, and practical method for lossy image compression, that produces visually appealing thumbnails. This technique is u

Slack 5 Jun 23, 2022
Go implementation of BLAKE2 (b) cryptographic hash function (optimized for 64-bit platforms).

Go implementation of BLAKE2b collision-resistant cryptographic hash function created by Jean-Philippe Aumasson, Samuel Neves, Zooko Wilcox-O'Hearn, an

Dmitry Chestnykh 89 May 1, 2022