Kanzi is a modern, modular, expandable and efficient lossless data compressor implemented in Go.

Overview

kanzi

  • modern: implements state-of-the-art algorithms, and built-in multi-tasking takes advantage of multi-core CPUs.
  • modular: the entropy codec and the combination of transforms can be selected at runtime to best match the kind of data to compress.
  • expandable: clean design with heavy use of interfaces as contracts makes integrating and expanding the code easy. No dependencies.
  • efficient: the code is optimized for efficiency (trade-off between compression ratio and speed).

Kanzi supports a wide range of compression ratios and compresses many files more tightly than most common compressors (at the cost of decompression speed). It is not compatible with standard compression formats.

For more details, check https://github.com/flanglet/kanzi-go/wiki.
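Kanzi's stream types plug into Go's standard io interfaces (the issue report further below uses kio.NewCompressedInputStreamWithCtx with io.Copy, for example). As a minimal sketch of that wrap-and-copy pattern, using the stdlib's compress/gzip as a stand-in so the example stays dependency-free (the kanzi-go constructors themselves take an extra ctx map of options):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// roundTrip compresses s and decompresses it back, using the
// wrap-a-writer / wrap-a-reader pattern that streaming compressors share.
func roundTrip(s string) string {
	// Compress: wrap the destination in a compressing writer.
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	io.Copy(zw, strings.NewReader(s))
	zw.Close() // flush and write the stream trailer

	// Decompress: wrap the source in a decompressing reader.
	zr, _ := gzip.NewReader(&buf)
	var out bytes.Buffer
	io.Copy(&out, zr) // io.Copy stops when zr.Read returns io.EOF
	return out.String()
}

func main() {
	fmt.Println(roundTrip("hello, kanzi")) // hello, kanzi
}
```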

Credits

Matt Mahoney, Yann Collet, Jan Ondrus, Yuta Mori, Ilya Muravyov, Neal Burns, Fabian Giesen, Jarek Duda, Ilya Grebnov

Disclaimer

Use at your own risk. Always keep a backup of your files.


Silesia corpus benchmark

i7-7700K @4.20GHz, 32GB RAM, Ubuntu 20.04

go1.16rc1

Kanzi version 1.9 Go implementation. Block size is 100 MB.

Compressor Encoding (sec) Decoding (sec) Size (bytes)
Original 211,938,580
Zstd 1.4.8 -2 --long=30 1.2 0.3 68,761,465
Zstd 1.4.8 -2 -T6 --long=30 0.7 0.3 68,761,465
Kanzi -l 1 2.8 1.3 68,471,355
Kanzi -l 1 -j 6 0.9 0.4 68,471,355
Pigz 1.6 -6 -p6 1.4 1.4 68,237,849
Gzip 1.6 -6 6.1 1.1 68,227,965
Brotli 1.0.9 -2 --large_window=30 1.5 0.8 68,033,377
Pigz 1.6 -9 -p6 3.0 1.6 67,656,836
Gzip 1.6 -9 14.0 1.0 67,631,990
Kanzi -l 2 3.6 1.3 64,522,501
Kanzi -l 2 -j 6 1.3 0.4 64,522,501
Brotli 1.0.9 -4 --large_window=30 4.1 0.7 64,267,169
Zstd 1.4.8 -9 --long=30 5.3 0.3 59,937,600
Zstd 1.4.8 -9 -T6 --long=30 2.8 0.3 59,937,600
Kanzi -l 3 4.8 2.3 59,647,212
Kanzi -l 3 -j 6 1.7 0.8 59,647,212
Zstd 1.4.8 -13 --long=30 16.0 0.3 58,065,257
Zstd 1.4.8 -13 -T6 --long=30 9.2 0.3 58,065,257
Orz 1.5.0 7.7 2.0 57,564,831
Brotli 1.0.9 -9 --large_window=30 36.7 0.7 56,232,817
Lzma 5.2.2 -3 24.1 2.6 55,743,540
Kanzi -l 4 10.6 6.9 54,996,858
Kanzi -l 4 -j 6 3.8 2.3 54,996,858
Bzip2 1.0.6 -9 14.9 5.2 54,506,769
Zstd 1.4.8 -19 --long=30 59.9 0.3 53,039,786
Zstd 1.4.8 -19 -T6 --long=30 59.7 0.4 53,039,786
Kanzi -l 5 12.4 6.5 51,745,795
Kanzi -l 5 -j 6 4.2 2.1 51,745,795
Brotli 1.0.9 --large_window=30 356.2 0.9 49,383,136
Lzma 5.2.2 -9 65.6 2.5 48,780,457
Kanzi -l 6 15.6 10.8 48,067,846
Kanzi -l 6 -j 6 5.3 3.7 48,067,846
BCM 1.6.0 -7 18.0 22.1 46,506,716
Kanzi -l 7 22.2 17.3 46,446,991
Kanzi -l 7 -j 6 8.0 6.2 46,446,991
Tangelo 2.4 83.2 85.9 44,862,127
zpaq v7.14 m4 t1 107.3 112.2 42,628,166
zpaq v7.14 m4 t12 108.1 111.5 42,628,166
Kanzi -l 8 63.4 64.6 41,830,871
Kanzi -l 8 -j 6 22.5 21.8 41,830,871
Tangelo 2.0 302.0 310.9 41,267,068
Kanzi -l 9 84.8 86.5 40,369,883
Kanzi -l 9 -j 6 33.8 33.5 40,369,883
zpaq v7.14 m5 t1 343.1 352.0 39,112,924
zpaq v7.14 m5 t12 344.3 350.4 39,112,924
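One way to read the table: the -j 6 rows scale close to linearly on six cores at the fast levels and somewhat less at the slow ones. A quick sketch of the speedups implied by the encoding times above:

```go
package main

import "fmt"

// speedup returns how much faster the 6-job run is than the 1-job run.
func speedup(single, sixJobs float64) float64 {
	return single / sixJobs
}

func main() {
	// Encoding times (seconds) taken from the Silesia table above.
	fmt.Printf("Kanzi -l 1: %.1fx\n", speedup(2.8, 0.9))   // ~3.1x
	fmt.Printf("Kanzi -l 5: %.1fx\n", speedup(12.4, 4.2))  // ~3.0x
	fmt.Printf("Kanzi -l 9: %.1fx\n", speedup(84.8, 33.8)) // ~2.5x
}
```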

enwik8

i7-7700K @4.20GHz, 32GB RAM, Ubuntu 20.04

go1.16rc1

Kanzi version 1.9 Go implementation. Block size is 100 MB, single thread.

Compressor Encoding (sec) Decoding (sec) Size (bytes)
Original 100,000,000
Kanzi -l 1 1.49 0.75 32,650,127
Kanzi -l 2 2.03 0.74 31,018,886
Kanzi -l 3 2.41 1.19 27,328,809
Kanzi -l 4 5.10 3.40 25,670,935
Kanzi -l 5 5.02 2.60 22,484,700
Kanzi -l 6 7.15 4.45 21,232,218
Kanzi -l 7 10.84 7.97 20,935,522
Kanzi -l 8 23.86 23.90 19,671,830
Kanzi -l 9 31.84 32.55 19,097,962
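Since enwik8 is exactly 100,000,000 bytes, the sizes above translate directly into bits per character, the usual metric for this benchmark. Checking the two endpoints of the table:

```go
package main

import "fmt"

// bpc converts a compressed size into bits per character for enwik8.
func bpc(compressedBytes int) float64 {
	const enwik8Size = 100_000_000 // enwik8 is exactly 100 MB
	return float64(compressedBytes) * 8 / enwik8Size
}

func main() {
	fmt.Printf("Kanzi -l 1: %.3f bpc\n", bpc(32_650_127)) // ~2.612 bpc
	fmt.Printf("Kanzi -l 9: %.3f bpc\n", bpc(19_097_962)) // ~1.528 bpc
}
```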

Build

There are no dependencies, making the project easy to build.

Option 1: go get

cd $GOPATH
go get github.com/flanglet/kanzi-go
cd src/github.com/flanglet/kanzi-go/app
go build Kanzi.go BlockCompressor.go BlockDecompressor.go InfoPrinter.go

Option 2: git clone

cd $GOPATH/src
mkdir -p github.com/flanglet
cd github.com/flanglet
git clone https://github.com/flanglet/kanzi-go.git
cd kanzi-go/app
go build Kanzi.go BlockCompressor.go BlockDecompressor.go InfoPrinter.go

Issues
  • When using io.Copy, CompressedStream never returns EOF when decompressing - infinite loop

    Hi,

    With most other compressors (zstandard, lz4, gzip, etc.), the Read function returns io.EOF once decompression via io.Copy is complete, but that is not happening here. Is this on purpose?

    Here is my code sample:

    	var (
    		file     *os.File
    		filename = fmt.Sprintf("%v.bak", strings.TrimSuffix(in, ".knz"))
    		zfile    *os.File
    		zinfo    os.FileInfo
    		ctx      = map[string]interface{}{
    			"inputName":  in,
    			"outputName": filename,
    			"jobs":       uint(1),
    			"overwrite":  true,
    			"verbosity":  1,
    		}
    	)
    
    	zfile, err = os.Open(in)
    	if err != nil {
    		return
    	}
    
    	zinfo, err = zfile.Stat()
    	if err != nil {
    		return
    	}
    
    	ctx["fileSize"] = zinfo.Size()
    
    	mode := zinfo.Mode() // use the same mode for the output file
    
    	// Output file.
    	file, err = os.OpenFile(filename, os.O_CREATE|os.O_WRONLY, mode)
    	if err != nil {
    		return
    	}
    
    	zr, err := kio.NewCompressedInputStreamWithCtx(zfile, ctx)
    	if err != nil {
    		return
    	}

    	// Uncompress.
    	_, err = io.Copy(file, zr)
    	if err != nil {
    		return
    	}
    
    	for _, c := range []io.Closer{zr, file, zfile} {
    		err = c.Close()
    		if err != nil {
    			return
    		}
    	}
    
    	msg := fmt.Sprintf("Decoding %v: %v => %v bytes", in, zinfo.Size(), zr.GetRead())
    	log.Println(msg, true)
    
    	return
    
    opened by jherman 7
  • Hash benchmark test failed

    Hi, an error occurred when I was running the benchmark test. I’d be grateful if someone could help me.

    C:\Users\xxxxx\AppData\Local\Temp___gobench_Hash_test_go.exe -test.v -test.bench "^BenchmarkXXHash32b|BenchmarkXXHash64$" -test.run ^$ #gosetup
    goos: windows
    goarch: amd64
    pkg: github.com/flanglet/kanzi-go/benchmark
    cpu: Intel(R) Core(TM) i5-8600 CPU @ 3.10GHz
    BenchmarkXXHash32b
    Hash_test.go:48: Incorrect result for XXHash32
    --- FAIL: BenchmarkXXHash32b
    BenchmarkXXHash64
    Hash_test.go:75: Incorrect result for XXHash64
    --- FAIL: BenchmarkXXHash64
    FAIL

    Process finished with exit code 1

    opened by awen09 2
  • Compatibility issue between master branch and tag 1.8.0

    Hi,

    I compressed a file with tag 1.8.0, then attempted to decompress it with commit d1d768f6499a (github.com/flanglet/kanzi-go v1.8.1-0.20210210012806-d1d768f6499a) and got an error (screenshot in the original issue).

    There appears to be an incompatibility between the two versions.

    opened by jherman 2
Releases (v2.0.0)
  • v2.0.0(Dec 12, 2021)

    • Many performance improvements (all levels)
    • Levels 4, 5, 6 and 7 decompress significantly faster
    • Reduced memory usage during compression and decompression
    • Improved scalability of parallel tasks with huge blocks
  • v1.9.0(May 11, 2021)

    • Level 1 compresses better
    • New level 2 to fill a compression/speed gap
    • Level 3 compresses slightly better and faster
    • Level 5, 6, 7 decompress faster
    • Level 8 uses less memory (a bit weaker and faster)
    • Partial decompression available (only some blocks)
    • Bitstream format frozen
  • v1.8.0(Nov 28, 2020)

    • Corner cases fixed and code improvements
    • Level 1 compresses a lot better
    • New codec for some multimedia files added to levels 2 & 3
    • Multi-threading rewritten to parallelize entropy (de)coding
    • Level 5 faster & level 6 faster (but a bit weaker)
  • v1.7.0(Feb 18, 2020)

    • Bug fixes and code improvements
    • Small compression gains throughout
    • Level 1 compresses better (a bit slower)
    • Level 6 has been redefined (faster for text files)
    • Better handling of small files
  • v1.6.0(Jul 7, 2019)

    • Bug fixes & code cleanup
    • Decompression speed improvements, especially level 4, 5 and 6 (new inverse BWT)
    • Better compression ratio at level 1, 2, 5 and 8.
    • New Sorted Ranks Transform
    • Improved code quality (refactored test code, improved docs, fixed linter warnings, ...)
  • v1.5.0(Dec 24, 2018)

    • Two new levels (2 and 8) have been introduced to remove gaps in the compression ratio/time curve.
    • Many speed improvements for compression ratios similar to 1.4.
    • Better text compression at level 0.
    • Inverse BWT is now multi-threaded.
    • Reduced memory usage of forward BWT.
  • v1.4.0(May 15, 2018)

    • Bug fixes
    • Code reorganization: split into 3 repositories (1 per language): kanzi, kanzi-go, kanzi-cpp.
    • Code is now go gettable for easy install.
    • New ROLZ based compression level 2
    • Compression improved in (former) levels 1, 3 and 5; level 5 is also faster
    • First stage allows up to 8 functions (instead of 4).