Package for indexing zip files and storing a compressed index

Overview

zipindex


zipindex provides a size-optimized representation of a zip file to allow decompressing the file without reading the zip file index.

It will only provide the minimal needed data for successful decompression and CRC checks.

Custom metadata can be stored per file and filtering can be performed on the incoming files.

Usage

Indexing

Indexing is performed on the last part of a complete ZIP file.

Three methods can be used:

The zipindex.ReadDir function parses a raw buffer taken from the end of the file. If the buffer isn't large enough to read the directory, zipindex.ErrNeedMoreData is returned.

Alternatively, zipindex.ReadFile will open a file on disk and read the directory from that.

Finally, zipindex.ReaderAt allows reading the index from anything that implements the io.ReaderAt interface.

By default, only "regular" files are indexed, meaning directories and other entries are skipped, as well as files for which a decompressor isn't registered.

A custom filter function can be provided to change the default filtering. This also allows adding custom data for each file if more information is needed.

See examples in the documentation

Serializing

Before serializing, it is recommended to run OptimizeSize() on the returned files. This will sort the entries and remove any redundant CRC information.

The files are serialized using the Serialize() method. The information can then be recreated using zipindex.DeserializeFiles. To look up a single file without deserializing everything, use zipindex.FindSerialized.

See examples in the documentation

Accessing file content

A file contains the following information:

type File struct {
    Name               string // Name of the file as stored in the zip.
    CompressedSize64   uint64 // Size of compressed data, excluding ZIP headers.
    UncompressedSize64 uint64 // Size of the Uncompressed data.
    Offset             int64  // Offset where file data header starts.
    CRC32              uint32 // CRC of the uncompressed data.
    Method             uint16 // Storage method.
    Flags              uint16 // General purpose bit flag

    Custom map[string]string
}
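The Flags field carries the general purpose bit flag from the ZIP format. As a small sketch of how a caller might interpret it, the example below tests three bits whose positions come from the ZIP specification (APPNOTE). The constant and function names are hypothetical and not part of zipindex.

```go
package main

import "fmt"

// Bit positions as defined by the ZIP specification (APPNOTE).
// The names are illustrative only.
const (
	flagEncrypted      uint16 = 1 << 0  // entry is encrypted
	flagDataDescriptor uint16 = 1 << 3  // sizes and CRC follow the file data
	flagUTF8           uint16 = 1 << 11 // name and comment are UTF-8
)

// describeFlags lists which of the three bits above are set.
func describeFlags(flags uint16) []string {
	var out []string
	if flags&flagEncrypted != 0 {
		out = append(out, "encrypted")
	}
	if flags&flagDataDescriptor != 0 {
		out = append(out, "data-descriptor")
	}
	if flags&flagUTF8 != 0 {
		out = append(out, "utf8")
	}
	return out
}

func main() {
	fmt.Println(describeFlags(0x0808)) // prints "[data-descriptor utf8]"
}
```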

First, an io.Reader must be forwarded to the absolute offset given in Offset. It is up to the caller to decide how to achieve that.

To open an individual file from the index, call (*File).Open(r io.Reader) with the forwarded Reader.

As with the standard library's archive/zip, not all methods and flags may be supported. Use zipindex.RegisterDecompressor to register non-standard decompressors.

For expert users, (*File).OpenRaw allows access to the compressed data.

License

zipindex is released under the Apache License v2.0. You can find the complete text in the file LICENSE.

zipindex contains code that is Copyright (c) 2009 The Go Authors. See GO_LICENSE file for license. Parts are

Contributing

Contributions are welcome. Please send PRs for any enhancements.


Comments
  • Add Method 93 (Zstandard) file compression support

    This will allow default indexing and decompression of zstandard compressed files inside zip files.

    By default 128MB is the maximum allowed decompression window size to limit excessive memory use.

    Opened by klauspost.
Releases(v0.3.0)
  • v0.3.0(Aug 25, 2022)

    What's Changed

    • Add Method 93 (Zstandard) file compression support by @klauspost in https://github.com/minio/zipindex/pull/8
    • Harden library, switch to Go 1.18 fuzz tests by @klauspost in https://github.com/minio/zipindex/pull/10
    • Update compression library for faster reads by @klauspost in https://github.com/minio/zipindex/pull/9
    • Clean up and more documentation by @klauspost in https://github.com/minio/zipindex/pull/12

    Full Changelog: https://github.com/minio/zipindex/compare/v0.2.1...v0.3.0

  • v0.2.1(Oct 14, 2021)

    What's Changed

    • Add more fuzz corpus by @klauspost in https://github.com/minio/zipindex/pull/5
    • Hide generated serializers by @klauspost in https://github.com/minio/zipindex/pull/6
    • Read CRC if in FD and use to compare. by @klauspost in https://github.com/minio/zipindex/pull/7

    Full Changelog: https://github.com/minio/zipindex/compare/v0.2.0...v0.2.1
