Easily create & extract archives, and compress & decompress files of various formats

Overview

Introducing Archiver 3.1 - a cross-platform, multi-format archive utility and Go library. A powerful and flexible library meets an elegant CLI in this generic replacement for several platform-specific or format-specific archive utilities.

Features

Package archiver makes it trivially easy to make and extract common archive formats such as tarball (and its compressed variants) and zip. Simply name the input and output file(s). The arc command runs the same on all platforms and has no external dependencies (not even libc). It is powered by the Go standard library and several third-party, pure-Go libraries.

Files are put into the root of the archive; directories are recursively added, preserving structure.

  • Make whole archives from a list of files
  • Open whole archives to a folder
  • Extract specific files/folders from archives
  • Stream files in and out of archives without needing actual files on disk
  • Traverse archive contents without loading them
  • Compress files
  • Decompress files
  • Streaming compression and decompression
  • Several archive and compression formats supported

Format-dependent features

  • Gzip is multithreaded
  • Optionally create a top-level folder to avoid littering a directory or archive root with files
  • Toggle overwrite existing files
  • Adjust compression level
  • Zip: store (not compress) already-compressed files
  • Make all necessary directories
  • Open password-protected RAR archives
  • Optionally continue with other files after an error

Supported compression formats

  • brotli (br)
  • bzip2 (bz2)
  • flate (zip)
  • gzip (gz)
  • lz4
  • snappy (sz)
  • xz
  • zstandard (zstd)

Supported archive formats

  • .zip
  • .tar (including any compressed variants like .tar.gz)
  • .rar (read-only)

Tar files can optionally be compressed using any of the above compression formats.

GoDoc

See https://pkg.go.dev/github.com/mholt/archiver/v3

Install

With webi

webi will install webi and arc to ~/.local/bin/ and update your PATH.

Mac, Linux, Raspberry Pi

curl -fsS https://webinstall.dev/arc | bash

Windows 10

curl.exe -fsS -A MS https://webinstall.dev/arc | powershell

With Go

To install the runnable binary to your $GOPATH/bin:

go get github.com/mholt/archiver/cmd/arc

Manually

To install manually

  1. Download the binary for your platform from the GitHub Releases page.
  2. Move the binary to a location in your path, for example:
    • without sudo:
      chmod a+x ~/Downloads/arc_*
      mkdir -p ~/.local/bin
      mv ~/Downloads/arc_* ~/.local/bin/arc
    • as root:
      chmod a+x ~/Downloads/arc_*
      sudo mkdir -p /usr/local/bin
      sudo mv ~/Downloads/arc_* /usr/local/bin/arc
  3. If needed, update ~/.bashrc or ~/.profile to add arc to your PATH, for example:
    echo 'PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
    

Build from Source

You can build arc with just the Go toolchain, or with goreleaser.

With go

go build cmd/arc/*.go

Multi-platform with goreleaser

Builds with goreleaser will also include version info.

goreleaser --snapshot --skip-publish --rm-dist

Command Use

Make new archive

# Syntax: arc archive [archive name] [input files...]

arc archive test.tar.gz file1.txt images/file2.jpg folder/subfolder

(At least one input file is required.)

Extract entire archive

# Syntax: arc unarchive [archive name] [destination]

arc unarchive test.tar.gz

(The destination path is optional; default is current directory.)

The archive name must end with a supported file extension—this is how it knows what kind of archive to make. Run arc help for more help.

List archive contents

# Syntax: arc ls [archive name]

arc ls caddy_dist.tar.gz
drwxr-xr-x  matt    staff   0       2018-09-19 15:47:18 -0600 MDT   dist/
-rw-r--r--  matt    staff   6148    2017-08-07 18:34:22 -0600 MDT   dist/.DS_Store
-rw-r--r--  matt    staff   22481   2018-09-19 15:47:18 -0600 MDT   dist/CHANGES.txt
-rw-r--r--  matt    staff   17189   2018-09-19 15:47:18 -0600 MDT   dist/EULA.txt
-rw-r--r--  matt    staff   25261   2016-03-07 16:32:00 -0700 MST   dist/LICENSES.txt
-rw-r--r--  matt    staff   1017    2018-09-19 15:47:18 -0600 MDT   dist/README.txt
-rw-r--r--  matt    staff   288     2016-03-21 11:52:38 -0600 MDT   dist/gitcookie.sh.enc
...

Extract a specific file or folder from an archive

# Syntax: arc extract [archive name] [path in archive] [destination on disk]

arc extract test.tar.gz foo/hello.txt extracted/hello.txt

Compress a single file

# Syntax: arc compress [input file] [output file]

arc compress test.txt compressed_test.txt.gz
arc compress test.txt gz

For convenience, the output file (second argument) may simply be a compression format (without leading dot), in which case the output filename will be the same as the input filename but with the format extension appended, and the input file will be deleted if successful.

Decompress a single file

# Syntax: arc decompress [input file] [output file]

arc decompress test.txt.gz original_test.txt
arc decompress test.txt.gz

For convenience, the output file (second argument) may be omitted. In that case, the output filename will have the same name as the input filename, but with the compression extension stripped from the end; and the input file will be deleted if successful.
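The naming convention for compress and decompress can be sketched as follows. This is only an illustration of the rule described above, not arc's actual code: the helper names compressedName and decompressedName are hypothetical, and the format list is an assumed subset.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// knownFormats is an illustrative subset of format names (without leading dot).
var knownFormats = map[string]bool{"gz": true, "bz2": true, "xz": true, "lz4": true, "sz": true, "br": true}

// compressedName returns the output filename for "arc compress input out".
// If out is a bare format name such as "gz", the format extension is
// appended to input; otherwise out is used as given.
func compressedName(input, out string) string {
	if knownFormats[out] {
		return input + "." + out
	}
	return out
}

// decompressedName returns the output filename for "arc decompress input [out]".
// If out is empty, the compression extension is stripped from input.
func decompressedName(input, out string) string {
	if out != "" {
		return out
	}
	return strings.TrimSuffix(input, filepath.Ext(input))
}

func main() {
	fmt.Println(compressedName("test.txt", "gz"))
	fmt.Println(compressedName("test.txt", "compressed_test.txt.gz"))
	fmt.Println(decompressedName("test.txt.gz", ""))
	fmt.Println(decompressedName("test.txt.gz", "original_test.txt"))
}
```

The sketch only computes names; deleting the input file after success, as arc does in the convenience forms, is left out.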

Flags

Flags are specified before the subcommand. Use arc help or arc -h to get usage help and a description of flags with their default values.

Library Use

The archiver package allows you to easily create and open archives, walk their contents, extract specific files, compress and decompress files, and even stream archives in and out using pure io.Reader and io.Writer interfaces, without ever needing to touch the disk.

To use as a dependency in your project:

go get github.com/mholt/archiver/v3
import "github.com/mholt/archiver/v3"

See the package's GoDoc for full API documentation.

For example, creating or unpacking an archive file:

err := archiver.Archive([]string{"testdata", "other/file.txt"}, "test.zip")
// ...
err = archiver.Unarchive("test.tar.gz", "test")

The archive format is determined by file extension. (There are several functions in this package which perform a task by inferring the format from file extension or file header, including Archive(), Unarchive(), CompressFile(), and DecompressFile().)
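As a rough illustration of what inferring a format from a file header can look like, the sketch below matches the first bytes of a stream against well-known magic numbers. It is not the package's implementation; the function name sniffFormat and the table are assumptions made for this example, and tar is omitted because its signature does not sit at the start of the file.

```go
package main

import (
	"bytes"
	"fmt"
)

// magic lists byte signatures that appear at the very start of a file.
var magic = []struct {
	name string
	sig  []byte
}{
	{"zip", []byte("PK\x03\x04")}, // non-empty zip archives
	{"gzip", []byte{0x1f, 0x8b}},
	{"bzip2", []byte("BZh")},
	{"xz", []byte{0xfd, '7', 'z', 'X', 'Z', 0x00}},
	{"zstd", []byte{0x28, 0xb5, 0x2f, 0xfd}},
	{"rar", []byte("Rar!\x1a\x07")},
}

// sniffFormat returns the name of the first format whose signature
// prefixes header, or "" if none matches.
func sniffFormat(header []byte) string {
	for _, m := range magic {
		if bytes.HasPrefix(header, m.sig) {
			return m.name
		}
	}
	return ""
}

func main() {
	fmt.Println(sniffFormat([]byte{0x1f, 0x8b, 0x08}))
	fmt.Println(sniffFormat([]byte("PK\x03\x04...")))
	fmt.Println(sniffFormat([]byte("plain text")) == "")
}
```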

To configure the archiver or perform other operations, create an instance of the format's type:

z := archiver.Zip{
	CompressionLevel:       flate.DefaultCompression,
	MkdirAll:               true,
	SelectiveCompression:   true,
	ContinueOnError:        false,
	OverwriteExisting:      false,
	ImplicitTopLevelFolder: false,
}

err := z.Archive([]string{"testdata", "other/file.txt"}, "/Users/matt/Desktop/test.zip")

Inspecting an archive:

err = z.Walk("/Users/matt/Desktop/test.zip", func(f archiver.File) error {
	zfh, ok := f.Header.(zip.FileHeader)
	if ok {
		fmt.Println("Filename:", zfh.Name)
	}
	return nil
})

Streaming files into an archive that is being written to the HTTP response:

err = z.Create(responseWriter)
if err != nil {
	return err
}
defer z.Close()

for _, fname := range filenames {
	info, err := os.Stat(fname)
	if err != nil {
		return err
	}

	// get file's name for the inside of the archive
	internalName, err := archiver.NameInArchive(info, fname, fname)
	if err != nil {
		return err
	}

	// open the file
	file, err := os.Open(fname)
	if err != nil {
		return err
	}

	// write it to the archive
	err = z.Write(archiver.File{
		FileInfo: archiver.FileInfo{
			FileInfo:   info,
			CustomName: internalName,
		},
		ReadCloser: file,
	})
	file.Close()
	if err != nil {
		return err
	}
}

The archiver.File type allows you to use actual files with archives, or to mimic files when you only have streams.
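To show the underlying idea of archiving from a stream without any file on disk, here is a minimal sketch that uses only the standard library's archive/zip rather than archiver's own types. The helper names writeEntry and roundTrip are hypothetical, and a bytes.Buffer stands in for an HTTP response writer.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// writeEntry streams r into the archive under the given name.
func writeEntry(zw *zip.Writer, name string, r io.Reader) error {
	entry, err := zw.Create(name)
	if err != nil {
		return err
	}
	_, err = io.Copy(entry, r)
	return err
}

// roundTrip archives content under name into an in-memory zip,
// then reads the entry back to confirm the archive is valid.
func roundTrip(name, content string) (string, error) {
	var buf bytes.Buffer // stands in for an http.ResponseWriter or any io.Writer
	zw := zip.NewWriter(&buf)
	if err := writeEntry(zw, name, strings.NewReader(content)); err != nil {
		return "", err
	}
	// Close writes the central directory; the archive is invalid without it.
	if err := zw.Close(); err != nil {
		return "", err
	}

	zr, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
	if err != nil {
		return "", err
	}
	f, err := zr.Open(name)
	if err != nil {
		return "", err
	}
	defer f.Close()
	data, err := io.ReadAll(f)
	return string(data), err
}

func main() {
	got, err := roundTrip("greeting.txt", "hello from a stream")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```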

There's a lot more that can be done, too. See the GoDoc for full API documentation.

Security note: This package does NOT attempt to mitigate zip-slip attacks. It is extremely difficult to do properly and seemingly impossible to mitigate effectively across platforms. Attempted fixes have broken processing of legitimate files in production, rendering the program unusable. Our recommendation instead is to inspect the contents of an untrusted archive before extracting it (this package provides Walkers) and decide if you want to proceed with extraction.
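One possible inspection step is to look at each entry name before deciding whether to extract. The sketch below is an assumption about what such a check might look like, not something this package provides: the function name escapesRoot is hypothetical, it works on names only, and it does not detect symlink-based tricks or Windows drive-letter paths.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// escapesRoot reports whether an archive entry name would land outside the
// extraction directory: an absolute path, or one that climbs out via "..".
func escapesRoot(name string) bool {
	// Treat backslashes as separators too, since some tools write them.
	name = strings.ReplaceAll(name, `\`, "/")
	if strings.HasPrefix(name, "/") {
		return true
	}
	clean := path.Clean(name)
	return clean == ".." || strings.HasPrefix(clean, "../")
}

func main() {
	names := []string{"goodfile.txt", "../../badfile.txt", "docs/../ok.txt", "/etc/passwd"}
	for _, n := range names {
		fmt.Printf("%-20s suspicious=%v\n", n, escapesRoot(n))
	}
}
```

A caller could run a check like this from a walk callback and refuse to extract if any entry is flagged.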

Project Values

This project has a few principle-based goals that guide its development:

  • Do our thing really well. Our thing is creating, opening, inspecting, compressing, and streaming archive files. It is not meant to be a replacement for specific archive format tools like tar, zip, etc. that have lots of features and customizability. (Some customizability is OK, but not to the extent that it becomes overly complicated or error-prone.)

  • Have good tests. Changes should be covered by tests.

  • Limit dependencies. Keep the package lightweight.

  • Pure Go. This means no cgo or other external/system dependencies. This package should be able to stand on its own and cross-compile easily to any platform -- and that includes its library dependencies.

  • Idiomatic Go. Keep interfaces small, variable names semantic, vet shows no errors, the linter is generally quiet, etc.

  • Be elegant. This package should be elegant to use and its code should be elegant when reading and testing. If it doesn't feel good, fix it up.

  • Well-documented. Use comments prudently; explain why non-obvious code is necessary (and use tests to enforce it). Keep the docs updated, and have examples where helpful.

  • Keep it efficient. This often means keep it simple. Fast code is valuable.

  • Consensus. Contributions should ideally be approved by multiple reviewers before being merged. Generally, avoid merging multi-chunk changes that do not go through at least one or two iterations/reviews. Except for trivial changes, PRs are seldom ready to merge right away.

  • Have fun contributing. Coding is awesome!

We welcome contributions and appreciate your efforts! However, please open issues to discuss any changes before spending the time preparing a pull request. This will save time, reduce frustration, and help coordinate the work. Thank you!

Issues
  • Add support for storing symlinks in tar and zip archives

    Also implement extraction of symlinks from zip archives.

    This PR also adds a relative symlink in the testdata directory. It passes all tests on OS X, but has not been tested on Windows.

    This PR is a superset of changes from the following issues and PRs:

    Fixes #21 Fixes #31 Fixes #60 Fixes #74

    opened by jandubois 20
  • v4: Implement FS over an io.ReadSeeker stream

    Please include lines https://github.com/mholt/archiver/blob/10c5080fa78f78d10e28abdf11fa6d4abb7f999f/fs.go#L40 - https://github.com/mholt/archiver/blob/10c5080fa78f78d10e28abdf11fa6d4abb7f999f/fs.go#L53 as an example for library use (handle archive as an io.FS) - it took me an hour to find out how to do it, and it is essentially VERY easy, but this information was hard to find.

    feature request 
    opened by tgulacsi 16
  • RFE: port to github.com/pierrec/lz4/v4

    What would you like to have changed?

    Make it possible to build with github.com/pierrec/lz4/v4.

    Why is this feature a useful, necessary, and/or important addition to this project?

    lz4 was updated to 4.0.2 in Fedora rawhide and archiver no longer builds using distribution-provided Go packages as a result.

    What alternatives are there, or what are you doing in the meantime to work around the lack of this feature?

    I'm forced to stop updating archiver until it's ported to lz4 v4 API or someone creates a compatibility package with v3 API.

    Please link to any relevant issues, pull requests, or other discussions.

    To see the issue, patch the source with the following patch:

    diff -up archiver-3.3.2/lz4.go.lz4 archiver-3.3.2/lz4.go
    --- archiver-3.3.2/lz4.go.lz4	2020-09-28 10:43:21.000000000 +0200
    +++ archiver-3.3.2/lz4.go	2020-10-05 13:06:07.879465436 +0200
    @@ -5,7 +5,7 @@ import (
     	"io"
     	"path/filepath"
     
    -	"github.com/pierrec/lz4/v3"
    +	"github.com/pierrec/lz4"
     )
     
     // Lz4 facilitates LZ4 compression.
    diff -up archiver-3.3.2/tarlz4.go.lz4 archiver-3.3.2/tarlz4.go
    --- archiver-3.3.2/tarlz4.go.lz4	2020-09-28 10:43:21.000000000 +0200
    +++ archiver-3.3.2/tarlz4.go	2020-10-05 13:06:17.578418304 +0200
    @@ -5,7 +5,7 @@ import (
     	"io"
     	"strings"
     
    -	"github.com/pierrec/lz4/v3"
    +	"github.com/pierrec/lz4"
     )
     
     // TarLz4 facilitates lz4 compression
    

    and build. Fedora package build process fails with the following errors:

    _build/src/github.com/mholt/archiver/lz4.go:19:3: w.Header undefined (type *lz4.Writer has no field or method Header)
    _build/src/github.com/mholt/archiver/tarlz4.go:87:7: lz4w.Header undefined (type *lz4.Writer has no field or method Header)
    

    It looks like option handling and compression level setting got changed: https://github.com/pierrec/lz4/compare/v3.3.2..v4.0.2

    I have nearly zero Golang knowledge, but this crude patch makes it compile and go test passes:

    diff -up archiver-3.3.2/lz4.go.lz4 archiver-3.3.2/lz4.go
    --- archiver-3.3.2/lz4.go.lz4	2020-09-28 10:43:21.000000000 +0200
    +++ archiver-3.3.2/lz4.go	2020-10-05 13:28:21.581995885 +0200
    @@ -5,7 +5,7 @@ import (
     	"io"
     	"path/filepath"
     
    -	"github.com/pierrec/lz4/v3"
    +	"github.com/pierrec/lz4"
     )
     
     // Lz4 facilitates LZ4 compression.
    @@ -16,7 +16,12 @@ type Lz4 struct {
     // Compress reads in, compresses it, and writes it to out.
     func (lz *Lz4) Compress(in io.Reader, out io.Writer) error {
     	w := lz4.NewWriter(out)
    -	w.Header.CompressionLevel = lz.CompressionLevel
    +	options := []lz4.Option{
    +		lz4.CompressionLevelOption(lz4.CompressionLevel(1 << (8 + lz.CompressionLevel))),
    +	}
    +	if err := w.Apply(options...); err != nil {
    +		return err
    +	}
     	defer w.Close()
     	_, err := io.Copy(w, in)
     	return err
    diff -up archiver-3.3.2/tarlz4.go.lz4 archiver-3.3.2/tarlz4.go
    --- archiver-3.3.2/tarlz4.go.lz4	2020-09-28 10:43:21.000000000 +0200
    +++ archiver-3.3.2/tarlz4.go	2020-10-05 13:28:21.581995885 +0200
    @@ -5,7 +5,7 @@ import (
     	"io"
     	"strings"
     
    -	"github.com/pierrec/lz4/v3"
    +	"github.com/pierrec/lz4"
     )
     
     // TarLz4 facilitates lz4 compression
    @@ -84,7 +84,12 @@ func (tlz4 *TarLz4) wrapWriter() {
     	var lz4w *lz4.Writer
     	tlz4.Tar.writerWrapFn = func(w io.Writer) (io.Writer, error) {
     		lz4w = lz4.NewWriter(w)
    -		lz4w.Header.CompressionLevel = tlz4.CompressionLevel
    +		options := []lz4.Option{
    +			lz4.CompressionLevelOption(lz4.CompressionLevel(1 << (8 + tlz4.CompressionLevel))),
    +		}
    +		if err := lz4w.Apply(options...); err != nil {
    +			return lz4w, err
    +		}
     		return lz4w, nil
     	}
     	tlz4.Tar.cleanupWrapFn = func() {
    
    feature request 
    opened by rathann 13
  • craft zip file for symlink testing

    We need a special zip file that cannot be created with normal commandline tools. It requires crafting with an API. This should be possible with archive/zip#Writer, for example.

    We want a double entry of a file - the first being a symlink such that the second will be placed in an arbitrary location:

    ./goodfile.txt  "hello world"         (file)
    ./bad/file.txt  => ../../badfile.txt  (symlink)
    ./bad/file.txt  "Mwa-ha-ha"           (file)
    ./morefile.txt  "hello world"         (file)
    

    This should go in testdata/testarchives/evilarchives/ as double-evil.zip and double-evil.tar (if it is allowed).

    See also https://github.com/mholt/archiver/issues/242#issuecomment-703086020

    opened by coolaj86 13
  • fix: prevent extraction of archived files outside target path

    Why this PR?

    This PR fixes an arbitrary file write vulnerability that can be exploited with a specially crafted zip archive containing path traversal filenames. When such a filename is concatenated to the target extraction directory, the final path ends up outside of the target folder.

    A sample malicious zip file named zip-slip.zip (see this gist) was used; running the code below created an evil.txt file in the /tmp folder.

    package main
    
    import "log"
    import "github.com/mholt/archiver"
    
    func main() {
    	err := archiver.Tar.Open("/tmp/evil-tar.tar", "/tmp/safe")
    	if err != nil {
    		log.Fatal(err)
    	}
    }
    

    There are several ways to avoid this issue. Some check for .. (dot dot) sequences in the filename, but the best solution in our opinion is to check whether the final target filename starts with the target folder (after both are resolved to their absolute paths).
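The absolute-path check described above might be sketched like this. It is not the code from this PR: the function name withinDir is hypothetical, and the containment test is expressed with filepath.Rel instead of a raw string prefix so that a sibling such as /tmp/safe-evil is not mistaken for a child of /tmp/safe. Symlinks are not resolved here.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// withinDir reports whether joining name onto dest yields a path that
// stays inside dest once both are made absolute.
func withinDir(dest, name string) (bool, error) {
	absDest, err := filepath.Abs(dest)
	if err != nil {
		return false, err
	}
	target, err := filepath.Abs(filepath.Join(dest, name))
	if err != nil {
		return false, err
	}
	rel, err := filepath.Rel(absDest, target)
	if err != nil {
		return false, err
	}
	// A relative path that starts with ".." points outside absDest.
	escapes := rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator))
	return !escapes, nil
}

func main() {
	for _, name := range []string{"sub/ok.txt", "../evil.txt", "../safe-evil/x"} {
		ok, err := withinDir("/tmp/safe", name)
		fmt.Println(name, ok, err)
	}
}
```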

    Stay secure, Snyk Team

    opened by aviadatsnyk 13
  • Fix hard links

    Had to refactor how paths/filenames are passed into some functions to accommodate hard-linked files.

    @petemoore Would you please try this out?

    Should fix #152.

    opened by mholt 12
  • Installation Instructions Incorrect / go get fails

    What version of the package or command are you using?

    Not sure how to check. I just ran go get github.com/mholt/archiver/v3

    What are you trying to do?

    Install the package and use it.

    What steps did you take?

    Installed go on Windows. Ran go get github.com/mholt/archiver/v3

    What did you expect to happen, and what actually happened instead?

    The package should successfully install.

    This happened instead:

    package github.com/mholt/archiver/v3: cannot find package "github.com/mholt/archiver/v3" in any of:
            c:\go\src\github.com\mholt\archiver\v3 (from $GOROOT)
            C:\Users\vroy1\go\src\github.com\mholt\archiver\v3 (from $GOPATH)
    

    Doing go get github.com/mholt/archiver works better and throws the following error instead:

    C:\Users\vroy1\Desktop\server>go get github.com/mholt/archiver/  
    package github.com/pierrec/lz4/v3: cannot find package "github.com/pierrec/lz4/v3" in any of:
            c:\go\src\github.com\pierrec\lz4\v3 (from $GOROOT)
            C:\Users\vroy1\go\src\github.com\pierrec\lz4\v3 (from $GOPATH)
    

    Please link to any related issues, pull requests, and/or discussion

    https://github.com/mholt/archiver/issues/195

    opened by vedantroy 11
  • Update github.com/pierrec/lz4 dependency to module version

    Updates dependency to use github.com/pierrec/lz4/v3 at version v3.0.1. This version is functionally equivalent to the previously specified dependency.

    opened by nmiyake 11
  • use filepath.Dir() instead of path.Dir()

    Please use filepath.Dir(), not path.Dir().
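The difference between the two functions can be seen with a backslash-separated name. In this small sketch (helper names parentSlashOnly and parentNative are hypothetical), path.Dir only ever treats "/" as a separator, so it finds no directory part, while filepath.Dir uses the separator of the operating system it runs on.

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
)

// parentSlashOnly uses package path, which splits on "/" on every platform.
func parentSlashOnly(name string) string {
	return path.Dir(name)
}

// parentNative uses package path/filepath, which splits on the OS separator
// (and therefore understands backslashes when run on Windows).
func parentNative(name string) string {
	return filepath.Dir(name)
}

func main() {
	name := `peco_windows_amd64\peco_windows_amd64\Changes`
	fmt.Println(parentSlashOnly(name)) // no "/" in name, so no directory part is found
	fmt.Println(parentNative(name))    // result depends on the platform's separator
}
```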

    On Windows, the following code fails with the error: peco_windows_amd64\peco_windows_amd64\Changes: creating new file: open peco_windows_amd64\peco_windows_amd64\Changes: The system cannot find the path specified.

    package main
    
    import (
        "fmt"
        "io"
        "net/http"
        "os"
        "path"
    
        "github.com/mholt/archiver"
    )
    
    func main() {
        url := "https://github.com/peco/peco/releases/download/v0.4.0/peco_windows_amd64.zip"
        resp, _ := http.Get(url)
        defer resp.Body.Close()
        fname := path.Base(url)
        f, _ := os.Create(fname)
        io.Copy(f, resp.Body)
    
        err := archiver.Unzip(fname, "peco_windows_amd64")
        if err != nil {
            fmt.Println(err.Error())
        }
    }
    
    

    The following code also fails to extract a .tar.gz file that includes a symlink, with the error: dest\hoge.txt: creating new file: open dest\hoge.txt: The system cannot find the path specified.

    $ tar -tvf hoge.tar.gz
    -rw-r--r-- username/197121     0 2016-08-20 14:35 hoge.txt
    lrwxrwxrwx username/197121     0 2016-08-20 14:12 link -> hoge.txt
    
    package main
    
    import (
        "fmt"
    
        "github.com/mholt/archiver"
    )
    
    func main() {
        err := archiver.UntarGz("hoge.tar.gz", "dest")
        if err != nil {
            fmt.Println(err.Error())
        }
    }
    
    opened by whatalnk 11
  • v4 rewrite: All new design, core types, and stream-oriented interfaces

    I spent the holiday rewriting this package from scratch with a completely new approach to handling archives and compression formats. This will become v4.

    The new core APIs are completely stream-oriented and file-agnostic. The abstractions for files, directories, and file systems are virtualized thanks to the recently-added io/fs package in the Go standard library. I expect these design changes will close most open issues and PRs because it either fixes the problems or makes them irrelevant.

    A significant number of issues relate directly to interactions with specific file systems (files on disk) and this new API does not deal with that directly anymore, except for a couple specific functions that read the disk in order to create the abstraction. Nothing in the core API writes to disk or deals with that. (Yay!) The stream and FS abstractions are highly flexible to build upon.

    Another nice thing about the new design is that there's no more need for explicit composite types (like TarGz and TarBz2), because we have a new CompressedArchive type that composes an archive and a compression format. It's mainly used with Tar only, since Zip and Rar do their own thing, but another nice feature is the Identify() function that automatically gets you the right type:

    // opening a file on disk for this example, but can be any ReadSeeker
    unknownFile, err := os.Open("filename.tar.gz")
    if err != nil {
    	return err
    }
    defer unknownFile.Close()
    
    // if you don't have a filename, leave it blank: identification uses both/either filenames and/or streams
    format, err := archiver.Identify("filename.tar.gz", unknownFile)
    if err != nil {
    	return err
    }
    
    // we can now work with the file, for example, extract a file out of it
    if ex, ok := format.(archiver.Extractor); ok {
    	ex.Extract(context.Background(), unknownFile, "target.txt", func(_ context.Context, f archiver.File) error {
    		// do something with the file ...
    		return nil
    	})
    }
    
    // or maybe it's just a compressed log file
    if decom, ok := format.(archiver.Decompressor); ok {
    	rc, err := decom.OpenReader(unknownFile)
    	if err != nil {
    		return err
    	}
    	defer rc.Close()
    	// read from it ... all reads are decompressed now
    }
    

    I think the new APIs are pretty slick, and you should try them out and let me know what you think.

    One of my favorite new features is the FileSystem() function. Give it a path on disk, and it will return a fs.ReadDirFS. Basically, it lets you read from real directories, regular files, archive files, and compressed archive files, ALL THE SAME WAY. This is pretty cool I think. You don't have to worry about whether the given file is just a regular file, a directory, or an archive file (which acts like a directory because it contains other files!) -- you can traverse it all the same way. And the archive format doesn't matter either, it's automatically identified for you! You literally won't even know things are being decompressed as you read from them:

    fsys, err := archiver.FileSystem("/path/to/folder/or/file")
    if err != nil {
    	return err
    }
    
    // traverse everything except the ".git" folder...
    err = fs.WalkDir(fsys, ".", func(path string, d fs.DirEntry, err error) error {
    	if err != nil {
    		return err
    	}
    	if path == ".git" {
    		return fs.SkipDir
    	}
    	fmt.Println(path, d.IsDir())
    	return nil
    })
    if err != nil {
    	return err
    }
    
    // ...or just open one file
    file, err := fsys.Open("example.txt")
    if err != nil {
    	return err
    }
    defer file.Close()
    

    To make a new archive from files on disk, you might do this:

    files, err := archiver.FilesFromDisk(map[string]string{
    	"/path/on/disk/file1.txt": "file1.txt",
    	"/path/on/disk/file2.txt": "subfolder/file2.txt",
    	"/path/on/disk/folder":    "",
    })
    if err != nil {
    	return err
    }
    
    out, err := os.Create("example.tar.gz")
    if err != nil {
    	return err
    }
    defer out.Close()
    
    caf := archiver.CompressedArchive{
    	Compression: archiver.Gz{},
    	Archival:    archiver.Tar{},
    }
    
    err = caf.Archive(context.Background(), out, files)
    if err != nil {
    	return err
    }
    

    Notice how you have the flexibility of mapping each file (or folder) to a different path in the archive. You can also leave the mapped path blank to assume the base filename for convenience. Folders are added recursively.

    Oh yeah, and I added basic context support.

    The arc command has not been ported over yet, and as that has to deal directly with the file system, it will require some work before it is ready. This would be the only part of the repository that would write to disk, as the core library doesn't write to disk anymore.

    Looks like I was able to delete about 70% of the code, however that count includes test files, the command, and the README, which have not been restored yet. Still, I could feel a significant code reduction in this new design when I wrote it.

    Should make irrelevant / close / fix #118, #128, #131, #141, #146, #150, #227, #194, #204, #216, #239, #255, #262, #278, #282

    opened by mholt 10
  • [style] Use error instance rather than an error string

    Recently we had a PR that was necessary, but had some code style issues that we'd like to fix:

    The PR in question: https://github.com/mholt/archiver/pull/231/files

    What would you like to have changed?

    There are a few ways I think could work really well:

    1. Follow the style of os.IsNotFound
      • Let's add function something like IsIllegalPath in archiver.go
      • Let's replace lines like strings.Contains(err.Error(), "illegal file path") with IsIllegalPath
    2. Follow the style of csv.ParseError
      • Let's add a struct IllegalPathError to archiver.go
      • Let's replace things like fmt.Errorf("illegal file path: %s", filename) with the use of that error
    3. Let's do both!
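Doing both options together might look roughly like the sketch below. The names IllegalPathError and IsIllegalPath come from the proposal above; the Filename field, the validate helper, and the use of errors.As are assumptions made for illustration, not the change that was eventually merged.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// IllegalPathError reports an archive entry whose path is not allowed.
type IllegalPathError struct {
	Filename string
}

func (e *IllegalPathError) Error() string {
	return fmt.Sprintf("illegal file path: %s", e.Filename)
}

// IsIllegalPath reports whether err, or any error it wraps,
// is an *IllegalPathError.
func IsIllegalPath(err error) bool {
	var ipe *IllegalPathError
	return errors.As(err, &ipe)
}

// validate is a hypothetical caller that wraps the typed error with context.
func validate(name string) error {
	if strings.HasPrefix(name, "..") {
		return fmt.Errorf("validating entry: %w", &IllegalPathError{Filename: name})
	}
	return nil
}

func main() {
	fmt.Println(IsIllegalPath(validate("../evil.txt")))
	fmt.Println(IsIllegalPath(validate("ok.txt")))
}
```

Callers then test the error's type instead of matching on its message, so the message text can change without breaking them.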

    Why is this feature a useful, necessary, and/or important addition to this project?

    We just want to make the code more durable and maintainable.

    good first issue 
    opened by coolaj86 10
  • [v4] can't open a file inside tar.gz

    What version of the package or command are you using?

    v4.0.0-alpha.7

    What are you trying to do?

    List files of .tar.gz archive

    What steps did you take?

    package main
    
    import (
    	"fmt"
    	"github.com/mholt/archiver/v4"
    	"io/fs"
    )
    
    func main() {
    	fsys, err := archiver.FileSystem(`test.tar.gz`)
    	if err != nil {
    		panic(err)
    	}
    
    	err = fs.WalkDir(fsys, `.`, func(path string, f fs.DirEntry, err error) error {
    		if f.IsDir() {
    			return nil
    		}
    
    		fh, err := fsys.Open(path)
    		if err != nil {
    			return err
    		}
    		defer fh.Close()
    
    		fmt.Printf(`file: %v`+"\n", path)
    
    		return nil
    	})
    	if err != nil {
    		panic(err)
    	}
    }
    

    What did you expect to happen, and what actually happened instead?

    Expected to open files inside tar.gz archive, but got file not found error. If you try with the attached zip, that works fine. So something is wrong with tar and/or gz filesystem implementation?

    test.zip test.tar.gz

    unconfirmed 
    opened by raspi 7
  • ArchiveFS.Open returns the first file in an implicit directory rather than a fs.ReadDirFile

    What version of the package or command are you using?

    v4.0.0-alpha.7

    What are you trying to do?

    ArchiveFS.Open() with a zip file that does not have explicit directory entries.

    What steps did you take?

    Adding a disabled test to showcase the bug via https://github.com/mholt/archiver/pull/339.

    When you run the tests you'll get the following output.

    $ go test ./...
    --- FAIL: TestArchiveFS_ReadDir (0.00s)
        --- FAIL: TestArchiveFS_ReadDir/nodir.zip (0.00s)
            --- FAIL: TestArchiveFS_ReadDir/nodir.zip/Open(cmd) (0.00s)
                fs_test.go:136: 'cmd' did not return a fs.ReadDirFile, <nil>
            --- FAIL: TestArchiveFS_ReadDir/nodir.zip/Open(.github) (0.00s)
                fs_test.go:136: '.github' did not return a fs.ReadDirFile, <nil>
    FAIL
    FAIL    github.com/mholt/archiver/v4    0.064s
    ?       github.com/mholt/archiver/v4/cmd/arc    [no test files]
    FAIL
    

    Subtest of TestArchiveFS_ReadDir that reproduces this issue:

    // Uncomment to reproduce https://github.com/mholt/archiver/issues/340.
    t.Run(fmt.Sprintf("Open(%s)", baseDir), func(t *testing.T) {
    	f, err := fsys.Open(baseDir)
    	if err != nil {
    		t.Error(err)
    	}
    
    	rdf, ok := f.(fs.ReadDirFile)
    	if !ok {
    		t.Fatalf("'%s' did not return a fs.ReadDirFile, %+v", baseDir, rdf)
    	}
    
    	dis, err := rdf.ReadDir(-1)
    	if err != nil {
    		t.Fatal(err)
    	}
    
    	dirs := []string{}
    	for _, di := range dis {
    		dirs = append(dirs, di.Name())
    	}
    
    	// Stabilize the sort order
    	sort.Strings(dirs)
    
    	if diff := cmp.Diff(wantLS, dirs); diff != "" {
    		t.Errorf("Open().ReadDir(-1) mismatch (-want +got):\n%s", diff)
    	}
    })
    

    What did you expect to happen, and what actually happened instead?

    Expected ArchiveFS.Open() on a directory to return an fs.ReadDirFile. Instead, it returns the first file with that directory prefix.

    How do you think this should be fixed?

    ArchiveFS.Open(<directory>) should return an fs.ReadDirFile, probably in the concrete form of dirFile.

    Please link to any related issues, pull requests, and/or discussion

    Bonus: What do you use archiver for, and do you find it useful?

    opened by jeremyje 0
  • m1 related unarchive inconsistency, dropping the root dir

    What version of the package or command are you using?

    github.com/mholt/archiver/v4 v4.0.0-alpha.6.0.20220421032531-8a97d87612e9

    What are you trying to do?

    unarchive a tar.gz directory, given the basic wrapper function:

    func Unarchive(input io.Reader, dir string) error {
    	// TODO: consider if should write to a more generic interface
    	// like a writer, or if maybe if the function itself
    	// should take the handler as an input so can be as generic
    	// as you'd like in the handler
    	format, input, err := archiver.Identify("", input)
    	if err != nil {
    		return err
    	}
    	// the list of files we want out of the archive; any
    	// directories will include all their contents unless
    	// we return fs.SkipDir from our handler
    	// (leave this nil to walk ALL files from the archive)
    
    	handler := func(ctx context.Context, f archiver.File) error {
    		newPath := filepath.Join(dir, f.NameInArchive)
    		if f.IsDir() {
    			return os.MkdirAll(newPath, f.Mode())
    		}
    		newFile, err := os.OpenFile(newPath, os.O_CREATE|os.O_WRONLY, f.Mode())
    		if err != nil {
    			return err
    		}
    		defer newFile.Close()
    		// copy the archived file's contents into the new file on disk
    		af, err := f.Open()
    		if err != nil {
    			return err
    		}
    		defer af.Close()
    		if _, err := io.Copy(newFile, af); err != nil {
    			return err
    		}
    		return nil
    	}
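    	// make sure the format is capable of extracting
    	ex, ok := format.(archiver.Extractor)
    	if !ok {
    		// err is nil at this point, so return an explicit error instead
    		return fmt.Errorf("format %T does not support extraction", format)
    	}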
    	return ex.Extract(context.Background(), input, nil, handler)
    }
    

    What steps did you take?

    On the mac, given a tar archive with a root dir of quarto-0.9.532 and directories

    tar -tvf ~/Downloads/quarto-0.9.532-linux-amd64.tar.gz
    drwxr-xr-x  0 runner docker      0 Jun  6 18:20 quarto-0.9.532/
    drwxr-xr-x  0 runner docker      0 Jun  6 18:20 quarto-0.9.532/bin/
    

    Unpacking manually, I likewise see a directory structure:

    .
    └── quarto-0.9.542
       ├── bin
       └── share
    

    However, when I add fmt.Println("name in archive: ", f.NameInArchive), I see the following on the M1 Mac:

    name in archive:  ./
    name in archive:  ./bin/
    name in archive:  ./share/
    

    On Linux, I do see the correct behavior.

    name in archive:  quarto-0.9.542/
    name in archive:  quarto-0.9.542/bin/
    name in archive:  quarto-0.9.542/share/
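
    To tell whether the names differ in the archive itself or only in how they are reported, one option is to list the header names with the standard library alone and compare them on both machines. The following is a minimal sketch using archive/tar and compress/gzip; listTarGzNames, sampleTarGz, and roundTrip are hypothetical helpers, and the in-memory archive is only a stand-in for the real download.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// listTarGzNames returns every header name exactly as stored in a .tar.gz stream.
func listTarGzNames(r io.Reader) ([]string, error) {
	gz, err := gzip.NewReader(r)
	if err != nil {
		return nil, err
	}
	defer gz.Close()

	tr := tar.NewReader(gz)
	var names []string
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil // end of archive
		}
		if err != nil {
			return nil, err
		}
		names = append(names, hdr.Name)
	}
}

// sampleTarGz builds a small in-memory .tar.gz containing only directory
// entries. It stands in for a real archive in this sketch.
func sampleTarGz(dirs ...string) ([]byte, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	for _, d := range dirs {
		hdr := &tar.Header{Name: d, Typeflag: tar.TypeDir, Mode: 0o755}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// roundTrip writes the given directory names and reads them back.
func roundTrip(dirs ...string) ([]string, error) {
	data, err := sampleTarGz(dirs...)
	if err != nil {
		return nil, err
	}
	return listTarGzNames(bytes.NewReader(data))
}

func main() {
	names, err := roundTrip("quarto-0.9.532/", "quarto-0.9.532/bin/")
	if err != nil {
		panic(err)
	}
	for _, n := range names {
		fmt.Println("name in archive: ", n)
	}
}
```

    With a real file, the reader passed to listTarGzNames would come from os.Open on the downloaded archive.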
    

    What did you expect to happen, and what actually happened instead?

    expect to unarchive the directory as present in the archive

    How do you think this should be fixed?

    normalize behavior

    Please link to any related issues, pull requests, and/or discussion

    likely the inverse issue of #336

    Bonus: What do you use archiver for, and do you find it useful?

    unconfirmed 
    opened by dpastoor 7
  • Easier unarchiving support with v4?

    Hello, thanks for the awesome library. The new version is fantastic since I can use it in conjunction with other network libraries to stream an HTTP response right into the unarchiving process.

    I'm using archiver in a Google Cloud Storage caching library for GitHub Actions, and while archiving files was super easy, unarchiving was more involved, with some manual file type handling.

    Is there a plan to have a shorthand to make the unarchival process a bit easier?

    What would you like to have changed?

    A quick way to unarchive a file in v4, similar to how v3 works.

    Why is this feature a useful, necessary, and/or important addition to this project?

    Currently, unarchiving requires the user to manually handle each of the different tar type flags, which involves a lot of boilerplate.

    What alternatives are there, or what are you doing in the meantime to work around the lack of this feature?

    I took a look at the v3 code and implemented unarchiving inline; however, it's not ideal, at least from my perspective.
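
    To show the kind of boilerplate meant here, below is a minimal sketch of per-type-flag handling written against the standard library's archive/tar rather than archiver's API. The helpers extractTar, writeFile, and demo are hypothetical; only directories, regular files, and symlinks are handled, and everything else is skipped.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// extractTar writes the entries of an uncompressed tar stream beneath dir,
// branching on each header's type flag. Hypothetical helper, stdlib only.
func extractTar(r io.Reader, dir string) error {
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return nil // end of archive
		}
		if err != nil {
			return err
		}
		target := filepath.Join(dir, hdr.Name)
		// Refuse entries such as "../x" that would land outside dir.
		rel, err := filepath.Rel(dir, target)
		if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(os.PathSeparator)) {
			return fmt.Errorf("entry %q escapes %q", hdr.Name, dir)
		}
		switch hdr.Typeflag {
		case tar.TypeDir:
			err = os.MkdirAll(target, 0o755)
		case tar.TypeReg:
			err = writeFile(target, tr, hdr.FileInfo().Mode().Perm())
		case tar.TypeSymlink:
			err = os.Symlink(hdr.Linkname, target)
		default:
			// Hard links, devices, etc. are skipped in this sketch.
		}
		if err != nil {
			return err
		}
	}
}

// writeFile copies src into a new file at path, creating parent directories.
func writeFile(path string, src io.Reader, perm os.FileMode) error {
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return err
	}
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, perm)
	if err != nil {
		return err
	}
	if _, err := io.Copy(f, src); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// demo builds a two-entry tar in memory, extracts it, and returns a.txt's contents.
func demo() (string, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	body := []byte("hello")
	headers := []*tar.Header{
		{Name: "root/", Typeflag: tar.TypeDir, Mode: 0o755},
		{Name: "root/a.txt", Typeflag: tar.TypeReg, Mode: 0o644, Size: int64(len(body))},
	}
	for _, h := range headers {
		if err := tw.WriteHeader(h); err != nil {
			return "", err
		}
		if h.Typeflag == tar.TypeReg {
			if _, err := tw.Write(body); err != nil {
				return "", err
			}
		}
	}
	if err := tw.Close(); err != nil {
		return "", err
	}
	dir, err := os.MkdirTemp("", "extract-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	if err := extractTar(&buf, dir); err != nil {
		return "", err
	}
	data, err := os.ReadFile(filepath.Join(dir, "root", "a.txt"))
	return string(data), err
}

func main() {
	fmt.Println(demo())
}
```

    A shorthand in the library would presumably hide this switch behind a single call; the sketch makes no claim about how v3 or v4 implement it.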

    Please link to any relevant issues, pull requests, or other discussions.

    feature request 
    opened by premist 1
  • rar multipart (*.part1.rar, *.part2.rar) support?

    Hi,

    I have a lot of multipart .rar archives that follow the pattern from the title

    <name>.part1.rar
    <name>.part2.rar
    <name>.part<x>.rar
    

    but when I execute a command like this: arc.exe -overwrite unarchive "R:\melodies.part1.rar" "R:\melodies"

    I only get the message: scanning source archive: scanning tarball's file listing: rardecode: archive continues in next volume and no extraction is done...

    I'm on Windows, using arc v3.2

    feature request 
    opened by oO0XX0Oo 1
  • Repo causing problems on windows due to symlink

    Git is constantly telling me that there are changes to the file/symlink: vendor/github.com/mholt/archiver/testdata/exist

    This breaks the go dep dependency/vendor tool.

    The issue appears to be because when windows or bash or git see the symlink, they add a C: to the front of the file name.

    prometheus/procfs had a similar issue, that they solved: https://github.com/prometheus/procfs/issues/60 Alternatively, just create the symlinks when the tests are run, rather than commit them to git.

    MinGW 03:12:47 ~/workspace/go/src/github.com/ReturnPath/gdzilla$ git status
    On branch cduncan_ss-xx_update-prometheus
    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git checkout -- <file>..." to discard changes in working directory)
    
            modified:   vendor/github.com/mholt/archiver/testdata/exist
    
    no changes added to commit (use "git add" and/or "git commit -a")
    
    MinGW 03:20:37 ~/workspace/go/src/github.com/ReturnPath/gdzilla$ git diff
    diff --git a/vendor/github.com/mholt/archiver/testdata/exist b/vendor/github.com/mholt/archiver/testdata/exist
    index b8f119655..9728dc4cc 120000
    --- a/vendor/github.com/mholt/archiver/testdata/exist
    +++ b/vendor/github.com/mholt/archiver/testdata/exist
    @@ -1 +1 @@
    -/target/does/not/exist
    \ No newline at end of file
    +C:/target/does/not/exist
    \ No newline at end of file
    

    What version of the package or command are you using?

    bfece90dc3bb7199e8a397a27eb267a6cdd9589c

    What are you trying to do?

    On a project that has this as a dependency, I am trying to run dep ensure -update

    What steps did you take?

    Be on Windows

    What did you expect to happen, and what actually happened instead?

    Not have git constantly tell me there are differences, and prevent me from checking out other branches, because it thinks there are changes to this file.

    How do you think this should be fixed?

    prometheus/procfs had a similar issue, and they fixed it: https://github.com/prometheus/procfs/issues/60 I'm not sure if their fix is necessarily applicable. Maybe don't have the symlinks in the git repo and instead generate them when the tests are run?
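
    The second suggestion, generating the symlink at test time instead of committing it, could look roughly like the sketch below. It uses only the standard library; makeDanglingSymlink, isDanglingSymlink, and demo are hypothetical names, and the link target is the one shown in the diff above. In a real test the directory would more likely come from t.TempDir(), and on Windows creating symlinks may require extra privileges, so a test might need to skip when os.Symlink fails.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// makeDanglingSymlink creates dir/exist pointing at a target that is assumed
// not to exist, standing in for the committed testdata/exist link.
func makeDanglingSymlink(dir string) (string, error) {
	link := filepath.Join(dir, "exist")
	if err := os.Symlink("/target/does/not/exist", link); err != nil {
		return "", err
	}
	return link, nil
}

// isDanglingSymlink reports whether path is a symlink whose target is missing.
func isDanglingSymlink(path string) (bool, error) {
	info, err := os.Lstat(path) // Lstat inspects the link itself
	if err != nil {
		return false, err
	}
	if info.Mode()&os.ModeSymlink == 0 {
		return false, nil
	}
	_, err = os.Stat(path) // Stat follows the link and fails if the target is absent
	return os.IsNotExist(err), nil
}

// demo creates the link in a throwaway directory and checks it.
func demo() (bool, error) {
	dir, err := os.MkdirTemp("", "symlink-sketch")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	link, err := makeDanglingSymlink(dir)
	if err != nil {
		return false, err
	}
	return isDanglingSymlink(link)
}

func main() {
	fmt.Println(demo())
}
```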

    opened by veqryn 1
Releases(v4.0.0-alpha.6)
Owner
Matt Holt
M.S. Computer Science. Author of the Caddy Web Server, CertMagic, Papa Parse, JSON/curl-to-Go, Timeliner, Relica, and more...
A go compress library for fs.FS interface

compress: a go compress library for fs.FS interface Format Test Charset Decoder Encoder Password Info zip local true true true false used go std rar l

null 1 Apr 16, 2022
This is a tool to extract TODOs, NOTEs etc or search user provided terms from given files and/or directories.

ado This is a tool to extract TODOs, NOTEs etc or user provided terms from given files and/or directories. DEPRECIATED: My project seek has cleaner co

Meelis Utt 0 Jan 30, 2022
Golang wrapper for Exiftool : extract as much metadata as possible (EXIF, ...) from files (pictures, pdf, office documents, ...)

go-exiftool go-exiftool is a golang library that wraps ExifTool. ExifTool's purpose is to extract as much metadata as possible (EXIF, IPTC, XMP, GPS,

null 114 Jun 22, 2022
Coalmine: De-mining canaries in common file formats

Coalmine: De-mining canaries in common file formats Objective On-prem file checking for canaries prior to opening them in readers (e.g. Acrobat, Word,

D.Snezhkov 4 May 20, 2022
Extract profiles and tasks information from CSV file

Footsite-Bot ideas from jw6602 Extract profiles and tasks information from CSV f

Zhiyao Wen 6 May 18, 2022
RtxTest - Extract this zip file into your golang development environment

Documentation 1. Clone or extract file extract this zip file into your golang de

Abdul Rauf 1 May 12, 2022
Create ePub files from URLs

url2epub Create ePub files from URLs Overview The root directory provides a Go library that creates ePub files out of URLs, with limitations.

Yuxuan 'fishy' Wang 29 May 8, 2022
Create all possible binaries from go files

nextBuild.go Create all possible binaries of a project in go ChangeLog 0.0.1 ─ First release. Flags You can alter a few things when creating the binar

FlamesX128 3 Dec 16, 2021
Go filesystem implementations for various URL schemes

hairyhenderson/go-fsimpl This module contains a collection of Go filesystem implementations that can discovered dynamically by URL scheme. All filesys

Dave Henderson 225 Jun 27, 2022
app-services-go-linter plugin analyze source tree of Go files and validates the availability of i18n strings in *.toml files

app-services-go-linter app-services-go-linter plugin analyze source tree of Go files and validates the availability of i18n strings in *.toml files. A

Red Hat Developer 2 Nov 29, 2021
Split text files into gzip files with x lines

hakgzsplit split lines of text into multiple gzip files

Luke Stephens (hakluke) 6 Jun 21, 2022
create PDF from ASCII File for Cable labels

CableLable create PDF from ASCII File for Cable labels file format is one label per line, a line containing up to 3 words, each word is a line on the

null 0 Nov 8, 2021
This program let you create a DataSet (.CSV) with all TedTalks

TedTalks-Scraper This program let you create a file .CSV with all information from TedTalks, including: Title Description Views (Number of Views) Auth

null 0 Dec 26, 2021
Gokrazy mkfs: a program to create an ext4 file system on the gokrazy perm partition

gokrazy mkfs This program is intended to be run on gokrazy only, where it will c

null 4 Jun 13, 2022
QueryCSV enables you to load CSV files and manipulate them using SQL queries then after you finish you can export the new values to a CSV file

QueryCSV enable you to load CSV files and manipulate them using SQL queries then after you finish you can export the new values to CSV file

Mohamed Shapan 100 Dec 22, 2021
🏵 Gee is tool of stdin to each files and stdout

Gee is tool of stdin to each files and stdout. It is similar to the tee command, but there are more functions for convenience. In addition, it was written as go. which provides output to stdout and files.

HAHWUL 64 Jun 13, 2022
Golang PDF library for creating and processing PDF files (pure go)

UniPDF - PDF for Go UniDoc UniPDF is a PDF library for Go (golang) with capabilities for creating and reading, processing PDF files. The library is wr

UniDoc 1.6k Jun 27, 2022