
Overview

Dud


Website | Install | Getting Started | Source Code

Dud is a lightweight tool for versioning data alongside source code and building data pipelines. In practice, Dud extends many of the benefits of source control to large binary data.

With Dud, you can commit, checkout, fetch, and push large files and directories with a simple command line interface. Dud stores recipes (a.k.a. stages) for retrieving your data in small YAML files. These stages can be stored in source control to link your data to your code. On top of that, stages can run the commands to generate the data, sort of like Make. Stages can be chained together to create data pipelines. See the Getting Started guide for a hands-on overview.
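Chaining stages "sort of like Make" comes down to ordering them so that each stage runs after the stages that produce its inputs. The sketch below uses a topological sort (Kahn's algorithm) for that ordering. The Stage struct, its fields, and the example commands are hypothetical simplifications for illustration. They are not Dud's YAML stage format or its internal types.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Stage is a hypothetical, simplified stand-in for a Dud stage: a command,
// the files it reads (Deps), and the files it produces (Outputs). Real Dud
// stage files are YAML and are not modeled here.
type Stage struct {
	Name    string
	Command string
	Deps    []string
	Outputs []string
}

// orderStages returns stage names so that every stage comes after the stages
// that produce its dependencies (Kahn's topological sort).
func orderStages(stages []Stage) ([]string, error) {
	producer := map[string]string{} // output path -> name of producing stage
	for _, s := range stages {
		for _, out := range s.Outputs {
			if other, ok := producer[out]; ok {
				return nil, fmt.Errorf("%s is an output of both %s and %s", out, other, s.Name)
			}
			producer[out] = s.Name
		}
	}

	indegree := map[string]int{}        // number of upstream stages not yet ordered
	downstream := map[string][]string{} // stage -> stages that consume its outputs
	for _, s := range stages {
		if _, ok := indegree[s.Name]; !ok {
			indegree[s.Name] = 0
		}
		for _, dep := range s.Deps {
			up, ok := producer[dep]
			if !ok {
				continue // a plain input file, not produced by any stage
			}
			downstream[up] = append(downstream[up], s.Name)
			indegree[s.Name]++
		}
	}

	var ready, order []string
	for name, n := range indegree {
		if n == 0 {
			ready = append(ready, name)
		}
	}
	for len(ready) > 0 {
		sort.Strings(ready) // deterministic order among independent stages
		name := ready[0]
		ready = ready[1:]
		order = append(order, name)
		for _, next := range downstream[name] {
			indegree[next]--
			if indegree[next] == 0 {
				ready = append(ready, next)
			}
		}
	}
	if len(order) != len(stages) {
		return nil, errors.New("pipeline contains a dependency cycle")
	}
	return order, nil
}

func main() {
	stages := []Stage{
		{Name: "train", Command: "python train.py", Deps: []string{"features.csv"}, Outputs: []string{"model.bin"}},
		{Name: "featurize", Command: "python featurize.py", Deps: []string{"raw.csv"}, Outputs: []string{"features.csv"}},
	}
	order, err := orderStages(stages)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(order) // prints [featurize train]
}
```

A stage whose outputs are never consumed simply ends up last among its peers. A cycle (two stages that each need the other's output) is reported as an error instead of looping forever.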

Dud is pronounced "duhd", not "dood". Dud is not an acronym.

Motivation

Dud is heavily inspired by DVC. DVC addresses the need for data versioning and reproducibility, but its implementation is not without problems. My criticisms of DVC boil down to two things: speed and simplicity. By speed, I mean throughput and responsiveness. By simplicity, I mean doing less--both in project scope and amount of abstraction.

In terms of speed, Dud is generally much faster than DVC. In terms of simplicity, Dud has a smaller, more focused scope, and it is distributed as a single executable.

To summarize with an analogy: Dud is to DVC what Flask is to Django. Both Dud and DVC have their strengths. If you want a "batteries included" suite of tools for managing machine learning projects, DVC may be a good fit for you. If data management is your main area of need and you want something lightweight and fast, Dud may be a better fit.

To get down to brass tacks, read on.

Concrete differences with DVC

Dud does not manage experiments and/or metrics.

Dud is solely focused on versioning and reproducing data alongside source code. DVC's scope has grown to encompass a large portion of a traditional machine learning workflow. While an integrated suite of tools has its benefits, if UNIX is any guide, a composition of smaller, more focused tools generally yields more productivity than a monolithic counterpart. For example, there's no reason you couldn't use MLflow or Aim alongside Dud to track your experiments. Dud does not prescribe any solution for experiment tracking, and it doesn't try to enter the new, yet already crowded, marketplace for such tools.

Second, versioning data alongside source code is an incredibly useful concept in its own right. Domains beyond machine learning and data science (e.g. game development and digital design) may greatly benefit from this approach to data management without being burdened by baggage that belongs to another domain.

Dud commits must always be explicitly invoked; they are never side effects.

For both Dud and DVC, committing data to the cache is one of the most expensive operations that each tool undertakes (in terms of both run-time and I/O). Because of this, Dud puts the user in absolute control of when to commit data. In Dud, commits only happen when you run dud commit.

In contrast, DVC often commits automatically on your behalf as a side effect of other commands (for example, during dvc add and dvc repro). While DVC is trying to be helpful, these implicit commits are often accidental commits. For example, if you're rapidly iterating on a pipeline, you're likely running dvc repro or dvc run repeatedly as you develop. However, DVC will automatically commit the results each time you run dvc repro or dvc run--even if you are just debugging something or tweaking your code. Such accidental commits have a high cost; they turn "rapid development" into "development", and they bloat your cache. (You can disable DVC's implicit commits using the --no-commit flag, but you have to remember to type it each time, and DVC does not support enabling this flag by default, e.g. via configuration file.)

Dud checks out files as symbolic links by default.

When Dud checks out cached files into the workspace, it uses symbolic links (a.k.a. symlinks) by default. Symlinks have a number of benefits that make them an excellent choice for checkouts. First, symlinks require very little I/O to create, so dud checkout usually completes almost instantaneously. Second, symlinks transparently redirect to the cached files themselves, so data isn't duplicated between the workspace and the cache, and your storage space is used efficiently. Last but not least, symlinks make it trivial to check if a file is up-to-date (by checking the link target), so dud status can also be extremely fast.

By default, DVC checks out files as hard copies. (Technically, DVC tries to use reflinks before copies, but very few filesystems support reflinks, so copies are far more likely to be the default.) With hard copies, none of the efficiencies listed above are possible, so checkouts and status checks are inefficient by default. To its credit, DVC's cache can be configured to use symlinks, but arguably DVC's default cache configuration is not sensible for projects of any significant size.

Running a Dud pipeline never implicitly alters a stage's artifacts.

When you run a pipeline in DVC, DVC will remove all pipeline outputs before running the pipeline's command(s). While this can help ensure reproducible pipelines, it is another implicit behavior the user must consider, and it prevents the user from deciding when stage outputs can safely be reused.

If you don't want DVC to automatically remove outputs for you, you need to explicitly tell it each output you'd like to persist. However, by telling DVC to persist an output, DVC may perform a new and different automatic behavior. If you're using symbolic links (or hard links) for checkouts (which is generally a good idea; see above), DVC will "unprotect" all output links by replacing them with hard copies from the cache. Not only is this behavior surprising, it's also very costly in both runtime and storage.

Together, these two behaviors mean that, in a sensible configuration, DVC stages simply cannot reuse outputs efficiently; the user has little choice but to accept DVC's limitations.

When you run a pipeline in Dud, Dud doesn't implicitly modify any existing files. Dud defers all modification of workspace files to the user. If you want a specific behavior, you should code it into your stage's command. For example, if you want to clear all outputs of a stage prior to it running, you can delete any outputs at the beginning of your command's script. If you want to reuse outputs, you can check for preexisting outputs in your script and choose not to recreate them. Dud's minimalist approach results in a stage's command entirely owning its own reproducibility; the responsibility is not awkwardly shared between the stage and the tool.

Dud delegates remote cache management to Rclone.

Rclone is a very popular command-line tool which describes itself as "The Swiss army knife of cloud storage." At the time of writing, Rclone has more than 28,000 stars on GitHub. Rclone supports just about any cloud storage provider you've possibly heard of. (S3, GCS, Dropbox, Backblaze, to name a few.) This is all to say: Rclone is a top-tier choice for moving data around the internet.

Dud internally calls Rclone for all of its remote cache functionality, such as dud fetch and dud push. But Dud doesn't hide the Rclone abstraction entirely. Dud exposes its Rclone configuration file, and it's expected and encouraged that users will use Rclone directly to configure remote storage or interact with their remote data. By using Rclone, Dud's remote cache interface immediately gains the benefit of years of open-source development and a rich, well-documented CLI. This is an example of how Dud embraces the UNIX philosophy and the composition of single-focus tools, as stated above.
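As a rough illustration of what delegating to Rclone looks like from Go, the sketch below only builds rclone copy commands with os/exec; it does not run them. The rclone copy subcommand and the global --config flag are real Rclone CLI features, but the config file name, cache directory, and remote name "myremote:dud-cache" are hypothetical placeholders, and these are not the exact arguments Dud passes. Remotes themselves would be defined beforehand with rclone config.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// rcloneCopy builds, but does not run, an "rclone copy" command that
// transfers files from src to dst using the given rclone config file.
// Either side may be a local path or an rclone remote such as "name:path".
// All paths and remote names used below are hypothetical placeholders.
func rcloneCopy(configPath, src, dst string) *exec.Cmd {
	cmd := exec.Command("rclone", "copy", "--config", configPath, src, dst)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	cfg := "rclone.conf" // hypothetical config location
	push := rcloneCopy(cfg, "cache", "myremote:dud-cache")
	fetch := rcloneCopy(cfg, "myremote:dud-cache", "cache")
	fmt.Println(push.String())
	fmt.Println(fetch.String())
	// push.Run() would need rclone on PATH and a remote named "myremote"
	// defined in the config file.
}
```

Push and fetch are the same operation with source and destination swapped, and rclone copy already skips files that are unchanged at the destination, so none of that logic has to be reimplemented.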

In contrast, DVC stitches together various Python packages to support a modest assortment of cloud storage options. At the time of writing, DVC 2.6 supports eleven cloud storage providers, and Rclone 1.56 supports more than fifty. But the number of cloud storage options isn't the critical disadvantage of DVC's approach. (Both Dud and DVC support the biggest players, such as S3 and GCS.) DVC's critical disadvantage is that they must develop and maintain most of their remote data management stack themselves. If Rclone is any indication, cloud data transfer is a very hard problem, and DVC has their work cut out for them.

In summary, Dud leverages the deep knowledge and effort of the Rclone developers to provide a robust and familiar remote cache experience. DVC plots their own course, and in doing so incurs a steep development cost.

Dud does not use analytics. (And it never will.)

By default, DVC enables embedded analytics. I strongly disagree with this practice, especially in free and open-source software. I will never embed analytics in Dud.

Contributing

See CONTRIBUTING.md.

License

BSD-3-Clause. See LICENSE.

Issues
  • deps: bump github.com/cheggaaa/pb/v3 from 3.0.8 to 3.1.0

    Bumps github.com/cheggaaa/pb/v3 from 3.0.8 to 3.1.0.

    Commits
    • 67c695b Merge pull request #188 from cheggaaa/v3-pooling
    • 56b7944 Merge pull request #192 from dmitryk-dk/v3-pooling
    • 8830ba5 rollback go version
    • 55fa2cc rollback old build rules
    • 684eb1b used syscall
    • f435c2c used syscall
    • e1c53e3 Update dependencies, use term instead of Syscall
    • 90c02fa configurable os.Signal catching
    • bbc97ac Merge pull request #190 from jkawamoto/prefix
    • 4bd2f07 Add a white space after prefix and before suffix
    • Additional commits viewable in compare view

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies go 
    opened by dependabot[bot] 1
  • deps: bump github.com/stretchr/testify from 1.7.5 to 1.8.0

    Bumps github.com/stretchr/testify from 1.7.5 to 1.8.0.

    Commits

    dependencies go 
    opened by dependabot[bot] 1
  • deps: bump github.com/stretchr/testify from 1.7.3 to 1.7.4

    Bumps github.com/stretchr/testify from 1.7.3 to 1.7.4.

    Commits
    • 48391ba Fix panic in AssertExpectations for mocks without expectations (#1207)
    • 840cb80 arrays value types in a zero-initialized state are considered empty (#1126)
    • See full diff in compare view

    dependencies go 
    opened by dependabot[bot] 1
  • deps: bump github.com/stretchr/testify from 1.7.1 to 1.7.3

    Bumps github.com/stretchr/testify from 1.7.1 to 1.7.3.

    Commits

    dependencies go 
    opened by dependabot[bot] 1
  • deps: bump github.com/stretchr/testify from 1.7.1 to 1.7.2

    Bumps github.com/stretchr/testify from 1.7.1 to 1.7.2.

    Commits

    dependencies go 
    opened by dependabot[bot] 1
  • deps: bump github.com/spf13/viper from 1.11.0 to 1.12.0

    Bumps github.com/spf13/viper from 1.11.0 to 1.12.0.

    Release notes

    Sourced from github.com/spf13/viper's releases.

    v1.12.0

    This release makes YAML v3 and TOML v2 the default versions used for encoding.

    You can switch back to the old versions by adding viper_yaml2 and viper_toml1 to the build tags.

    Please note that YAML v2 and TOML v1 are considered deprecated from this release and may be removed in a future release.

    Please provide feedback in discussions and report bugs on the issue tracker. Thanks!

    What's Changed

    Exciting New Features 🎉

    Enhancements 🚀

    Dependency Updates ⬆️

    New Contributors

    Full Changelog: https://github.com/spf13/viper/compare/v1.11.0...v1.12.0

    Commits
    • 4322cf2 feat: make toml2 the default
    • 8d02999 feat: make yaml3 the default
    • 7c35aa9 chore(deps): update yaml3
    • 433821f feat: add etcd3 support to remote
    • 2080d43 chore: update crypt
    • da55858 chore: fix Error log calls in mergeMaps
    • f50ce90 Add in MustBindEnv.
    • 3b836e5 build(deps): bump github.com/subosito/gotenv from 1.2.0 to 1.3.0
    • 5d65186 build(deps): bump github.com/pelletier/go-toml/v2 from 2.0.0 to 2.0.1
    • 9f85518 build(deps): bump github.com/spf13/cast from 1.4.1 to 1.5.0
    • Additional commits viewable in compare view

    dependencies go 
    opened by dependabot[bot] 1
  • deps: bump github.com/zeebo/blake3 from 0.2.2 to 0.2.3

    Bumps github.com/zeebo/blake3 from 0.2.2 to 0.2.3.

    Commits

    dependencies go 
    opened by dependabot[bot] 1
  • deps: bump github.com/stretchr/testify from 1.7.0 to 1.7.1

    Bumps github.com/stretchr/testify from 1.7.0 to 1.7.1.

    Commits
    • 083ff1c Fixed didPanic to now detect panic(nil).
    • 1e36bfe Use cross Go version compatible build tag syntax
    • e798dc2 Add docs on 1.17 build tags
    • 83198c2 assert: guard CanConvert call in backward compatible wrapper
    • 087b655 assert: allow comparing time.Time
    • 7bcf74e fix msgAndArgs forwarding
    • c29de71 add tests for correct msgAndArgs forwarding
    • f87e2b2 Update builds
    • ab6dc32 fix linting errors in /assert package
    • edff5a0 fix funtion name
    • Additional commits viewable in compare view

    dependencies go 
    opened by dependabot[bot] 1
  • deps: bump github.com/spf13/cobra from 1.3.0 to 1.4.0

    Bumps github.com/spf13/cobra from 1.3.0 to 1.4.0.

    Release notes

    Sourced from github.com/spf13/cobra's releases.

    v1.4.0

    Winter 2022 Release ❄️

    Another season, another release!

    Goodbye viper! 🐍 🚀

    The core Cobra library no longer requires Viper and all of its indirect dependencies. This means that Cobra's dependency tree has been drastically thinned! The Viper dependency was included because of the cobra CLI generation tool. This tool has migrated to spf13/cobra-cli.

    It's pretty unlikely you were importing and using the bootstrapping CLI tool as part of your application (after all, it's just a tool to get going with core cobra).

    But if you were, replace occurrences of

    "github.com/spf13/cobra/cobra"
    

    with

    "github.com/spf13/cobra-cli"
    

    And in your go.mod, you'll want to also include this dependency:

    github.com/spf13/cobra-cli v1.3.0
    

    Again, the maintainers do not anticipate this being a breaking change to users of the core cobra library, so minimal work should be required for users to integrate with this new release. Moreover, this means the dependency tree for your application using Cobra should no longer require dependencies that were inherited from Viper. Huzzah! 🥳

    If you'd like to read more

    Documentation 📝

    Other 💭

    Shoutout to our awesome contributors helping to make this cobra release possible!! @spf13 @marckhouzam @johnSchnake @jpmcb @liggitt @umarcor @hiljusti @marians @shyim @htroisi

    Changelog

    Sourced from github.com/spf13/cobra's changelog.

    Cobra Changelog

    v1.1.3

    • Fix: release-branch.cobra1.1 only: Revert "Deprecate Go < 1.14" to maintain backward compatibility

    v1.1.2

    Notable Changes

    • Bump license year to 2021 in golden files (#1309) @Bowbaq
    • Enhance PowerShell completion with custom comp (#1208) @Luap99
    • Update gopkg.in/yaml.v2 to v2.4.0: The previous breaking change in yaml.v2 v2.3.0 has been reverted, see go-yaml/yaml#670
    • Documentation readability improvements (#1228 etc.) @zaataylor etc.
    • Use golangci-lint: Repair warnings and errors resulting from linting (#1044) @umarcor

    v1.1.1

    • Fix: yaml.v2 2.3.0 contained an unintended breaking change. This release reverts to yaml.v2 v2.2.8 which has recent critical CVE fixes, but does not have the breaking changes. See spf13/cobra#1259 for context.
    • Fix: correct internal formatting for go-md2man v2 (which caused man page generation to be broken). See spf13/cobra#1049 for context.

    v1.1.0

    Notable Changes

    • Extend Go completions and revamp zsh comp (#1070)
    • Fix man page doc generation - no auto generated tag when cmd.DisableAutoGenTag = true (#1104) @jpmcb
    • Add completion for help command (#1136)
    • Complete subcommands when TraverseChildren is set (#1171)
    • Fix stderr printing functions (#894)
    • fix: fish output redirection (#1247)

    v1.0.0

    Announcing v1.0.0 of Cobra. 🎉

    Notable Changes

    ... (truncated)

    Commits

    dependencies go 
    opened by dependabot[bot] 1
  • deps: bump github.com/spf13/viper from 1.9.0 to 1.10.1

    Bumps github.com/spf13/viper from 1.9.0 to 1.10.1.

    Release notes

    Sourced from github.com/spf13/viper's releases.

    v1.10.1

    This is a maintenance release upgrading the Consul dependency fixing CVEs.

    v1.10.0

    This is a maintenance release primarily containing minor fixes and improvements.

    Changes

    Added

    • Experimental finder based on io/fs
    • Tests are executed on Windows
    • Tests are executed on Go 1.17
    • Logger interface to decouple Viper from JWW

    In addition to the above changes, this release comes with minor improvements, documentation changes and dependency updates.

    Many thanks to everyone who contributed to this release!

    Commits
    • f646c50 chore(deps): update dependencies
    • a4bfcd9 chore(deps): update crypt
    • 1cb6606 build(deps): bump gopkg.in/ini.v1 from 1.65.0 to 1.66.2
    • a785a79 refactor: replace jww with the new logger interface
    • f1f6b21 feat: add logger interface and default implementation
    • c43197d build(deps): bump github.com/mitchellh/mapstructure from 1.4.2 to 1.4.3
    • 2abe0dd build(deps): bump gopkg.in/ini.v1 from 1.64.0 to 1.65.0
    • 8ec82f8 chore(deps): update crypt
    • 35877c8 chore: fix lint
    • 655a0aa chore(deps): update golangci-lint
    • Additional commits viewable in compare view

    dependencies go 
    opened by dependabot[bot] 1
  • deps: bump github.com/spf13/cobra from 1.2.1 to 1.3.0

    Bumps github.com/spf13/cobra from 1.2.1 to 1.3.0.

    Release notes

    Sourced from github.com/spf13/cobra's releases.

    v1.3.0 - The Fall 2021 release 🍁

    Completion fixes & enhancements 💇🏼

    In v1.2.0, we introduced a new model for completions. Thanks to everyone for trying it, giving feedback, and providing numerous fixes! Continue to work with the new model, as the old one (as noted in code comments) will be deprecated in a coming release.

    • DisableFlagParsing now triggers custom completions for flag names #1161
    • Fixed unbound variables in bash completions causing edge case errors #1321
    • help completion formatting improvements & fixes #1444
    • All completions now follow the help example: short descriptions are capitalized and extra spacing is removed from long descriptions #1455
    • Typo fixes in bash & zsh completions #1459
    • Fixed mixed tab/spaces indentation in completion scripts. Now just 4 spaces #1473
    • Support for different bash completion options. Bash completions v2 supports descriptions and requires descriptions to be removed for menu-complete, menu-complete-backward and insert-completions. These descriptions are now purposefully removed in support of this model. #1509
    • Fix for invalid shell completions when using ~/.cobra.yaml. Log message Using config file: ~/.cobra.yaml now printed to stderr #1510
    • Removes unnecessary trailing spaces from completion command descriptions #1520
    • Option to hide the default completion command #1541
    • Remove __complete command for programs without subcommands #1563

    Generator changes ⚙️

    Thanks to @spf13 for providing a number of changes to the Cobra generator tool, streamlining it for new users!

    • The Cobra generator no longer automatically includes Viper, and it cleans up a number of unused imports when Viper is not used.
    • The Cobra generator's default license is now none
    • The Cobra generator now works with Go modules
    • Documentation to reflect these changes

    New Features ⭐

    • Licenses can be specified by their SPDX identifiers #1159
    • MatchAll allows combining several PositionalArgs to work in concert, so multiple argument validators can now be composed into one #896
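To illustrate what "combining several PositionalArgs" means, here is a minimal stdlib-only sketch of the same composition idea. It is not cobra's code: cobra's real PositionalArgs takes a *cobra.Command as well as the argument slice, while this sketch drops that parameter so it runs without the cobra module. NoEmptyArgs is a hypothetical validator added for the example; ExactArgs only mimics the shape of cobra's validator of the same name.

```go
package main

import (
	"errors"
	"fmt"
)

// PositionalArgs mimics cobra's validator type, minus the *cobra.Command
// parameter, so this sketch needs only the standard library.
type PositionalArgs func(args []string) error

// ExactArgs returns a validator that requires exactly n arguments.
func ExactArgs(n int) PositionalArgs {
	return func(args []string) error {
		if len(args) != n {
			return fmt.Errorf("accepts %d arg(s), received %d", n, len(args))
		}
		return nil
	}
}

// NoEmptyArgs is a hypothetical validator that rejects empty-string arguments.
func NoEmptyArgs(args []string) error {
	for _, a := range args {
		if a == "" {
			return errors.New("empty argument")
		}
	}
	return nil
}

// MatchAll runs each validator in order and returns the first error, so the
// arguments are accepted only if every validator accepts them.
func MatchAll(pargs ...PositionalArgs) PositionalArgs {
	return func(args []string) error {
		for _, parg := range pargs {
			if err := parg(args); err != nil {
				return err
			}
		}
		return nil
	}
}

func main() {
	validate := MatchAll(ExactArgs(2), NoEmptyArgs)
	fmt.Println(validate([]string{"a", "b"})) // <nil>
	fmt.Println(validate([]string{"a"}))      // accepts 2 arg(s), received 1
	fmt.Println(validate([]string{"a", ""}))  // empty argument
}
```

Because validators short-circuit in order, cheaper or more general checks (such as the argument count) are best listed first.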

    Bug Fixes 🐛

    • Fixed duplicate error messages from cobra init boilerplates #1463 #1552 #1557

    Testing 👀

    • Now testing golang 1.16.x and 1.17.x in CI #1425
    • Fix for running diff test to ignore CR for windows #949
    • Added helper functions and reduced code duplication in args_test #1426
    • Now using official golangci-lint github action #1477

    Security 🔏

    • Added GitHub dependabot #1427
    • Now using Viper v1.10.0
      • There is a known CVE in an indirect dependency from viper: spf13/cobra#1538. This will be patched in a future release

    Documentation 📝

    • Multiple projects added to the projects_using_cobra.md file: #1377 #1501 #1454
    • Removed ToC from main readme file as it is now automagically displayed by GitHub #1429
    • Documentation corrected for when the --author flag is specified #1009
    • shell_completions.md has an easier to use snippet for copying and pasting shell completions #1372

    ... (truncated)

    Commits
    • 178edbb Bump github.com/spf13/viper from 1.9.0 to 1.10.0 (#1561)
    • 9054739 Remove __complete cmd for program without subcmds (#1563)
    • 19c9c74 Always include the os package import when generating the root command (#1557)
    • 01e05b8 Bump github.com/spf13/viper from 1.8.1 to 1.9.0 (#1554)
    • 36bff0a fix root.go.golden (#1552)
    • 1854bb5 Fix some typos (mostly found by codespell) (#1514)
    • ff2c55e chore(ci): use golangci-lint-action (#1477)
    • 1beb476 fix: Duplicate error message from cobra init boilerplates (#1463)
    • 6f84ef4 Provide option to hide default 'completion' cmd (#1541)
    • ee75a2b Remove trailing spaces from bash completion command description (#1520)
    • Additional commits viewable in compare view

    dependencies go 
    opened by dependabot[bot] 1
  • empty sub-directories not included in directory status output

    ~/repo$ dud init
    Dud project initialized.
    See .dud/config.yaml and .dud/rclone.conf to customize the project.
    
    ~/repo$ mkdir -p bish/bash/bosh
    
    ~/repo$ dud stage gen -o bish/ > bish.yaml
    
    ~/repo$ dud stage add bish.yaml
    Added bish.yaml to the index.
    
    ~/repo$ dud status
    bish.yaml  stage definition not checksummed
      bish
    

    The last line should not be empty. Options for correcting the behavior:

    1. Include a "... in X directories" note in the output. In this example: bish: 0 files in 3 directories or bish: 3 directories: 0 files. This is my preferred solution at present.
    2. Output x3 empty directory. "Empty" here is misleading, because it only considers files, not sub-directories. This option should probably be avoided.
    3. Output x1 empty directory. This may also be misleading, because there are actually three (nested) directories.
    4. Combine 1) and 3): x1 empty directory across 3 directories. This could be more confusing than helpful, but it gives the most information to the user.
    bug 
    opened by kevin-hanselman 0
  • add diff command to summarize the differences between two cached artifacts

    Usage:

    dud diff <checksum_a> <checksum_b>
    

    or

    dud diff <path_to_cached_artifact_a> <path_to_cached_artifact_b>
    

    If the cached artifacts are directory manifests, the directory artifacts are recursively loaded and a diff of the full structure is displayed. (cmp.Diff could be used to accomplish this.)

    If the cached artifacts are NOT directory manifests, the location of the first difference is displayed (e.g., "first difference detected at byte X")

    If one cached artifact is a directory manifest and the other is a binary file, say so.

    enhancement 
    opened by kevin-hanselman 0
  • add flag to dry-run impactful commands

    At a minimum, remote cache commands (e.g., fetch, push, pull) should support a dry run. Commit and checkout could also benefit from a dry-run flag, but are not as critical.

    enhancement 
    opened by kevin-hanselman 0
  • Provide a means to prevent files/directories/patterns from being tracked as part of a directory artifact

    The main benefit is that directory artifacts could be created from directories whose contents should not be entirely tracked by Dud.

    Option 1: Support .dudignore files that work similarly to .gitignore files. This is likely most useful when there's a pattern that shouldn't be tracked project-wide.

    Option 2: Replace Artifact.DisableRecursion with a .gitignore-style list defined in the stage YAML itself. In this approach, YAML files remain standalone; no separate .dudignore file affects the definition of the stage.

    Decoupling ignored patterns from the stages/artifacts themselves has strong pros and strong cons.

    enhancement low priority 
    opened by kevin-hanselman 0
Releases (v0.4.0)
CUE is an open source data constraint language which aims to simplify tasks involving defining and using data.

null 2.8k Aug 11, 2022
Open source framework for processing, monitoring, and alerting on time series data

Kapacitor Open source framework for processing, monitoring, and alerting on time series data Installation Kapacitor has two binaries: kapacitor – a CL

InfluxData 2.1k Aug 3, 2022
Prometheus Common Data Exporter can parse JSON, XML, yaml or other format data from various sources (such as HTTP response message, local file, TCP response message and UDP response message) into Prometheus metric data.

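Prometheus Common Data Exporter Prometheus Common Data Exporter is used to parse JSON, XML, YAML, or other-format data from multiple sources (such as HTTP response messages, local files, TCP response messages, and UDP response messages) into Prometheus metric data.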

null 7 May 18, 2022
Baker is a high performance, composable and extendable data-processing pipeline for the big data era

Baker is a high performance, composable and extendable data-processing pipeline for the big data era. It shines at converting, processing, extracting or storing records (structured data), applying whatever transformation between input and output through easy-to-write filters.

AdRoll 150 Jul 12, 2022
sq is a command line tool that provides jq-style access to structured data sources such as SQL databases, or document formats like CSV or Excel.

sq: swiss-army knife for data sq is a command line tool that provides jq-style access to structured data sources such as SQL databases, or document fo

Neil O'Toole 384 Aug 4, 2022
This project is meant to make you code a digital version of an ant farm

This project is meant to make you code a digital version of an ant farm. Create a program lem-in that will read from a file (describing the ants and the colony) given in the arguments. Upon successfully finding the quickest path, lem-in will display the content of the file passed as argument and each move the ants make from room to room. How does it work? You make an ant farm with tunnels and rooms. You place the ants on one side and look at how they find the exit.

null 0 Dec 24, 2021
DEPRECATED: Data collection and processing made easy.

This project is deprecated. Please see this email for more details. Heka Data Acquisition and Processing Made Easy Heka is a tool for collecting and c

Mozilla Services 3.4k Aug 5, 2022
Kanzi is a modern, modular, expandable and efficient lossless data compressor implemented in Go.

kanzi Kanzi is a modern, modular, expandable and efficient lossless data compressor implemented in Go. modern: state-of-the-art algorithms are impleme

null 412 Jun 24, 2022
churro is a cloud-native Extract-Transform-Load (ETL) application designed to build, scale, and manage data pipeline applications.

Churro - ETL for Kubernetes churro is a cloud-native Extract-Transform-Load (ETL) application designed to build, scale, and manage data pipeline appli

churrodata 13 Mar 10, 2022
Dev Lake is the one-stop solution that integrates, analyzes, and visualizes software development data

Dev Lake is the one-stop solution that integrates, analyzes, and visualizes software development data throughout the software development life cycle (SDLC) for engineering teams.

Merico 40 Aug 6, 2022
A library for performing data pipeline / ETL tasks in Go.

Ratchet A library for performing data pipeline / ETL tasks in Go. The Go programming language's simplicity, execution speed, and concurrency support m

Daily Burn 385 Jan 19, 2022
A distributed, fault-tolerant pipeline for observability data

Table of Contents What Is Veneur? Use Case See Also Status Features Vendor And Backend Agnostic Modern Metrics Format (Or Others!) Global Aggregation

Stripe 1.6k Aug 7, 2022
Data syncing in golang for ClickHouse.

ClickHouse Data Synchromesh Data syncing in golang for ClickHouse. based on go-zero ARCH A typical data warehouse architecture design of data sync Aut

好未来技术 840 Aug 4, 2022
Machine is a library for creating data workflows.

Machine is a library for creating data workflows. These workflows can be either very concise or quite complex, even allowing for cycles for flows that need retry or self healing mechanisms.

whitaker-io 113 Jul 31, 2022
Stream data into Google BigQuery concurrently using InsertAll() or BQ Storage.

bqwriter A Go package to write data into Google BigQuery concurrently with a high throughput. By default the InsertAll() API is used (REST API under t

null 9 Apr 4, 2022
Fast, efficient, and scalable distributed map/reduce system, DAG execution, in memory or on disk, written in pure Go, runs standalone or distributedly.

Gleam Gleam is a high performance and efficient distributed execution system, and also simple, generic, flexible and easy to customize. Gleam is built

Chris Lu 3.1k Aug 5, 2022
Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more

Gonum Installation The core packages of the Gonum suite are written in pure Go with some assembly. Installation is done using go get. go get -u gonum.

null 6k Aug 9, 2022
Graphik is a Backend as a Service implemented as an identity-aware document & graph database with support for gRPC and graphQL

Graphik is a Backend as a Service implemented as an identity-aware, permissioned, persistent document/graph database & pubsub server written in Go.

null 304 Jul 17, 2022