S3pd - CLI utility that downloads multiple S3 objects at a time, with multiple range requests issued per object

Overview

S3 Parallel Downloader

CLI utility that downloads multiple S3 objects at a time, issuing multiple range requests per object. It can also copy between local filesystem locations using multiple threads.
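As a rough illustration of issuing multiple range requests per object, the Go sketch below splits an object into consecutive byte ranges and formats each one as an HTTP Range header value. It assumes the object's size is already known and that each range is at most one part long (which is how --partsize appears to be used in the examples); splitRanges and byteRange are hypothetical names, not taken from s3pd's source.

```go
package main

import "fmt"

// byteRange is an inclusive [Start, End] span, matching the semantics of
// the HTTP Range header ("bytes=Start-End").
type byteRange struct {
	Start, End int64
}

// splitRanges divides an object of the given size into consecutive ranges of
// at most partSize bytes. The final range may be shorter than partSize.
func splitRanges(size, partSize int64) []byteRange {
	if size <= 0 || partSize <= 0 {
		return nil
	}
	var ranges []byteRange
	for start := int64(0); start < size; start += partSize {
		end := start + partSize - 1
		if end >= size {
			end = size - 1
		}
		ranges = append(ranges, byteRange{Start: start, End: end})
	}
	return ranges
}

// header formats a range as the value of an HTTP Range request header.
func (r byteRange) header() string {
	return fmt.Sprintf("bytes=%d-%d", r.Start, r.End)
}

func main() {
	// A 32 MiB object with a 4 MiB part size yields 8 range requests.
	for _, r := range splitRanges(32*1024*1024, 4*1024*1024) {
		fmt.Println(r.header())
	}
}
```

Each range can then be fetched independently, so one object's download is spread across several concurrent HTTP requests.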

Operations always recurse into the specified directories. When reading from a local filesystem, symlinks are not followed.

Known Issues:

  • If your S3 bucket contains both an object and a "folder" (key prefix) with the same name, this utility will fail. For example: s3://mybucket/test.txt and s3://mybucket/test.txt/another-object.txt. This fails because a POSIX filesystem cannot have a file and a directory at the same absolute path.
  • Support for writing to S3 has not yet been added.

Benchmark results

65.6069Gibps - 7926ms transferring 65GiB of data - downloaded 2,080 32MiB objects across 370 (185 * 2) concurrent HTTP requests

./s3pd-linux-amd64 \
--region=us-west-2 \
--workers=185 \
--threads=2 \
--partsize=$((4*1024*1024)) \
s3://test-400gbps-s3/32MiB/ /mnt/ram-disk

73.9592Gibps - 31585ms transferring 292GiB of data - downloaded 146 2GiB objects across 1,280 (40 * 32) concurrent HTTP requests

./s3pd-linux-amd64 \
--region=us-west-2 \
--workers=40 \
--threads=32 \
--partsize=$((16*1024*1024)) \
s3://test-400gbps-s3/2GiB/ /mnt/ram-disk

258.0953Gibps - 2014ms transferring 65GiB of data between two directories on a local RAM disk

./s3pd-linux-amd64 \
--workers=300 \
--threads=1 \
--partsize=$((128*1024)) \
/mnt/ram-disk/32MiB /mnt/ram-disk/234
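The figures for the first two S3 benchmarks follow from simple arithmetic: throughput in Gibps is the data size in GiB times 8 (bits per byte) divided by the elapsed time in seconds, and the number of concurrent HTTP requests is --workers * --threads. A small Go check of those two results (gibps is a hypothetical helper):

```go
package main

import "fmt"

// gibps converts a transfer of sizeGiB gibibytes completed in elapsedMs
// milliseconds into gibibits per second (1 byte = 8 bits).
func gibps(sizeGiB, elapsedMs float64) float64 {
	return sizeGiB * 8 / (elapsedMs / 1000)
}

func main() {
	// First benchmark: 2,080 objects x 32 MiB = 65 GiB in 7926 ms,
	// with --workers=185 and --threads=2.
	size := 2080.0 * 32 / 1024 // GiB
	fmt.Printf("%.0f GiB, %.4f Gibps, %d concurrent requests\n", size, gibps(size, 7926), 185*2)

	// Second benchmark: 146 objects x 2 GiB = 292 GiB in 31585 ms,
	// with --workers=40 and --threads=32.
	fmt.Printf("%.4f Gibps, %d concurrent requests\n", gibps(146*2, 31585), 40*32)
}
```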

Example CLI usage

Roughly equivalent to: aws s3 cp --recursive s3://ml-training-dataset/pictures /mnt/my-nvme-local-disks. The difference is that s3pd downloads 40 objects at a time (--workers=40) and issues 32 concurrent range requests per object (--threads=32), a much higher concurrency than the aws s3 utility uses by default.

./s3pd-linux-amd64 \
--region=us-west-2 \
--workers=40 \
--threads=32 \
--partsize=$((16*1024*1024)) \
s3://ml-training-dataset/pictures /mnt/my-nvme-local-disks

If you just want to run a benchmark without setting up a large enough RAM disk, use the --benchmark flag, which stores the downloaded data only temporarily in an in-memory buffer. For example:

./s3pd-linux-amd64 \
--workers=40 \
--threads=32 \
--partsize=$((16*1024*1024)) \
--benchmark \
s3://test-400gbps-s3/2GiB/
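When many range requests complete out of order, an in-memory destination has to accept writes at arbitrary offsets. As an illustration only (this is not s3pd's actual --benchmark implementation), here is a minimal Go io.WriterAt backed by a growable byte slice; memWriterAt is a hypothetical name.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// memWriterAt is an in-memory io.WriterAt: concurrent range downloads can
// write their parts at any offset, and the buffer grows as needed.
type memWriterAt struct {
	mu  sync.Mutex
	buf []byte
}

// WriteAt copies p into the buffer at offset off, growing it if necessary.
func (m *memWriterAt) WriteAt(p []byte, off int64) (int, error) {
	if off < 0 {
		return 0, fmt.Errorf("negative offset %d", off)
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	end := int(off) + len(p)
	if end > len(m.buf) {
		grown := make([]byte, end)
		copy(grown, m.buf)
		m.buf = grown
	}
	copy(m.buf[off:], p)
	return len(p), nil
}

// Bytes returns the buffer contents (the internal slice, not a copy).
func (m *memWriterAt) Bytes() []byte {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.buf
}

// Compile-time check that memWriterAt satisfies io.WriterAt.
var _ io.WriterAt = (*memWriterAt)(nil)

func main() {
	var w memWriterAt
	var wg sync.WaitGroup
	parts := []string{"hello, ", "in-memory ", "buffer"}
	offset := int64(0)
	for _, p := range parts {
		wg.Add(1)
		go func(p string, off int64) {
			defer wg.Done()
			_, _ = w.WriteAt([]byte(p), off)
		}(p, offset)
		offset += int64(len(p))
	}
	wg.Wait()
	fmt.Println(string(w.Bytes()))
}
```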

To copy between locations on the local filesystem:

./s3pd-linux-amd64 \
--workers=40 \
--threads=32 \
--partsize=$((8*1024*1024)) \
/mnt/my-nvme-disk-1/datasetA /mnt/my-nvme-disk-2/
Owner
Colin Bookman