A virtual file system for small- to medium-sized datasets (MB or GB, not TB or PB). Like Docker, but for data.

Overview

AetherFS assists in the production, distribution, and replication of embedded databases and in-memory datasets. You can think of it like Docker, but for data.

AetherFS provides engineers with a platform to manage collections of files called datasets. It optimizes its use of the underlying blob store (AWS S3 or equivalent) to reduce cost to operators and improve performance for end users.

Why not use S3 directly or a file server?

While this is an option, several problems arise with it. For example, to produce two references to the same dataset, you must upload the same set of files twice; three references require three uploads, and so on. Each extra upload adds time to your pipeline and increases storage costs.

Instead, producers tag datasets in AetherFS. A tag can refer to a specific version (semantic or calendar) or a channel that consumers can subscribe to (latest, stable, etc.). Rather than storing an entire snapshot of the dataset for each version, AetherFS removes blocks that are duplicated between versions. This allows clients to re-use blocks of data and only download new or updated portions.
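
The block re-use described above can be illustrated with a small content-addressed store. This is only a sketch under stated assumptions: the `Store` type, the fixed 4-byte `blockSize`, and the use of SHA-256 as the block key are hypothetical choices made for illustration, and the README does not say how AetherFS actually splits, hashes, or stores blocks.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blockSize is a hypothetical, deliberately tiny block size for the demo.
const blockSize = 4

// Store is a hypothetical content-addressed block store (not the AetherFS API).
// Each block is keyed by the hex-encoded SHA-256 of its contents.
type Store struct {
	blocks map[string][]byte
}

func NewStore() *Store {
	return &Store{blocks: make(map[string][]byte)}
}

// Put splits data into fixed-size blocks and stores only the blocks that are
// not already present. It returns the ordered list of block keys (a manifest)
// and the number of blocks that were newly stored.
func (s *Store) Put(data []byte) (manifest []string, added int) {
	for start := 0; start < len(data); start += blockSize {
		end := start + blockSize
		if end > len(data) {
			end = len(data)
		}
		sum := sha256.Sum256(data[start:end])
		key := hex.EncodeToString(sum[:])
		if _, ok := s.blocks[key]; !ok {
			s.blocks[key] = append([]byte(nil), data[start:end]...)
			added++
		}
		manifest = append(manifest, key)
	}
	return manifest, added
}

// Get reassembles a dataset version from its manifest.
func (s *Store) Get(manifest []string) []byte {
	var out []byte
	for _, key := range manifest {
		out = append(out, s.blocks[key]...)
	}
	return out
}

func main() {
	store := NewStore()

	// Two "versions" of a dataset that share their first two blocks.
	v1, added1 := store.Put([]byte("aaaabbbbcccc"))
	v2, added2 := store.Put([]byte("aaaabbbbdddd"))

	fmt.Println(len(v1), added1) // 3 blocks referenced, 3 newly stored
	fmt.Println(len(v2), added2) // 3 blocks referenced, 1 newly stored
	fmt.Println(string(store.Get(v2)))
}
```

In this sketch the second version only adds one block to the store, which mirrors the idea that a client already holding the first version would only need to fetch the new or updated portion.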

Status

This project is under active development. The lists below detail aspirational features and documentation.

  • Documentation
  • Features
    • HTTP file server for ease of interaction
    • REST and gRPC APIs for programmatic interaction
    • Optional agent that can manage a shared or FUSE file system
    • Efficiently persist and query information stored in AWS S3
    • Authenticate using common schemes (such as OIDC)
    • Enforce access control around datasets
    • Encrypt data in transit and at rest
    • Built-in developer tools to help understand dataset performance and usage

Expectations & Roadmap

Since I'm mostly iterating on this project in my free time, I plan on using calendar versioning. Bug fixes and minor features can be introduced in any patch version, but major features should wait for the next release. Releases happen every four months, in October, February, and June. Any security issues will be addressed in a timely manner, regardless of the release schedule.

v21.10

This will be the initial release of AetherFS. It includes the "essentials".

  • Single binary containing all components.
  • Command to run an AetherFS data hub.
  • Command to upload to and tag datasets in AetherFS.
  • Command to download tagged datasets from AetherFS.
  • Minimal web interface.

v22.02

As the second major release of AetherFS, this will include additional security measures and simplify interaction for end users (provided there's interest in the system).

  • Command to run an agent process with a FUSE file system.
  • Block caching to improve performance and usage of S3.
  • Command to authenticate clients.
  • Enforce access controls around datasets.
  • Data encrypted in transit.
Comments
  • feat(file-server): setup an HTTP file server to make it easy to work with files

    Go's out-of-the-box http.FileServer allows us to serve file content efficiently over HTTP. It supports range requests and If-Modified-Since semantics. This glues together the http.FileServer API and the underlying dataset and block APIs.

    opened by mjpitz
  • issues with NFS handle

    So I'm pretty sure this comes from the current implementation of a caching handler. Should the NFS client connection ever change servers, the caching handler returns a stale connection.

    Option one here would be to support service topology routing that favors host communication exclusively in the daemonset deployment and region/zone communication in the deployment. This is really just a mitigation though. To handle this properly, we'll need to replace the caching handler.

    There are a few things we can do to cut down on the number of handles that are created.

    opened by mjpitz
  • issues with bleve / bolt on read only NFS

    Alright, this was a rather aspirational first use case... but figured I'd document it.

    When using the default storage engine with blevesearch (boltdb, specifically bbolt), mounting a read-only NFS volume interferes with the technology's ability to obtain file locks (even in read-only mode, which I find odd). I need to dig into this a bit more to determine whether it's an NFS limitation or a result of how boltdb manages locks...

    Ideally, this would work well for applications that do not need locks for read-only connections...

    opened by mjpitz
  • swap "dataset" with "artifact"

    I spent a bit more time thinking about the release asset distribution use case. I want to dig into that a bit more, but it seems like there would be a fair amount of duplication between releases.

    opened by mjpitz
Releases (v21.10.0)
  • v21.10.0 (Oct 20, 2021)

    Changelog

    cca34ea 21.10.0
    d398540 ci: use proper target
    bb0e0bf ci: ignore package-lock.json
    3d2819b ci: remove package lock json
    4a5942a ci: pass in right variables
    af342e9 ci: release helpers
    96bf616 ci: set dependency version using array offset
    ae9bdb8 ci: run test after distribute
    4dd9b40 feat(ci): add ci workflows (#5)
    a1f2201 fix: read aws, minio, and custom environment variables
    9cca884 ci: minor edits to scripts
    16245a2 doc: minor typo
    9e57412 doc: finish arch (#4)
    a0a4a9e doc: add more to arch
    3ef343f doc: add overview
    9df5f27 doc: adjust syntax now that I know it works
    c9d9b4f doc: try adding an inline image
    1112382 doc: add padding to png
    f1ce096 doc: add image
    425c507 feat(ui): add swagger browser and generated files
    ae92d83 minor edits
    7daba44 feat(web): add web ui (#3)
    1ed939b doc: update architecture and fix up path in block api
    3eea721 docs: typo
    62bb0f6 docs: update with roadmap
    c83c840 deploy: initial pass at a helm chart
    7c33bb2 docs: fix fragmented sentence
    963cd59 license: simplify header
    de1ef35 relicense
    3043bed simplify tls
    6d52d5e pull works, not well, but it works
    b39efae minor edits
    784e3ee feat(s3): tied into s3 (#2)
    5fd6763 doc: update requirement
    9269d7d break out version command and update comments
    02da611 doc: add milestones and deliverables
    bea7efc doc: remove checkbox from architecture
    777efc0 add more to architecture document
    2cf267e feat(file-server): setup an HTTP file server to make it easy to work with files (#1)
    b7e562f feat(api): add file server definition
    55d0fdd feat(core): implement bulk of core server frameworking
    941b1b3 add legal header
    ab4dbed minor edits
    2d9f014 start to work in some authentication
    0e0bab5 feat: support converting a structure defining the configuration into flags
    fbd32bc feat(deploy): add dex for identity management
    d84f844 fix: better defaults
    efd74df fix(cli): break commands out into internal package
    8c94e0e fix: add missing dockerfile and update targets
    d7b5475 initial release tooling
    f1f22bc feat(legal): add header for files
    e9dce91 fix(proto): upgrade buf config version
    12fcd1f feat(proto): initial api definition and related code/documentation generation
    c383a6d doc: add architecture

    Source code(tar.gz)
    Source code(zip)
    aetherfs-21.10.0.tgz(34.13 KB)
    aetherfs-hub-21.10.0.tgz(3.61 KB)
    aetherfs_checksums.txt(466 bytes)
    aetherfs_darwin_amd64.tar.gz(8.19 MB)
    aetherfs_darwin_arm64.tar.gz(8.13 MB)
    aetherfs_linux_amd64.tar.gz(7.94 MB)
    aetherfs_linux_arm64.tar.gz(7.39 MB)
Owner
mya
Principal Software Engineer (Golang, NodeJS, Java, Python) | homesteader | she/her