a sharded store to hold large IPLD graphs efficiently, packaged as location-transparent attachable CAR files, with mechanical sympathy

Issues
  • [WIP] Automated LRU GC sketch PR

    The problem with automated LRU GC is that it can also GC shards that are being served or initialised, which fails the ongoing initialise/acquire/blockstore read operations for those shards. We simply look at the directory size and keep removing transients till we hit our target. We can launch with this for V0 and fix it down the line.

    Clients who are not okay with interrupting existing shard acquires/initialisation can continue using the existing manual but safe GC mechanism for now.

    It's hard to make the automated LRU GC thread-safe with the async shard acquire and shard init ops, as they write to the transient directory in their own goroutine outside the dagstore event loop. The existing manual GC mechanism gets away with this by ignoring shards that are being initialised or served.

    The other thing to note here is that we may not always have Mounts that can tell you the size of the CAR upfront, because the CAR may be the result of a remote (root, selector) traversal, in which case the size is only known after fetching the resulting subDAG. One example is the IPFS Gateway, which only exposes a GET API to stream the result of a (root, selector) traversal and has no HEAD method to learn the size of a (root, selector) CAR upfront. This makes the reactive "evict before a Mount fetch" strategy hard to implement. The simple thing to do here is to always run the GC proactively in the dagstore event loop when we detect a watermark breach.
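    The proactive eviction on watermark breach can be sketched as follows; the `transient` type and `gcUntilBelow` function are hypothetical names, not the dagstore's actual API:

```go
package main

import (
	"fmt"
	"sort"
)

// transient models a shard's locally cached copy. These names are
// illustrative; they are not the dagstore's actual types.
type transient struct {
	key      string
	size     int64
	lastUsed int64 // logical clock used for LRU ordering
}

// gcUntilBelow evicts least-recently-used transients until the total
// size of the transients directory drops below target. Running it
// inside the event loop avoids racing with async fetches.
func gcUntilBelow(ts []transient, total, target int64) (evicted []string, remaining int64) {
	sort.Slice(ts, func(i, j int) bool { return ts[i].lastUsed < ts[j].lastUsed })
	remaining = total
	for _, t := range ts {
		if remaining <= target {
			break
		}
		evicted = append(evicted, t.key)
		remaining -= t.size
	}
	return evicted, remaining
}

func main() {
	ts := []transient{
		{key: "a", size: 40, lastUsed: 3},
		{key: "b", size: 30, lastUsed: 1},
		{key: "c", size: 30, lastUsed: 2},
	}
	evicted, remaining := gcUntilBelow(ts, 100, 50)
	fmt.Println(evicted, remaining) // [b c] 40
}
```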

    opened by aarshkshah1992 9
  • lazy indexing

    Migrating existing lotus deals to the DAG store currently entails having to unseal every deal to eagerly index it. This is prohibitive. We need to add support for lazy indexing, specifically for the migration, but also for other use cases down the road.

    • [x] Add a LazyIndexing option to RegisterOpts that disables indexing on registration.
      • We'll enable this when we register shards for deals that the storage subsystem has no unsealed CAR for.
    • [x] Add logic to index an unindexed shard when it's first acquired.
    • [x] Unit tests.
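    A minimal sketch of the checklist above, assuming a boolean `LazyIndexing` field on `RegisterOpts` as described; the real option name and plumbing may differ:

```go
package main

import "fmt"

// RegisterOpts sketches the registration options described above; the
// LazyIndexing field mirrors the checklist item, not the final API.
type RegisterOpts struct {
	LazyIndexing bool
}

type shard struct {
	key     string
	indexed bool
}

// register indexes eagerly unless LazyIndexing is set, e.g. for deals
// where the storage subsystem has no unsealed CAR available.
func register(key string, opts RegisterOpts) shard {
	s := shard{key: key}
	if !opts.LazyIndexing {
		s.indexed = true // eager path: index on registration
	}
	return s
}

// acquire indexes a lazily registered shard on first access.
func acquire(s *shard) {
	if !s.indexed {
		s.indexed = true // first acquire triggers the deferred indexing
	}
}

func main() {
	s := register("deal-1", RegisterOpts{LazyIndexing: true})
	fmt.Println(s.indexed) // false: indexing deferred
	acquire(&s)
	fmt.Println(s.indexed) // true: indexed on first acquire
}
```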
    opened by raulk 5
  • Upgrade to `go-car` `v2.0.0-beta1` tag

    Update dependency to latest tagged release of go-car:

    • Fix breaking change to read option.
    • Fix breaking changes to CARv2 index package in tests.

    Run `go mod tidy`

    Relates to:

    • https://github.com/ipld/go-car/releases/tag/v2.0.0-beta1
    opened by masih 4
  • Ignore nil channels in the dispatcher

    If the code directly writes to the channel that the dispatcher function is reading from, we could escape the nil channel check we've put in place in the dispatchResult function.
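    In Go, a send on a nil channel blocks forever, which is why every send path must go through the guard. A minimal sketch of the check (the function shape is simplified from the real dispatcher):

```go
package main

import "fmt"

// dispatchResult sketches the guard described above: because a send on
// a nil channel blocks forever, nil channels must be skipped before any
// send, including any "direct write" path that bypasses the dispatcher.
func dispatchResult(res string, chans ...chan string) int {
	delivered := 0
	for _, ch := range chans {
		if ch == nil {
			continue // ignore nil channels instead of blocking forever
		}
		select {
		case ch <- res:
			delivered++
		default:
			// receiver not ready; a real dispatcher would buffer or retry
		}
	}
	return delivered
}

func main() {
	ch := make(chan string, 1)
	n := dispatchResult("ok", nil, ch)
	fmt.Println(n, <-ch) // 1 ok
}
```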

    opened by aarshkshah1992 3
  • Inverted Index

    This is the first draft of the top-level inverted Index in the DAGStore which will enable Markets to serve retrieval requests for any cid in a deal-dag and NOT just the payloadCid.

    Discussion points:

    • There is a race between a deal expiring (causing the corresponding entries to be removed from the inverted index) and a client doing a lookup on the top-level index. Do we live with this?

    Blockers: We need to block on a CARv2 Index iterator that the data-systems team is working on.

    opened by aarshkshah1992 2
  • initial implementation of the dagstore.

    🐉🐉🐉🐉🐉🐉 BEWARE OF DRAGONS!


    This PR contributes an initial implementation of the DAG store.

    Concurrency model

    The DAG store operates on an event loop that performs mutations to shard state. This is important, as it allows shard operations to be non-blocking (although we can provide a blocking stub on top of this async plumbing). This means that the user does not need to wait for a shard registration to return before asking to acquire that shard. I explored and discarded numerous alternatives, such as:

    • using locks all over the place; prone to concurrency bugs and a nightmare to debug races.
    • using per-shard goroutines; this is wasteful, as registering 1MM shards would immediately explode into 1MM goroutines (which is quite feasible and expected for Filecoin miners: 1 deal = 1 shard). Furthermore, the majority of those goroutines would be idle most of the time.

    Finite state machine

    Shards go through a finite state machine. It would be worthwhile to formalise this FSM.

    Public API

    The public API is quite simple:

    RegisterShard(key shard.Key, mnt mount.Mount, out chan ShardResult, opts RegisterOpts) error
    AcquireShard(key shard.Key, out chan ShardResult, _ AcquireOpts) error
    DestroyShard(key shard.Key, out chan ShardResult, _ DestroyOpts) error
    

    The caller supplies a channel where they want the result delivered when available. This is more flexible than having the DAG store return a channel, since it enables the caller to send all results to a single channel, with a single goroutine servicing it.

    The caller can also pass a per-call channel if they so desire. Moreover, we can easily create sugar *Sync() method counterparts that do this behind the scenes, giving the appearance of a synchronous interface.
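    The channel-passing pattern and the `*Sync()` sugar can be sketched as follows, with simplified stand-in types instead of the real `shard.Key` and `ShardResult`:

```go
package main

import "fmt"

// ShardResult and registerShard are simplified stand-ins for the API
// shown above; the real types live in the dagstore package.
type ShardResult struct {
	Key string
	Err error
}

// registerShard mimics the async RegisterShard: it returns immediately
// and delivers the result on the caller-supplied channel.
func registerShard(key string, out chan ShardResult) error {
	go func() { out <- ShardResult{Key: key} }()
	return nil
}

// registerShardSync is the kind of sugar the text mentions: a per-call
// channel gives the appearance of a synchronous interface.
func registerShardSync(key string) (ShardResult, error) {
	out := make(chan ShardResult, 1)
	if err := registerShard(key, out); err != nil {
		return ShardResult{}, err
	}
	return <-out, nil
}

func main() {
	res, err := registerShardSync("shard-1")
	fmt.Println(res.Key, err) // shard-1 <nil>
}
```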

    Mount upgrader

    This PR contributes a "mount upgrader". This is itself a Mount that acts as a locally cached copy of a remote mount (e.g. Lotus, HTTP, FTP, NFS, etc.).

    Indices

    I was able to remove the FullIndex abstraction, since the CAR library already provides an index.Index interface that we can use directly.

    TODO

    This is an initial, unstable, buggy implementation to get the ball rolling. There are many features missing here, such as:

    • Transient file tracking and cleanup; refcounting in the mount.Upgrader. Aka "scrap area".
      • #26
    • mmap transients.
      • #28
    • Proper failure management (and recovery).
      • #33
    • ~Proper backpressure management (backlogged task channel).~
      • No longer needed, as internal and external channels have been split out in 258d74d.
    • Correct shard destruction.
      • #31
    • Shard release (via ShardAccessor#Close).
      • #27
    • Shard state persistence and recovery upon restart.
      • #32
    • Repopulate transients on restart from the scratch space.
      • #24
    • Transient vs. mount integrity -- how do we recover from a corrupted transient?
      • #25
    • Lots of unit tests.
    • CI.
    • Synchronous sugar on top of async API.
    opened by raulk 2
  • Automated Watermark based LRU garbage collection and Transient Quota allocation

    This PR introduces automated watermark based LRU GC of transients along with a quota reservation mechanism to allow for downloading transients whose size we do not know upfront.

    • The dagstore now performs automated high->low watermark based GC for transient files.

    • Users who want to use this feature will have to configure a maximum size for the transients directory and the dagstore guarantees that the size of the transients directory will never exceed that limit.

    • Users will also have to configure a high and low watermark for the transients directory. The dagstore will kickstart an automated GC when it detects that the size of the transients directory has crossed the high watermark and will attempt to bring down the directory size below the low watermark threshold.

    • Users will have to configure a GC Strategy that will recommend the order in which reclaimable shards should be GC'd by the automated GC mechanism. The dagstore comes inbuilt with an LRU GC Strategy but users are free to implement their own. See the documentation of GarbageCollectionStrategy for more details.

    • A quota reservation mechanism has been introduced for downloading transients whose size we do not know upfront. To download such a CAR, the downloader first gets a reservation from the dagstore for a preconfigured number of bytes, downloads that many bytes, and then goes back to the allocator for another reservation if it hasn't finished downloading the transient. In the end, it releases any unused reserved bytes back to the allocator.

    • The existing manual GC mechanism works as is and no changes have been made to it.
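    The pluggable strategy described above might look like this; the interface name matches the text, but the method signature and `Shard` type here are illustrative:

```go
package main

import (
	"fmt"
	"sort"
)

// Shard carries the minimum the strategy needs; illustrative only.
type Shard struct {
	Key      string
	LastUsed int64
}

// GarbageCollectionStrategy matches the role described above: it
// recommends the order in which reclaimable shards should be GC'd.
type GarbageCollectionStrategy interface {
	Reclaimable(shards []Shard) []string // keys, best candidates first
}

// lruStrategy sketches the inbuilt LRU strategy: least recently used
// shards are reclaimed first.
type lruStrategy struct{}

func (lruStrategy) Reclaimable(shards []Shard) []string {
	sort.Slice(shards, func(i, j int) bool { return shards[i].LastUsed < shards[j].LastUsed })
	keys := make([]string, len(shards))
	for i, s := range shards {
		keys[i] = s.Key
	}
	return keys
}

func main() {
	var strat GarbageCollectionStrategy = lruStrategy{}
	order := strat.Reclaimable([]Shard{{"a", 5}, {"b", 1}, {"c", 3}})
	fmt.Println(order) // [b c a]
}
```

Users could plug in a different ordering (e.g. largest-first) by implementing the same interface.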

    Known Edge Case

    There is an unhandled known edge case in the code.

    If a group of concurrent transient downloads ends up reserving all the available space in the transients directory, but not enough to satisfy their individual downloads, then all of them will end up backoff-retrying together for more space to become available. However, no space will become available until one of them exhausts its backoff-retry attempts -> fails the download -> releases its reserved space. Thus, the dagstore will not make progress with new downloads until one of the downloads fails and releases its reservation.

    However, this edge case should be mitigated by:

    1. Rate limiting the number of concurrent transient fetches.
    2. Giving higher reservations to older downloads than to newer downloads.
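    Mitigation 1 can be sketched with a counting semaphore built from a buffered channel; `fetchLimiter` and its API are hypothetical, not part of the dagstore:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// fetchLimiter caps how many transient fetches run at once, so that
// concurrent downloads cannot all reserve quota simultaneously.
type fetchLimiter chan struct{}

func newFetchLimiter(max int) fetchLimiter { return make(fetchLimiter, max) }

// do runs fetch once a slot is free, blocking while max fetches run.
func (l fetchLimiter) do(fetch func()) {
	l <- struct{}{}        // acquire a slot
	defer func() { <-l }() // release the slot
	fetch()
}

func main() {
	limiter := newFetchLimiter(2)
	var wg sync.WaitGroup
	var active, peak int32
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go limiter.do(func() {
			defer wg.Done()
			n := atomic.AddInt32(&active, 1)
			if n > atomic.LoadInt32(&peak) {
				atomic.StoreInt32(&peak, n)
			}
			atomic.AddInt32(&active, -1)
		})
	}
	wg.Wait()
	fmt.Println(peak <= 2) // true: at most 2 fetches ran concurrently
}
```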
    opened by aarshkshah1992 1
  • honour context cancellation on acquire.

    If the acquirer cancelled the context, we would never deliver the accessor and never release the shard. This was visible when testing out the new migration logic on Lotus and interrupting the lotus dagstore initialize-all command.

    opened by raulk 1
  • refactor GC and implement reconciliation.

    This PR closes #58.

    It implements reconciliation of the transients directory, which consists of removing files that are no longer referenced by any mount, either as a complete or partial transient file.

    We introduce the concept of a "partial file" in the Upgrader. The partial file is the file where a mount is being fetched into, while the fetch is in progress (cf. .crdownload in Chrome). This is not safe to expose as the TransientPath, because that would imply that it's fully functional. However, we don't want it saved in the Datastore.

    I've also refactored the way GC is triggered, so it's much neater now.

    opened by raulk 1
  • do not use `car.ReadOrGenerateIndex`; parse versions and capabilities instead, and generate index explicitly

    See thread here: https://github.com/filecoin-project/dagstore/pull/49#discussion_r667323603

    Refer to old code as a possible reference: https://github.com/filecoin-project/dagstore/commit/48bf452aa28139cde21dc577e904bb901fe229e9#diff-6fe0b61e2e37465ab2b8a2feee25fed61577337c1a959c5619fb5f0bf36ef28aR553-R586

    opened by raulk 1
  • Fix Transient file cleanup, support CARv1 and CARv2 and remove the `indexed` field from Shard State

    Closes https://github.com/filecoin-project/dagstore/issues/27. ~Closes https://github.com/filecoin-project/dagstore/issues/26.~ Closes https://github.com/filecoin-project/dagstore/issues/47. Closes https://github.com/filecoin-project/dagstore/issues/48.

    In addition to the above:

    • Transient file cleanup MUST be based on transient file ref-counting, not on shard access refcounting. We can run into all sorts of hairy bugs otherwise, such as the async goroutines failing to decrement the refcount when they fail, or not incrementing the refcount during shard registration even though we do access the transient for indexing, etc.
    • Fixes a bug where we were always expecting Mounts to ONLY return CARv2 files. We now support both CARv1 and CARv2.
    • Fixes shard access refcounting -> we weren't releasing refcounts for when the shard acquire go-routines failed.
    • Fixes a nil value access when spinning up the initialisation goroutine to re-register an initialising shard upon resumption.
    • Removes the ShardStateServing state. This information can be discerned from the shard refcount and it's one less thing for us to keep track of/update correctly.
    • The FS mount should support resumption by serializing the entire file path and NOT just the file name. Need this for resumption tests to work.
    • Fixed a hairy bug that was blocking the entire event loop on a mount.Fetch operation. Basically, shard persistence, which happens in the event loop, needs to take the Upgrader lock to get the transient file path, but the underlying mount fetch also needs to take that lock to ensure we fetch only once... Boom! We have moved to a deterministic naming scheme for upgrader files, and we look up files for a given shard key on resumption rather than persisting the transient file path to the shard state.
    • Introduces solid tests for shard access refcount, dagstore resumption, blocking mounts and upgrader file refcounts.

    TODO

    • [x] Solid tests for the Upgrader.

    • We need to talk about failure management and recovery. I saw a case where we can block acquirers forever if we aren't careful. I have commented on this PR at the relevant code line. I'm sure there are more bugs hiding in there.

    opened by aarshkshah1992 1
  • Automated Watermark based GC and Transient Quota allocation

    This is a meta-issue to track the work of introducing an automated watermark based LRU GC of transients along with a quota reservation mechanism to allow for downloading transients whose size we do not know upfront.

    The work is spread across multiple PRs.

    High level overview

    • The dagstore now performs automated high->low watermark based GC for transient files.

    • Users who want to use this feature will have to configure a maximum size for the transients directory and the dagstore guarantees that the size of the transients directory will never exceed that limit.

    • Users will also have to configure a high and low watermark for the transients directory. The dagstore will kickstart an automated GC when it detects that the size of the transients directory has crossed the high watermark and will attempt to bring down the directory size below the low watermark threshold.

    • Users will have to configure a GC Strategy that will recommend the order in which reclaimable shards should be GC'd by the automated GC mechanism. The dagstore comes inbuilt with an LRU GC Strategy but users are free to implement their own. See the documentation of GarbageCollectionStrategy for more details.

    • A quota reservation mechanism has been introduced for downloading transients whose size we do not know upfront. To download such a CAR, the downloader first gets a reservation from the dagstore for a preconfigured number of bytes, downloads that many bytes, and then goes back to the allocator for another reservation if it hasn't finished downloading the transient. In the end, it releases any unused reserved bytes back to the allocator.

    • The existing manual GC mechanism works as is and no changes have been made to it.

    Known Edge Case

    There is an unhandled known edge case in the code.

    If a group of concurrent transient downloads ends up reserving all the available space in the transients directory, but not enough to satisfy their individual downloads, then all of them will end up backoff-retrying together for more space to become available. However, no space will become available until one of them exhausts its backoff-retry attempts -> fails the download -> releases its reserved space. Thus, the dagstore will not make progress with new downloads until one of the downloads fails and releases its reservation.

    However, this edge case should be mitigated by:

    1. Rate limiting the number of concurrent transient fetches.
    2. Giving higher reservations to older downloads than to newer downloads.

    PRs

    1. Upgrader should reserve and release allocations if transient size is unknown (#130).
    2. Dagstore event loop does automated watermark based GC and handles quota allocations and reservations (#131).
    3. Interface for extensible GC with a default LRU implementation (#132).
    4. Config for Automated GC and tests for the entire feature (#133).
    opened by aarshkshah1992 0
  • Interfaces for Extendable GC and LRU implementation

    For #134.

    PR 3 for the Automated GC Work.

    Interface for users to be able to plug in their own GC algorithm. We ship with a default LRU implementation.

    opened by aarshkshah1992 0
  • Dagstore gc and event loop changes

    For #134

    PR 2 for the Automated GC work.

    The dagstore event loop performs automated watermark based GC and handles quota reservations and releases.

    Extendable GC interface used in this PR and an LRU implementation for it is defined at https://github.com/filecoin-project/dagstore/pull/132.

    opened by aarshkshah1992 0
  • Upgrader should reserve and release allocations if transient size is unknown

    For #134.

    PR 1 for the Automated GC work.

    Upgrader should reserve and release allocations if the transient size is unknown before downloading a file from a remote mount that does not support random access.
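    The reserve/download/repeat/release loop can be sketched as follows, with a hypothetical in-memory allocator standing in for the dagstore's quota mechanism:

```go
package main

import (
	"errors"
	"fmt"
)

// allocator is a hypothetical stand-in for the dagstore's transient
// quota allocator; the names here are illustrative.
type allocator struct{ free int64 }

// reserve grants up to chunk bytes, or fails when no quota remains.
func (a *allocator) reserve(chunk int64) (int64, error) {
	if a.free == 0 {
		return 0, errors.New("no quota available")
	}
	if chunk > a.free {
		chunk = a.free
	}
	a.free -= chunk
	return chunk, nil
}

func (a *allocator) release(n int64) { a.free += n }

// download sketches the loop described above: reserve a chunk,
// "download" that many bytes, repeat until done, then release the
// unused tail of the final reservation. A real implementation would
// also delete the partial file (and release its bytes) on failure.
func download(a *allocator, size, chunk int64) (int64, error) {
	var reserved, written int64
	for written < size {
		n, err := a.reserve(chunk)
		if err != nil {
			a.release(reserved - written)
			return written, err
		}
		reserved += n
		if remaining := size - written; n > remaining {
			written += remaining
		} else {
			written += n
		}
	}
	a.release(reserved - written)
	return written, nil
}

func main() {
	a := &allocator{free: 100}
	n, err := download(a, 70, 32)
	fmt.Println(n, err, a.free) // 70 <nil> 30
}
```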

    opened by aarshkshah1992 0
  • The README has not been populated "soon"

    This looks like abandonware even though it is supposed to be a key piece of the Lotus stack and one of the pinnacles of large-scale DAG-storage engineering ever made. Is the design up to date? Are there any pointers on how to re-use this somewhere out of Lotus? Is that encouraged?

    opened by hsanjuan 0
Releases(v0.5.2)
  • v0.5.2(Feb 23, 2022)

  • v0.4.5(Jan 12, 2022)

  • v0.5.0(Nov 12, 2021)

  • v0.4.3(Aug 18, 2021)

  • v0.4.1(Aug 5, 2021)

  • v0.4.0(Jul 30, 2021)

  • v0.3.2(Jul 29, 2021)

  • v0.3.0(Jul 26, 2021)

    • Clean up how DAGStore GC is performed.
    • Client should create a filesystem Index repo if it wants one.
    • Persist the shard transient path to disk so we can reuse the transient on dagstore restart.
    • Reconcile and cleanup unreferenced transients on startup.
    • Utility to stop recovering a shard after a given number of failures.
  • v0.2.1(Jul 21, 2021)

    • Update the CARv2 dependency to use the Beta release.
    • Fix a bug where we can still end up blocking when dispatching a result to a nil response channel.
  • v0.1.0(Jul 13, 2021)

    initial release of the DAG store, supporting:

    • registration of shards, both with eager and lazy initialization
    • mounts system for data location referencing
    • "upgrader" mount that mirrors a remote/non-random-access/non-seekable shard into a local, seekable, random-access shard
    • acquisition of shards at any time, regardless of state of the shard
    • shard accessor abstraction to enable different access patterns to shard data; right now supporting Blockstore()
    • release of shards
    • failure notifications and management
    • failure recovery through explicit call to RecoverShard
    • event tracing
    • refcounting and GC
    • more!
Owner
Filecoin