Overview

Grafana Dskit

This library contains utilities that are useful for building distributed services.

Current state

This library is still in development. During the first stage we plan to move over utilities from the Cortex project.

Contributing

If you're interested in contributing to this project:

License

Apache 2.0 License

Issues
  • Parallelize memberlist notified message processing

    What this PR does:

    This PR adapts KV memberlist to process notified messages in parallel.

    It aims to facilitate vertical scalability in conditions where UDP packet pressure is high due to a high number of instances in a memberlist cluster.
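
    As a rough illustration of the idea (not the code from this PR), notified messages can be fanned out to a fixed pool of workers. The `message` type, `processInParallel`, and the `handle` callback are hypothetical names used only for this sketch, and it assumes the handler is safe for concurrent use.

    ```go
    package main

    import (
    	"fmt"
    	"sync"
    	"sync/atomic"
    )

    // message is a hypothetical stand-in for a notified memberlist message.
    type message struct {
    	key     string
    	payload []byte
    }

    // processInParallel fans msgs out to a fixed number of workers and returns
    // how many messages were handled. handle must be safe for concurrent use.
    func processInParallel(msgs []message, workers int, handle func(message)) int64 {
    	ch := make(chan message, len(msgs))
    	var handled int64
    	var wg sync.WaitGroup
    	for i := 0; i < workers; i++ {
    		wg.Add(1)
    		go func() {
    			defer wg.Done()
    			for m := range ch {
    				handle(m)
    				atomic.AddInt64(&handled, 1)
    			}
    		}()
    	}
    	for _, m := range msgs {
    		ch <- m
    	}
    	close(ch) // lets the workers exit once the queue is drained
    	wg.Wait()
    	return atomic.LoadInt64(&handled)
    }

    func main() {
    	msgs := []message{{key: "ring"}, {key: "ring"}, {key: "ruler"}}
    	n := processInParallel(msgs, 2, func(m message) {})
    	fmt.Println("handled:", n)
    }
    ```

    Note that a plain worker pool like this does not preserve ordering between messages for the same key; whether that matters depends on how updates are merged.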

    Which issue(s) this PR fixes:

    N/A

    Checklist

    • [X] Tests updated
    • [X] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
    enhancement 
    opened by ortuman 9
  • Empty ring right after startup when using memberlist

    In Mimir, we're occasionally seeing an "empty ring" error right after a process startup (e.g. querier). The issue started after the migration to memberlist.

    Possible root cause

    I think the issue is caused by the ring client implementation not guaranteeing that it waits for the initial ring state before switching to the Running state. Below are some thoughts about the code.

    The ring client service is expected to switch to the Running state only after it has initialized its internal state with the ring data structure. This is why it calls r.KVClient.Get() in Ring.starting(): https://github.com/grafana/dskit/blob/e441b77be7780e03f2c37659839bfe90dfde7dd3/ring/ring.go#L252-L256

    When using Consul or etcd as backend, the r.KVClient.Get() guarantees to return the state of the ring, but I think this guarantee has been lost in the memberlist implementation and it could return a zero data structure.

    The memberlist client Get() is implemented here: https://github.com/grafana/dskit/blob/e441b77be7780e03f2c37659839bfe90dfde7dd3/kv/memberlist/memberlist_client.go#L63-L70

    It waits until the backend KV client is running. But does waiting for it to be running guarantee the ring data structure to be populated before that? I don't think so.

    The memberlist KV.starting() just initialises memberlist but doesn't join the cluster: https://github.com/grafana/dskit/blob/e441b77be7780e03f2c37659839bfe90dfde7dd3/kv/memberlist/memberlist_client.go#L426-L453

    The memberlist cluster is joined only in KV.running(), but that's too late, because at that point our code assumes the ring data structure is already populated: https://github.com/grafana/dskit/blob/e441b77be7780e03f2c37659839bfe90dfde7dd3/kv/memberlist/memberlist_client.go#L457-L472
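
    To make the gap concrete, here is a minimal sketch of one possible way to close it: block the initial read until some state has actually been received. This is only an illustration, not how dskit implements it; the `store` type and its `update` and `getInitial` methods are hypothetical.

    ```go
    package main

    import (
    	"context"
    	"fmt"
    	"sync"
    	"time"
    )

    // store is a hypothetical KV client whose state arrives asynchronously,
    // for example only after the cluster has been joined.
    type store struct {
    	mu    sync.Mutex
    	ring  []string
    	once  sync.Once
    	ready chan struct{} // closed when the first state is received
    }

    func newStore() *store { return &store{ready: make(chan struct{})} }

    // update records new ring state and marks the store as populated.
    func (s *store) update(ring []string) {
    	s.mu.Lock()
    	s.ring = ring
    	s.mu.Unlock()
    	s.once.Do(func() { close(s.ready) })
    }

    // getInitial blocks until state has been received or ctx is done, so the
    // caller never observes a zero value just because it asked too early.
    func (s *store) getInitial(ctx context.Context) ([]string, error) {
    	select {
    	case <-s.ready:
    	case <-ctx.Done():
    		return nil, ctx.Err()
    	}
    	s.mu.Lock()
    	defer s.mu.Unlock()
    	return s.ring, nil
    }

    func main() {
    	s := newStore()
    	go func() {
    		time.Sleep(10 * time.Millisecond) // simulate a late cluster join
    		s.update([]string{"ingester-1"})
    	}()
    	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
    	defer cancel()
    	ring, err := s.getInitial(ctx)
    	fmt.Println(ring, err)
    }
    ```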

    opened by pracucci 8
  • change the ruler ring key

    What this PR does:

    Changes the ruler's ring key from ring to ruler so it does not conflict with the ingester's ring key.

    Which issue(s) this PR fixes:

    If the ruler and ingester's rings have the same prefix, then they will both register to the same ring. This can result in the querier trying to query the ruler, which fails. As I don't think there's any need to explicitly prevent rings from sharing a common prefix (for instance, if the KV store is being used for other types of data, they may want to prefix everything grafana related under grafana, or loki, for example), we should make sure each ring key is unique.

    Checklist

    • [ ] Tests updated
    • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
    opened by trevorwhitney 8
  • add util/strings

    What this PR does: This PR ports util/strings from cortex. It's two utility functions to look for a string in a collection of strings and a function to make a map from a collection of strings. This is used by the ring package.
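
    For readers unfamiliar with these helpers, the sketch below shows what a "look for a string" function and a "make a map" function of this kind typically look like. The names `stringsContain` and `stringsMap` and their signatures are assumptions for illustration, not necessarily the ported API.

    ```go
    package main

    import "fmt"

    // stringsContain reports whether search is present in values.
    func stringsContain(values []string, search string) bool {
    	for _, v := range values {
    		if v == search {
    			return true
    		}
    	}
    	return false
    }

    // stringsMap returns a set-like map with one true entry per value.
    func stringsMap(values []string) map[string]bool {
    	m := make(map[string]bool, len(values))
    	for _, v := range values {
    		m[v] = true
    	}
    	return m
    }

    func main() {
    	zones := []string{"zone-a", "zone-b"}
    	fmt.Println(stringsContain(zones, "zone-b")) // prints "true"
    	fmt.Println(stringsMap(zones)["zone-c"])     // prints "false"
    }
    ```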

    Checklist

    • [-] Tests updated
    • [X] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
    opened by treid314 8
  • Add support in the manager to load multiple runtime config files

    What this PR does: This PR enables users to provide a comma-separated list of YAML runtime config files, which are merged into one YAML document and sent to the underlying service.
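
    The merge step can be pictured with the sketch below. It assumes each file has already been decoded into a `map[string]interface{}`, that nested maps are merged recursively, and that later files win on conflicting keys; the actual semantics of the PR may differ. `splitPaths` and `mergeMaps` are hypothetical helper names.

    ```go
    package main

    import (
    	"fmt"
    	"strings"
    )

    // splitPaths turns a comma-separated flag value into a list of file paths,
    // ignoring surrounding whitespace and empty entries.
    func splitPaths(list string) []string {
    	var out []string
    	for _, p := range strings.Split(list, ",") {
    		if p = strings.TrimSpace(p); p != "" {
    			out = append(out, p)
    		}
    	}
    	return out
    }

    // mergeMaps merges src into dst recursively; on conflicting keys src wins.
    func mergeMaps(dst, src map[string]interface{}) map[string]interface{} {
    	for k, v := range src {
    		srcMap, srcIsMap := v.(map[string]interface{})
    		dstMap, dstIsMap := dst[k].(map[string]interface{})
    		if srcIsMap && dstIsMap {
    			dst[k] = mergeMaps(dstMap, srcMap)
    			continue
    		}
    		dst[k] = v
    	}
    	return dst
    }

    func main() {
    	fmt.Println(splitPaths("overrides.yaml, limits.yaml"))
    	a := map[string]interface{}{"overrides": map[string]interface{}{"tenant-a": 1}}
    	b := map[string]interface{}{"overrides": map[string]interface{}{"tenant-b": 2}}
    	merged := mergeMaps(a, b)
    	fmt.Println(len(merged["overrides"].(map[string]interface{}))) // prints "2"
    }
    ```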

    Which issue(s) this PR fixes:

    Fixes https://github.com/grafana/mimir/issues/1798

    Checklist

    • [X] Tests updated
    • [X] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
    opened by treid314 7
  • Allow custom status page templates

    What this PR does:

    Upstream projects may want to render pages in different ways, applying custom branding where necessary. This allows passing a custom page template to both memberlist and ring status handlers through the configuration.

    I also extracted the templates into separate .gohtml files, which enables proper syntax highlighting in IDEs. Sorry for the noise here, as it's not really important for the change I'm proposing, but I just couldn't bear those long constants mixed in with the code.

    Which issue(s) this PR fixes:

    First approach for https://github.com/grafana/dskit/issues/148

    Checklist

    • [ ] Tests updated
    • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
    opened by colega 7
  • Allow to override default of `final-sleep` and change default to `0`

    What this PR does: Allows overriding the default of final-sleep and changes the default to 0.

    Which issue(s) this PR fixes:

    Fixes #

    Checklist

    • [ n/a ] Tests updated
    • [x] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
    opened by dimitarvdimitrov 6
  • Expose memberlist label configs

    This makes three changes:

    1. Switches the vendored version of memberlist to a branch in our fork where we fixed the skip-inbound-label-check (PR)
    2. Exposes the relevant options to allow users of dskit to configure the memberlist label feature
    3. Adds unit tests which test 4 configurations of memberlist clusters:
      1. TestMultipleClients Cluster with no labels and skip-inbound-label-check disabled, expected to succeed in joining
      2. TestMultipleClientsWithMixedLabelsAndExpectFailure Cluster where some members have labels and some don't and skip-inbound-label-check is disabled, this cluster is expected to fail because the members can't join each other
      3. TestMultipleClientsWithMixedLabelsAndSkipLabelCheck Cluster where some members have labels and some don't and skip-inbound-label-check is enabled, this cluster is expected to succeed in joining
      4. TestMultipleClientsWithSameLabelWithoutSkipLabelCheck Cluster where all members have the same label and skip-inbound-label-check is disabled, this cluster is expected to succeed in joining

    The above unit tests basically test a migration scenario where we migrate an existing cluster which currently doesn't use labels to use labels by going through the following steps:

    1. Enable skip-inbound-label-check so that members with different labels (some without any label) can join each other
    2. Roll out a label to all processes. Because this change is rolled out slowly across the pods, it is important that pods with different labels can join each other into one cluster
    3. Disable skip-inbound-label-check

    Furthermore the unit tests also verify that if skip-inbound-label-check is disabled, processes which have different labels cannot join each other, providing isolation between the processes.

    opened by replay 5
  • Change moduleService to implement NamedService

    What this PR does:

    The moduleService has a name, and it's trivial to implement the ServiceName method.

    I came across this while trying to add debug information about running services in Mimir.
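
    A minimal sketch of the idea, with the interface declared locally so the example stays self-contained. The `namedService` interface and the simplified `moduleService` struct here are stand-ins for illustration and are not copied from dskit.

    ```go
    package main

    import "fmt"

    // namedService is a local stand-in for an interface that lets a service
    // report its name.
    type namedService interface {
    	ServiceName() string
    }

    // moduleService is a simplified stand-in that only keeps the module name.
    type moduleService struct {
    	name string
    }

    // ServiceName returns the name the module was registered with.
    func (m *moduleService) ServiceName() string { return m.name }

    // describe returns the service name when available, else a placeholder.
    // This is the kind of check debug output about running services could use.
    func describe(s interface{}) string {
    	if n, ok := s.(namedService); ok {
    		return n.ServiceName()
    	}
    	return "<unnamed>"
    }

    func main() {
    	fmt.Println(describe(&moduleService{name: "querier"})) // prints "querier"
    	fmt.Println(describe(struct{}{}))                      // prints "<unnamed>"
    }
    ```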

    Which issue(s) this PR fixes:

    Fixes #

    Checklist

    • [ ] Tests updated
    • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
    opened by dimitarvdimitrov 5
  • Watch runtime configuration with file notifications

    What this PR does:

    Introduces a new way to detect changes in the runtime configuration file using file notifications. The underlying library used, fsnotify, supports multiple platforms, but I've written this largely with Linux's inotify in mind.

    After initialization, instead of continuously reading a file every reload period, reads are not performed until a change actually happens. Runtime configuration changes are often rare and initiated manually, so it wouldn't be uncommon for no notifications to happen at all. From a thread perspective, an OS thread is blocked on an epoll wait system call with an infinite timeout waiting for event notifications from inotify. If an event is detected on the runtime configuration file a read loop is initiated until a successful read occurs, then it reverts to waiting for another notification.

    The benefit of these changes is reducing unnecessary CPU work and unnecessary IOPS. It could allow for users to set a shorter reload period duration if they were unwilling to pay the cost of many background reads previously.

    The downside of these changes is the complexity. The kernel has to know about the file changes, so a file stored in NFS or FUSE can deliver no notifications at all. The notification watch is also based on an inode rather than a name, so the directory containing the runtime configuration file is watched rather than the file itself to catch replacements. This leaves the possibility that the directory itself could be replaced, or even mounted against, both of which are currently unhandled. If the runtime configuration file is placed in a busy directory, this could also result in even more CPU work than polling. I also can't speak to the performance or behavior of other platform implementations.

    I'm testing these changes still, but wanted to open this draft PR early for feedback given the tradeoffs. My thoughts are that this could be interesting if there is a known workflow for runtime configuration updates and this is tested with that workflow beforehand. Otherwise, I wouldn't recommend this be used.

    Which issue(s) this PR fixes: N/A

    Checklist

    • [x] Tests updated
    • [x] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
    enhancement 
    opened by andyasp 5
  • autodetect interfaces on private networks

    What this PR does: This PR adds a new package netutil which includes a function PrivateNetworkInterfaces that scans all system network interfaces and returns those that are on a private network. This will make some configuration steps easier, especially for systems that use consistent network device naming.
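
    The scan can be sketched with the standard library alone, using net.Interfaces and net.IP.IsPrivate (available since Go 1.17). The helper names `isPrivateAddr` and `privateInterfaceNames` are hypothetical, and this is not the PR's actual implementation.

    ```go
    package main

    import (
    	"fmt"
    	"net"
    )

    // isPrivateAddr reports whether an address, as returned by
    // net.Interface.Addrs, belongs to a private network range.
    func isPrivateAddr(addr net.Addr) bool {
    	var ip net.IP
    	switch v := addr.(type) {
    	case *net.IPNet:
    		ip = v.IP
    	case *net.IPAddr:
    		ip = v.IP
    	}
    	return ip != nil && ip.IsPrivate()
    }

    // privateInterfaceNames returns the names of all interfaces that have at
    // least one private address.
    func privateInterfaceNames() ([]string, error) {
    	ifaces, err := net.Interfaces()
    	if err != nil {
    		return nil, err
    	}
    	var names []string
    	for _, iface := range ifaces {
    		addrs, err := iface.Addrs()
    		if err != nil {
    			continue // skip interfaces whose addresses cannot be read
    		}
    		for _, a := range addrs {
    			if isPrivateAddr(a) {
    				names = append(names, iface.Name)
    				break
    			}
    		}
    	}
    	return names, nil
    }

    func main() {
    	names, err := privateInterfaceNames()
    	fmt.Println(names, err)
    }
    ```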

    Which issue(s) this PR fixes:

    This is similar to PR 100 but takes advantage of the new IsPrivate function in Go 1.17 and includes unit tests.

    Fixes #

    Checklist

    • [X] Tests updated
    • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
    opened by aldernero 5
  • runutil: add ExhaustCloseWithErrCapture

    Code from https://github.com/thanos-io/thanos/blob/6d1b98d49781/pkg/runutil/, slightly modified for consistency.

    This enables grafana/e2e to build without a dependency on Thanos - https://github.com/grafana/e2e/pull/4

    Checklist

    • [x] Tests updated
    • NA CHANGELOG.md updated
    opened by bboreham 1
  • Add net.LookupIP DNS provider implementation

    As we are looking forward to using dskit inside Grafana for HA, we would require an implementation for the kv/memberlist.DNSProvider. Currently it looks like the interface closely matches the API from Thanos, but we would need something lighter, possibly built over net.LookupIP. This PR introduces the "builtin" DNSProvider implementation and a basic test to make sure it works.
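
    A lightweight provider along these lines might look like the sketch below. It assumes the provider exposes a `Resolve(ctx, addrs)` method and an `Addresses()` method; that method set, the `lookupProvider` type, and the injectable `lookup` field (which would be net.LookupIP outside of tests) are assumptions made for illustration, not the code from this PR.

    ```go
    package main

    import (
    	"context"
    	"fmt"
    	"net"
    	"sync"
    )

    // lookupProvider resolves host:port addresses to ip:port pairs.
    type lookupProvider struct {
    	lookup func(host string) ([]net.IP, error) // net.LookupIP in production

    	mu       sync.RWMutex
    	resolved []string
    }

    // Resolve looks up every host in addrs and stores the resulting ip:port list.
    func (p *lookupProvider) Resolve(ctx context.Context, addrs []string) error {
    	var out []string
    	for _, addr := range addrs {
    		if err := ctx.Err(); err != nil {
    			return err
    		}
    		host, port, err := net.SplitHostPort(addr)
    		if err != nil {
    			return err
    		}
    		ips, err := p.lookup(host)
    		if err != nil {
    			return err
    		}
    		for _, ip := range ips {
    			out = append(out, net.JoinHostPort(ip.String(), port))
    		}
    	}
    	p.mu.Lock()
    	p.resolved = out
    	p.mu.Unlock()
    	return nil
    }

    // Addresses returns a copy of the most recently resolved addresses.
    func (p *lookupProvider) Addresses() []string {
    	p.mu.RLock()
    	defer p.mu.RUnlock()
    	return append([]string(nil), p.resolved...)
    }

    func main() {
    	// A stub lookup keeps the example deterministic and offline.
    	p := &lookupProvider{lookup: func(string) ([]net.IP, error) {
    		return []net.IP{net.ParseIP("10.1.2.3")}, nil
    	}}
    	err := p.Resolve(context.Background(), []string{"memberlist.example:7946"})
    	fmt.Println(p.Addresses(), err)
    }
    ```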

    opened by zserge 1
  • Ring members return as unhealthy and stuck the cluster

    Problem

    On Loki (and apparently in other projects) we're facing this weird scenario where ring members return as unhealthy and once this happens to more than one member, the ring gets stuck (maybe because we're running with replication_factor = 3? :thinking: )

    What we know so far is:

    • Clicking manually to forget the unhealthy members does fix the issue
    • I don't think we ever faced this with typical healthy rollouts; I think all occasions it did happen were when pods were deleted or OOMKilled or the k8s node hosting the pods had an outage, so maybe it has a relationship with our heartbeat logic :thinking:
    • For Loki, this happens for both ingester members and distributor members, so it isn't related to a specific implementation of the project
    • The typical log line we use to identify this behavior is the one below (for Loki):
    level=warn ts=2022-07-28T17:32:28.308736477Z caller=grpc_logging.go:43 method=/httpgrpc.HTTP/Handle duration=179.426µs err="rpc error: code = Code(500) desc = at least 2 live replicas required, could only find 1 - unhealthy instances: x.x.x.x:9095,y.y.y.y:9095\n" msg=gRPC
    

    This is probably not enough detail to track down what is wrong, so if this happens again I'll make sure to grab a memory dump and other things that might help.

    bug 
    opened by DylanGuedes 0
  • Add error class tracking to ring.DoBatch

    What this PR does:

    Adapts code from https://github.com/cortexproject/cortex/pull/4388 by @alanprot. Thank you @alanprot!

    DoBatch now tracks errors based on their gRPC error code. 4xx and 5xx errors are tracked differently. DoBatch returns when there is a quorum of either 4xx or 5xx errors, but does not combine them.

    Examples (order does not matter):

    • 2xx, 2xx, _ -> 2xx (current behaviour; early return)
    • 4xx, 4xx, _ -> 4xx (current behaviour; early return)
    • 5xx, 5xx, _ -> 5xx (current behaviour; early return)
    • 5xx, 5xx, 4xx -> 5xx (previously whichever error came first)
    • 5xx, 4xx, 4xx -> 4xx (previously whichever error came first)
    • 2xx, 4xx, 5xx -> either 4xx or 5xx, whichever error came last (current behaviour)
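
    The examples above can be reproduced with a small, sequential sketch. It assumes three replicas with a quorum of 2 and models each response as one of the strings "2xx", "4xx" or "5xx" in arrival order. The `decide` function is hypothetical and does not reflect how DoBatch is actually structured (the real code runs concurrently and tracks per-item state).

    ```go
    package main

    import "fmt"

    // decide returns the outcome class for responses given in arrival order.
    // It returns as soon as any single class reaches quorum and never adds
    // 4xx and 5xx counts together. If no class reaches quorum, it falls back
    // to the class of the last error seen.
    func decide(responses []string, quorum int) string {
    	counts := map[string]int{}
    	lastErr := ""
    	for _, r := range responses {
    		counts[r]++
    		if r != "2xx" {
    			lastErr = r
    		}
    		if counts[r] >= quorum {
    			return r // early return on a quorum of one class
    		}
    	}
    	return lastErr
    }

    func main() {
    	fmt.Println(decide([]string{"5xx", "4xx", "5xx"}, 2)) // prints "5xx"
    	fmt.Println(decide([]string{"5xx", "4xx", "4xx"}, 2)) // prints "4xx"
    	fmt.Println(decide([]string{"2xx", "4xx", "5xx"}, 2)) // prints "5xx"
    }
    ```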

    Which issue(s) this PR fixes:

    Fixes #

    Checklist

    • [x] Tests updated
    • [x] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
    opened by dimitarvdimitrov 0
  • Inefficient TCP connections use by memberlist transport

    We use a custom transport for memberlist, based on the TCP protocol. The main reason we use TCP is to be able to transfer messages that are bigger than the maximum payload of a UDP packet (typically, slightly less than 64KB).

    Currently, the TCP transport is implemented in an inefficient way with regard to TCP connection establishment. For every single packet a node needs to transfer to another node, the implementation creates a new TCP connection, writes the packet and then closes the connection. See: https://github.com/grafana/dskit/blob/ead3f9308bb7b413ce997182dd4d7c6e038bc68f/kv/memberlist/tcp_transport.go#L438

    We should consider alternatives like:

    • Pros/cons of keeping long-lived TCP connections between nodes, and multiplexing multiple packets over the same connection
    • Using a mix of UDP and TCP, selecting the protocol based on the message size (in this case, TLS support wouldn't be available)
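
    For the first alternative, sending several packets over one long-lived connection requires some way to tell packets apart. One common option is length-prefixed framing, sketched below. The 4-byte big-endian header and the `writePacket` and `readPacket` helpers are assumptions for illustration, not a description of the current wire format.

    ```go
    package main

    import (
    	"bytes"
    	"encoding/binary"
    	"fmt"
    	"io"
    )

    // writePacket writes a 4-byte big-endian length followed by the payload.
    func writePacket(w io.Writer, payload []byte) error {
    	var hdr [4]byte
    	binary.BigEndian.PutUint32(hdr[:], uint32(len(payload)))
    	if _, err := w.Write(hdr[:]); err != nil {
    		return err
    	}
    	_, err := w.Write(payload)
    	return err
    }

    // readPacket reads one length-prefixed packet. A real implementation
    // should also cap the accepted length to avoid huge allocations.
    func readPacket(r io.Reader) ([]byte, error) {
    	var hdr [4]byte
    	if _, err := io.ReadFull(r, hdr[:]); err != nil {
    		return nil, err
    	}
    	payload := make([]byte, binary.BigEndian.Uint32(hdr[:]))
    	if _, err := io.ReadFull(r, payload); err != nil {
    		return nil, err
    	}
    	return payload, nil
    }

    func main() {
    	var conn bytes.Buffer // stands in for a long-lived net.Conn
    	for _, p := range []string{"ping", "state-update"} {
    		if err := writePacket(&conn, []byte(p)); err != nil {
    			fmt.Println("write error:", err)
    			return
    		}
    	}
    	for {
    		p, err := readPacket(&conn)
    		if err != nil {
    			break // io.EOF once every packet has been consumed
    		}
    		fmt.Println(string(p))
    	}
    }
    ```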
    component/memberlist 
    opened by pracucci 2
Owner
Grafana Labs
Grafana Labs is behind leading open source projects Grafana and Loki, and the creator of the first open & composable observability platform.