Golang implementation of the Raft consensus protocol

Overview

raft

raft is a Go library that manages a replicated log and can be used with an FSM to manage replicated state machines. It is a library for providing consensus.

The use cases for such a library are far-reaching; replicated state machines, for example, are a key component of many distributed systems. They enable building Consistent, Partition-Tolerant (CP) systems, with limited fault tolerance as well.

Building

If you wish to build raft you'll need Go version 1.2+ installed.

Please check your installation with:

go version

Documentation

For complete documentation, see the associated Godoc.

To prevent complications with cgo, the primary backend MDBStore is in a separate repository, called raft-mdb. That is the recommended implementation for the LogStore and StableStore.

A pure Go backend using BoltDB is also available called raft-boltdb. It can also be used as a LogStore and StableStore.
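
For orientation, here is a hedged sketch (not an official example) of wiring raft-boltdb in as both the LogStore and StableStore when constructing a node; the server ID, file paths, and bind address are assumptions chosen for illustration.

package example

import (
    "os"
    "time"

    "github.com/hashicorp/raft"
    raftboltdb "github.com/hashicorp/raft-boltdb"
)

// newRaftNode wires a node together; fsm is an application-specific raft.FSM.
func newRaftNode(fsm raft.FSM) (*raft.Raft, error) {
    conf := raft.DefaultConfig()
    conf.LocalID = raft.ServerID("node-1") // hypothetical server ID

    // A single BoltDB file can back both the log store and the stable store.
    store, err := raftboltdb.NewBoltStore("raft.db")
    if err != nil {
        return nil, err
    }

    // Retain the two most recent snapshots on disk.
    snaps, err := raft.NewFileSnapshotStore("snapshots", 2, os.Stderr)
    if err != nil {
        return nil, err
    }

    trans, err := raft.NewTCPTransport("127.0.0.1:7000", nil, 3, 10*time.Second, os.Stderr)
    if err != nil {
        return nil, err
    }

    return raft.NewRaft(conf, fsm, store, store, snaps, trans)
}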

Tagged Releases

As of September 2017, HashiCorp will start using tags for this library to clearly indicate major version updates. We recommend you vendor your application's dependency on this library.

  • v0.1.0 is the original stable version of the library that was in master and has been maintained with no breaking API changes. This was in use by Consul prior to version 0.7.0.

  • v1.0.0 takes the changes that were staged in the library-v2-stage-one branch. This version manages server identities using a UUID, so introduces some breaking API changes. It also versions the Raft protocol, and requires some special steps when interoperating with Raft servers running older versions of the library (see the detailed comment in config.go about version compatibility). You can reference https://github.com/hashicorp/consul/pull/2222 for an idea of what was required to port Consul to these new interfaces.

    This version also includes some new features, including non-voting servers, a new address provider abstraction in the transport layer, and more resilient snapshots.

Protocol

raft is based on "Raft: In Search of an Understandable Consensus Algorithm"

A high level overview of the Raft protocol is described below, but for details please read the full Raft paper followed by the raft source. Any questions about the raft protocol should be sent to the raft-dev mailing list.

Protocol Description

Raft nodes are always in one of three states: follower, candidate or leader. All nodes initially start out as a follower. In this state, nodes can accept log entries from a leader and cast votes. If no entries are received for some time, nodes self-promote to the candidate state. In the candidate state nodes request votes from their peers. If a candidate receives a quorum of votes, then it is promoted to a leader. The leader must accept new log entries and replicate to all the other followers. In addition, if stale reads are not acceptable, all queries must also be performed on the leader.
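
A small sketch of how an application might observe these roles through the library, assuming an already-constructed *raft.Raft value named r; the log messages are illustrative only.

package example

import (
    "log"

    "github.com/hashicorp/raft"
)

// watchLeadership logs the local node's role transitions. LeaderCh delivers
// true when this node acquires leadership and false when it loses it.
func watchLeadership(r *raft.Raft) {
    log.Printf("state=%s leader=%s", r.State(), r.Leader())

    for isLeader := range r.LeaderCh() {
        if isLeader {
            log.Println("became leader; queries needing fresh reads can be served here")
        } else {
            log.Println("lost leadership; redirect writes to the new leader")
        }
    }
}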

Once a cluster has a leader, it is able to accept new log entries. A client can request that a leader append a new log entry, which is an opaque binary blob to Raft. The leader then writes the entry to durable storage and attempts to replicate to a quorum of followers. Once the log entry is considered committed, it can be applied to a finite state machine. The finite state machine is application specific, and is implemented using an interface.
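
A hedged sketch of what such an FSM could look like; the JSON key/value command format is an assumption made for illustration, not something the library prescribes.

package example

import (
    "encoding/json"
    "errors"
    "io"
    "sync"

    "github.com/hashicorp/raft"
)

// setCommand is a hypothetical application-level command encoded into the log.
type setCommand struct {
    Key, Value string
}

// kvFSM is an application-specific state machine: a map guarded by a mutex.
type kvFSM struct {
    mu   sync.Mutex
    data map[string]string
}

// Apply is called once a log entry is committed; l.Data is the opaque blob the
// client originally handed to raft.Apply.
func (f *kvFSM) Apply(l *raft.Log) interface{} {
    var cmd setCommand
    if err := json.Unmarshal(l.Data, &cmd); err != nil {
        return err
    }
    f.mu.Lock()
    defer f.mu.Unlock()
    if f.data == nil {
        f.data = make(map[string]string)
    }
    f.data[cmd.Key] = cmd.Value
    return nil
}

// Snapshot and Restore complete the interface; a working snapshot is sketched
// in the next section's example.
func (f *kvFSM) Snapshot() (raft.FSMSnapshot, error) {
    return nil, errors.New("snapshotting is sketched separately below")
}

func (f *kvFSM) Restore(rc io.ReadCloser) error {
    defer rc.Close()
    f.mu.Lock()
    defer f.mu.Unlock()
    return json.NewDecoder(rc).Decode(&f.data)
}

Entries are submitted on the leader via r.Apply(data, timeout); the returned future resolves once the entry has been committed and handed to Apply.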

An obvious question relates to the unbounded nature of a replicated log. Raft provides a mechanism by which the current state is snapshotted, and the log is compacted. Because of the FSM abstraction, restoring the state of the FSM must result in the same state as a replay of old logs. This allows Raft to capture the FSM state at a point in time, and then remove all the logs that were used to reach that state. This is performed automatically without user intervention, and prevents unbounded disk usage as well as minimizing time spent replaying logs.
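
A minimal sketch of the snapshot side of that contract, continuing the hypothetical key/value FSM above.

package example

import (
    "encoding/json"

    "github.com/hashicorp/raft"
)

// kvSnapshot holds a point-in-time copy of the hypothetical key/value state,
// captured while the FSM was briefly locked inside Snapshot().
type kvSnapshot struct {
    data map[string]string
}

// Persist writes the captured state to the sink; once the snapshot is durable,
// raft is free to truncate the logs that led up to it.
func (s *kvSnapshot) Persist(sink raft.SnapshotSink) error {
    if err := json.NewEncoder(sink).Encode(s.data); err != nil {
        sink.Cancel()
        return err
    }
    return sink.Close()
}

// Release is called when raft is finished with the snapshot.
func (s *kvSnapshot) Release() {}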

Lastly, there is the issue of updating the peer set when new servers are joining or existing servers are leaving. As long as a quorum of nodes is available, this is not an issue as Raft provides mechanisms to dynamically update the peer set. If a quorum of nodes is unavailable, then this becomes a very challenging issue. For example, suppose there are only 2 peers, A and B. The quorum size is also 2, meaning both nodes must agree to commit a log entry. If either A or B fails, it is now impossible to reach quorum. This means the cluster is unable to add, or remove a node, or commit any additional log entries. This results in unavailability. At this point, manual intervention would be required to remove either A or B, and to restart the remaining node in bootstrap mode.
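
A hedged sketch of issuing such membership changes against the leader; the server IDs and addresses here are assumptions.

package example

import (
    "time"

    "github.com/hashicorp/raft"
)

// addThenRemove must be run against the current leader; on a follower the
// returned futures fail with raft.ErrNotLeader.
func addThenRemove(r *raft.Raft) error {
    // Add a new voting member. A zero prevIndex skips the optimistic check
    // against a specific configuration index.
    if err := r.AddVoter(raft.ServerID("node-4"), raft.ServerAddress("10.0.0.4:7000"), 0, 10*time.Second).Error(); err != nil {
        return err
    }

    // Remove a departing member by its ID.
    return r.RemoveServer(raft.ServerID("node-2"), 0, 10*time.Second).Error()
}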

A Raft cluster of 3 nodes can tolerate a single node failure, while a cluster of 5 can tolerate 2 node failures. The recommended configuration is to either run 3 or 5 raft servers. This maximizes availability without greatly sacrificing performance.
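
The arithmetic behind those numbers, as a tiny standalone illustration rather than library code: a cluster of N voting servers needs floor(N/2)+1 votes for quorum, so it tolerates floor((N-1)/2) failures.

package example

import "fmt"

func quorumSize(voters int) int     { return voters/2 + 1 }
func faultTolerance(voters int) int { return (voters - 1) / 2 }

func printTolerances() {
    for _, n := range []int{1, 3, 5, 7} {
        // e.g. voters=5 quorum=3 tolerates=2
        fmt.Printf("voters=%d quorum=%d tolerates=%d\n", n, quorumSize(n), faultTolerance(n))
    }
}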

In terms of performance, Raft is comparable to Paxos. Assuming stable leadership, committing a log entry requires a single round trip to half of the cluster. Thus performance is bound by disk I/O and network latency.

Issues
  • Rework goroutines and synchronization

    Rework goroutines and synchronization

    Today, the division of work and the synchronization between goroutines gets to be hard to follow in places. I think we can do better, to make the library more maintainable and eliminate potential race conditions from accidentally shared state. Ideally, it'll become more unit testable too.

    This commit includes a diagram and description of where I think we should go. I'm open to feedback on it. Some of it's probably underspecified, with details to be determined as we implement more; questions are fair game too.

    I held back on subdividing the main Raft module into a nonblocking goroutine and blocking helpers, but it's something we could consider. I haven't studied the code enough to know whether that'd be feasible or advantageous.

    The transition from here to there is going to take significant effort. Here are a few of the major differences:

    • Peer is structured completely differently from replication.go today.
    • Peer handles all communication including RequestVote, not just AppendEntries/InstallSnapshot as replication.go does today.
    • Fewer locks and shared state. commitment.go and raftstate.go remove locking/atomics, possibly merge into raft.go. Other goroutines don't get a handle to the Raft module's state.
    • Snapshots are created through a different flow.

    I started on the replication.go/peer.go changes, but it was before I had a good idea of where things were heading. I'll be happy to pick that up again later.

    /cc @superfell @cstlee @bmizerany @kr @slackpad @sean- hashicorp/raft#84

    opened by ongardie-sfdc 46
  • Cleanup Meta Ticket

    Cleanup Meta Ticket

    Here are the list of issues, grouped together if they might make sense as a single PR.

    State Races

    • [x] Raft state should use locks for any fields that are accessed together (e.g. index and term)

    Multi-Row Fetching

    • [ ] ~~Replace single row lookups with multi row lookups (LogStore / LogCache) (look at cases around log truncation)~~
    • [x] Verify the current term has not changed when preparing/processing the AppendEntries message #136

    Follower Replication:

    • [x] replicateTo should verify leadership is current during looping
    • [x] Check for any hot loops that do not break on stopCh

    Change Inflight Tracking

    • [x] Remove majorityQuorum
    • [x] Inflight tracker should map Node -> Last Commit Index (match index)
    • [x] Votes should be ignored from peers that are not part of peer set
    • [x] precommit may not be necessary with new inflight (likely will be cleaned up via #117)

    Improve Membership Tracking

    • [x] Peer changes should have separate channel and do not pipeline (we don't want more than one peer change in flight at a time) #117
    • [x] Peers.json should track index and any AddPeer or RemovePeer are ignored from older indexes - #117

    Crashes / Restart Issues

    • [ ] ~~Panic with old snapshots #85~~
    • [ ] ~~TrailingLogs set to 0 with restart bug #86~~

    New Tests

    • [x] Config change under loss of quorum: #127
    • Setup cluster with {A, B, C, D}
    • Assume leader is A
    • Partition {C, D}
    • Remove {B}
    • Test should fail to remove B (quorum cannot be reached)

    /cc: @superfell @ongardie @sean- @ryanuber @slackpad

    opened by armon 23
  • Adds in-place upgrade and manual recovery support.

    Adds in-place upgrade and manual recovery support.

    This adds several important capabilities to help in upgrading to the new Raft protocol version:

    1. We can migrate an existing peers.json file, which was sometimes the source of truth for the old version of the library before this support was moved to live fully in snapshots + the raft log as the official source.
    2. If we are using protocol version 0, where we don't support server IDs, operators can continue to use peers.json as an interface to manually recover from a loss of quorum.
    3. We left ourselves open for a more full-featured recovery manager by giving the new RecoverCluster interface access to a complete Configuration object to consume. This will allow us to manually pick which server is a voter for manual elections (set one server to a voter and the rest to nonvoters; the single voter will elect itself), as well as basically any other configuration we want to set. (A usage sketch follows this list.)
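
    A hedged sketch of what a peers.json-based manual recovery could look like with that RecoverCluster entry point; the store construction and file location are assumptions.

    package example

    import "github.com/hashicorp/raft"

    // recoverFromPeersJSON is run offline on each remaining server before it is
    // restarted; it overwrites the persisted membership so the cluster comes
    // back up with the supplied configuration.
    func recoverFromPeersJSON(conf *raft.Config, fsm raft.FSM,
        logs raft.LogStore, stable raft.StableStore,
        snaps raft.SnapshotStore, trans raft.Transport) error {

        // Parse the legacy peers.json file into a Configuration. For protocol
        // version 0 servers, IDs are simply the server addresses.
        configuration, err := raft.ReadPeersJSON("peers.json")
        if err != nil {
            return err
        }

        return raft.RecoverCluster(conf, fsm, logs, stable, snaps, trans, configuration)
    }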

    This also gives a path for introducing Raft servers running the new version of the library into a cluster running the old code. Things would work like this:

    // These are the versions of the protocol (which includes RPC messages as
    // well as Raft-specific log entries) that this server can _understand_. Use
    // the ProtocolVersion member of the Config object to control the version of
    // the protocol to use when _speaking_ to other servers. This is not currently
    // written into snapshots so they are unversioned. Note that depending on the
    // protocol version being spoken, some otherwise understood RPC messages may be
    // refused. See isVersionCompatible for details of this logic.
    //
    // There are notes about the upgrade path in the description of the versions
    // below. If you are starting a fresh cluster then there's no reason not to
    // jump right to the latest protocol version. If you need to interoperate with
    // older, version 0 Raft servers you'll need to drive the cluster through the
    // different versions in order.
    //
    // The version details are complicated, but here's a summary of what's required
    // to get from a version 0 cluster to version 3:
    //
    // 1. In version N of your app that starts using the new Raft library with
    //    versioning, set ProtocolVersion to 1.
    // 2. Make version N+1 of your app require version N as a prerequisite (all
    //    servers must be upgraded). For version N+1 of your app set ProtocolVersion
    //    to 2.
    // 3. Similarly, make version N+2 of your app require version N+1 as a
    //    prerequisite. For version N+2 of your app, set ProtocolVersion to 3.
    //
    // During this upgrade, older cluster members will still have Server IDs equal
    // to their network addresses. To upgrade an older member and give it an ID, it
    // needs to leave the cluster and re-enter:
    //
    // 1. Remove the server from the cluster with RemoveServer, using its network
    //    address as its ServerID.
    // 2. Update the server's config to a better ID (restarting the server).
    // 3. Add the server back to the cluster with AddVoter, using its new ID.
    //
    // You can do this during the rolling upgrade from N+1 to N+2 of your app, or
    // as a rolling change at any time after the upgrade.
    //
    // Version History
    //
    // 0: Original Raft library before versioning was added. Servers running this
    //    version of the Raft library use AddPeerDeprecated/RemovePeerDeprecated
    //    for all configuration changes, and have no support for LogConfiguration.
    // 1: First versioned protocol, used to interoperate with old servers, and begin
    //    the migration path to newer versions of the protocol. Under this version
    //    all configuration changes are propagated using the now-deprecated
    //    RemovePeerDeprecated Raft log entry. This means that server IDs are always
    //    set to be the same as the server addresses (since the old log entry type
    //    cannot transmit an ID), and only AddPeer/RemovePeer APIs are supported.
    //    Servers running this version of the protocol can understand the new
    //    LogConfiguration Raft log entry but will never generate one so they can
    //    remain compatible with version 0 Raft servers in the cluster.
    // 2: Transitional protocol used when migrating an existing cluster to the new
    //    server ID system. Server IDs are still set to be the same as server
    //    addresses, but all configuration changes are propagated using the new
    //    LogConfiguration Raft log entry type, which can carry full ID information.
    //    This version supports the old AddPeer/RemovePeer APIs as well as the new
    //    ID-based AddVoter/RemoveServer APIs which should be used when adding
    //    version 3 servers to the cluster later. This version sheds all
    //    interoperability with version 0 servers, but can interoperate with newer
    //    Raft servers running with protocol version 1 since they can understand the
    //    new LogConfiguration Raft log entry, and this version can still understand
    //    their RemovePeerDeprecated Raft log entries. We need this protocol version
    //    as an intermediate step between 1 and 3 so that servers will propagate the
    //    ID information that will come from newly-added (or -rolled) servers using
    //    protocol version 3, but since they are still using their address-based IDs
    //    from the previous step they will still be able to track commitments and
    //    their own voting status properly. If we skipped this step, servers would
    //    be started with their new IDs, but they wouldn't see themselves in the old
    //    address-based configuration, so none of the servers would think they had a
    //    vote.
    // 3: Protocol adding full support for server IDs and new ID-based server APIs
    //    (AddVoter, AddNonvoter, etc.), old AddPeer/RemovePeer APIs are no longer
    //    supported. Version 2 servers should be swapped out by removing them from
    //    the cluster one-by-one and re-adding them with updated configuration for
    //    this protocol version, along with their server ID. The remove/add cycle
    //    is required to populate their server ID. Note that removing must be done
    //    by ID, which will be the old server's address.
    
    // These are versions of snapshots that this server can _understand_. Currently,
    // it is always assumed that this server generates the latest version, though
    // this may be changed in the future to include a configurable version.
    //
    // Version History
    //
    // 0: Original Raft library before versioning was added. The peers portion of
    //    these snapshots is encoded in the legacy format which requires decodePeers
    //    to parse. This version of snapshots should only be produced by the
    //    unversioned Raft library.
    // 1: New format which adds support for a full configuration structure and its
    //    associated log index, with support for server IDs and non-voting server
    //    modes. To ease upgrades, this also includes the legacy peers structure but
    //    that will never be used by servers that understand version 1 snapshots.
    //    Since the original Raft library didn't enforce any versioning, we must
    //    include the legacy peers structure for this version, but we can deprecate
    //    it in the next snapshot version.
    

    This isn't super great, but will give us a path to keep things compatible with existing clusters as we roll out the changes. We can make some higher-level tooling in Consul to help orchestrate this.

    opened by slackpad 21
  • [v2] Rejecting vote request... since we have a leader

    [v2] Rejecting vote request... since we have a leader

    I am using the v2-stage-one branch and while everything seems to work fine for the most part, I do have one issue:

    I have a cluster of 3 nodes. I take one node down gracefully (used consul as an example of leave/shutdown logic, and waiting for changes to propagate) and the cluster maintains itself at 2 nodes. If I then, however, try to restart the same node (with any combination of ServerID and addr:port), the new node sits there and requests a leader vote forever, with the other two nodes logging [WARN] raft: Rejecting vote request from ... since we already have a leader

    I used Consul as an example of the implementation, fwiw.

    waiting-reply 
    opened by jonbonazza 19
  • Abstract Logging behind an interface

    Abstract Logging behind an interface

    This will make it easier for consumers of the library to plug in their logging framework of choice. Addresses issue #141.

    opened by superfell 15
  • Discussion: reworking membership changes

    Discussion: reworking membership changes

    This is meant for discussion and is not meant to be merged. See the text in membership.md.

    opened by ongardie 14
  • buffer applyCh with up to conf.MaxAppendEntries

    buffer applyCh with up to conf.MaxAppendEntries

    This change improves throughput in busy Raft clusters. By buffering messages, individual RPCs contain more Raft messages. In my tests, this improves throughput from about 4.5 kqps to about 5.5 kqps.

    As-is: (benchmark graph: n1-standard-8, no buffer)

    With my change: (benchmark graph: n1-standard-8, buffered)

    (Both tests were performed with 3 n1-standard-8 nodes on Google Compute Engine in the europe-west1-d region.)
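
    A generic sketch (not the library's internal code) of why buffering helps: the consumer can drain whatever has accumulated and replicate it as one batch, bounded by a MaxAppendEntries-style limit.

    package example

    type entry struct{ data []byte }

    // batchLoop receives one entry, then opportunistically drains up to
    // maxBatch-1 more without blocking, so each replication RPC carries a
    // larger batch when the cluster is busy.
    func batchLoop(applyCh <-chan entry, maxBatch int, replicate func([]entry)) {
        for first := range applyCh {
            batch := []entry{first}
        drain:
            for len(batch) < maxBatch {
                select {
                case e, ok := <-applyCh:
                    if !ok {
                        break drain
                    }
                    batch = append(batch, e)
                default:
                    break drain
                }
            }
            replicate(batch)
        }
    }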

    thinking 
    opened by stapelberg 13
  • Library needs more active maintenance.

    Library needs more active maintenance.

    A bunch of useful PRs are just sitting unmerged, with no comments from the maintainers about when they'll be merged.

    It's gotten to the point where I've been considering forking the library and merging them manually to get the benefit of everyone's work.

    opened by james-lawrence 11
  • add FileStore (like InmemStore, but persistent)

    add FileStore (like InmemStore, but persistent)

    This is a trivial implementation of a persistent logstore. It mostly follows InmemStore, except it writes to files (one file per log entry, one file per stablestore key).

    As you write in the README, it may be desirable to avoid the cgo dependency which raft-mdb brings, which is why I wrote this ;).

    With regards to stability, I’m running this code in a pet project of mine without noticing any issues. Additionally, I’ve swapped the InmemStore for my FileStore in all *_test.go files in the raft repository and successfully ran the tests.

    With regards to performance, I’ve run a couple of benchmarks (end-to-end messages/s in the actual application I’m working on) using raft-mdb vs. this FileStore, with various little tweaks:

    All tests were run by measuring the messages/s over 13 runs, then averaging the results. The underlying storage is an INTEL SSDSC2BP48 (480G consumer SSD).

    raft-mdb+sync+cache:   557 msgs/s (recommended backend)
    filestore:             736 msgs/s (no fsync!)
    filestore+cache:      1131 msgs/s (no fsync!)
    filestore+sync:        418 msgs/s
    filestore+sync+cache:  516 msgs/s
    filestore+rename:      718 msgs/s (!)
    

    filestore+cache is FileStore, but with a proof-of-concept cache: every log entry is kept in memory and GetLog() just copies the log entry instead of reading from disk. This is obviously much faster, but a real cache would need to be developed. I'm thinking that keeping config.MaxAppendEntries entries in a ring buffer should fit the access pattern quite well.
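
    For what it's worth, the library's existing LogCache wrapper takes roughly this shape: a fixed-size in-memory cache in front of any LogStore. A hedged sketch of wrapping the proposed FileStore (a hypothetical type here) with it:

    package example

    import "github.com/hashicorp/raft"

    // newCachedStore keeps the most recent entries in memory; reads of older
    // entries fall through to the underlying persistent store.
    func newCachedStore(fileStore raft.LogStore) (raft.LogStore, error) {
        return raft.NewLogCache(512, fileStore) // a capacity of 512 is an arbitrary choice
    }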

    filestore+sync is FileStore, but with defer f.Sync() after defer f.Close() in StoreLogs().

    filestore+sync+cache is the combination of both of the above.

    filestore+rename is FileStore, but writing into a temporary file which is then renamed to its final path. I’m curious to hear what you have to say about that. The semantics this code guarantees are that a log entry is either fully present or not present at all (in case of power loss for example). AIUI, the raft protocol should be able to cope with this situation. With regards to performance, this outperforms the current raft-mdb (1.2x).
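
    A minimal sketch of that write-to-temp-then-rename pattern, assuming one file per log entry; within a single directory the rename makes the entry appear either fully written or not at all.

    package example

    import (
        "os"
        "path/filepath"
    )

    // writeEntryAtomically stages the entry in a temporary file and renames it
    // into place, so a crash mid-write never leaves a partially written entry
    // at the final path.
    func writeEntryAtomically(dir, name string, data []byte) error {
        tmp, err := os.CreateTemp(dir, name+".tmp-*")
        if err != nil {
            return err
        }
        defer os.Remove(tmp.Name()) // best-effort cleanup; a no-op after the rename

        if _, err := tmp.Write(data); err != nil {
            tmp.Close()
            return err
        }
        if err := tmp.Close(); err != nil {
            return err
        }
        return os.Rename(tmp.Name(), filepath.Join(dir, name))
    }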

    Perhaps it would even make sense to replace InmemStore by FileStore entirely — you’re saying InmemStore should only be used for unit tests, and FileStore can do that, too. That way, people wouldn’t even have the chance to abuse InmemStore and use it in production :).

    But that can wait for follow-up commits. For now, I’m mostly interested to see whether you’d want to merge this? I don’t think every project (which doesn’t want to use raft-mdb) should need to implement this independently :).

    opened by stapelberg 10
  • Fixes races with Raft configurations member, adds GetConfiguration accessor.

    Fixes races with Raft configurations member, adds GetConfiguration accessor.

    This adds GetConfiguration() which is item 7 from https://github.com/hashicorp/raft/issues/84#issuecomment-228928110. This also resolves item 12.

    While adding that I realized that we should move the "no snapshots during config changes" check outside the FSM snapshot thread, both to fix racy access to the configurations and because that's not really a concern of the FSM. That thread just exists to let the FSM do its thing, but the main snapshot thread is the one that should know about the configuration.

    ~~To add an external GetConfiguration() API I added an RW mutex which I'm not super thrilled about, but it lets us correctly manipulate this state from the outside, and among the different Raft threads.~~

    opened by slackpad 10
  • Add retract directive for v1.1.3

    Add retract directive for v1.1.3

    We had issues go get-ing raft@v1.1.3 due to a checksum mismatch. It turns out we created tag v1.1.3, fetched it as a module, then deleted the tag and recreated it on GitHub, which violates the assumption that modules are immutable (once fetched from Go's checksum proxies).

    This PR ensures that users will not accidentally add v1.1.3 as a dependency.

    opened by kisunji 1
  • node not part of the cluster is not allowed to vote

    node not part of the cluster is not allowed to vote

    When a follower is removed from the cluster, it is removed from the configuration's server list but still runs a follower loop (either because it was originally a follower, or because it was the leader with ShutdownOnRemove set to false). When that node times out (because it no longer receives heartbeats from the other nodes in the cluster), it transitions to the Candidate state and tries to trigger an election. There are 2 possible cases:

    • the cluster is stable and has a leader: the vote request will be rejected
    • the cluster is unstable and does not have a leader: the vote is granted and the removed node becomes part of the voting nodes.

    This uncovers 2 issues:

    • a removed node should not transition to the Candidate state. This is fixed in #476
    • a node that is not part of the cluster should not be allowed to vote: this is a bit trickier to fix because the RequestVoteRequest only has the ServerAddress, which does not represent the unique ID of the node that is stored in the configuration's server list. A test that reproduces the issue is in this PR.

    To keep track of things in this PR here is a list of what I intend to achieve:

    • [x] Add ID and Addr field in RPCHeader
    • [x] Transition from using Candidate and Leader fields in Protocol Version 3 to using Addr (part of RPC header) field in protocol version 4
    • [x] Use ID to validate that the node that is triggering an election is part of the latest configuration
    • [x] Expose the Leader ID field as part of the API to be used in consul
    opened by dhiaayachi 3
  • Pre-vote campaign

    Pre-vote campaign

    The following attempts to implement a pre-voting campaign optimization.

    One downside of Raft’s leader election algorithm is that a server that has been partitioned from the cluster is likely to cause a disruption when it regains connectivity. When a server is partitioned, it will not receive heartbeats. It will soon increment its term to start an election, although it won’t be able to collect enough votes to become leader. When the server regains connectivity sometime later, its larger term number will propagate to the rest of the cluster (either through the server’s RequestVote requests or through its AppendEntries response). This will force the cluster leader to step down, and a new election will have to take place to select a new leader. Fortunately, such events are likely to be rare, and each will only cause one leader to step down.
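
    A conceptual sketch of the pre-vote idea (not this PR's implementation): a would-be candidate first asks its peers whether they would grant a vote for term+1 without actually incrementing its own term, so a previously partitioned node cannot disturb a healthy cluster's term when it reconnects. The names and fields below are assumptions.

    package example

    // preVoteRequest is a hypothetical message a candidate sends before starting
    // a real election; NextTerm is currentTerm+1, proposed but not yet adopted.
    type preVoteRequest struct {
        NextTerm     uint64
        LastLogIndex uint64
        LastLogTerm  uint64
    }

    // grantPreVote is the receiver-side check: grant only if we have not heard
    // from a live leader within the election timeout and the candidate's log is
    // at least as up to date as ours.
    func grantPreVote(req preVoteRequest, currentTerm, lastLogIndex, lastLogTerm uint64, heardFromLeader bool) bool {
        if heardFromLeader || req.NextTerm < currentTerm {
            return false
        }
        if req.LastLogTerm != lastLogTerm {
            return req.LastLogTerm > lastLogTerm
        }
        return req.LastLogIndex >= lastLogIndex
    }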

    opened by SimonRichardson 1
  • msgpack dependencies

    msgpack dependencies

    This is more of a potential problem that's coming up, rather than a problem right now (see the msgpack v1.1.5 and msgpack v0.5.5 tags at https://github.com/hashicorp/go-msgpack/tags). The upstream msgpack changed how time is encoded/decoded, and a workaround was provided.

    I see two options: decode the data with TimeNotBuiltin set and then encode it without, allowing a stable migration; or it might be prudent to expose TimeNotBuiltin in the raft configuration (under another name, LegacyEncoding or similar), so it can be changed for users with existing data.

    opened by SimonRichardson 4
  • Leadership election stuck in 3 node cluster in Candidate state.

    Leadership election stuck in 3 node cluster in Candidate state.

    I had a 3 node cluster. One node died suddenly. I expected one of the remaining nodes to become the leader, but looking at the logs, the candidate is getting a vote from itself and requesting one from the dead node, but not from the other live server.

    {"level":"info","ts":"2021-09-06T10:31:34.688Z","caller":"go-hclog/stdlog.go:82","msg":"[INFO]  duplicate requestVote for same term: term=1222614"}
    {"level":"info","ts":"2021-09-06T10:31:35.307Z","caller":"go-hclog/stdlog.go:82","msg":"[WARN]  Election timeout reached, restarting election"}
    {"level":"info","ts":"2021-09-06T10:31:35.307Z","caller":"go-hclog/stdlog.go:82","msg":"[INFO]  entering candidate state: node=\"Node at serveraddr [Candidate]\" term=1222615"}
    {"level":"info","ts":"2021-09-06T10:31:35.321Z","caller":"go-hclog/stdlog.go:82","msg":"[DEBUG] votes: needed=2"}
    {"level":"info","ts":"2021-09-06T10:31:35.321Z","caller":"go-hclog/stdlog.go:82","msg":"[DEBUG] vote granted: from=**Same Server** term=1222615 tally=1"}
    {"level":"info","ts":"2021-09-06T10:31:35.322Z","caller":"go-hclog/stdlog.go:82","msg":"[ERROR] failed to make requestVote RPC: target=\"{Voter **Dead server** }\" error=\"dial tcp **dead server**: connect: connection refused\""}
    {"level":"info","ts":"2021-09-06T10:31:36.510Z","caller":"go-hclog/stdlog.go:82","msg":"[INFO]  duplicate requestVote for same term: term=1222615"}
    {"level":"info","ts":"2021-09-06T10:31:37.065Z","caller":"go-hclog/stdlog.go:82","msg":"[WARN]  Election timeout reached, restarting election"}
    

    The Raft configuration from one of the servers does not list the other live server.

    {
        "details": {
            "applied_index": "151938",
            "commit_index": "151938",
            "fsm_pending": "0",
            "last_contact": "462h40m13.433454323s",
            "last_log_index": "152647",
            "last_log_term": "29674",
            "last_snapshot_index": "147497",
            "last_snapshot_term": "29669",
            "latest_configuration": "[{Suffrage:Voter ID:ID237:6000 Address:.237:6000} {Suffrage:Voter ID:ID241:6000 Address:.241:6000}]",
            "latest_configuration_index": "0",
            "num_peers": "1",
            "protocol_version": "3",
            "protocol_version_max": "3",
            "protocol_version_min": "0",
            "snapshot_version_max": "1",
            "snapshot_version_min": "0",
            "state": "Candidate",
            "term": "1222401"
        }
    }
    

    Another Server.

    {
        "details": {
            "applied_index": "151938",
            "commit_index": "151938",
            "fsm_pending": "0",
            "last_contact": "467h14m2.550357442s",
            "last_log_index": "151938",
            "last_log_term": "29674",
            "last_snapshot_index": "147487",
            "last_snapshot_term": "29669",
            "latest_configuration": "[{Suffrage:Voter ID:ID236:6000 Address:.236:6000} {Suffrage:Voter ID:ID237:6000 Address:.237:6000} {Suffrage:Voter ID:ID241:6000 Address:.241:6000}]",
            "latest_configuration_index": "0",
            "num_peers": "2",
            "protocol_version": "3",
            "protocol_version_max": "3",
            "protocol_version_min": "0",
            "snapshot_version_max": "1",
            "snapshot_version_min": "0",
            "state": "Candidate",
            "term": "1223003"
        }
    }
    

    Library version used: github.com/hashicorp/raft v1.3.1. The configuration is the library's default config.

    opened by ankur-anand 3
  • Make Raft.LeaderCh() return a new channel for each invocation

    Make Raft.LeaderCh() return a new channel for each invocation

    .. and immediately send the current leadership state over the channel.

    This way it can be used by multiple pieces of code without disrupting one another.

    Sending the value immediately avoids strange race conditions when RaftState gets updated at a slightly different moment.

    fixes #426

    thinking 
    opened by Jille 5
  • Raft client reference implementation

    Raft client reference implementation

    Hello 😄 This is kind of related to #128. I have been working on an implementation of a cluster using this library and I'm now facing a difficulty: there is no example or explanation anywhere of how to connect remotely to a cluster (either to the leader or a follower) and send commands to it. It should be possible, since the cluster members talk to each other over a Transport, but it's not so immediate to build. Would it be OK to have an RPCClient reference implementation or some documentation on how to connect and send commands to a running Raft cluster? I dug a bit into the Consul codebase and could not really find it... Thank you

    enhancement docs 
    opened by inge4pres 5
  • Document Flaky Tests

    Document Flaky Tests

    We have tests that don't always pass. We are hoping to document and gather information about the flakiness in our testing suite, and we would like to start incrementally improving our tests!

    Please feel free to post information around a failing/flaky test you've been experiencing.

    Replication steps: go test ./... or with gotestsum: gotestsum --format=short-verbose --junitfile $TEST_RESULTS_DIR/$reportname.xml -- -tags=$GOTAGS $pkg

    Please provide: Test Name, Output, and Replication Steps.

    === Errors
    fuzzy/node.go:24:26: undefined: log
    
    

    Replication steps: gotestsum --format=short-verbose --junitfile=reportname.xml

    bug flaky-test 
    opened by schristoff 3
  • Reduce snapshot disk I/O

    Reduce snapshot disk I/O

    In our project QED the FSM persists data on disk. Under high load this is a very disk-intensive task. It would be great to be able to take snapshots on demand instead of at recurrent intervals.

    Also, we'd like to stream the snapshot directly to the other nodes instead of waiting for it to be written to disk first and then sending it over the network.

    Are there any plans to add this functionality?

    enhancement thinking 
    opened by suizman 5
  • Race between shutdownCh and consumerCh in inmemPipeline.AppendEntries

    Race between shutdownCh and consumerCh in inmemPipeline.AppendEntries

    This select statement in inmemPipeline.AppendEntries:

    	select {
    	case i.peer.consumerCh <- rpc:
    	case <-timeout:
    		return nil, fmt.Errorf("command enqueue timeout")
    	case <-i.shutdownCh:
    		return nil, ErrPipelineShutdown
    	}
    

    can occasionally choose to send on i.peer.consumerCh even if shutdownCh has been closed, resulting in the RPC succeeding even though user code had previously called InmemTransport.Close(), which is what closes shutdownCh.

    This behavior is both unexpected and inconsistent with a real-world NetworkTransport, where Close() would close the network listener and cause the pipeline to not deliver any RPC from that point on.
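
    A self-contained demonstration of the underlying Go semantics (plain Go, not library code): when several cases of a select are ready, one is chosen pseudo-randomly, so the send can win even though shutdownCh is already closed.

    package main

    import "fmt"

    func main() {
        shutdownCh := make(chan struct{})
        close(shutdownCh) // "transport closed"

        consumerCh := make(chan string, 1) // always ready to accept one send

        sent, shut := 0, 0
        for i := 0; i < 1000; i++ {
            select {
            case consumerCh <- "rpc":
                <-consumerCh // drain so the send case stays ready
                sent++
            case <-shutdownCh:
                shut++
            }
        }
        // Roughly half of the iterations take each branch.
        fmt.Printf("send chosen %d times, shutdown chosen %d times\n", sent, shut)
    }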

    I spotted this in unit tests that were using InmemTransport. I'm attaching a sample program below which reproduces the problem; just run it in a loop and it will eventually fail.

    Note however that you'll need to run the sample program against the branch linked to #274, not master, otherwise you'll randomly hit either the race described in this issue XOR the other one described in #273 (which is what #274 fixes).

    bug flaky-test 
    opened by freeekanayaka 1