Library for enabling asynchronous health checks in your service

Overview

go-health

A library that enables async dependency health checking for services running on an orchestrated container platform such as Kubernetes or Mesos.

Why is this important?

Container orchestration platforms require that the underlying service(s) expose a "health check" which is used by the platform to determine whether the container is in a good or bad state.

While this can be achieved by simply exposing a /status endpoint that performs synchronous checks against its dependencies (followed by returning a 200 or non-200 status code), it is not optimal for a number of reasons:

  • It does not scale
    • The more dependencies you add, the longer your health check will take to complete (and potentially cause your service to be killed off by the orchestration platform).
    • Depending on the complexity of a given dependency, a thorough check can be fairly involved, and it may be acceptable for it to take 30s+ to complete (far longer than a synchronous /status request should block).
  • It adds unnecessary load on your deps or, at worst, becomes a DoS target
    • Non-malicious scenario
      • Thundering herd problem -- in the event of a deployment (or restart, etc.), all of your service containers are likely to have their /status endpoints checked by the orchestration platform as soon as they come up. Depending on the complexity of the checks, running that many simultaneous checks adds unnecessary load to your dependencies at minimum and, at worst, causes them to experience problems.
      • Security scanners -- if your organization runs periodic security scans, they may hit your /status endpoint and trigger unnecessary dep checks.
    • Malicious scenario
      • Loading up any basic HTTP benchmarking tool and pointing it at your /status endpoint could choke your dependencies (and potentially your service).

With that said, not everyone needs asynchronous checks. If your service has one dependency (and that is unlikely to change), it is trivial to write a basic, synchronous check and it will probably suffice.

However, if you anticipate that your service will have several dependencies, with varying degrees of complexity for determining their health state - you should probably think about introducing asynchronous health checks.

How does this library help?

Writing an async health checking framework for your service is not a trivial task, especially if Go is not your primary language.

This library:

  • Allows you to define how to check your dependencies.
  • Allows you to define warning and fatal thresholds.
  • Will run your dependency checks on a given interval, in the background. [1]
  • Exposes a way for you to gather the check results in a fast and thread-safe manner to help determine the final status of your /status endpoint. [2]
  • Comes bundled w/ pre-built checkers for well-known dependencies such as Redis, Mongo, HTTP and more.
  • Makes it simple to implement and provide your own checkers (by adhering to the checker interface).
  • Allows you to trigger listener functions when your health checks fail or recover using the IStatusListener interface.
  • Allows you to run custom logic when a specific health check completes by using the OnComplete hook.

[1] Make sure to run your checks on a "sane" interval - i.e. if you are checking your Redis dependency once every five minutes, your service is essentially running blind for about 4 minutes 59 seconds out of every 5 minutes. Unless you have a really good reason, check your dependencies every X seconds, rather than X minutes.

[2] go-health continuously writes dependency health state data and allows you to query that data via .State(). Alternatively, you can use one of the pre-built HTTP handlers for your /healthcheck endpoint (and thus not have to manually inspect the state data).
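
The idea behind [1] and [2] (run checks in the background, answer /status from cached results) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not go-health's implementation: the names cache, record, run, healthy and errRefused are hypothetical, and the intervals are chosen only to keep the demo short.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errRefused is a placeholder failure used by the demo below.
var errRefused = errors.New("connection refused")

// result is the last known outcome of one dependency check.
type result struct {
	Err       error
	CheckedAt time.Time
}

// cache stores check results behind a read/write mutex so that readers
// (for example a /status handler) never wait on a dependency.
type cache struct {
	mu      sync.RWMutex
	results map[string]result
}

func newCache() *cache {
	return &cache{results: make(map[string]result)}
}

// record stores the outcome of one check run.
func (c *cache) record(name string, err error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.results[name] = result{Err: err, CheckedAt: time.Now()}
}

// run executes check once immediately and then on every tick until stop is closed.
func (c *cache) run(name string, interval time.Duration, check func() error, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		c.record(name, check())
		select {
		case <-ticker.C:
		case <-stop:
			return
		}
	}
}

// healthy reports whether every recorded check succeeded the last time it ran.
// A cache with no results yet is treated as healthy.
func (c *cache) healthy() bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	for _, r := range c.results {
		if r.Err != nil {
			return false
		}
	}
	return true
}

func main() {
	c := newCache()
	stop := make(chan struct{})
	go c.run("cache-db", 50*time.Millisecond, func() error { return nil }, stop)
	go c.run("queue", 50*time.Millisecond, func() error { return errRefused }, stop)

	time.Sleep(120 * time.Millisecond) // give both checks time to run
	fmt.Println("healthy:", c.healthy())
	close(stop)
}
```

Reading the cached result costs one lock acquisition and a map scan, regardless of how slow the underlying dependency checks are.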

Example

For full examples, look through the examples dir

  1. Create an instance of health and configure a checker (or two)
import (
	"net/url"
	"time"

	health "github.com/InVisionApp/go-health/v2"
	"github.com/InVisionApp/go-health/v2/checkers"
	"github.com/InVisionApp/go-health/v2/handlers"
)

// Create a new health instance
h := health.New()

// Create a checker
myURL, _ := url.Parse("https://google.com")
myCheck, _ := checkers.NewHTTP(&checkers.HTTPConfig{
    URL: myURL,
})
  2. Register your check with your health instance
h.AddChecks([]*health.Config{
    {
        Name:     "my-check",
        Checker:  myCheck,
        Interval: time.Duration(2) * time.Second,
        Fatal:    true,
    },
})
  3. Start the health check
h.Start()

From here on, you can either configure an endpoint such as /healthcheck to use a built-in handler such as handlers.NewJSONHandlerFunc() or get the current health state of all your deps by traversing the data returned by h.State().

Sample /healthcheck output

Assuming you have configured go-health with two HTTP checkers, your /healthcheck output would look something like this:

{
    "details": {
        "bad-check": {
            "name": "bad-check",
            "status": "failed",
            "error": "Ran into error while performing 'GET' request: Get google.com: unsupported protocol scheme \"\"",
            "check_time": "2017-12-30T16:20:13.732240871-08:00"
        },
        "good-check": {
            "name": "good-check",
            "status": "ok",
            "check_time": "2017-12-30T16:20:13.80109931-08:00"
        }
    },
    "status": "ok"
}
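
For a client consuming this output, a small decoder could look like the sketch below. It relies only on the field names visible in the sample above; the type and function names (checkDetail, report, failedChecks) are hypothetical, and the embedded sample is a trimmed copy with a shortened error message.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// checkDetail mirrors one entry under "details" in the sample output.
type checkDetail struct {
	Name      string `json:"name"`
	Status    string `json:"status"`
	Error     string `json:"error,omitempty"`
	CheckTime string `json:"check_time"`
}

// report mirrors the top-level object in the sample output.
type report struct {
	Details map[string]checkDetail `json:"details"`
	Status  string                 `json:"status"`
}

// failedChecks returns the sorted names of all checks whose status is not "ok".
func failedChecks(body []byte) ([]string, error) {
	var r report
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	var failed []string
	for name, d := range r.Details {
		if d.Status != "ok" {
			failed = append(failed, name)
		}
	}
	sort.Strings(failed)
	return failed, nil
}

// sample is a trimmed copy of the output shown above.
const sample = `{
    "details": {
        "bad-check": {
            "name": "bad-check",
            "status": "failed",
            "error": "unsupported protocol scheme",
            "check_time": "2017-12-30T16:20:13.732240871-08:00"
        },
        "good-check": {
            "name": "good-check",
            "status": "ok",
            "check_time": "2017-12-30T16:20:13.80109931-08:00"
        }
    },
    "status": "ok"
}`

func main() {
	failed, err := failedChecks([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(failed) // prints [bad-check]
}
```

Note that the top-level status in the sample is ok even though bad-check failed, which is presumably because bad-check is not configured as fatal.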

Additional Documentation

OnComplete Hook vs. IStatusListener

At first glance it may seem that these two features provide the same functionality. However, they are meant for two different use cases:

The IStatusListener is useful when you want to run a custom function in the event that the overall status of your health checks changes. For example, if go-health is currently checking the health of two different dependencies A and B, you may want to trip a circuit breaker for A and/or B. You could also put your service in a state where it will notify callers that it is not currently operating correctly. The opposite can be done when your service recovers.

The OnComplete hook is called whenever a health check for an individual dependency is complete. This means that the function you register with the hook gets called every single time go-health completes the check. It's completely possible to register different functions with each configured health check or not to hook into the completion of certain health checks entirely. For instance, this can be useful if you want to perform cleanup after a complex health check or if you want to send metrics to your APM software when a health check completes. It is important to keep in mind that this hook effectively gets called on roughly the same interval you define for the health check.
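
The difference in call frequency can be illustrated with a small stand-alone sketch. It does not use go-health's types: outcome, tracker, observe and countCalls are hypothetical stand-ins, where onComplete plays the role of the OnComplete hook and onFailed/onRecovered play the role of a status listener.

```go
package main

import (
	"errors"
	"fmt"
)

// errDown is a placeholder failure used by the demo below.
var errDown = errors.New("dependency down")

// outcome is a stand-in for the state passed to callbacks.
type outcome struct {
	Name string
	Err  error
}

// tracker calls onComplete after every check run, and calls
// onFailed/onRecovered only when the check changes state.
type tracker struct {
	failing     bool
	onComplete  func(outcome)
	onFailed    func(outcome)
	onRecovered func(outcome)
}

// observe processes the result of one completed check run.
func (t *tracker) observe(o outcome) {
	if t.onComplete != nil {
		t.onComplete(o)
	}
	switch {
	case o.Err != nil && !t.failing:
		t.failing = true
		if t.onFailed != nil {
			t.onFailed(o)
		}
	case o.Err == nil && t.failing:
		t.failing = false
		if t.onRecovered != nil {
			t.onRecovered(o)
		}
	}
}

// countCalls feeds a sequence of results through a tracker and
// returns how often each kind of callback fired.
func countCalls(results []error) (completed, failed, recovered int) {
	t := &tracker{
		onComplete:  func(outcome) { completed++ },
		onFailed:    func(outcome) { failed++ },
		onRecovered: func(outcome) { recovered++ },
	}
	for _, err := range results {
		t.observe(outcome{Name: "my-check", Err: err})
	}
	return completed, failed, recovered
}

func main() {
	// ok, fail, fail, ok: four completions but only one failure and one recovery.
	c, f, r := countCalls([]error{nil, errDown, errDown, nil})
	fmt.Println(c, f, r) // prints 4 1 1
}
```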

Contributing

All PRs are welcome, as long as they are well tested. Follow the typical fork->branch->PR flow.

Issues
  • Fix DATA-DOG/go-sqlmock dependency

    Running go mod tidy results in the following error:

    go mod tidy
    go: finding gopkg.in/DATA-DOG/go-sqlmock.v1 v1.3.3
    github.com/myuser/myapp/pkg/health imports
    	github.com/InVisionApp/go-health/checkers tested by
    	github.com/InVisionApp/go-health/checkers.test imports
    	gopkg.in/DATA-DOG/go-sqlmock.v1: cannot find module providing package gopkg.in/DATA-DOG/go-sqlmock.v1
    

    This PR should fix the dependency.

    See also for further reference:

    • https://github.com/DATA-DOG/go-sqlmock/issues/161
    • https://github.com/heptiolabs/healthcheck/pull/24
    opened by unguiculus 4
  • Added reachable checker

    What

    • adds reachable checker

    The reachable checker is a generic TCP checker. Use it to verify that a configured address can be contacted via a request over TCP. This is useful if you do not care about a response from the target and simply want to know if the URL is reachable.
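
As an illustration of the idea (not the code from this PR), a TCP reachability check could be sketched as follows. It assumes a checker is any type with a Status() (interface{}, error) method; the checkable interface below is a local stand-in for that assumption, and tcpReachable and localListener are hypothetical names.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// checkable is a local stand-in for the checker interface (assumed shape).
type checkable interface {
	Status() (interface{}, error)
}

// tcpReachable reports whether a TCP connection to addr can be opened.
type tcpReachable struct {
	addr    string
	timeout time.Duration
}

// Status dials the address and closes the connection right away;
// no data is sent and no response is read.
func (c tcpReachable) Status() (interface{}, error) {
	conn, err := net.DialTimeout("tcp", c.addr, c.timeout)
	if err != nil {
		return nil, fmt.Errorf("dial %s: %w", c.addr, err)
	}
	return nil, conn.Close()
}

// localListener opens a throwaway listener on a free loopback port for the demo.
func localListener() (net.Listener, error) {
	return net.Listen("tcp", "127.0.0.1:0")
}

func main() {
	ln, err := localListener()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer ln.Close()

	var c checkable = tcpReachable{addr: ln.Addr().String(), timeout: time.Second}
	_, err = c.Status()
	fmt.Println("reachable:", err == nil)
}
```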

    Testing

    1. Added unit tests keeping 100% coverage
    2. I had already written tests in a slightly different fashion than the existing checkers when I originally wrote this custom checker. If it is necessary to conform to the testing style of the other checkers let me know and I can revisit.
    opened by chesleybrown 4
  • syntax question

    I was looking at the example and saw this:

    handlers.NewJSONHandlerFunc(h)

    I'm new to golang, so this might just be my lack of experience. Could NewJSONHandlerFunc have been attached to the struct (so it would be h.NewJSONHandlerFunc())?

    opened by bettse 4
  • Fix global failed status when multiple fatal checks have mixed results

    Hello, first of all thank you for this library!

    I really like the effective and simple implementation.

    I found a bug that I suspect is a race condition in the global status when there are multiple fatal checks with mixed order.

    Given multiple fatal checks, some failing and some passing, if the order of execution made a passing check finish last, the global status was reported as ok even though other fatal checks were in a failed state.

    Looking at the code, each goroutine was updating the h.failed "thread-global" variable at the end of its check cycle. As a result, h.failed holds whatever value was set by the last goroutine to finish. If a successful fatal check ran after a failing one, h.failed was reset to false and the failure was masked.

    To reproduce the bug I updated the test cases

    TestFailed/
      Should_return_false_if_a_fatally_configured_check_hasn't_errored
    

    and

    TestState/
      When_a_fatally-configured_check_fails_and_recovers,_state_should_get_updated_accordingly
    

    to use two checkers instead of one, both fatal but one failing and one passing, ordered such that the passing check was executed after the failing one.
    ( You can still test this at commit 1fa23aa )

    This PR adds a little feature and implements a fix for this behaviour.

    Feature

    Add Fatal field in the State struct.
    This information is really helpful when multiple checks (mixed fatal and not fatal) are present and the global status is failed.
    With this information exposed, it is clear which check is causing the global status to fail.

    The bugfix is based upon the Fatal field exposed in the State struct, this could be reworked but I found the Fatal to be useful there anyway.

    Bugfix

    Remove the h.failed boolean and rely only on State information to evaluate the condition.
    With this implementation, the h.Failed() function relies on safeGetStates to report the last known status.
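
The approach of deriving the global result from per-check state, rather than from a shared last-writer-wins flag, can be sketched as below. This is an illustration only, not the code in this PR: checkState and anyFatalFailed are hypothetical stand-ins.

```go
package main

import "fmt"

// checkState is a stand-in for the per-check state described above.
type checkState struct {
	Status string // "ok" or "failed"
	Fatal  bool
}

// anyFatalFailed derives the global result from every check's last known
// state, so the order in which checks finish cannot mask a failure.
func anyFatalFailed(states map[string]checkState) bool {
	for _, s := range states {
		if s.Fatal && s.Status == "failed" {
			return true
		}
	}
	return false
}

func main() {
	states := map[string]checkState{
		"db":      {Status: "failed", Fatal: true},
		"cache":   {Status: "ok", Fatal: true},
		"metrics": {Status: "failed", Fatal: false},
	}
	fmt.Println(anyFatalFailed(states)) // prints true
}
```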

    Thank you for looking into this!

    opened by endorama 3
  • Data race in states handling

    Hello,

    You may want to review your mutex usage around map[string]State in health.go, as safeGetStates() is not safe: you safely copy a pointer to the map by returning the map value, but it is the access to the individual elements of the map that you should protect (or safely deep-copy the map on each read access, but it may be heavy-handed ;). Confirmed with the go race detector with a dummy check, and polling a /ready route.

    func main() {
            h := health.New()
    
            h.AddChecks([]*health.Config{
                    {
                            Name:     "my-check",
                            Checker:  foo{},
                            Interval: time.Duration(1) * time.Second,
                            Fatal:    true,
                    },
            })
            h.Start()
            http.Handle("/ready", handlers.NewJSONHandlerFunc(h, nil))
            http.ListenAndServe(":8020", nil)
    }
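
One common way to avoid a race of this kind is to hand readers a copy of the map taken while the lock is held. The sketch below shows that pattern with hypothetical names (state, store, set, snapshot); it is not the library's code, and it copies values shallowly, so state values that contain pointers or maps would need a deeper copy.

```go
package main

import (
	"fmt"
	"sync"
)

// state is a stand-in for a per-check state value.
type state struct {
	Status string
}

// store guards a map of states with a read/write mutex.
type store struct {
	mu     sync.RWMutex
	states map[string]state
}

// set writes one entry while holding the write lock.
func (s *store) set(name string, st state) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.states[name] = st
}

// snapshot returns a copy of the map, built under the read lock, so
// callers can iterate over it without racing against concurrent writers.
func (s *store) snapshot() map[string]state {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make(map[string]state, len(s.states))
	for k, v := range s.states {
		out[k] = v
	}
	return out
}

func main() {
	s := &store{states: make(map[string]state)}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.set(fmt.Sprintf("check-%d", i), state{Status: "ok"})
			_ = s.snapshot() // concurrent read alongside the writes
		}(i)
	}
	wg.Wait()
	fmt.Println(len(s.snapshot())) // prints 4
}
```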
    

    Side-note, I was taking a look at the various health packages, maybe https://github.com/heptiolabs/healthcheck/ could be a good fit for you, it seems simpler (less packages/smaller API) but powerful and bugfree, and less open source packages to maintain (for you) or evaluate (for me) is often better ;)

    Best regards, David

    (updated: no need for a 1 ms check interval and using wrk in order to trigger the go race detector, 1s and a single curl is enough ;)

    opened by dlecorfec 3
  • Fix module

    Starting at v2, the module path must end in the major version.

    The fakes were updated manually, which is not ideal. They seem to have been edited manually before. I had tried to regenerate but the tests wouldn't run.

    Fixes #70

    opened by unguiculus 2
  • Module migration did not encode major version

    Release v2.1.1 cannot be used because the module migration was done incorrectly. Starting at v2, the module path must end in the major version.

    See https://github.com/golang/go/wiki/Modules#semantic-import-versioning.

    opened by unguiculus 2
  • Move checkers with external deps into sub packages

    As previously discussed in #56 this PR moves all checkers with external deps into sub packages. So, people using the checkers need to install only deps to checkers that they are really using.

    opened by maxcnunes 2
  • Ideally checkers with external dependencies should live in a different package

    For instance, I just need to use the reachable checker and my project doesn't have any mongo dep. But to use the reachable checker I need to install all dependencies in InVisionApp/go-health/checker, and the mongo one has a dependency on github.com/globalsign/mgo. If the checkers that have an external dependency were in a different package, I wouldn't be forced to depend on something that I'm not using. There is no need to move them to a different repository; just having them in a subfolder would be enough to avoid this dependency issue. I can create a PR for it, but it would cause a breaking change in the project.

    Suggested packages:

    • InVisionApp/go-health/checker no external deps
    • InVisionApp/go-health/checker/mongo
    • InVisionApp/go-health/checker/redis
    • InVisionApp/go-health/checker/disk
    • InVisionApp/go-health/checker/memcache
    opened by maxcnunes 2
  • Add a `StopWithStatus(...)`

    Add an optional StopWithStatus(...) method that on top of calling Stop() will also change the status + message of the built-in /healthcheck endpoints.

    opened by dselans 2
  • Add OnComplete hook to health check config

    This adds an OnComplete hook that will be called when the health check is complete. It can be defined when creating the config for a health check.

    Health.AddCheck(&health.Config{
        Name:       "myCheck",
        Checker:    checker,
        Interval:   time.Duration(1) * time.Second,
        Fatal:      false,
        OnComplete: func(state *health.State) { ... },
    })
    
    opened by hebime 1
  • Indentation for JSON in NewJSONHandlerFunc

    When using go-health together with gin-gonic I'm finding it hard to format the JSON output from NewJSONHandlerFunc(). Would it be possible to update the signature of NewJSONHandlerFunc() to include some kind of config? I don't really know what approach that would be the best, but some pseudo examples:

    func NewJSONHandlerFunc(h health.IHealth, custom map[string]interface{}, formatConfig *health.FormatConfig) http.HandlerFunc {
        ...

        encoder := json.NewEncoder(...)
        if formatConfig != nil {
            encoder.SetIndent(formatConfig.Prefix, formatConfig.Indent)
        }
        encoder.Encode(...)
    }
    

    Or something simple like:

    func NewJSONHandlerFunc(h health.IHealth, custom map[string]interface{}, prefix, indent string) http.HandlerFunc {
        ...
        data, err := json.MarshalIndent(fullBody.data, prefix, indent)
    }
    

    Like I said, I don't know the best (go) approach for this. And if this is possible somehow already I would be happy for all the information I can get.
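
For reference, a stand-alone handler that indents its JSON output can be written with the standard library's json.Encoder.SetIndent. The sketch below is independent of go-health and does not change NewJSONHandlerFunc; indentedJSONHandler and render are hypothetical names, and the payload is a placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// indentedJSONHandler returns a handler that writes payload() as indented JSON.
func indentedJSONHandler(payload func() interface{}, prefix, indent string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		enc := json.NewEncoder(w)
		enc.SetIndent(prefix, indent)
		if err := enc.Encode(payload()); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
		}
	}
}

// render invokes the handler in-process and returns the response body.
func render(h http.HandlerFunc) string {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, "/healthcheck", nil))
	return rec.Body.String()
}

func main() {
	h := indentedJSONHandler(func() interface{} {
		return map[string]string{"status": "ok"}
	}, "", "  ")
	fmt.Print(render(h))
}
```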

    Thanks for a very nice project.

    opened by antonjah 0
  • add remote server health checker

    In many cases we need to check the status of a remote server, such as its CPU and memory usage percentages. I hope this project can add this feature. Thanks.

    opened by aronlt 1
  • reduce ReachableDialer interface to easily allow grpc.Dial

    otherwise I need to write something

    type grpcCon struct {
    	conn *grpc.ClientConn
    }
    
    func (c *grpcCon) Close() error {
    	return c.conn.Close()
    }
    
    func (c *grpcCon) LocalAddr() net.Addr {
    	panic("not implemented")
    }
    
    func (c *grpcCon) RemoteAddr() net.Addr {
    	panic("not implemented")
    }
    
    func (c *grpcCon) Read(b []byte) (n int, err error) {
    	panic("not implemented")
    }
    
    func (c *grpcCon) Write(b []byte) (n int, err error) {
    	panic("not implemented")
    }
    
    func (c *grpcCon) SetDeadline(t time.Time) error {
    	panic("not implemented")
    }
    
    func (c *grpcCon) SetReadDeadline(t time.Time) error {
    	panic("not implemented")
    }
    
    func (c *grpcCon) SetWriteDeadline(t time.Time) error {
    	panic("not implemented")
    }
    
    func Dial(network, address string, timeout time.Duration) (net.Conn, error) {
    	c, err := grpc.Dial(address, grpc.WithInsecure(), grpc.WithTimeout(timeout))
    	if err != nil {
    		return nil, err
    	}
    	return &grpcCon{c}, nil
    }
    
    opened by sergeyt 0
  • Persist checkers state?

    Hi,

    I would like to persist checker state, so that if a check was in the failed state before I restart the service, it is still in the failed state afterwards and I can expect a recovery. I can save the map from State() to a JSON file, but I don't see a way to set it for a checker before Start(). Is this possible?

    Thanks, Milan

    v3 feature request 
    opened by gen2brain 3
  • Omit empty fields of State struct

    There are cases where some fields appear in the serialized State without a meaningful value. For example, if a component's status is ok, the fields first_failure_at and num_failures are still present with zero values.

    There are also situations where the State struct could be used as a DTO for the overall status of the system. In those cases it is useful for name and check_time to be omitted as well, since they will also be empty.

    The Fatal field, however, must always be present so you know whether a check is fatal or not, rather than being present only for fatal checks and omitted for non-fatal ones.

    v3 
    opened by DimitarPetrov 2
Releases(v2.1.2)
Owner
InVision