Time Series Alerting Framework

Overview

Bosun

Bosun is a time series alerting framework developed by Stack Exchange. Scollector is a metric collection agent. Learn more at bosun.org.

Building

bosun and scollector are found under the cmd directory. Run go build in the corresponding directories to build each project. There's also a Makefile available for most tasks.
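
For example:

$ cd cmd/bosun && go build
$ cd ../scollector && go build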

Running

For a full stack with all dependencies, run docker-compose up from the docker directory. Don't forget to rebuild images and containers if you change the code:

$ cd docker
$ docker-compose down
$ docker-compose up --build

If you only need the dependencies (Redis, OpenTSDB, HBase) and would like to run Bosun on your machine directly (e.g. to attach a debugger), you can bring up the dependencies with these three commands from the repository's root:

$ docker run -p 6379:6379 --name redis redis:6
$ docker build -f docker/opentsdb.Dockerfile -t opentsdb .
$ docker run -p 4242:4242 --name opentsdb opentsdb

The OpenTSDB container will be reachable at http://localhost:4242. Redis listens on its default port 6379. Bosun, if brought up in a Docker container, is available at http://localhost:8070.
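
With the dependencies up, a minimal local run might look like this (a sketch assuming a bosun.conf that sets httpListen = :8070 and tsdbHost = localhost:4242, as in the example config quoted later on this page):

$ cd cmd/bosun && go build
$ ./bosun -c bosun.conf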

Developing

Install:

  • Run make deps and make testdeps to set up all dependencies.
  • Run make generate when new static assets (like JS and CSS files) are added or changed.

The w.sh script will automatically build and run bosun in a loop. It will update itself when go/js/ts files change, and it runs in read-only mode, not sending any alerts.

$ cd cmd/bosun
$ ./w.sh

Go Version:

  • See the version number in .travis.yml in the root of this repo for the version of Go to use. Generally speaking, you should be able to use newer versions of Go if you are able to build Bosun without error.

Miniprofiler:

  • Bosun includes miniprofiler in the web UI which can help with debugging. The key combination ALT-P will show miniprofiler. This allows you to see timings, as well as the raw queries sent to TSDBs.

Issues
  • Support influxdb

    It would help if Bosun supported InfluxDB. I didn't find a bug tracking this, so here it is.

    Since I have multiple data sources (collectd, statsite) sending data to InfluxDB, it would keep my dependencies low if I could use InfluxDB for Bosun rather than migrating the entire system to OpenTSDB.

    enhancement Needs Review / Implementation Plan bosun influxdb 
    opened by fire 52
  • Multiple backends of the same type?

    Is it possible to have multiple instances of the same type of backend? For example, multiple InfluxDB backends or multiple Elasticsearch backends? I ask because I'm trying to pull in data from two separate instances, but simply creating a duplicate key results in a config error: fatal: main.go:88: conf: bosun.config:2:0: at <influxHost = xx.xx.x...>: duplicate key: influxHost

    enhancement bosun wontfix 
    opened by aodj 31
  • Distributed alert checks to prevent high load spikes

    This is a solution for #2065

    The idea behind this is simple. Every check run is slightly shifted so that the checks are distributed uniformly.

    For the subset of checks that run with period T, a shift is added to every check. The shift ranges from 0 to T-1, assigned incrementally. For example, if we have 6 checks every 5 minutes (T=5), the shifts will be 0, 1, 2, 3, 4, 0. This way, without the patch all 6 checks happen at times 0 and 5; with the patch, two checks happen at time 0, one at 1, one at 2, and so on. The total number of checks and the check period stay the same.
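
    A minimal Go sketch of this shift assignment (names are illustrative, not taken from the actual patch):

    // assignShifts spreads checks that share the same period T (in minutes)
    // uniformly over that period: check i gets shift i mod T.
    func assignShifts(numChecks, periodMinutes int) []int {
        shifts := make([]int, numChecks)
        for i := range shifts {
            shifts[i] = i % periodMinutes // 0, 1, ..., T-1, 0, 1, ...
        }
        return shifts
    }

    Here assignShifts(6, 5) returns [0 1 2 3 4 0], matching the example above.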

    Here is a test that shows the effect of the patch on system load; note that the majority of checks in this system have a 5 minute period.

    opened by grzkv 27
  • Config management

    I want to deploy Bosun as a dashboard & alerting system within my organization, but I feel that having config management be completely external to Bosun is a major drawback. It would be super fantastic if it were possible to, entirely through the web interface, define, test, and commit a new alert, or to update an existing alert to tweak its parameters.

    Is anything like this in the works? How do you manage this in your existing deployments?

    enhancement Needs Review / Implementation Plan bosun 
    opened by nornagon 24
  • Support Dependencies

    Problem: Something goes down which results in lots of other things being down; because of this, we get a lot of alerts.

    Common Examples:

    • A Network Partition: Some portion of hosts become unavailable from bosun's perspective
    • Host Goes Down: Everything monitored on that host becomes unavailable
    • Service dependencies: We expect some service to go down if another service goes down
    • Bosun can't query its database (this is probably a different feature, but noting it here nonetheless)

    Things I want to be able to do based on our config at Stack Exchange:

    • Have our host-based alert macro detect if the host is in Oregon (because the host name has the "or-" prefix), so this is basically a dependency based on a lookup table
    • Have our host-based alerts not trigger if bosun is unable to ping the host (which would be another alert most likely)
    • Be able to have dependencies for alerts that may have no group.

    The status for any alert instance that is suppressed by a dependency should be "unevaluated". This won't show up on the dashboard or trigger notifications.

    Two general approaches come to mind. The first is that a dependency references another alert: that other alert is run first, and whether this alert triggers depends on its result. The other is that dependencies are an expression. I think the expression route only really makes sense if an alert itself can be used as an expression.

    Another possibility which I haven't thought much about is that alerts generate dependencies and not the other way around. So for example, an alert marks some tagset as something that should not be evaluated.

    Making Stuff Up....

    macro ping_location {
        template = ping.location
        $pq = max(q("sum:bosun.ping.timeout{dst_host=$loc*,host=$source}", "5m", ""))
        $grouped = t($pq,"")
        $hosts_timing_out = sum($grouped)
        $total_hosts = len($grouped)
        $percent_timeout = $hosts_timing_out / $total_hosts * 100
        crit = $percent_timeout > 10
    }
    
    #group is empty
    alert or_hosts_down {
        $source=ny-bosun01
        $loc = or-
        $name = OR Peak
        macro = ping_location
    }
    
    #Group is {dst_host=*}
    alert host_down {
       template = host_down
       crit = max(q("sum:bosun.ping.timeout{dst_host=*}", "5m", ""))
    }
    
    lookup location {
        entry host=or-* {
            alert = alert("or_hosts_down")
        }
        ...
    }
    
    macro host_based {
       #This makes it so alerts based on this macro that are host based won't trigger if the dependency alert is triggering
       dependency = lookup("location", "alert") || alert("host_down")
       #Another idea here is that you can create tag synonyms for an alert. So instead
       #of having to add this lookup function that translates, have a synonym feature of
       #alerts (and also global) that says "consider this tag key to be the same as
       #this tag key". This would also solve an issue with silences (i.e. silencing
       #host=ny-web11 doesn't do anything for the haproxy alert that has hosts as
       #svname). Another issue with that is that those alerts are not tag based, so we
       #actually need inhibit in that case.
    }
    
    
    bosun Needs Documentation 
    opened by kylebrandt 22
  • Bosun sending notifications for closed and inactive alerts

    We have a very simple rule file, with 3 notifications (HTTP POST to PagerDuty and Slack, plus email) and a bunch of alert rules which trigger them. We are facing a weird issue wherein the following happens:

    • alert triggers, sends notifications
    • a human acks the alert
    • human solves problem, alert becomes inactive
    • human closes the alert
    • notification still keeps triggering (the alert is nowhere to be seen in the Bosun UI/API) - forever!

    To explain it through logs, this is quite literally what we're seeing:

    2016/04/01 07:56:37 info: check.go:513: check alert masked.masked.write.rate.too.low start
    2016/04/01 07:26:38 info: check.go:537: check alert masked.masked.write.rate.too.low done (1.378029647s): 0 crits, 0 warns, 0 unevaluated, 0 unknown
    2016/04/01 07:26:38 info: alertRunner.go:55: runHistory on masked.masked.write.rate.too.low took 54.852815ms
    2016/04/01 07:26:39 info: search.go:205: Backing up last data to redis
    2016/04/01 07:28:20 info: notify.go:57: [bosun] critical: component xyz write rate too low: 0.00 records/minute in {adaptor=masked-masked-masked,colo=xyz,stream=writeAttributeToKafka}
    2016/04/01 07:28:20 info: notify.go:57: [bosun] critical: component xyz write rate too low: 0.00 records/minute in {adaptor=masked-masked-masked,colo=xyz,stream=writeActivityToKafka}
    2016/04/01 07:28:20 info: notify.go:57: [bosun] critical: component xyz write rate too low: 0.00 records/minute in {adaptor=masked-masked-masked,colo=xyz,stream=writeAttributeToKafka}
    2016/04/01 07:28:20 info: notify.go:57: [bosun] critical: component xyz write rate too low: 0.00 records/minute in {adaptor=masked-masked-masked,colo=xyz,stream=writeActivityToKafka}
    2016/04/01 07:28:20 info: notify.go:57: [bosun] critical: component xyz write rate too low: 0.00 records/minute in {adaptor=masked-masked-masked,colo=xyz,stream=writeAttributeToKafka}
    2016/04/01 07:28:20 info: notify.go:57: [bosun] critical: component xyz write rate too low: 0.00 records/minute in {adaptor=masked-masked-masked,colo=xyz,stream=writeActivityToKafka}
    2016/04/01 07:28:20 info: notify.go:115: relayed alert masked.masked.write.rate.too.low{adaptor=masked-masked-masked,colo=xyz,stream=writeAttributeToKafka} to [[email protected]] sucessfully. Subject: 148 bytes. Body: 3500 bytes.
    2016/04/01 07:28:20 info: notify.go:115: relayed alert masked.masked.write.rate.too.low{adaptor=masked-masked-masked,colo=xyz,stream=writeActivityToKafka} to [[email protected]] sucessfully. Subject: 147 bytes. Body: 3497 bytes.
    2016/04/01 07:28:20 info: notify.go:80: post notification successful for alert masked.masked.write.rate.too.low{adaptor=masked-masked-masked,colo=xyz,stream=writeAttributeToKafka}. Response code 200.
    2016/04/01 07:28:20 info: notify.go:80: post notification successful for alert masked.masked.write.rate.too.low{adaptor=masked-masked-masked,colo=xyz,stream=writeActivityToKafka}. Response code 200.
    2016/04/01 07:28:20 info: notify.go:80: post notification successful for alert masked.masked.write.rate.too.low{adaptor=masked-masked-masked,colo=xyz,stream=writeAttributeToKafka}. Response code 200.
    2016/04/01 07:28:20 info: notify.go:80: post notification successful for alert masked.masked.write.rate.too.low{adaptor=masked-masked-masked,colo=xyz,stream=writeActivityToKafka}. Response code 200.

    bug bosun 
    opened by angadsingh 20
  • Use templates body as payload for notifications and subject for other HTML related stuff

    Hi all, as described in the docs, I'm using the template's subject as the body for POSTing stuff to our HipChat bot. The problem I encounter is in Bosun's main view (the list of alerts), where the template subject is presented when clicking an alert for details.

    My suggestion is to use the template's body as the payload for notifications (POST notifications mainly). A flag could also be added to let the user choose which templates use the subject as the payload and which use the body.

    Thanks, Yarden

    Notifications Post Notifications Crappy 
    opened by ayashjorden 20
  • Add Recovery Emails

    When an alert instance goes from (Unknown, Warning, or Critical) to Normal, a recovery email should be sent.

    Considerations:

    • Should recovery templates be their own template? I think they should, and repeated logic can be done via include templates.
    • Who to notify? The same notifications that were notified of the previous state.
    • Notifications will need a no_recovery option. This is needed if we want to hook up alerts to PagerDuty (we don't want our phones being dialed to let us know that an issue has recovered; at that point we can rely on email).

    My main reservation about this feature is that users are less likely to investigate an alert that has recovered, which is dangerous because the alert could indicate a latent issue. However, it is better to provide a frictionless workflow than a roadblock. Bosun aims to provide all the tools needed for very informative notifications so good judgements can be made, at times without needing to go to a console. Furthermore, we should also add acknowledgement notifications: a way to inform all recipients of an alert that someone has made a decision about it and hopefully committed to an action (fixing the actual problem, or tuning the alert).

    Ack emails will be described in another issue.

    This feature needs discussion and review prior to implementation.

    enhancement Needs Review / Implementation Plan bosun wontfix 
    opened by kylebrandt 20
  • Memory leak in Bosun

    I updated our test servers to the latest version of bosun from https://github.com/bosun-monitor/bosun/releases/download/20150428222252/bosun-linux-amd64 After running for slightly less than a day, it stopped responding.

    The command line where I started it revealed:

     ./bosun-linux-amd64 -c=/data/bosun.conf
    2015/05/04 16:21:54 enabling syslog
    Killed
    

    Syslog (cat /var/log/messages |grep bosun) did not reveal any log messages in the hours before the crash.

    It looks like a memory leak: the graph of bosun.collect.alloc grew gradually from 200 MB after deploying the new version to 12 GB just before the "crash".

    Looking back over the last week at the memory behaviour of the previous version, there was a similar memory growth pattern there too, but at a much slower rate: gradual memory increase over the course of a week, followed by two rapid increases after deploying the newer version.

    Just for interest's sake, the other stats on a general Bosun dashboard look reasonable. Although there is a high number of goroutines after restarting Bosun, this appears unrelated to the leak.

    More information about our setup:

    • Backend: OpenTSDB
    • Data is being passed through Bosun to OpenTSDB (as visible from the dashboard)
    • We send data points every minute at a rate of about 37000 per minute
    • In addition, scollector is submitting data from one machine, monitoring OpenTSDB, Elasticsearch, Bosun, and the OS (Linux)
    • The rule file is still a small prototype:
    httpListen = :8070
    tsdbHost = localhost:4242
    
    smtpHost = ******
    emailFrom = ******
    
    macro grafanaConfig {
        $grafanaHost = ******
    }
    
    notification emailIzak {
        email = [email protected]
        next = emailIzak
        timeout = 24h
    }
    
    
    ##################### Templates #######################
    
    
    template generic {
        body = `{{template "genericHeader" .}}
        {{template "genericDef" .}}
    
        {{template "genericTags" .}}
    
        {{template "genericComputation" .}}
    
         {{if .Alert.Vars.graph}}
         <h3>{{.Alert.Vars.graphTitle}}</h3>
        <p>{{.Graph .Alert.Vars.graph}}
        {{end}}`
    
        subject =  {{.Last.Status}}: {{.Alert.Name}} on instance {{.Group.serviceinstance}}
    }
    
    template genericHeader {   
        body = `
        <h3> Possible actions </h3>   
        {{if .Alert.Vars.note}}
            <p>{{.Alert.Vars.note}}
        {{end}}
         <p><a href="{{.Ack}}">Acknowledge alert</a>
    
        {{if .Alert.Vars.grafanaDash}}
        <p><a href="{{.Alert.Vars.grafanaDash}}"> View the relevant statistics dashboard </a>
        {{end}}
        `
    }
    
    template genericDef {
        body = `
        <h3> Details </h3>
        <p><strong>Alert definition:</strong>
        <table>
            <tr>
                <td>Name:</td>
                <td>{{replace .Alert.Name "." " " -1}}</td></tr>
            <tr>
                <td>Warn:</td>
                <td>{{.Alert.Warn}}</td></tr>
            <tr>
                <td>Crit:</td>
                <td>{{.Alert.Crit}}</td></tr>
        </table>`
    }
    
    template genericTags {
        body = `<p><strong>Tags</strong>
    
        <table>
            {{range $k, $v := .Group}}
                {{if eq $k "host"}}
                    <tr><td>{{$k}}</td><td><a href="{{$.HostView $v}}">{{$v}}</a></td></tr>
                {{else}}
                    <tr><td>{{$k}}</td><td>{{$v}}</td></tr>
                {{end}}
            {{end}}
        </table>`
    }
    
    template genericComputation {
        body = `
        <p><strong>Computation</strong>
    
        <table>
            {{range .Computations}}
                <tr><td><a href="{{$.Expr .Text}}">{{.Text}}</a></td><td>{{.Value}}</td></tr>
            {{end}}
        </table>`
    }
    
    template unkown {
        subject = {{.Name}}: {{.Group | len}} unknown alerts. 
        body = `
        <p>Unknown alerts imply no data is being recorded for their monitored time series. Therefore we cannot know what is happening. 
        <p>Time: {{.Time}}
        <p>Name: {{.Name}}
        <p>Alerts:
        {{range .Group}}
            <br>{{.}}
        {{end}}`
    }
    
    unknownTemplate = unkown
    
    
    #################### alerts #######################
    
    
    alert FlowRouterBytesZero {
        template = generic
        $query = "sum:bytes.bytes.counter.value{serviceinstance=*}"
    
        $note = The flow router has reported zero bytes in the last 2 minutes. This note should contain extra information specifying what action the operator should take to resolve it. 
        $graph =q($query, "24h", "")
        $graphTitle = Flow router traffic in the last 24 hours
        macro = grafanaConfig
        $grafanaDash = $grafanaHost/dashboard/db/per-flow-route-bytes-drill-down
    
        $avgBytesPer2Min = avg(q($query, "2m", ""))
        $avgBytesPer5Min = avg(q($query, "5m", ""))
    
        warn =  $avgBytesPer2Min == 0
        crit =  $avgBytesPer5Min == 0
        critNotification = emailIzak
    }
    
    
    opened by IzakMarais 17
  • Add series aggregation DSL function `aggregate`

    This PR adds an aggregate DSL function, which allows one to combine different series in a seriesSet using a specified aggregator (currently min, max, p50, avg).

    This is particularly useful when comparing data across different weeks (using the over function). In our case, for anomaly detection, we want to compare the current day's data with an aggregated view of the same day in previous weeks. In particular, we want to compare each point in the last day to the median of the corresponding points in the same day over the last 3 weeks, so that any anomalies that occurred in a previous week are ignored. This way we compare with a hypothetical "perfect" day.

    For example:

    $weeks = over("avg:10m-avg-zero:os.cpu", "24h", "1w", 3)
    $a = aggregate($weeks, "", "p50")
    merge($a, $q)
    

    Or, if we wanted to combine series but maintain the region and color groups, the query would look like this:

    $weeks = over("avg:10m-avg-zero:os.cpu{region=*,color=*}", "24h", "1w", 3)
    aggregate($weeks, "region,color", "p50")
    

    which would result in one merged series for each unique region/color combination.
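
    For intuition, a pointwise p50 can be sketched in Go as follows (illustrative only; the PR's actual implementation operates on Bosun's seriesSet type and supports min, max, p50, and avg):

    package main

    import (
        "fmt"
        "sort"
    )

    // aggregateP50 takes several series sampled at the same timestamps
    // and returns the pointwise median series.
    func aggregateP50(series [][]float64) []float64 {
        if len(series) == 0 {
            return nil
        }
        out := make([]float64, len(series[0]))
        vals := make([]float64, len(series))
        for t := range out {
            for i, s := range series {
                vals[i] = s[t]
            }
            sort.Float64s(vals)
            out[t] = vals[len(vals)/2] // upper median for even counts
        }
        return out
    }

    func main() {
        weeks := [][]float64{{1, 9, 3}, {2, 2, 3}, {8, 4, 3}}
        fmt.Println(aggregateP50(weeks)) // [2 4 3]
    }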

    I am very happy to take suggestions for changes / improvements. With regard to naming the function, I would probably have chosen "merge", but since that is already taken, I went with the OpenTSDB terminology and used "aggregate".

    opened by hermanschaaf 16
  • Unable to query bosun after running for a minute

    I have installed HBase, OpenTSDB and Bosun on a machine running CentOS 7. I can see the Bosun website fine, but any query I try to run from the graph page gives an error. I've put the Bosun output into a log file, and there are 2 kinds of errors that pop up. Sometimes it's too many open files:

    2016/03/04 11:10:23 error: queue.go:102: Post http://localhost:8070/api/put: dial tcp 127.0.0.1:8070: socket: too many open files

    Sometimes it's just a timeout.

    2016/03/04 11:14:06 error: queue.go:102: Post http://localhost:8070/api/put: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

    Sometimes restarting seems to help, other times not so much. The longest I've had bosun running without these errors is a day.

    opened by VictoriaD 16
  • Added "* L4TOUT" to haproxyCheckStatus

    Description

    Scollector did not manage to collect data from HAProxy (HAProxy version 2.0.13-2ubuntu0.5) and logged this error:

    Apr 28 16:26:34 ServerName scollector[1741859]: error: interval.go:65: haproxy-1-http://localhost:1936/;csv: unknown check status * L4TOUT
    Apr 28 16:26:49 ServerName scollector[1741859]: error: interval.go:65: haproxy-1-http://localhost:1936/;csv: unknown check status * L4TOUT
    

    Simply added "* L4TOUT" so that it's a valid check status for haproxyCheckStatus.
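
    Conceptually, the collector validates statuses against a fixed set; a sketch of the shape of the fix in Go (illustrative; the real table in scollector's HAProxy collector may differ):

    package collectors // illustrative sketch, not the actual patch

    // Check statuses scollector accepts; anything else produces the
    // "unknown check status" error shown above. HAProxy prefixes the
    // status with "*" while a check is in progress.
    var haproxyCheckStatus = map[string]bool{
        "L4OK":     true,
        "L4TOUT":   true,
        "* L4TOUT": true, // the newly added entry
    }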

    Type of change

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [ ] This change requires a documentation update

    How has this been tested?

    • [x] HAProxy collection now works again for HAProxy version 2.0.13-2ubuntu0.5

    Checklist:

    • [x] This contribution follows the project's code of conduct
    • [x] This contribution follows the project's contributing guidelines
    • [ ] My code follows the style guidelines of this project
    • [x] I have performed a self-review of my own code
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have made corresponding changes to the documentation
    • [ ] I have added tests that prove my fix is effective or that my feature works
    • [ ] New and existing unit tests pass locally with my changes
    • [ ] Any dependent changes have been merged and published in downstream modules
    opened by AlexanderRydberg 0
  • Only process some metrics when OpenTSDB is enabled

    Description

    When OpenTSDB is not enabled, processing metrics destined for OpenTSDB is wasted work.

    The underlying reason to make this change is to make the scheduler run more accurately.

    In production, it takes about 100-300 ms to process these metrics. Suppose processing always takes 200 ms and an alert is scheduled to run every minute: the effective period becomes 60.2 s, so the number of alert executions in one day becomes 60 * 60 * 24 / 60.2 = 1435.2, fewer than the expected 1440. Whether losing those ~5 executions matters depends on the use case, and people may have different opinions.

    The real problem we have is that an important minutely SLO metric, bosun_uptime, relies on the accuracy of the scheduler. In the current situation, because of this extra processing time, the minutely alert's start time slips by 1 s every few minutes, which causes missing data points for that metric.

    Ideally we might introduce jitter to reduce the impact of the metrics processing time, or optimize the processing itself, but both are tricky to implement. This change is not very elegant, but it is straightforward.

    Type of change

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [ ] This change requires a documentation update

    How has this been tested?

    Tested in production.

    Checklist:

    • [x] This contribution follows the project's code of conduct
    • [x] This contribution follows the project's contributing guidelines
    • [x] My code follows the style guidelines of this project
    • [x] I have performed a self-review of my own code
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have made corresponding changes to the documentation
    • [ ] I have added tests that prove my fix is effective or that my feature works
    • [ ] New and existing unit tests pass locally with my changes
    • [ ] Any dependent changes have been merged and published in downstream modules
    opened by harudark 0
  • Enable scheduled web cache cleanup

    Description

    var cacheObj = cache.New("web", 100) is a cache for web requests. For some heavy Graphite queries, the cache keeps the memory used by JSON unmarshalling from being released for a long time. This change creates a scheduled task to clear the cache.
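
    A minimal sketch of the scheduling side in Go, assuming the cache exposes some wholesale-eviction method (names here are illustrative; the real method in Bosun's cache package may differ):

    package web // illustrative sketch, not the actual patch

    import "time"

    // Clearable is whatever the web cache exposes for dropping all entries.
    type Clearable interface{ Clear() }

    // scheduleCacheClear clears c at a fixed interval in a background goroutine.
    func scheduleCacheClear(c Clearable, interval time.Duration) {
        go func() {
            for range time.Tick(interval) {
                c.Clear()
            }
        }()
    }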

    Type of change

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [ ] This change requires a documentation update

    How has this been tested?

    This has been running with the following configuration in production.

    ......
    # Enable scheduled web cache clear task. Default is false.
    ScheduledClearWebCache = true
    
    # The frequency of scheduled web cache clear task. Default is "24h".
    ScheduledClearWebCacheDuration = "24h"
    ......
    

    Checklist:

    • [x] This contribution follows the project's code of conduct
    • [x] This contribution follows the project's contributing guidelines
    • [x] My code follows the style guidelines of this project
    • [x] I have performed a self-review of my own code
    • [x] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have made corresponding changes to the documentation
    • [ ] I have added tests that prove my fix is effective or that my feature works
    • [ ] New and existing unit tests pass locally with my changes
    • [ ] Any dependent changes have been merged and published in downstream modules
    opened by harudark 0
  • Improve post notification metrics

    Description

    • add 3xx, 4xx and 5xx breakdowns
    • consider network errors as post failures (see the sketch below)
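
    A sketch of the classification this implies, in Go (illustrative; the actual metric names and plumbing are omitted):

    package notify // illustrative sketch

    import "net/http"

    // bucket maps the outcome of a notification POST to a metric label.
    func bucket(resp *http.Response, err error) string {
        if err != nil {
            return "network_error" // transport failures count as post failures
        }
        switch {
        case resp.StatusCode >= 500:
            return "5xx"
        case resp.StatusCode >= 400:
            return "4xx"
        case resp.StatusCode >= 300:
            return "3xx"
        default:
            return "2xx"
        }
    }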

    Type of change

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [ ] This change requires a documentation update

    How has this been tested?

    • [x] I queried the api/health endpoint and verified the metrics are as expected

    Checklist:

    • [x] This contribution follows the project's code of conduct
    • [x] This contribution follows the project's contributing guidelines
    • [x] My code follows the style guidelines of this project
    • [x] I have performed a self-review of my own code
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have made corresponding changes to the documentation
    • [ ] I have added tests that prove my fix is effective or that my feature works
    • [ ] New and existing unit tests pass locally with my changes
    • [ ] Any dependent changes have been merged and published in downstream modules
    opened by harudark 0
  • Fix route name of `/api/reload`

    Description

    Fixes route name of /api/reload

    Type of change

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [ ] This change requires a documentation update

    How has this been tested?

    No tests required.

    Checklist:

    • [x] This contribution follows the project's code of conduct
    • [x] This contribution follows the project's contributing guidelines
    • [x] My code follows the style guidelines of this project
    • [x] I have performed a self-review of my own code
    • [x] I have commented my code, particularly in hard-to-understand areas
    • [x] I have made corresponding changes to the documentation
    • [x] I have added tests that prove my fix is effective or that my feature works
    • [x] New and existing unit tests pass locally with my changes
    • [x] Any dependent changes have been merged and published in downstream modules
    wontfix 
    opened by Alex-Hou 1
  • Clarify release status

    We package this for NixOS, and we like to use the latest stable release from upstream.

    https://github.com/bosun-monitor/bosun/releases/tag/0.8.0-preview is listed as the latest release on GitHub. Is it a stable release or should it be marked pre-release? I ask because it has the "-preview" suffix attached to it, making me think it is an unstable release.

    bug 
    opened by ryantm 4
Releases (0.8.0-preview)