Hard Drive S.M.A.R.T Monitoring, Historical Trends & Real World Failure Thresholds

Overview


WebUI for smartd S.M.A.R.T monitoring

NOTE: Scrutiny is a Work-in-Progress and still has some rough edges.

WARNING: Once the InfluxDB branch is merged, Scrutiny will use both sqlite and InfluxDB for data storage. Unfortunately, this may not be backwards compatible with the database structures in the master (sqlite only) branch.

Introduction

If you run a server with more than a couple of hard drives, you're probably already familiar with S.M.A.R.T and the smartd daemon. If not, it's an incredible open source project described as the following:

smartd is a daemon that monitors the Self-Monitoring, Analysis and Reporting Technology (SMART) system built into many ATA, IDE and SCSI-3 hard drives. The purpose of SMART is to monitor the reliability of the hard drive and predict drive failures, and to carry out different types of drive self-tests.

These S.M.A.R.T hard drive self-tests can help you detect and replace failing hard drives before they cause permanent data loss. However, there are a couple of issues with smartd:

  • There are more than a hundred S.M.A.R.T attributes, but smartd does not differentiate between critical and informational metrics.
  • smartd does not record S.M.A.R.T attribute history, so it can be hard to determine if an attribute is degrading slowly over time.
  • S.M.A.R.T attribute thresholds are set by the manufacturer. In some cases these thresholds are unset, or are so high that they can only be used to confirm a failed drive, rather than detecting a drive about to fail.
  • smartd is a command-line-only tool. For headless servers a web UI would be more valuable.

Scrutiny is a Hard Drive Health Dashboard & Monitoring solution, merging manufacturer provided S.M.A.R.T metrics with real-world failure rates.

Features

Scrutiny is a simple but focused application, with a couple of core features:

  • Web UI Dashboard - focused on Critical metrics
  • smartd integration (no re-inventing the wheel)
  • Auto-detection of all connected hard-drives
  • S.M.A.R.T metric tracking for historical trends
  • Customized thresholds using real world failure rates
  • Temperature tracking
  • Provided as an all-in-one Docker image (but can be installed manually)
  • Configurable Alerting/Notifications via Webhooks
  • (Future) Hard Drive performance testing & tracking

Getting Started

RAID/Virtual Drives

Scrutiny uses smartctl --scan to detect devices/drives.

  • All RAID controllers supported by smartctl are automatically supported by Scrutiny.
    • While some RAID controllers support passing through the underlying SMART data to smartctl others do not.
    • In some cases --scan does not correctly detect the device type, returning incomplete SMART data. Scrutiny will eventually support overriding detected device type via the config file.
  • If you use Docker, you must pass through the RAID virtual disk to the container using --device (see below)
    • This device may be in /dev/* or /dev/bus/*.
    • If you're unsure, run smartctl --scan on your host, and pass all listed devices to the container.
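
As a rough illustration of that last tip, the sketch below turns the JSON output of smartctl --scan -j into a de-duplicated list of --device flags. The function name device_flags and the trimmed SAMPLE input are hypothetical; this is not Scrutiny's own code, and whether every listed node can actually be passed to a container depends on your host.

```python
import json


def device_flags(scan_json: str) -> list[str]:
    """Return one --device flag per unique device node in `smartctl --scan -j` output."""
    report = json.loads(scan_json)
    seen: list[str] = []
    for dev in report.get("devices", []):
        name = dev["name"]
        # RAID members can share a single node (for example /dev/bus/0),
        # so each node is only listed once.
        if name not in seen:
            seen.append(name)
    return [f"--device={name}" for name in seen]


# Trimmed, illustrative sample of the scan output shape (assumption).
SAMPLE = """
{"devices": [
  {"name": "/dev/sda", "type": "scsi"},
  {"name": "/dev/bus/0", "type": "megaraid,1"},
  {"name": "/dev/bus/0", "type": "megaraid,2"}
]}
"""

print(" ".join(device_flags(SAMPLE)))
```

The printed flags can be pasted into the docker run command in place of the example --device entries.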

Docker

If you're using Docker, getting started is as simple as running the following command:

docker run -it --rm -p 8080:8080 \
-v /run/udev:/run/udev:ro \
--cap-add SYS_RAWIO \
--device=/dev/sda \
--device=/dev/sdb \
--name scrutiny \
analogj/scrutiny

  • /run/udev is necessary to provide the Scrutiny collector with access to your device metadata
  • --cap-add SYS_RAWIO is necessary to allow smartctl permission to query your device SMART data
    • NOTE: If you have NVMe drives, you must add --cap-add SYS_ADMIN as well. See issue #26
  • --device entries are required to ensure that your hard disk devices are accessible within the container.
  • analogj/scrutiny is an omnibus image, containing both the webapp server (frontend & api) as well as the S.M.A.R.T metric collector. (see below)

Hub/Spoke Deployment

In addition to the Omnibus image (available under the latest tag) there are 2 other Docker images available:

  • analogj/scrutiny:collector - Contains the Scrutiny data collector, smartctl binary and cron-like scheduler. You can run one collector on each server.
  • analogj/scrutiny:web - Contains the Web UI, API and Database. Only one container is necessary.

docker run -it --rm -p 8080:8080 \
--name scrutiny-web \
analogj/scrutiny:web

docker run -it --rm \
-v /run/udev:/run/udev:ro \
--cap-add SYS_RAWIO \
--device=/dev/sda \
--device=/dev/sdb \
-e SCRUTINY_API_ENDPOINT=http://SCRUTINY_WEB_IPADDRESS:8080 \
--name scrutiny-collector \
analogj/scrutiny:collector

Manual Installation (without-Docker)

While the easiest way to get started with Scrutiny is using Docker, it is possible to run it manually without much work. You can even mix and match, using Docker for one component and a manual installation for the other.

See docs/INSTALL_MANUAL.md for instructions.

Usage

Once scrutiny is running, you can open your browser to http://localhost:8080 and take a look at the dashboard.

If you're using the omnibus image, the collector should already have run, and your dashboard should be populated with every drive that Scrutiny detected. The collector is configured to run once a day, but you can trigger it manually by running the command below.

For users of the Docker Hub/Spoke deployment or manual install: initially the dashboard will be empty. After the first collector run, you'll be greeted with a list of all your hard drives and their current SMART status.

docker exec scrutiny /scrutiny/bin/scrutiny-collector-metrics run

Configuration

By default Scrutiny looks for its YAML configuration files in /scrutiny/config.

There are two configuration files available:

  • Webapp/API config via scrutiny.yaml
  • Collector config via collector.yaml

Neither file is required; however, if provided, they let you configure how Scrutiny functions.

Notifications

Scrutiny supports sending SMART device failure notifications via the following services:

  • Custom Script (data provided via environment variables)
  • Email
  • Webhooks
  • Discord
  • Gotify
  • Hangouts
  • IFTTT
  • Join
  • Mattermost
  • Pushbullet
  • Pushover
  • Slack
  • Teams
  • Telegram
  • Zulip

Check the notify.urls section of example.scrutiny.yml for more information and documentation for service specific setup.

Testing Notifications

You can test that your notifications are configured correctly by posting an empty payload to the notifications health check API.

curl -X POST http://localhost:8080/api/health/notify
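
If curl is not available, the same check can be sketched with Python's standard library. The helper names build_notify_test_request and send_notify_test are hypothetical, and the default base URL simply mirrors the example above; adjust it to wherever your web server is reachable.

```python
import urllib.request


def build_notify_test_request(base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build an empty-payload POST to the notifications health check endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/health/notify",
        data=b"",       # empty body, mirroring the curl call
        method="POST",
    )


def send_notify_test(base_url: str = "http://localhost:8080") -> int:
    """Send the request and return the HTTP status code.

    Needs a running Scrutiny web server; connection or HTTP errors
    are raised by urllib as URLError / HTTPError.
    """
    request = build_notify_test_request(base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling send_notify_test() against a running instance should trigger the configured notification services.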

Debug mode & Log Files

Scrutiny provides various methods to change the log level to debug and generate log files.

Web Server/API

You can use environment variables to enable debug logging and/or log files for the web server:

DEBUG=true
SCRUTINY_LOG_FILE=/tmp/web.log

You can configure the log level and log file in the config file:

log:
  file: '/tmp/web.log'
  level: DEBUG

Or if you're not using docker, you can pass CLI arguments to the web server during startup:

scrutiny start --debug --log-file /tmp/web.log

Collector

You can use environment variables to enable debug logging and/or log files for the collector:

DEBUG=true
COLLECTOR_LOG_FILE=/tmp/collector.log

Or if you're not using docker, you can pass CLI arguments to the collector during startup:

scrutiny-collector-metrics run --debug --log-file /tmp/collector.log

Contributing

Please see CONTRIBUTING.md for instructions on how to develop and contribute to the Scrutiny codebase.

Work your magic and then submit a pull request. We love pull requests!

If you find the documentation lacking, help us out and update this README.md. If you don't have the time to work on Scrutiny, but found something we should know about, please submit an issue.

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Authors

Jason Kulatunga - Initial Development - @AnalogJ

Licenses

Sponsors

Scrutiny is only possible with the help of my GitHub Sponsors.

They read a simple Reddit announcement post and decided to trust & finance a developer they've never met. It's an exciting and incredibly humbling experience.

If you found Scrutiny valuable, please consider supporting my work.

Comments
  • NVMe drives not correctly detected by Scrutiny


    Output of

    # smartctl -j -x /dev/nvme0
    
    {
      "json_format_version": [
        1,
        0
      ],
      "smartctl": {
        "version": [
          7,
          0
        ],
        "svn_revision": "4883",
        "platform_info": "x86_64-linux-4.19.107-Unraid",
        "build_info": "(local build)",
        "argv": [
          "smartctl",
          "-j",
          "-x",
          "/dev/nvme0"
        ],
        "exit_status": 0
      },
      "device": {
        "name": "/dev/nvme0",
        "info_name": "/dev/nvme0",
        "type": "nvme",
        "protocol": "NVMe"
      },
      "model_name": "Force MP510",
      "serial_number": "yes",
      "firmware_version": "ECFM12.3",
      "nvme_pci_vendor": {
        "id": 6535,
        "subsystem_id": 6535
      },
      "nvme_ieee_oui_identifier": 6584743,
      "nvme_total_capacity": 480103981056,
      "nvme_unallocated_capacity": 0,
      "nvme_controller_id": 1,
      "nvme_number_of_namespaces": 1,
      "nvme_namespaces": [
        {
          "id": 1,
          "size": {
            "blocks": 937703088,
            "bytes": 480103981056
          },
          "capacity": {
            "blocks": 937703088,
            "bytes": 480103981056
          },
          "utilization": {
            "blocks": 937703088,
            "bytes": 480103981056
          },
          "formatted_lba_size": 512,
          "eui64": {
            "oui": 6584743,
            "ext_id": 171819811633
          }
        }
      ],
      "user_capacity": {
        "blocks": 937703088,
        "bytes": 480103981056
      },
      "logical_block_size": 512,
      "local_time": {
        "time_t": 1600380529,
        "asctime": "Thu Sep 17 22:08:49 2020 Europe"
      },
      "smart_status": {
        "passed": true,
        "nvme": {
          "value": 0
        }
      },
      "nvme_smart_health_information_log": {
        "critical_warning": 0,
        "temperature": 38,
        "available_spare": 100,
        "available_spare_threshold": 5,
        "percentage_used": 1,
        "data_units_read": 6734413,
        "data_units_written": 15749028,
        "host_reads": 29048027,
        "host_writes": 17155968,
        "controller_busy_time": 298,
        "power_cycles": 4,
        "power_on_hours": 6420,
        "unsafe_shutdowns": 4,
        "media_errors": 0,
        "num_err_log_entries": 8382,
        "warning_temp_time": 0,
        "critical_comp_time": 0
      },
      "temperature": {
        "current": 38
      },
      "power_cycle_count": 4,
      "power_on_time": {
        "hours": 6420
      }
    }
    
    bug waiting for response 
    opened by Roxedus 44
  • [BUG] Crashes on boot


    Describe the bug After updating to 0.4.9-omnibus, scrutiny can no longer boot - it always crashes as it starts up.

    System

    • Ubuntu 22.04 arm64
    • Docker Compose
      scrutiny:
        container_name: scrutiny
        image: ghcr.io/analogj/scrutiny:v0.4.9-omnibus
        privileged: true
        volumes:
          - /run/udev:/run/udev:ro
          - /dev:/dev
          - scrutiny-config:/opt/scrutiny/config
          - scrutiny-db:/opt/scrutiny/influxdb
        networks:
          - scrutiny-nginx
    

    Log Files

     ___   ___  ____  __  __  ____  ____  _  _  _  _
    / __) / __)(  _ \(  )(  )(_  _)(_  _)( \( )( \/ )
    \__ \( (__  )   / )(__)(   )(   _)(_  )  (  \  /
    (___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
    github.com/AnalogJ/scrutiny                             dev-0.4.9
    
    Start the scrutiny server
    time="2022-06-05T18:02:42Z" level=info msg="Successfully connected to scrutiny sqlite db: /opt/scrutiny/config/scrutiny.db\n"
    panic: failed to check influxdb setup status - Get "http://localhost:8086/api/v2/setup": dial tcp: lookup localhost: device or resource busy
    
    goroutine 1 [running]:
    github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware.RepositoryMiddleware({0x103a6c0, 0x4000010ce8}, {0x1043620, 0x4000430070})
    	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware/repository.go:14 +0xd4
    github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Setup(0x400042a630, {0x1043620, 0x4000430070})
    	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:27 +0x90
    github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Start(0x400042a630)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:105 +0x530
    main.main.func2(0x40003ffcc0)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:112 +0x288
    github.com/urfave/cli/v2.(*Command).Run(0x400042e120, 0x40003ffb40)
    	/go/src/github.com/analogj/scrutiny/vendor/github.com/urfave/cli/v2/command.go:164 +0x648
    github.com/urfave/cli/v2.(*App).RunContext(0x40002fc480, {0x1026870, 0x400003a028}, {0x4000032060, 0x2, 0x2})
    	/go/src/github.com/analogj/scrutiny/vendor/github.com/urfave/cli/v2/app.go:306 +0x840
    github.com/urfave/cli/v2.(*App).Run(...)
    	/go/src/github.com/analogj/scrutiny/vendor/github.com/urfave/cli/v2/app.go:215
    main.main()
    	/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:137 +0x73c
    
    bug 
    opened by ViRb3 37
  • [BUG] Latest image crashes on startup


    I just switched over to the docker images ghcr.io/analogj/scrutiny:master-web and ghcr.io/analogj/scrutiny:master-collector since the docker hub ones have been taken down. Now my web instance is crashing on startup with this error message:

    goroutine 1 [running]:
    github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware.RepositoryMiddleware(0x129f920, 0xc00038a070, 0x12a4b00, 0xc0003faa80, 0x129f9a0)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware/repository.go:14 +0xe6
    github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Setup(0xc000385610, 0x12a4b00, 0xc0003faa80, 0x1)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:26 +0xd8
    github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Start(0xc000385610, 0x0, 0x0)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:97 +0x234
    main.main.func2(0xc000387340, 0x4, 0x6)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:112 +0x198
    github.com/urfave/cli/v2.(*Command).Run(0xc0003ef200, 0xc0003871c0, 0x0, 0x0)
    	/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:164 +0x4e0
    github.com/urfave/cli/v2.(*App).RunContext(0xc0003fe000, 0x128e820, 0xc0000c8010, 0xc0000be020, 0x2, 0x2, 0x0, 0x0)
    	/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:306 +0x814
    github.com/urfave/cli/v2.(*App).Run(...)
    	/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:215
    main.main()
    	/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:137 +0x65a
    2022/05/13 14:38:05 Loading configuration file: /opt/scrutiny/config/scrutiny.yaml
    time="2022-05-13T14:38:05Z" level=info msg="Trying to connect to scrutiny sqlite db: \n"
    time="2022-05-13T14:38:05Z" level=info msg="Successfully connected to scrutiny sqlite db: \n"
    panic: a username and password is required for a setup
    

    There is no mention in the readme or the example configs of a username/password, so what are the credentials that the application is missing and crashing over? Also, this error feels a lot like something that should be handled by the application, with an informative error presented to the user.

    bug 
    opened by altosys 30
  • [Feature] Add support for additional arguments when smartctl is executed - Seagate drives use 48 bit raw values and only the first 16 bits are the error data


    Describe the bug Seagate Ironwolf drives show as FAILED with high seek and read error counts

    Expected behavior

    Some way to configure per drive some extra arguments to smartctl calls.

    Seagate IronWolf drives use a 48-bit raw value made up of 16 bits of error count and 32 bits of total count of read or seek events.

    For smartctl I have to manually specify the correct bits to read from: smartctl /dev/sdb -a -v 1,raw48:54 -v 7,raw48:54

    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   083   067   044    Pre-fail  Always       -       0
      3 Spin_Up_Time            0x0003   085   080   000    Pre-fail  Always       -       0
      4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       112
      5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000f   071   060   045    Pre-fail  Always       -       0
    

    And smartctl without the specification:

    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   083   067   044    Pre-fail  Always       -       200450784
      3 Spin_Up_Time            0x0003   085   080   000    Pre-fail  Always       -       0
      4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       112
      5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000f   071   060   045    Pre-fail  Always       -       12399940
    

    The 200450784 value above is 0xBF2A2E0, which is only 28 bits of data (so only part of the count, not the error), the full hex would be: 00000BF2A2E0 where it would then be split as [0000][0BF2A2E0] and 0 is the actual value of Raw_Read_Error_Rate
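
The split described above can be sketched in a few lines. This assumes the layout given in this report (upper 16 bits are the error count, lower 32 bits are the total operation count); the function name split_seagate_raw48 is hypothetical and not part of Scrutiny or smartctl.

```python
def split_seagate_raw48(raw: int) -> tuple[int, int]:
    """Split a 48-bit raw value into (error_count, operation_count).

    Assumed layout: the upper 16 bits hold the error count and the
    lower 32 bits hold the total number of read or seek operations.
    """
    if not 0 <= raw < 1 << 48:
        raise ValueError("raw value does not fit in 48 bits")
    errors = raw >> 32
    operations = raw & 0xFFFFFFFF
    return errors, operations


print(split_seagate_raw48(200450784))  # Raw_Read_Error_Rate raw value from the output above
print(split_seagate_raw48(12399940))   # Seek_Error_Rate raw value from the output above
```

Both raw values fit entirely in the lower 32 bits, so the error count in each case is 0, which matches the smartctl output with the raw48:54 format applied.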

    Screenshots

    image

    image

    documentation enhancement 
    opened by Parlane 25
  • [BUG]smartctl checksum errors


    Hi,

    I'm using the linuxserver.io Docker image (latest tag) and am currently getting the following errors when running "scrutiny-collector-metrics run":

    # scrutiny-collector-metrics run

     ___   ___  ____  __  __  ____  ____  _  _  _  _
    / __) / __)(  _ \(  )(  )(_  _)(_  _)( \( )( \/ )
    \__ \( (__  )   / )(__)(   )(   _)(_  )  (  \  /
    (___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
    AnalogJ/scrutiny/metrics                                dev-0.1.13

    INFO[0000] Verifying required tools type=metrics
    INFO[0000] Sending detected devices to API, for filtering & validation type=metrics
    INFO[0000] Main: Waiting for workers to finish type=metrics
    INFO[0000] Collecting smartctl results for sdd type=metrics
    INFO[0000] Collecting smartctl results for sda type=metrics
    INFO[0000] Collecting smartctl results for sdb type=metrics
    INFO[0000] Collecting smartctl results for sdc type=metrics
    { "json_format_version": [ 1, 0 ], "smartctl": { "version": [ 7, 1 ], "svn_revision": "5022", "platform_info": "x86_64-linux-4.14.24-qnap", "build_info": "(local build)", "argv": [ "smartctl", "-a", "-j", "/dev/sda" ], "exit_status": 4 }, "device": { "name": "/dev/sda", "info_name": "/dev/sda", "type": "scsi", "protocol": "SCSI" }, "vendor": "WDC", "product": "WD100EMAZ-00WJTA", "model_name": "WDC WD100EMAZ-00WJTA", "revision": "83.H", "scsi_version": "SPC-3", "user_capacity": { "blocks": 19532873728, "bytes": 10000831348736 }, "logical_block_size": 512, "physical_block_size": 4096, "rotation_rate": 5400, "form_factor": { "scsi_value": 2, "name": "3.5 inches" }, "serial_number": "2YJDN6SD", "device_type": { "scsi_value": 0, "name": "disk" }, "local_time": { "time_t": 1601292556, "asctime": "Mon Sep 28 20:29:16 2020 KST" }, "temperature": { "current": 0, "drive_trip": 0 } }
    ERRO[0000] smartctl returned an error code (4) while processing sda type=metrics
    ERRO[0000] smartctl detected a checksum error type=metrics
    INFO[0000] Publishing smartctl results for unknown type=metrics
    { "json_format_version": [ 1, 0 ], "smartctl": { "version": [ 7, 1 ], "svn_revision": "5022", "platform_info": "x86_64-linux-4.14.24-qnap", "build_info": "(local build)", "argv": [ "smartctl", "-a", "-j", "/dev/sdb" ], "exit_status": 4 }, "device": { "name": "/dev/sdb", "info_name": "/dev/sdb", "type": "scsi", "protocol": "SCSI" }, "vendor": "WDC", "product": "WD100EMAZ-00WJTA", "model_name": "WDC WD100EMAZ-00WJTA", "revision": "83.H", "scsi_version": "SPC-3", "user_capacity": { "blocks": 19532873728, "bytes": 10000831348736 }, "logical_block_size": 512, "physical_block_size": 4096, "rotation_rate": 5400, "form_factor": { "scsi_value": 2, "name": "3.5 inches" }, "serial_number": "2YJ8S5BD", "device_type": { "scsi_value": 0, "name": "disk" }, "local_time": { "time_t": 1601292556, "asctime": "Mon Sep 28 20:29:16 2020 KST" }, "temperature": { "current": 0, "drive_trip": 0 } }
    ERRO[0000] smartctl returned an error code (4) while processing sdb type=metrics
    ERRO[0000] smartctl detected a checksum error type=metrics
    INFO[0000] Publishing smartctl results for unknown type=metrics
    { "json_format_version": [ 1, 0 ], "smartctl": { "version": [ 7, 1 ], "svn_revision": "5022", "platform_info": "x86_64-linux-4.14.24-qnap", "build_info": "(local build)", "argv": [ "smartctl", "-a", "-j", "/dev/sdd" ], "exit_status": 4 }, "device": { "name": "/dev/sdd", "info_name": "/dev/sdd", "type": "scsi", "protocol": "SCSI" }, "vendor": "WDC", "product": "WD100EMAZ-00WJTA", "model_name": "WDC WD100EMAZ-00WJTA", "revision": "83.H", "scsi_version": "SPC-3", "user_capacity": { "blocks": 19532873728, "bytes": 10000831348736 }, "logical_block_size": 512, "physical_block_size": 4096, "rotation_rate": 5400, "form_factor": { "scsi_value": 2, "name": "3.5 inches" }, "serial_number": "2YJDUTKD", "device_type": { "scsi_value": 0, "name": "disk" }, "local_time": { "time_t": 1601292556, "asctime": "Mon Sep 28 20:29:16 2020 KST" }, "temperature": { "current": 0, "drive_trip": 0 } }
    ERRO[0000] smartctl returned an error code (4) while processing sdd type=metrics
    ERRO[0000] smartctl detected a checksum error type=metrics
    INFO[0000] Publishing smartctl results for unknown type=metrics
    { "json_format_version": [ 1, 0 ], "smartctl": { "version": [ 7, 1 ], "svn_revision": "5022", "platform_info": "x86_64-linux-4.14.24-qnap", "build_info": "(local build)", "argv": [ "smartctl", "-a", "-j", "/dev/sdc" ], "exit_status": 4 }, "device": { "name": "/dev/sdc", "info_name": "/dev/sdc", "type": "scsi", "protocol": "SCSI" }, "vendor": "WDC", "product": "WD100EMAZ-00WJTA", "model_name": "WDC WD100EMAZ-00WJTA", "revision": "83.H", "scsi_version": "SPC-3", "user_capacity": { "blocks": 19532873728, "bytes": 10000831348736 }, "logical_block_size": 512, "physical_block_size": 4096, "rotation_rate": 5400, "form_factor": { "scsi_value": 2, "name": "3.5 inches" }, "serial_number": "JEHN4M1N", "device_type": { "scsi_value": 0, "name": "disk" }, "local_time": { "time_t": 1601292556, "asctime": "Mon Sep 28 20:29:16 2020 KST" }, "temperature": { "current": 0, "drive_trip": 0 } }
    ERRO[0000] smartctl returned an error code (4) while processing sdc type=metrics
    ERRO[0000] smartctl detected a checksum error type=metrics
    INFO[0000] Publishing smartctl results for unknown type=metrics
    INFO[0001] Main: Completed type=metrics

    After running, I can only see /dev/sda in the web UI and it has no details (SMART reports as failed).
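
For context, the exit_status of 4 reported by smartctl in this log is a bitmask rather than a plain error number. Below is a minimal sketch of decoding it, with the bit meanings paraphrased from the smartctl man page; the names SMARTCTL_EXIT_BITS and decode_exit_status are hypothetical and not part of Scrutiny.

```python
# Bit meanings paraphrased from the "EXIT STATUS" section of the smartctl man page.
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed, or device did not return an IDENTIFY DEVICE structure",
    2: "a SMART or other ATA command failed, or a SMART data structure had a checksum error",
    3: 'SMART status check returned "DISK FAILING"',
    4: "prefail attributes found at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_exit_status(status: int) -> list[str]:
    """Return the description of every bit set in a smartctl exit status."""
    return [msg for bit, msg in SMARTCTL_EXIT_BITS.items() if status & (1 << bit)]


for reason in decode_exit_status(4):
    print(reason)
```

With a status of 4 only bit 2 is set, which is consistent with the "checksum error" message the collector logged.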

    I'm running this on a QNAP TS453Be.

    Thanks,

    bug waiting for response 
    opened by paulmorabito 25
  • [BUG] Cron not working


    Describe the bug The cron job does not work (anymore). The collector/webapp combo still works if I trigger it manually, but for some reason the cron job no longer runs (it was working before). I haven't changed my setup (apart from updating the images) and I can't see anything wrong in the logs. Happy to provide logs or anything else that you need.

    Expected behavior The webapp should update daily with new metrics

    Current behavior The webapp does not get updated unless you run scrutiny-collector-metrics run in each collector

    bug waiting for response 
    opened by ryck 22
  • [BUG] Collector: pfsense "Command not found."

    Describe the bug When running the collector from the pfsense shell, as admin or root, after making it executable, I get the error "Command not found." To be fair, I'm not fluent with *BSD, so I'm not sure if this is even possible on pfsense.

    Expected behavior Collector should run.

    Screenshots NA

    Log Files

    [2.5.2-RELEASE][[email protected]pfsense]/opt/scrutiny/bin: ls -l
    total 5
    -rwxrwxrwx  1 root  wheel  648 Nov  8 11:08 scrutiny-collector-metrics-freebsd-amd64
    
    [2.5.2-RELEASE][[email protected]]/opt/scrutiny/bin: scrutiny-collector-metrics-freebsd-amd64 
    scrutiny-collector-metrics-freebsd-amd64: Command not found.
    

    and

    [2.5.2-RELEASE][[email protected]]/opt/scrutiny/bin: su root
    # scrutiny-collector-metrics-freebsd-amd64 
    su: scrutiny-collector-metrics-freebsd-amd64: not found
    
    bug waiting for response 
    opened by BadCo-NZ 21
  • [BUG] No Data for LSI MegaRaid /dev/bus/0


    Describe the bug I tried configuring my LSI MegaRAID controller with Scrutiny using the following configurations. I can't add /dev/bus/0 to the devices directly because then I get: ERROR: for scrutiny Cannot start service scrutiny: error gathering device information while adding custom device "/dev/bus/0": no such file or directory

    What am I doing wrong?

    docker-compose.yml

    version: '3.5'
    services:
      scrutiny:
        container_name: scrutiny
        image: ghcr.io/analogj/scrutiny:master-omnibus
        cap_add:
          - SYS_RAWIO
        volumes:
          - /run/udev:/run/udev:ro
          - /storage/volumes/scrutiny-config:/opt/scrutiny/config
          - /storage/volumes/scrutiny-data:/opt/scrutiny/influxdb
        devices:
          - "/dev/sda"
          - "/dev/sdb"
          - "/dev/bus" # /dev/bus/0 can not be found because its a pseudo device
    

    and

    collector.yaml

    version: 1
    host:
      id: ""
    devices:
      - device: /dev/sda
        type: "scsi"
      - device: /dev/sdb
        type: "scsi"
      - device: /dev/bus/0
        type:
          - megaraid,1
          - megaraid,2
          - megaraid,3
          - megaraid,6
          - megaraid,7
          - megaraid,8
    

    Expected behavior Should detect 6 raid disks and one physical disk.

    Screenshots image

    Smartctl scan

    {
      "json_format_version": [
        1,
        0
      ],
      "smartctl": {
        "version": [
          7,
          3
        ],
        "svn_revision": "5338",
        "platform_info": "x86_64-linux-5.17.0-2-amd64",
        "build_info": "(local build)",
        "argv": [
          "smartctl",
          "--scan",
          "-j"
        ],
        "exit_status": 0
      },
      "devices": [
        {
          "name": "/dev/sda",
          "info_name": "/dev/sda",
          "type": "scsi",
          "protocol": "SCSI"
        },
        {
          "name": "/dev/sdb",
          "info_name": "/dev/sdb",
          "type": "scsi",
          "protocol": "SCSI"
        },
        {
          "name": "/dev/bus/0",
          "info_name": "/dev/bus/0 [megaraid_disk_01]",
          "type": "megaraid,1",
          "protocol": "SCSI"
        },
        {
          "name": "/dev/bus/0",
          "info_name": "/dev/bus/0 [megaraid_disk_02]",
          "type": "megaraid,2",
          "protocol": "SCSI"
        },
        {
          "name": "/dev/bus/0",
          "info_name": "/dev/bus/0 [megaraid_disk_03]",
          "type": "megaraid,3",
          "protocol": "SCSI"
        },
        {
          "name": "/dev/bus/0",
          "info_name": "/dev/bus/0 [megaraid_disk_06]",
          "type": "megaraid,6",
          "protocol": "SCSI"
        },
        {
          "name": "/dev/bus/0",
          "info_name": "/dev/bus/0 [megaraid_disk_07]",
          "type": "megaraid,7",
          "protocol": "SCSI"
        },
        {
          "name": "/dev/bus/0",
          "info_name": "/dev/bus/0 [megaraid_disk_08]",
          "type": "megaraid,8",
          "protocol": "SCSI"
        }
      ]
    }
    
    bug waiting for response 
    opened by Alfagun74 20
  • [FEAT] Add support to FreeBSD


    Hi, I'm running several FreeNAS systems and this tool looks great. The problem is that FreeNAS is running on FreeBSD and from what I have seen it is not currently possible to run on FreeBSD. And FreeNAS seems to be among the systems best suited for such a tool. I would love to know if there is an idea in the near future to support FreeBSD systems etc. Thanks Itay By the way, is there a plan for adding Dark Mode?

    enhancement good first issue waiting for response 
    opened by Itay1787 19
  • Testing influx version and get password and username not found


    This was run on a new Docker container; no settings from the old version were left.

    Docker:~/docker-compose$ docker logs scrutiny
    [s6-init] making user provided files available at /var/run/s6/etc...exited 0.
    [s6-init] ensuring user provided files have correct perms...exited 0.
    [fix-attrs.d] applying ownership & permissions fixes...
    [fix-attrs.d] done.
    [cont-init.d] executing container initialization scripts...
    [cont-init.d] 01-timezone: executing...
    [cont-init.d] 01-timezone: exited 0.
    [cont-init.d] 50-config: executing...
    [cont-init.d] 50-config: exited 0.
    [cont-init.d] done.
    [services.d] starting services
    waiting for influxdb
    waiting for scrutiny service to start
    starting cron
    [services.d] done.
    starting influxdb
    influxdb not ready
    scrutiny api not ready
    ts=2022-05-08T15:12:02.531330Z lvl=info msg="Welcome to InfluxDB" log_id=0aL3UZtl000 version=v2.2.0 commit=a2f8538837 build_date=2022-04-06T17:36:40Z
    ts=2022-05-08T15:12:02.535848Z lvl=info msg="Resources opened" log_id=0aL3UZtl000 service=bolt path=/scrutiny/influxdb/influxd.bolt
    ts=2022-05-08T15:12:02.535900Z lvl=info msg="Resources opened" log_id=0aL3UZtl000 service=sqlite path=/scrutiny/influxdb/influxd.sqlite
    ts=2022-05-08T15:12:02.536757Z lvl=info msg="Bringing up metadata migrations" log_id=0aL3UZtl000 service="KV migrations" migration_count=19
    ts=2022-05-08T15:12:02.615243Z lvl=info msg="Bringing up metadata migrations" log_id=0aL3UZtl000 service="SQL migrations" migration_count=5
    ts=2022-05-08T15:12:02.629517Z lvl=info msg="Using data dir" log_id=0aL3UZtl000 service=storage-engine service=store path=/scrutiny/influxdb/engine/data
    ts=2022-05-08T15:12:02.629583Z lvl=info msg="Compaction settings" log_id=0aL3UZtl000 service=storage-engine service=store max_concurrent_compactions=8 throughput_bytes_per_second=50331648 throughput_bytes_per_second_burst=50331648
    ts=2022-05-08T15:12:02.629595Z lvl=info msg="Open store (start)" log_id=0aL3UZtl000 service=storage-engine service=store op_name=tsdb_open op_event=start
    ts=2022-05-08T15:12:02.629632Z lvl=info msg="Open store (end)" log_id=0aL3UZtl000 service=storage-engine service=store op_name=tsdb_open op_event=end op_elapsed=0.039ms
    ts=2022-05-08T15:12:02.629654Z lvl=info msg="Starting retention policy enforcement service" log_id=0aL3UZtl000 service=retention check_interval=30m
    ts=2022-05-08T15:12:02.629659Z lvl=info msg="Starting precreation service" log_id=0aL3UZtl000 service=shard-precreation check_interval=10m advance_period=30m
    ts=2022-05-08T15:12:02.630081Z lvl=info msg="Starting query controller" log_id=0aL3UZtl000 service=storage-reads concurrency_quota=1024 initial_memory_bytes_quota_per_query=9223372036854775807 memory_bytes_quota_per_query=9223372036854775807 max_memory_bytes=0 queue_size=1024
    ts=2022-05-08T15:12:02.631198Z lvl=info msg="Configuring InfluxQL statement executor (zeros indicate unlimited)." log_id=0aL3UZtl000 max_select_point=0 max_select_series=0 max_select_buckets=0
    ts=2022-05-08T15:12:02.636081Z lvl=info msg=Listening log_id=0aL3UZtl000 service=tcp-listener transport=http addr=:8086 port=8086
    scrutiny api not ready
    starting scrutiny
    2022/05/08 09:12:07 No configuration file found at /scrutiny/config/scrutiny.yaml. Using Defaults.
    time="2022-05-08T09:12:07-06:00" level=info msg="Trying to connect to scrutiny sqlite db: \n"

     ___   ___  ____  __  __  ____  ____  _  _  _  _
    / __) / __)(  _ \(  )(  )(_  _)(_  _)( \( )( \/ )
    \__ \( (__  )   / )(__)(   )(   _)(_  )  (  \  /
    (___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
    github.com/AnalogJ/scrutiny                             dev-0.3.12

    Start the scrutiny server
    [GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
     - using env: export GIN_MODE=release
     - using code: gin.SetMode(gin.ReleaseMode)

    time="2022-05-08T09:12:07-06:00" level=info msg="Successfully connected to scrutiny sqlite db: \n"
    panic: a username and password is required for a setup

    goroutine 1 [running]:
    github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware.RepositoryMiddleware(0x129e540, 0xc000114078, 0x12a3720, 0xc000482230, 0x129e5c0)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware/repository.go:14 +0xe6
    github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Setup(0xc000113290, 0x12a3720, 0xc000482230, 0x1)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:26 +0xcf
    github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Start(0xc000113290, 0x0, 0x0)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:91 +0x234
    main.main.func2(0xc00011b380, 0x4, 0x6)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:112 +0x198
    github.com/urfave/cli/v2.(*Command).Run(0xc000484480, 0xc00011b200, 0x0, 0x0)
    	/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:164 +0x4e0
    github.com/urfave/cli/v2.(*App).RunContext(0xc000102600, 0x128d440, 0xc0001a8010, 0xc0001a0020, 0x2, 0x2, 0x0, 0x0)
    	/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:306 +0x814
    github.com/urfave/cli/v2.(*App).Run(...)
    	/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:215
    main.main()
    	/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:137 +0x65a
    waiting for influxdb
    starting scrutiny
    2022/05/08 09:12:07 No configuration file found at /scrutiny/config/scrutiny.yaml. Using Defaults.

    bug waiting for response 
    opened by woolmonkey 18
  • [FEAT] Add Instructions for Bring-your-own-InfluxDB with restricted access token

    I want to use my existing InfluxDB instance on the same server, but it is not working and I need some help.

    This is my compose file:

     scrutiny:
       image: ghcr.io/analogj/scrutiny:master-omnibus
       container_name: scrutiny
       cap_add:
         - SYS_RAWIO
       volumes:
         - /docker/scrutiny:/opt/scrutiny/config
         - /run/udev:/run/udev:ro
       ports:
         - 8017:8080
       devices:
         - /dev/sda:/dev/sda
         - /dev/sdb:/dev/sdb
         - /dev/sdc:/dev/sdc
         - /dev/sdd:/dev/sdd
         - /dev/sde:/dev/sde
       restart: unless-stopped   
    

    This is my scrutiny.yaml file

    log:
      file: ""
      level: INFO
    notify:
      urls: []
    web:
      database:
        location: /opt/scrutiny/config/scrutiny.db
      influxdb:
        bucket: scrutiny
        host: MY-SERVER-LOCAL-IP
        org: MYORG
        port: "8086"
        retention_policy: true
        token: MY-TOKEN
      listen:
        basepath: ""
        host: 0.0.0.0
        port: "8080"
      src:
        frontend:
          path: /opt/scrutiny/web
    

    I have configured a new bucket in my influxdb instance called scrutiny and a new API token with read and write permissions in that bucket.

    In logs, there is this error:

    panic: organization 'MYORG' not found
    
    goroutine 1 [running]:
    github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware.RepositoryMiddleware(0x129f920, 0xc000408088, 0x12a4b00, 0xc000471650, 0x129f9a0)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware/repository.go:14 +0xe6
    github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Setup(0xc000405ad0, 0x12a4b00, 0xc000471650, 0x14)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:26 +0xd8
    github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Start(0xc000405ad0, 0x0, 0x0)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:97 +0x234
    main.main.func2(0xc00040f400, 0x4, 0x6)
    	/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:112 +0x198
    github.com/urfave/cli/v2.(*Command).Run(0xc0004737a0, 0xc00040f280, 0x0, 0x0)
    	/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:164 +0x4e0
    github.com/urfave/cli/v2.(*App).RunContext(0xc000484000, 0x128e820, 0xc000130010, 0xc000126020, 0x2, 0x2, 0x0, 0x0)
    	/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:306 +0x814
    github.com/urfave/cli/v2.(*App).Run(...)
    	/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:215
    main.main()
    	/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:137 +0x65a
    

    It seems it cannot find my organization? The name is correct, and I have other services working fine with that InfluxDB instance.

    Am I missing something?
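
    One check that may help narrow this down, offered as a sketch and not as a confirmed diagnosis, is to ask InfluxDB's v2 HTTP API which organizations the token can see. The Python below builds a GET request for /api/v2/orgs filtered by name and inspects the JSON reply. The function names are hypothetical, and the host, port, org and token are the placeholders from the config above. If the org is not listed, one possibility is that a token scoped only to the bucket lacks permission to read organizations.

```python
import json
import urllib.parse
import urllib.request


def build_org_lookup(host, port, org, token):
    """Build a GET request for InfluxDB v2's /api/v2/orgs, filtered by org name."""
    query = urllib.parse.urlencode({"org": org})
    url = f"http://{host}:{port}/api/v2/orgs?{query}"
    # InfluxDB v2 expects the API token in an "Authorization: Token ..." header.
    return urllib.request.Request(url, headers={"Authorization": f"Token {token}"})


def org_visible(response_body, org):
    """Return True if the JSON body returned by /api/v2/orgs lists the named org."""
    orgs = json.loads(response_body).get("orgs", [])
    return any(entry.get("name") == org for entry in orgs)
```

    To use it, send the request with urllib.request.urlopen(req, timeout=10) and pass the response body to org_visible. Note that urlopen raises urllib.error.HTTPError when the server answers with a 4xx status.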

    documentation waiting for response 
    opened by goliath888 15
  • Tutorial: SMART Monitoring with Scrutiny across machines

    S.M.A.R.T. Monitoring with Scrutiny across machines

    🤔 The problem:

    Scrutiny offers a nice Docker package called "Omnibus" that can monitor HDDs attached to a Docker host with relative ease. Scrutiny can also be installed in a Hub-Spoke layout where Web interface, Database and Collector come in 3 separate packages. The official documentation assumes that the spokes in the "Hub-Spokes layout" run Docker, which is not always the case. The third approach is to install Scrutiny manually, entirely outside of Docker.

    💡 The solution:

    This tutorial provides a hybrid configuration where the Hub lives in a Docker instance while the spokes have only the Scrutiny Collector installed manually. The Collector periodically sends data to the Hub. It's not mind-bogglingly hard to understand, but someone might struggle with the setup. This is for them.

    🖥️ My setup:

    I have a Proxmox cluster where one VM runs Docker and all monitoring services - Grafana, Prometheus, various exporters, InfluxDB and so forth. Another VM runs the NAS - OpenMediaVault v6, where all hard drives reside. The Scrutiny Collector is triggered every 30min to collect data on the drives. The data is sent to the Docker VM, running InfluxDB.

    Setting up the Hub

    The Hub consists of Scrutiny Web, a web interface for viewing the SMART data, and InfluxDB, where the smartmon data is stored.

    🔗This is the official Hub-Spoke layout in docker-compose. We are going to reuse parts of it. The ENV variables provide the necessary configuration for the initial setup, both for InfluxDB and Scrutiny.

    If you are working with an existing InfluxDB instance, you can omit all the INIT variables, since the instance is already initialized.

    The official Scrutiny documentation has a sample scrutiny.yaml file that normally contains the connection and notification details, but I always find it easier to configure as much as possible in the docker-compose.

    version: "3.4"
    
    networks:
      monitoring:       # A common network for all monitoring services to communicate into
        external: true
      notifications:    # To Gotify or another Notification service
        external: true
    
    services:
      influxdb:
        container_name: influxdb
        image: influxdb:2.1-alpine
        ports:
          - 8086:8086
        volumes:
          - ${DIR_CONFIG}/influxdb2/db:/var/lib/influxdb2
          - ${DIR_CONFIG}/influxdb2/config:/etc/influxdb2
        environment:
          - DOCKER_INFLUXDB_INIT_MODE=setup
          - DOCKER_INFLUXDB_INIT_USERNAME=Admin
          - DOCKER_INFLUXDB_INIT_PASSWORD=${PASSWORD}
          - DOCKER_INFLUXDB_INIT_ORG=homelab
          - DOCKER_INFLUXDB_INIT_BUCKET=scrutiny
          - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=your-very-secret-token
        restart: unless-stopped
        networks:
          - monitoring
    
      scrutiny:
        container_name: scrutiny
        image: ghcr.io/analogj/scrutiny:master-web
        ports:
          - 8080:8080
        volumes:
          - ${DIR_CONFIG}/scrutiny/config:/opt/scrutiny/config
        environment:
          - SCRUTINY_WEB_INFLUXDB_HOST=influxdb
          - SCRUTINY_WEB_INFLUXDB_PORT=8086
          - SCRUTINY_WEB_INFLUXDB_TOKEN=your-very-secret-token
          - SCRUTINY_WEB_INFLUXDB_ORG=homelab
          - SCRUTINY_WEB_INFLUXDB_BUCKET=scrutiny
          # Optional but highly recommended to notify you in case of a problem
          - SCRUTINY_WEB_NOTIFY_URLS=["http://gotify:80/message?token=a-gotify-token"]
        depends_on:
          - influxdb
        restart: unless-stopped
        networks:
          - notifications
          - monitoring
    

    A freshly initialized Scrutiny instance can be accessed on port 8080, e.g. 192.168.0.100:8080. The interface will be empty because no metrics have been collected yet.

    Setting up a Spoke without Docker

    A spoke consists of the Scrutiny Collector binary that is run on a set interval via crontab and sends the data to the Hub. The official documentation describes the manual setup of the Collector - dependencies and step by step commands. I have a shortened version that does the same thing but in one line of code.

    # Installing dependencies
    apt install smartmontools -y 
    
    # 1. Create a directory for the binary
    # 2. Download the binary into that directory
    # 3. Make it executable
    # 4. List the contents of the directory for confirmation
    mkdir -p /opt/scrutiny/bin && \
    curl -L https://github.com/AnalogJ/scrutiny/releases/download/v0.5.0/scrutiny-collector-metrics-linux-amd64 > /opt/scrutiny/bin/scrutiny-collector-metrics-linux-amd64 && \
    chmod +x /opt/scrutiny/bin/scrutiny-collector-metrics-linux-amd64 && \
    ls -lha /opt/scrutiny/bin
    

    When downloading GitHub Release Assets, make sure that you have the correct version. The provided example uses Release v0.5.0. [The release list can be found here.](https://github.com/analogj/scrutiny/releases)

    Once the Collector is installed, you can run it with the following command. Make sure to add the correct address and port of your Hub as --api-endpoint.

    /opt/scrutiny/bin/scrutiny-collector-metrics-linux-amd64 run --api-endpoint "http://192.168.0.100:8080"
    

    This will run the Collector once and populate the Web interface of your Scrutiny instance. In order to collect metrics for a time series, you need to run the command repeatedly. Here is an example for crontab, running the Collector every 15min.

    # open crontab
    crontab -e
    
    # add a line for Scrutiny
    */15 * * * * /opt/scrutiny/bin/scrutiny-collector-metrics-linux-amd64 run --api-endpoint "http://192.168.0.100:8080"
    

    The Collector has its own independent config file that lives in /opt/scrutiny/config/collector.yaml but I did not find a need to modify it. A default collector.yaml can be found in the official documentation.

    Setting up a Spoke with Docker

    Setting up a remote Spoke in Docker requires you to split the official Hub-Spoke layout docker-compose.yml. In the following docker-compose you need to provide the ${API_ENDPOINT}, in my case http://192.168.0.100:8080. Also all drives that you wish to monitor need to be presented to the container under devices.

    The image handles the periodic scanning of the drives.

    version: "3.4"
    
    services:
    
      collector:
        image: 'ghcr.io/analogj/scrutiny:master-collector'
        cap_add:
          - SYS_RAWIO
        volumes:
          - '/run/udev:/run/udev:ro'
        environment:
          COLLECTOR_API_ENDPOINT: ${API_ENDPOINT}
        devices:
          - "/dev/sda"
          - "/dev/sdb"
    
    opened by TinJoy59 0
  • [BUG] No temps logged between hours of 0100 and 1700

    Describe the bug

    Scrutiny doesn't log temperatures between the hours of 0100 and 1700. It DOES log temperatures hourly from 1700-0100.

    Expected behavior

    Temperatures are logged hourly, 24 hours a day.

    Log Files

    I'm running hub-and-spoke, with one spoke on the same machine as the hub and one spoke on a remote machine connected via VPN. Hub is an RPI 4, spoke is an RPI 3, both running aarch64 Arch Linux. Log files are attached for each, and the output of /api/summary: rpi3-collector.log, rpi4-collector.log, api-summary.txt

    docker info

    RPI 4 (hub and spoke)

    Client:
     Context: default
     Debug Mode: false
     Plugins:
      compose: Docker Compose (Docker Inc., 2.13.0)

    Server:
     Containers: 3
      Running: 3
      Paused: 0
      Stopped: 0
     Images: 3
     Server Version: 20.10.21
     Storage Driver: overlay2
      Backing Filesystem: extfs
      Supports d_type: true
      Native Overlay Diff: true
      userxattr: false
     Logging Driver: json-file
     Cgroup Driver: systemd
     Cgroup Version: 2
     Plugins:
      Volume: local
      Network: bridge host ipvlan macvlan null overlay
      Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
     Swarm: inactive
     Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2
     Default Runtime: runc
     Init Binary: docker-init
     containerd version: 770bd0108c32f3fb5c73ae1264f7e503fe7b2661.m
     runc version:
     init version: de40ad0
     Security Options:
      seccomp
       Profile: default
      cgroupns
     Kernel Version: 5.19.7-1-aarch64-ARCH
     Operating System: Arch Linux ARM
     OSType: linux
     Architecture: aarch64
     CPUs: 4
     Total Memory: 7.614GiB
     Name: rpi4
     ID: TGJZ:IXFZ:ZOAI:3AKS:D3JZ:FOIE:UT6S:QRTI:Y6DW:EZR7:77FE:KXKH
     Docker Root Dir: /var/lib/docker
     Debug Mode: false
     Registry: https://index.docker.io/v1/
     Labels:
     Experimental: false
     Insecure Registries:
      127.0.0.0/8
     Live Restore Enabled: false

    RPI 3 (spoke)

    Client:
     Context: default
     Debug Mode: false
     Plugins:
      compose: Docker Compose (Docker Inc., 2.13.0)

    Server:
     Containers: 1
      Running: 1
      Paused: 0
      Stopped: 0
     Images: 1
     Server Version: 20.10.21
     Storage Driver: devicemapper
      Pool Name: docker-179:2-1443081-pool
      Pool Blocksize: 65.54kB
      Base Device Size: 10.74GB
      Backing Filesystem: ext4
      Udev Sync Supported: true
      Data file: /dev/loop0
      Metadata file: /dev/loop1
      Data loop file: /var/lib/docker/devicemapper/devicemapper/data
      Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
      Data Space Used: 465.5MB
      Data Space Total: 107.4GB
      Data Space Available: 28.3GB
      Metadata Space Used: 17.79MB
      Metadata Space Total: 2.147GB
      Metadata Space Available: 2.13GB
      Thin Pool Minimum Free Space: 10.74GB
      Deferred Removal Enabled: true
      Deferred Deletion Enabled: true
      Deferred Deleted Device Count: 0
      Library Version: 1.02.187 (2022-11-10)
     Logging Driver: json-file
     Cgroup Driver: systemd
     Cgroup Version: 2
     Plugins:
      Volume: local
      Network: bridge host ipvlan macvlan null overlay
      Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
     Swarm: inactive
     Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2
     Default Runtime: runc
     Init Binary: docker-init
     containerd version: 770bd0108c32f3fb5c73ae1264f7e503fe7b2661.m
     runc version:
     init version: de40ad0
     Security Options:
      seccomp
       Profile: default
      cgroupns
     Kernel Version: 5.19.8-1-aarch64-ARCH
     Operating System: Arch Linux ARM
     OSType: linux
     Architecture: aarch64
     CPUs: 4
     Total Memory: 894.5MiB
     Name: rpi3
     ID: 4ZJX:WUJX:XF3A:L4NJ:3QTR:U6RP:QZBQ:PYCD:TWPX:QDCE:IDYJ:FWII
     Docker Root Dir: /var/lib/docker
     Debug Mode: false
     Registry: https://index.docker.io/v1/
     Labels:
     Experimental: false
     Insecure Registries:
      127.0.0.0/8
     Live Restore Enabled: false

    bug 
    opened by trevorgross 0
  • [FEAT] mdadm RAID status

    Is your feature request related to a problem? Please describe.

    It would be awesome if Scrutiny could show the basic status of any standard mdadm RAID devices.

    Describe the solution you'd like

    If an mdadm RAID array is detected, it could be shown much like an individual hard drive, with basic RAID health information, e.g.

    • Device name (e.g. /dev/md0)
    • RAID Topology (1, 6, 1+0, etc.)
    • Disks in the array (could even link to the physical disks already shown)
      • Disks active in the array
      • Disks marked as failed in the array
      • Disks spare in the array
    • RAID rebuild rate
    • Bitmap enabled (t/f)
    • Which disk has which sync set (e.g. set-A /dev/sdb1)

    Example mdstat output:

    cat /proc/mdstat
    Personalities : [raid10]
    md0 : active raid10 sde1[4] sdf1[5] sdc1[2] sdb1[0] sdd1[1] sda1[3]
          29298911232 blocks super 1.2 512K chunks 2 near-copies [6/6] [UUUUUU]
          bitmap: 0/110 pages [0KB], 131072KB chunk
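
    To illustrate how the fields listed above could be read from this output, here is a minimal Python sketch. It is not part of Scrutiny, and parse_mdstat and the dict keys are hypothetical names. It only handles active arrays in the form shown in the example, and would skip inactive arrays or other variants of the line format.

```python
import re

# "md0 : active raid10 sde1[4] sdf1[5] ..." -> name, state, level, member list
MD_LINE = re.compile(r"^(md\d+) : (\w+) (\w+) (.+)$")
# "[6/6] [UUUUUU]" -> devices expected, devices in use, per-slot flags
STATUS = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")


def parse_mdstat(text):
    """Parse /proc/mdstat text into a list of per-array dicts."""
    arrays = []
    current = None
    for line in text.splitlines():
        header = MD_LINE.match(line.strip())
        if header:
            name, state, level, members = header.groups()
            current = {
                "device": f"/dev/{name}",
                "state": state,
                "level": level,
                # Strip the "[n]" slot index and any "(F)"/"(S)" suffix.
                "members": [re.sub(r"\[\d+\].*$", "", m) for m in members.split()],
            }
            arrays.append(current)
            continue
        status = STATUS.search(line)
        if status and current is not None:
            expected, active, flags = status.groups()
            current["expected"] = int(expected)
            current["active"] = int(active)
            # An underscore marks a slot with no working device.
            current["degraded"] = "_" in flags
    return arrays
```

    Calling parse_mdstat(open("/proc/mdstat").read()) on a host with md arrays would return one dict per array that matches this form.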
    

    Example mdadm --detail output:

    mdadm --detail /dev/md0
    /dev/md0:
               Version : 1.2
         Creation Time : Fri Jul  5 18:06:18 2019
            Raid Level : raid10
            Array Size : 29298911232 (27.29 TiB 30.00 TB)
         Used Dev Size : 9766303744 (9.10 TiB 10.00 TB)
          Raid Devices : 6
         Total Devices : 6
           Persistence : Superblock is persistent
    
         Intent Bitmap : Internal
    
           Update Time : Thu Dec 15 11:03:24 2022
                 State : clean
        Active Devices : 6
       Working Devices : 6
        Failed Devices : 0
         Spare Devices : 0
    
                Layout : near=2
            Chunk Size : 512K
    
    Consistency Policy : bitmap
    
                  Name : nas:0  (local to host nas)
                  UUID : 18a081b2:0264df3b:88465833:1bf34071
                Events : 315607
    
        Number   Major   Minor   RaidDevice State
           0       8       17        0      active sync set-A   /dev/sdb1
           1       8       49        1      active sync set-B   /dev/sdd1
           2       8       33        2      active sync set-A   /dev/sdc1
           3       8        1        3      active sync set-B   /dev/sda1
           5       8       81        4      active sync set-A   /dev/sdf1
           4       8       65        5      active sync set-B   /dev/sde1
    
    opened by sammcj 0
  • Bump express from 4.17.1 to 4.18.2 in /webapp/frontend

    Bumps express from 4.17.1 to 4.18.2.

    Release notes

    Sourced from express's releases.

    4.18.2

    4.18.1

    • Fix hanging on large stack of sync routes

    4.18.0

    ... (truncated)

    Changelog

    Sourced from express's changelog.

    4.18.2 / 2022-10-08

    4.18.1 / 2022-04-29

    • Fix hanging on large stack of sync routes

    4.18.0 / 2022-04-25

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • [FEAT] Add the possibility to run and submit metrics immediately/with delay after collector startup

    Is your feature request related to a problem? Please describe.

    When you deploy the collector, you get no feedback that it really works beyond the lone message "starting cron". You can shorten the cron schedule to test it sooner and then change it back, which is needless work. You can also enter the container and run it manually with scrutiny-collector-metrics run, which is also needless extra work.

    Describe the solution you'd like

    It would be better if the collector ran once immediately at deployment to provide immediate feedback.

    Alternatively, it could run after a short delay (20-30s): in swarm mode depends_on is not supported, so a collector that starts immediately might be unable to reach an endpoint that is still starting up.

    An environment variable could be used to enable the first run and to set the delay (in seconds).
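
    As a rough illustration of the requested behaviour, and not how the collector image actually works, a startup wrapper could look like the Python sketch below. The variable names FIRST_RUN_ENABLED and FIRST_RUN_DELAY_SECONDS and the function first_run are hypothetical. Only the scrutiny-collector-metrics run command comes from the description above.

```python
import os
import subprocess
import time


def first_run(env=os.environ, sleep=time.sleep, run=subprocess.run):
    """Run the collector once at startup if enabled, after an optional delay.

    Returns True if the collector was started, False if the feature is off.
    """
    # Hypothetical switch: the first run is opt-in.
    if env.get("FIRST_RUN_ENABLED", "false").lower() != "true":
        return False
    # Hypothetical delay in seconds, giving the Hub time to come up.
    delay = int(env.get("FIRST_RUN_DELAY_SECONDS", "0"))
    sleep(delay)
    # check=True raises CalledProcessError if the collector exits non-zero.
    run(["scrutiny-collector-metrics", "run"], check=True)
    return True
```

    The sleep and run parameters are injectable so that the ordering can be exercised without waiting or invoking the real binary. The regular cron schedule would continue unchanged after this one-off run.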

    Additional context

    -/-

    opened by Enissay 0
  • Bump qs from 6.5.2 to 6.5.3 in /webapp/frontend

    Bumps qs from 6.5.2 to 6.5.3.

    Changelog

    Sourced from qs's changelog.

    6.5.3

    • [Fix] parse: ignore __proto__ keys (#428)
    • [Fix] utils.merge: avoid a crash with a null target and a truthy non-array source
    • [Fix] correctly parse nested arrays
    • [Fix] stringify: fix a crash with strictNullHandling and a custom filter/serializeDate (#279)
    • [Fix] utils: merge: fix crash when source is a truthy primitive & no options are provided
    • [Fix] when parseArrays is false, properly handle keys ending in []
    • [Fix] fix for an impossible situation: when the formatter is called with a non-string value
    • [Fix] utils.merge: avoid a crash with a null target and an array source
    • [Refactor] utils: reduce observable [[Get]]s
    • [Refactor] use cached Array.isArray
    • [Refactor] stringify: Avoid arr = arr.concat(...), push to the existing instance (#269)
    • [Refactor] parse: only need to reassign the var once
    • [Robustness] stringify: avoid relying on a global undefined (#427)
    • [readme] remove travis badge; add github actions/codecov badges; update URLs
    • [Docs] Clean up license text so it’s properly detected as BSD-3-Clause
    • [Docs] Clarify the need for "arrayLimit" option
    • [meta] fix README.md (#399)
    • [meta] add FUNDING.yml
    • [actions] backport actions from main
    • [Tests] always use String(x) over x.toString()
    • [Tests] remove nonexistent tape option
    • [Dev Deps] backport from main

    Commits

    • 298bfa5 v6.5.3
    • ed0f5dc [Fix] parse: ignore __proto__ keys (#428)
    • 691e739 [Robustness] stringify: avoid relying on a global undefined (#427)
    • 1072d57 [readme] remove travis badge; add github actions/codecov badges; update URLs
    • 12ac1c4 [meta] fix README.md (#399)
    • 0338716 [actions] backport actions from main
    • 5639c20 Clean up license text so it’s properly detected as BSD-3-Clause
    • 51b8a0b add FUNDING.yml
    • 45f6759 [Fix] fix for an impossible situation: when the formatter is called with a no...
    • f814a7f [Dev Deps] backport from main
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 1
Releases(v0.5.0)
Owner
Jason Kulatunga
Devops/Automation guy. I build tools so you don't have to. I build things. Then I break them. Most of the time I fix them again.