Beyang Liu on June 8, 2016
Today, Sourcegraph is excited to announce Checkup, a simple tool that lets you easily create distributed, self-hosted health checks and status pages.
The Checkup status page works out of the box
Monitoring uptime is a crucial part of running any web service. It lets you sleep well at night knowing that you’ll be notified if an outage occurs. There are many existing SaaS health check services that offer a range of features, but when we tried them at Sourcegraph, we found none that fully met our needs. Here were a few of the pain points:
At Sourcegraph, we grew tired of fighting our health checks to accomplish tasks we thought should have been straightforward. Why did we have to define our checks using a thick JavaScript web client or some clunky API? Why couldn’t they be defined in a simple config file and perhaps even versioned with the code itself?
We talked to Matt Holt, creator of the Caddy web server, and found we weren't alone. So we decided to sponsor Matt to create a tool that would do health checks the way we, as developers, thought they should be done.
Health checks should be as easy to create and maintain as unit tests. We wanted an interface that lets you easily say, “Here are a bunch of URLs I want to test. Here’s the expected behavior for each.” It seemed to us that the best interface for declaring these was not a GUI that forced you to point and click, but a config file. Here’s an example checkup.json:
{
    "checkers": [{
        "type": "http",
        "endpoint_name": "Website",
        "endpoint_url": "https://sourcegraph.com",
        "attempts": 5
    }],
    "storage": {
        "provider": "s3",
        "access_key_id": "<yours>",
        "secret_access_key": "<yours>",
        "bucket": "<yours>"
    }
}
You simply specify a list of endpoints in JSON and provide Checkup access to an S3 bucket. (Checkup can automatically provision this for you if you’re not familiar with the AWS control panel.)
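Because the config is plain JSON, it can also be generated from a list of endpoints kept alongside your code. The following is a minimal sketch, not part of Checkup itself: the `build_config` helper and its parameters are hypothetical names chosen for illustration, and it only reproduces the fields shown in the example above.

```python
import json


def build_config(endpoints, bucket, access_key_id, secret_access_key, attempts=5):
    """Build a config dict shaped like the example checkup.json.

    `endpoints` maps a display name to the URL to check. Only the keys that
    appear in the example above are emitted; nothing else is assumed.
    """
    checkers = [
        {
            "type": "http",
            "endpoint_name": name,
            "endpoint_url": url,
            "attempts": attempts,
        }
        for name, url in endpoints.items()
    ]
    storage = {
        "provider": "s3",
        "access_key_id": access_key_id,
        "secret_access_key": secret_access_key,
        "bucket": bucket,
    }
    return {"checkers": checkers, "storage": storage}


if __name__ == "__main__":
    config = build_config(
        {"Website": "https://sourcegraph.com"},
        bucket="<yours>",
        access_key_id="<yours>",
        secret_access_key="<yours>",
    )
    # Print the JSON so it can be redirected into checkup.json.
    print(json.dumps(config, indent=4))
```

Adding an endpoint then becomes a one-line change to the mapping, which is easy to review and version with the rest of the code.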
Then all you need to do to check the health of your endpoints is run:
$ checkup
== Website - https://sourcegraph.com
  Threshold: 0
        Max: 136.296933ms
        Min: 37.716659ms
     Median: 51.626374ms
       Mean: 65.212206ms
        All: [{54.489828ms } {45.93124ms } {51.626374ms } {136.296933ms } {37.716659ms }]
 Assessment: healthy
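The summary figures in that output follow directly from the five timings in the `All` list, one per attempt. As a rough illustration (a sketch in Python, not Checkup's own code; the `summarize` helper is a hypothetical name), the same statistics can be recomputed like this:

```python
import statistics

# Round-trip times in milliseconds, copied from the "All" list in the output.
times_ms = [54.489828, 45.93124, 51.626374, 136.296933, 37.716659]


def summarize(times):
    """Return the max, min, median, and mean of a list of timings."""
    return {
        "max": max(times),
        "min": min(times),
        "median": statistics.median(times),
        "mean": statistics.mean(times),
    }


summary = summarize(times_ms)
for key, value in summary.items():
    print(f"{key:>6}: {value:.6f}ms")
```

The exact mean is 65.2122068ms, so the `65.212206ms` in the output is that value with the final fraction of a nanosecond dropped; a rounded printout may differ in the last digit.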
You can have Checkup upload this data to your S3 bucket with:
$ checkup --store
or have it run every 10 minutes with:
$ checkup every 10m
^C
And with Checkup and Caddy, you get a nice, simple status page like the one above that works out of the box and pulls data live from your S3 bucket.
Because running your health checks is just a terminal command, you can now run them in development and CI, just like unit tests. Endpoints often fail simply because someone on the team pushed a bug. Now you can use your health checks to catch these errors in the testing phase.
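One way to wire this into CI is to turn Checkup's report into a pass/fail exit code. The sketch below is an assumption-laden illustration, not a documented Checkup feature: it assumes the report is printed one field per line with `== name - url` headers and `Assessment:` lines as in the sample output earlier in this post, and it treats any assessment other than `healthy` as a failure. The function names and the abbreviated sample report are hypothetical.

```python
def unhealthy_endpoints(report):
    """Return the names of endpoints whose assessment is not 'healthy'."""
    failing = []
    name = None
    for raw_line in report.splitlines():
        line = raw_line.strip()
        if line.startswith("== "):
            # Header lines look like "== Website - https://sourcegraph.com".
            name = line[3:].split(" - ", 1)[0]
        elif line.startswith("Assessment:"):
            status = line.split(":", 1)[1].strip()
            if status != "healthy":
                failing.append(name)
    return failing


def exit_code(report):
    """Return 0 if every endpoint is healthy, 1 otherwise."""
    return 1 if unhealthy_endpoints(report) else 0


# Abbreviated sample report based on the output shown in this post.
SAMPLE_REPORT = """\
== Website - https://sourcegraph.com
  Threshold: 0
     Median: 51.626374ms
 Assessment: healthy
"""

print("failing endpoints:", unhealthy_endpoints(SAMPLE_REPORT))
print("exit code:", exit_code(SAMPLE_REPORT))
```

In a CI job, a small wrapper could read the report from standard input and call `sys.exit(exit_code(sys.stdin.read()))`, so that a failing endpoint fails the build the same way a failing unit test would.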
To get geographically distributed checks, you simply run Checkup from multiple AWS regions; it works smoothly on the cheapest EC2 micro instances. Checkup can also be extended to work with other cloud computing providers or even custom storage services your company uses internally. The code is open source and well documented, and pull requests are welcome.
We’re releasing Checkup because we think it will save many engineers a lot of time and frustration. We hope others will find it useful and extend it so that together we can make uptime monitoring simpler and more hassle-free. Try it out and let us know what you think!