The Hund Blog

Status Pages and Monitoring Services Should Be the Same Thing

And that’s why we are launching our own monitoring service.

Since 2015, one of Hund’s original goals has been to provide a native monitoring solution for customers to monitor the health of their services. We feel that automation is key to a useful status page. Your status page should be the source of truth for service health. If your status page does not immediately update when something goes wrong, then your users begin to lose trust in it.

Today, we are very pleased to announce the general availability of our in-house monitoring solution, which is available at no additional charge.

Functionality

Monitoring runs redundantly across five initial global regions: Seattle, WA; Dallas, TX; Piscataway, NJ; London, UK; and Sydney, AU. All of these locations support IPv6 and have redundant, low-latency 10-gigabit networks.

[Image: Monitoring Regions Map]

Checks are available for ICMP (ping), HTTP/S, and DNS, with full support for IPv6. ICMP and HTTP/S checks report metrics separately for each monitoring region.

Checks can run at any interval, down to a minimum of 30 seconds between runs. That provides double the standard data resolution and halves the time it takes to detect an outage.
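To make the arithmetic behind that claim concrete, here is a minimal sketch. It assumes the "standard" interval is 60 seconds, which the post implies but does not state, and it ignores check timeouts and any confirmation retries. The function names are made up for illustration.

```python
def samples_per_hour(interval_seconds: int) -> int:
    """Number of data points one check produces per hour at a given interval."""
    return 3600 // interval_seconds


def worst_case_detection_seconds(interval_seconds: int) -> int:
    """An outage that starts right after a check is only seen by the next one,
    so the worst-case delay before detection is one full interval."""
    return interval_seconds


# Assumption: 60 seconds is the baseline being compared against.
for interval in (60, 30):
    print(
        f"every {interval}s: {samples_per_hour(interval)} samples/hour, "
        f"detected within {worst_case_detection_seconds(interval)}s"
    )
```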

HTTP/S checks provide many useful metrics for system administrators, including redirect time, name lookup time, TCP connection time, TLS handshake time, content generation time, content transfer time, total elapsed time, and time to first byte (TTFB).
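For readers curious how metrics like these relate to each other, the sketch below splits a set of cumulative request timers into per-phase durations. It assumes the timers behave like libcurl's `CURLINFO_*_TIME` values, which are each measured from the start of the request. The dictionary keys, the function name, and the sample numbers are hypothetical, the sample assumes a request with no redirects, and this is not a description of how Hund computes its own metrics.

```python
from typing import Dict


def phase_breakdown(t: Dict[str, float]) -> Dict[str, float]:
    """Turn cumulative timers (seconds since request start) into phase durations.

    Expected keys mirror libcurl's timers: namelookup, connect, appconnect,
    pretransfer, starttransfer, total, redirect.
    """
    # For plain HTTP there is no TLS handshake and appconnect is reported as 0,
    # so fall back to the TCP connect time to get a zero-length TLS phase.
    tls_done = t["appconnect"] if t["appconnect"] > 0 else t["connect"]
    return {
        "redirect": t["redirect"],
        "name_lookup": t["namelookup"],
        "tcp_connect": t["connect"] - t["namelookup"],
        "tls_handshake": tls_done - t["connect"],
        # Time the server spent producing the response after the request was sent.
        "content_generation": t["starttransfer"] - t["pretransfer"],
        "content_transfer": t["total"] - t["starttransfer"],
        "ttfb": t["starttransfer"],
        "total": t["total"],
    }


# Made-up sample timings for a single HTTPS request without redirects.
sample = {
    "namelookup": 0.012,
    "connect": 0.045,
    "appconnect": 0.130,
    "pretransfer": 0.131,
    "starttransfer": 0.310,
    "total": 0.342,
    "redirect": 0.0,
}

for name, seconds in phase_breakdown(sample).items():
    print(f"{name:>20}: {seconds * 1000:.1f} ms")
```

Reporting phases rather than raw cumulative timers makes it easier to see whether a slow response comes from DNS, the network, TLS, or the application itself.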

Our DNS check provides an intuitive interface for asserting records yielded by queries for a variety of record types including A, AAAA, MX, NS, TXT, SRV, and PTR. The check also comes equipped with an SOA record check that properly validates serial consistency across nameservers.
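As a rough illustration of what validating serial consistency can mean, the sketch below compares the SOA serial reported by each nameserver and lists the ones that lag behind. The nameserver names and serials are invented, the function name is hypothetical, and the comparison uses plain integers rather than wrap-around serial arithmetic. It is one simple way to frame the idea, not Hund's actual check.

```python
from typing import Dict, List


def inconsistent_nameservers(serials: Dict[str, int]) -> List[str]:
    """Return the nameservers whose SOA serial differs from the highest one seen.

    `serials` maps a nameserver name to the serial it returned for the zone.
    An empty result means every nameserver agrees.
    """
    if not serials:
        return []
    newest = max(serials.values())
    return sorted(ns for ns, serial in serials.items() if serial != newest)


# Invented data: ns3 has not yet picked up the latest zone update.
observed = {
    "ns1.example.com": 2024051502,
    "ns2.example.com": 2024051502,
    "ns3.example.com": 2024051501,
}

stale = inconsistent_nameservers(observed)
if stale:
    print("Serial mismatch on:", ", ".join(stale))
else:
    print("All nameservers agree")
```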

Design Choices

We’ve paid great attention to detail to provide a robust, efficient, and reliable service, from the design of the application to the selection of infrastructure providers with high SLAs and low-latency, redundant gigabit networks.

Our entire monitoring infrastructure is written in Elixir and runs on Erlang/OTP, which gives us fault tolerance, hot code swapping, efficient concurrency, high availability, and first-class transparent distribution. Erlang allows us to seamlessly distribute our platform across the globe, as well as assign specific roles to each node in the network. Independently running workers, data collectors, and servers exchange information in soft real time to provide status updates as fast as possible.

On the monitoring side of the application, we chose libcurl because it is reliable, fast, extensive, and well tested, supports IPv6, and is used in some of the world’s largest high-volume applications.

We chose MongoDB as the database for the platform due to its fault tolerance, high performance, and high availability. MongoDB is great for writing and querying large volumes of metric data, and we’ve had excellent results using it as the database for our status pages.

We take data loss seriously. To ensure data generated by workers is never lost before it can be committed to the database, we took extensive measures to save data locally the moment any data-carrying node is unable to send its data to the next receiver on the way to the database. Nodes writing to the database will keep all saved data until they are absolutely guaranteed that the data has made it safely to MongoDB.
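The general pattern described here is often called store-and-forward. Below is a minimal single-process sketch of it, assuming a `send` callback that returns `True` only once the next receiver has acknowledged a record. The class name, file format, and callback are all hypothetical. It also writes every record to disk before attempting delivery, which is a stricter variant than saving only when a send fails, and it can deliver a record twice if the process dies between a successful send and the spool update.

```python
import json
import os
import tempfile
from typing import Callable, List


class SpoolingSender:
    """Keeps records in a local file until the receiver acknowledges them."""

    def __init__(self, send: Callable[[dict], bool], spool_path: str) -> None:
        self._send = send  # must return True only after an acknowledgement
        self._spool_path = spool_path

    def _load(self) -> List[dict]:
        if not os.path.exists(self._spool_path):
            return []
        with open(self._spool_path, "r", encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]

    def _store(self, records: List[dict]) -> None:
        # Write to a temporary file and swap it in, so an interrupted write
        # never leaves a half-written spool behind.
        tmp_path = self._spool_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as fh:
            for record in records:
                fh.write(json.dumps(record) + "\n")
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_path, self._spool_path)

    def submit(self, record: dict) -> int:
        """Persist the record, then try to deliver everything pending."""
        self._store(self._load() + [record])
        return self.flush()

    def flush(self) -> int:
        """Deliver pending records in order; return how many are still pending."""
        pending = self._load()
        delivered = 0
        for record in pending:
            if not self._send(record):
                break  # receiver unavailable: keep this and all later records
            delivered += 1
        self._store(pending[delivered:])
        return len(pending) - delivered


if __name__ == "__main__":
    received: List[dict] = []
    receiver_up = False

    def fake_send(record: dict) -> bool:
        # Stand-in for a network call to the next node on the way to the database.
        if receiver_up:
            received.append(record)
        return receiver_up

    spool_file = os.path.join(tempfile.mkdtemp(), "spool.jsonl")
    sender = SpoolingSender(fake_send, spool_file)
    print("pending while receiver is down:", sender.submit({"latency_ms": 42}))
    receiver_up = True
    print("pending after receiver recovers:", sender.flush())
```

Removing a record from the spool only after an acknowledgement gives at-least-once delivery, so the receiving side needs to tolerate the occasional duplicate.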

Furthermore, the database itself is replicated across secondary MongoDB nodes, and the raw data is continuously backed up to prevent any other form of data loss, as we already do for our status page database.

We hope our monitoring service is useful for you. If you have any feedback or questions, let us know!

You can try Hund for free today, no credit card required.