Our company runs hundreds of servers with thousands of clients connecting to them every minute. This translates to thousands of HTTP and DNS endpoints that need to be reachable at all times. To monitor these channels we developed Rapsheet, a monitoring tool that continuously tests all channels against a variety of third-party infrastructure providers. Rapsheet uses asyncio libraries such as aiodns and aiohttp to manage bulk requests and queries.
DNS queries are performed against separate DNS servers such as OpenDNS, Cloudflare and Google's DNS servers. HTTP requests check both for availability and for correctness of the returned content. As a bonus, HTTP endpoints and domains are also constantly checked against third-party blacklist providers such as Google's Safe Browsing database.
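The bulk checking described above can be sketched with plain asyncio. This is a minimal illustration, not Rapsheet's actual code: the names `HttpCheck`, `CheckResult`, `run_check` and `run_all` are hypothetical, and the `fetch` coroutine is an assumed stand-in for whatever would wrap a real aiohttp request. Each check passes only if the endpoint answers in time, returns status 200 and contains the expected text.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Iterable, List, Tuple

# Hypothetical fetcher signature: takes a URL, returns (status_code, body).
# In a real deployment this would wrap an HTTP client such as aiohttp.
Fetcher = Callable[[str], Awaitable[Tuple[int, str]]]


@dataclass(frozen=True)
class HttpCheck:
    url: str
    expected_text: str  # substring that must appear in the response body


@dataclass(frozen=True)
class CheckResult:
    url: str
    ok: bool
    reason: str


async def run_check(check: HttpCheck, fetch: Fetcher,
                    limit: asyncio.Semaphore, timeout: float) -> CheckResult:
    """Run one check: availability first, then content correctness."""
    async with limit:  # cap the number of requests in flight
        try:
            status, body = await asyncio.wait_for(fetch(check.url), timeout)
        except asyncio.TimeoutError:
            return CheckResult(check.url, False, "timeout")
        except Exception as exc:  # any other failure counts as unreachable
            return CheckResult(check.url, False, f"error: {exc!r}")
    if status != 200:
        return CheckResult(check.url, False, f"status {status}")
    if check.expected_text not in body:
        return CheckResult(check.url, False, "content mismatch")
    return CheckResult(check.url, True, "ok")


async def run_all(checks: Iterable[HttpCheck], fetch: Fetcher,
                  max_concurrency: int = 100,
                  timeout: float = 5.0) -> List[CheckResult]:
    """Fan all checks out concurrently; results come back in input order."""
    limit = asyncio.Semaphore(max_concurrency)
    tasks = [run_check(c, fetch, limit, timeout) for c in checks]
    return await asyncio.gather(*tasks)
```

DNS lookups against several resolvers could be fanned out the same way, with one coroutine per (domain, resolver) pair in place of the HTTP fetch.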
The other thing that’s different about Rapsheet is its philosophy. It doesn’t do fancy graphs or pretty world-map overlays (even though we do project it on an office wall). Rapsheet is built on the principle of preventing alarm deafness, which sets in when alarms sit in an alert state for long periods of time. Rapsheet’s chief goal is a zero state: every metric should normally read zero, so any metric in a non-zero state is an obvious call to action.
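One way to picture the zero-state idea is a sketch that reduces every check result to a failure count per category. The function names and the category labels here are hypothetical, chosen only for illustration; the point is that a healthy system reports all zeros, and anything else is a call to action.

```python
from typing import Dict, Iterable, List, Tuple


def failure_counts(results: Iterable[Tuple[str, bool]]) -> Dict[str, int]:
    """Count failed checks per category.

    Each result is a (category, ok) pair. Categories with no failures are
    kept with a count of 0, so the healthy state is visibly all zeros.
    """
    counts: Dict[str, int] = {}
    for category, ok in results:
        counts.setdefault(category, 0)
        if not ok:
            counts[category] += 1
    return counts


def needs_action(counts: Dict[str, int]) -> List[str]:
    """Return the categories whose count is non-zero, sorted by name."""
    return sorted(name for name, n in counts.items() if n > 0)
```

For example, `needs_action(failure_counts([("dns", True), ("http", False)]))` returns `["http"]`, while an all-passing run returns an empty list.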
A very basic understanding of DNS and HTTP would be useful for this talk. We also think that the general ideas on logging, alerting and alert blindness will be useful for all devs pushing out code that needs to be monitored at scale. Rapsheet is now publicly available on Docker Hub, and this talk is a crash course on its background thinking, design and functionality.