Heartbeats and Healthchecks

In this presentation, I explain which patterns to keep in mind when building fault-tolerant edge computing devices.

This version of the talk was given at ConFoo Montreal in February 2024.


Companion Code: github.com/workloads/baedge

Kerim Satirli

February 22, 2024


eInk conference badge


This repository has all the code (and information) you need to get started with building your own eInk conference badge.

Nomad Pack for eInk conference badge


This repository has the [Nomad Pack](https://github.com/hashicorp/nomad-pack) configuration for the `baedge` application server.


  1. edge com·put·ing (noun): computing that takes place at or near the physical location of the producer or consumer of data. Similar: point of presence, mobile datacenter.
  2. Noise on the Net [Figure: Spectrum Scan (2.4 GHz), 08:24 to 10:00, Channels 1 to 14]
  3. Suggested Solutions
     - unique IDs: store data with device-specific IDs to prevent reconciling conflicts
     - back-off: retry and back-off with sensible intervals, don't just increase the delay
     - push config: implement code that allows a control server to push config changes
     - local-first: implement code that runs in offline / delayed connectivity situations
  4. Offline "Easy-Mode" (`baedge.nomad.hcl`)

         job "baedge" {
           group "server" {
             disconnect {
               ttl        = "4h"
               stop_after = "2h"
               replace    = true
               reconcile  = "keep_original"
             }
           }
         }
  5. Suggested Solutions
     - don't train: get feedback from people not involved in the building process
     - throw 'em: subject to physical stress to understand operational impact
     - test broadly: buy devices from multiple vendors to account for revisions
     - field early: deploy devices in real environments often, avoid mocked stages
  6. Suggested Solutions
     - limit: limit access credentials on devices to absolute bare minimum
     - rotate: automatically rotate secrets, and expire rotated secrets
     - audit: audit access logs early and often, use data to make informed choices
     - seal: physically seal and disconnect ports you don't actively use
  7. Rotating Credentials (`baedge.nomad.hcl`)

         job "baedge" {
           group "server" {
             identity {
               name = "baedge"
               aud = [
                 "oidc.baedge.local",
               ]
               file = true
               ttl  = "30m"
             }
           }
         }
  8. Badge screens
     - "Nomad" screen (Nomad Runtime): ConFoo Attendee; Allocation: 1a2b3c4d; Address:; Version: 1.7.5
     - "Baedge" screen ({Ba,E}dge Hardware): ConFoo Attendee; Model: 2in7b; Revision: v2
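
The back-off advice in the transcript ("retry and back-off with sensible intervals, don't just increase the delay") could be sketched roughly as follows. This is a minimal illustration, not code from the companion repository: it assumes capped exponential back-off with random jitter is an acceptable reading of "sensible intervals", and the names `backoff_delay` and `retry` as well as the default values are hypothetical.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Return a jittered delay in seconds for a zero-based attempt number."""
    # Grow exponentially, but never beyond the cap, so a long outage
    # does not push the next retry hours into the future.
    ceiling = min(cap, base * (2 ** attempt))
    # Jitter: pick a random point below the ceiling so that many devices
    # coming back online together do not all retry at the same moment.
    return rng() * ceiling


def retry(operation, attempts=5, base=1.0, cap=60.0,
          sleep=time.sleep, rng=random.random):
    """Call operation() until it succeeds or the attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:  # assumption: only network-style errors are retried
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt, base, cap, rng))
```

Passing `sleep` and `rng` as parameters is a design choice for testability: on a real device the defaults apply, while a test can substitute a recording function and a fixed random source.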
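
On the device side, the "rotate" suggestion in the transcript (automatically rotate secrets, and expire rotated secrets) might look something like the sketch below. It is an illustration under stated assumptions: `RotatingSecret` and the `fetch` callable are hypothetical names, the 30-minute default only mirrors the `ttl = "30m"` shown on the slide, and expiring the old secret on the issuing side is out of scope here.

```python
import time


class RotatingSecret:
    """Cache a short-lived secret and replace it shortly before it expires."""

    def __init__(self, fetch, ttl_seconds=1800, refresh_margin=300,
                 clock=time.monotonic):
        self._fetch = fetch            # hypothetical callable returning a fresh secret
        self._ttl = ttl_seconds        # assumed lifetime of each issued secret
        self._margin = refresh_margin  # refresh this many seconds before expiry
        self._clock = clock
        self._value = None
        self._expires_at = 0.0

    def get(self):
        """Return the current secret, fetching a new one when it is due."""
        now = self._clock()
        if self._value is None or now >= self._expires_at - self._margin:
            # The previous value is dropped here, so the device never
            # keeps using a secret that is about to expire.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

Callers would ask for `secret.get()` on every request instead of storing the value themselves, which keeps the rotation logic in one place.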