HELLO
@pyr: Three-line Bio
- CTO & Co-founder at Exoscale
- Distributed systems and monitoring enthusiast
- Open-source developer: Clojure libraries, OpenBSD, Riemann, Collectd, and more

A DIFFERENT CLOUD PROVIDER
- Not yet another virtual datacenter product
- Integration with automation tooling
- Integration in language-specific libraries
- Focus on horizontally-scalable applications
- Local storage
- Security groups

RIEMANN
- The common saying back then was "monitoring sucks"
- Push-based model was a great fit for our use case
- Riemann was in a rough state back then
- A great opportunity to contribute

THINGS OUR EARLY ADOPTERS ENJOYED
- Vagrant support
- Security groups instead of firewalling
- A public IP per instance

WARP
- Open Source
- TLS client certificate-based authentication
- IRC support
- Haskell Go agent
- Prefigured our inclination for Clojure at the orchestration layer

OBJECT STORAGE
- The obvious choice for our crowd
- Architecturally simpler than distributed block storage
- A good complement to our local-storage-backed instances

OBJECT STORAGE NEEDS
- S3 is the sole player in that field: we need API compatibility
- The only alternative at the time was bad HTTP extensions

WRITING AN OBJECT STORE
- We focused on how to store large objects
- Tempted by a description of the (non-open-source) approach by DataStax on top of Cassandra

CHOOSING CASSANDRA
- Great library support, thanks @mpenet!
- Simple for us to operate
- Very few moving parts
- Our implementation could remain fully stateless

WHAT WE DIDN'T ANTICIPATE
- It's not all about actual data storage
- The S3 API is a beast
- The S3 API is underspecified
- The S3 API is not versioned
- The S3 API client landscape is a mess

A QUICK DIGRESSION: S3 REQUESTS
Operation: put object foo in bucket bar:

    PUT /foo
    Host: bar.sos-ch-dk-2.exo.io
    Authorization: AWS ....

    <...>

A QUICK DIGRESSION: S3 REQUESTS
Operation: copy object bim from bucket bam to object foo in bucket bar:

    PUT /foo
    Host: bar.sos-ch-dk-2.exo.io
    Authorization: AWS ....
    X-Amz-Copy-Source: /bam/bim
    X-Amz-Copy-Source-If-Unmodified-Since: ARE YOU KIDDING ME?

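The point of the digression is that a copy is an ordinary PUT on the destination whose meaning is changed entirely by headers. A minimal Python sketch of the two request shapes follows; the helper names `put_object_request` and `copy_object_request` are hypothetical, the endpoint is the one shown on the slides, and the Authorization header is left out on purpose. In the S3 API the copy source is named as /<bucket>/<key>.

```python
ENDPOINT = "sos-ch-dk-2.exo.io"  # endpoint as shown on the slides


def put_object_request(bucket, key):
    """Describe a plain PUT of `key` into `bucket` (bucket in the Host header)."""
    return {
        "method": "PUT",
        "path": "/" + key,
        "headers": {"Host": f"{bucket}.{ENDPOINT}"},
    }


def copy_object_request(src_bucket, src_key, dst_bucket, dst_key):
    """Describe a server-side copy: the same PUT, plus a copy-source header."""
    request = put_object_request(dst_bucket, dst_key)
    # The source is named as /<bucket>/<key>; the request carries no body.
    request["headers"]["X-Amz-Copy-Source"] = f"/{src_bucket}/{src_key}"
    return request
```

A server therefore cannot route on method and path alone: it has to inspect headers before it knows which operation it is serving.
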
BY THE WAY
- Storing terabytes of data on off-the-shelf hardware doesn't come easy either
- Input and output payloads of arbitrary lengths aren't easy
- Compojure, Ring, and the usual suspects are out

SOME THINGS WE LEARNED
- This was our largest application to date
- Component didn't exist
- We built a hacky similar thing based on plain maps
- Maintenance of the application starts becoming an issue
- With plain maps, malformed data can be threaded through the code for a while before anyone notices

THINGS ARE RUNNING SMOOTHLY
- Load on the platform is increasing
- We have a lot of event-generating systems
- Tons of logs
- Tons of metrics

A FIRST CANDIDATE: BANDWIDTH METERING
- Traffic accounting on hypervisors, with a small C agent
- 30-second aggregates sent over to Kafka
- A Clojure Kafka consumer on the other end

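The 30-second aggregation step can be sketched as follows. This is a Python illustration only: the real agent is written in C, and the sample layout (timestamp, instance id, byte count) and the `aggregate` function are assumptions, not the actual record format.

```python
from collections import defaultdict

WINDOW_SECONDS = 30


def aggregate(samples):
    """Sum byte counts per (instance, window start).

    `samples` is an iterable of (timestamp_seconds, instance_id, byte_count)
    tuples; this layout is hypothetical.
    """
    totals = defaultdict(int)
    for timestamp, instance, byte_count in samples:
        # Align each sample on the start of its 30-second window.
        window_start = timestamp - (timestamp % WINDOW_SECONDS)
        totals[(instance, window_start)] += byte_count
    return dict(totals)
```

Each resulting (instance, window) total would then be one message on the Kafka topic, which keeps the consumer on the other end small.
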
KEY TAKEAWAY
- Non-glue Clojure code is around 150 LOC
- Altogether around 500 lines
- It seems as though Clojure was written to write Kafka consumers

KALZONE: DYNAMIC DNS WITH KAFKA
- Works great across a large number of clients
- Great foundation for more infrastructure inventory solutions
- Kafka log compaction is a huge plus

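Log compaction matters here because a compacted topic retains at least the most recent record for each key, so a consumer that replays the topic ends up with the current state. A minimal Python sketch of that end state, assuming records are (name, address) pairs and that a `None` value is a delete marker; the record shape is hypothetical and is not Kalzone's actual schema.

```python
def compact(log):
    """Return the state a consumer rebuilds from a keyed log.

    Only the latest value per key survives; a None value deletes the key.
    """
    latest = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)  # delete marker: forget the key
        else:
            latest[key] = value
    return latest
```

A new DNS server can thus start from an empty state, read the topic from the beginning, and converge on the live zone contents without a separate snapshot mechanism.
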
BILLING ISSUES
- The cron-based approach to billing is showing its limits
- Hard to keep it at an hourly rate because it takes too long

KEY TAKEAWAYS
- Incredible reliability
- The system can weather temporary failures with no billing impact
- Transducers fit in perfectly with Kafka
- We wrote a few of our own

SUDDEN PICKUP IN S3 USAGE
- Our initial implementation limits the throughput
- Tail latencies go through the roof
- Cassandra is just not great at doing dense nodes
- We knew this going in
- We hit the wall hard

WE NEED A NUMBER OF NEW API CAPABILITIES
- V4 signatures are becoming the norm for S3
- Better ACL support is needed
- The Docker registry exercises all the weird properties of the API

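Part of what makes V4 signatures heavier than the earlier `Authorization: AWS ...` scheme is that the signing key is no longer the secret itself: it is derived through a chain of HMAC-SHA256 operations scoped to a date, a region, and a service. A minimal Python sketch of that derivation only follows; building and signing the canonical request is not covered, and the function names are illustrative.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date: str, region: str,
                       service: str = "s3") -> bytes:
    """Derive a Signature Version 4 signing key; `date` is YYYYMMDD."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")
```

Because the key depends on the date and region, a server implementing the API has to recompute it from the stored secret for each credential scope it sees.
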
BETTING ON CORE.ASYNC
- To better understand Netty internals we settle on writing our own facade
- This brings less baggage than Aleph
- A storage agent in C
- ZooKeeper for agent discovery
- We keep Cassandra for metadata storage

KEY LEARNINGS
- Component is our go-to daemon structuring tool
- Netty is hard
- Reconciling byte buffer manipulation with the immutable Clojure world can be tricky
- Transducers were a life saver against memory leaks
  - Test on sequences
  - Run against core.async channels
- Spec helps a lot with reliability and maintenance
- We still don't do enough generative testing

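The "test on sequences, run against channels" point relies on a transducer describing a transformation independently of where items come from or go to. The Python sketch below illustrates the idea only and is not the authors' code: `queue.Queue` stands in for a core.async channel, `None` marks end of stream, all names are hypothetical, and the completion and early-termination handling of real Clojure transducers is omitted.

```python
import queue


def mapping(f):
    """Transducer: apply f to each item before passing it on."""
    def transducer(step):
        return lambda acc, item: step(acc, f(item))
    return transducer


def filtering(pred):
    """Transducer: pass an item on only if pred(item) is true."""
    def transducer(step):
        return lambda acc, item: step(acc, item) if pred(item) else acc
    return transducer


def compose(*transducers):
    """Compose transducers so that they run left to right."""
    def composed(step):
        for transducer in reversed(transducers):
            step = transducer(step)
        return step
    return composed


def into_list(xform, items):
    """Run the transformation over an in-memory sequence (handy in tests)."""
    def append(acc, item):
        acc.append(item)
        return acc
    step = xform(append)
    acc = []
    for item in items:
        acc = step(acc, item)
    return acc


def pipe(xform, source, sink):
    """Run the same transformation between two queues; None ends the stream."""
    def put(q, item):
        q.put(item)
        return q
    step = xform(put)
    for item in iter(source.get, None):
        step(sink, item)
    sink.put(None)
```

The same `xform` value can be checked with `into_list` in a unit test and then handed to `pipe` in the running system, with items flowing through one at a time instead of accumulating in intermediate collections.
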
BUILDING ON KUBERNETES
- We previously bet on Mesos
- Recent changes make running Clojure apps on Kubernetes nice and easy
- Upcoming library for configuration of Kubernetes applications
- Upcoming library to build Kubernetes controllers in Clojure

AN API GATEWAY
- The front door to our infrastructure
- Leverages all our work around asynchronous networking
- A great way to put spec to work
- Will give us great capabilities to do smart RBAC

FRONTEND
- We use it for internal tooling already
- It's time to switch our main console
- Re-frame gives us great confidence in making the jump