5 years of Clojure

A talk subtitled "building better infrastructure with parentheses" given at Clojure Dutch Days 2018

Pierre-Yves Ritschard

April 21, 2018

Transcript

  1. 5 YEARS OF CLOJURE
    PIERRE-YVES RITSCHARD
    @PYR
  2. HALLO: Three-line Bio
    CTO & Co-founder at Exoscale
    Distributed systems and monitoring enthusiast
    Open-Source developer
    Clojure Libraries, OpenBSD, Riemann, Collectd, and more.
    @pyr
  3. 5 YEARS OF CLOJURE
    Building better infrastructure with parentheses
  4. EXOSCALE
    Infrastructure as a service
    Zones in Frankfurt, Vienna, Zürich, Geneva
  5. EXOSCALE
  6. EXOSCALE
    provider "exoscale" {
      api_key    = "${var.exoscale_api_key}"
      secret_key = "${var.exoscale_secret_key}"
    }
    resource "exoscale_instance" "web" {
      template  = "Ubuntu 17.04"
      disk_size = "50g"
      profile   = "medium"
      ssh_key   = "production"
    }
  7. I THOUGHT THIS WAS A CLOJURE CONFERENCE!
  8. WHAT'S IN A CLOUD PROVIDER
    Datacenter operations
    Software development
  9. SOFTWARE AT EXOSCALE
    Virtual machine instance orchestrator
    Object storage controller
    Network controller (SDN)
    Customer management
    Metering system
    Billing
    Web portal
  10. ISN'T ALL OF THIS BASH, PERL, AND YAML?
  11. CLOJURE NOT AN OBVIOUS CHOICE
    The JVM had/has bad press with infrastructure folk
  12. CLOJURE AT EXOSCALE: A TIMELINE
  13. 2012: THE EARLY DAYS
  14. WE STARTED WITH
    3 people
    A bit of time
    A product idea
  15. A DIFFERENT CLOUD PROVIDER
    Not yet another virtual datacenter product
    Integration with automation tooling
    Integration in language-specific libraries
    Focus on horizontally-scalable applications
    Local storage
    Security groups
  16. THINGS THAT DIDN'T EXIST IN 2012
    Ansible
    Terraform
    Docker
  17. THINGS THAT DIDN'T EXIST IN 2012
    Television
    Wifi
  18. OUR MINIMAL STACK
    Apache CloudStack
    Puppet
    Good old MySQL
    A third-party customer management tool
    Python + AngularJS
    Riemann
  19. OUR MINIMAL STACK
  20. RIEMANN
    The common saying back then was "monitoring sucks"
    Push-based model was a great fit for our use case
    Riemann was in a rough state back then
    A great opportunity to contribute
  21. 2013: GOING LIVE
  22. BACKEND DEVELOPERS DOING FRONTEND
  23. THINGS OUR EARLY ADOPTERS ENJOYED
    Vagrant support
    Security groups instead of firewalling
    A public IP per instance
  24. IMPROVING RELEASE AUTOMATION
  25. WARP
  26. WARP
  27. WARP
    Open Source
    TLS client certificate-based authentication
    IRC support
    Haskell Go agent
    Prefigured our inclination for Clojure at the orchestration layer
  28. TROUBLE KICKS IN
    Late payments
    Bitcoin mining on free credit
  29. SOLVING ABUSE
    Need to pull data from a bunch of places
    Standard FSM type of problem
  30. A NEW FAVORITE: CORE.MATCH
    (match [state new-state unpaid-invoices?]
      [:ok       :warning  _    ] :warn!
      [:ok       :critical _    ] :suspend!
      [:warning  :critical _    ] :suspend!
      [:warning  :ok       _    ] :active!
      [:critical :ok       false] :active!
      [:critical :warning  false] :active!
      [_         _         _    ] nil)
  31. SOME THINGS WE LEARNED
    Running Clojure processes in good old cron is perfect
    Logback's logging context is a huge plus
  32. 2014: THE YEAR OF STORAGE
  33. OBJECT STORAGE
    The obvious choice for our crowd
    Architecturally simpler than distributed block storage
    A good complement to our local-storage-backed instances
  34. OBJECT STORAGE NEEDS
    S3 is the sole player in that field: we need API compatibility
    The only alternative at the time was bad HTTP extensions
  35. OBJECT STORAGE IN THE WILD
    Ceph
    Riak-CS
    Swift
    Costly vendor-backed solutions
  36. WRITING AN OBJECT STORE
    We focused on how to store large objects
    Tempted by a description of the (non-OpenSource) approach by DataStax on top of Cassandra
  37. CHOOSING CASSANDRA
    Great library support, thanks @mpenet!
    Simple for us to operate
    Very few moving parts
    Our implementation could remain fully stateless
  38. WE WERE (ALMOST) YOUNG AND (WAY TOO) NAIVE
    How hard could it be?
  39. WHAT WE DIDN'T ANTICIPATE
    It's not all about actual data storage
    The S3 API is a beast
    The S3 API is underspecified
    The S3 API is not versioned
    The S3 API client landscape is a mess
  40. A QUICK DIGRESSION: S3 REQUESTS
    Operation: put object foo in bucket bar:
    PUT /foo
    Host: bar.sos-ch-dk-2.exo.io
    Authorization: AWS ....
    <...>
  41. A QUICK DIGRESSION: S3 REQUESTS
    Operation: update ACL for object foo in bucket bar:
    PUT /foo?acl
    Host: bar.sos-ch-dk-2.exo.io
    Authorization: AWS ....
    X-Amz-ACL: bucket-owner-full-control
  42. A QUICK DIGRESSION: S3 REQUESTS
    Operation: copy object bim from bucket bam to object foo in bucket bar:
    PUT /foo
    Host: bar.sos-ch-dk-2.exo.io
    Authorization: AWS ....
    X-Amz-Copy-Source: /bam/bim
    X-Amz-Copy-Source-If-Unmodified-Since: ARE YOU KIDDING ME?
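The three requests on slides 40 to 42 share one verb and one path shape, so a server has to inspect the query string and headers to tell them apart. A minimal sketch of that dispatch, assuming a Ring-style request map with lower-cased header names. The function `s3-operation` and the keywords it returns are hypothetical names chosen for illustration; they come neither from the talk nor from any library.

```clojure
(require '[clojure.string :as str])

(defn s3-operation
  "Guess which S3 operation a PUT on an object path stands for.
   `request` is a Ring-style map (assumption); header names are
   lower-cased strings. Returns nil for anything that is not a PUT."
  [{:keys [request-method query-string headers]}]
  (when (= :put request-method)
    (cond
      ;; PUT /foo?acl updates the ACL of an existing object
      (and query-string
           (some #{"acl"} (str/split query-string #"&")))
      :put-object-acl

      ;; PUT /foo with a copy-source header copies another object
      (contains? headers "x-amz-copy-source")
      :copy-object

      ;; otherwise the request body is the new object
      :else
      :put-object)))

(s3-operation {:request-method :put
               :uri            "/foo"
               :query-string   "acl"
               :headers        {"host"      "bar.sos-ch-dk-2.exo.io"
                                "x-amz-acl" "bucket-owner-full-control"}})
;; => :put-object-acl
```

Real S3 has many more sub-resources and conditional headers than this covers, which is part of why a general-purpose router is a poor fit for it.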
  43. BY THE WAY
    Storing terabytes of data on off-the-shelf hardware doesn't come easy either
    Input and output payloads of arbitrary lengths aren't easy
    Compojure, Ring, and the usual suspects are out
  44. SOME THINGS WE LEARNED
    This was our largest application to date
    Component didn't exist
    We built a hacky similar thing based on plain maps
    Maintenance of the application starts becoming an issue
    Maps can lead to threading malformed data for a while
  45. 2015: SCALING UP
  46. THINGS ARE RUNNING SMOOTHLY
    Load on the platform is increasing
    We have a lot of event-generating systems
    Tons of logs
    Tons of metrics
  47. WE CAN'T DO EVERYTHING WITH CRON
    So we install a Kafka cluster
  48. WHY KAFKA?
    Partition-isolated consistency
    Disaggregating memory
  49. WHY KAFKA?
  50. A FIRST CANDIDATE: BANDWIDTH METERING
    Traffic accounting on hypervisors, with a small C agent
    30-second aggregates sent over to Kafka
    A Clojure Kafka consumer on the other end
  51. KEY TAKEAWAY
    Non-glue Clojure code is around 150 lines
    Altogether around 500 lines
    It seems as though Clojure was written to write Kafka consumers
  52. THIS HAMMER NEEDS NEW NAILS
    We have a recurring issue with DNS updates and need more flexibility building zones
  53. AN EXPERIMENT: BLOG POST DRIVEN DEVELOPMENT
  55. LOG COMPACTION
  56. LOG COMPACTION
  57. KALZONE: DYNAMIC DNS WITH KAFKA
    Works great across a large number of clients
    Great foundation for more infrastructure inventory solutions
    Kafka log compaction is a huge plus
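Log compaction keeps at least the latest record for each key, and a record with a null value (a tombstone) marks the key as deleted. A consumer replaying such a topic ends up with one current value per key, which is the view a zone builder needs. A small sketch of those semantics in plain Clojure, with no Kafka client involved. The `[name record]` pair shape and the `compact` function are assumptions made for illustration and do not reflect Kalzone's actual record format.

```clojure
(defn compact
  "Fold a log of [key value] records into the latest value per key.
   A nil value acts as a tombstone and removes the key."
  [records]
  (reduce (fn [state [k v]]
            (if (nil? v)
              (dissoc state k)    ; tombstone: forget the key
              (assoc state k v))) ; newer record wins
          {}
          records))

;; Hypothetical DNS updates, oldest first.
(def dns-log
  [["www.example.com" {:type :A :value "192.0.2.10"}]
   ["api.example.com" {:type :A :value "192.0.2.20"}]
   ["www.example.com" {:type :A :value "192.0.2.11"}]
   ["api.example.com" nil]])

(compact dns-log)
;; => {"www.example.com" {:type :A, :value "192.0.2.11"}}
```

Because the result depends only on the last record per key, a freshly started consumer can rebuild the full zone state from the compacted topic without the complete update history.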
  58. 2016: FAST GROWTH
  59. SECURED FUNDING IN LATE 2015
  60. USE OF PROCEEDS
    People
    A new datacenter
  61. SELLING ON THE WEB
    We simplify our online funnel
    A drip process
  62. DRIP PROCESS
    core.match to the rescue again
    Yet another reason to write a cron
  63. BILLING ISSUES
    The cron-based approach to billing is showing its limits
    Hard to keep it at an hourly rate because it takes too long
  64. AT A CROSSROADS
  65. AT A CROSSROADS
  66. AT A CROSSROADS
  67. KAFKA TO THE RESCUE
    A full rewrite of our billing stack
    Sub 1k loc
  68. KEY TAKEAWAYS
    Incredible reliability
    The system can weather temporary failures with no billing impact
    Transducers fit in perfectly with Kafka
    We wrote a few of our own
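As a rough illustration of why transducers suit this kind of consumer: the transformation is defined once, independent of where the records come from. The sketch below uses only clojure.core. The event fields (`:account`, `:state`, `:seconds`, `:rate`) and the names `billable-usage` and `invoice-totals` are hypothetical and are not taken from the billing system described in the talk.

```clojure
(def billable-usage
  "Transducer: keep events for running instances and turn each one
   into an [account amount] pair."
  (comp
    (filter #(= :running (:state %)))
    (map (fn [{:keys [account seconds rate]}]
           [account (* seconds rate)]))))

(defn invoice-totals
  "Sum the billable amounts per account over a collection of events."
  [events]
  (transduce billable-usage
             (completing
               (fn [totals [account amount]]
                 (update totals account (fnil + 0) amount)))
             {}
             events))

;; Hypothetical usage events.
(def events
  [{:account "acme"    :state :running :seconds 30 :rate 2}
   {:account "acme"    :state :stopped :seconds 30 :rate 2}
   {:account "initech" :state :running :seconds 30 :rate 1}
   {:account "acme"    :state :running :seconds 30 :rate 2}])

(invoice-totals events)
;; => {"acme" 120, "initech" 30}
```

The same `billable-usage` value can be applied to a plain sequence with `sequence` or `into`, or passed as the xform argument of a core.async `chan`, so one definition serves both tests and a running consumer.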
  69. 2017: TOO MUCH DATA
  70. SUDDEN S3 PICKUP IN USAGE
    Our initial implementation limits the throughput
    Tail latencies go through the roof
    Cassandra is just not great at doing dense nodes
    We knew this going in
    We hit the wall hard
  71. WE NEED A NUMBER OF NEW API CAPABILITIES
    V4 signatures are becoming the norm for S3
    Better ACL support is needed
    The Docker registry exercises all weird properties of the API
  72. WE FIND A GOOD PAPER
    Ambry attacks the same problem space
    The paper lays out a great strategy
  73. LET'S WRITE A DISTRIBUTED SYSTEM FROM SCRATCH
    What could go wrong?
  74. BETTING ON CORE.ASYNC
    To better understand Netty internals we settle on writing our own facade
    This brings less baggage than Aleph
    A storage agent in C
    ZooKeeper for agent discovery
    We keep Cassandra for metadata storage
  75. NEW THINGS
    Component
    Spec
    A larger Reagent frontend app
  76. UI
  77. KEY LEARNINGS
    Component is our go-to daemon structuring tool
    Netty is hard
    Reconciling byte buffer manipulation with the immutable Clojure world can be tricky
    Transducers were a lifesaver against memory leaks
      Test on sequences
      Run against core.async channels
    Spec helps a lot with reliability and maintenance
    We still don't do enough generative testing
  78. 2018: WORLD DOMINATION!
  79. OUR CURRENT STATE
  80. GOOD CORE LIBRARIES
    Unilog
    Kinsky
    Net
    Reporter
    Raven
    Uncaught
    Signal
  81. WHAT WE'RE MISSING
    A good daemon template
    Some governance around our library
    A Clojure for systems development
  82. BUILDING ON KUBERNETES
    We previously bet on Mesos
    Recent changes make running Clojure apps on Kubernetes nice and easy
    Upcoming library for configuration of Kubernetes applications
    Upcoming library to build Kubernetes controllers in Clojure
  83. AN API GATEWAY
    The front door to our infrastructure
    Leverages all our work around asynchronous networking
    A great way to put spec to work
    Will give us great capabilities to do smart RBAC
  84. FRONTEND
    We use it for internal tooling already
    It's time to switch our main console
    Re-frame gives us great confidence in making the jump
  85. LOOKING BACK
  86. WHAT WE DON'T DO IN CLOJURE
    SQL-backed APIs
    Low-level development
  87. THE USUAL QUESTIONS
    Community
    Hiring
  88. THANKS
    We need help building all of this!