5 years of Clojure

A talk subtitled "building better infrastructure with parentheses" given at Clojure Dutch Days 2018

Pierre-Yves Ritschard

April 21, 2018

Transcript

  1. 5 YEARS OF CLOJURE
    PIERRE-YVES RITSCHARD (@PYR)

  2. HALLO
    @pyr: Three-line Bio
    CTO & Co-founder at Exoscale
    Distributed systems and monitoring enthusiast
    Open-Source developer
    Clojure Libraries, OpenBSD, Riemann, Collectd, and more.

  3. 5 YEARS OF CLOJURE
    Building better infrastructure with parentheses

  4. EXOSCALE
    Infrastructure as a service
    Zones in Frankfurt, Vienna, Zürich, Geneva

  5. EXOSCALE

  6. EXOSCALE
    provider "exoscale" {
      api_key = "${var.exoscale_api_key}"
      secret_key = "${var.exoscale_secret_key}"
    }
    resource "exoscale_instance" "web" {
      template = "Ubuntu 17.04"
      disk_size = "50g"
      profile = "medium"
      ssh_key = "production"
    }

  7. I THOUGHT THIS WAS A CLOJURE CONFERENCE!

  8. WHAT'S IN A CLOUD PROVIDER
    Datacenter operations
    Software development

  9. SOFTWARE AT EXOSCALE
    Virtual machine instance orchestrator
    Object storage controller
    Network controller (SDN)
    Customer management
    Metering system
    Billing
    Web portal

  10. ISN'T ALL OF THIS BASH, PERL, AND YAML?

  11. CLOJURE NOT AN OBVIOUS CHOICE
    The JVM had/has bad press with infrastructure folk

  12. CLOJURE AT EXOSCALE: A TIMELINE

  13. 2012: THE EARLY DAYS

  14. WE STARTED WITH
    3 people
    A bit of time
    A product idea

  15. A DIFFERENT CLOUD PROVIDER
    Not yet another virtual datacenter product
    Integration with automation tooling
    Integration in language-specific libraries
    Focus on horizontally-scalable applications
    Local storage
    Security groups

  16. THINGS THAT DIDN'T EXIST IN 2012
    Ansible
    Terraform
    Docker

  17. THINGS THAT DIDN'T EXIST IN 2012
    Television
    Wifi

  18. OUR MINIMAL STACK
    Apache Cloudstack
    Puppet
    Good old MySQL
    A third-party customer management tool
    Python + AngularJS
    Riemann

  19. OUR MINIMAL STACK

  20. RIEMANN
    The common saying back then was "monitoring sucks"
    The push-based model was a great fit for our use case
    Riemann was in a rough state back then
    A great opportunity to contribute

  21. 2013: GOING LIVE

  22. BACKEND DEVELOPERS DOING FRONTEND

  23. THINGS OUR EARLY ADOPTERS ENJOYED
    Vagrant support
    Security groups instead of firewalling
    A public IP per instance

  24. IMPROVING RELEASE AUTOMATION

  25. WARP

  26. WARP

  27. WARP
    Open Source
    TLS client certificate-based authentication
    IRC support
    Haskell Go agent
    Prefigured our inclination for Clojure at the orchestration layer

  28. TROUBLE KICKS IN
    Late payments
    Bitcoin mining on free credit

  29. SOLVING ABUSE
    Need to pull data from a bunch of places
    Standard FSM type of problem

  30. A NEW FAVORITE: CORE.MATCH
    (match [state new-state unpaid-invoices?]
      [:ok       :warning  _    ] :warn!
      [:ok       :critical _    ] :suspend!
      [:warning  :critical _    ] :suspend!
      [:warning  :ok       _    ] :active!
      [:critical :ok       false] :active!
      [:critical :warning  false] :active!
      [_         _         _    ] nil)

  31. SOME THINGS WE LEARNED
    Running Clojure processes in good old cron is perfect
    Logback's logging context is a huge plus

  32. 2014: THE YEAR OF STORAGE

  33. OBJECT STORAGE
    The obvious choice for our crowd
    Architecturally simpler than distributed block storage
    A good complement to our local storage backed instances

  34. OBJECT STORAGE NEEDS
    S3 is the sole player in that field: we need API compatibility
    The only alternative at the time was bad HTTP extensions

  35. OBJECT STORAGE IN THE WILD
    Ceph
    Riak-CS
    Swift
    Costly vendor-backed solutions

  36. WRITING AN OBJECT STORE
    We focused on how to store large objects
    Tempted by a description of the (non-OpenSource) approach by Datastax on top of Cassandra

  37. CHOOSING CASSANDRA
    Great library support, thanks @mpenet!
    Simple for us to operate
    Very few moving parts
    Our implementation could remain fully stateless

  38. WE WERE (ALMOST) YOUNG AND (WAY TOO) NAIVE
    How hard could it be?

  39. WHAT WE DIDN'T ANTICIPATE
    It's not all about actual data storage
    The S3 API is a beast
    The S3 API is underspecified
    The S3 API is not versioned
    The S3 API client landscape is a mess

  40. A QUICK DIGRESSION: S3 REQUESTS
    Operation: put object foo in bucket bar:
    PUT /foo
    Host: bar.sos-ch-dk-2.exo.io
    Authorization: AWS ....
    <...>

  41. A QUICK DIGRESSION: S3 REQUESTS
    Operation: update ACL for object foo in bucket bar:
    PUT /foo?acl
    Host: bar.sos-ch-dk-2.exo.io
    Authorization: AWS ....
    X-Amz-ACL: bucket-owner-full-control

  42. A QUICK DIGRESSION: S3 REQUESTS
    Operation: copy object bim from bucket bam to object foo in bucket bar:
    PUT /foo
    Host: bar.sos-ch-dk-2.exo.io
    Authorization: AWS ....
    X-Amz-Copy-Source: /bam/bim
    X-Amz-Copy-Source-If-Unmodified-Since: ARE YOU KIDDING ME?
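
The three requests above share the same method and nearly the same path; only the query string and headers tell them apart. A minimal sketch of that dispatch, assuming a Ring-like request map with string header names; the `header` and `operation` functions and the operation keywords are hypothetical names for illustration, not Exoscale's implementation:

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: case-insensitive header lookup in a map with
;; string keys, since clients send header names in varying case.
(defn header
  [request header-name]
  (some (fn [[k v]]
          (when (= (str/lower-case k) (str/lower-case header-name)) v))
        (:headers request)))

;; Hypothetical classifier: returns a keyword naming the operation.
;; Order matters: the ?acl sub-resource and the copy-source header
;; must be checked before falling back to a plain object upload.
(defn operation
  [{:keys [method uri query-string] :as request}]
  (let [object? (not= uri "/")]
    (cond
      (and (= method :put) object? (= query-string "acl"))
      :put-object-acl

      (and (= method :put) object? (header request "x-amz-copy-source"))
      :copy-object

      (and (= method :put) object?)
      :put-object

      :else
      :unsupported)))

(operation {:method  :put
            :uri     "/foo"
            :headers {"X-Amz-Copy-Source" "/bam/bim"}})
;; => :copy-object
```

A real implementation would also have to handle the conditional copy headers, multipart uploads, and the other sub-resources, which is where most of the complexity hides.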

  43. BY THE WAY
    Storing terabytes of data on off-the-shelf hardware doesn't come easy either
    Input and output payloads of arbitrary lengths aren't easy
    Compojure, Ring, and usual suspects are out

  44. SOME THINGS WE LEARNED
    This was our largest application to date
    Component didn't exist
    We built a hacky similar thing based on plain maps
    Maintenance of the application starts becoming an issue
    Maps can lead to threading malformed data for a while

  45. 2015: SCALING UP

  46. THINGS ARE RUNNING SMOOTHLY
    Load on the platform is increasing
    We have a lot of event-generating systems
    Tons of logs
    Tons of metrics

  47. WE CAN'T DO EVERYTHING WITH CRON
    So we install a Kafka cluster

  48. WHY KAFKA?
    Partition-isolated consistency
    Disaggregating memory

  49. WHY KAFKA?

  50. A FIRST CANDIDATE: BANDWIDTH METERING
    Traffic accounting on hypervisors, with a small C agent
    30-second aggregates sent over to Kafka
    A Clojure Kafka consumer on the other end

  51. KEY TAKEAWAY
    Non-glue Clojure code is around 150 loc
    Altogether around 500 lines
    It seems as though Clojure was written to write Kafka consumers

  52. THIS HAMMER NEEDS NEW NAILS
    We have a recurring issue with DNS updates and need more flexibility building zones

  53. AN EXPERIMENT: BLOG POST DRIVEN DEVELOPMENT

  54. (no text on slide)

  55. LOG COMPACTION

  56. LOG COMPACTION

  57. KALZONE: DYNAMIC DNS WITH KAFKA
    Works great across a large number of clients
    Great foundation for more infrastructure inventory solutions
    Kafka log compaction is a huge plus
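
Log compaction keeps at least the latest record for each key, and a record with a null value acts as a tombstone that deletes the key. A minimal sketch of the state a consumer rebuilds when replaying such a topic, using plain vectors in place of Kafka records; the `compact` function and the sample DNS data are hypothetical and are not taken from Kalzone:

```clojure
;; Hypothetical model of a compacted topic: a log of [key value] pairs.
;; The latest value per key wins; a nil value is a tombstone.
(defn compact
  [log]
  (reduce (fn [state [k v]]
            (if (nil? v)
              (dissoc state k)
              (assoc state k v)))
          {}
          log))

(def dns-log
  [["web1.example.com" "198.51.100.10"]
   ["db1.example.com"  "198.51.100.20"]
   ["web1.example.com" "198.51.100.11"]   ; later record wins
   ["db1.example.com"  nil]])             ; tombstone: record deleted

(compact dns-log)
;; => {"web1.example.com" "198.51.100.11"}
```

Because the broker retains the latest record per key, a new consumer can rebuild the full zone state by replaying the topic from the start, without an unbounded history.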

  58. 2016: FAST GROWTH

  59. SECURED FUNDING IN LATE 2015

  60. USE OF PROCEEDS
    People
    A new datacenter

  61. SELLING ON THE WEB
    We simplify our online funnel
    A drip process

  62. DRIP PROCESS
    core.match to the rescue again
    Yet another reason to write a cron

  63. BILLING ISSUES
    The cron-based approach to billing is showing its limits
    Hard to keep it at an hourly rate because it takes too long

  64. AT A CROSSROADS

  65. AT A CROSSROADS

  66. AT A CROSSROADS

  67. KAFKA TO THE RESCUE
    A full rewrite of our billing stack
    Sub 1k loc

  68. KEY TAKEAWAYS
    Incredible reliability
    The system can weather temporary failures with no billing impact
    Transducers fit in perfectly with Kafka
    We wrote a few of our own
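
For readers who have not written a transducer by hand, a minimal sketch of a custom stateful one, composed with a built-in. The `running-total` transducer and the record shape (`:account`, `:amount`) are hypothetical examples and are not the transducers mentioned on the slide:

```clojure
;; Hypothetical stateful transducer: adds a :total key holding the
;; cumulative :amount seen so far for the record's :account.
(defn running-total
  []
  (fn [rf]
    (let [totals (volatile! {})]
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result {:keys [account amount] :as record}]
         (let [total (+ amount (get @totals account 0))]
           (vswap! totals assoc account total)
           (rf result (assoc record :total total))))))))

;; Composition with a built-in transducer: drop zero-amount records
;; first, then accumulate totals.
(def billing-xform
  (comp (filter #(pos? (:amount %)))
        (running-total)))

(into [] billing-xform
      [{:account "a" :amount 2}
       {:account "b" :amount 5}
       {:account "a" :amount 0}
       {:account "a" :amount 3}])
;; => [{:account "a", :amount 2, :total 2}
;;     {:account "b", :amount 5, :total 5}
;;     {:account "a", :amount 3, :total 5}]
```

The transformation is defined independently of where records come from, so the same `billing-xform` can be exercised on a plain vector in a test and then applied to records polled from a consumer.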

  69. 2017: TOO MUCH DATA

  70. SUDDEN S3 PICKUP IN USAGE
    Our initial implementation limits the throughput
    Tail latencies go through the roof
    Cassandra is just not great at doing dense nodes
    We knew this going in
    We hit the wall hard

  71. WE NEED A NUMBER OF NEW API CAPABILITIES
    V4 signatures are becoming the norm for S3
    Better ACL support is needed
    The Docker registry exercises all weird properties of the API

  72. WE FIND A GOOD PAPER
    Ambry attacks the same problem space
    The paper lays out a great strategy

  73. LET'S WRITE A DISTRIBUTED SYSTEM FROM SCRATCH
    What could go wrong?

  74. BETTING ON CORE.ASYNC
    To better understand Netty internals we settle on writing our own facade
    This brings less baggage than aleph
    A storage agent in C
    Zookeeper for agent discovery
    We keep Cassandra for metadata storage

  75. NEW THINGS
    Component
    Spec
    A larger reagent frontend app

  76. UI

  77. KEY LEARNINGS
    Component is our go-to daemon structuring tool
    Netty is hard
    Reconciling byte buffer manipulation with the immutable Clojure world can be tricky
    Transducers were a life saver against memory leaks
    Test on sequences
    Runs against core.async channels
    Spec helps a lot with reliability and maintenance
    We still don't do enough generative testing
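
To illustrate the point about spec, a minimal sketch of validating a map at a boundary so that malformed data is rejected early instead of being threaded through the system. The `::usage` spec, its keys, and the `checked` helper are hypothetical; only `clojure.spec.alpha` functions that ship with Clojure 1.9 and later are used:

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical specs for a usage record; the keys are illustrative only.
(s/def ::account string?)
(s/def ::bytes nat-int?)
(s/def ::usage (s/keys :req-un [::account ::bytes]))

;; Hypothetical helper: returns the record when it satisfies the spec,
;; otherwise throws with spec's explanation attached as ex-data.
(defn checked
  [spec record]
  (if (s/valid? spec record)
    record
    (throw (ex-info "invalid record" (s/explain-data spec record)))))

(checked ::usage {:account "acme" :bytes 1024})
;; returns the record unchanged

(s/valid? ::usage {:account "acme" :bytes -1})
;; => false
```

The same specs could later drive generative tests, which would require adding the test.check library.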

  78. 2018: WORLD DOMINATION!

  79. OUR CURRENT STATE

  80. GOOD CORE LIBRARIES
    Unilog
    Kinsky
    Net
    Reporter
    Raven
    Uncaught
    Signal

  81. WHAT WE'RE MISSING
    A good daemon template
    Some governance around our library
    A Clojure for systems development

  82. BUILDING ON KUBERNETES
    We previously bet on Mesos
    Recent changes make running Clojure apps on Kubernetes nice and easy
    Upcoming library for configuration of Kubernetes applications
    Upcoming library to build Kubernetes controllers in Clojure

  83. AN API GATEWAY
    The front door to our infrastructure
    Leverages all our work around asynchronous networking
    A great way to put spec to work
    Will give us great capabilities to do smart RBAC

  84. FRONTEND
    We use it for internal tooling already
    It's time to switch our main console
    Re-frame gives us great confidence in making the jump

  85. LOOKING BACK

  86. WHAT WE DON'T DO IN CLOJURE
    SQL-backed APIs
    Low-level development

  87. THE USUAL QUESTIONS
    Community
    Hiring

  88. THANKS
    We need help building all of this!