5 YEARS OF CLOJURE
PIERRE-YVES RITSCHARD (@PYR)
HALLO
@pyr: Three-line Bio
CTO & Co-founder at Exoscale
Distributed systems and monitoring enthusiast
Open-Source developer
Clojure Libraries, OpenBSD, Riemann, Collectd, and more.
5 YEARS OF CLOJURE
Building better infrastructure with parentheses
EXOSCALE
Infrastructure as a service
Zones in Frankfurt, Vienna, Zürich, Geneva
I THOUGHT THIS WAS A CLOJURE CONFERENCE!
WHAT'S IN A CLOUD PROVIDER
Datacenter operations
Software development
SOFTWARE AT EXOSCALE
Virtual machine instance orchestrator
Object storage controller
Network controller (SDN)
Customer management
Metering system
Billing
Web portal
ISN'T ALL OF THIS BASH, PERL, AND YAML?
CLOJURE NOT AN OBVIOUS CHOICE
The JVM had/has bad press with infrastructure folk
CLOJURE AT EXOSCALE: A TIMELINE
2012: THE EARLY DAYS
WE STARTED WITH
3 people
A bit of time
A product idea
A DIFFERENT CLOUD PROVIDER
Not yet another virtual datacenter product
Integration with automation tooling
Integration in language-specific libraries
Focus on horizontally-scalable applications
Local storage
Security groups
THINGS THAT DIDN'T EXIST IN 2012
Ansible
Terraform
Docker
THINGS THAT DIDN'T EXIST IN 2012
Television
Wifi
OUR MINIMAL STACK
Apache Cloudstack
Puppet
Good old MySQL
A third-party customer management tool
Python + AngularJS
Riemann
RIEMANN
The common saying back then was monitoring sucks
Push-based model was a great fit for our use case
Riemann was in a rough state back then
A great opportunity to contribute
THINGS OUR EARLY ADOPTERS ENJOYED
Vagrant support
Security groups instead of firewalling
A public IP per instance
WARP
Open Source
TLS client certificate-based authentication
IRC support
Haskell Go agent
Prefigured our inclination for Clojure at the orchestration layer
TROUBLE KICKS IN
Late payments
Bitcoin mining on free credit
SOLVING ABUSE
Need to pull data from a bunch of places
Standard FSM type of problem
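As an illustration of the "standard FSM" framing, here is a minimal sketch of an account lifecycle driven by a transition table. The state names, event names, and transitions are all hypothetical; the slides do not describe the real ones, and the sketch is in Python rather than the Clojure used in production.

```python
# Hypothetical account states and events; the real ones are not in the slides.
TRANSITIONS = {
    ("active", "payment_late"): "warned",
    ("active", "mining_detected"): "suspended",
    ("warned", "payment_received"): "active",
    ("warned", "grace_period_expired"): "suspended",
    ("suspended", "payment_received"): "active",
}


def next_state(state, event):
    """Return the next state, or the current one if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


def run(state, events):
    """Fold a sequence of events over an initial state."""
    for event in events:
        state = next_state(state, event)
    return state
```

Keeping the transitions in a plain table means the events can come from any number of sources (billing, metering, support) and still be folded through a single pure function.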
SOME THINGS WE LEARNED
Running Clojure processes in good old cron is perfect
Logback's logging context is a huge plus
2014: THE YEAR OF STORAGE
OBJECT STORAGE
The obvious choice for our crowd
Architecturally simpler than distributed block storage
A good complement to our local-storage-backed instances
OBJECT STORAGE NEEDS
S3 is the sole player in that field: we need API compatibility
The only alternative at the time was bad HTTP extensions
OBJECT STORAGE IN THE WILD
Ceph
Riak-CS
Swift
Costly vendor-backed solutions
WRITING AN OBJECT STORE
We focused on how to store large objects
Tempted by a description of the (non-open-source) approach by DataStax on top of Cassandra
CHOOSING CASSANDRA
Great library support, thanks @mpenet!
Simple for us to operate
Very few moving parts
Our implementation could remain fully stateless
WE WERE (ALMOST) YOUNG AND (WAY TOO) NAIVE
How hard could it be?
WHAT WE DIDN'T ANTICIPATE
It's not all about actual data storage
The S3 API is a beast
The S3 API is underspecified
The S3 API is not versioned
The S3 API client landscape is a mess
A QUICK DIGRESSION: S3 REQUESTS
Operation: put object foo in bucket bar:
PUT /foo
Host: bar.sos-ch-dk-2.exo.io
Authorization: AWS ....
<...>
A QUICK DIGRESSION: S3 REQUESTS
Operation: update acl for object foo in bucket bar:
PUT /foo?acl
Host: bar.sos-ch-dk-2.exo.io
Authorization: AWS ....
X-Amz-ACL: bucket-owner-full-control
A QUICK DIGRESSION: S3 REQUESTS
Operation: Copy object bim from bucket bam to object foo in bucket bar:
PUT /foo
Host: bar.sos-ch-dk-2.exo.io
Authorization: AWS ....
X-Amz-Copy-Source: /bam/bim
X-Amz-Copy-Source-If-Unmodified-Since: ARE YOU KIDDING ME?
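The three requests above share the same method and path shape, so an S3-compatible server has to look at the query string and headers to tell them apart. The following is a minimal sketch of that dispatch in Python; the function name and the operation labels are illustrative assumptions, not the names used in the actual implementation, and only the three operations shown on the slides are handled.

```python
from urllib.parse import parse_qs, urlsplit


def classify(method, target, headers):
    """Return (operation, bucket, key) for a virtual-hosted-style request."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    parts = urlsplit(target)
    # keep_blank_values keeps bare sub-resources such as "?acl".
    query = parse_qs(parts.query, keep_blank_values=True)
    # The bucket is the first label of the Host header, the key is the path.
    bucket = lowered.get("host", "").split(".")[0]
    key = parts.path.lstrip("/")
    if method == "PUT" and "acl" in query:
        operation = "put-object-acl"
    elif method == "PUT" and "x-amz-copy-source" in lowered:
        operation = "copy-object"
    elif method == "PUT":
        operation = "put-object"
    else:
        operation = "unknown"
    return operation, bucket, key
```

The order of the checks matters: a plain PUT is only a plain PUT once every sub-resource and special header has been ruled out.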
BY THE WAY
Storing terabytes of data on off-the-shelf hardware doesn't come easy either
Input and output payloads of arbitrary lengths aren't easy
Compojure, Ring, and the usual suspects are out
SOME THINGS WE LEARNED
This was our largest application to date
Component didn't exist
We built a hacky similar thing based on plain maps
Maintenance of the application starts becoming an issue
Maps can lead to threading malformed data for a while
2015: SCALING UP
THINGS ARE RUNNING SMOOTHLY
Load on the platform is increasing
We have a lot of event-generating systems
Tons of logs
Tons of metrics
WE CAN'T DO EVERYTHING WITH CRON
So we install a Kafka cluster
A FIRST CANDIDATE: BANDWIDTH METERING
Traffic accounting on hypervisors, with a small C agent
30-second aggregates sent over to Kafka
A Clojure Kafka consumer on the other end
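To make the consumer side concrete, here is a minimal sketch of rolling 30-second aggregates up into hourly totals per instance. It is written in Python rather than Clojure and leaves out the Kafka client entirely; the record fields (`instance`, `ts`, `bytes`) and the hourly window are assumptions made for illustration, not the actual message format.

```python
from collections import defaultdict


def window_start(ts, size=3600):
    """Align a Unix timestamp to the start of its window."""
    return ts - ts % size


def rollup(records, size=3600):
    """Sum byte counts per (instance, window) from 30-second aggregates."""
    totals = defaultdict(int)
    for record in records:
        bucket = (record["instance"], window_start(record["ts"], size))
        totals[bucket] += record["bytes"]
    return dict(totals)
```

For example, two 30-second aggregates that fall in the same hour for the same instance collapse into a single total:

```python
records = [
    {"instance": "vm-1", "ts": 0, "bytes": 100},
    {"instance": "vm-1", "ts": 30, "bytes": 200},
    {"instance": "vm-1", "ts": 3600, "bytes": 50},
]
totals = rollup(records)
```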
KEY TAKEAWAY
Non-glue Clojure code is around 150 loc
Altogether around 500 lines
It seems as though Clojure was written to write Kafka consumers
THIS HAMMER NEEDS NEW NAILS
We have a recurring issue with DNS updates and need more flexibility building zones
AN EXPERIMENT: BLOG POST DRIVEN DEVELOPMENT
LOG COMPACTION
KALZONE: DYNAMIC DNS WITH KAFKA
Works great across a large number of clients
Great foundation for more infrastructure inventory solutions
Kafka log compaction is a huge plus
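Log compaction keeps only the most recent value for each key, and a record with a null value (a tombstone) marks the key for removal. A minimal sketch of why this suits DNS: replaying a compacted topic from the start yields the current zone. The record layout below (a name/type key with an address value) is a hypothetical example, not Kalzone's actual format, and the sketch is in Python with no Kafka client involved.

```python
def compact(log):
    """Replay (key, value) records and keep the latest value per key.

    A value of None acts as a tombstone and removes the key.
    """
    latest = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)
        else:
            latest[key] = value
    return latest


# Hypothetical record stream: key is "name/type", value is the record data.
log = [
    ("www.example.com/A", "192.0.2.10"),
    ("api.example.com/A", "192.0.2.20"),
    ("www.example.com/A", "192.0.2.11"),  # supersedes the first record
    ("api.example.com/A", None),          # tombstone: record deleted
]
zone = compact(log)
```

A new consumer therefore never needs the full history, only the compacted log, to rebuild the zone.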
2016: FAST GROWTH
SECURED FUNDING IN LATE 2015
USE OF PROCEEDS
People
A new datacenter
SELLING ON THE WEB
We simplify our online funnel
A drip process
DRIP PROCESS
core.match to the rescue again
Yet another reason to write a cron
BILLING ISSUES
The cron-based approach to billing is showing its limits
Hard to keep it at an hourly rate because it takes too long
AT A CROSSROADS
KAFKA TO THE RESCUE
A full rewrite of our billing stack
Sub 1k loc
KEY TAKEAWAYS
Incredible reliability
The system can weather temporary failures with no billing impact
Transducers fit in perfectly with Kafka
We wrote a few of our own
2017: TOO MUCH DATA
SUDDEN S3 PICKUP IN USAGE
Our initial implementation limits the throughput
Tail latencies go through the roof
Cassandra is just not great at doing dense nodes
We knew this going in
We hit the wall hard
WE NEED A NUMBER OF NEW API CAPABILITIES
V4 signatures are becoming the norm for S3
Better ACL support is needed
The Docker registry exercises all the weird properties of the API
WE FIND A GOOD PAPER
Ambry attacks the same problem space
The paper lays out a great strategy
LET'S WRITE A DISTRIBUTED SYSTEM FROM SCRATCH
What could go wrong?
BETTING ON CORE.ASYNC
To better understand Netty internals we settle on writing our own facade
This brings less baggage than aleph
A storage agent in C
Zookeeper for agent discovery
We keep Cassandra for metadata storage
NEW THINGS
Component
Spec
A larger reagent frontend app
UI
KEY LEARNINGS
Component is our go-to daemon structuring tool
Netty is hard
Reconciling byte buffer manipulation with the immutable Clojure world can be tricky
Transducers were a life saver against memory leaks
Test on sequences
Runs against core.async channels
Spec helps a lot with reliability and maintenance
We still don't do enough generative testing
2018: WORLD DOMINATION!
OUR CURRENT STATE
GOOD CORE LIBRARIES
Unilog
Kinsky
Net
Reporter
Raven
Uncaught
Signal
WHAT WE'RE MISSING
A good daemon template
Some governance around our library
A Clojure for systems development
BUILDING ON KUBERNETES
We previously bet on Mesos
Recent changes make running Clojure apps on Kubernetes nice and easy
Upcoming library for configuration of Kubernetes applications
Upcoming library to build Kubernetes controllers in Clojure
AN API GATEWAY
The front door to our infrastructure
Leverages all our work around asynchronous networking
A great way to put spec to work
Will give us great capabilities to do smart RBAC
FRONTEND
We use it for internal tooling already
It's time to switch our main console
Re-frame gives us great confidence in making the jump
LOOKING BACK
WHAT WE DON'T DO IN CLOJURE
SQL-backed APIs
Low-level development
THE USUAL QUESTIONS
Community
Hiring
THANKS
We need help building all of this!