Lessons learned
Microservices in Clojure
by Alexey Kachayev for KyivClojure #7, 2015
Slide 2
About Me
‣ Alexey Kachayev, @kachayev
‣ CTO at Attendify.com
‣ Clojure, Scala, Erlang engineer
‣ Active open source contributor
‣ Author of Fn.py library (Python)
‣ Hobbies: Haskell, Rust, CRDTs, compilers
Slide 3
Agenda
‣ Product overview
‣ The big idea behind Microservices
‣ What we built and why
‣ Problems and pitfalls
‣ The Road Not Taken
Slide 4
Attendify Product Overview
Slide 5
Attendify
‣ Mobile applications builder
‣ Thousands of mobile apps
‣ Private social networks in each application
‣ Real-time analytics
‣ Sponsored Posts (ads)
‣ EventWall for screen projection
Slide 6
Attendify Hub
Slide 7
Social
Slide 8
Multi-Event App
Slide 9
The Idea
Slide 10
Microservices
‣ Your Server As a Function [1]
‣ Scaling, multiple languages and bla-bla-bla…
‣ Hyped as much as NoSQL, Big Data, etc.
‣ Just google it to find more information
‣ We use it because it’s convenient
‣ Just like splitting your code into small functions
‣ We moved away from a Django project almost 2 years ago
Slide 11
Applicability
‣ If you don’t know how to split your system into small
services:
-it’s too small to be split
-you don’t know your system well enough
-how are you going to scale your engineering team?
Slide 12
What We Built
Slide 13
Current State
‣ 7 services in Clojure (out of 23 in total)
‣ 82 RPC endpoints in Clojure (out of 290+ in total)
‣ 17k+ LOC of Clojure code, 2850+ commits
‣ 4-6M requests handled each day
‣ 3 engineers work with Clojure on a regular basis
‣ Not a Clojure-only company (also Erlang, Scala, Go)
Slide 14
Brief History
‣ Started 1.5 years ago
‣ With 2 services in Clojure (sophisticated data
processing modules)
‣ Didn’t choose any existing microservices
framework or platform
Slide 15
Ready-to-use Solutions
‣ All systems are designed to fit predefined
requirements (like any framework)
‣ We didn’t know all requirements in advance
‣ Requirements are subject to change (continuously)
‣ There is no “right way”
‣ Non-technical requirements (e.g. organizational
structure) are rarely portable
Slide 16
Started From…
‣ JSON-RPC 2.0 protocol over HTTP transport
‣ Server: jetty & ring
‣ Service: implicit, ad-hoc definition, code copy & paste
‣ Deploy: JAR (uber), upstart, fab
‣ Discovery: URI with environment variables
‣ Security: HMAC request signature
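To make the "HMAC request signature" bullet concrete, here is a minimal sketch of signing and verifying a JSON-RPC 2.0 request body with plain Java interop. The slides do not say which hash function was used or where the signature travels, so HMAC-SHA256, hex encoding, the `events.list` method name and the function names below are all assumptions for illustration.

```clojure
(ns example.signing
  (:import (javax.crypto Mac)
           (javax.crypto.spec SecretKeySpec)
           (java.security MessageDigest)))

(defn hmac-sha256-hex
  "Returns the hex-encoded HMAC-SHA256 of `body` under the shared `secret`.
  The choice of SHA-256 and hex output is an assumption, not from the talk."
  [^String secret ^String body]
  (let [algo "HmacSHA256"
        mac  (doto (Mac/getInstance algo)
               (.init (SecretKeySpec. (.getBytes secret "UTF-8") algo)))]
    (apply str (map #(format "%02x" %)
                    (.doFinal mac (.getBytes body "UTF-8"))))))

(defn valid-signature?
  "Recomputes the signature on the receiving side and compares it with
  MessageDigest/isEqual rather than a plain string comparison."
  [secret body ^String signature]
  (let [^String expected (hmac-sha256-hex secret body)]
    (MessageDigest/isEqual (.getBytes expected "UTF-8")
                           (.getBytes signature "UTF-8"))))

;; A by-position JSON-RPC 2.0 request body; the method name is hypothetical.
(def request-body
  "{\"jsonrpc\":\"2.0\",\"method\":\"events.list\",\"params\":[42],\"id\":1}")

(def signature (hmac-sha256-hex "shared-secret" request-body))
```

The caller would send `signature` alongside the body, and the receiving service would reject the request when `valid-signature?` returns false.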
Slide 17
Next steps (1)
‣ Better JSON-RPC:
-meta information
-another multiplexing procedure
-named params
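As an illustration of named params plus meta information, the sketch below builds a JSON-RPC 2.0 request as a Clojure map. By-name params (an object under `params`) are part of the JSON-RPC 2.0 spec; the `:meta` member, its contents and the `rpc-request` helper are assumptions, since the slides do not show the actual wire format.

```clojure
(defn rpc-request
  "Builds a JSON-RPC 2.0 request map with by-name params.
  The :meta member is an assumed extension, not part of the spec."
  [id method params meta]
  {:pre [(map? params)]}          ; named params only, positional vectors rejected
  {:jsonrpc "2.0"
   :id      id
   :method  method
   :params  params
   :meta    meta})

;; Hypothetical call: method name, param names and meta keys are made up.
(def example-request
  (rpc-request 1 "events.get" {:event-id 7 :locale "en"} {:caller "hub"}))
```

Named params let a service add optional arguments without breaking callers that rely on argument order.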
Slide 18
Next steps (2)
‣ Deployment procedure:
-move all fab commands to a shared library
-save uberjars (each version) on S3
-ping and http-based health checker
-report all activity to Slack
-run:as (to connect local service to QA or Prod
clusters)
Slide 19
Next steps (3)
‣ Switched to httpkit (http server & client)
-better benchmarks, but not really applicable to our
case
-wanted to use core.async for service definitions,
but still using futures (it’s ok for us)
Slide 20
More Services, New Problems
‣ logs: unification, collect/process
‣ error tracking: a new type of error (inter-service
communication)
‣ auth: different levels and procedures
‣ metrics: collect, view, analyze
‣ protocol: dynamic typing is hard to scale
Slide 21
Solutions So Far
‣ logs: used Loggly, not really a problem now
‣ error tracking: Rollbar for failure reports, an either
abstraction, timeout handling as a first-class citizen
‣ metrics: used Graphite, now using InfluxDB
‣ protocol: schema library for params definition and
validation
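The combination of an either abstraction with first-class timeouts can be sketched with nothing but futures, which the talk says are still in use. This is not the authors' implementation: the map-based `{:right …}` / `{:left …}` representation and the names `call-with-timeout` and `bind` are assumptions chosen for illustration.

```clojure
(defn right [value] {:right value})
(defn left  [error] {:left error})
(defn right? [e] (contains? e :right))

(defn call-with-timeout
  "Runs the zero-argument fn `f` on a future and returns an either-style map:
  {:right result}, {:left :timeout} or {:left exception}."
  [timeout-ms f]
  (let [fut    (future (try (right (f))
                            (catch Exception e (left e))))
        ;; deref with a timeout returns the fallback value instead of blocking forever
        result (deref fut timeout-ms ::timeout)]
    (if (= result ::timeout)
      (do (future-cancel fut)
          (left :timeout))
      result)))

(defn bind
  "Applies `f` to the value of a right; passes a left through untouched."
  [e f]
  (if (right? e) (f (:right e)) e))
```

A caller can then chain steps with `bind` and handle a timeout in the same branch as any other failure, for example `(bind (call-with-timeout 200 fetch-user) render)`, where `fetch-user` and `render` stand for hypothetical functions and `render` returns an either itself.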
Slide 22
Augustine
‣ shared library with s3-wagon
‣ defservice macro that uses multimethod
‣ protocol definition/validation with schema
‣ auth level specification and control
‣ error, exception and timeout handling
‣ meta information, req/resp ID with flake algorithm
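The slides only say that `defservice` is built on a multimethod and validates params with the schema library; the macro itself is not shown. The sketch below illustrates the general idea under stated assumptions: `handle`, `defendpoint` and the `events.get` endpoint are hypothetical names, and plain predicates stand in for schema so the example needs no dependencies. The error codes are the standard JSON-RPC 2.0 ones.

```clojure
(defmulti handle
  "Dispatches an RPC request map on its :method string."
  :method)

;; Unknown methods fall through to the JSON-RPC "Method not found" error.
(defmethod handle :default [{:keys [method]}]
  {:error {:code -32601 :message (str "Method not found: " method)}})

(defmacro defendpoint
  "Hypothetical helper: registers `method-name` on `handle` and checks each
  named param against a predicate before running `body`. Inside `body` the
  validated params map is available as `params`."
  [method-name param-specs & body]
  `(defmethod handle ~method-name [request#]
     (let [params# (:params request#)
           bad#    (for [[k# pred#] ~param-specs
                         :when (not (pred# (get params# k#)))]
                     k#)]
       (if (seq bad#)
         {:error {:code -32602 :message "Invalid params" :data (vec bad#)}}
         {:result ((fn [~'params] ~@body) params#)}))))

;; A made-up endpoint definition using the helper.
(defendpoint "events.get" {:event-id integer? :locale string?}
  {:id (:event-id params) :locale (:locale params)})
```

Calling `(handle {:method "events.get" :params {:event-id 7 :locale "en"}})` runs the body, while a request with a string `:event-id` is rejected before the body is reached.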
Slide 24
Problems and Pitfalls
Slide 25
Java
‣ You don’t need to know Java to write Clojure
‣ Your http server/framework is written in Java
‣ “Java in Clojure” is easier than “Java in Java”
‣ Most likely you will have to deal with Java code at some point
‣ GC is your “good but very unpredictable” friend
Slide 26
Java We Use
‣ io operations, streaming & buffering
‣ XLS reader
‣ base64
‣ timers
‣ java.text.SimpleDateFormat
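Two of the items above, base64 and `java.text.SimpleDateFormat`, show how thin the interop layer is in practice. The sketch below assumes Java 8 or later (for `java.util.Base64`); the function names and the timestamp pattern are illustrative, not taken from the talk.

```clojure
(ns example.interop
  (:import (java.util Base64 Date TimeZone)
           (java.text SimpleDateFormat)))

(defn base64-encode
  "Encodes a string to base64 using the JDK encoder (Java 8+)."
  [^String s]
  (.encodeToString (Base64/getEncoder) (.getBytes s "UTF-8")))

(defn base64-decode
  "Decodes a base64 string back to a UTF-8 string."
  [^String s]
  (String. (.decode (Base64/getDecoder) s) "UTF-8"))

(defn format-utc
  "Formats `date` as a UTC timestamp. A new SimpleDateFormat is created on
  every call because instances are not thread-safe."
  [^Date date]
  (let [fmt (doto (SimpleDateFormat. "yyyy-MM-dd'T'HH:mm:ss'Z'")
              (.setTimeZone (TimeZone/getTimeZone "UTC")))]
    (.format fmt date)))
```

Creating the formatter per call trades a little allocation for safety when many request threads format dates concurrently.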
Slide 27
Data Communication
‣ Databases: Riak, Redis, CouchDB, PostgreSQL
‣ Our Clojure services are “data-centric” (mostly about
data manipulations)
‣ “single data responsibility” sounds good, but doesn’t
work in our case
‣ Databases are used a lot for cross-service
communication to decrease inter-service coupling
Slide 28
The Road Not Taken
Slide 29
Actual Problems
‣ scaling is a hard problem even with the best tools
‣ there is no “critical” problem that we can’t solve
‣ there is a lot of room for improvement
‣ there is even more room for experiments
Slide 30
Investigations (1)
‣ active investigations
‣ error processing (even with either, trying monads)
‣ distributed tracing (partially solved with req IDs)
‣ service discovery & (smart) load balancing
‣ binary protocol & TCP for inter-server communication
Slide 31
Investigations (2)
‣ not really active investigations
‣ auto-generated SDKs
‣ back pressure control (looking at Hystrix)
‣ core.async (long story)
‣ task cancellation
Slide 32
core.async (1)
‣ Your Server as a Transducer
‣ augustine library accepts channel as a return type
‣ httpkit provides async interface
‣ but… futures work fine for us (still?)
‣ still experimenting…
Slide 33
core.async (2)
‣ better timeouts
‣ better multiplexing
‣ easier to deal with back-pressure control
‣ async abstractions are very leaky
‣ would require reimplementing most of the code
‣ hard to debug (just like futures)
Slide 34
Finagle-Clojure
‣ github.com/finagle/finagle-clojure
‣ good interface to work with Thrift
‣ easy to start with basic template and docs
‣ inconvenient Scala runtime
‣ not-really-idiomatic Clojure
‣ no more comments for now (not using it in production)
Slide 35
Thoughts
‣ No regrets about our technical decision(s)
‣ We have time to solve problems & concerns
‣ Clojure is ok for product development
‣ Clojure is ok when supporting old code
‣ Clojure development is hard when # engineers > 1**
‣ ** it’s hard to work with people in any case