Slide 1

Slide 1 text

Lessons Learned: Microservices in Clojure
by Alexey Kachayev
for KyivClojure #7, 2015

Slide 2

Slide 2 text

About Me
‣ Alexey Kachayev, @kachayev
‣ CTO at Attendify.com
‣ Clojure, Scala, Erlang engineer
‣ Active open source contributor
‣ Author of Fn.py library (Python)
‣ Hobbies: Haskell, Rust, CRDTs, compilers

Slide 3

Slide 3 text

Agenda
‣ Product overview
‣ The big idea behind Microservices
‣ What we built and why
‣ Problems and pitfalls
‣ The Road Not Taken

Slide 4

Slide 4 text

Attendify Product Overview

Slide 5

Slide 5 text

Attendify
‣ Mobile applications builder
‣ Thousands of mobile apps
‣ Private social networks in each application
‣ Real-time analytics
‣ Sponsored Posts (ads)
‣ EventWall for screen projection

Slide 6

Slide 6 text

Attendify Hub

Slide 7

Slide 7 text

Social

Slide 8

Slide 8 text

Multi-Event App

Slide 9

Slide 9 text

The Idea

Slide 10

Slide 10 text

Microservices
‣ Your Server As a Function [1]
‣ Scaling, multiple languages and bla-bla-bla…
‣ As hyped as NoSQL, Big Data, etc.
‣ Just google it to find more information
‣ We use them because it’s convenient
‣ Just like splitting your code into small functions
‣ We moved away from a Django project almost 2 years ago

Slide 11

Slide 11 text

Applicability
‣ If you don’t know how to split your system into small services:
  - it’s too small to be split
  - you don’t know your system well enough
  - how are you going to scale your engineering team?

Slide 12

Slide 12 text

What We Built

Slide 13

Slide 13 text

Current State
‣ 7 services in Clojure (out of 23 in total)
‣ 82 RPC endpoints in Clojure (out of 290+ in total)
‣ 17k+ LOC of Clojure code, 2850+ commits
‣ 4-6M requests handled each day
‣ 3 engineers work with Clojure on a regular basis
‣ Not a Clojure-only company (also Erlang, Scala, Go)

Slide 14

Slide 14 text

Brief History
‣ Started 1.5 years ago
‣ With 2 services in Clojure (sophisticated data processing modules)
‣ Didn’t choose any existing microservices framework or platform

Slide 15

Slide 15 text

Ready-to-use Solutions
‣ Every ready-made system is built to fit predefined requirements (as is any framework)
‣ We didn’t know all requirements in advance
‣ Requirements are subject to change (continuously)
‣ There is no “right way”
‣ Non-technical requirements (e.g. organizational structure) are rarely portable

Slide 16

Slide 16 text

Started From…
‣ JSON-RPC 2.0 protocol over HTTP transport
‣ Server: jetty & ring
‣ Service: implicit, ad-hoc definition, code copy & paste
‣ Deploy: JAR (uber), upstart, fab
‣ Discovery: URI with environment variables
‣ Security: HMAC request signature
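
The starting setup on this slide (JSON-RPC 2.0 over HTTP, service URIs taken from environment variables, HMAC-signed requests) can be sketched roughly as follows. The sketch is in Python for brevity, not the Clojure the services are written in. The environment variable naming, the `X-Signature` header, the SHA-256 digest and the `users.get` method are illustrative assumptions, not details from the talk.

```python
import hashlib
import hmac
import json
import os


def service_uri(name, environ=os.environ):
    """Discovery via environment, e.g. USERS_SERVICE_URI=http://10.0.0.5:8080/rpc.

    The "<NAME>_SERVICE_URI" convention is hypothetical.
    """
    return environ[name.upper() + "_SERVICE_URI"]


def build_request(method, params, request_id):
    """Serialize a JSON-RPC 2.0 request object to bytes."""
    payload = {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    # sort_keys and compact separators give a stable byte representation to sign.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(body, secret):
    """HMAC over the raw request body (SHA-256 is an assumption)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(body, signature, secret):
    """Constant-time comparison, so the check does not leak timing information."""
    return hmac.compare_digest(sign(body, secret), signature)


body = build_request("users.get", {"id": 42}, 1)
headers = {
    "Content-Type": "application/json",
    "X-Signature": sign(body, b"shared-secret"),  # hypothetical header name
}
```

The receiving service would recompute the signature over the raw body with the same shared secret and reject the request when `verify` returns `False`.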

Slide 17

Slide 17 text

Next steps (1)
‣ Better JSON-RPC:
  - meta information
  - another multiplexing procedure
  - named params
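
JSON-RPC 2.0 allows `params` to be either an array (by position) or an object (by name). A minimal Python sketch of a dispatcher that accepts both is shown here; `call_with_params` and `subtract` are hypothetical names, and this is not the code from the talk.

```python
import inspect


def call_with_params(handler, params):
    """Invoke handler with JSON-RPC params: list -> by position, dict -> by name."""
    if isinstance(params, dict):
        # Reject unknown names early instead of failing inside the handler.
        accepted = inspect.signature(handler).parameters
        unknown = set(params) - set(accepted)
        if unknown:
            raise TypeError("unknown params: " + ", ".join(sorted(unknown)))
        return handler(**params)
    return handler(*(params or []))


def subtract(minuend, subtrahend):
    return minuend - subtrahend


by_position = call_with_params(subtract, [42, 23])
by_name = call_with_params(subtract, {"subtrahend": 23, "minuend": 42})
```

With named params the caller no longer depends on argument order, which makes it easier to add optional arguments to an endpoint without breaking existing clients.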

Slide 18

Slide 18 text

Next steps (2)
‣ Deployment procedure:
  - move all fab commands to shared library
  - save uberjars (each version) on S3
  - ping and http-based health checker
  - report all activity to Slack
  - run:as (to connect local service to QA or Prod clusters)

Slide 19

Slide 19 text

Next steps (3)
‣ Switched to httpkit (http server & client)
  - better benchmark results, though not really relevant to our case
  - wanted to use core.async for service definitions, but still using futures (it’s ok for us)

Slide 20

Slide 20 text

More Services, New Problems
‣ logs: unification, collect/process
‣ error tracking: new type of errors (inter-service communication)
‣ auth: different levels and procedures
‣ metrics: collect, view, analyze
‣ protocol: dynamic typing is hard to scale

Slide 21

Slide 21 text

Solutions So Far
‣ logs: used Loggly, not really a problem now
‣ error tracking: Rollbar for failure reports, Either abstraction, timeout handling as a first-class citizen
‣ metrics: used Graphite, now using InfluxDB
‣ protocol: schema library for params definition and validation
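
The combination of an Either abstraction with timeouts treated as an ordinary outcome can be illustrated with a small Python sketch. `Left`, `Right` and `call_with_timeout` are hypothetical names chosen for this example; the talk does not show the actual implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Left:
    """Failure branch: carries the reason instead of raising."""
    error: Any


@dataclass(frozen=True)
class Right:
    """Success branch: carries the result."""
    value: Any


_pool = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(fn, timeout_s, *args):
    """Run fn in a future; return Right(result) or Left(reason), never raise."""
    future = _pool.submit(fn, *args)
    try:
        return Right(future.result(timeout=timeout_s))
    except FutureTimeout:
        future.cancel()  # best effort only; an already running task is not interrupted
        return Left("timeout")
    except Exception as exc:
        return Left(repr(exc))


ok = call_with_timeout(lambda: 1 + 1, 1.0)
slow = call_with_timeout(time.sleep, 0.05, 0.5)
```

Callers then branch on `Left` or `Right` explicitly, so a remote timeout is handled in the same code path as any other inter-service failure.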

Slide 22

Slide 22 text

Augustine
‣ shared library with s3-wagon
‣ defservice macro that uses multimethod
‣ protocol definition/validation with schema
‣ auth level specification and control
‣ error, exception and timeout handling
‣ meta information, req/resp ID with flake algorithm
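
The idea of declaring an endpoint together with its auth level and params schema, then dispatching on the method name, can be approximated in Python with a registry and a decorator. This is only an analogy to a macro built on multimethods, not Augustine’s API: `endpoint`, `dispatch`, the auth level names and the `events.rename` method are all invented for illustration.

```python
# method name -> (auth level, schema, handler); plays the role of the multimethod table
ENDPOINTS = {}

# Hypothetical auth levels, ordered from least to most privileged.
AUTH_LEVELS = {"public": 0, "user": 1, "internal": 2}


def endpoint(name, auth, schema):
    """Register a handler together with its auth level and params schema."""
    def register(handler):
        ENDPOINTS[name] = (auth, schema, handler)
        return handler
    return register


def validate(schema, params):
    """Return a list of problems; an empty list means params match the schema."""
    problems = [f"missing: {k}" for k in schema if k not in params]
    problems += [f"unexpected: {k}" for k in params if k not in schema]
    problems += [
        f"{k}: expected {t.__name__}"
        for k, t in schema.items()
        if k in params and not isinstance(params[k], t)
    ]
    return problems


def dispatch(name, params, caller_level):
    """Look up the endpoint, check auth, validate params, then call the handler."""
    if name not in ENDPOINTS:
        return {"error": "method not found"}
    auth, schema, handler = ENDPOINTS[name]
    if AUTH_LEVELS[caller_level] < AUTH_LEVELS[auth]:
        return {"error": "forbidden"}
    problems = validate(schema, params)
    if problems:
        return {"error": "invalid params", "details": problems}
    return {"result": handler(**params)}


@endpoint("events.rename", auth="user", schema={"event_id": int, "title": str})
def rename_event(event_id, title):
    return {"event_id": event_id, "title": title.strip()}
```

Keeping auth and validation in the dispatcher means each handler only contains business logic, which is the main benefit of replacing copy-and-paste service definitions with a shared declaration.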

Slide 24

Slide 24 text

Problems and Pitfalls

Slide 25

Slide 25 text

Java
‣ You don’t need to know Java to write Clojure
‣ Your http server/framework is written in Java
‣ “Java in Clojure” is easier than “Java in Java”
‣ Most probably you will deal with Java code somehow
‣ GC is your “good but very unpredictable” friend

Slide 26

Slide 26 text

Java We Use
‣ I/O operations, streaming & buffering
‣ XLS reader
‣ base64
‣ timers
‣ java.text.SimpleDateFormat

Slide 27

Slide 27 text

Data Communication
‣ Databases: Riak, Redis, CouchDB, PostgreSQL
‣ Our Clojure services are “data-centric” (mostly about data manipulation)
‣ “single data responsibility” sounds good, but doesn’t work in our case
‣ Databases are used a lot for cross-service communication to decrease inter-service coupling

Slide 28

Slide 28 text

The Road Not Taken

Slide 29

Slide 29 text

Actual Problems
‣ scaling is a hard problem even with the best tools
‣ there is no “critical” problem that we can’t solve
‣ there is a lot of room for enhancements
‣ there is even more room for experiments

Slide 30

Slide 30 text

Investigations (1)
‣ active investigations
‣ error processing (even with Either, trying monads)
‣ distributed tracing (partially solved with req IDs)
‣ service discovery & (smart) load balancing
‣ binary protocol & TCP for inter-server communication
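
Tracing with request IDs usually means generating an ID at the edge and passing it along with every downstream call. The Python sketch below assumes a flake-style layout (64-bit millisecond timestamp, 48-bit worker ID, 16-bit sequence) and a `request_id` key in the request meta; both are assumptions for illustration, and the class and function names are hypothetical.

```python
import threading
import time


class FlakeLikeIds:
    """Roughly time-ordered IDs: 64-bit ms timestamp | 48-bit worker | 16-bit sequence.

    Simplified: the sequence wraps after 65536 IDs in one millisecond and a
    clock that moves backwards is not detected.
    """

    def __init__(self, worker_id, clock=time.time):
        self._worker = worker_id & (2**48 - 1)
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self):
        with self._lock:
            now_ms = int(self._clock() * 1000)
            if now_ms == self._last_ms:
                self._seq = (self._seq + 1) & 0xFFFF
            else:
                self._last_ms, self._seq = now_ms, 0
            return (now_ms << 64) | (self._worker << 16) | self._seq


def with_request_id(meta, ids):
    """Reuse the caller's request ID if present, otherwise start a new trace."""
    meta = dict(meta or {})
    if not meta.get("request_id"):
        meta["request_id"] = format(ids.next_id(), "032x")
    return meta


ids = FlakeLikeIds(worker_id=1)
incoming = with_request_id(None, ids)      # edge service: a new ID is created
downstream = with_request_id(incoming, ids)  # inner service: the same ID is kept
```

Logging the same `request_id` in every service lets a single request be followed across services, which covers part of what a full distributed tracing system provides.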

Slide 31

Slide 31 text

Investigations (2)
‣ not really active investigations
‣ auto-generated SDKs
‣ back pressure control (looking at Hystrix)
‣ core.async (long story)
‣ task cancellation

Slide 32

Slide 32 text

core.async (1)
‣ Your Server as a Transducer
‣ augustine library accepts channel as a return type
‣ httpkit provides async interface
‣ but… futures work fine for us (still?)
‣ still experimenting…

Slide 33

Slide 33 text

core.async (2)
‣ better timeouts
‣ better multiplexing
‣ easier to deal with back-pressure control
‣ async abstractions are very leaky
‣ would have to reimplement most of the code
‣ hard to debug (just like futures)

Slide 34

Slide 34 text

Finagle-Clojure
‣ github.com/finagle/finagle-clojure
‣ good interface to work with Thrift
‣ easy to start with basic template and docs
‣ inconvenient Scala runtime
‣ not-really-idiomatic Clojure
‣ no more comments for now (not using in production)

Slide 35

Slide 35 text

Thoughts
‣ No regrets about our technical decision(s)
‣ We have time to solve problems & concerns
‣ Clojure is ok for product development
‣ Clojure is ok when supporting old code
‣ Clojure development is hard when # engineers > 1**
** it’s hard to work with people in any case

Slide 36

Slide 36 text

alexey@attendify.com
We’re hiring!

Slide 37

Slide 37 text

Thank You! Questions?