Slide 1

Slide 1 text

Unified Log [email protected] 2015-05-20

Slide 2

Slide 2 text

This all started about 13 months ago… when I first joined State

Slide 3

Slide 3 text

I had read Jay Kreps’ blog post: “The Log: What every software engineer should know about real-time data's unifying abstraction”

Slide 4

Slide 4 text

And knew it was the best way for us to move forward, to move faster, to engineer better systems …

Slide 5

Slide 5 text

Product Market Fit
• Efficiency and speed of iteration
• Ability to be nimble
• Ability to make changes quickly
• Ability to test out hypotheses

Slide 6

Slide 6 text

Challenges
• State is a startup
• We burn cash
• We have no revenue
• Pushy product team
• Super smart engineering team

Slide 7

Slide 7 text

Time is of the essence

Slide 8

Slide 8 text

Desires
• Decouple system architecture into individual standalone components
• Enable engineers to have the freedom/flexibility to use the right tools for each and every problem
• Framework / Pattern for building new components

Slide 9

Slide 9 text

I wanted it to be fun (most importantly)
Photo by https://www.flickr.com/photos/jdhancock/

Slide 10

Slide 10 text

A short story

Slide 11

Slide 11 text

From a monolith …
• Single repo codebase
• Monolithic deploys requiring full regression testing
• Unwieldy Ruby on Rails web application, battered through numerous cycles of product change

Slide 12

Slide 12 text

… to a SOA
• Multiple daily deploys
• Individual, isolated, deployable web services
• Engineers can choose the right tools for the tasks at hand
• A formula for creating as many Micro-Services as needed

Slide 13

Slide 13 text

This was a lot of hard work … initially …

Slide 14

Slide 14 text

The State Team: you guys ROCK!

Slide 15

Slide 15 text

Me
• Mischa Tuffield http://mmt.me.uk/
• State CTO - since Apr 2014
• 3rd Startup - Garlik, PeerIndex
• PhD in CS - Capturing Autobiographical Metadata
• Web and Data Geek
• I ♥ Semantic Web
• TBL is a legend (don’t doubt this)
• ex-W3C contributor
• Social Web XG Editor
• RDF 1.1 Turtle Contributor
• I also ♥ Star Wars and Batman
• Am more of a Luke person than a Han person

Slide 16

Slide 16 text

About State
• Opinion Network
• You shouldn't need a following to get heard
• Exchange opinions with people around the world
• Social Network structured like News, where people share opinions on topics they care about
• Founded by Alex & Mark Asseily (Jawbone & Skype)

Slide 17

Slide 17 text

And now for some specifics …

Slide 18

Slide 18 text

Our topics define the Data API for all of our Micro-Services

Slide 19

Slide 19 text

TL;DR
• We created some workflows and infrastructure to allow data to be persisted into our Unified Log as separate topics - e.g. Users, Opinions, “Well Said”s
• We created a framework to allow the data in our Unified Log to be joined together to create new streams
• We could move away from treating our database as the canonical source of truth for data - we are undecided
• This allows the Engineers to use whatever technology suits the problem at hand; all they have to do is ensure that if their service dies it can recreate itself by replaying all of the data from its topic in our Unified Log
• We made use of segment.com as our analytics tool, unifying this concept of using an Event Based Unified Log
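The "recreate itself by replaying its topic" requirement can be sketched in a few lines. This is a minimal illustration, not State's code: the log is a plain Python list standing in for a Kafka topic, and the `UserDirectory` service, the event types and the field names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class UserDirectory:
    """Hypothetical service whose state is derived entirely from its topic."""
    users: dict = field(default_factory=dict)
    offset: int = 0  # position of the next log entry to apply

    def apply(self, event: dict) -> None:
        # Each event is applied in log order; later events win.
        if event["type"] == "user_upserted":
            self.users[event["id"]] = event["name"]
        elif event["type"] == "user_deleted":
            self.users.pop(event["id"], None)
        self.offset += 1


def rebuild(log: list) -> UserDirectory:
    """Recreate the service from scratch by replaying the log from offset 0."""
    service = UserDirectory()
    for event in log:
        service.apply(event)
    return service


# A list standing in for the (hypothetical) user topic.
user_topic = [
    {"type": "user_upserted", "id": "u1", "name": "Luke"},
    {"type": "user_upserted", "id": "u2", "name": "Han"},
    {"type": "user_upserted", "id": "u1", "name": "Luke Skywalker"},
    {"type": "user_deleted", "id": "u2"},
]

recovered = rebuild(user_topic)
```

Because the state is a pure function of the log, a service that dies can be started empty and pointed back at the start of its topic.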

Slide 20

Slide 20 text

Looking at the details

Slide 21

Slide 21 text

State’s Unified Log
• We have implemented our Log using Apache Kafka
• Our Unified Log is our Data API for our Micro-Services
• We have two types of Kafka topics
• Shared Topics - Data API available to all Micro-Services
• Internal Topics - not shared, contracted for use by individual Micro-Services only

Slide 22

Slide 22 text

The take-home message is Decoupling of Services, in order to avoid Regressions & Spaghetti Code

Slide 23

Slide 23 text

Operation Framework
• We have defined a (Devops) framework for writing these Micro-Services, with a well-understood JSON over HTTP interface
• 12 factor, JSON over HTTP

Slide 24

Slide 24 text

The key requirement for this work was to continue Product Development. We did NOT perform a rewrite of our stack.

Slide 25

Slide 25 text

A worked example: User Search

Slide 26

Slide 26 text

Simplified Original Stack
[Diagram: State API Backend · Ruby on Rails Webapp · Mongo DB · iOS · WWW · Droid]

Slide 27

Slide 27 text

And then we built our “Op Tailer”
• Which reads our Mongo DB “Replica Set Oplog”
• It should be noted that, unlike MySQL, where the replication data format isn’t an agreed API and is susceptible to change (see LinkedIn’s papers on their Log), the Mongo Replica Set Oplog is just another Mongo Collection
• It could change :)
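What such a tailer does with each oplog entry can be sketched as below. This is an assumption-laden illustration rather than State's Op Tailer: it works on plain dicts shaped like oplog entries (`op`, `ns`, `o`, `o2`) instead of a live MongoDB connection, and the `state.users` namespace and the output event shape are hypothetical.

```python
def oplog_to_event(entry: dict, namespace: str = "state.users"):
    """Map one oplog entry to a log event, or None if it is not relevant.

    The namespace and the returned event shape are illustrative assumptions.
    """
    if entry.get("ns") != namespace:
        return None  # entry belongs to another collection
    op = entry.get("op")
    if op == "i":  # insert: "o" holds the full document
        return {"type": "insert", "id": str(entry["o"]["_id"]), "doc": entry["o"]}
    if op == "u":  # update: "o2" identifies the document, "o" holds the change
        return {"type": "update", "id": str(entry["o2"]["_id"]), "change": entry["o"]}
    if op == "d":  # delete: "o" holds only the _id
        return {"type": "delete", "id": str(entry["o"]["_id"])}
    return None  # no-ops, commands and anything else are skipped


# Hand-written entries standing in for documents read from the oplog.
sample_oplog = [
    {"op": "i", "ns": "state.users", "o": {"_id": 1, "name": "Luke"}},
    {"op": "u", "ns": "state.users", "o2": {"_id": 1}, "o": {"$set": {"name": "Luke S."}}},
    {"op": "n", "ns": "", "o": {"msg": "periodic noop"}},
    {"op": "i", "ns": "state.opinions", "o": {"_id": 9, "text": "..."}},
    {"op": "d", "ns": "state.users", "o": {"_id": 1}},
]

events = [e for e in map(oplog_to_event, sample_oplog) if e is not None]
```

A real tailer would read these entries from the oplog collection with a tailable cursor and remember the last timestamp it processed; both are left out here.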

Slide 28

Slide 28 text

[Diagram: State API Backend · Ruby on Rails Webapp · Mongo DB · iOS · WWW · Droid · OpTailer]

Slide 29

Slide 29 text

Kafka REST
• Kafka REST is a web service written by the folk at Confluent.io which writes data to a given Kafka topic
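As a rough sketch of what a write through such a proxy looks like, the snippet below builds (but does not send) an HTTP produce request. The `/topics/<name>` path and the `records` envelope follow the Confluent REST Proxy; the content type shown is the v2 JSON format, which may differ from the version in use at the time of this talk, and the host, port, topic name and payload are assumptions.

```python
import json
import urllib.request


def build_produce_request(base_url: str, topic: str, values: list) -> urllib.request.Request:
    """Build a POST that asks the REST proxy to append `values` to `topic`."""
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/topics/{topic}",
        data=body,
        # Assumed content type; check it against the proxy version you run.
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        method="POST",
    )


# Hypothetical proxy address and topic name.
request = build_produce_request(
    "http://localhost:8082", "users", [{"id": "u1", "name": "Luke"}]
)
# Sending it would be: urllib.request.urlopen(request)
```

Producing over HTTP means a writer such as the Op Tailer only needs an HTTP client, not a Kafka client library.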

Slide 30

Slide 30 text

[Diagram: State API Backend · Ruby on Rails Webapp · Mongo DB · iOS · WWW · Droid · OpTailer · Kafka REST · Unified Log (Kafka) · User Topic (User Info Data) · …]

Slide 31

Slide 31 text

Then we wrote a simple indexer
• Which took data from the Kafka topic and pushed it into Elastic Search
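The indexing step can be sketched as a function that turns a batch of topic events into the newline-delimited body expected by Elasticsearch's bulk API. This is not State's indexer: the `users` index name and the event shape (`type`, `id`, `doc`) are assumptions, and reading from Kafka and posting to `/_bulk` are left out.

```python
import json


def to_bulk_body(events: list, index: str = "users") -> str:
    """Render events as an Elasticsearch bulk request body (NDJSON)."""
    lines = []
    for event in events:
        if event["type"] == "delete":
            # A delete is a single action line with no source document.
            lines.append(json.dumps({"delete": {"_index": index, "_id": event["id"]}}))
        else:
            # An index action is followed by the document itself.
            lines.append(json.dumps({"index": {"_index": index, "_id": event["id"]}}))
            lines.append(json.dumps(event["doc"]))
    # The bulk API requires every line, including the last, to end with "\n".
    return "".join(line + "\n" for line in lines)


# Hypothetical events as they might arrive from the user topic.
batch = [
    {"type": "upsert", "id": "u1", "doc": {"name": "Luke"}},
    {"type": "delete", "id": "u2"},
]
bulk_body = to_bulk_body(batch)
```

Using the user id as `_id` makes indexing idempotent, so replaying the topic from the start rebuilds the same index.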

Slide 32

Slide 32 text

[Diagram: State API Backend · Ruby on Rails Webapp · Mongo DB · iOS · WWW · Droid · OpTailer · Kafka REST · Unified Log (Kafka) · User Topic (User Info Data) · Search Indexer · Elastic Search · …]

Slide 33

Slide 33 text

Then we wrote a simple Search Service
• Which implements our Micro-Services Framework, based on Dropwizard (from the folk at Yammer)

Slide 34

Slide 34 text

[Diagram: State API Backend · Ruby on Rails Webapp · Mongo DB · iOS · WWW · Droid · OpTailer · Kafka REST · Unified Log (Kafka) · User Topic (User Info Data) · Search Indexer · Elastic Search · Search Service · …]

Slide 35

Slide 35 text

We are moving to V2 of our Search Service
• This time we wanted to include the User’s Social Graph (which we call our Connections Graph)
• But we had to join the data in our User Kafka topic with our Connections Graph
• We used Apache Samza for this task
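The join itself can be illustrated without Samza. The sketch below keeps local state for both input streams, keyed by user id, and emits a joined record whenever either side changes; it is a plain-Python stand-in for the stream join, and the class name, event fields and output shape are hypothetical.

```python
from collections import defaultdict


class UserConnectionsJoin:
    """Join a user stream with a connections stream on user id."""

    def __init__(self):
        self.users = {}                      # user id -> latest user record
        self.connections = defaultdict(set)  # user id -> ids they are connected to

    def on_user(self, user: dict):
        self.users[user["id"]] = user
        return self._joined(user["id"])

    def on_connection(self, edge: dict):
        self.connections[edge["from"]].add(edge["to"])
        return self._joined(edge["from"])

    def _joined(self, user_id: str):
        user = self.users.get(user_id)
        if user is None:
            return None  # connection arrived first; wait for the user record
        return {**user, "connections": sorted(self.connections[user_id])}


join = UserConnectionsJoin()
first = join.on_connection({"from": "u1", "to": "u2"})  # no user record yet
second = join.on_user({"id": "u1", "name": "Luke"})
third = join.on_connection({"from": "u1", "to": "u3"})
```

Each emitted record is what a search indexer would write, so the index always holds the latest user data together with the latest known connections.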

Slide 36

Slide 36 text

[Diagram: State API Backend · Ruby on Rails Webapp · Mongo DB · iOS · WWW · Droid · OpTailer · Kafka REST · Unified Log (Kafka) · User Topic (User Info Data) · User Connections (User Social Graph) · Samza (User JOIN User Connections) · Search Indexer · Elastic Search · Search Service · …]

Slide 37

Slide 37 text

Segment.com
• Buy instead of Build
• Event based Analytics
• Open-source Client APIs
• Write once for the clients
• Can make use of different services
• Can be pushed back into product via Webhook Interface, obvs. into the Unified Log

Slide 38

Slide 38 text

Gotchas
• You need to be super good at Ops to get this stuff working
• Schema Evolution is difficult - we are using Apache Avro and Schema Registry from the Confluent folk
• Kafka needs to be fault tolerant - it is lower-level than a DB
• Must agree upon a Devops Framework for writing these Micro-Services. A well understood JSON over HTTP interface
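One reason schema evolution is manageable with Avro is that a reader can declare defaults for fields that older records lack. The sketch below imitates that one rule in plain Python; it is not the Avro library, and the `User` schema and its `locale` field are made up for illustration.

```python
def resolve(record: dict, reader_schema: dict) -> dict:
    """Project a decoded record onto the reader's schema.

    Simplified imitation of Avro schema resolution: fields unknown to the
    reader are dropped, and missing fields take the reader's default.
    """
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return resolved


# Hypothetical v2 reader schema: "locale" was added with a default.
user_v2 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "name", "type": "string"},
        {"name": "locale", "type": "string", "default": "en"},
    ],
}

old_record = {"id": "u1", "name": "Luke"}  # written before "locale" existed
upgraded = resolve(old_record, user_v2)
```

Adding a field without a default would make every old record in the topic unreadable for new consumers, which is the kind of change a schema registry is meant to catch.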

Slide 39

Slide 39 text

THANK YOU to JD Hancock for these amazing stormtrooper photos (what a legend). All photos are shared under CC 2.0.

Slide 40

Slide 40 text

Thank you for listening
Questions?
Dan Harvey will be talking in more detail at the Hadoop User Group
[email protected] @mischat