Event Sourcing at Studyflow / 2

Slide 1

Slide 1 text

Event sourcing at Studyflow / 2
Davide Taviani
September 9th, 2015

Slide 2

Slide 2 text

Summary
1. Introduction
2. Event sourcing
3. In-memory read-model and its challenges

Slide 3

Slide 3 text

About me
• Davide Taviani
• MSc Mathematics (scientific / parallel computing, combinatorial optimization)
• Learned Clojure ~1.5 years ago
• Developer @ Studyflow
• @Heliosmaster on GitHub

Slide 4

Slide 4 text

At Studyflow we build a secondary education platform
• http://www.studyflow.nl
• We provide two courses: Rekenen (arithmetic) and Taal (language)
• We serve over 100 schools and 50,000+ students, who have answered more than 10 million questions correctly.

Slide 5

Slide 5 text

Studyflow
Our applications:
• Small internal Rails app for entering content
• Clojure web applications (ring, hiccup, ...):
  • authentication system
  • administration
  • teacher front-end
• ClojureScript (om, reagent) for student applications

Slide 6

Slide 6 text

Applications layout
[Diagram of the applications. Labels: Public Web, Internal Web, Learning, EventStore, Teaching, Login, SessionStore, Assessments, SchoolAdmin, Publishing, Reporting, ElasticSearch]

Slide 7

Slide 7 text

Application server layout
We use DigitalOcean for our VMs.
• 2 stacks (to provide rolling updates)
• 1 VM for the publishing app (internal, with only 2-3 users at the same time)
• 1 VM per Clojure application per stack
• 1 VM for the Event Store
Performance is great, we just take a bit of memory...

Slide 8

Slide 8 text

Event sourcing
We use event sourcing: every application listens to and writes domain events, which are our source of truth.
We use our own open-source toolkit: rill.
https://github.com/rill-event-sourcing/rill

Slide 9

Slide 9 text

Event sourcing
An event
• records a thing that happened;
• has a meaning in the domain;
• encodes the intent of change;
• is immutable.
The event store is an append-only system.
The application state is simply reconstructed from these events, applied in chronological order.

Slide 10

Slide 10 text

Event sourcing
Example of an event:

    {:rill.message/type :quiz/QuestionAnsweredCorrectly,
     :rill.message/timestamp #inst "2015-02-11T11:46:55.014-00:00",
     :rill.message/id #uuid "276de24f-d7df-478c-a82a-fd97c24a7232",
     :answer "My Answer",
     :question-id 442,
     :user-id 23}

• Right now, we have more than 55 million events
• ~11M new events per month

Slide 11

Slide 11 text

Event sourcing
Events are awesome because they can help us answer questions like:
• Which questions are difficult?
• How do quick learners differ from slow ones?
• What kind of mistake is the most common for a particular question?
• Do students who reread the explanation (theory) immediately after answering incorrectly then get it right?
• Some stuff that we don't know yet!
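
As an illustration of the first question, here is a minimal sketch that derives a per-question error rate from a sequence of events. It assumes an event type `:quiz/QuestionAnsweredIncorrectly` mirroring the `:quiz/QuestionAnsweredCorrectly` event shown earlier; that type, the sample events, and the `error-rates` function are hypothetical and not taken from the Studyflow codebase.

```clojure
;; Hypothetical sample events; only :question-id and the type matter here.
(def events
  [{:rill.message/type :quiz/QuestionAnsweredCorrectly   :question-id 442 :user-id 23}
   {:rill.message/type :quiz/QuestionAnsweredIncorrectly :question-id 442 :user-id 24}
   {:rill.message/type :quiz/QuestionAnsweredIncorrectly :question-id 442 :user-id 25}
   {:rill.message/type :quiz/QuestionAnsweredCorrectly   :question-id 7   :user-id 23}])

(def answer-types
  #{:quiz/QuestionAnsweredCorrectly :quiz/QuestionAnsweredIncorrectly})

(defn incorrect? [event]
  (= :quiz/QuestionAnsweredIncorrectly (:rill.message/type event)))

(defn error-rates
  "Returns a map of question-id -> fraction of answers that were incorrect."
  [events]
  (let [answers (filter (comp answer-types :rill.message/type) events)]
    (into {}
          (for [[question-id evs] (group-by :question-id answers)]
            [question-id (/ (count (filter incorrect? evs))
                            (count evs))]))))

(error-rates events)
```

Because the events are kept forever, a question like this can be asked long after the events were recorded, without having planned for it up front.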

Slide 12

Slide 12 text

Event sourcing
[Diagram: the Learning and Teaching applications, each with a write side and a read side, connected through the EventStore]
• Read side and write side of the applications have different needs (CQRS)
• Each application's read-model is generated / updated asynchronously from the published events

Slide 13

Slide 13 text

Read-model
Simple example of a read-model: the login application.
• The application listens for events about credentials, updating the current state of the application, a "credentials database".
• It is just a (continuous) reduction over the list of events, applying one at a time.

Slide 14

Slide 14 text

Handling events
Example of an event handler:

    (defmethod handle-event :student.events/CredentialsAdded
      [db {:keys [student-id email password]}]
      (-> db
          (assoc-in [:by-email email] {:user-id student-id
                                       :password password})
          (assoc-in [:email-by-id student-id] email)))
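
To show how such a handler fits into the "continuous reduction" described on the previous slide, here is a self-contained sketch. The `defmulti` dispatching on `:rill.message/type`, the `:default` method, the sample events, and `build-read-model` are assumptions made for illustration; the slides do not show how rill actually defines `handle-event`.

```clojure
;; Assumption: handlers are selected by the event's :rill.message/type.
(defmulti handle-event
  (fn [_db event] (:rill.message/type event)))

;; Events this read-model does not care about leave the state untouched.
(defmethod handle-event :default
  [db _event]
  db)

(defmethod handle-event :student.events/CredentialsAdded
  [db {:keys [student-id email password]}]
  (-> db
      (assoc-in [:by-email email] {:user-id student-id
                                   :password password})
      (assoc-in [:email-by-id student-id] email)))

;; Hypothetical events, in chronological order.
(def events
  [{:rill.message/type :student.events/CredentialsAdded
    :student-id 23 :email "a@example.com" :password "hashed-pw"}
   {:rill.message/type :quiz/QuestionAnsweredCorrectly
    :question-id 442 :user-id 23}])

(defn build-read-model
  "The read-model is a left fold of handle-event over all events."
  [events]
  (reduce handle-event {} events))

(build-read-model events)
```

Since each handler is a pure function from the old state and an event to the new state, the same function serves both for catching up on old events and for processing new ones as they arrive.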

Slide 15

Slide 15 text

Building and deploying
• We normally deploy commits, identified by their sha.
• Upon commit, if all tests pass, we automatically build the jar of each application and store it on S3 with the format $application-$sha.jar
1. Each server gets the appropriate sha version from S3 and starts it up
2. All the applications catch up with previous events, building the read-model
3. Every application listens for new events and updates the read-model accordingly

Slide 16

Slide 16 text

Read-model
We currently use an in-memory read-model: it's just a map. Very straightforward to implement, but non-durable.
• What happens if the server suddenly malfunctions and reboots?
• What happens on deploy?
Building the read-model is fast with a small number of events, but at 50M it takes a long time (in our teaching application, ~4 hours).

Slide 17

Slide 17 text

Durable read-model
How can we avoid rebuilding the read-model?
• Changes to frontend code
• Changes to graphical design
• A simple restart of the machine
do not require a different read-model.
Saving the read-model to disk and loading it on startup seems a good strategy.

Slide 18

Slide 18 text

Durable read-model
We employ a mixture of best practices and devops.
1. Good separation of the code that writes to the read-model
2. Compute a shasum of the files related to such code to generate a read-model version
3. The in-memory map gets periodically serialized and saved to disk, together with the index i of the last-seen event.
4. If the sha is the same, the read-model must be the same, therefore we can load it from disk upon application start.
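
The four steps above can be sketched as follows. This is only an illustration under stated assumptions: the slides do not say which hash, serialization format, or file layout is used, so SHA-1 over the concatenated source files, EDN via `pr-str`, and the function names `read-model-version`, `save-snapshot!`, and `load-snapshot` are all hypothetical choices.

```clojure
(require '[clojure.edn :as edn])
(import '(java.security MessageDigest))

(defn sha1-hex
  "Hex-encoded SHA-1 of a string."
  [^String s]
  (let [digest (.digest (MessageDigest/getInstance "SHA-1")
                        (.getBytes s "UTF-8"))]
    (apply str (map #(format "%02x" (bit-and % 0xff)) digest))))

(defn read-model-version
  "Hash over the contents of the files that contain event-handling code.
  Paths are sorted so the result does not depend on argument order."
  [file-paths]
  (sha1-hex (apply str (map slurp (sort file-paths)))))

(defn save-snapshot!
  "Serializes the read-model with its version and last-seen event index.
  Assumes the read-model only holds EDN-serializable values."
  [path version last-index db]
  (spit path (pr-str {:version version :last-index last-index :db db})))

(defn load-snapshot
  "Returns the snapshot at path if it exists and was written by the same
  read-model version; otherwise nil, meaning a full rebuild is needed."
  [^String path version]
  (let [f (java.io.File. path)]
    (when (.exists f)
      (let [snapshot (edn/read-string (slurp f))]
        (when (= version (:version snapshot))
          snapshot)))))
```

Hashing only the files that write to the read-model is what makes the first step (good separation of that code) matter: the fewer files feed the hash, the fewer commits invalidate the saved read-model.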

Slide 19

Slide 19 text

Durable read-model
• Loading the read-model up to event i is just a shortcut for applying the events 0, ..., i.
• From i + 1 onwards, the events have to be handled normally.
• Luckily, these are very few (generated during downtime or re-deploy).
This brings the deploy time of new versions of the application without changes to the read-model down to 5 minutes (mostly spent on other things unrelated to the read-model).
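
The shortcut can be checked with a small sketch: resuming from a saved state at index i and applying events i + 1 onwards gives the same result as folding over all events from scratch. The `handle-event` stand-in (which just counts events per type), the snapshot shape with `:db` and `:last-index`, and `catch-up` are hypothetical names used for illustration.

```clojure
(defn handle-event
  "Stand-in for the real event handlers: counts events per type."
  [db event]
  (update db (:rill.message/type event) (fnil inc 0)))

(defn catch-up
  "Applies to the snapshot's state only the events after its :last-index.
  Events are indexed from 0, so (inc last-index) events are skipped."
  [{:keys [db last-index]} events]
  (reduce handle-event db (drop (inc last-index) events)))

;; Five hypothetical events, indices 0 to 4.
(def events
  (vec (repeat 5 {:rill.message/type :quiz/QuestionAnsweredCorrectly})))

;; A snapshot taken after event 2, i.e. after applying events 0, 1, 2.
(def snapshot
  {:last-index 2
   :db (reduce handle-event {} (take 3 events))})

(= (catch-up snapshot events)
   (reduce handle-event {} events))
```

Here `drop` walks an in-memory vector; against a real event store one would presumably ask for the events starting at offset i + 1 instead of reading and discarding the earlier ones.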

Slide 20

Slide 20 text

Deploying a new version
What if the logic to handle events has changed?
1. Write a migration
2. Rebuild the entire read-model
Migrations are annoying, and we still need to build a version of the read-model anyway, so we chose option 2.

Slide 21

Slide 21 text

Building the read-model
Normally, it can take a few hours between a commit and the time we decide to put it live:
• Code review
• Tests (automated and manual, on a variety of devices)
• Decision
It seems useful to use this time to build the read-model.

Slide 22

Slide 22 text

Building the read-model
Therefore, we decided to change our strategy and build read-models up front.
• A dedicated build server, with a single jar with all the applications combined (similar to our development environment), connected to the production event store
• Compression and upload of the read-model to S3
• Upon deploy, get the appropriate read-model from S3.
Having a centralized place for read-models on S3 is very convenient.
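
A minimal sketch of the compression step, using only the JDK's gzip streams. The slides do not say which compression format or S3 key layout is used, so gzip and the `read-model-key` naming (modeled on the `$application-$sha.jar` convention for jars) are assumptions; the actual upload to S3 is left out.

```clojure
(import '(java.io ByteArrayInputStream ByteArrayOutputStream)
        '(java.util.zip GZIPInputStream GZIPOutputStream))

(defn gzip-bytes
  "Compresses a serialized read-model (a string) into gzip bytes."
  [^String s]
  (let [out (ByteArrayOutputStream.)]
    ;; Closing the gzip stream flushes the trailer into `out`.
    (with-open [gz (GZIPOutputStream. out)]
      (.write gz ^bytes (.getBytes s "UTF-8")))
    (.toByteArray out)))

(defn gunzip-string
  "Inverse of gzip-bytes."
  [^bytes bs]
  (with-open [gz (GZIPInputStream. (ByteArrayInputStream. bs))]
    (slurp gz :encoding "UTF-8")))

(defn read-model-key
  "Hypothetical S3 key: one object per application and read-model version."
  [application version]
  (str "read-models/" application "-" version ".edn.gz"))

(gunzip-string (gzip-bytes (pr-str {:email-by-id {23 "a@example.com"}})))
```

Keying the stored object by read-model version rather than by commit sha would let several commits that do not touch the event-handling code share one stored read-model.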

Slide 23

Slide 23 text

Migration for critical bugs in read-model
What about a critical bug in the read-model? With a full rebuild, the updated version cannot go live for a few hours.
Depending on the nature of the change, we can still
1. Download the read-model to our development machine
2. Run a manual migration
3. Save it and upload it to the appropriate place on S3.
Upon deploy, there's virtually no difference from an automatic build.

Slide 24

Slide 24 text

Further improvements to read-models
This strategy is not the only way to bring down the deployment time.
• Splitting applications (and their read-models) into smaller pieces (code-wise): this reduces the likelihood that a change impacts the read-model version.
• Sharding of event stores. Due to the nature of our domain, we could have one separate database per region, province, city or even school. The number of events to be read is then small enough that the read-model can be built in a few minutes at most.

Slide 25

Slide 25 text

Info and contact
• Rill: https://github.com/rill-event-sourcing/rill
• Previous talk: http://joost.zeekat.nl/wp-content/es-at-sf.pdf
• Studyflow: [email protected]
• Me: [email protected], [email protected]