Event Sourcing at Studyflow / 2

Davide Taviani
September 09, 2015

In these slides I talk briefly about how we build a secondary education platform at Studyflow, using Clojure / ClojureScript, Event Sourcing and CQRS.

Transcript

  1. Event sourcing at Studyflow / 2
    Davide Taviani, September 9th, 2015
  2. Summary
    1 Introduction
    2 Event sourcing
    3 In-memory read-model and its challenges
  3. About me
    • Davide Taviani
    • MSc Mathematics (scientific / parallel computing, combinatorial optimization)
    • Learned Clojure ∼ 1.5 years ago
    • Developer @ Studyflow
    • @Heliosmaster on GitHub
  4. At Studyflow we build a secondary education platform
    • http://www.studyflow.nl
    • We provide two courses (Rekenen and Taal)
    • We serve over 100 schools and 50 000+ students, who have correctly answered more than 10 million questions.
  5. Studyflow
    Our applications:
    • Small internal Rails app for entering content
    • Clojure web applications (ring, hiccup, ...): authentication system, administration, teacher front-end
    • ClojureScript (om, reagent) for student applications
  6. Applications layout
    [Diagram; labels: Public Web, Internal Web, Learning, EventStore, Teaching, Login, SessionStore, Assessments, SchoolAdmin, Publishing, Reporting, ElasticSearch]
  7. Application server layout
    We use DigitalOcean for our VMs.
    • 2 stacks (to provide rolling updates)
    • 1 VM for the publishing app (internal, with only 2-3 users at the same time)
    • 1 VM per Clojure application per stack
    • 1 VM for the Event Store
    Performance is great, we just take a bit of memory...
  8. Event sourcing
    We use event sourcing: every application listens to and writes domain events, which are our source of truth.
    We use our own open-source toolkit, rill: https://github.com/rill-event-sourcing/rill
  9. Event sourcing
    An event
    • records a thing that happened;
    • has a meaning in the domain;
    • encodes the intent of change;
    • is immutable.
    The event store is an append-only system.
    The application state is simply reconstructed from these events, in chronological order.
  10. Event sourcing
    Example of an event:

        {:rill.message/type :quiz/QuestionAnsweredCorrectly,
         :rill.message/timestamp #inst "2015-02-11T11:46:55.014-00:00",
         :rill.message/id #uuid "276de24f-d7df-478c-a82a-fd97c24a7232",
         :answer "My Answer",
         :question-id 442,
         :user-id 23}

    • Right now, we have more than 55 million events
    • ∼ 11M new events per month
  11. Event sourcing
    Events are awesome because they can help us answer questions like:
    • Which questions are difficult?
    • How different are quick learners from slow ones?
    • What kind of mistake is the most common for a particular question?
    • Do the students who reread the explanation (theory) immediately after answering incorrectly get it right the next time?
    • Some stuff that we don’t know yet!
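As an illustration of the first question, here is a minimal sketch that derives a per-question error rate from a list of events. The `:quiz/QuestionAnsweredIncorrectly` event type, the sample data and the `error-rates` function are assumptions made for this example; only the "Correctly" variant appears in the slides.

```clojure
(def events
  ;; Hypothetical sample data; the "Incorrectly" event type is an assumption.
  [{:rill.message/type :quiz/QuestionAnsweredCorrectly   :question-id 442 :user-id 23}
   {:rill.message/type :quiz/QuestionAnsweredIncorrectly :question-id 442 :user-id 24}
   {:rill.message/type :quiz/QuestionAnsweredIncorrectly :question-id 442 :user-id 25}
   {:rill.message/type :quiz/QuestionAnsweredCorrectly   :question-id 7   :user-id 23}])

(defn error-rates
  "Returns a map of question-id to the fraction of answers that were incorrect."
  [events]
  (let [tally (fn [acc {:keys [question-id] :as event}]
                (case (:rill.message/type event)
                  :quiz/QuestionAnsweredCorrectly
                  (update-in acc [question-id :total] (fnil inc 0))

                  :quiz/QuestionAnsweredIncorrectly
                  (-> acc
                      (update-in [question-id :total] (fnil inc 0))
                      (update-in [question-id :wrong] (fnil inc 0)))

                  ;; Any other event type is irrelevant to this question.
                  acc))]
    (into {}
          (map (fn [[question-id {:keys [total wrong] :or {wrong 0}}]]
                 [question-id (/ wrong total)]))
          (reduce tally {} events))))

(error-rates events)
;; => {442 2/3, 7 0}
```

Because the events are kept forever, a query like this can be written long after the events were recorded.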
  12. Event sourcing
    [Diagram; labels: Learning, Teaching, EventStore, Write side, Read side]
    • The read side and the write side of the applications have different needs (CQRS)
    • Each application's read-model is generated / updated asynchronously from the published events
  13. Read-model
    Simple example of a read-model: the login application.
    • The application listens for events about credentials, updating the current state of the application, a “credentials database”.
    • It is just a (continuous) reduction over the list of events, applying one at a time.
  14. Handling events
    Example of an event handler:

        (defmethod handle-event :student.events/CredentialsAdded
          [db {:keys [student-id email password]}]
          (-> db
              (assoc-in [:by-email email] {:user-id student-id :password password})
              (assoc-in [:email-by-id student-id] email)))
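To show how such a handler fits the "reduction over events" idea, here is a small self-contained sketch. The `defmulti` definition (dispatching on `:rill.message/type`), the `:default` method, `build-read-model` and the sample events are assumptions for illustration; rill's actual definitions may differ.

```clojure
(defmulti handle-event
  "Takes the current read-model and one event, returns the updated read-model."
  (fn [_db event] (:rill.message/type event)))

;; Events this application does not care about leave the read-model untouched.
(defmethod handle-event :default [db _event] db)

(defmethod handle-event :student.events/CredentialsAdded
  [db {:keys [student-id email password]}]
  (-> db
      (assoc-in [:by-email email] {:user-id student-id :password password})
      (assoc-in [:email-by-id student-id] email)))

(defn build-read-model
  "Replays the events in order, starting from an empty map."
  [events]
  (reduce handle-event {} events))

(def sample-events
  ;; Hypothetical data for illustration.
  [{:rill.message/type :student.events/CredentialsAdded
    :student-id 23 :email "a@example.com" :password "hash-1"}
   {:rill.message/type :quiz/QuestionAnsweredCorrectly
    :question-id 442 :user-id 23}])

(build-read-model sample-events)
;; => {:by-email {"a@example.com" {:user-id 23, :password "hash-1"}},
;;     :email-by-id {23 "a@example.com"}}
```

The same `handle-event` function serves both for catching up on old events and for applying new ones as they arrive.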
  15. Building and deploying
    • We normally deploy commits, identified by their sha.
    • Upon commit, if all tests pass, we automatically build the jar of each application and store it on S3 with the format $application-$sha.jar
    1 Each server gets the appropriate sha version from S3 and starts it up
    2 All the applications catch up with previous events, building the read-model
    3 Every application listens for new events and updates the read-model accordingly
  16. Read-model
    We currently use an in-memory read-model: it’s just a map. Very straightforward to implement, but non-durable.
    • What happens if the server suddenly malfunctions and reboots?
    • What happens on deploy?
    Building the read-model is fast with a small number of events, but at 50M it takes a long time (in our teaching application, ∼ 4 hours).
  17. Durable read-model
    How can we avoid building the read-model?
    • Changes to frontend code
    • Changes to graphical design
    • Simple restart of the machine
    do not require a different read-model.
    Saving the read-model to disk and loading it on startup seems a good strategy.
  18. Durable read-model
    We employ a mixture of best practices and devops.
    1 Good separation of the code that writes to the read-model
    2 Compute a shasum of the files related to such code to generate a read-model version
    3 The in-memory map gets periodically serialized and saved to disk, together with the index i of the last-seen event.
    4 If the sha is the same, the read-model must be the same, therefore we can load it from disk upon application start.
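The four steps above can be sketched roughly as follows. This is not Studyflow's implementation: the function names, the EDN file format and the use of SHA-1 over source strings are assumptions chosen to keep the example self-contained.

```clojure
(require 'clojure.edn)

(defn read-model-version
  "SHA-1 hex digest over the given source strings, in order. In practice
  these would be the contents of the files that write to the read-model."
  [sources]
  (let [digest (java.security.MessageDigest/getInstance "SHA-1")]
    (doseq [^String source sources]
      (.update digest (.getBytes source "UTF-8")))
    (apply str (map #(format "%02x" %) (.digest digest)))))

(defn save-snapshot!
  "Serializes the read-model to disk together with its version and the
  index of the last event that was applied to it."
  [path version last-index read-model]
  (spit path (pr-str {:version    version
                      :last-index last-index
                      :read-model read-model})))

(defn load-snapshot
  "Returns the snapshot at path if it exists and was written by the same
  read-model version; otherwise nil, meaning a full rebuild is needed."
  [path version]
  (let [file (java.io.File. ^String path)]
    (when (.exists file)
      (let [snapshot (clojure.edn/read-string (slurp file))]
        (when (= version (:version snapshot))
          snapshot)))))
```

A possible usage, with a hypothetical path and handler source:

```clojure
(let [version (read-model-version ["(defmethod handle-event ...)"])]
  (save-snapshot! "/tmp/login-read-model.edn" version 41 {:email-by-id {23 "a@example.com"}})
  (load-snapshot "/tmp/login-read-model.edn" version))
```

Plain EDN only round-trips data that `pr-str` can print and `clojure.edn/read-string` can read back; a read-model containing other types would need a different serialization format.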
  19. Durable read-model
    • Loading the read-model up to event i is just a shortcut for applying the events 0, ..., i.
    • From i + 1 onwards the events have to be handled normally.
    • Luckily, these are very few (generated during downtime or re-deploy).
    This brings the deploy time of new versions of the application that do not change the read-model down to 5 minutes (mostly spent on other things not related to the read-model).
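The equivalence between "load a snapshot, then catch up" and "replay everything" can be checked with a small sketch. The `apply-event` handler, the snapshot shape and the in-memory event vector are hypothetical; a real application would ask the event store for the events after index i instead of dropping them from a sequence.

```clojure
(defn apply-event
  "Hypothetical handler: counts events per user."
  [db {:keys [user-id]}]
  (update-in db [:events-by-user user-id] (fnil inc 0)))

(defn catch-up
  "Applies the events after last-index (0-based) to a saved read-model."
  [{:keys [read-model last-index]} all-events]
  {:read-model (reduce apply-event read-model (drop (inc last-index) all-events))
   :last-index (dec (count all-events))})

(def all-events
  [{:user-id 23} {:user-id 23} {:user-id 42} {:user-id 23}])

;; A snapshot taken after the event at index 1 had been applied.
(def snapshot
  {:read-model (reduce apply-event {} (take 2 all-events))
   :last-index 1})

(= (:read-model (catch-up snapshot all-events))
   (reduce apply-event {} all-events))
;; => true
```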
  20. Deploying a new version
    What if the logic to handle events has changed?
    1 Write a migration
    2 Rebuild the entire read-model
    Migrations are annoying and we still need to build a version of the read-model anyway, so we chose 2.
  21. Building the read-model
    Normally, it can take a few hours between a commit and the time we decide to put it live:
    • Code review
    • Tests (automated and manual, on a variety of devices)
    • Decision
    It seems useful to use this time to build the read-model.
  22. Building the read-model
    Therefore, we decided to change our strategy and build read-models up front.
    • A dedicated build server, with a single jar with all the applications combined (similar to our development environment), plugged into the production event store
    • Compression and upload of the read-model to S3
    • Upon deploy, get the appropriate read-model from S3.
    Having a centralized place for read-models on S3 is very convenient.
  23. Migration for critical bugs in read-model
    What about a critical bug in the read-model? The updated version cannot be live before a few hours have passed.
    Depending on the nature of the change, we can still
    1 Download the read-model to our development machine
    2 Run a manual migration
    3 Save it and upload it to the appropriate place on S3.
    Upon deploy, there’s virtually no difference from an automatic build.
  24. Further improvements to read-models
    This strategy is not the only way to bring down the deployment time.
    • Splitting the applications (and their read-models) into smaller pieces (code-wise): this reduces the likelihood that a change impacts the read-model version.
    • Sharding of event stores. Our domain partitions naturally, so we could have one separate database per region, province, city or even school. The number of events to be read is then small enough that the read-model is built in a few minutes at most.
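A rough sketch of the sharding idea: partition the events by a shard key and build one small read-model per shard. The `:school-id` key on events, the `count-answers` handler and both helper functions are assumptions for illustration; the slides do not say how events would be assigned to shards.

```clojure
(defn shard-by
  "Splits one event sequence into a map of shard key -> events,
  preserving the order of events within each shard."
  [shard-key events]
  (group-by shard-key events))

(defn count-answers
  "Hypothetical handler: counts the events seen by this read-model."
  [db _event]
  (update-in db [:answers] (fnil inc 0)))

(defn build-shard-read-models
  "Builds an independent read-model for every shard."
  [handler shards]
  (into {}
        (map (fn [[shard-id shard-events]]
               [shard-id (reduce handler {} shard-events)]))
        shards))

(def events
  ;; Hypothetical data for illustration.
  [{:school-id 1 :question-id 442}
   {:school-id 2 :question-id 442}
   {:school-id 1 :question-id 7}])

(build-shard-read-models count-answers (shard-by :school-id events))
;; => {1 {:answers 2}, 2 {:answers 1}}
```

Each shard's read-model only depends on that shard's events, so the shards could be rebuilt independently and in parallel.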
  25. Info and contact
    Rill: https://github.com/rill-event-sourcing/rill
    Previous talk: http://joost.zeekat.nl/wp-content/es-at-sf.pdf
    Studyflow: info@studyflow.nl
    Me: davide@studyflow.nl, info@davidetaviani.com