Serverless Event Sourcing

Aletheia
December 13, 2017

Understand and implement the event sourcing design pattern with Serverless technologies

Transcript

  1. Event sourcing on serverless architectures

    Serverless Architectures
    Luca Bianchi, Neosperience
    github.com/aletheia
    it.linkedin.com/in/lucabianchipavia
    @bianchiluca
    medium.com/@aletheia
  2. Who am I?

    • Chief Technology Officer @ Neosperience
    • Working on a lot of bleeding-edge technologies
    • Passionate developer: love writing code, hate meetings

    The Neosperience Cloud
    • Software-as-a-service cloud for Digital Customer Experience processes (Psychographics, Loyalty & Gamification, Proximity, Content, etc.)
    • Built on AWS, 95% on Serverless technologies
    • Moved from VMware, to EC2, to Elastic Beanstalk, to Serverless
    • Dozens of micro and nano services

    Neosperience, the Digital Customer Experience Company, aims to change the way brands and customers interact, taking a software vendor's approach to Digital Customer Experience as the evolution of marketing automation.
  3. What is Serverless?

    “Serverless architecture replaces long-running virtual machines with ephemeral compute power that comes into existence on request and disappears immediately after use. Use of this architecture can mitigate some security concerns such as security patching and SSH access control, and can make much more efficient use of compute resources. These systems cost very little to operate and can have inbuilt scaling features.” (ThoughtWorks, 2016)
  4. The Serverless Manifesto

    • Function as the unit of deployment and scaling
    • Implicitly fault-tolerant
    • Metrics
    • No machines, VMs, or containers
    • Bring Your Own Code
    • Stateless
    • Never pay for idle
    • Scales per request
  5. It all started with an event...

    • Events are a great way to decouple services
    • Largely abused in the last decade (ESB)
    • Now revamped with CQRS and Event Sourcing to handle communication between microservices
    • Serverless functions handle events
    • This calls for a different approach to architecture
  6. Sample use case: download Facebook profiles for ML

    1. A set of FB auth tokens is sent to our system
    2. Jobs to download user images and posts are started
    3. Downloaded images are classified through Rekognition
    4. Posts are processed to perform text mining
    5. The downloaded profile is updated
    6. A completion acknowledgement is sent back to the caller (with stats)

    There are many architectural choices. How do we evaluate them?
  7. Evaluating a serverless architecture (aka how big is my serverless?)

    • In 2015 Tim Wagner, General Manager @ AWS Serverless Compute, presented the so-called Serverless Manifesto at ServerlessConf.
    • In 2017 Danilo Poccia, Technical Evangelist @ AWS, speaking at JeffConf, asked “what would it look like to have that manifesto translated into a checklist to evaluate the Serverless compliance of an architecture?”
    • So, we turned it into a “Serverless Scorecard”

    The Serverless Scorecard
    • Function as the unit of deployment and scaling
    • Implicitly fault-tolerant
    • Metrics
    • No machines, VMs, or containers
    • Bring Your Own Code
    • Stateless
    • Never pay for idle
    • Scales per request
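As a rough illustration of how the scorecard could be applied, the sketch below counts how many of the eight manifesto criteria an architecture satisfies and formats the result the way the later checklist slides do (for example "4/8"). The function name `score` and the criteria passed in the usage note are assumptions made for illustration; the slides do not record which criteria each idea actually satisfies.

```python
# The eight criteria, in the order they appear on the Serverless Manifesto slide.
CRITERIA = (
    "Function as the unit of deployment and scaling",
    "Implicitly fault-tolerant",
    "Metrics",
    "No machines, VMs, or containers",
    "Bring Your Own Code",
    "Stateless",
    "Never pay for idle",
    "Scales per request",
)


def score(satisfied):
    """Return a scorecard string such as '4/8' for a set of satisfied criteria.

    `satisfied` is an iterable of criterion names; unknown names are rejected
    so that typos do not silently inflate the score.
    """
    satisfied = set(satisfied)
    unknown = satisfied - set(CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    return f"{len(satisfied)}/{len(CRITERIA)}"
```

For example, `score({"Stateless", "Metrics"})` yields `"2/8"`; the two criteria are picked arbitrarily here and do not describe any of the ideas in the deck.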
  8. Lambda Orchestrating Lambdas (IDEA 01)

    The most naive approach is a distributed monolith: a single endpoint receives a set of FB tokens and maps to a Lambda function that invokes a workflow (made of many Lambdas) for each user.

    Pains:
    • The Orchestrator Lambda has to wait until the last workflow has completed
    • NodeJS works better than other languages due to its asynchronous programming model

    Diagram labels (IDEA 01): Orchestrator; Job Orchestrator with DownloadPosts, DownloadImages, ProcessImages [ … one for each user … ]
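The first pain can be made concrete with a small local simulation. This is only a sketch: it uses Python's `asyncio` coroutines as stand-ins for Lambda functions rather than real AWS invocations, and every function name and return value is hypothetical. What it shows is that the orchestrator cannot return until the slowest per-user workflow has finished.

```python
import asyncio


# Stand-ins for the worker Lambdas; each just yields to the event loop once.
async def download_posts(token):
    await asyncio.sleep(0)
    return f"posts:{token}"


async def download_images(token):
    await asyncio.sleep(0)
    return f"images:{token}"


async def process_images(images):
    await asyncio.sleep(0)
    return f"classified:{images}"


async def job_orchestrator(token):
    """Per-user workflow: download posts and images, then process the images."""
    posts, images = await asyncio.gather(download_posts(token), download_images(token))
    processed = await process_images(images)
    return {"token": token, "posts": posts, "images": processed}


async def orchestrator(tokens):
    """Fan out one workflow per token and block until ALL of them complete."""
    return await asyncio.gather(*(job_orchestrator(t) for t in tokens))


def run(tokens):
    # gather() preserves input order, so results line up with the tokens.
    return asyncio.run(orchestrator(tokens))
```

In a real deployment the waiting orchestrator would be billed for the whole duration of the fan-out, which is why this design scores poorly against "Never pay for idle".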
  9. Serverless checklist (IDEA 01)

    • Function as the unit of deployment and scaling
    • Stateless
    • Implicitly fault-tolerant
    • Metrics
    • No machines, VMs, or containers
    • Never pay for idle
    • Scales per request
    • Bring Your Own Code

    Score: 4/8
  10. Dynamo Orchestration (IDEA 02)

    Use a DynamoDB Table to decouple jobs based on the FB token. Lambdas are invoked asynchronously through DynamoDB Streams. Workflow completion is recorded as an update on the Table record.

    Pains:
    • The Orchestrator Lambda has to wait until the last workflow has completed (checked on Dynamo)
    • Each Lambda still has to wait for the other Lambdas to finish

    Diagram labels (IDEA 02): Orchestrator; Job Orchestrator with DownloadPosts, DownloadImages, ProcessImages
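A Lambda subscribed to a DynamoDB stream receives batches of change records. The sketch below picks out newly inserted job items from such a batch. The overall record shape (`Records`, `eventName`, `dynamodb.NewImage`, typed attribute values such as `{"S": ...}`) follows the DynamoDB Streams event format; the attribute names `userId` and `token` and the function name `new_jobs` are assumptions, since the deck does not show the table schema.

```python
def new_jobs(event):
    """Extract job descriptions from the INSERT records of a DynamoDB stream batch.

    Assumes the stream is configured to include new item images and that each
    job item carries string attributes 'userId' and 'token' (hypothetical schema).
    """
    jobs = []
    for record in event.get("Records", []):
        # MODIFY and REMOVE records are ignored: only new rows start a job.
        if record.get("eventName") != "INSERT":
            continue
        image = record["dynamodb"]["NewImage"]
        jobs.append({
            "userId": image["userId"]["S"],
            "token": image["token"]["S"],
        })
    return jobs
```

Completion updates on the same record would arrive as `MODIFY` events, so a separate handler (or branch) would be needed to observe them.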
  11. Serverless checklist (IDEA 02)

    • Function as the unit of deployment and scaling
    • Stateless
    • Implicitly fault-tolerant
    • Metrics
    • No machines, VMs, or containers
    • Never pay for idle
    • Scales per request
    • Bring Your Own Code

    Score: 5/8
  12. Kinesis Orchestration (IDEA 03)

    Use Kinesis to post jobs once the Orchestrator has processed the request, and a DynamoDB Table to check completion of a task for a specific user. Dynamo and Kinesis keep the state of the system.

    Pains:
    • The Orchestrator Lambda has to wait until the last workflow has completed (checked on Dynamo)
    • Can't tell what state the system is currently in

    Diagram labels (IDEA 03): Orchestrator, DownloadPosts, DownloadImages, ProcessImages
  13. Kinesis + StepFunction (IDEA 04)

    Improve system state management by using StepFunction to trigger the different Lambdas, deferring to Kinesis only the batch execution within a single task. Implement an End-of-Task event within Kinesis to signal that processing is completed.

    Pains:
    • The Orchestrator Lambda has to wait until the last workflow has completed (checked on Dynamo)
    • Kinesis parallelism is determined by shards and the partition key (i.e. userId)

    Diagram labels (IDEA 04): Orchestrator, DownloadPosts, DownloadImages, ProcessImages
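To see why the partition key limits parallelism, recall that Kinesis hashes each partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash-key range contains that value. The sketch below approximates this under two assumptions: the hash space is split evenly across shards (true for a freshly created stream, not necessarily after resharding), and the digest is read as a big-endian integer. The function name `shard_for` is hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128 bits wide


def shard_for(partition_key, shard_count):
    """Approximate the index of the shard that receives a partition key.

    Assumes `shard_count` shards with equal, contiguous hash-key ranges.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    # Scale the 128-bit hash key down to a shard index in [0, shard_count).
    return hash_key * shard_count // HASH_SPACE
```

Because the mapping is deterministic, all records sharing a userId land on the same shard and are consumed in order by one Lambda invocation at a time for that shard; a single very active user therefore cannot be processed in parallel.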
  14. Serverless checklist (IDEA 04)

    • Function as the unit of deployment and scaling
    • Implicitly fault-tolerant
    • Metrics
    • No machines, VMs, or containers
    • Bring Your Own Code
    • Never pay for idle
    • Scales per request
    • Stateless

    Score: 6/8
  15. S3 Event Sourcing (IDEA 05)

    Implement the architecture using S3 as the Event Store. Write a file for each computation step and let event streams trigger Lambda invocations in parallel. Use StepFunction to check processing status globally (configure polling through wait/check states).

    Pains:
    • The client needs to use the AWS SDK and conform to the S3 data model, or a Lambda has to proxy it
    • No event failover if a file is lost
    • No event replay from the Event Store

    Diagram labels (IDEA 05): Write file with token, SplitPostsJobs, DownloadPosts [ … ], Check/Wait files on bucket, DownloadImage(s) [ … ], ProcessImage(s) [ … ], SplitImagesJob(s), WriteEndFile
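With S3 as the Event Store, each written file becomes an event delivered to a Lambda as an S3 notification. The sketch below shows one way a handler might decode such a notification and decide which step the file belongs to. The notification shape (`Records[].s3.bucket.name`, `Records[].s3.object.key`, URL-encoded keys) follows the S3 event format; the key layout `<requestId>/<step>/<file>` and the names `route` and `handler` are assumptions, as the deck does not describe how files are named.

```python
from urllib.parse import unquote_plus


def route(key):
    """Split a key of the assumed form '<requestId>/<step>/<file>' into its parts."""
    request_id, step, _filename = key.split("/", 2)
    return {"request_id": request_id, "step": step}


def handler(event, context=None):
    """Decode an S3 notification batch into (bucket, request, step) descriptions."""
    routed = []
    for record in event.get("Records", []):
        # Object keys arrive URL-encoded in S3 notifications (spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        routed.append({"bucket": record["s3"]["bucket"]["name"], **route(key)})
    return routed
```

A real handler would go on to read the object and run the step; the point here is only that the file key doubles as the event's identity, which is also why a lost file means a lost event.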
  16. Serverless checklist (IDEA 05)

    • Function as the unit of deployment and scaling
    • Implicitly fault-tolerant
    • Metrics
    • No machines, VMs, or containers
    • Bring Your Own Code
    • Stateless
    • Never pay for idle
    • Scales per request

    Score: 8/8
  17. Some pains still remain...

    • The client needs to use the AWS SDK and conform to the S3 data model, or a Lambda has to proxy it
    • No event failover if a file is lost
    • No event replay from the Event Store

    Is a better solution possible?
  18. Introducing Internet of Functions (IoF)

    • IoT topics subscribed by Lambda: functions are notified when a new event is available (either a job request or a response)
    • IoT topics can be created on the fly: no CF (CloudFormation) setup is needed, only IoT active in the AWS account
    • Functions can subscribe to any topic: through IoT Rules, any Lambda can be triggered by a new event
    • Firehose can persist events into S3: event replay from the Event Store (S3) and failover can be achieved (and it's Serverless!)
    • Async implicitly supported
    • REST endpoints decoupling: a Lambda connected to APIG (API Gateway) generates a requestID and returns from the HTTP request with a token to poll for completion
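The flow described in these bullets can be sketched as a small in-process simulation. Nothing here talks to AWS IoT, Firehose or API Gateway: the `EventBus` class stands in for IoT topics plus the Firehose-to-S3 event log, plain functions stand in for Lambdas, and delivery is synchronous for simplicity. The topic names are taken from the deck's diagram; the class, the payload fields and `build_system` are assumptions for illustration.

```python
import uuid
from collections import defaultdict


class EventBus:
    """Toy stand-in for IoT topics with an append-only event log."""

    def __init__(self):
        self.log = []  # plays the role of Firehose persisting events to S3
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, fn):
        self.subscribers[topic].append(fn)

    def publish(self, topic, payload):
        self.log.append((topic, payload))  # record first, then deliver
        for fn in list(self.subscribers[topic]):
            fn(payload)

    def replay(self, topic, fn):
        """Feed every logged event of `topic` to `fn`, e.g. to rebuild state."""
        for logged_topic, payload in list(self.log):
            if logged_topic == topic:
                fn(payload)


def build_system():
    bus = EventBus()
    status = {}  # what the polling endpoint would read

    def on_request(payload):
        status[payload["requestId"]] = "RUNNING"
        bus.publish("/downloadImage/", payload)

    def on_download(payload):
        bus.publish("/status/", {"requestId": payload["requestId"], "state": "DONE"})

    def on_status(payload):
        status[payload["requestId"]] = payload["state"]

    bus.subscribe("/request/", on_request)
    bus.subscribe("/downloadImage/", on_download)
    bus.subscribe("/status/", on_status)

    def submit(token):
        """REST-facing entry point: returns a request ID the caller can poll."""
        request_id = str(uuid.uuid4())
        bus.publish("/request/", {"requestId": request_id, "token": token})
        return request_id

    return bus, status, submit
```

Calling `submit("some-token")` returns a request ID whose entry in `status` can be polled. If the `status` dictionary were lost, `bus.replay("/status/", ...)` could rebuild it from the log, which is the replay capability the S3-only design lacked.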
  19. Internet of Functions (IDEA 06)

    Diagram labels (IDEA 06): Generate Request and Publish, SplitPostsJobs, DownloadPosts [ … ], Check/Wait files on bucket, SplitImagesJob(s), DownloadImage(s) [ … ], ProcessImage(s) [ … ], PublishEnd, Return update on status, Kinesis Firehose

    Topics shown: /request/, /status/, /downloadImage/, /processImage/ [ … ]
  20. Serverless checklist (IDEA 06)

    • Function as the unit of deployment and scaling
    • Implicitly fault-tolerant (improved by allowing event replay from the Event Store)
    • Metrics
    • No machines, VMs, or containers
    • Bring Your Own Code
    • Stateless
    • Never pay for idle
    • Scales per request
    • REST compatible

    Score: 9/8