Serverless Event Sourcing

Event sourcing on serverless architectures
Serverless Architectures
Luca Bianchi, Neosperience
github.com/aletheia | it.linkedin.com/in/lucabianchipavia | @bianchiluca | medium.com/@aletheia

Who am I?
• Chief Technology Officer @ Neosperience
• Working on a lot of bleeding-edge technologies
• Passionate developer: love writing code, hate meetings

The Neosperience Cloud
• Software-as-a-service cloud for Digital Customer Experience processes (Psychographics, Loyalty & Gamification, Proximity, Content, etc.)
• Built on AWS, 95% on Serverless technologies
• Moved from VMware, to EC2, to Elastic Beanstalk, to Serverless
• Dozens of micro and nano services

Neosperience, the Digital Customer Experience company, aims to change the way brands and customers interact, taking a software vendor's approach to Digital Customer Experience as the evolution of marketing automation.

What is Serverless?
“Serverless architecture replaces long-running virtual machines with ephemeral compute power that comes into existence on request and disappears immediately after use. Use of this architecture can mitigate some security concerns such as security patching and SSH access control, and can make much more efficient use of compute resources. These systems cost very little to operate and can have inbuilt scaling features.” (ThoughtWorks, 2016)

The Serverless Manifesto
• Function as the unit of deployment and scaling
• Implicitly fault-tolerant
• Metrics
• No machines, VMs, or containers
• Bring Your Own Code
• Stateless
• Never pay for idle
• Scales per request

It all started with an event...
• Events are a great way to decouple services
• Largely abused in the last decade (ESB)
• Now revamped with CQRS and Event Sourcing to handle communication between microservices
• Serverless functions handle events
• A different approach to architecture is needed

Sample use case: download Facebook profiles for ML
1. A set of FB auth tokens is sent to our system
2. Jobs to download user images and posts are started
3. Downloaded images are classified through Rekognition
4. Posts are processed to perform text mining
5. The downloaded profile is updated
6. A completion acknowledgment is sent back to the caller (with stats)

Many architectural choices: how to evaluate them?

“Bad code haunts you for months... bad architectures harm you for decades.”

Evaluating a serverless architecture (aka how big is my serverless?)
• In 2015 Tim Wagner, General Manager @ AWS Serverless Compute, presented the so-called Serverless Manifesto at ServerlessConf.
• In 2017 Danilo Poccia, Technical Evangelist @ AWS, speaking at JeffConf, asked what it would look like to have that manifesto translated into a checklist to evaluate the Serverless compliance of an architecture.
• So, we turned it into a “Serverless Scorecard”.

The Serverless Scorecard
• Function as the unit of deployment and scaling
• Implicitly fault-tolerant
• Metrics
• No machines, VMs, or containers
• Bring Your Own Code
• Stateless
• Never pay for idle
• Scales per request
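One way to read the scorecard is as a plain checklist whose ticked items get counted. The sketch below is only an illustration: the `CRITERIA` tuple copies the eight manifesto items, while the `score` function and the sample set of satisfied criteria are hypothetical and do not come from the talk.

```python
# The eight criteria, copied from the Serverless Manifesto list.
CRITERIA = (
    "Function as the unit of deployment and scaling",
    "Implicitly fault-tolerant",
    "Metrics",
    "No machines, VMs, or containers",
    "Bring Your Own Code",
    "Stateless",
    "Never pay for idle",
    "Scales per request",
)


def score(satisfied):
    """Return an 'n/8' string for the criteria an architecture meets."""
    unknown = set(satisfied) - set(CRITERIA)
    if unknown:
        raise ValueError(f"not a scorecard criterion: {sorted(unknown)}")
    return f"{len(set(satisfied))}/{len(CRITERIA)}"


# Hypothetical example: which items pass is made up for illustration.
example = {"Metrics", "Bring Your Own Code", "Never pay for idle"}
print(score(example))
```

The scores quoted for each idea in this deck (4/8, 5/8, and so on) are the speaker's own assessments; the transcript does not record which individual items were ticked.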

IDEA 01: Lambda Orchestrating Lambdas
The most naive approach is a distributed monolith: a single endpoint receives a set of FB tokens and maps to a Lambda function that invokes one workflow per user (each made of many Lambdas).

Pains:
• The Orchestrator Lambda has to wait until the last workflow has been completed
• Node.js works better than other languages due to its asynchronous programming model

[Diagram: an Orchestrator invokes one Job Orchestrator per user; each Job Orchestrator runs DownloadPosts, DownloadImages, and ProcessImages]
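The first pain can be made concrete with a small simulation. This is a minimal sketch, not the speaker's code: threads stand in for Lambda invocations, and every function body is a hypothetical placeholder that only returns a string.

```python
from concurrent.futures import ThreadPoolExecutor


def download_posts(token):
    return f"posts:{token}"


def download_images(token):
    return f"images:{token}"


def process_images(images):
    return f"processed({images})"


def job_orchestrator(token):
    """Per-user workflow: runs the three steps for one token."""
    posts = download_posts(token)
    images = download_images(token)
    return {"token": token, "posts": posts, "images": process_images(images)}


def orchestrator(tokens):
    """Fan out one workflow per token, then block until all have finished."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # list() forces the orchestrator to wait for every workflow,
        # which is the first pain point of this idea.
        return list(pool.map(job_orchestrator, tokens))


results = orchestrator(["tokA", "tokB"])
print(results)
```

Because the orchestrator holds the whole fan-out in one call, it keeps running (and being billed) for as long as the slowest per-user workflow takes.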

Serverless checklist
• Function as the unit of deployment and scaling
• Stateless
• Implicitly fault-tolerant
• Metrics
• No machines, VMs, or containers
• Never pay for idle
• Scales per request
• Bring Your Own Code
Score: 4/8

IDEA 02: Dynamo Orchestration
Use a DynamoDB table to decouple jobs based on the FB token. Lambdas are invoked asynchronously through DynamoDB Streams. Workflow completion is recorded as an update on the table record.

Pains:
• The Orchestrator Lambda has to wait until the last workflow has been completed (checked on Dynamo)
• Each Lambda still has to wait for the other Lambdas to finish

[Diagram: Orchestrator, Job Orchestrator, DownloadPosts, DownloadImages, ProcessImages]
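The table-plus-stream pattern can be sketched without AWS at all. The `Table` class below is a hypothetical in-memory stand-in, not DynamoDB: it delivers change records synchronously, whereas a real stream is asynchronous. The `INSERT` and `MODIFY` labels mirror the event names used by DynamoDB Streams records.

```python
class Table:
    """In-memory stand-in for a table with a change stream (not DynamoDB)."""

    def __init__(self):
        self.items = {}
        self.listeners = []

    def put(self, key, item):
        event = "MODIFY" if key in self.items else "INSERT"
        self.items[key] = dict(item)
        # Deliver the change record to every subscribed function.
        for listener in self.listeners:
            listener(event, key, dict(item))


table = Table()


def download_job(event, token, item):
    """Triggered by the stream: runs on new records only."""
    if event != "INSERT":
        return
    # Workflow completion is recorded as an update on the same record.
    table.put(token, {**item, "status": "DONE"})


table.listeners.append(download_job)


def orchestrator(tokens):
    for token in tokens:
        table.put(token, {"status": "PENDING"})
    # The orchestrator still has to check the table for completion.
    return all(table.items[t]["status"] == "DONE" for t in tokens)


print(orchestrator(["tokA", "tokB"]))
```

Filtering on `INSERT` keeps the job from re-triggering itself when it writes its own completion update back to the table.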

Serverless checklist
• Function as the unit of deployment and scaling
• Stateless
• Implicitly fault-tolerant
• Metrics
• No machines, VMs, or containers
• Never pay for idle
• Scales per request
• Bring Your Own Code
Score: 5/8

IDEA 03: Kinesis Orchestration
Use Kinesis to post jobs once the Orchestrator has processed the request, and a DynamoDB table to check completion of a task for a specific user. Dynamo and Kinesis keep the state of the system.

Pains:
• The Orchestrator Lambda has to wait until the last workflow has been completed (checked on Dynamo)
• There is no way to tell what state the system is currently in

[Diagram: Orchestrator, DownloadPosts, DownloadImages, ProcessImages]

IDEA 04: Kinesis + Step Functions
Improve system state management by using Step Functions to trigger the different Lambdas, delegating to Kinesis only the batch execution within a single task. Implement an End-of-Task event within Kinesis to signal that processing is completed.

Pains:
• The Orchestrator Lambda has to wait until the last workflow has been completed (checked on Dynamo)
• Kinesis parallelism is through shards and the partition key (i.e. userId)

[Diagram: Orchestrator, DownloadPosts, DownloadImages, ProcessImages]
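The second pain follows from how records are routed: Kinesis Data Streams takes an MD5 hash of the partition key and assigns the record to the shard whose hash key range contains that 128-bit value. The sketch below reproduces that mapping under one simplifying assumption, namely that the shards split the hash space into equal ranges; the `shard_for` helper and the user ids are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for(partition_key, shard_count):
    """Map a partition key to a shard index, assuming equal hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE


# Hypothetical user ids: records with the same key always land in the
# same shard, so parallelism is bounded by the number of shards.
for user_id in ("user-1", "user-2", "user-3"):
    print(user_id, "-> shard", shard_for(user_id, 4))
```

With `userId` as the partition key, all work for one user stays in one shard, which keeps per-user ordering but prevents a single user's batch from being spread across shards.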

Serverless checklist
• Function as the unit of deployment and scaling
• Implicitly fault-tolerant
• Metrics
• No machines, VMs, or containers
• Bring Your Own Code
• Never pay for idle
• Scales per request
• Stateless
Score: 6/8

IDEA 05: S3 Event Sourcing
Implement the architecture using S3 as the Event Store. Write a file for each computation step and let the resulting event stream trigger Lambda invocations in parallel. Use Step Functions to check processing status globally (configure polling through wait/check states).

Pains:
• The client needs to use the AWS SDK and conform to the S3 data model, or a Lambda has to proxy it
• No event failover if one file is lost
• No event replay from the Event Store

[Diagram nodes: Write file with token, SplitPostsJobs, DownloadPosts [ … ], Check/Wait files on bucket, SplitImagesJob(s), DownloadImage(s) [ … ], ProcessImage(s) [ … ], WriteEndFile]
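The file-per-step idea can be sketched with a dictionary in place of a bucket. Everything below is a hypothetical in-memory stand-in, not the S3 API: the `Bucket` class, the key layout (`requests/`, `jobs/posts/`, `done/posts/`), and the handler bodies are assumptions made for illustration.

```python
class Bucket:
    """In-memory stand-in for a bucket with object-created notifications."""

    def __init__(self):
        self.objects = {}
        self.triggers = []  # (key prefix, handler) pairs

    def put(self, key, body):
        self.objects[key] = body
        for prefix, handler in self.triggers:
            if key.startswith(prefix):
                handler(key, body)


bucket = Bucket()


def split_posts_jobs(key, token):
    # One file per computation step; the key layout is hypothetical.
    bucket.put(f"jobs/posts/{token}", token)


def download_posts(key, token):
    bucket.put(f"done/posts/{token}", f"posts for {token}")


bucket.triggers.append(("requests/", split_posts_jobs))
bucket.triggers.append(("jobs/posts/", download_posts))


def is_complete(tokens):
    """Check/wait step: polls the bucket for the expected result files."""
    return all(f"done/posts/{t}" in bucket.objects for t in tokens)


tokens = ["tokA", "tokB"]
for t in tokens:
    bucket.put(f"requests/{t}", t)
print(is_complete(tokens))
```

The sketch also shows where the listed pains come from: if a `jobs/posts/` file were lost, nothing would re-create it, and the stored files cannot be re-delivered to the handlers as events.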

Serverless checklist
• Function as the unit of deployment and scaling
• Implicitly fault-tolerant
• Metrics
• No machines, VMs, or containers
• Bring Your Own Code
• Stateless
• Never pay for idle
• Scales per request
Score: 8/8

Some pains still remain...
• The client needs to use the AWS SDK and conform to the S3 data model, or a Lambda has to proxy it
• No event failover if one file is lost
• No event replay from the Event Store

Is a better solution possible?

Introducing Internet of Functions (IoF)
• IoT topics subscribed by Lambda: functions are notified when a new event is available (either a job request or a response)
• IoT topics can be created on the fly: no CloudFormation (CF) setup is needed, just IoT active in the AWS account
• Functions can subscribe to any topic: through IoT Rules, any Lambda can be triggered by a new event
• Firehose can persist events into S3: event replay from the Event Store (S3) and failover can be achieved (and it's Serverless!)
• Async is implicitly supported
• REST endpoints decoupling: a Lambda connected to API Gateway (APIG) generates a requestID and returns from the HTTP request with a token to poll for completion
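The pieces of this idea (topics, rules that trigger functions, a persisted event log, and a request id to poll) fit in a short simulation. This is a sketch under stated assumptions, not AWS IoT code: the `Broker` class is a hypothetical in-memory stand-in, its `log` list plays the role of events persisted to S3, and the handler bodies are placeholders. The topic names `/request/` and `/status/` are taken from the IDEA 06 diagram.

```python
import uuid
from collections import defaultdict


class Broker:
    """In-memory stand-in for topics, rules, and a persisted event log."""

    def __init__(self):
        self.rules = defaultdict(list)  # topic -> subscribed functions
        self.log = []                   # stand-in for events persisted to S3

    def subscribe(self, topic, function):
        self.rules[topic].append(function)

    def publish(self, topic, event):
        self.log.append((topic, event))
        for function in self.rules[topic]:
            function(event)

    def replay(self, topic, function):
        """Feed stored events for one topic to a function to rebuild state."""
        for logged_topic, event in list(self.log):
            if logged_topic == topic:
                function(event)


broker = Broker()
status = {}  # request id -> state, polled by the client


def handle_request(event):
    status[event["requestId"]] = "RUNNING"
    broker.publish("/status/", {**event, "state": "DONE"})


def handle_status(event):
    status[event["requestId"]] = event["state"]


broker.subscribe("/request/", handle_request)
broker.subscribe("/status/", handle_status)


def rest_endpoint(token):
    """Returns at once with an id the caller can poll for completion."""
    request_id = str(uuid.uuid4())
    broker.publish("/request/", {"requestId": request_id, "token": token})
    return request_id


request_id = rest_endpoint("tokA")
print(status[request_id])
```

Because every event is appended to the log before it is dispatched, a lost or newly added consumer can call `replay` on a topic to rebuild its state, which is what the S3 file approach could not offer.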

IDEA 06: Internet of Functions
[Diagram topics: /request/, /status/, /downloadImage/, /processImage/, [ … ]]
[Diagram nodes: Generate Request and Publish, SplitPostsJobs, DownloadPosts [ … ], SplitImagesJob(s), DownloadImage(s) [ … ], ProcessImage(s) [ … ], Check/Wait files on bucket, PublishEnd, Return update on status, Kinesis Firehose]

Serverless checklist
• Function as the unit of deployment and scaling
• Implicitly fault-tolerant (improved by allowing event replay from the Event Store)
• Metrics
• No machines, VMs, or containers
• Bring Your Own Code
• Stateless
• Never pay for idle
• Scales per request
• REST compatible
Score: 9/8

We did it!!

One more thing…
starter-serverless-nodejs: http://bit.ly/sls-starter

slides @ http://bit.ly/sls-20171122
