Slide 1

Multi-cloud multi-tenant serverless crawlers
Ivan Čuljak @CuljakIvan / [email protected]

Slide 2

A bit about me
• Ivan Čuljak
• Shouting at clouds since 2012???
• Salvaging wrecks/putting out fires for a living
• Freelancer & Azure MVP
• Cloud Solutions Architect & owner @ Celeste Maze
• Founder @ Cloud Bootstrap

Slide 3

you might be asking yourself
But why???

Slide 4

it all started with
Hold my beer

Slide 5

First there was a desktop app

Slide 6

Then we moved to a VM

Slide 7

Then we added a proxy

Slide 8

And then we added
• more proxies
• more clients/users
• more VMs
• more points of failure
• more hours to keep it running
• more hours to scale it

Slide 9

to be completely honest
It was "good enough"

Slide 10

But we were bored <3

Slide 11

Sooo... say hello to
• additional queues
• functions
• containers
• infrastructure as code
• multiple clouds
• dare we say Kubernetes?

Slide 12

It worked out fine
so here's the story...

Slide 13

First things first

Slide 14

First thing to do was:
• rewrite the desktop app into a web app
• host the web app somewhere in the cloud
• create a service out of the "engine"
• "host" the service on those same VMs
• connect those two through a queue in the cloud
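The web-app-to-engine hand-off above can be sketched with a local stand-in for the cloud queue. This is a minimal illustration, not the project's actual code: `cloud_queue`, `web_app_enqueue`, and `engine_worker` are invented names, and `queue.Queue` stands in for whatever managed queue service was used.

```python
import json
import queue
import threading

# Local stand-in for the cloud queue between the web app and the engine.
cloud_queue = queue.Queue()
results = []

def web_app_enqueue(url):
    """The web app serializes a crawl request and drops it on the queue."""
    cloud_queue.put(json.dumps({"url": url}))

def engine_worker():
    """The 'engine' service pulls requests off the queue and processes them."""
    while True:
        request = json.loads(cloud_queue.get())
        if request.get("stop"):
            break
        # a real engine would fetch the page (through a proxy) here
        results.append(request["url"])

worker = threading.Thread(target=engine_worker)
worker.start()
web_app_enqueue("https://example.com")
cloud_queue.put(json.dumps({"stop": True}))
worker.join()
```

The point of the queue is exactly this decoupling: the web app never calls the engine directly, so either side can be moved, scaled, or restarted on its own.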

Slide 15

N.B. Functions can be hosted both "serverless" and as a service somewhere; most of the problems listed can be solved more or less easily

Slide 16

Rewriting the engine into functions will turn out great, mhmmm:
• a function can run for at most 10 minutes
• func => func calls going through queues
• retaining state between executions
• hard to catch and react to an exception
• processing of a ticket could "spill" to another instance
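One common way around the execution-time cap and the state-retention problem on the list is to split a long crawl into short chunks and carry the state inside the queue message itself. A hedged sketch (the chunking scheme and names are invented for illustration, not taken from the talk):

```python
import json

def process_chunk(message, enqueue):
    """One short function execution: handle one page, re-enqueue the rest.

    State survives between executions because it travels inside the queue
    message, not inside the function instance (which may be recycled, or
    the work may "spill" to another instance).
    """
    state = json.loads(message)
    if not state["pending"]:
        return  # crawl finished, nothing to re-enqueue
    page = state["pending"].pop(0)
    state["done"].append(page)  # a real engine would fetch/parse here
    enqueue(json.dumps(state))

# Drive it with an in-memory queue to show the state hand-off:
history = []
messages = [json.dumps({"pending": ["a", "b"], "done": []})]
while messages:
    message = messages.pop(0)
    history.append(json.loads(message))
    process_chunk(message, messages.append)
```

Each invocation stays well under any time limit, and a crash only loses the one chunk in flight, since the rest of the crawl state is still sitting in the queue.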

Slide 17

Small step for the project
Huge waste of time for the team

Slide 18

if you can't rewrite it
You can wrap it into a container

Slide 19

so we ended up with
The exact same thing as before :)

Slide 20

Evolution of "app hosting" [1/2]
• a shared server
• a server
• a farm of servers
• VMs on a server
• VMs on a farm of servers

Slide 21

Evolution of "app hosting" [2/2]
• VMs running containers
• a container orchestrator running on top of VMs
• let's just forget it's all there and do our dev job...

Slide 22

What serverless is NOT
• an execution model
• a managed service
• an operational construct
• a spectrum
• a technology
Big tnx to Jeremy Daly for this <3

Slide 23

Serverless IS a really bad name

Slide 24

Serverless IS a methodology
Big tnx to Jeremy Daly for this <3

Slide 25

Serverless IS a journey and not always a destination

Slide 26

A journey that requires a shift in
• your business
• your architecture
• your planning
• your budgeting
• your developers
#allMessedUp

Slide 27

Current state of "serverless"

Slide 28

Back to the drawing board

Slide 29

We listed out some "issues"
• paying for the infrastructure 24/7
• depending on proxies
• some proxies were really slow
• performance bottlenecks with our infrastructure
• not automatically scalable

Slide 30

but the problems were interconnected
So we needed to solve the most pressing "issue"

Slide 31

in order to scale automatically
We needed IaC & we needed to trigger it from code

Slide 32

step 1 => for each account created
We provision a DB, a web app, a queue & a container
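The essence of step 1 is that the IaC templates are parameterized by tenant, so every new account gets its own isolated set of resources. A hypothetical sketch of that naming/parameterization (the naming convention and `tenant_resources` helper are invented; the real templates are not shown in the talk):

```python
def tenant_resources(account_id):
    """Resources to provision for one new account: one set per tenant.

    Illustrative naming convention only: parameterizing resource names by
    account id is what lets the same IaC template be applied per tenant.
    """
    return [
        f"db-{account_id}",
        f"webapp-{account_id}",
        f"queue-{account_id}",
        f"container-{account_id}",
    ]
```

Triggering this from code on account creation is what removes the manual "more hours to scale it" work from the old VM setup.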

Slide 33

We listed out some "issues"
• paying for the infrastructure 24/7
• depending on proxies
• some proxies were really slow
• performance bottlenecks with our infrastructure
• not automatically scalable

Slide 34

step 2 => for each generated request
We spin up a container if there's none

Slide 35

step 3 => for each generated request
Add a scheduled message to a queue to kill the container
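Step 3 relies on scheduled (delayed) message delivery, which most managed queues support. A minimal sketch of the idea using a priority queue ordered by due time as a stand-in for the scheduler (names and the TTL value are invented for illustration):

```python
import heapq
from datetime import datetime, timedelta, timezone

# Delayed "kill" messages, ordered by due time like a scheduled queue.
scheduled = []

def schedule_kill(container_id, idle_ttl_minutes):
    """Schedule a 'kill this container' message idle_ttl_minutes from now."""
    due = datetime.now(timezone.utc) + timedelta(minutes=idle_ttl_minutes)
    heapq.heappush(scheduled, (due, f"kill:{container_id}"))
    return due

# Every request schedules a kill; the earliest-due message pops first,
# just like a scheduled queue delivering the oldest timer.
first = schedule_kill("c1", 5)
second = schedule_kill("c1", 10)
due, payload = heapq.heappop(scheduled)
```

This is what stops the "paying for the infrastructure 24/7" problem: a container that stops receiving requests is killed shortly after its last scheduled message comes due. Presumably the consumer of the kill message also checks whether the container did any work in the meantime before actually tearing it down; the slides don't spell that out.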

Slide 36

We listed out some "issues"
• paying for the infrastructure 24/7
• depending on proxies
• some proxies were really slow
• performance bottlenecks with our infrastructure
• not automatically scalable

Slide 37

step 4 => for each generated request
if (tooManyParallelRequests) spinUpContainer()
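The step-4 condition as runnable code. The deck only gives the one-line if-statement, so the threshold, the counters, and the capacity-per-container figure here are all invented for illustration:

```python
def needs_new_container(in_flight_requests, containers, max_per_container=20):
    """True when the current containers can't absorb the parallel load.

    max_per_container is a made-up capacity figure; the real threshold
    would come from measuring what one container can handle.
    """
    if containers == 0:
        return True  # step 2: no container yet, always spin one up
    return in_flight_requests > containers * max_per_container
```

Checked on every generated request, this gives scale-out driven purely by load, with scale-in handled separately by the scheduled kill messages from step 3.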

Slide 38

We listed out some "issues"
• paying for the infrastructure 24/7
• depending on proxies
• some proxies were really slow
• performance bottlenecks with our infrastructure
• not automatically scalable

Slide 39

yay...
"Only" proxies left on the list

Slide 40

why would this be hard?
We can spin up containers across the globe

Slide 41

the engine was figuring out
Which proxy to use

Slide 42

but now, now we have a container
Figuring out which request to process

Slide 43

not only that, but we might want
A new request to be processed with the same IP address

Slide 44

there were a lot of over-engineering ideas
But we needed KISS

Slide 45

Here's the workflow [1/2]:
• the "conductor" checks if there's the right container available
• if not, we generate a GUID and a container
• in addition to connection strings, that GUID is passed to the container

Slide 46

Here's the workflow [2/2]:
• the container subscribes to a topic with that GUID as a filter
• the container sends a message to the "conductor" saying "ready: GUID"
• the "conductor" sends requests as messages down the "queue" (topic)
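The two workflow slides above can be sketched end to end. This is a hedged in-memory simulation: `topic` (a dict keyed by GUID filter) stands in for a managed pub/sub topic with subscription filters, and the `Conductor`/`Container` class names are invented, not the project's real code:

```python
import uuid
from collections import defaultdict

# In-memory stand-in for a pub/sub topic with per-subscription filters.
topic = defaultdict(list)  # filter (GUID) -> messages delivered to it

class Container:
    def __init__(self, guid):
        # The container subscribes to the topic using the GUID as its filter.
        self.guid = guid

    def inbox(self):
        return topic[self.guid]

class Conductor:
    def __init__(self):
        self.session_guid = {}  # session -> GUID of its "ready" container
        self.containers = {}

    def send_request(self, session, request):
        """Route a request to the right container, spinning one up if needed."""
        if session not in self.session_guid:
            guid = str(uuid.uuid4())
            # spin up a container, passing connection strings + the GUID...
            self.containers[guid] = Container(guid)
            # ...and record it once its "ready: GUID" message arrives
            self.session_guid[session] = guid
        topic[self.session_guid[session]].append(request)

conductor = Conductor()
conductor.send_request("client-a", "crawl https://example.com")
conductor.send_request("client-a", "crawl https://example.org")
```

Because both requests for `client-a` map to the same GUID, they land on the same container, which is exactly how a follow-up request can be processed from the same IP address without any extra routing machinery.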

Slide 47

We listed out some "issues"
• paying for the infrastructure 24/7
• depending on proxies
• some proxies were really slow
• performance bottlenecks with our infrastructure
• not automatically scalable

Slide 48

believe it or not
It's as simple as that

Slide 49

and we can run our stuff on
Azure, AWS, GCP, Digital Ocean, Linode... even Kubernetes

Slide 50

There's plenty left to say... ...but we're out of time :(

Slide 51

Thank you <3
Any questions?
[email protected] / @CuljakIvan