Multi-cloud multi-tenant serverless crawlers

Ivan Čuljak
December 04, 2019

Transcript

  1. Multi-cloud multi-
    tenant serverless
    crawlers
    Ivan Čuljak
    @CuljakIvan / [email protected]

  2. A bit about me
    • Ivan Čuljak
    • Shouting at clouds since 2012???
    • Salvaging wrecks/putting out fires for a living
    • Freelancer & Azure MVP
    • Cloud Solutions Architect & owner @ Celeste Maze
    • Founder @ Cloud Bootstrap

  3. you might be asking yourself
    But why???

  4. it all started with
    Hold my beer

  5. First there was a desktop app

  6. Then we moved to a VM

  7. Then we added a proxy

  8. And then we added
    • more proxies
    • more clients/users
    • more VMs
    • more points of failure
    • more hours to keep it running
    • more hours to scale it

  9. to be completely honest
    It was "good enough"

  10. But we were bored <3

  11. Sooo... say hello to
    • additional queues
    • functions
    • containers
    • infrastructure as code
    • multiple clouds
    • dare we say Kubernetes?

  12. It worked out fine
    so here's the story...

  13. First things first

  14. First thing to do was:
    • rewrite the desktop app into a web app
    • host the web app somewhere in the cloud
    • create a service out of the "engine"
    • "host" the service on those same VMs
    • connect those two through a queue in the cloud
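    The last step above, connecting the web app and the engine only through a cloud queue, can be sketched roughly like this. A minimal, hypothetical sketch: an in-memory deque stands in for the real queue service, and the function names are illustrative, not the project's actual API.

    ```python
    from collections import deque

    # In-memory stand-in for the cloud queue that decouples
    # the web app from the "engine" service on the VMs.
    queue = deque()

    def web_app_submit(job: dict) -> None:
        # The web app only enqueues work; it never talks to the engine directly.
        queue.append(job)

    def engine_poll() -> dict | None:
        # The engine service pulls jobs off the queue at its own pace.
        return queue.popleft() if queue else None

    web_app_submit({"ticket": 1, "url": "https://example.com"})
    job = engine_poll()
    ```

    The point of the indirection is that either side can be restarted, moved, or scaled without the other noticing.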

  15. N.B. Functions can be hosted "serverless"
    or as a service somewhere; most of the problems
    listed below can be solved more or less easily

  16. Rewriting the engine into functions will
    turn out great, mhmmm:
    • a function can run for 10 minutes
    • func => func calls going through queues
    • retaining state between executions
    • hard to catch and react to an exception
    • processing of a ticket could "spill" to another instance
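    One common workaround for the execution-time limit and the state problem above is to checkpoint: each execution processes a batch and re-enqueues the leftover state as a new message. A minimal sketch under assumed names (the talk does not specify the actual mechanism); a plain list simulates the queue, and each loop iteration stands in for one function execution.

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class TicketState:
        # State carried *in the queue message*, not in function memory,
        # so it survives the 10-minute limit and "spilling" to another instance.
        ticket_id: str
        pages_left: list = field(default_factory=list)
        results: list = field(default_factory=list)

    def run_once(state: TicketState, batch_size: int = 2) -> TicketState:
        """One function execution: crawl a small batch, return the updated state."""
        batch = state.pages_left[:batch_size]
        state.pages_left = state.pages_left[batch_size:]
        state.results.extend(f"crawled:{page}" for page in batch)
        return state

    # Simulated func => func calls going through a queue.
    queue = [TicketState("t1", pages_left=["a", "b", "c", "d", "e"])]
    finished = []
    while queue:
        state = run_once(queue.pop(0))
        if state.pages_left:
            queue.append(state)   # re-enqueue leftover work as a new message
        else:
            finished.append(state)
    ```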

  17. Small step for the
    project
    Huge waste of time for the team

  18. if you can't rewrite it
    You can wrap it into a container

  19. so we ended up with
    The exact same thing as before :)

  20. Evolution of "app hosting" [1/2]
    • a shared server
    • a server
    • a farm of servers
    • VMs on a server
    • VMs on a farm of servers

  21. Evolution of "app hosting" [2/2]
    • VMs running containers
    • Containers orchestrator running on top of VMs
    • Let's just forget it's all there and do our dev job...

  22. What serverless is NOT
    • Execution model
    • Managed service
    • Operational construct
    • Spectrum
    • Technology
    Big tnx to Jeremy Daly for this <3

  23. Serverless IS a
    really bad name

  24. Serverless IS a
    methodology
    Big tnx to Jeremy Daly for this <3

  25. Serverless IS a
    journey
    and not always a destination

  26. A journey that requires a shift in business
    • your architecture
    • your planning
    • your budgeting
    • your developers
    #allMessedUp

  27. Current state of "serverless"

  28. Back to the drawing board

  29. We listed out some "issues"
    • paying for the infrastructure 24/7
    • depending on proxies
    • some proxies were really slow
    • performance bottlenecks with our infrastructure
    • not automatically scalable

  30. but the problems were interconnected
    So we needed to solve the most
    pressing "issue"

  31. in order to scale automatically
    We needed IaC & we needed to
    trigger it from code

  32. step 1 => for each account created
    We provision a DB, a web app, a
    queue & a container
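    Step 1 can be pictured as building a per-tenant resource plan and handing it to the IaC tooling. A hypothetical sketch: the resource names and the `provision_plan` helper are invented for illustration; the real system would invoke its infrastructure-as-code templates from application code.

    ```python
    def provision_plan(account_id: str) -> dict:
        """Resources created for each new account (names are illustrative)."""
        prefix = f"tenant-{account_id}"
        return {
            "db":        f"{prefix}-db",
            "web_app":   f"{prefix}-web",
            "queue":     f"{prefix}-queue",
            "container": f"{prefix}-crawler",
        }

    # On account creation, this plan would be fed to the IaC deployment step.
    plan = provision_plan("42")
    ```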

  33. We listed out some "issues"
    • paying for the infrastructure 24/7
    • depending on proxies
    • some proxies were really slow
    • performance bottlenecks with our infrastructure
    • not automatically scalable

  34. step 2 => for each generated request
    We spin up a container if there's
    none

  35. step 3 => for each generated request
    Add a scheduled message to a
    queue to kill the container
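    The scheduled-kill idea in steps 2 and 3 can be sketched like this: every request pushes the container's kill deadline further out, so a container dies only after sitting idle. The class, TTL value, and method names are assumptions for illustration; a real system would schedule delayed messages on the queue service instead of keeping deadlines in memory.

    ```python
    from datetime import datetime, timedelta

    # Assumed idle TTL; the talk gives no actual number.
    KILL_AFTER = timedelta(minutes=10)

    class KillScheduler:
        def __init__(self):
            self.deadlines = {}   # container_id -> latest scheduled kill time

        def on_request(self, container_id: str, now: datetime) -> None:
            # Each new request reschedules the kill, extending the container's life.
            self.deadlines[container_id] = now + KILL_AFTER

        def due(self, now: datetime) -> list:
            """Containers whose latest kill message has fired with no newer work."""
            expired = [c for c, t in self.deadlines.items() if t <= now]
            for c in expired:
                del self.deadlines[c]
            return expired

    s = KillScheduler()
    t0 = datetime(2019, 12, 4, 12, 0)
    s.on_request("c1", t0)
    s.on_request("c1", t0 + timedelta(minutes=5))  # new work pushes the deadline out
    ```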

  36. We listed out some "issues"
    • paying for the infrastructure 24/7
    • depending on proxies
    • some proxies were really slow
    • performance bottlenecks with our infrastructure
    • not automatically scalable

  37. step 4 => for each generated request
    if (tooManyParallelRequests)
    spinUpContainer()
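    The check above amounts to ceiling division: how many containers does the current request volume need, versus how many are running? A sketch with an assumed per-container limit (the talk states no actual threshold):

    ```python
    # Assumed capacity; the real threshold is not given in the talk.
    MAX_PARALLEL_PER_CONTAINER = 5

    def containers_needed(parallel_requests: int) -> int:
        # Ceiling division, with at least one container kept around.
        return max(1, -(-parallel_requests // MAX_PARALLEL_PER_CONTAINER))

    def extra_containers(parallel_requests: int, running: int) -> int:
        """How many additional containers to spin up (0 if capacity suffices)."""
        return max(0, containers_needed(parallel_requests) - running)
    ```

    So with 12 parallel requests and one container running, two more get spun up; with 3 requests, none.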

  38. We listed out some "issues"
    • paying for the infrastructure 24/7
    • depending on proxies
    • some proxies were really slow
    • performance bottlenecks with our infrastructure
    • not automatically scalable

  39. yay...
    "Only" proxies left on the list

  40. why would this be hard?
    We can spin up containers across
    the globe

  41. the engine was figuring out
    Which proxy to use

  42. but now, now we have a container
    Figuring out which request to
    process

  43. not only that, but we might want
    A new request to be processed
    with the same IP address

  44. there were a lot of ideas to over-engineer it
    But we needed KISS

  45. Here's the workflow [1/2]:
    • "conductor" checks whether the right container is available
    • if not, we generate a GUID and a container
    • in addition to connection strings, that GUID is passed to the
    container

  46. Here's the workflow [2/2]:
    • container subscribes to a topic with that GUID as a filter
    • container sends a message to "conductor" saying "ready:
    GUID"
    • "conductor" sends requests as messages down the
    "queue" (topic)
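    The whole conductor/container handshake above can be simulated in a few lines. A hedged sketch: a dict stands in for the service-bus topic, and the `Topic`/`Conductor` classes are invented names; the real system uses a topic subscription with the GUID as the filter.

    ```python
    import uuid

    class Topic:
        """In-memory stand-in for a pub/sub topic with per-GUID filters."""
        def __init__(self):
            self.subscriptions = {}   # guid -> messages delivered to that container

        def subscribe(self, guid: str) -> None:
            self.subscriptions[guid] = []

        def publish(self, guid: str, message: str) -> None:
            # Only the container whose subscription filter matches the GUID gets it.
            self.subscriptions.setdefault(guid, []).append(message)

    class Conductor:
        def __init__(self, topic: Topic):
            self.topic = topic
            self.ready = set()

        def spin_up_container(self) -> str:
            guid = str(uuid.uuid4())     # identity passed to the new container
            self.topic.subscribe(guid)   # container subscribes with GUID as filter
            self.ready.add(guid)         # container reports back "ready: GUID"
            return guid

        def send_request(self, guid: str, request: str) -> None:
            assert guid in self.ready, "container not ready yet"
            self.topic.publish(guid, request)

    topic = Topic()
    conductor = Conductor(topic)
    guid = conductor.spin_up_container()
    conductor.send_request(guid, "GET https://example.com")
    ```

    Routing by GUID is also what makes sticky sessions possible: sending a follow-up request to the same GUID reuses the same container, and therefore the same outbound IP address.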

  47. We listed out some "issues"
    • paying for the infrastructure 24/7
    • depending on proxies
    • some proxies were really slow
    • performance bottlenecks with our infrastructure
    • not automatically scalable

  48. believe it or not
    It's as simple as that

  49. and we can run our stuff on
    Azure, AWS, GCP, Digital Ocean,
    Linode... even Kubernetes

  50. There's plenty left to say...
    ...but we're out of time :(

  51. Thank you <3
    Any questions?
    [email protected] / @CuljakIvan
