Multi-cloud multi-tenant serverless crawlers

Ivan Čuljak

December 04, 2019

Transcript

  1. Multi-cloud multi-tenant serverless crawlers Ivan Čuljak @CuljakIvan / ivan@culjak.xyz

  2. A bit about me • Ivan Čuljak • Shouting at clouds since 2012??? • Salvaging wrecks/putting out fires for a living • Freelancer & Azure MVP • Cloud Solutions Architect & owner @ Celeste Maze • Founder @ Cloud Bootstrap

  3. you might be asking yourself But why???

  4. it all started with Hold my beer

  5. First there was a desktop app

  6. Then we moved to a VM

  7. Then we added a proxy

  8. And then we added • more proxies • more clients/users • more VMs • more points of failure • more hours to keep it running • more hours to scale it

  9. to be completely honest It was "good enough"

  10. But we were bored <3

  11. Sooo... say hello to • additional queues • functions • containers • infrastructure as code • multiple clouds • dare we say Kubernetes?

  12. It worked out fine so here's the story...

  13. First things first

  14. First thing to do was: • rewrite the desktop app into a web app • host the web app somewhere in the cloud • create a service out of the "engine" • "host" the service on those same VMs • connect those two through a queue in the cloud

  15. N.B. Functions can be hosted as "serverless" and as a service somewhere most of the problems listed can be solved more or less easily

  16. Rewriting the engine into functions will turn out great, mhmmm: • a function can run for 10 minutes • func => func calls going through queues • retaining state between executions • hard to catch and react to an exception • processing of a ticket could "spill" to another instance

  17. Small step for the project Huge waste of time for the team

  18. if you can't rewrite it You can wrap it into a container

  19. so we ended up with The exact same thing as before :)

  20. Evolution of "app hosting" [1/2] • a shared server • a server • a farm of servers • VMs on a server • VMs on a farm of servers

  21. Evolution of "app hosting" [2/2] • VMs running containers • Containers orchestrator running on top of VMs • Let's just forget it's all there and do our dev job...

  22. What serverless is NOT • Execution model • Managed service • Operational construct • Spectrum • Technology Big tnx to Jeremy Daly for this <3

  23. Serverless IS a really bad name

  24. Serverless IS a methodology Big tnx to Jeremy Daly for this <3

  25. Serverless IS a journey and not always a destination

  26. A journey that requires a shift in business • your architecture • your planning • your budgeting • your developers #allMessedUp

  27. Current state of "serverless"

  28. Back to the drawing board

  29. We listed out some "issues" • paying for the infrastructure 24/7 • depending on proxies • some proxies were really slow • performance bottlenecks with our infrastructure • not automatically scalable

  30. but the problems were interconnected So we needed to solve the most pressing "issue"

  31. in order to scale automatically We needed IaC & we needed to trigger it from code

  32. step 1 => for each account created We provision a DB, a web app, a queue & a container

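
As a rough illustration of step 1, the per-account provisioning could be expressed as a small declarative plan that IaC tooling would then apply. This is only a sketch: `Resource`, `plan_for_account`, and the naming scheme are hypothetical, and no real cloud API is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    kind: str  # "db", "webapp", "queue" or "container"
    name: str  # hypothetical naming scheme: "<account-slug>-<kind>"


def plan_for_account(account_id):
    """Return the resources to provision for one newly created account."""
    slug = account_id.strip().lower().replace(" ", "-")
    kinds = ("db", "webapp", "queue", "container")
    return [Resource(kind, f"{slug}-{kind}") for kind in kinds]


if __name__ == "__main__":
    # Print the plan for a made-up account.
    for resource in plan_for_account("Acme Corp"):
        print(resource.kind, resource.name)
```

Keeping the plan as plain data makes it easy to hand the same description to whichever provider the account ends up on.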
  33. We listed out some "issues" • paying for the infrastructure 24/7 • depending on proxies • some proxies were really slow • performance bottlenecks with our infrastructure • not automatically scalable

  34. step 2 => for each generated request We spin up a container if there's none

  35. step 3 => for each generated request Add a scheduled message to a queue to kill the container

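
Steps 2 and 3 can be sketched with an in-memory stand-in for the queue. Everything here is an assumption made for illustration: the `Conductor` class, the 300-second idle timeout, and the rule that a kill message is ignored when a newer request has arrived since it was scheduled. The slides do not say how stale kill messages were handled.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class Conductor:
    idle_timeout: float = 300.0  # hypothetical idle period in seconds
    containers: dict = field(default_factory=dict)  # account -> time of last request
    kill_queue: list = field(default_factory=list)  # heap of (due, seq, account)
    _seq: itertools.count = field(default_factory=itertools.count)

    def on_request(self, account, now):
        """Step 2: start a container if none exists. Step 3: schedule a kill."""
        started = account not in self.containers
        self.containers[account] = now
        due = now + self.idle_timeout
        heapq.heappush(self.kill_queue, (due, next(self._seq), account))
        return started

    def process_kill_messages(self, now):
        """Deliver every kill message that is due; return the accounts killed."""
        killed = []
        while self.kill_queue and self.kill_queue[0][0] <= now:
            due, _, account = heapq.heappop(self.kill_queue)
            last = self.containers.get(account)
            # Assumption: skip the kill if a newer request arrived after
            # this message was scheduled.
            if last is not None and last + self.idle_timeout <= due:
                del self.containers[account]
                killed.append(account)
        return killed
```

With this rule a busy account keeps its container alive, and the container disappears one idle period after the last request.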
  36. We listed out some "issues" • paying for the infrastructure 24/7 • depending on proxies • some proxies were really slow • performance bottlenecks with our infrastructure • not automatically scalable

  37. step 4 => for each generated request If(tooManyParallelRequest) spinUpContainer()
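
The pseudocode on this slide could look roughly like the following in Python. The per-container limit of 5 and the shape of the `state` dictionary are hypothetical; the slides do not say how "too many" was measured.

```python
def too_many_parallel_requests(in_flight, containers, per_container_limit=5):
    """True when the running containers cannot absorb the in-flight requests."""
    return in_flight > containers * per_container_limit


def handle_request(state, per_container_limit=5):
    """Register one new request and scale out by one container if needed.

    `state` is a dict with "in_flight" and "containers" counters; a new
    dict is returned so the caller's copy is left untouched.
    """
    new_state = dict(state)
    new_state["in_flight"] += 1
    if too_many_parallel_requests(
        new_state["in_flight"], new_state["containers"], per_container_limit
    ):
        new_state["containers"] += 1  # stands in for spinUpContainer()
    return new_state
```

Starting from zero containers, the first request also triggers a spin-up, so the same check covers step 2.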

  38. We listed out some "issues" • paying for the infrastructure 24/7 • depending on proxies • some proxies were really slow • performance bottlenecks with our infrastructure • not automatically scalable

  39. yay... "Only" proxies left on the list

  40. why would this be hard? We can spin up containers across the globe

  41. the engine was figuring out Which proxy to use

  42. but now, now we're having a container Figuring out which request to process

  43. not only that, but we might want A new request to be processed with the same IP address

  44. there were a lot of over-engineering ideas But we needed KISS

  45. Here's the workflow [1/2]: • "conductor" checks if there's the right container available • if not, we generate a GUID and a container • in addition to connection strings, that GUID is passed to the container

  46. Here's the workflow [2/2]: • container subscribes to a topic with that GUID as a filter • container sends a message to "conductor" saying "ready: GUID" • "conductor" sends requests as messages down the "queue" (topic)

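
The two workflow slides can be sketched with an in-memory topic in place of a real message broker. The `Topic`, `Container`, and `Conductor` classes are hypothetical, and so is the idea of choosing the "right" container by a key such as a region; the slides do not specify the matching rule. Calls are synchronous here, whereas the real system would exchange these messages asynchronously.

```python
import uuid
from collections import defaultdict


class Topic:
    """In-memory stand-in for a topic with per-subscription GUID filters."""

    def __init__(self):
        self.subscriptions = defaultdict(list)  # guid -> list of handlers

    def subscribe(self, guid, handler):
        self.subscriptions[guid].append(handler)

    def publish(self, guid, message):
        # Only subscribers whose filter matches the GUID get the message.
        for handler in self.subscriptions.get(guid, []):
            handler(message)


class Container:
    def __init__(self, guid, topic, notify_ready):
        self.guid = guid
        self.processed = []
        topic.subscribe(guid, self.processed.append)  # GUID used as the filter
        notify_ready(guid)  # the "ready: GUID" message back to the conductor


class Conductor:
    def __init__(self, topic):
        self.topic = topic
        self.by_key = {}      # key (assumed: a region) -> container GUID
        self.ready = set()    # GUIDs that have reported ready
        self.containers = {}  # GUID -> Container, kept only for inspection

    def dispatch(self, key, request):
        guid = self.by_key.get(key)
        if guid is None:
            # No suitable container: generate a GUID and a container for it.
            guid = str(uuid.uuid4())
            self.by_key[key] = guid
            self.containers[guid] = Container(guid, self.topic, self.ready.add)
        if guid in self.ready:
            self.topic.publish(guid, request)
        return guid
```

Because every request for the same key is published under the same GUID, it reaches the same container, which is one way to keep follow-up requests on the same IP address.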
  47. We listed out some "issues" • paying for the infrastructure 24/7 • depending on proxies • some proxies were really slow • performance bottlenecks with our infrastructure • not automatically scalable

  48. believe it or not It's as simple as that

  49. and we can run our stuff on Azure, AWS, GCP, Digital Ocean, Linode... even Kubernetes

  50. There's plenty left to say... ...but we're out of time :(

  51. Thank you <3 Any questions? ivan@culjak.xyz / @CuljakIvan