
Multi-cloud multi-tenant serverless crawlers

Ivan Čuljak
December 04, 2019


Transcript

  1. A bit about me
     • Ivan Čuljak
     • Shouting at clouds since 2012???
     • Salvaging wrecks / putting out fires for a living
     • Freelancer & Azure MVP
     • Cloud Solutions Architect & owner @ Celeste Maze
     • Founder @ Cloud Bootstrap
  2. And then we added
     • more proxies
     • more clients/users
     • more VMs
     • more points of failure
     • more hours to keep it running
     • more hours to scale it
  3. Sooo... say hello to
     • additional queues
     • functions
     • containers
     • infrastructure as code
     • multiple clouds
     • dare we say Kubernetes?
  4. First thing to do was:
     • rewrite the desktop app into a web app
     • host the web app somewhere in the cloud
     • create a service out of the "engine"
     • "host" the service on those same VMs
     • connect those two through a queue in the cloud
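The deck shows no code, so here is a minimal Python sketch of that split, with a stdlib `queue.Queue` standing in for the cloud queue between the web app and the engine (all names are illustrative):

```python
import json
import queue

# Stand-in for the cloud queue that decouples the web app from the engine.
crawl_requests = queue.Queue()

def web_app_submit(url: str) -> None:
    """Web-app side: enqueue a crawl request instead of calling the engine directly."""
    crawl_requests.put(json.dumps({"url": url}))

def engine_worker() -> list:
    """Engine side: drain the queue and 'crawl' each request."""
    crawled = []
    while not crawl_requests.empty():
        msg = json.loads(crawl_requests.get())
        crawled.append(msg["url"])  # a real engine would fetch the page here
    return crawled

web_app_submit("https://example.com")
web_app_submit("https://example.org")
print(engine_worker())  # → ['https://example.com', 'https://example.org']
```

The point of the queue is that neither side needs to know where the other is hosted, which is what makes the later moves (containers, multiple clouds) possible.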
  5. N.B. Functions can be hosted as "serverless" and as a service;
     hosted somewhere as a service, most of the problems listed can be
     solved more or less easily
  6. Rewriting the engine into functions will turn out great, mhmmm:
     • a function can run for 10 minutes
     • func => func calls going through queues
     • retaining state between executions
     • hard to catch and react to an exception
     • processing of a ticket could "spill" to another instance
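The usual workaround for the execution-time limit and the state-retention problem is to carry state in the queue messages themselves: each short-lived execution processes a slice and re-enqueues the remainder. A minimal Python simulation of that pattern (an in-memory queue stands in for the real one; names are illustrative):

```python
import queue

# Stand-in for the queue that chains one function execution to the next.
work_queue = queue.Queue()

def crawl_step(state: dict) -> None:
    """One time-limited 'function execution': process a slice of pages,
    then re-enqueue the remaining state instead of holding it in memory."""
    batch, rest = state["pages"][:2], state["pages"][2:]
    state["done"].extend(batch)  # a real step would fetch each page
    if rest:
        work_queue.put({"pages": rest, "done": state["done"]})
    else:
        work_queue.put({"pages": [], "done": state["done"], "finished": True})

work_queue.put({"pages": ["p1", "p2", "p3", "p4", "p5"], "done": []})
while True:
    state = work_queue.get()
    if state.get("finished"):
        break
    crawl_step(state)
print(state["done"])  # → ['p1', 'p2', 'p3', 'p4', 'p5']
```

This also illustrates the slide's pain points: every func => func hop is a queue round-trip, and a "ticket" really can spill across instances, since any worker may pick up the re-enqueued state.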
  7. Evolution of "app hosting" [1/2]
     • a shared server
     • a server
     • a farm of servers
     • VMs on a server
     • VMs on a farm of servers
  8. Evolution of "app hosting" [2/2]
     • VMs running containers
     • a container orchestrator running on top of VMs
     • let's just forget it's all there and do our dev job...
  9. What serverless is NOT
     • Execution model
     • Managed service
     • Operational construct
     • Spectrum
     • Technology
     Big tnx to Jeremy Daly for this <3
  10. A journey that requires a shift in business
      • your architecture
      • your planning
      • your budgeting
      • your developers #allMessedUp
  11. We listed out some "issues"
      • paying for the infrastructure 24/7
      • depending on proxies
      • some proxies were really slow
      • performance bottlenecks with our infrastructure
      • not automatically scalable
  12. step 1 => for each account created
      we provision a DB, a web app, a queue & a container
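In practice this per-tenant provisioning would go through a cloud SDK or an infrastructure-as-code tool; the sketch below only simulates the shape of it with an in-memory registry (all names are hypothetical):

```python
# Hypothetical per-tenant provisioning registry; real code would call a
# cloud SDK or IaC tool instead of appending to this dict.
provisioned = {}

def provision_tenant(account_id: str) -> list:
    """Slide's step 1: for each new account, create a DB, a web app,
    a queue, and a container (simulated here as named resources)."""
    if account_id in provisioned:  # idempotent: never provision twice
        return provisioned[account_id]
    resources = [f"{account_id}-{kind}"
                 for kind in ("db", "webapp", "queue", "container")]
    provisioned[account_id] = resources
    return resources

print(provision_tenant("acme"))  # → ['acme-db', 'acme-webapp', 'acme-queue', 'acme-container']
```

Making the operation idempotent matters once a queue or retry policy can deliver the "account created" event more than once.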
  13. We listed out some "issues"
      • paying for the infrastructure 24/7
      • depending on proxies
      • some proxies were really slow
      • performance bottlenecks with our infrastructure
      • not automatically scalable
  14. step 3 => for each generated request
      add a scheduled message to a queue to kill the container
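Queue services such as Azure Service Bus support scheduled (delayed) delivery, which is what makes this "kill the container after it goes idle" trick cheap. A minimal Python simulation of scheduled messages, using a min-heap keyed on due time (names are illustrative):

```python
import heapq
import itertools

# Stand-in for a queue with scheduled (delayed) delivery.
_seq = itertools.count()
scheduled = []  # heap of (due_at, seq, message)

def schedule_message(message: str, due_at: int) -> None:
    """Slide's step 3: after each request, schedule a 'kill container' message."""
    heapq.heappush(scheduled, (due_at, next(_seq), message))

def due_messages(now: int) -> list:
    """Deliver every message whose due time has passed."""
    out = []
    while scheduled and scheduled[0][0] <= now:
        out.append(heapq.heappop(scheduled)[2])
    return out

schedule_message("kill container-42", due_at=300)  # idle timeout in seconds
print(due_messages(now=100))  # → [] (nothing due yet, container keeps running)
print(due_messages(now=301))  # → ['kill container-42']
```

Each new request schedules a fresh kill message further in the future, so a busy container keeps getting a stay of execution while an idle one is reaped, and you stop paying for it 24/7.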
  15. We listed out some "issues"
      • paying for the infrastructure 24/7
      • depending on proxies
      • some proxies were really slow
      • performance bottlenecks with our infrastructure
      • not automatically scalable
  16. We listed out some "issues"
      • paying for the infrastructure 24/7
      • depending on proxies
      • some proxies were really slow
      • performance bottlenecks with our infrastructure
      • not automatically scalable
  17. not only that, but we might want a new request
      to be processed with the same IP address
  18. Here's the workflow [1/2]:
      • the "conductor" checks if the right container is available
      • if not, we generate a GUID and a container
      • in addition to connection strings, that GUID is passed to the container
  19. Here's the workflow [2/2]:
      • the container subscribes to a topic with that GUID as a filter
      • the container sends a message to the "conductor" saying "ready: GUID"
      • the "conductor" sends requests as messages down the "queue" (topic)
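The two workflow slides can be sketched end to end in Python. An in-memory dict stands in for a pub/sub topic with per-subscription filters (e.g. Service Bus topic subscriptions); all function names are illustrative:

```python
import uuid
from collections import defaultdict

# In-memory stand-ins for a filtered pub/sub topic and the container registry.
topic = defaultdict(list)   # filter GUID -> messages routed to that container
containers = {}             # GUID -> ready?

def start_container() -> str:
    """Conductor side: no suitable container exists, so spin one up and
    hand it a fresh GUID along with its connection strings."""
    guid = str(uuid.uuid4())
    containers[guid] = False
    return guid

def container_ready(guid: str) -> None:
    """Container side: it has subscribed with its GUID as the filter
    and reports back 'ready: GUID' to the conductor."""
    containers[guid] = True

def send_request(guid: str, request: str) -> None:
    """Conductor side: route requests down the topic to that container only."""
    if containers.get(guid):
        topic[guid].append(request)

guid = start_container()
container_ready(guid)
send_request(guid, "crawl https://example.com")
print(topic[guid])  # → ['crawl https://example.com']
```

Because requests are routed by GUID, sending a follow-up request to the same GUID pins it to the same container, and therefore the same outbound IP address, which is exactly the sticky-IP requirement from slide 17.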
  20. We listed out some "issues"
      • paying for the infrastructure 24/7
      • depending on proxies
      • some proxies were really slow
      • performance bottlenecks with our infrastructure
      • not automatically scalable
  21. and we can run our stuff on Azure, AWS, GCP,
      Digital Ocean, Linode... even Kubernetes