Multi-cloud multi-tenant serverless crawlers

Ivan Čuljak
December 04, 2019

Transcript

  1. Multi-cloud multi-
    tenant serverless
    crawlers
    Ivan Čuljak
    @CuljakIvan / [email protected]

  2. A bit about me
    • Ivan Čuljak
    • Shouting at clouds since 2012???
    • Salvaging wrecks/putting out fires for a living
    • Freelancer & Azure MVP
    • Cloud Solutions Architect & owner @ Celeste Maze
    • Founder @ Cloud Bootstrap

  3. you might be asking yourself
    But why???

  4. it all started with
    Hold my beer

  5. First there was a desktop app

  6. Then we moved to a VM

  7. Then we added a proxy

  8. And then we added
    • more proxies
    • more clients/users
    • more VMs
    • more points of failure
    • more hours to keep it running
    • more hours to scale it

  9. to be completely honest
    It was "good enough"

  10. But we were bored <3

  11. Sooo... say hello to
    • additional queues
    • functions
    • containers
    • infrastructure as code
    • multiple clouds
    • dare we say Kubernetes?

  12. It worked out fine
    so here's the story...

  13. First things first

  14. First thing to do was:
    • rewrite the desktop app into a web app
    • host the web app somewhere in the cloud
    • create a service out of the "engine"
    • "host" the service on those same VMs
    • connect those two through a queue in the cloud
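    The last step above, connecting the web app and the engine only through a cloud queue, can be sketched roughly like this. A minimal, hypothetical sketch: an in-memory deque stands in for the real queue service, and the function names are illustrative, not the project's actual API.

    ```python
    from collections import deque

    # In-memory stand-in for the cloud queue that decouples
    # the web app from the "engine" service on the VMs.
    queue = deque()

    def web_app_submit(job: dict) -> None:
        # The web app only enqueues work; it never talks to the engine directly.
        queue.append(job)

    def engine_poll() -> dict | None:
        # The engine service pulls jobs off the queue at its own pace.
        return queue.popleft() if queue else None

    web_app_submit({"ticket": 1, "url": "https://example.com"})
    job = engine_poll()
    ```

    The point of the indirection is that either side can be restarted, moved, or scaled without the other noticing.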

  15. N.B. Functions can be hosted "serverless"
    or as a service somewhere; most of the problems
    listed below can be solved more or less easily

  16. Rewriting the engine into functions will
    turn out great, mhmmm:
    • a function can run for 10 minutes
    • func => func calls going through queues
    • retaining state between executions
    • hard to catch and react to an exception
    • processing of a ticket could "spill" to another instance
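    One common workaround for the execution-time limit and the state problem above is to checkpoint: each execution processes a batch and re-enqueues the leftover state as a new message. A minimal sketch under assumed names (the talk does not specify the actual mechanism); a plain list simulates the queue, and each loop iteration stands in for one function execution.

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class TicketState:
        # State carried *in the queue message*, not in function memory,
        # so it survives the 10-minute limit and "spilling" to another instance.
        ticket_id: str
        pages_left: list = field(default_factory=list)
        results: list = field(default_factory=list)

    def run_once(state: TicketState, batch_size: int = 2) -> TicketState:
        """One function execution: crawl a small batch, return the updated state."""
        batch = state.pages_left[:batch_size]
        state.pages_left = state.pages_left[batch_size:]
        state.results.extend(f"crawled:{page}" for page in batch)
        return state

    # Simulated func => func calls going through a queue.
    queue = [TicketState("t1", pages_left=["a", "b", "c", "d", "e"])]
    finished = []
    while queue:
        state = run_once(queue.pop(0))
        if state.pages_left:
            queue.append(state)   # re-enqueue leftover work as a new message
        else:
            finished.append(state)
    ```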

  17. Small step for the
    project
    Huge waste of time for the team

  18. if you can't rewrite it
    You can wrap it into a container

  19. so we ended up with
    The exact same thing as before :)

  20. Evolution of "app hosting" [1/2]
    • a shared server
    • a server
    • a farm of servers
    • VMs on a server
    • VMs on a farm of servers

  21. Evolution of "app hosting" [2/2]
    • VMs running containers
    • Containers orchestrator running on top of VMs
    • Let's just forget it's all there and do our dev job...

  22. What serverless is NOT
    • Execution model
    • Managed service
    • Operational construct
    • Spectrum
    • Technology
    Big tnx to Jeremy Daly for this <3

  23. Serverless IS a
    really bad name

  24. Serverless IS a
    methodology
    Big tnx to Jeremy Daly for this <3

  25. Serverless IS a
    journey
    and not always a destination

  26. A journey that requires a shift in business
    • your architecture
    • your planning
    • your budgeting
    • your developers
    #allMessedUp

  27. Current state of "serverless"

  28. Back to the drawing board

  29. We listed out some "issues"
    • paying for the infrastructure 24/7
    • depending on proxies
    • some proxies were really slow
    • performance bottlenecks with our infrastructure
    • not automatically scalable

  30. but the problems were interconnected
    So we needed to solve the most
    pressing "issue"

  31. in order to scale automatically
    We needed IaC & we needed to
    trigger it from code

  32. step 1 => for each account created
    We provision a DB, a web app, a
    queue & a container
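    Step 1 can be pictured as building a per-tenant resource plan and handing it to the IaC tooling. A hypothetical sketch: the resource names and the `provision_plan` helper are invented for illustration; the real system would invoke its infrastructure-as-code templates from application code.

    ```python
    def provision_plan(account_id: str) -> dict:
        """Resources created for each new account (names are illustrative)."""
        prefix = f"tenant-{account_id}"
        return {
            "db":        f"{prefix}-db",
            "web_app":   f"{prefix}-web",
            "queue":     f"{prefix}-queue",
            "container": f"{prefix}-crawler",
        }

    # On account creation, this plan would be fed to the IaC deployment step.
    plan = provision_plan("42")
    ```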

  33. We listed out some "issues"
    • paying for the infrastructure 24/7
    • depending on proxies
    • some proxies were really slow
    • performance bottlenecks with our infrastructure
    • not automatically scalable

  34. step 2 => for each generated request
    We spin up a container if there's
    none

  35. step 3 => for each generated request
    Add a scheduled message to a
    queue to kill the container
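    The scheduled-kill idea in steps 2 and 3 can be sketched like this: every request pushes the container's kill deadline further out, so a container dies only after sitting idle. The class, TTL value, and method names are assumptions for illustration; a real system would schedule delayed messages on the queue service instead of keeping deadlines in memory.

    ```python
    from datetime import datetime, timedelta

    # Assumed idle TTL; the talk gives no actual number.
    KILL_AFTER = timedelta(minutes=10)

    class KillScheduler:
        def __init__(self):
            self.deadlines = {}   # container_id -> latest scheduled kill time

        def on_request(self, container_id: str, now: datetime) -> None:
            # Each new request reschedules the kill, extending the container's life.
            self.deadlines[container_id] = now + KILL_AFTER

        def due(self, now: datetime) -> list:
            """Containers whose latest kill message has fired with no newer work."""
            expired = [c for c, t in self.deadlines.items() if t <= now]
            for c in expired:
                del self.deadlines[c]
            return expired

    s = KillScheduler()
    t0 = datetime(2019, 12, 4, 12, 0)
    s.on_request("c1", t0)
    s.on_request("c1", t0 + timedelta(minutes=5))  # new work pushes the deadline out
    ```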

  36. We listed out some "issues"
    • paying for the infrastructure 24/7
    • depending on proxies
    • some proxies were really slow
    • performance bottlenecks with our infrastructure
    • not automatically scalable

  37. step 4 => for each generated request
    if (tooManyParallelRequests)
    spinUpContainer()
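    The check above amounts to ceiling division: how many containers does the current request volume need, versus how many are running? A sketch with an assumed per-container limit (the talk states no actual threshold):

    ```python
    # Assumed capacity; the real threshold is not given in the talk.
    MAX_PARALLEL_PER_CONTAINER = 5

    def containers_needed(parallel_requests: int) -> int:
        # Ceiling division, with at least one container kept around.
        return max(1, -(-parallel_requests // MAX_PARALLEL_PER_CONTAINER))

    def extra_containers(parallel_requests: int, running: int) -> int:
        """How many additional containers to spin up (0 if capacity suffices)."""
        return max(0, containers_needed(parallel_requests) - running)
    ```

    So with 12 parallel requests and one container running, two more get spun up; with 3 requests, none.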

  38. We listed out some "issues"
    • paying for the infrastructure 24/7
    • depending on proxies
    • some proxies were really slow
    • performance bottlenecks with our infrastructure
    • not automatically scalable

  39. yay...
    "Only" proxies left on the list

  40. why would this be hard?
    We can spin up containers across
    the globe

  41. the engine was figuring out
    Which proxy to use

  42. but now, now we have a container
    Figuring out which request to
    process

  43. not only that, but we might want
    A new request to be processed
    with the same IP address

  44. there were a lot of ideas to over-engineer it
    But we needed KISS

  45. Here's the workflow [1/2]:
    • "conductor" checks whether the right container is available
    • if not, we generate a GUID and a container
    • in addition to connection strings, that GUID is passed to the
    container

  46. Here's the workflow [2/2]:
    • container subscribes to a topic with that GUID as a filter
    • container sends a message to "conductor" saying "ready:
    GUID"
    • "conductor" sends requests as messages down the
    "queue" (topic)
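    The whole conductor/container handshake above can be simulated in a few lines. A hedged sketch: a dict stands in for the service-bus topic, and the `Topic`/`Conductor` classes are invented names; the real system uses a topic subscription with the GUID as the filter.

    ```python
    import uuid

    class Topic:
        """In-memory stand-in for a pub/sub topic with per-GUID filters."""
        def __init__(self):
            self.subscriptions = {}   # guid -> messages delivered to that container

        def subscribe(self, guid: str) -> None:
            self.subscriptions[guid] = []

        def publish(self, guid: str, message: str) -> None:
            # Only the container whose subscription filter matches the GUID gets it.
            self.subscriptions.setdefault(guid, []).append(message)

    class Conductor:
        def __init__(self, topic: Topic):
            self.topic = topic
            self.ready = set()

        def spin_up_container(self) -> str:
            guid = str(uuid.uuid4())     # identity passed to the new container
            self.topic.subscribe(guid)   # container subscribes with GUID as filter
            self.ready.add(guid)         # container reports back "ready: GUID"
            return guid

        def send_request(self, guid: str, request: str) -> None:
            assert guid in self.ready, "container not ready yet"
            self.topic.publish(guid, request)

    topic = Topic()
    conductor = Conductor(topic)
    guid = conductor.spin_up_container()
    conductor.send_request(guid, "GET https://example.com")
    ```

    Routing by GUID is also what makes sticky sessions possible: sending a follow-up request to the same GUID reuses the same container, and therefore the same outbound IP address.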

  47. We listed out some "issues"
    • paying for the infrastructure 24/7
    • depending on proxies
    • some proxies were really slow
    • performance bottlenecks with our infrastructure
    • not automatically scalable

  48. believe it or not
    It's as simple as that

  49. and we can run our stuff on
    Azure, AWS, GCP, Digital Ocean,
    Linode... even Kubernetes

  50. There's plenty left to say...
    ...but we're out of time :(

  51. Thank you <3
    Any questions?
    [email protected] / @CuljakIvan
