Multi-cloud multi-tenant serverless crawlers

Ivan Čuljak
December 04, 2019

Transcript

  1. Multi-cloud multi-tenant serverless crawlers
    Ivan Čuljak
    @CuljakIvan / [email protected]

  2. A bit about me
    • Ivan Čuljak
    • Shouting at clouds since 2012???
    • Salvaging wrecks/putting out fires for a living
    • Freelancer & Azure MVP
    • Cloud Solutions Architect & owner @ Celeste Maze
    • Founder @ Cloud Bootstrap

  3. you might be asking yourself
    But why???

  4. it all started with
    Hold my beer

  5. First there was a desktop app

  6. Then we moved to a VM

  7. Then we added a proxy

  8. And then we added
    • more proxies
    • more clients/users
    • more VMs
    • more points of failure
    • more hours to keep it running
    • more hours to scale it

  9. to be completely honest
    It was "good enough"

  10. But we were bored <3

  11. Sooo... say hello to
    • additional queues
    • functions
    • containers
    • infrastructure as code
    • multiple clouds
    • dare we say Kubernetes?

  12. It worked out fine
    so here's the story...

  13. First things first

  14. First thing to do was:
    • rewrite the desktop app into a web app
    • host the web app somewhere in the cloud
    • create a service out of the "engine"
    • "host" the service on those same VMs
    • connect those two through a queue in the cloud
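
    One possible wiring for the list above, sketched in Python: the web app drops
    crawl jobs onto a cloud queue and the "engine" service (still on the VMs) drains
    it. The queue name, env var, payload shape and the crawl() placeholder are
    assumptions for illustration, not the project's actual implementation.

    # pip install azure-storage-queue
    import json
    import os

    from azure.storage.queue import QueueClient

    CONN_STR = os.environ["STORAGE_CONNECTION_STRING"]   # assumed env var
    QUEUE_NAME = "crawl-jobs"                             # made-up queue name

    def enqueue_crawl_job(tenant_id: str, url: str) -> None:
        """Web app side: drop a crawl job onto the queue."""
        queue = QueueClient.from_connection_string(CONN_STR, QUEUE_NAME)
        queue.send_message(json.dumps({"tenant": tenant_id, "url": url}))

    def engine_loop() -> None:
        """Engine side: pull jobs and hand them to the existing crawling code."""
        queue = QueueClient.from_connection_string(CONN_STR, QUEUE_NAME)
        for msg in queue.receive_messages():
            job = json.loads(msg.content)
            crawl(job["url"])                 # placeholder for the existing engine
            queue.delete_message(msg)

    def crawl(url: str) -> None:
        print(f"crawling {url}")              # placeholder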

  15. N.B. Functions can be hosted as "serverless" and as a service somewhere;
      most of the problems listed can be solved more or less easily

  16. Rewriting the engine into functions will
    turn out great, mhmmm:
    • a function can run for 10 minutes
    • func => func calls going through queues
    • retaining state between executions
    • hard to catch and react to an exception
    • processing of a ticket could "spill" to another instance
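
    To make the middle bullets concrete: with a hard execution limit and no state
    retained between runs, a long crawl has to be chopped into steps chained through
    a queue, and all state has to travel inside the message. A rough, hypothetical
    sketch of that shape (fetch/store are placeholders):

    import json

    def run_step(message: str, enqueue) -> None:
        """One bounded function execution; re-enqueues its own continuation."""
        state = json.loads(message)            # all state must travel in the message
        page = fetch(state["next_url"])        # one bounded chunk of work
        state["results"].extend(page["items"])
        if page["next_url"]:                   # not done -> chain another execution
            state["next_url"] = page["next_url"]
            enqueue(json.dumps(state))
        else:
            store(state["results"])            # persist the final result

    def fetch(url: str) -> dict:
        return {"items": [url], "next_url": None}   # placeholder crawler step

    def store(results: list) -> None:
        print(f"stored {len(results)} items")       # placeholder persistence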

  17. Small step for the
    project
    Huge waste of time for the team

  18. if you can't rewrite it
    You can wrap it into a container

  19. so we ended up with
    The exact same thing as before :)

  20. Evolution of "app hosting" [1/2]
    • a shared server
    • a server
    • a farm of servers
    • VMs on a server
    • VMs on a farm of servers

  21. Evolution of "app hosting" [2/2]
    • VMs running containers
    • Containers orchestrator running on top of VMs
    • Let's just forget it's all there and do our dev job...

  22. What serverless is NOT
    • Execution model
    • Managed service
    • Operational construct
    • Spectrum
    • Technology
    Big tnx to Jeremy Daly for this <3

  23. Serverless IS a
    really bad name

  24. Serverless IS a
    methodology
    Big tnx to Jeremy Daly for this <3

  25. Serverless IS a
    journey
    and not always a destination

  26. A journey that requires a shift in business
    • your architecture
    • your planning
    • your budgeting
    • your developers
    #allMessedUp

  27. Current state of "serverless"

  28. Back to the drawing board

  29. We listed out some "issues"
    • paying for the infrastructure 24/7
    • depending on proxies
    • some proxies were really slow
    • performance bottlenecks with our infrastructure
    • not automatically scalable

  30. but the problems were interconnected
    So we needed to solve the most
    pressing "issue"

  31. in order to scale automatically
    We needed IaC & we needed to
    trigger it from code

  32. step 1 => for each account created
    We provision a DB, a web app, a
    queue & a container
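
    A minimal sketch of this step in Python, assuming the per-tenant resources are
    described by parametrised IaC templates; deploy_template() is a hypothetical
    wrapper around whatever tool is in use (ARM/Bicep, Terraform, ...).

    def deploy_template(template: str, account_id: str) -> None:
        """Hypothetical wrapper around the IaC tooling of choice."""
        print(f"deploying {template} for tenant {account_id}")   # placeholder

    def on_account_created(account_id: str) -> None:
        """Step 1: provision an isolated slice of infrastructure per tenant."""
        for template in ("database", "web-app", "queue", "crawler-container"):
            deploy_template(template, account_id)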

  33. We listed out some "issues"
    • paying for the infrastructure 24/7
    • depending on proxies
    • some proxies were really slow
    • performance bottlenecks with our infrastructure
    • not automatically scalable

  34. step 2 => for each generated request
    We spin up a container if there's
    none
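
    Step 2, sketched with made-up helper names standing in for the real
    container-management calls (ACI, Fargate, and so on):

    def running_containers(tenant_id: str) -> list[str]:
        return []    # placeholder: would query the provider's API / our own registry

    def start_container(tenant_id: str) -> None:
        print(f"starting crawler container for {tenant_id}")     # placeholder

    def ensure_container(tenant_id: str) -> None:
        """Called for each incoming request before it gets dispatched."""
        if not running_containers(tenant_id):
            start_container(tenant_id)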

  35. step 3 => for each generated request
    Add a scheduled message to a
    queue to kill the container
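
    One way to do this with Azure Service Bus scheduled messages; the queue name,
    env var and TTL are assumptions, and some small janitor process would receive
    these messages and tear the container down.

    # pip install azure-servicebus
    import os
    from datetime import datetime, timedelta, timezone

    from azure.servicebus import ServiceBusClient, ServiceBusMessage

    CONN_STR = os.environ["SERVICEBUS_CONNECTION_STRING"]

    def schedule_container_kill(container_id: str, ttl_minutes: int = 15) -> None:
        """Enqueue a 'kill' message that only becomes visible after the TTL."""
        kill_at = datetime.now(timezone.utc) + timedelta(minutes=ttl_minutes)
        with ServiceBusClient.from_connection_string(CONN_STR) as client:
            with client.get_queue_sender("container-kills") as sender:
                sender.schedule_messages(ServiceBusMessage(container_id), kill_at)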

  36. We listed out some "issues"
    • paying for the infrastructure 24/7
    • depending on proxies
    • some proxies were really slow
    • performance bottlenecks with our infrastructure
    • not automatically scalable

  37. step 4 => for each generated request
    if (tooManyParallelRequests)
        spinUpContainer()
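
    The same check, slightly expanded; the threshold and helper names are made up,
    and start_container is the same kind of stub as in the step 2 sketch.

    MAX_PARALLEL_PER_CONTAINER = 10     # made-up threshold

    def start_container(tenant_id: str) -> None:
        print(f"starting extra crawler container for {tenant_id}")   # placeholder

    def maybe_scale_out(tenant_id: str, in_flight: int, containers: int) -> None:
        """Step 4: add a container when the existing ones are saturated."""
        if in_flight > MAX_PARALLEL_PER_CONTAINER * max(containers, 1):
            start_container(tenant_id)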

  38. We listed out some "issues"
    • paying for the infrastructure 24/7
    • depending on proxies
    • some proxies were really slow
    • performance bottlenecks with our infrastructure
    • not automatically scalable

  39. yay...
    "Only" proxies left on the list

  40. why would this be hard?
    We can spin up containers across
    the globe

  41. the engine was figuring out
    Which proxy to use

  42. but now, now we have a container
    Figuring out which request to
    process

  43. not only that, but we might want
    A new request to be processed
    with the same IP address

  44. there were a lot of over-engineering ideas
    But we needed KISS

  45. Here's the workflow [1/2]:
    • "conductor" checks if there's the right container available
    • if not, we generate a GUID and a container
    • in addition to connection strings, that GUID is passed to the container

  46. Here's the workflow [2/2]:
    • container subscribes to a topic with that GUID as a filter
    • container sends a message to "conductor" saying "ready: GUID"
    • "conductor" sends requests as messages down the "queue" (topic)

  47. We listed out some "issues"
    • paying for the infrastructure 24/7
    • depending on proxies
    • some proxies were really slow
    • performance bottlenecks with our infrastructure
    • not automatically scalable

  48. believe it or not
    It's as simple as that

  49. and we can run our stuff on
    Azure, AWS, GCP, Digital Ocean,
    Linode... even Kubernetes

  50. There's plenty left to say...
    ...but we're out of time :(

  51. Thank you <3
    Any questions?
    [email protected] / @CuljakIvan
