Lessons learned from building serverless, distributed architecture

Presented at DevOps Days India 2017

Jalem Raj Rohit

September 15, 2017

Transcript

  1. Lessons learned from building
    serverless, distributed
    architecture

  2. Introduction
    I am Jalem Raj Rohit.
    I work on DevOps and machine learning full-time.
    - I moderate the DevOps and Data Science sites on
    Stack Exchange
    - I contribute to random OSS projects

  3. Setting the context
    - Serverless, distributed system for processing ML
    workloads
    - Up to 900 servers per run
    - Batch architecture

  4. LESSONS LEARNED

  5. LESSON #1
    Always return from your Lambda functions

  6. Always return from your Lambda functions
    - The cost of Lambda functions can go from ‘meh’ to
    ‘OMFG’ really quickly
    - Lambda treats a function that never returns as a
    failure and keeps retrying it. [5 times]
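
A minimal sketch of the idea above, assuming a Python handler: every code path ends in an explicit return, so an unhandled exception never escapes and triggers an automatic retry. The `process` function, the `job_id` field and the response shape are hypothetical stand-ins, not part of the original talk. Note that a timeout still counts as a failure, and that swallowing errors this way means failures have to be reported through some other channel.

```python
import json
import logging

logger = logging.getLogger(__name__)


def process(event):
    """Hypothetical unit of work; stands in for the real ML task."""
    if "job_id" not in event:
        raise ValueError("event is missing 'job_id'")
    return {"job_id": event["job_id"], "status": "done"}


def handler(event, context):
    """Lambda-style entry point that returns on every code path."""
    try:
        result = process(event)
        return {"statusCode": 200, "body": json.dumps(result)}
    except Exception as exc:
        # Log and return instead of letting the exception escape, so the
        # invocation is not reported as failed and retried automatically.
        logger.exception("job failed")
        return {"statusCode": 500, "body": json.dumps({"error": str(exc)})}
```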

  7. LESSON #2
    Monitoring and logging are still an unconquered beast

  8. Monitoring and logging
    - Monitoring a serverless system is very tricky.
    - Adding the distributed systems paradigm to it doesn’t
    really help
    - Having a hosted server for monitoring serverless
    systems?

  9. [Slide with no transcribed text]

  10. Monitoring and logging (cont...)
    - Monitor the orchestration rather than trying to monitor
    all the servers
    - Use the cloud provider’s dashboard as much as possible
    - For logging, the closest thing to a best practice is to
    zip the log file and send it to a data store before the
    server termination task runs
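
The logging advice above could look roughly like this, assuming the log is a single local file. The `upload` and `terminate` callables are hypothetical placeholders for whatever data store client and termination task the system actually uses; only the standard library is involved here.

```python
import gzip
import shutil
from pathlib import Path


def archive_log(log_path, upload):
    """Gzip log_path, hand the archive to upload(), and return its path."""
    src = Path(log_path)
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    # Hypothetical hook: e.g. a function that copies the file to object storage.
    upload(dst)
    return dst


def shutdown(log_path, upload, terminate):
    """Ship the log first, then terminate, even if shipping fails."""
    try:
        archive_log(log_path, upload)
    finally:
        terminate()
```

Running the termination step in `finally` is a design choice: a failed upload should not leave a server running and billing.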

  11. LESSON #3
    Super-high scalability with relative ease

  12. Super-high scalability
    - Super-high scalability at a fraction of the cost
    - Can be made to scale seamlessly with demand

  13. LESSON #4
    If it is a distributed serverless system, it needs to be
    self-healing

  14. Self-healing
    - Hunting for a lost or faulty file in a distributed
    system is like finding a needle in a haystack
    - Hence the need for self-healing
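
One way to read "self-healing" is a loop that detects missing or invalid outputs and re-runs only those work items. The sketch below assumes that interpretation; the talk does not say how its system heals itself. All names (`find_broken`, `heal`, `is_valid`, `reprocess`) are hypothetical.

```python
def find_broken(expected, outputs, is_valid):
    """Return the keys whose output is missing or fails validation."""
    return {k for k in expected if k not in outputs or not is_valid(outputs[k])}


def heal(expected, outputs, is_valid, reprocess, max_rounds=3):
    """Re-run broken work items until all pass or the rounds run out.

    expected:  list of work-item keys that should have produced output
    outputs:   dict mapping key -> output, updated in place
    is_valid:  callable(output) -> bool
    reprocess: callable(key) -> output
    Returns the set of keys that are still broken afterwards.
    """
    broken = find_broken(expected, outputs, is_valid)
    for _ in range(max_rounds):
        if not broken:
            break
        for key in sorted(broken):
            outputs[key] = reprocess(key)
        broken = find_broken(expected, outputs, is_valid)
    return broken
```

Capping the number of rounds keeps a permanently failing item from looping forever; whatever is returned still needs a human or an alert.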

  15. LESSON #5
    Having a distributed system doesn’t necessarily mean the
    load is distributed equally

  16. Load Balancing
    - Improper or poorly done load balancing defeats the
    whole purpose of having distributed systems
    - Have proper load balancing techniques or algorithms in
    place wherever data is getting ingested
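
As an illustration of balancing at the ingestion point, here is a sketch of a greedy "largest item to the least-loaded worker" assignment. This is a generic heuristic chosen for the example, not the algorithm used in the talk; `balance`, `sizes` and `n_workers` are hypothetical names.

```python
import heapq


def balance(sizes, n_workers):
    """Assign items to workers, largest first, always to the lightest worker.

    sizes:     list of item sizes (e.g. bytes or rows); index is the item id
    n_workers: number of workers
    Returns (assignment, loads), where assignment maps worker -> item ids
    and loads[w] is the total size given to worker w.
    """
    heap = [(0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}
    by_size = sorted(enumerate(sizes), key=lambda p: p[1], reverse=True)
    for item, size in by_size:
        load, worker = heapq.heappop(heap)  # lightest worker so far
        assignment[worker].append(item)
        heapq.heappush(heap, (load + size, worker))
    loads = [0] * n_workers
    for load, worker in heap:
        loads[worker] = load
    return assignment, loads
```

Splitting items round-robin by count would ignore their sizes, which is exactly how a distributed system ends up with one overloaded worker.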

  17. LESSON #6
    Compliance automation is good. Let’s do more of it

  18. Compliance Automation
    - A boon for teams with very strict compliance requirements
    - No need to worry about the number of systems in
    production
    - Tag-based and boundary-based detection
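
Tag-based detection can be sketched as a check that every resource carries a required set of tags. The resource records below are plain dictionaries with hypothetical `id` and `tags` fields; a real check would pull them from the cloud provider's inventory API, which is not shown here.

```python
def find_noncompliant(resources, required_tags):
    """Return {resource id: sorted missing tag keys} for offending resources."""
    report = {}
    for res in resources:
        missing = sorted(set(required_tags) - set(res.get("tags", {})))
        if missing:
            report[res["id"]] = missing
    return report


# Hypothetical inventory: one compliant server, two with missing tags.
inventory = [
    {"id": "i-1", "tags": {"owner": "ml", "env": "prod"}},
    {"id": "i-2", "tags": {"env": "prod"}},
    {"id": "i-3"},
]
report = find_noncompliant(inventory, ["owner", "env"])
```

Because the check runs over whatever the inventory returns, it works the same for nine servers or nine hundred.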

  19. LESSON #7
    Debugging and fixing serverless distributed systems is
    extremely difficult

  20. Horrors of debugging/fixing serverless
    distributed systems
    - These systems run in nohup mode
    - All the servers are terminated once the orchestration
    completes
    - So if you are late in killing a faulty process, you need
    to start all over again from the beginning

  21. Horrors of debugging/fixing serverless
    distributed systems
    - Watching the tail of the log file would save a lot of
    headache
    - The more distributed the workload is, the more hellish
    it is for the developer
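
Watching the tail of a log does not need anything elaborate. A small sketch, assuming a plain-text log file on local disk; the `tail` helper and its parameters are hypothetical names for this example.

```python
from collections import deque


def tail(path, n=20):
    """Return the last n lines of a text file, without trailing newlines."""
    with open(path, "r", errors="replace") as f:
        # deque with maxlen keeps only the most recent n lines in memory.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```

Checking this output early in a run is what makes it possible to kill a faulty job before the orchestration has gone too far.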

  22. LESSON #8
    Every distributed systems engineer deserves a hug

  23. THANK YOU
    ● GitHub: Dawny33
    ● Home: jrajrohit.me
