Lessons learned from building serverless, distributed architecture

Introduction
I am Jalem Raj Rohit.
- Work on DevOps and Machine Learning full-time
- Moderate the DevOps and Data Science sites of StackOverflow
- Contribute to random OSS projects

Setting the context
- Serverless, distributed system for processing ML workloads
- Up to 900 servers every run
- Batch architecture

LESSONS LEARNED

LESSON #1 Always return your Lambda functions

Always return your Lambda functions
- The cost of Lambda functions can go from ‘meh’ to ‘OMFG’ really quickly
- A function that has not returned is considered a failure by Lambda, which keeps retrying it [5 times]
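One way to apply this lesson in a Python handler is to make every code path end in an explicit return, including the error path. The sketch below is an illustration, not code from the talk: `process_batch`, the `records` key, and the response shape are all hypothetical.

```python
import json
import logging

logger = logging.getLogger(__name__)


def process_batch(records):
    """Hypothetical unit of work: here it only counts the records."""
    if records is None:
        raise ValueError("event has no 'records' key")
    return len(records)


def handler(event, context):
    """Lambda-style entry point that returns on every code path."""
    try:
        processed = process_batch(event.get("records"))
    except Exception as exc:
        # Log and return instead of letting the exception escape,
        # so the invocation is not reported as failed and retried.
        logger.exception("batch failed")
        return {"statusCode": 500, "body": json.dumps({"error": str(exc)})}
    return {"statusCode": 200, "body": json.dumps({"processed": processed})}
```

Because the error is swallowed rather than raised, the failure has to be picked up from the logs or the returned status instead of from retries.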

LESSON #2 Monitoring and Logging is still an unconquered beast

Monitoring and logging
- Monitoring a serverless system is very tricky
- Adding the distributed systems paradigm to it doesn’t really help
- Having a hosted server for monitoring serverless systems?

Monitoring and logging (cont...)
- Monitor the orchestration rather than trying to monitor all the servers
- Use the cloud provider’s dashboard as much as possible
- For logging, the closest thing to a best practice is to zip the log file and send it to a data store before the server termination task
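The logging advice above can be sketched as a small step that runs just before a server is terminated. This is a minimal sketch under assumptions: it uses gzip from the Python standard library as the compression format, and `upload` is a hypothetical callable standing in for whatever data store client is in use.

```python
import gzip
import shutil
from pathlib import Path


def compress_log(log_path):
    """Gzip the log file next to the original and return the archive path."""
    src = Path(log_path)
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def ship_log_before_termination(log_path, upload):
    """Compress the log, hand it to `upload`, and return what `upload` returns.

    `upload` is a hypothetical callable (for example a thin wrapper around
    an object-store client) that takes the archive path.
    """
    archive = compress_log(log_path)
    return upload(archive)
```

Calling `ship_log_before_termination` as the last step before the termination task means the log outlives the server it was written on.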

LESSON #3 Super-high scalability with relative ease

Super high scalability
- Super high scalability at a fraction of the cost
- Can be made to scale seamlessly with demand

LESSON #4 If it is a distributed serverless system, it needs to be self-healing

Self-healing
- Debugging a lost or faulty file in a distributed system is like finding a needle in a haystack
- Thus, the system needs to be self-healing
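The talk does not say how the self-healing was implemented. One possible shape, shown purely as an assumption, is a loop that checks every expected output and re-runs the task for anything missing or faulty. The `exists`, `is_valid`, and `rerun` callables and the `max_attempts` limit are hypothetical names introduced for this sketch.

```python
def heal(expected_keys, exists, is_valid, rerun, max_attempts=3):
    """Re-run the task for every missing or faulty output.

    Returns the keys that are still bad after `max_attempts` rounds.
    """
    bad = list(expected_keys)
    for _ in range(max_attempts):
        # Keep only the outputs that are missing or fail validation.
        bad = [k for k in bad if not (exists(k) and is_valid(k))]
        if not bad:
            break
        for key in bad:
            rerun(key)
    # Final check so the caller sees what could not be repaired.
    return [k for k in bad if not (exists(k) and is_valid(k))]
```

Bounding the number of rounds keeps a permanently broken input from looping forever, and the returned list is the short haystack left to search by hand.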

LESSON #5 Having a distributed system doesn’t necessarily mean the load is distributed equally

Load Balancing
- Improper or poorly done load balancing defeats the whole purpose of having distributed systems
- Have proper load balancing techniques or algorithms in place wherever data is getting ingested
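The talk does not name a specific algorithm. As one illustration of balancing at ingestion time, the sketch below uses a greedy "largest item to the least-loaded worker" assignment. The item sizes and worker ids are hypothetical inputs.

```python
import heapq


def assign_least_loaded(item_sizes, n_workers):
    """Give each item (largest first) to the worker with the least load so far."""
    heap = [(0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}
    by_size = sorted(item_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for item, size in by_size:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(item)
        heapq.heappush(heap, (load + size, worker))
    return assignment
```

Splitting by item count alone would ignore size, so a few large files could end up on the same server. Sorting by size first keeps the per-worker totals close.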

LESSON #6 Compliance automation is good. Let’s do more of it

Compliance Automation
- A boon for teams which have very strict compliance requirements
- No need to worry about the number of systems in production
- Tag-based and boundary-based detection
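One reading of tag-based detection is a check that flags any server missing the tags a policy requires. The sketch below assumes the inventory is already available as a list of dictionaries. The `REQUIRED_TAGS` policy and the `id` and `tags` field names are hypothetical.

```python
REQUIRED_TAGS = {"owner", "environment"}  # hypothetical policy


def find_untagged(instances, required=REQUIRED_TAGS):
    """Return {instance id: sorted missing tag names} for non-compliant instances."""
    report = {}
    for inst in instances:
        missing = set(required) - set(inst.get("tags", {}))
        if missing:
            report[inst["id"]] = sorted(missing)
    return report
```

Because the check runs over the whole inventory, it costs the same effort to run against 9 servers or 900.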

LESSON #7 Debugging and fixing serverless distributed systems is extremely difficult

Horrors of debugging/fixing serverless distributed systems
- These systems run in a nohup mode
- All the servers get terminated once the orchestration is completed
- So, if one is late in killing the process, one needs to start all over again from the beginning

Horrors of debugging/fixing serverless distributed systems
- Watching the tail of the log file would save a lot of headaches
- The more distributed the workload is, the bigger the hell it is for the developer
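Watching the tail of a log can also be done from Python when shell access is awkward. This is a minimal sketch, and the `tail` helper and its parameters are names introduced here for illustration.

```python
from collections import deque


def tail(path, n=10):
    """Return the last `n` lines of a text file, without trailing newlines."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        # A bounded deque keeps only the most recent `n` lines in memory.
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]
```

Polling this on the orchestrator's log gives an early signal to stop a bad run before the servers are terminated.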

LESSON #8 Every distributed systems engineer deserves a hug

THANK YOU
• GitHub: Dawny33
• Home: jrajrohit.me
