Nabarun Pal (@theonlynabarun)
Making Lambda Simpler for Data Scientists
Slide 2
About Me
● Platform Engineer at rorodata
● Optimizing development time through simple abstractions/tooling
● Venturing into Container Orchestration and Serverless Computing
● Contributor to the Kubernetes ecosystem
Slide 3
Outline
● Genesis
● Present Constructs in Python - Threads and Processes
● Kubernetes
● Serverless
● The Abstraction
● Requirements
● API
● Internals
● Demo
● Performance Metrics
● Current Limitations
● Future Goals
Slide 4
Genesis
Slide 5
Multithreading
Pros
● Lightweight
● Shared state between multiple threads
● Works well for I/O-bound applications
Cons
● Subject to the Global Interpreter Lock (GIL)
● Context switching overhead
● Code prone to race conditions
● Does not speed up CPU-bound tasks
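A minimal sketch of the I/O-bound case above, using only the standard library. `time.sleep` stands in for a blocking network or disk call (an assumption for illustration; like real blocking I/O, it releases the GIL), and `fake_io_call` and `run_threaded` are hypothetical names. Four simulated 0.2 s calls overlap across threads instead of running back to back.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_call(delay):
    """Simulate a blocking I/O call; sleeping releases the GIL."""
    time.sleep(delay)
    return delay


def run_threaded(delays):
    """Run the simulated calls concurrently; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        results = list(executor.map(fake_io_call, delays))
    return results, time.perf_counter() - start


results, elapsed = run_threaded([0.2, 0.2, 0.2, 0.2])
print(results, round(elapsed, 2))
```

Run sequentially, the same calls would take at least 0.8 s. Replacing the sleep with pure Python arithmetic would remove the benefit, because only one thread can execute Python bytecode at a time.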
Slide 6
Multiprocessing
Pros
● Isolation of memory space
● Leverages multiple processors & cores
● GIL limitations don’t apply
● Synchronization primitives like locks are not needed unless data is shared
● Works well for CPU-bound tasks
Cons
● Sharing data between processes is more complicated
● Bulky memory footprint
● Finite scaling (bounded by the cores of a single machine)
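A minimal sketch of the CPU-bound case above, again with the standard library only. `sum_of_squares` and `run_in_processes` are hypothetical names chosen for illustration. Each worker is a separate interpreter with its own memory and its own GIL, and the number of workers is fixed when the pool is created.

```python
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(n):
    """CPU-bound work: pure Python arithmetic with no I/O waits."""
    return sum(i * i for i in range(n))


def run_in_processes(inputs, workers=2):
    """Fan the inputs out to a fixed number of worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(sum_of_squares, inputs))


if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing this module.
    print(run_in_processes([10, 100, 1000]))
```

Arguments and return values cross the process boundary by pickling, which is one reason sharing data between processes takes more care than sharing it between threads.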
Slide 7
Can Kubernetes help?
Pros
● Abstracts out infrastructure
● Simple interface
● Can scale based on workload
Cons
● Layer on top of VMs: slow to scale up/down
● Autoscaling is not a core functionality
● Needs dedicated time to manage
Slide 8
What about Serverless?
● Zero Infrastructure Management
● Near Infinite Scaling
● High Availability
● No Idle Resources
● Suitable for short-lived workloads
Slide 9
“All problems in computer science can be solved by another level of indirection”
- David Wheeler
Slide 10
LambdaPool
- “The Indirection”
Slide 11
Requirements
● Minimum overhead on users
● Simple way to create, delete, list and update lambda functions
● Coherent ways to invoke the lambda function
● Easy to use interface
Slide 12
Features
● CLI to create, list, update and delete functions
● Support for specifying function layers and listing the layers used by each function
● LambdaPool interface
● LambdaExecutor interface
LambdaPool
● Implements the same interface as ThreadPool and ProcessPool
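What "the same interface" buys can be sketched with the standard library alone. The slides do not show how a LambdaPool is constructed, so this example deliberately uses only `multiprocessing.pool.ThreadPool`; `square` and `run_with_pool` are hypothetical names. The point is that `run_with_pool` touches nothing beyond the Pool-style methods `apply`, `apply_async` and `map`, so any object offering those methods could be passed in its place.

```python
from multiprocessing.pool import ThreadPool


def square(x):
    return x * x


def run_with_pool(pool):
    """Use only the Pool-style methods: apply, apply_async and map."""
    single = pool.apply(square, (3,))
    pending = pool.apply_async(square, (4,))
    many = pool.map(square, [1, 2, 3])
    return single, pending.get(timeout=5), many


with ThreadPool(processes=2) as pool:
    outcome = run_with_pool(pool)
print(outcome)  # (9, 16, [1, 4, 9])
```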
Slide 16
LambdaExecutor
● Implements the same interface as ThreadPoolExecutor and ProcessPoolExecutor
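The executor interface can be illustrated the same way. The sketch below is not the LambdaExecutor implementation, which the slides do not show. `InlineExecutor` is a hypothetical stand-in that subclasses `concurrent.futures.Executor` and overrides only `submit`, which is enough for the inherited `map` and context-manager support to work. `word_lengths` is also a hypothetical helper and accepts any executor.

```python
from concurrent.futures import Executor, Future, ThreadPoolExecutor


class InlineExecutor(Executor):
    """Hypothetical stand-in: runs each call immediately in the caller's thread."""

    def submit(self, fn, *args, **kwargs):
        future = Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:
            future.set_exception(exc)
        return future


def word_lengths(executor, words):
    """Works with any object that follows the Executor interface."""
    return list(executor.map(len, words))


words = ["lambda", "pool"]
with ThreadPoolExecutor(max_workers=2) as threads:
    from_threads = word_lengths(threads, words)
with InlineExecutor() as inline:
    from_inline = word_lengths(inline, words)
print(from_threads, from_inline)  # [6, 4] [6, 4]
```

An executor that sends each call to a remote function would follow the same shape, with `submit` dispatching the call and returning a future that is completed when the response arrives.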
Slide 17
Demo
Slide 18
Internals - Creating/Updating a Function
Slide 19
Internals - Invoking a function
Slide 20
Benefits
● Compute Time
● Compute Costs
● Developer Time
Slide 21
Current Limitations
● Serialization of the payload is a hurdle
● Decoupling between function provisioning and invocation
● Size of execution environment
Inherent to Serverless
● Cold start issues
● Additional Network Overhead
● Not suitable for long running workloads
● Troubleshooting is hard
● Local testing is difficult
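The payload serialization hurdle listed above can be illustrated with a small sketch. The slides do not say which serialization format the tool uses, so pickle wrapped in base64 inside a JSON message is purely an assumption here, and `encode_payload`, `decode_payload` and `can_serialize` are hypothetical helpers. Plain data round-trips, but some Python objects, such as lambdas, cannot be pickled at all.

```python
import base64
import json
import pickle


def encode_payload(args, kwargs):
    """Pickle the call arguments and wrap them in a JSON-safe string."""
    raw = pickle.dumps({"args": args, "kwargs": kwargs})
    return json.dumps({"payload": base64.b64encode(raw).decode("ascii")})


def decode_payload(message):
    """Reverse encode_payload on the receiving side."""
    raw = base64.b64decode(json.loads(message)["payload"])
    data = pickle.loads(raw)
    return data["args"], data["kwargs"]


def can_serialize(obj):
    """Report whether an object survives encode_payload as an argument."""
    try:
        encode_payload((obj,), {})
        return True
    except (pickle.PicklingError, AttributeError):
        return False


message = encode_payload((1, [2, 3]), {"scale": 0.5})
print(decode_payload(message))        # ((1, [2, 3]), {'scale': 0.5})
print(can_serialize([1, 2, 3]))       # True
print(can_serialize(lambda x: x))     # False
```

Only unpickle data from a trusted source, since `pickle.loads` can execute arbitrary code.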
Slide 22
Future Goals
● Distribute lambdapool through PyPI
● Permissions management system
● System to fetch execution logs
● Better layer management
● Make the function update process intelligent