Making Lambda simpler for data scientists

Nabarun Pal (@theonlynabarun) Making Lambda Simpler for Data Scientists

About Me
• Platform Engineer at rorodata
• Optimizing development time through simple abstractions/tooling
• Venturing into Container Orchestration and Serverless Computing
• Contributor to the Kubernetes ecosystem

Outline
• Genesis
• Present Constructs in Python - Threads and Processes
• Kubernetes
• Serverless
• The Abstraction
• Requirements
• API
• Internals
• Demo
• Performance Metrics
• Current Limitations
• Future Goals

Genesis

Multithreading
Pros
• Lightweight
• Shared state between multiple threads
• Works well for I/O-bound applications
Cons
• Subject to the Global Interpreter Lock (GIL)
• Context switching overhead
• Code prone to race conditions
• No speedup for CPU-bound tasks, since the GIL lets only one thread run Python bytecode at a time
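To illustrate the I/O-bound case from the multithreading slide, here is a minimal sketch using the standard library. The `fake_io` function and the 0.2 second sleep are hypothetical stand-ins for a real network or disk call; they are not from the talk.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(task_id: int) -> int:
    """Stand-in for a network or disk call; sleeping releases the GIL."""
    time.sleep(0.2)
    return task_id * 2


def run_threaded(n_tasks: int = 8):
    """Run all tasks on a thread pool and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        results = list(pool.map(fake_io, range(n_tasks)))
    return results, time.perf_counter() - start


results, elapsed = run_threaded()
print(results, f"{elapsed:.2f}s")
```

Run one after another, eight such calls would take about 1.6 seconds. With threads the waits overlap, so the total stays close to a single call's latency. Replacing the sleep with pure Python computation would remove that benefit because of the GIL.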

Multiprocessing
Pros
• Isolation of memory space
• Leverages multiple processors & cores
• GIL limitations don’t apply
• Synchronization primitives like locks are not needed unless sharing data
• Works well for CPU-bound tasks
Cons
• Sharing data between processes is more complicated
• Bulky memory footprint
• Definite scaling: bounded by the cores of a single machine
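The CPU-bound case from the multiprocessing slide can be sketched the same way with the standard library. The prime-counting workload and the input sizes are hypothetical examples chosen only because they keep a core busy.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit: int) -> int:
    """CPU-bound work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_in_processes(limits):
    """Give each input to a worker process; each worker has its own interpreter and GIL."""
    with ProcessPoolExecutor() as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on import.
    print(run_in_processes([10_000, 20_000, 30_000, 40_000]))
```

By default the pool sizes itself to the machine's core count, which is the scaling ceiling the last con refers to: adding more work beyond that only queues it.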

Can Kubernetes help?
Pros
• Abstracts out infrastructure
• Simple interface
• Can scale based on workload
Cons
• Layer on top of VMs, so slow to scale up/down
• Autoscaling is not a core functionality
• Needs dedicated time to manage

What about Serverless?
• Zero Infrastructure Management
• Near Infinite Scaling
• High Availability
• No Idle Resources
• Suitable for short-lived workloads
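For readers new to AWS Lambda, the unit of work is a handler function that the service calls once per invocation. The sketch below is not from the talk; the `numbers` field and the shape of the response are hypothetical and only show what such a short-lived workload can look like.

```python
import json


def handler(event, context):
    """Entry point that AWS Lambda calls once per invocation.

    `event` is the JSON-decoded request payload; `context` carries runtime
    metadata and is unused in this sketch.
    """
    numbers = event.get("numbers", [])
    return {"statusCode": 200, "body": json.dumps({"total": sum(numbers)})}


# Local call with a hand-built event; on Lambda the service supplies both arguments.
response = handler({"numbers": [1, 2, 3]}, None)
print(response)
```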

“All problems in computer science can be solved by another level of indirection.”
David Wheeler

LambdaPool - “The Indirection”

Requirements
• Minimum overhead on users
• Simple way to create, delete, list and update lambda functions
• Coherent ways to invoke the lambda function
• Easy to use interface

Features
• CLI to create, list, update and delete functions
• Support for specifying function layers and listing the layers used by each function
• LambdaPool interface
• LambdaExecutor interface

Installation
$ pip install --user https://lambdapool-releases.s3.amazonaws.com/lambdapool-0.9.7.tar.gz

LambdaPool
• Implements the same interface as ThreadPool and ProcessPool
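The transcript does not include LambdaPool's import path or constructor arguments, so the sketch below uses the standard library's `multiprocessing.pool.ThreadPool` to show the pool-style interface the slide refers to. That LambdaPool accepts the same calls is an assumption based on the slide's wording, not something shown in the deck.

```python
from multiprocessing.pool import ThreadPool


def square(x: int) -> int:
    return x * x


# Standard-library pool; a LambdaPool is assumed to expose the same methods.
with ThreadPool(processes=4) as pool:
    # map blocks until every result is ready and preserves input order.
    squares = pool.map(square, range(5))
    # apply_async returns a handle immediately; get() waits for the value.
    async_result = pool.apply_async(square, (7,))
    seven_squared = async_result.get(timeout=5)

print(squares, seven_squared)
```

The appeal of mirroring this interface is that existing code written against a thread or process pool would need few changes beyond the line that constructs the pool.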

LambdaExecutor
• Implements the same interface as ThreadPoolExecutor and ProcessPoolExecutor
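The executor-style interface can be sketched with `concurrent.futures.ThreadPoolExecutor` from the standard library. The transcript does not show how a LambdaExecutor is constructed, so treating it as interchangeable with the executor below is an assumption; `word_count` and the sample documents are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def word_count(text: str) -> int:
    return len(text.split())


docs = ["a b c", "hello world", "one"]

# Standard-library executor; a LambdaExecutor is assumed to expose the same methods.
with ThreadPoolExecutor(max_workers=3) as executor:
    # submit returns a Future per task; keep a mapping back to the input.
    futures = {executor.submit(word_count, doc): doc for doc in docs}
    # as_completed yields futures as they finish, in no guaranteed order.
    counts = {futures[f]: f.result() for f in as_completed(futures)}

print(counts)
```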

Internals - Creating/Updating a Function

Internals - Invoking a function

Benefits
• Compute Time
• Compute Costs
• Developer Time

Current Limitations
• Payload serialization is a hurdle
• Decoupling between function provisioning and invocation
• Size of execution environment
Inherent to Serverless
• Cold start issues
• Additional network overhead
• Not suitable for long-running workloads
• Troubleshooting is hard
• Local testing
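The serialization limitation listed first can be illustrated with the standard library. A Lambda invocation payload is JSON, but the transcript does not say which serializer lambdapool uses, so this is a general sketch of the problem rather than a description of its internals. `to_payload` is a hypothetical helper.

```python
import json
import pickle


def to_payload(args: dict) -> str:
    """Serialize call arguments as JSON, the format of a Lambda invocation payload."""
    return json.dumps(args)


# Plain data round-trips through JSON without trouble.
ok = to_payload({"rows": [1, 2, 3], "scale": 0.5})

# Types common in data science code do not: sets, bytes and most custom
# objects raise TypeError unless they are converted first.
try:
    to_payload({"labels": {"cat", "dog"}})
    json_failed = False
except TypeError:
    json_failed = True

# pickle covers more types, but the standard pickler cannot serialize a lambda.
try:
    pickle.dumps(lambda x: x + 1)
    pickle_failed = False
except (pickle.PicklingError, AttributeError):
    pickle_failed = True

print(ok, json_failed, pickle_failed)
```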

Future Goals
• Distribute lambdapool through PyPI
• Permissions management system
• System to fetch execution logs
• Better layer management
• Make the function update process intelligent

Conclusions https://github.com/rorodata/lambdapool

Thank You! Feedback: https://bit.ly/lambdapoolfeedback Contact Us: [email protected]
