Making Lambda simpler for data scientists

Slide 1

Slide 1 text

Nabarun Pal (@theonlynabarun) Making Lambda Simpler for Data Scientists

Slide 2

Slide 2 text

About Me
● Platform Engineer at rorodata
● Optimizing development time through simple abstractions/tooling
● Venturing into Container Orchestration and Serverless Computing
● Contributor to the Kubernetes ecosystem

Slide 3

Slide 3 text

Outline
● Genesis
● Present Constructs in Python - Threads and Processes
● Kubernetes
● Serverless
● The Abstraction
● Requirements
● API
● Internals
● Demo
● Performance Metrics
● Current Limitations
● Future Goals

Slide 4

Slide 4 text

Genesis

Slide 5

Slide 5 text

Multithreading
Pros
● Lightweight
● Shared state between multiple threads
● Works flawlessly for I/O-bound applications
Cons
● Subject to Global Interpreter Lock
● Context switching overhead
● Code prone to race conditions
● Does not work for CPU-bound tasks
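The "shared state" and "race conditions" bullets can be illustrated with a minimal sketch using only the standard library. The function names here (`increment_many`, `run_counter`) are hypothetical and chosen for illustration; they are not part of the talk.

```python
import threading


def increment_many(counter, lock, times):
    """Increment a shared counter `times` times, guarding each update."""
    for _ in range(times):
        # Without the lock, the read-modify-write below could interleave
        # across threads and lose updates (a race condition).
        with lock:
            counter["value"] += 1


def run_counter(num_threads=4, times=10_000):
    """Start several threads that all mutate the same dictionary."""
    counter = {"value": 0}  # shared state, visible to every thread
    lock = threading.Lock()
    threads = [
        threading.Thread(target=increment_many, args=(counter, lock, times))
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter["value"]


if __name__ == "__main__":
    print(run_counter())
```

The threads share `counter` directly, which is the convenience the slide lists as a pro; the lock is the price paid to keep that sharing correct.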

Slide 6

Slide 6 text

Multiprocessing
Pros
● Isolation of memory space
● Leverages multiple processors & cores
● GIL limitations don’t apply
● Synchronization primitives like locks are not needed unless sharing data
● Works well for CPU-bound tasks
Cons
● Sharing data between processes is a little bit complicated
● Bulky memory footprint
● Finite scaling (bounded by the resources of a single machine)
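As a rough illustration of the CPU-bound case, the sketch below fans a pure-Python computation out to worker processes with the standard library's `ProcessPoolExecutor`. The function names and the prime-counting workload are assumptions made for the example, not something shown in the talk.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, max_workers=2):
    """Run count_primes for each limit in a separate worker process.

    Each worker has its own interpreter and memory space, so the GIL of
    one process does not block the others.
    """
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(count_primes, limits))
```

When using this from a script, call `count_primes_parallel([...])` under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. The `max_workers` ceiling is where the finite scaling mentioned on the slide shows up: it cannot usefully exceed what one machine offers.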

Slide 7

Slide 7 text

Can Kubernetes help?
Pros
● Abstracts out infrastructure
● Simple interface
● Can scale based on workload
Cons
● Layer on top of VMs: slow to scale up/down
● Autoscaling is not a core functionality
● Needs dedicated time to manage

Slide 8

Slide 8 text

What about Serverless?
● Zero Infrastructure Management
● Near Infinite Scaling
● High Availability
● No Idle Resources
● Suitable for short-lived workloads

Slide 9

Slide 9 text

“All problems in computer science can be solved by another level of indirection.” (David Wheeler)

Slide 10

Slide 10 text

LambdaPool - “The Indirection”

Slide 11

Slide 11 text

Requirements
● Minimum overhead on users
● Simple way to create, delete, list and update lambda functions
● Coherent ways to invoke the lambda function
● Easy to use interface

Slide 12

Slide 12 text

Features
● CLI to create, list, update and delete functions
● Support for specifying function layers and listing the layers used by each function
● LambdaPool interface
● LambdaExecutor interface

Slide 13

Slide 13 text

Installation
$ pip install --user https://lambdapool-releases.s3.amazonaws.com/lambdapool-0.9.7.tar.gz

Slide 14

Slide 14 text

CLI

Slide 15

Slide 15 text

LambdaPool ● Implements the same interface as ThreadPool and ProcessPool
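To make "the same interface" concrete, here is a minimal sketch of the pool-style calls from the standard library's `multiprocessing.pool.ThreadPool`. It does not use lambdapool itself: the slides do not show how a `LambdaPool` is constructed, so the `pool_factory` argument is a hypothetical hook where such an object could be plugged in, assuming it really is a drop-in replacement.

```python
from multiprocessing.pool import ThreadPool


def square(x):
    return x * x


def run_with_pool(pool_factory, func, items):
    """Exercise the pool-style API on whatever pool the factory returns."""
    with pool_factory() as pool:
        # Blocking map: returns results in input order.
        mapped = pool.map(func, items)
        # Asynchronous single call: returns a handle, fetched with .get().
        async_result = pool.apply_async(func, (items[0],))
        single = async_result.get(timeout=10)
    return mapped, single


# Standard-library pool used here; a LambdaPool would be an assumption.
results, first = run_with_pool(lambda: ThreadPool(processes=4), square, [1, 2, 3, 4])
```

Code written against `map` and `apply_async` in this way would not need to change if the pool behind it were swapped, which appears to be the point of mirroring the interface.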

Slide 16

Slide 16 text

LambdaExecutor ● Implements the same interface as ThreadPoolExecutor and ProcessPoolExecutor
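The executor interface referred to here is the one defined in `concurrent.futures`. The sketch below shows its core calls (`submit`, `map`, `as_completed`) using `ThreadPoolExecutor`. How a `LambdaExecutor` is constructed is not shown in the slides, so the helper takes any executor object as a parameter; `word_count` and `count_all` are hypothetical names for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def word_count(text):
    return len(text.split())


def count_all(executor, texts):
    """Submit one task per text and collect results as they finish."""
    futures = {executor.submit(word_count, text): text for text in texts}
    return {futures[future]: future.result() for future in as_completed(futures)}


with ThreadPoolExecutor(max_workers=2) as executor:
    counts = count_all(executor, ["a b c", "hello world"])
    # executor.map preserves input order, unlike as_completed.
    mapped = list(executor.map(word_count, ["one", "two words"]))
```

Because `count_all` depends only on the executor protocol, it would work unchanged with any object that implements `submit` and returns standard futures.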

Slide 17

Slide 17 text

Demo

Slide 18

Slide 18 text

Internals - Creating/Updating a Function

Slide 19

Slide 19 text

Internals - Invoking a function

Slide 20

Slide 20 text

Benefits
● Compute Time
● Compute Costs
● Developer Time

Slide 21

Slide 21 text

Current Limitations
● Serialization of the payload is a hurdle
● Decoupling between function provisioning and invocation
● Size of execution environment
Inherent to Serverless
● Cold start issues
● Additional Network Overhead
● Not suitable for long running workloads
● Troubleshooting is hard
● Local testing
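The serialization hurdle can be sketched as follows, assuming the payload has to travel as JSON (which is what the AWS Lambda invocation API expects). Many Python objects are not JSON-serializable, so one possible workaround is to pickle the object and wrap the bytes in base64 text. This is an illustration only and not necessarily how lambdapool serializes payloads; all function names and the `"pickled"` key are hypothetical.

```python
import base64
import json
import pickle


def is_json_serializable(obj):
    """Return True if json.dumps accepts the object as-is."""
    try:
        json.dumps(obj)
    except TypeError:
        return False
    return True


def encode_payload(obj):
    """Pickle an arbitrary object and wrap it in a JSON document."""
    raw = pickle.dumps(obj)
    return json.dumps({"pickled": base64.b64encode(raw).decode("ascii")})


def decode_payload(payload):
    """Reverse encode_payload. Only unpickle data from a trusted source."""
    data = json.loads(payload)
    return pickle.loads(base64.b64decode(data["pickled"]))
```

The trade-off is that base64 inflates the payload by roughly a third, and both sides must have compatible class definitions available for unpickling, which ties back to the "size of execution environment" bullet.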

Slide 22

Slide 22 text

Future Goals
● Distribute lambdapool through PyPI
● Permissions management system
● System to fetch execution logs
● Better layer management
● Make the function update process intelligent

Slide 23

Slide 23 text

Conclusions https://github.com/rorodata/lambdapool

Slide 24

Slide 24 text

Thank You! Feedback: https://bit.ly/lambdapoolfeedback Contact Us: [email protected]