Making Lambda simpler for data scientists

Nabarun Pal
July 27, 2019

Presented at AWS Community Day Bengaluru 2019


Transcript

  1. About Me • Platform Engineer at rorodata • Optimizing development time through simple abstractions/tooling • Venturing into Container Orchestration and Serverless Computing • Contributor to the Kubernetes ecosystem
  2. Outline • Genesis • Present Constructs in Python - Threads and Processes • Kubernetes • Serverless • The Abstraction • Requirements • API • Internals • Demo • Performance Metrics • Current Limitations • Future Goals
  3. Multithreading
    Pros • Lightweight • Shared state between multiple threads • Works well for I/O-bound applications
    Cons • Subject to the Global Interpreter Lock (GIL) • Context switching overhead • Code prone to race conditions • Does not speed up CPU-bound tasks
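As a rough illustration of the I/O-bound case on this slide, the sketch below runs several simulated I/O calls on a thread pool. `fake_io` and `run_threaded` are hypothetical names made up for this example, and `time.sleep` stands in for a real network or disk call; it is not code from the talk.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    """Stand-in for a network or disk call; sleeping releases the GIL."""
    time.sleep(delay)
    return delay


def run_threaded(delays, workers=4):
    """Run the simulated I/O calls concurrently.

    Returns the results in input order and the elapsed wall-clock seconds.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fake_io, delays))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_threaded([0.2] * 4)
    # Run back to back, the four calls would take about 0.8 s.
    print(results, f"{elapsed:.2f}s")
```

Because the threads spend their time waiting rather than computing, the GIL is not the bottleneck here. Replacing `fake_io` with a pure-Python computation would remove most of the benefit.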
  4. Multiprocessing
    Pros • Isolation of memory space • Leverages multiple processors & cores • GIL limitations don’t apply • Synchronization primitives like locks are unnecessary unless sharing data • Works well for CPU-bound tasks
    Cons • Sharing data between processes is more complicated • Bulky memory footprint • Limited scaling (bounded by the cores available)
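For the CPU-bound case, a minimal sketch with a process pool might look like the following. The prime-counting workload and the names `count_primes`, `split_range` and `parallel_count` are assumptions chosen for illustration, not something shown in the talk.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(bounds):
    """Count primes in [lo, hi) by trial division; deliberately CPU-heavy."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def split_range(limit, parts):
    """Split [0, limit) into at most `parts` contiguous (lo, hi) chunks."""
    step = -(-limit // parts)  # ceiling division
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def parallel_count(limit, workers=4):
    """Count primes below `limit`, one chunk per worker process."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, split_range(limit, workers)))


if __name__ == "__main__":
    # The guard matters: worker processes may re-import this module.
    print(parallel_count(20_000))
```

Each chunk is pickled and sent to a separate process, which is where the extra memory footprint and the data-sharing friction listed under Cons come from. The number of useful workers is capped by the cores on the machine.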
  5. Can Kubernetes help?
    Pros • Abstracts out infrastructure • Simple interface • Can scale based on workload
    Cons • Layer on top of VMs, so slow to scale up/down • Autoscaling is not a core functionality • Needs dedicated time to manage
  6. What about Serverless? • Zero Infrastructure Management • Near-Infinite Scaling • High Availability • No Idle Resources • Suitable for short-lived workloads
  7. Requirements • Minimal overhead for users • Simple way to create, delete, list and update Lambda functions • Coherent ways to invoke a Lambda function • Easy-to-use interface
  8. Features • CLI to create, list, update and delete functions • Support for specifying function layers and listing the layers used by each function • LambdaPool interface • LambdaExecutor interface
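The transcript names a LambdaPool interface but does not show its API. The sketch below is therefore only a guess at what a pool-style wrapper over remote function invocation could look like. `SketchLambdaPool`, its methods and the `local_invoke` stub are all hypothetical names invented for this example, and the stub runs the function in-process so the code works without AWS credentials.

```python
import json
from concurrent.futures import ThreadPoolExecutor


class SketchLambdaPool:
    """Hypothetical pool that fans calls out to a remotely deployed function.

    `invoke` is any callable taking (function_name, json_payload) and
    returning a JSON string of the form {"result": ...}.
    """

    def __init__(self, function_name, invoke, workers=4):
        self.function_name = function_name
        self._invoke = invoke
        # Threads suffice because each invocation is a blocking network call.
        self._executor = ThreadPoolExecutor(max_workers=workers)

    def apply(self, *args, **kwargs):
        """Invoke the function once and return its decoded result."""
        payload = json.dumps({"args": args, "kwargs": kwargs})
        return json.loads(self._invoke(self.function_name, payload))["result"]

    def map(self, iterable):
        """Invoke the function once per item, concurrently, keeping order."""
        return list(self._executor.map(self.apply, iterable))

    def close(self):
        self._executor.shutdown()


def local_invoke(function_name, payload):
    """Stub backend (assumption): runs the 'remote' function in-process."""
    registry = {"square": lambda x: x * x}
    event = json.loads(payload)
    result = registry[function_name](*event["args"], **event["kwargs"])
    return json.dumps({"result": result})


if __name__ == "__main__":
    pool = SketchLambdaPool("square", local_invoke)
    print(pool.map([1, 2, 3]))
    pool.close()
```

Against real AWS Lambda, the `invoke` callable could presumably wrap the boto3 Lambda client's `invoke` call, but that wiring is left out here because the talk's actual implementation is not part of the transcript.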
  9. CLI

  10. Current Limitations • Serialization of the payload is a hurdle • Decoupling between function provisioning and invocation • Size of execution environment
    Inherent to Serverless • Cold start issues • Additional network overhead • Not suitable for long-running workloads • Troubleshooting is hard • Local testing
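To make the serialization hurdle concrete: Lambda invocation payloads are JSON, and many Python objects are not JSON-serializable. One possible workaround, sketched below under that assumption, is to pickle the object and wrap the bytes as base64 text inside a JSON envelope. The helper names and the `"pickled"` key are hypothetical, and this is not necessarily how lambdapool handles it.

```python
import base64
import json
import pickle


def encode_payload(obj):
    """Pickle `obj` and wrap it as base64 text inside a JSON document."""
    blob = base64.b64encode(pickle.dumps(obj)).decode("ascii")
    return json.dumps({"pickled": blob})


def decode_payload(payload):
    """Reverse of encode_payload. Only unpickle data from a trusted source."""
    blob = json.loads(payload)["pickled"]
    return pickle.loads(base64.b64decode(blob))


if __name__ == "__main__":
    data = {"ids": {1, 2, 3}, "raw": b"\x00\x01"}
    try:
        json.dumps(data)  # sets and bytes are not JSON-serializable
    except TypeError as exc:
        print("plain JSON fails:", exc)
    print(decode_payload(encode_payload(data)) == data)
```

The trade-offs are real: base64 inflates the payload by roughly a third, both sides need compatible Python and library versions to unpickle, and unpickling untrusted input is unsafe.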
  11. Future Goals • Distribute lambdapool through PyPI • Permissions management system • System to fetch execution logs • Better layer management • Make the function update process intelligent