Making Lambda simpler for data scientists

Nabarun Pal
July 27, 2019

Presented at AWS Community Day Bengaluru 2019

Transcript

  1. Nabarun Pal (@theonlynabarun)
    Making Lambda Simpler
    for Data Scientists

  2. About Me
    ● Platform Engineer at rorodata
    ● Optimizing development time through simple abstractions/tooling
    ● Venturing into Container Orchestration and Serverless Computing
    ● Contributor to the Kubernetes ecosystem

  3. Outline
    ● Genesis
    ● Present Constructs in Python - Threads and Processes
    ● Kubernetes
    ● Serverless
    ● The Abstraction
    ● Requirements
    ● API
    ● Internals
    ● Demo
    ● Performance Metrics
    ● Current Limitations
    ● Future Goals

  4. Genesis

  5. Multithreading
    Pros
    ● Lightweight
    ● Shared state between multiple threads
    ● Works flawlessly for I/O-bound applications
    Cons
    ● Subject to Global Interpreter Lock
    ● Context switching overhead
    ● Code prone to race conditions
    ● Does not speed up CPU-bound tasks
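
The I/O-bound case above can be illustrated with a minimal sketch. It is not from the talk: `fake_download` is a hypothetical stand-in that sleeps instead of doing real network I/O, which releases the GIL in the same way a blocking network call would.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(item_id):
    """Hypothetical I/O-bound call: sleeping stands in for a network wait."""
    time.sleep(0.2)
    return item_id * 2


def run_sequential(ids):
    # One call after another: total time is roughly 0.2 s per item.
    return [fake_download(i) for i in ids]


def run_threaded(ids, workers=4):
    # The waits overlap because a sleeping thread does not hold the GIL.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(fake_download, ids))


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Timing `run_sequential` against `run_threaded` on the same ids should show the threaded version finishing sooner. Replacing the sleep with a pure-Python computation would remove that advantage, which is the CPU-bound limitation listed under Cons.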

  6. Multiprocessing
    Pros
    ● Isolation of memory space
    ● Leverages multiple processors & cores
    ● GIL limitations don’t apply
    ● Synchronization primitives like locks are unnecessary unless sharing data
    ● Works well for CPU-bound tasks
    Cons
    ● Sharing data between processes is a little bit complicated
    ● Bulky memory footprint
    ● Finite scaling, capped by the cores of a single machine
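
A minimal sketch of the CPU-bound case, using the standard library's `ProcessPoolExecutor`. The workload (`count_primes`) and the limits are illustrative choices, not taken from the talk.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """CPU-bound work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def main():
    limits = [20_000, 30_000, 40_000, 50_000]
    # Each call runs in a separate process with its own interpreter and GIL,
    # so the calls can use several cores at once.
    with ProcessPoolExecutor() as executor:
        for limit, primes in zip(limits, executor.map(count_primes, limits)):
            print(f"primes below {limit}: {primes}")


if __name__ == "__main__":
    # The guard keeps worker processes from re-running main() on import.
    main()
```

The pool defaults to one worker per available core, so throughput stops improving once every core on the machine is busy.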

  7. Can Kubernetes help?
    Pros
    ● Abstracts out infrastructure
    ● Simple interface
    ● Can scale based on workload
    Cons
    ● Layer on top of VMs - slow to scale up/down
    ● Autoscaling is not a core functionality
    ● Needs dedicated time to manage

  8. What about Serverless?
    ● Zero Infrastructure Management
    ● Near Infinite Scaling
    ● High Availability
    ● No Idle Resources
    ● Suitable for short-lived workloads

  9. All problems in computer science can be
    solved by another level of indirection
    David Wheeler

  10. LambdaPool
    - “The Indirection”

  11. Requirements
    ● Minimum overhead on users
    ● Simple way to create, delete, list and update lambda functions
    ● Coherent ways to invoke the lambda function
    ● Easy to use interface

  12. Features
    ● CLI to create, list, update and delete functions
    ● Support for specifying function layers and listing the layers used by each function
    ● LambdaPool interface
    ● LambdaExecutor interface

  13. Installation
    $ pip install --user https://lambdapool-releases.s3.amazonaws.com/lambdapool-0.9.7.tar.gz

  14. CLI

  15. LambdaPool
    ● Implements the same interface as Python's multiprocessing pools (ThreadPool and the process-based Pool)
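
For reference, this is the pool-style interface in question, shown with the standard library's `ThreadPool`. The transcript does not include LambdaPool's constructor or arguments, so none are shown here; `square_all` is a hypothetical helper written only against the shared `map()` method.

```python
from multiprocessing.pool import ThreadPool


def square(x):
    return x * x


def square_all(pool, numbers):
    """Works with any pool object that exposes a Pool-style map()."""
    return pool.map(square, numbers)


# Standard-library pool; map() blocks and returns results in input order.
with ThreadPool(processes=4) as pool:
    results = square_all(pool, range(5))
```

If LambdaPool mirrors this interface as the slide states, code written like `square_all` would presumably need no changes beyond the line that constructs the pool.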

  16. LambdaExecutor
    ● Implements the same interface as ThreadPoolExecutor and ProcessPoolExecutor
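
The executor-style interface can be sketched with the standard library's `concurrent.futures`. The helper `count_all` is hypothetical and depends only on `submit()` and on the returned futures' `result()`. Whether LambdaExecutor returns standard `Future` objects is not stated in the slides, so that part is an assumption.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def word_count(text):
    return len(text.split())


def count_all(executor: Executor, texts):
    """Submit one task per text and collect results in submission order."""
    futures = [executor.submit(word_count, text) for text in texts]
    return [future.result() for future in futures]


# Any object implementing the Executor interface can be passed in.
with ThreadPoolExecutor(max_workers=2) as executor:
    counts = count_all(executor, ["making lambda simpler", "for data scientists"])
```

Writing the calling code against the abstract interface is what makes swapping a thread, process, or Lambda-backed executor a one-line change.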

  17. Demo

  18. Internals - Creating/Updating a Function

  19. Internals - Invoking a function

  20. Benefits
    ● Compute Time
    ● Compute Costs
    ● Developer Time

  21. Current Limitations
    ● Serialization of the payload is a hurdle
    ● Decoupling between function provisioning and invocation
    ● Size of execution environment
    Inherent to Serverless
    ● Cold start issues
    ● Additional Network Overhead
    ● Not suitable for long running workloads
    ● Troubleshooting is hard
    ● Local testing
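
One way the serialization hurdle can show up is sketched below. The slides do not say how lambdapool encodes arguments, so the envelope format here (pickle, then base64, wrapped in JSON) is purely an assumption for illustration. The 6 MB figure is AWS's documented request payload quota for synchronous Lambda invocations.

```python
import base64
import json
import pickle

# Documented AWS quota for a synchronous invocation request payload.
SYNC_PAYLOAD_LIMIT = 6 * 1024 * 1024


def encode_payload(args):
    """Hypothetical envelope: pickle the arguments, base64 them, wrap in JSON."""
    blob = base64.b64encode(pickle.dumps(args)).decode("ascii")
    return json.dumps({"args": blob})


def decode_payload(payload):
    """Inverse of encode_payload, as the function side would run it."""
    return pickle.loads(base64.b64decode(json.loads(payload)["args"]))


def fits_sync_invoke(payload):
    """True if the encoded payload is within the synchronous request limit."""
    return len(payload.encode("utf-8")) <= SYNC_PAYLOAD_LIMIT
```

Base64 inflates binary data by about a third, so under this assumed encoding 5 MiB of raw bytes would already exceed the limit even though the raw data is smaller than 6 MB.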

  22. Future Goals
    ● Distribute lambdapool through PyPI
    ● Permissions management system
    ● System to fetch execution logs
    ● Better layer management
    ● Make the function update process intelligent

  23. Conclusions
    https://github.com/rorodata/lambdapool

  24. Thank You!
    Feedback: https://bit.ly/lambdapoolfeedback
    Contact Us: [email protected]
