Making Lambda simpler for data scientists

Nabarun Pal
July 27, 2019

Presented at AWS Community Day Bengaluru 2019

Transcript

  1. Nabarun Pal (@theonlynabarun)
    Making Lambda Simpler
    for Data Scientists

  2. About Me
    ● Platform Engineer at rorodata
    ● Optimizing development time through simple abstractions/tooling
    ● Venturing into Container Orchestration and Serverless Computing
    ● Contributor to the Kubernetes ecosystem

  3. Outline
    ● Genesis
    ● Present Constructs in Python - Threads and Processes
    ● Kubernetes
    ● Serverless
    ● The Abstraction
    ● Requirements
    ● API
    ● Internals
    ● Demo
    ● Performance Metrics
    ● Current Limitations
    ● Future Goals

  4. Multithreading
    Pros
    ● Lightweight
    ● Shared state between multiple threads
    ● Works flawlessly for I/O-bound applications
    Cons
    ● Subject to Global Interpreter Lock
    ● Context switching overhead
    ● Code prone to race conditions
    ● Does not speed up CPU-bound tasks
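
    The I/O-bound case above can be sketched with the standard library. This is a minimal illustration, not code from the talk: `fake_download` and `run_threaded` are hypothetical names, and `time.sleep` stands in for a real network or disk call, since a sleeping thread releases the GIL just as a thread waiting on I/O does.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(item_id):
    """Hypothetical stand-in for an I/O-bound call (network, disk)."""
    time.sleep(0.2)  # the GIL is released while the thread waits
    return item_id * 2


def run_threaded(item_ids, workers=4):
    """Run fake_download over item_ids on a thread pool and time it."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # executor.map preserves input order in its results
        results = list(executor.map(fake_download, item_ids))
    return results, time.perf_counter() - start


results, elapsed = run_threaded([1, 2, 3, 4])
print(results, f"{elapsed:.2f}s")
```

    Four 0.2-second waits overlap here, so the total should stay well below the 0.8 seconds a serial loop would take. A CPU-bound function in place of `fake_download` would see no such gain, because only one thread executes Python bytecode at a time.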

  5. Multiprocessing
    Pros
    ● Isolation of memory space
    ● Leverages multiple processors & cores
    ● GIL limitations don’t apply
    ● Synchronization primitives like locks are not needed unless data is shared
    ● Works well for CPU-bound tasks
    Cons
    ● Sharing data between processes is a little bit complicated
    ● Bulky memory footprint
    ● Finite scaling (bounded by the cores of a single machine)

  6. Can Kubernetes help?
    Pros
    ● Abstracts out infrastructure
    ● Simple interface
    ● Can scale based on workload
    Cons
    ● Layer on top of VMs - slow to scale up/down
    ● Autoscaling is not a core functionality
    ● Needs dedicated time to manage

  7. What about Serverless?
    ● Zero Infrastructure Management
    ● Near Infinite Scaling
    ● High Availability
    ● No Idle Resources
    ● Suitable for short-lived workloads

  8. All problems in computer science can be
    solved by another level of indirection
    David Wheeler

  9. LambdaPool
    - “The Indirection”

  10. Requirements
    ● Minimum overhead on users
    ● Simple way to create, delete, list and update lambda functions
    ● Coherent ways to invoke the lambda function
    ● Easy to use interface

  11. Features
    ● CLI to create, list, update and delete functions
    ● Support for specifying function layers and listing the layers used by each function
    ● LambdaPool interface
    ● LambdaExecutor Interface

  12. Installation
    $ pip install --user https://lambdapool-releases.s3.amazonaws.com/lambdapool-0.9.7.tar.gz

  13. LambdaPool
    ● Implements the same interface as ThreadPool and ProcessPool
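
    For reference, the pool interface the slide refers to looks like this in the standard library. The transcript does not show LambdaPool's import path or constructor arguments, so this sketch uses `multiprocessing.pool.ThreadPool` only; `square` is a hypothetical example function. If LambdaPool mirrors this interface as stated, the calls to `map` and `apply_async` would presumably carry over.

```python
from multiprocessing.pool import ThreadPool


def square(x):
    """Hypothetical work function; any callable taking one argument works."""
    return x * x


with ThreadPool(processes=4) as pool:
    # map blocks until every input has been processed, preserving order
    squares = pool.map(square, range(5))

    # apply_async returns immediately with a handle to the pending result
    async_result = pool.apply_async(square, (7,))
    single = async_result.get(timeout=10)

print(squares)  # [0, 1, 4, 9, 16]
print(single)   # 49
```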

  14. LambdaExecutor
    ● Implements the same interface as ThreadPoolExecutor and ProcessPoolExecutor
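
    The executor interface named on the slide is the `submit`/`map` API from `concurrent.futures`. The sketch below uses `ThreadPoolExecutor` from the standard library, because LambdaExecutor's own constructor is not shown in this transcript; `word_count` and `documents` are hypothetical illustration data.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def word_count(text):
    """Hypothetical work function: count whitespace-separated words."""
    return len(text.split())


documents = {"a": "one two three", "b": "four five", "c": "six"}

with ThreadPoolExecutor(max_workers=3) as executor:
    # submit schedules one call and returns a Future for its result
    futures = {
        executor.submit(word_count, text): name
        for name, text in documents.items()
    }
    # as_completed yields futures as they finish, in any order
    counts = {futures[f]: f.result() for f in as_completed(futures)}

print(counts)
```

    Code written against `submit` and `as_completed` does not depend on where the work runs, which is what makes an executor backed by Lambda functions a plausible drop-in.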

  15. Internals - Creating/Updating a Function

  16. Internals - Invoking a function

  17. Benefits
    ● Compute Time
    ● Compute Costs
    ● Developer Time

  18. Current Limitations
    ● Serialization of the payload is a hurdle
    ● Decoupling between function provisioning and invocation
    ● Size of execution environment
    Inherent to Serverless
    ● Cold start issues
    ● Additional Network Overhead
    ● Not suitable for long running workloads
    ● Troubleshooting is hard
    ● Local testing

  19. Future Goals
    ● Distribute lambdapool through PyPI
    ● Permissions management system
    ● System to fetch execution logs
    ● Better layer management
    ● Make the function update process intelligent

  20. Conclusions
    https://github.com/rorodata/lambdapool

  21. Thank You!
    Feedback: https://bit.ly/lambdapoolfeedback
    Contact Us: [email protected]
