Anaconda Project

Talk Data to me
An introduction to data science portability, development and deployment
General Assembly, ATX
April 2017

Christine Doig

April 04, 2017

Transcript

  1. © 2016 Continuum Analytics - Confidential & Proprietary
    Anaconda Project
    An introduction to data science portability,
    development and deployment
    Christine Doig, Product Manager and Senior Data Scientist
    Continuum Analytics

  2. Agenda
    • What’s data science?
    • Introduction to Anaconda
    • Data science development and deployment
    • Anaconda and Docker
    • Anaconda Project
    • Anaconda Enterprise

  3. What’s data science?

  4. Data Science is not just Machine Learning…
    • Distributed Systems
    • Business Intelligence
    • Machine Learning & Statistics
    • Software & Web development
    • Scientific Computing & HPC

  5. Data Science is Interdisciplinary…
    • Distributed Systems: distributed file systems, message passing, schedulers, resource managers
    • Business Intelligence: data warehouse, querying, reporting, data visualization, dashboards
    • Machine Learning & Statistics: classification, deep learning, regression, PCA
    • Software & Web development: web crawling, scraping, 3rd-party data & API providers, software packaging, CI, testing
    • Scientific Computing & HPC: array computing, simulation, optimization, GPUs, multi-cores

  6. Open Source Communities Create Powerful Technologies for Data Science
    • Fields: Distributed Systems, Business Intelligence, Machine Learning & Statistics, Software & Web development, Scientific Computing & HPC
    • Projects shown: Numba, dask, xlwings, Blaze, Airflow

  7. Challenges in the open data science ecosystem
    How do you…?
    • Download and install data science libraries
    • Manage versions and dependencies
    • Upgrade libraries
    • Isolate dependencies between projects

  8. Introduction to Anaconda

  9. ANACONDA
    A Python & R distribution, with 1000+ curated packages, that makes it easy to get started with Open Data Science
    • Fields: Distributed Systems, Business Intelligence, Web, Scientific Computing / HPC, Machine Learning / Statistics
    • Projects shown: Numba, dask, xlwings, Airflow, Blaze

  10. https://www.continuum.io/downloads

  11. What’s in ANACONDA?
    • Anaconda Navigator: data science desktop graphical interface
    • Anaconda Project: data science portable encapsulation
    • Data Science libraries: scikit-learn, Bokeh, TensorFlow, Jupyter, pandas, matplotlib, seaborn, dask, numba

  12. conda
    • Install data science libraries
    $ conda install pandas
    • Manage package versions
    $ conda install pandas=0.14
    • Create isolated environments
    $ conda create -n myenv python=3.5 pandas=0.18
    • Update package version
    $ conda update pandas
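The commands above change one environment on one machine. To share that environment with someone else, conda can also build it from an `environment.yml` file. A minimal sketch, reusing the `myenv` name and the `python=3.5 pandas=0.18` pins from the `conda create` example; the `defaults` channel is an assumption:

```shell
# Write an environment file that captures the same pins as
# `conda create -n myenv python=3.5 pandas=0.18`.
# The channel list is an assumption; adjust it to your own setup.
cat > environment.yml <<'EOF'
name: myenv
channels:
  - defaults
dependencies:
  - python=3.5
  - pandas=0.18
EOF
```

Anyone with conda can then run `conda env create -f environment.yml` to recreate `myenv`. Going the other way, `conda env export -n myenv > environment.yml` captures an existing environment with the exact version of every dependency, which is the stricter option when the goal is reproducing an analysis.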

  13. Data Science libraries
    scikit-learn, Bokeh, TensorFlow, Jupyter, pandas, matplotlib, seaborn, dask, numba
    • Interactive data visualization
    • Data munging
    • Parallel computing
    • Deep learning
    • …

  14. Anaconda Project
    Data science portable encapsulation
    anaconda-project.yml
    • Define and manage:
      • project package dependencies
      • deployment commands
      • data
      • …
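Because `anaconda-project.yml` is plain YAML, it can be written by hand as well as generated with `anaconda-project init`. The sketch below is a minimal, hypothetical project file covering the three items on the slide: package dependencies (`packages`), a command (`commands`, here a notebook) and data (`downloads`). The project name, notebook file and download URL are placeholders, not part of the original talk.

```shell
# Create a hypothetical project directory with a hand-written project file.
mkdir -p myproject
cat > myproject/anaconda-project.yml <<'EOF'
name: myproject

# Package dependencies, resolved with conda into a project-local environment.
packages:
  - python=3.5
  - pandas=0.18
  - notebook

# Deployment commands: "explore" opens a (hypothetical) notebook.
commands:
  explore:
    notebook: analysis.ipynb

# Data: fetched when missing; its local path is exposed as $MY_DATA.
# The URL is a placeholder.
downloads:
  MY_DATA:
    url: https://example.com/data.csv
EOF
```

From inside `myproject`, `anaconda-project run explore` should first prepare everything the file lists (create the environment, fetch `MY_DATA` if it is missing) and then launch the notebook; `anaconda-project prepare` does the setup without running a command. The same entries can be added from the command line with `anaconda-project add-packages`, `add-command` and `add-download` instead of editing the YAML directly.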

  15. Anaconda Navigator
    Data science desktop graphical interface
    • Launch applications
    • Manage package versions and environments
    • Create and upload projects

  16. Data Science development and deployment

  17. What do data scientists develop?
    • Workflows over data: query, clean & tidy, visualize; predict, simulate & optimize
    • Interactive data visualizations and dashboards
    • Jupyter notebooks
    • Scripts
    • Predictive models
    • Processed data

  18. Data Science Development on a laptop
    • Languages: Python, R
    • Libraries: scikit-learn, Bokeh, TensorFlow, Jupyter, pandas, matplotlib, seaborn, dask, numba
    • Project files: script 1, script 2, script 3, notebook A, dataset Z
    Python, R

  19. Challenges in data science development and deployment
    How do you…?
    • Share your data science project with others
    • Ensure that you can reproduce your analysis
    • Deploy your project

  20. The Path to easy Data Science Deployment!
    • Anaconda Enterprise
    • DIY
    • Anaconda Project
    • Anaconda
    • Docker containers
    • conda env 1, conda env 2, conda env 3

  21. Anaconda and Docker

  22. Data Science Development and Data Science Deployment
    • Laptop (development): Analysis 1, 2 and 3, each in its own conda environment (conda env 1, 2, 3)
    • Server (deployment): the same analyses and conda environments, inside a Docker container

  23. https://hub.docker.com/r/continuumio/anaconda/
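One way to combine the two, following the laptop/server picture in slide 22, is to build each analysis on top of the Anaconda image, so the container ships with the same conda packages used during development. A minimal sketch, assuming a hypothetical `analysis1/` directory whose entry point is a hypothetical `run_analysis.py`:

```shell
# Hypothetical layout: analysis1/ holds the code for one analysis.
mkdir -p analysis1
cat > analysis1/Dockerfile <<'EOF'
# Start from the Anaconda image published on Docker Hub.
FROM continuumio/anaconda
# Copy the analysis code (the build context) into the image.
COPY . /opt/analysis1
WORKDIR /opt/analysis1
# Run the (hypothetical) entry point with the image's conda Python.
CMD ["python", "run_analysis.py"]
EOF
```

`docker build -t analysis1 analysis1/` followed by `docker run --rm analysis1` then runs the analysis on any server with Docker. Extra packages can be added with a `RUN conda install -y <package>` line before `COPY`, so each container keeps its own dependencies, much as separate conda environments do on the laptop.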

  24. Conda and Docker
    • Dependencies
    • Data
    • Deployment commands
    • Security
    • Scalability
    • Availability

  25. Learn more
    ANACONDA AND DOCKER - BETTER TOGETHER FOR REPRODUCIBLE DATA SCIENCE
    Monday, June 20, 2016
    https://www.continuum.io/blog/developer-blog/anaconda-and-docker-better-together-reproducible-data-science
    ANACONDA FOR R USERS: SPARKR AND RBOKEH
    Monday, February 1, 2016
    https://www.continuum.io/blog/developer-blog/anaconda-r-users-sparkr-and-rbokeh
    JUPYTER AND CONDA FOR R
    Monday, September 7, 2015
    https://www.continuum.io/blog/developer/jupyter-and-conda-r
    CONDA FOR DATA SCIENCE
    Thursday, May 21, 2015
    https://www.continuum.io/content/conda-data-science

  26. Anaconda Project

  27. Data Science Development and Data Science Deployment
    • Laptop (development): Project 1, Project 2, Project 3
    • Server (deployment): Project 1, Project 2, Project 3

  28. Data Science Development and Data Science Deployment
    • Laptop (development): Project 1, Project 2, Project 3
    • Server (deployment): Project 1, Project 2, Project 3, inside a Docker container

  29. Anaconda Project
    • Dependencies
    • Data
    • Deployment commands
    • Security
    • Scalability
    • Availability

  30. Learn more
    ANACONDA PROJECT
    http://anaconda-project.readthedocs.io/en/latest/

  31. Anaconda Enterprise

  32. Laptop and Anaconda Enterprise
    • Laptop (Data Science Development): Project 1, Project 2, Project 3
    • Anaconda Enterprise (Data Science Development and Deployment): Project 1, Project 2, Project 3, with Containers 1 to 4

  33. Anaconda Enterprise
    • Dependencies
    • Data
    • Deployment commands
    • Security
    • Scalability
    • Availability

  34.

  35. Learn more
    PRODUCTIONIZING AND DEPLOYING DATA SCIENCE PROJECTS
    Wednesday, February 1, 2017
    https://www.continuum.io/blog/developer-blog/productionizing-and-deploying-data-science-projects
    SECURE AND SCALABLE DATA SCIENCE DEPLOYMENTS WITH ANACONDA
    Monday, February 27, 2017
    https://www.continuum.io/blog/developer-blog/secure-and-scalable-data-science-deployments-anaconda
    ANNOUNCING ANACONDA PROJECT: DATA SCIENCE PROJECT ENCAPSULATION AND DEPLOYMENT, THE EASY WAY!
    Monday, March 20, 2017
    https://www.continuum.io/blog/developer-blog/%E2%80%8Banaconda-project-data-science-project-encapsulation-deployment

  36. https://speakerdeck.com/chdoig
    @ch_doig

  37. Questions?
