Anaconda Project

Talk Data to me
An introduction to data science portability, development and deployment
General Assembly, ATX
April 2017

Christine Doig

April 04, 2017

Transcript

  1. © 2016 Continuum Analytics - Confidential & Proprietary

    Anaconda Project: An introduction to data science portability, development and deployment
    Christine Doig, Product Manager and Senior Data Scientist, Continuum Analytics
  2. Agenda

    • What’s data science?
    • Introduction to Anaconda
    • Data science development and deployment
    • Anaconda and Docker
    • Anaconda Project
    • Anaconda Enterprise
  3. Data Science is not just Machine Learning…

    • Distributed Systems
    • Business Intelligence
    • Machine Learning & Statistics
    • Software & Web development
    • Scientific Computing & HPC
  4. Data Science is Interdisciplinary…

    • Distributed Systems: distributed file system, message passing, schedulers, resource managers
    • Business Intelligence: data warehouse, querying, reporting, data visualization, dashboards
    • Machine Learning & Statistics: classification, deep learning, regression, PCA
    • Software & Web development: web crawling, scraping, 3rd party data & API providers, software packaging, CI, testing
    • Scientific Computing & HPC: array computing, simulation, optimization, GPUs, multi-cores
  5. Open Source Communities Create Powerful Technologies for Data Science

    • Distributed Systems, Business Intelligence, Machine Learning & Statistics, Software & Web development, Scientific Computing & HPC
    • Numba, dask, xlwings, Blaze, Airflow
  6. Challenges in the open data science ecosystem

    How do you…?
    • Download and install data science libraries
    • Manage versions and dependencies
    • Upgrade libraries
    • Isolate dependencies between projects
  7. ANACONDA

    Python & R distribution with 1000+ curated packages that makes it easy to get started with Open Data Science
    • Distributed Systems, Business Intelligence, Web, Scientific Computing / HPC, Machine Learning / Statistics
    • Numba, dask, xlwings, Airflow, Blaze
  8. What’s in ANACONDA?

    • Anaconda Navigator: data science desktop graphical interface
    • Anaconda Project: data science portable encapsulation
    • Data science libraries: scikit-learn, Bokeh, TensorFlow, Jupyter, pandas, matplotlib, seaborn, dask, numba, …
  9. • Install data science libraries: $ conda install pandas

    • Manage package versions: $ conda install pandas=0.14
    • Create isolated environments: $ conda create -n myenv python=3.5 pandas=0.18
    • Update package version: $ conda update pandas
  10. Data Science libraries

    scikit-learn, Bokeh, TensorFlow, Jupyter, pandas, matplotlib, seaborn, dask, numba, …
    • Interactive data visualization
    • Data munging
    • Parallel computing
    • Deep learning
    • …
  11. Anaconda Project: data science portable encapsulation

    anaconda-project.yml
    • Define and manage:
      • project package dependencies
      • deployment commands
      • data
      • …
  12. Anaconda Navigator: data science desktop graphical interface

    • Launch applications
    • Manage package versions and environments
    • Create and upload projects
  13. What do data scientists develop?

    Workflows: Data; Query; Visualize; Clean & Tidy; Predict, Simulate, & Optimize
    • Interactive data visualizations and dashboards
    • Jupyter notebooks
    • Scripts
    • Predictive models
    • Processed Data
  14. Data Science Development (Laptop)

    • scikit-learn, Bokeh, TensorFlow, Jupyter, pandas, matplotlib, seaborn, dask, numba
    • script 1, script 2, notebook A, dataset Z, script 3
    • Python, R
  15. Challenges in data science development and deployment

    How do you…?
    • Share your data science project with others
    • Ensure that you can reproduce your analysis
    • Deploy your project
  16. The Path to easy Data Science Deployment!

    • Anaconda Enterprise
    • DIY
    • Anaconda Project
    • Anaconda
    • Docker containers
    • conda env 1, conda env 2, conda env 3
  17. Data Science Development (Laptop): conda env 1, conda env 2, conda env 3; Analysis 1, Analysis 2, Analysis 3

    Data Science Deployment (Server, Docker container): conda env 1, conda env 2, conda env 3; Analysis 1, Analysis 2, Analysis 3
  18. Conda and Docker

    • Dependencies
    • Data
    • Deployment commands
    • Security
    • Scalability
    • Availability
  19. Learn more

    • ANACONDA AND DOCKER - BETTER TOGETHER FOR REPRODUCIBLE DATA SCIENCE (Monday, June 20, 2016): https://www.continuum.io/blog/developer-blog/anaconda-and-docker-better-together-reproducible-data-science
    • ANACONDA FOR R USERS: SPARKR AND RBOKEH (Monday, February 1, 2016): https://www.continuum.io/blog/developer-blog/anaconda-r-users-sparkr-and-rbokeh
    • JUPYTER AND CONDA FOR R (Monday, September 7, 2015): https://www.continuum.io/blog/developer/jupyter-and-conda-r
    • CONDA FOR DATA SCIENCE (Thursday, May 21, 2015): https://www.continuum.io/content/conda-data-science
  20. Data Science Development (Laptop): Project 1, Project 2, Project 3

    Data Science Deployment (Server): Project 1, Project 2, Project 3
  21. Data Science Development (Laptop): Project 1, Project 2, Project 3

    Data Science Deployment (Server, Docker container): Project 1, Project 2, Project 3
  22. Anaconda Project

    • Dependencies
    • Data
    • Deployment commands
    • Security
    • Scalability
    • Availability
  23. Learn more

    • ANACONDA PROJECT: http://anaconda-project.readthedocs.io/en/latest/
  24. Data Science Development (Laptop): Project 1, Project 2, Project 3

    Data Science Development and Deployment (Anaconda Enterprise): Project 1, Project 2, Project 3; Container 1, Container 2, Container 3, Container 4
  25. Anaconda Enterprise

    • Dependencies
    • Data
    • Deployment commands
    • Security
    • Scalability
    • Availability
  26. Learn more

    • PRODUCTIONIZING AND DEPLOYING DATA SCIENCE PROJECTS (Wednesday, February 1, 2017): https://www.continuum.io/blog/developer-blog/productionizing-and-deploying-data-science-projects
    • SECURE AND SCALABLE DATA SCIENCE DEPLOYMENTS WITH ANACONDA (Monday, February 27, 2017): https://www.continuum.io/blog/developer-blog/secure-and-scalable-data-science-deployments-anaconda
    • ANNOUNCING ANACONDA PROJECT: DATA SCIENCE PROJECT ENCAPSULATION AND DEPLOYMENT, THE EASY WAY! (Monday, March 20, 2017): https://www.continuum.io/blog/developer-blog/%E2%80%8Banaconda-project-data-science-project-encapsulation-deployment
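
Slide 11 describes anaconda-project.yml as the file that defines a project's package dependencies, deployment commands and data. The following is a minimal sketch of what such a file might contain, written out from Python so it can be dropped into a project directory. Everything specific in it is an assumption made for illustration: the project name, the package pins (borrowed from the conda slide), the analysis.py command and the example.com URL are hypothetical placeholders, and the top-level keys (name, packages, commands, downloads) are assumed from the Anaconda Project documentation linked on slide 23, so check them against the version you have installed.

```python
import tempfile
from pathlib import Path

# Hypothetical project description. The keys are assumed from the
# Anaconda Project docs; the values are illustrative placeholders only.
PROJECT_YAML = """\
name: example-analysis

packages:
  - python=3.5
  - pandas=0.18

commands:
  default:
    unix: python analysis.py

downloads:
  DATA:
    url: https://example.com/data.csv
"""


def write_project_file(project_dir):
    """Write the sketch above to <project_dir>/anaconda-project.yml."""
    project_dir = Path(project_dir)
    project_dir.mkdir(parents=True, exist_ok=True)
    target = project_dir / "anaconda-project.yml"
    target.write_text(PROJECT_YAML, encoding="utf-8")
    return target


def top_level_sections(yaml_text):
    """List top-level keys by scanning unindented lines (no YAML parser needed)."""
    return [
        line.split(":", 1)[0]
        for line in yaml_text.splitlines()
        if line and not line[0].isspace() and not line.startswith("#") and ":" in line
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_project_file(Path(tmp) / "example-analysis")
        print(path.name)
        print(top_level_sections(path.read_text(encoding="utf-8")))
```

The three sections after name line up with the bullets on slide 11: packages for dependencies, commands for deployment commands, and downloads for data. With a file like this in place, the anaconda-project run command is meant to prepare the environment and data before starting the default command; treat the exact behaviour as version-dependent.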