Anaconda Project

Talk Data to me
An introduction to data science portability, development and deployment
General Assembly, ATX
April 2017

Christine Doig

April 04, 2017

Transcript

  1. Anaconda Project

    An introduction to data science portability, development and deployment. Christine Doig, Product Manager and Senior Data Scientist, Continuum Analytics. © 2016 Continuum Analytics - Confidential & Proprietary
  2. Agenda

    • What’s data science?
    • Introduction to Anaconda
    • Data science development and deployment
    • Anaconda and Docker
    • Anaconda Project
    • Anaconda Enterprise
  3. What’s data science?

  4. Data Science is not just Machine Learning…

    • Distributed Systems
    • Business Intelligence
    • Machine Learning & Statistics
    • Software & Web development
    • Scientific Computing & HPC
  5. Data Science is Interdisciplinary…

    • Distributed Systems: distributed file system, message passing, schedulers, resource managers
    • Business Intelligence: data warehouse, querying, reporting, data visualization, dashboards
    • Machine Learning & Statistics: classification, deep learning, regression, PCA
    • Software & Web development: web crawling, scraping, 3rd party data & API providers, software packaging, CI, testing
    • Scientific Computing & HPC: array computing, simulation, optimization, GPUs, multi-cores
  6. Open Source Communities Create Powerful Technologies for Data Science

    Areas shown: Distributed Systems, Business Intelligence, Machine Learning & Statistics, Software & Web development, Scientific Computing & HPC. Projects shown: Numba, dask, xlwings, Blaze, Airflow.
  7. Challenges in the open data science ecosystem

    How do you…?
    • Download and install data science libraries
    • Manage versions and dependencies
    • Upgrade libraries
    • Isolate dependencies between projects
  8. Introduction to Anaconda

  9. ANACONDA

    Python & R distribution with 1000+ curated packages that makes it easy to get started with Open Data Science. Areas shown: Distributed Systems, Business Intelligence, Web, Scientific Computing / HPC, Machine Learning / Statistics. Projects shown: Numba, dask, xlwings, Airflow, Blaze.
  10. https://www.continuum.io/downloads

  11. What’s in ANACONDA?

    • Anaconda Navigator: data science desktop graphical interface
    • Anaconda Project: data science portable encapsulation
    • Data Science libraries: scikit-learn, Bokeh, Tensorflow, Jupyter, pandas, matplotlib, seaborn, dask, numba, …
  12. • Install data science libraries: $ conda install pandas

    • Manage package versions: $ conda install pandas=0.14
    • Create isolated environments: $ conda create -n myenv python=3.5 pandas=0.18
    • Update package version: $ conda update pandas
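
The commands on this slide can be captured in a file so that the same environment is rebuilt elsewhere. What follows is only a sketch, not something shown in the talk: the `environment.yml` file name and the `DRY` variable are assumptions made for illustration, and `myenv` and the pinned versions are reused from the slide. By default the script only prints the conda commands; run it with `DRY=` (empty) to execute them.

```shell
#!/bin/sh
# Dry run by default: conda commands are echoed, not executed.
# DRY is a hypothetical helper variable; call the script with DRY= to run conda for real.
DRY=${DRY-echo}

# Record the versions used on the slide (python=3.5, pandas=0.18) in a file.
cat > environment.yml <<'EOF'
name: myenv
dependencies:
  - python=3.5
  - pandas=0.18
EOF

# Recreate the isolated environment from that file on any machine.
$DRY conda env create -f environment.yml

# Later, upgrade pandas inside that environment only.
$DRY conda update -n myenv pandas
```

Keeping the specification in a file rather than in shell history is what makes the environment shareable with a colleague or a server.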
  13. Data Science libraries

    scikit-learn, Bokeh, Tensorflow, Jupyter, pandas, matplotlib, seaborn, dask, numba, …
    • Interactive data visualization
    • Data munging
    • Parallel computing
    • Deep learning
    • …
  14. Anaconda Project: data science portable encapsulation

    anaconda-project.yml
    • Define and manage:
      • project package dependencies
      • deployment commands
      • data
      • …
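
To make the three items on this slide concrete, here is a hedged sketch of what an `anaconda-project.yml` might contain. The project name, `analysis.py`, the `DATASET` variable and the example.com URL are placeholders invented for illustration, and the keys follow the Anaconda Project documentation as best understood at the time of the talk, so check the current docs before relying on them. The `DRY` variable is a hypothetical dry-run switch: the `anaconda-project` commands are only printed unless the script is run with `DRY=` (empty).

```shell
#!/bin/sh
# Dry run by default; call with DRY= to actually invoke anaconda-project.
DRY=${DRY-echo}

cat > anaconda-project.yml <<'EOF'
name: demo_project              # placeholder project name

packages:                       # project package dependencies
  - python=3.5
  - pandas=0.18

commands:                       # deployment commands
  default:
    unix: python analysis.py    # analysis.py is a placeholder script

downloads:                      # data fetched before a command runs
  DATASET:
    url: http://example.com/dataset.csv   # placeholder URL
EOF

# Set up the environment and download the data, then run the default command.
$DRY anaconda-project prepare
$DRY anaconda-project run
```

The point of the single file is that dependencies, data and the run command travel together with the project directory.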
  15. Anaconda Navigator: data science desktop graphical interface

    • Launch applications
    • Manage package versions and environments
    • Create and upload projects
  16. Data Science development and deployment

  17. What do data scientists develop?

    Workflows: Data, Query, Visualize, Clean & Tidy, Predict, Simulate, & Optimize
    Interactive data visualizations and dashboards, Jupyter notebooks, Scripts, Predictive models, Processed Data
  18. Data Science Development

    Laptop: script 1, script 2, script 3, notebook A, dataset Z. Languages: Python, R. Libraries: scikit-learn, Bokeh, Tensorflow, Jupyter, pandas, matplotlib, seaborn, dask, numba.
  19. Challenges in data science development and deployment

    How do you…?
    • Share your data science project with others
    • Ensure that you can reproduce your analysis
    • Deploy your project
  20. The Path to easy Data Science Deployment!

    Diagram labels: Anaconda Enterprise, DIY, Anaconda Project, Anaconda, Docker containers, conda env 1, conda env 2, conda env 3
  21. Anaconda and Docker

  22. Data Science Development / Data Science Deployment

    Laptop (Data Science Development): conda env 1 with Analysis 1, conda env 2 with Analysis 2, conda env 3 with Analysis 3
    Server (Data Science Deployment), Docker container: conda env 1 with Analysis 1, conda env 2 with Analysis 2, conda env 3 with Analysis 3
  23. https://hub.docker.com/r/continuumio/anaconda/
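
As a rough illustration of moving one laptop analysis into a container, the sketch below writes a `Dockerfile` based on the `continuumio/anaconda` image linked above. It is an assumption-laden example rather than the talk's own recipe: `analysis.py`, the `/app` directory and the `analysis-1` image tag are placeholders, and the `DRY` variable is a hypothetical dry-run switch, so the `docker` commands are printed unless the script is run with `DRY=` (empty).

```shell
#!/bin/sh
# Dry run by default; call with DRY= to actually invoke docker.
DRY=${DRY-echo}

cat > Dockerfile <<'EOF'
# Start from the Anaconda image published on Docker Hub.
FROM continuumio/anaconda

# Pin the analysis dependency inside the image.
RUN conda install -y pandas=0.18

# analysis.py is a placeholder for one of the laptop analyses.
WORKDIR /app
COPY analysis.py /app/
CMD ["python", "analysis.py"]
EOF

# Build the image, then run the analysis in a throwaway container.
$DRY docker build -t analysis-1 .
$DRY docker run --rm analysis-1
```

Note that this only packages dependencies; where the data comes from and how the command is launched are still decided by hand.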

  24. Conda and Docker

    • Dependencies
    • Data • Deployment commands • Security • Scalability • Availability
  25. Learn more

    • ANACONDA AND DOCKER - BETTER TOGETHER FOR REPRODUCIBLE DATA SCIENCE, Monday, June 20, 2016: https://www.continuum.io/blog/developer-blog/anaconda-and-docker-better-together-reproducible-data-science
    • ANACONDA FOR R USERS: SPARKR AND RBOKEH, Monday, February 1, 2016: https://www.continuum.io/blog/developer-blog/anaconda-r-users-sparkr-and-rbokeh
    • JUPYTER AND CONDA FOR R, Monday, September 7, 2015: https://www.continuum.io/blog/developer/jupyter-and-conda-r
    • CONDA FOR DATA SCIENCE, Thursday, May 21, 2015: https://www.continuum.io/content/conda-data-science
  26. Anaconda Project

  27. Data Science Development / Data Science Deployment

    Laptop (Data Science Development): Project 1, Project 2, Project 3
    Server (Data Science Deployment): Project 1, Project 2, Project 3
  28. Data Science Development / Data Science Deployment

    Laptop (Data Science Development): Project 1, Project 2, Project 3
    Server (Data Science Deployment), Docker container: Project 1, Project 2, Project 3
  29. Anaconda Project

    • Dependencies • Data • Deployment commands
    • Security • Scalability • Availability
  30. Learn more

    • ANACONDA PROJECT: http://anaconda-project.readthedocs.io/en/latest/
  31. Anaconda Enterprise

  32. Data Science Development / Data Science Development and Deployment

    Laptop (Data Science Development): Project 1, Project 2, Project 3
    Anaconda Enterprise (Data Science Development and Deployment): Project 1, Project 2, Project 3; Container 1, Container 2, Container 3, Container 4
  33. Anaconda Enterprise

    • Dependencies • Data • Deployment commands • Security • Scalability • Availability
  35. Learn more

    • PRODUCTIONIZING AND DEPLOYING DATA SCIENCE PROJECTS, Wednesday, February 1, 2017: https://www.continuum.io/blog/developer-blog/productionizing-and-deploying-data-science-projects
    • SECURE AND SCALABLE DATA SCIENCE DEPLOYMENTS WITH ANACONDA, Monday, February 27, 2017: https://www.continuum.io/blog/developer-blog/secure-and-scalable-data-science-deployments-anaconda
    • ANNOUNCING ANACONDA PROJECT: DATA SCIENCE PROJECT ENCAPSULATION AND DEPLOYMENT, THE EASY WAY!, Monday, March 20, 2017: https://www.continuum.io/blog/developer-blog/%E2%80%8Banaconda-project-data-science-project-encapsulation-deployment
  36. https://speakerdeck.com/chdoig @ch_doig

  37. Questions?