Slide 1

Slide 1 text

Anaconda Project: An introduction to data science portability, development and deployment
Christine Doig, Product Manager and Senior Data Scientist, Continuum Analytics
© 2016 Continuum Analytics - Confidential & Proprietary

Slide 2

Slide 2 text

Agenda
• What’s data science?
• Introduction to Anaconda
• Data science development and deployment
• Anaconda and Docker
• Anaconda Project
• Anaconda Enterprise

Slide 3

Slide 3 text

What’s data science?

Slide 4

Slide 4 text

Data Science is not just Machine Learning…
• Distributed Systems
• Business Intelligence
• Machine Learning & Statistics
• Software & Web development
• Scientific Computing & HPC

Slide 5

Slide 5 text

Data Science is Interdisciplinary…
• Machine Learning & Statistics: classification, deep learning, regression, PCA
• Distributed Systems: distributed file systems, message passing, schedulers, resource managers
• Software & Web development: web crawling, scraping, 3rd-party data & API providers, software packaging, CI, testing
• Scientific Computing & HPC: array computing, simulation, optimization, GPUs, multi-cores
• Business Intelligence: data warehouse, querying, reporting, data visualization, dashboards

Slide 6

Slide 6 text

Open Source Communities Create Powerful Technologies for Data Science
• Fields: Distributed Systems, Business Intelligence, Machine Learning & Statistics, Software & Web development, Scientific Computing & HPC
• Example projects: Numba, dask, xlwings, Blaze, Airflow

Slide 7

Slide 7 text

Challenges in the open data science ecosystem
How do you…?
• Download and install data science libraries
• Manage versions and dependencies
• Upgrade libraries
• Isolate dependencies between projects

Slide 8

Slide 8 text

Introduction to Anaconda

Slide 9

Slide 9 text

Anaconda: a Python & R distribution with 1000+ curated packages that makes it easy to get started with Open Data Science
• Covers: Distributed Systems, Business Intelligence, Web, Scientific Computing / HPC, Machine Learning / Statistics
• Includes: Numba, dask, xlwings, Airflow, Blaze

Slide 10

Slide 10 text

https://www.continuum.io/downloads

Slide 11

Slide 11 text

What’s in Anaconda?
• Anaconda Navigator: data science desktop graphical interface
• Anaconda Project: data science portable encapsulation
• Data science libraries: scikit-learn, Bokeh, TensorFlow, Jupyter, pandas, matplotlib, seaborn, dask, numba, …

Slide 12

Slide 12 text

• Install data science libraries
    $ conda install pandas
• Manage package versions
    $ conda install pandas=0.14
• Create isolated environments
    $ conda create -n myenv python=3.5 pandas=0.18
• Update package version
    $ conda update pandas
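The commands above manage an environment interactively. To share that environment with someone else, conda can also build it from a file. A minimal sketch, assuming the hypothetical environment name `myenv` and the versions from the `conda create` command above, that writes such an `environment.yml`:

```shell
#!/usr/bin/env bash
# Minimal sketch: write a conda environment file pinning the same
# versions as the `conda create -n myenv ...` command on this slide.
# "myenv" is a hypothetical environment name.
set -euo pipefail

cat > environment.yml <<'EOF'
name: myenv
dependencies:
  - python=3.5
  - pandas=0.18
EOF

echo "Wrote environment.yml"
```

With conda installed, `conda env create -f environment.yml` recreates the environment on another machine, and `conda env export > environment.yml` captures an existing environment, including every package installed as a dependency. Old pins such as python=3.5 may not be available on every current platform or channel.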

Slide 13

Slide 13 text

Data science libraries: scikit-learn, Bokeh, TensorFlow, Jupyter, pandas, matplotlib, seaborn, dask, numba, …
• Interactive data visualization
• Data munging
• Parallel computing
• Deep learning
• …

Slide 14

Slide 14 text

Anaconda Project: data science portable encapsulation
A single anaconda-project.yml file lets you define and manage:
  • project package dependencies
  • deployment commands
  • data
  • …
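To make the three items on this slide concrete, here is a minimal sketch of an anaconda-project.yml with a `packages` list, a `commands` section, and a `downloads` entry. It assumes the anaconda-project file format in which each download key is an environment variable that points to the fetched file; the project name, notebook, script, variable name, and URL are all hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Minimal sketch: write an anaconda-project.yml covering dependencies,
# commands, and data. Every name, file, and URL below is hypothetical.
set -euo pipefail

mkdir -p my_project
cat > my_project/anaconda-project.yml <<'EOF'
name: my_project

# Package dependencies, resolved by conda
packages:
  - python=3.5
  - pandas=0.18
  - bokeh
channels: []

# Named commands: a notebook and a plain shell command
commands:
  explore:
    notebook: analysis.ipynb
  train:
    unix: python train.py

# Data to fetch; the local path is exposed as $MY_DATA
downloads:
  MY_DATA:
    url: https://example.com/data.csv
    filename: data.csv
EOF

echo "Wrote my_project/anaconda-project.yml"
```

From inside my_project/, `anaconda-project run train` would prepare the environment and data before running the command. The `anaconda-project add-packages` and `anaconda-project add-command` subcommands can edit the same file instead of writing it by hand.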

Slide 15

Slide 15 text

Anaconda Navigator: data science desktop graphical interface
• Launch applications
• Manage package versions and environments
• Create and upload projects

Slide 16

Slide 16 text

Data Science development and deployment

Slide 17

Slide 17 text

What do data scientists develop?
• Workflows: query, visualize, clean & tidy, and predict, simulate, & optimize data
• Interactive data visualizations and dashboards
• Jupyter notebooks
• Scripts
• Predictive models
• Processed data

Slide 18

Slide 18 text

Data Science Development (laptop)
• Languages: Python, R
• Libraries: scikit-learn, Bokeh, TensorFlow, Jupyter, pandas, matplotlib, seaborn, dask, numba
• Project files: script 1, script 2, script 3, notebook A, dataset Z

Slide 19

Slide 19 text

Challenges in data science development and deployment
How do you…?
• Share your data science project with others
• Ensure that you can reproduce your analysis
• Deploy your project

Slide 20

Slide 20 text

The Path to Easy Data Science Deployment!
• conda environments: conda env 1, conda env 2, conda env 3
• DIY: Anaconda + Docker containers
• DIY: Anaconda Project
• Anaconda Enterprise

Slide 21

Slide 21 text

Anaconda and Docker

Slide 22

Slide 22 text

Data Science Development (laptop)
• Analysis 1 in conda env 1, Analysis 2 in conda env 2, Analysis 3 in conda env 3
Data Science Deployment (server)
• The same analyses and conda environments, packaged in a Docker container

Slide 23

Slide 23 text

https://hub.docker.com/r/continuumio/anaconda/
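The continuumio/anaconda image on Docker Hub can serve as the base for a deployment container. A minimal sketch, assuming a hypothetical analysis.py next to the Dockerfile and an illustrative package list, that writes such a Dockerfile:

```shell
#!/usr/bin/env bash
# Minimal sketch: write a Dockerfile based on continuumio/anaconda.
# analysis.py and the package list are hypothetical examples.
set -euo pipefail

cat > Dockerfile <<'EOF'
FROM continuumio/anaconda

# Install the analysis dependencies into the image's conda installation
RUN conda install -y pandas bokeh

# Copy the (hypothetical) analysis script into the image
COPY analysis.py /opt/analysis/analysis.py
WORKDIR /opt/analysis

CMD ["python", "analysis.py"]
EOF

echo "Wrote Dockerfile"
```

With Docker installed, `docker build -t my-analysis .` builds the image and `docker run --rm my-analysis` runs the script. Pinning versions in the `conda install` line, as on the conda commands slide, makes rebuilds more reproducible.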

Slide 24

Slide 24 text

Conda and Docker
• Dependencies
• Data
• Deployment commands
• Security
• Scalability
• Availability

Slide 25

Slide 25 text

Learn more
• Anaconda and Docker - Better Together for Reproducible Data Science (Monday, June 20, 2016)
  https://www.continuum.io/blog/developer-blog/anaconda-and-docker-better-together-reproducible-data-science
• Anaconda for R Users: SparkR and rbokeh (Monday, February 1, 2016)
  https://www.continuum.io/blog/developer-blog/anaconda-r-users-sparkr-and-rbokeh
• Jupyter and Conda for R (Monday, September 7, 2015)
  https://www.continuum.io/blog/developer/jupyter-and-conda-r
• Conda for Data Science (Thursday, May 21, 2015)
  https://www.continuum.io/content/conda-data-science

Slide 26

Slide 26 text

Anaconda Project

Slide 27

Slide 27 text

Data Science Development (laptop)
• Project 1, Project 2, Project 3
Data Science Deployment (server)
• Project 1, Project 2, Project 3

Slide 28

Slide 28 text

Data Science Development (laptop)
• Project 1, Project 2, Project 3
Data Science Deployment (server)
• Project 1, Project 2, Project 3, packaged in a Docker container

Slide 29

Slide 29 text

Anaconda Project
• Dependencies
• Data
• Deployment commands
• Security
• Scalability
• Availability

Slide 30

Slide 30 text

Learn more
• Anaconda Project documentation
  http://anaconda-project.readthedocs.io/en/latest/

Slide 31

Slide 31 text

Anaconda Enterprise

Slide 32

Slide 32 text

Data Science Development (laptop)
• Project 1, Project 2, Project 3
Data Science Development and Deployment (Anaconda Enterprise)
• Project 1, Project 2, Project 3
• Container 1, Container 2, Container 3, Container 4

Slide 33

Slide 33 text

Anaconda Enterprise
• Dependencies
• Data
• Deployment commands
• Security
• Scalability
• Availability

Slide 34

Slide 34 text


Slide 35

Slide 35 text

Learn more
• Productionizing and Deploying Data Science Projects (Wednesday, February 1, 2017)
  https://www.continuum.io/blog/developer-blog/productionizing-and-deploying-data-science-projects
• Secure and Scalable Data Science Deployments with Anaconda (Monday, February 27, 2017)
  https://www.continuum.io/blog/developer-blog/secure-and-scalable-data-science-deployments-anaconda
• Announcing Anaconda Project: Data Science Project Encapsulation and Deployment, the Easy Way! (Monday, March 20, 2017)
  https://www.continuum.io/blog/developer-blog/%E2%80%8Banaconda-project-data-science-project-encapsulation-deployment

Slide 36

Slide 36 text

https://speakerdeck.com/chdoig @ch_doig

Slide 37

Slide 37 text

Questions?