Slide 1

Slide 1 text

Data Science encapsulation and deployment with Anaconda Project and JupyterLab
Christine Doig, Senior Product Manager and Data Scientist, Continuum Analytics
© 2017 Continuum Analytics - Confidential & Proprietary

Slide 2

Slide 2 text

Agenda
• Challenges in data science reproducibility and deployment
• Encapsulating your data science with Anaconda Project
• Using Anaconda Project with JupyterLab
• Anaconda Project & JupyterLab powering Anaconda Enterprise v5

Slide 3

Slide 3 text

Challenges in Data Science reproducibility and deployment

Slide 4

Slide 4 text

[Diagram: data science development on a laptop, in Python and R. Libraries: scikit-learn, Bokeh, TensorFlow, Jupyter, pandas, matplotlib, seaborn, dask, numba. Project files: script 1, script 2, script 3, notebook A, dataset Z. Label: Reproducibility]

Slide 5

Slide 5 text

[Diagram: data science workflows (Data, Query, Visualize, Clean & Tidy, Predict, Simulate, & Optimize) and their outputs: interactive data visualizations and dashboards, Jupyter notebooks, scripts, predictive models, processed data. Label: Deployment]

Slide 6

Slide 6 text

Challenges in Data Science reproducibility and deployment
• Data scientists work on different platforms: Windows, macOS, Linux
• Data science development environments differ from deployment environments
• Data science dependencies are more than just software packages: data, variables, commands, services
• Managing software packages: versions, build, channel
• Data scientists are not necessarily software developers. Current deployment tools focus heavily on serving developers
• There is a variety of “deployment” types: notebooks, REST APIs, dashboards, web apps…

Slide 7

Slide 7 text

Encapsulating your data science with Anaconda Project

Slide 8

Slide 8 text

Anaconda Distribution & conda make data science reproducibility and development easier
[Diagram, left panel "Data Science Development": Laptop / Desktop (Windows, macOS, Linux) running Anaconda Distribution, with conda env 1, conda env 2, conda env 3 holding Analysis 1, Analysis 2, Analysis 3]
[Diagram, right panel "Data Science Reproducibility & Deployment": Laptop / Desktop / Server (Windows, macOS, Linux) running Anaconda Distribution, with a Docker container, and the same three conda envs and analyses]

Slide 9

Slide 9 text

With Anaconda and conda, Data Scientists can:
• Manage software packages across platforms: Windows, macOS, Linux
• Isolate and recreate environments
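
To make "isolate and recreate environments" concrete, here is a minimal sketch, assuming conda is installed and on PATH. It builds the `conda env export` and `conda env create` command lines and, optionally, runs them from Python. The environment names and the `environment.yml` file name are hypothetical.

```python
import shutil
import subprocess
from typing import List


def export_cmd(env_name: str) -> List[str]:
    """argv that prints an environment's package list as YAML on stdout."""
    return ["conda", "env", "export", "--name", env_name]


def create_cmd(spec_file: str, env_name: str) -> List[str]:
    """argv that builds a new environment from an exported spec file."""
    return ["conda", "env", "create", "--file", spec_file, "--name", env_name]


def recreate(src_env: str, dst_env: str, spec_file: str = "environment.yml") -> None:
    """Export src_env to spec_file, then create dst_env from that file."""
    if shutil.which("conda") is None:
        raise RuntimeError("conda is not on PATH")
    exported = subprocess.run(
        export_cmd(src_env), check=True, capture_output=True, text=True
    ).stdout
    with open(spec_file, "w") as fh:
        fh.write(exported)
    subprocess.run(create_cmd(spec_file, dst_env), check=True)


if __name__ == "__main__":
    # Only print the commands; call recreate() to actually run them.
    print(" ".join(export_cmd("analysis-1")))
    print(" ".join(create_cmd("environment.yml", "analysis-1-copy")))
```

An exported spec pins build strings, which are usually platform-specific, so a file exported on one operating system may not recreate cleanly on another.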

Slide 10

Slide 10 text

Introducing Anaconda Project, now available in Anaconda Distribution

Slide 11

Slide 11 text

Anaconda Project
Data science portable encapsulation
anaconda-project.yml
• Define and manage:
  • deployment commands
  • downloads and data
  • project package dependencies
  • multiple environments
  • environment variables (with encryption)
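
A rough sketch of what such a project file might contain, written as Python that assembles the structure and prints it as YAML. The top-level keys (`name`, `packages`, `channels`, `commands`, `downloads`, `variables`, `env_specs`) follow the anaconda-project.yml schema as best recalled here and should be checked against the Anaconda Project documentation. The project name, file names, URL, and variable name are hypothetical, and `to_yaml` is a tiny illustrative renderer, not part of any library.

```python
from typing import List


def to_yaml(node, indent: int = 0) -> List[str]:
    """Render nested dicts and lists of plain strings as block-style YAML lines."""
    pad = "  " * indent
    lines = []
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                lines.append(f"{pad}{key}:")
                lines.extend(to_yaml(value, indent + 1))
            else:
                lines.append(f"{pad}{key}: {value}")
    elif isinstance(node, list):
        for item in node:  # list items are assumed to be plain strings
            lines.append(f"{pad}- {item}")
    return lines


# Hypothetical project covering each item on the slide.
project = {
    "name": "iris-analysis",
    # project package dependencies
    "packages": ["python=3.6", "pandas", "bokeh"],
    "channels": ["defaults"],
    # deployment commands
    "commands": {
        "notebook": {"notebook": "analysis.ipynb"},
        "train": {"unix": "python train.py", "windows": "python train.py"},
    },
    # downloads and data
    "downloads": {
        "IRIS_CSV": {"url": "https://example.com/iris.csv", "filename": "iris.csv"},
    },
    # environment variables: declared without a value, on the assumption
    # that secrets are supplied at run time rather than stored in the file
    "variables": {
        "DB_PASSWORD": {"description": "Password for the results database"},
    },
    # multiple environments
    "env_specs": {
        "default": {"packages": ["scikit-learn"]},
        "test": {"packages": ["pytest"]},
    },
}

if __name__ == "__main__":
    print("\n".join(to_yaml(project)))
```

Keeping commands, data, variables, and environments in one file is what lets a single directory carry everything needed to rerun the analysis elsewhere.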

Slide 12

Slide 12 text

Anaconda Project
Data science portable encapsulation
• Lock your environments:
  • package versions, down to the build numbers
  • platforms
  • packages by platform
Note: This file is automatically generated for you by Anaconda Project
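
As an illustration of "locked down to the build numbers" and "packages by platform", the sketch below models locked packages as conda-style `name=version=build` strings grouped under a shared `all` key plus per-platform keys. That grouping is an assumption about how a lock file is laid out, and the versions and build strings are illustrative only, not taken from a real lock file.

```python
from typing import Dict, List, NamedTuple


class Pin(NamedTuple):
    name: str
    version: str
    build: str


def parse_pin(spec: str) -> Pin:
    """Split 'name=version=build'; reject specs that are not fully pinned."""
    parts = spec.split("=")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not fully pinned: {spec!r}")
    return Pin(*parts)


def packages_for(locked: Dict[str, List[str]], platform: str) -> List[Pin]:
    """Shared pins first, then the pins specific to the given platform."""
    specs = locked.get("all", []) + locked.get(platform, [])
    return [parse_pin(spec) for spec in specs]


# Hypothetical locked package sets, keyed by platform.
LOCKED = {
    "all": ["pandas=0.20.3=py36_0", "bokeh=0.12.6=py36_0"],
    "linux-64": ["readline=6.2=2"],
    "win-64": ["vs2015_runtime=14.0.25420=0"],
}

if __name__ == "__main__":
    for pin in packages_for(LOCKED, "win-64"):
        print(pin.name, pin.version, pin.build)
```

Because every entry carries a build string, two machines on the same platform resolve to the same binaries, while the per-platform keys leave room for packages that exist only on one operating system.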

Slide 13

Slide 13 text

Anaconda Project brings additional capabilities for data science reproducibility and development
[Diagram, left panel "Data Science Development": Laptop / Desktop (Windows, macOS, Linux) with Project 1, Project 2, Project 3]
[Diagram, right panel "Data Science Reproducibility & Deployment": Laptop / Desktop / Server (Windows, macOS, Linux) with the same Project 1, Project 2, Project 3]

Slide 14

Slide 14 text

With Anaconda Project, Data Scientists can:
• Export environments (with pinned versions) cross-platform
• Manage other dependencies: data, variables, commands, services
• Get the right abstraction for data scientists (e.g. Docker)
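
One way these capabilities might come together on the command line is sketched below, as Python that only assembles the `anaconda-project` invocations. The subcommand names (`init`, `add-packages`, `add-download`, `lock`, `archive`, `run`) are given as best recalled and should be confirmed with `anaconda-project --help`. The package names, URL, variable name, and archive name are hypothetical, and each command is assumed to run from inside the project directory.

```python
import shlex
from typing import List


def project_workflow(data_url: str, archive_name: str) -> List[List[str]]:
    """argv lists for creating, locking, and packaging a project."""
    base = ["anaconda-project"]
    return [
        base + ["init"],                                  # create anaconda-project.yml
        base + ["add-packages", "pandas", "bokeh"],       # package dependencies
        base + ["add-download", "IRIS_CSV", data_url],    # data dependency
        base + ["lock"],                                  # pin versions and builds
        base + ["archive", archive_name],                 # bundle for another machine
    ]


def run_step() -> List[str]:
    """argv for running the project's default command after unpacking."""
    return ["anaconda-project", "run"]


if __name__ == "__main__":
    steps = project_workflow("https://example.com/iris.csv", "iris-analysis.zip")
    for argv in steps + [run_step()]:
        print(" ".join(shlex.quote(arg) for arg in argv))
```

The functions return plain lists rather than executing anything, so the sequence can be reviewed or logged before being passed to `subprocess.run`.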

Slide 15

Slide 15 text

Anaconda Enterprise makes project collaboration and deployment secure and scalable
[Diagram, left panel "Data Science Development": Laptop / Desktop with Project 1, Project 2, Project 3]
[Diagram, right panel "Data Science Reproducibility, Development and Deployment": Anaconda Enterprise running Project 1, Project 2, Project 3 in Container 1, Container 2, Container 3, Container 4]

Slide 16

Slide 16 text

[Diagram: Project 1 and Project 2 deploy as notebooks, models (REST APIs), dashboards, and applications]

Slide 17

Slide 17 text


Slide 18

Slide 18 text


Slide 19

Slide 19 text

With Anaconda Enterprise, Data Scientists can:
• Easily deploy projects as a wide variety of “deployment” types: notebooks, REST APIs, dashboards, web apps…
• Collaborate with other data scientists
• Securely share deployments with other applications and users

Slide 20

Slide 20 text

Anaconda Project & JupyterLab powering Anaconda Enterprise v5

Slide 21

Slide 21 text

JupyterLab is the default experience in Anaconda Enterprise

Slide 22

Slide 22 text

Anaconda Project Lab extension
• Manage your Anaconda Project dependencies from a graphical interface inside JupyterLab
Nicholas Bollweg, GitHub: bollwyvl

Slide 23

Slide 23 text


Slide 24

Slide 24 text

Anaconda helps Data Scientists reproduce and deploy their projects
• Anaconda Distribution and conda:
  • Manage software packages across platforms: Windows, macOS, Linux
  • Isolate and recreate environments
• Anaconda Project:
  • Export environments (with pinned versions) cross-platform
  • Manage other dependencies: data, variables, commands, services
  • Get the right abstraction for data scientists (e.g. Docker)
• Anaconda Enterprise:
  • Easily deploy projects as a wide variety of “deployment” types: notebooks, REST APIs, dashboards, web apps…
  • Collaborate with other data scientists
  • Securely share deployments with other applications and users

Slide 25

Slide 25 text

https://speakerdeck.com/chdoig @ch_doig

Slide 26

Slide 26 text

Questions?