Talk Data to me: An introduction to data science portability, development and deployment
General Assembly, ATX, April 2017

Anaconda Project
An introduction to data science portability, development and deployment
Christine Doig, Product Manager and Senior Data Scientist
Continuum Analytics
© 2016 Continuum Analytics - Confidential & Proprietary

Agenda
• What’s data science?
• Introduction to Anaconda
• Data science development and deployment
• Anaconda and Docker
• Anaconda Project
• Anaconda Enterprise

What’s data science?
Data Science is not just Machine Learning…
• Distributed Systems
• Business Intelligence
• Machine Learning & Statistics
• Software & Web development
• Scientific Computing & HPC

Data Science is Interdisciplinary…
• Distributed Systems: distributed file system, message passing, schedulers, resource managers
• Business Intelligence: data warehouse, querying, reporting, data visualization, dashboards
• Machine Learning & Statistics: classification, deep learning, regression, PCA
• Software & Web development: web crawling, scraping, 3rd party data & API providers, software packaging, CI, testing
• Scientific Computing & HPC: array computing, simulation, optimization, GPUs, multi-cores

Open Source Communities Create Powerful Technologies for Data Science
• Distributed Systems
• Business Intelligence
• Machine Learning & Statistics
• Software & Web development
• Scientific Computing & HPC
Projects shown: Numba, dask, xlwings, Blaze, Airflow

Challenges in the open data science ecosystem
How do you…?
• Download and install data science libraries
• Manage versions and dependencies
• Upgrade libraries
• Isolate dependencies between projects

Introduction to Anaconda
ANACONDA
Python & R distribution with 1000+ curated packages that makes it easy to get started with Open Data Science
• Distributed Systems
• Business Intelligence
• Web
• Scientific Computing / HPC
• Machine Learning / Statistics
Projects shown: Numba, dask, xlwings, Airflow, Blaze

https://www.continuum.io/downloads
What’s in ANACONDA?
• Anaconda Navigator: Data science desktop graphical interface
• Anaconda Project: Data science portable encapsulation
• Data Science libraries: scikit-learn, Bokeh, Tensorflow, Jupyter, pandas, matplotlib, seaborn, dask, numba, …

• Install data science libraries
  $ conda install pandas
• Manage package versions
  $ conda install pandas=0.14
• Create isolated environments
  $ conda create -n myenv python=3.5 pandas=0.18
• Update package version
  $ conda update pandas

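The isolated environment created above can also be written down in a file, so that someone else can recreate it. The following is a minimal sketch, not part of the original slides: it assumes conda's `environment.yml` file format, reuses the environment name and version pins from the `conda create` example, and writes into a temporary directory chosen only for illustration.

```shell
#!/bin/sh
# Sketch: describe the slide's environment (python=3.5, pandas=0.18) in a file
# so it can be recreated on another machine. The file location is an
# illustrative assumption; the slides do not prescribe one.
set -e

workdir="$(mktemp -d)"
env_file="$workdir/environment.yml"

# Same name and version pins as the `conda create` example on the slide.
cat > "$env_file" <<'EOF'
name: myenv
dependencies:
  - python=3.5
  - pandas=0.18
EOF

# Show what was written.
cat "$env_file"

# With conda installed, the environment could then be created from the file:
#   conda env create -f "$env_file"
```

Keeping the pins in a file rather than in shell history is what makes the environment shareable; the commented `conda env create` step is left for the reader to run where conda is available.
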
Data Science libraries
scikit-learn, Bokeh, Tensorflow, Jupyter, pandas, matplotlib, seaborn, dask, numba
• Interactive data visualization
• Data munging
• Parallel computing
• Deep learning
• …

Anaconda Project
Data science portable encapsulation
anaconda-project.yml
• Define and manage:
  • project package dependencies
  • deployment commands
  • data
  • …

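To make the three managed items (package dependencies, deployment commands, data) concrete, here is a hedged sketch of what an `anaconda-project.yml` might look like. It is not taken from the slides: the field names (`packages`, `commands`, `downloads`) follow my reading of the Anaconda Project documentation and should be checked against it, and the project name, script name, variable name and URL are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: write a small anaconda-project.yml covering dependencies, a
# deployment command, and a data download. All names and the URL below are
# hypothetical; the temporary directory is only for illustration.
set -e

project_dir="$(mktemp -d)"
project_file="$project_dir/anaconda-project.yml"

cat > "$project_file" <<'EOF'
name: analysis1

# project package dependencies
packages:
  - python=3.5
  - pandas=0.18

# deployment commands
commands:
  default:
    unix: python script1.py

# data, downloaded to a path exposed through an environment variable
downloads:
  DATASET_Z:
    url: http://example.com/dataset_z.csv
EOF

# Show what was written.
cat "$project_file"

# With anaconda-project installed, the default command could then be run with:
#   anaconda-project run --directory "$project_dir"
```

The point of the single file is that dependencies, commands and data travel together with the project, rather than living in separate setup notes.
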
Anaconda Navigator
Data science desktop graphical interface
• Launch applications
• Manage package versions and environments
• Create and upload projects

Data Science development and deployment
What do data scientists develop?
Workflows: Data, Query, Visualize, Clean & Tidy, Predict, Simulate, & Optimize
• Interactive data visualizations and dashboards
• Jupyter notebooks
• Scripts
• Predictive models
• Processed Data

Data Science Development
Laptop
scikit-learn, Bokeh, Tensorflow, Jupyter, pandas, matplotlib, seaborn, dask, numba
script 1, script 2, script 3, notebook A, dataset Z
Python, R

Challenges in data science development and deployment
How do you…?
• Share your data science project with others
• Ensure that you can reproduce your analysis
• Deploy your project

The Path to easy Data Science Deployment!
[Diagram: Anaconda Enterprise; DIY; Anaconda Project; Anaconda; Docker containers; conda env 1, conda env 2, conda env 3]

Anaconda and Docker
Data Science Development (Laptop)
• conda env 1: Analysis 1
• conda env 2: Analysis 2
• conda env 3: Analysis 3
Data Science Deployment (Server, Docker container)
• conda env 1: Analysis 1
• conda env 2: Analysis 2
• conda env 3: Analysis 3

https://hub.docker.com/r/continuumio/anaconda/
Conda and Docker
• Dependencies
• Data
• Deployment commands
• Security
• Scalability
• Availability

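One way to read the Dependencies item is that a Docker image built on the `continuumio/anaconda` base image can recreate a project's conda environment on the server. The sketch below is an assumption-laden illustration rather than the speaker's own setup: the environment name, file paths and image tag are hypothetical, and the files are written to a temporary directory so nothing is built unless the reader chooses to.

```shell
#!/bin/sh
# Sketch: a Dockerfile that recreates a conda environment on top of the
# continuumio/anaconda image. Names, paths and the image tag are hypothetical.
set -e

build_dir="$(mktemp -d)"

# Environment definition copied into the image (pins are illustrative).
cat > "$build_dir/environment.yml" <<'EOF'
name: analysis1
dependencies:
  - python=3.5
  - pandas=0.18
EOF

cat > "$build_dir/Dockerfile" <<'EOF'
# Base image: https://hub.docker.com/r/continuumio/anaconda/
FROM continuumio/anaconda
# Recreate the project's conda environment inside the image.
COPY environment.yml /tmp/environment.yml
RUN conda env create -f /tmp/environment.yml
EOF

# Show what was written.
cat "$build_dir/Dockerfile"

# With Docker installed, the image could then be built with:
#   docker build -t analysis1 "$build_dir"
```

This only addresses dependencies; data, deployment commands and the operational items on the slide would still need to be handled separately.
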
Learn more
• ANACONDA AND DOCKER - BETTER TOGETHER FOR REPRODUCIBLE DATA SCIENCE
  Monday, June 20, 2016
  https://www.continuum.io/blog/developer-blog/anaconda-and-docker-better-together-reproducible-data-science
• ANACONDA FOR R USERS: SPARKR AND RBOKEH
  Monday, February 1, 2016
  https://www.continuum.io/blog/developer-blog/anaconda-r-users-sparkr-and-rbokeh
• JUPYTER AND CONDA FOR R
  Monday, September 7, 2015
  https://www.continuum.io/blog/developer/jupyter-and-conda-r
• CONDA FOR DATA SCIENCE
  Thursday, May 21, 2015
  https://www.continuum.io/content/conda-data-science

Anaconda Project
Data Science Development (Laptop): Project 1, Project 2, Project 3
Data Science Deployment (Server): Project 1, Project 2, Project 3

Data Science Development (Laptop): Project 1, Project 2, Project 3
Data Science Deployment (Server, Docker container): Project 1, Project 2, Project 3

Anaconda Project
• Dependencies
• Data
• Deployment commands
• Security
• Scalability
• Availability

Learn more
• ANACONDA PROJECT
  http://anaconda-project.readthedocs.io/en/latest/

Anaconda Enterprise
Data Science Development (Laptop): Project 1, Project 2, Project 3
Data Science Development and Deployment (Anaconda Enterprise): Project 1, Project 2, Project 3
Container 1, Container 2, Container 3, Container 4

Anaconda Enterprise
• Dependencies
• Data
• Deployment commands
• Security
• Scalability
• Availability

Learn more
• PRODUCTIONIZING AND DEPLOYING DATA SCIENCE PROJECTS
  Wednesday, February 1, 2017
  https://www.continuum.io/blog/developer-blog/productionizing-and-deploying-data-science-projects
• SECURE AND SCALABLE DATA SCIENCE DEPLOYMENTS WITH ANACONDA
  Monday, February 27, 2017
  https://www.continuum.io/blog/developer-blog/secure-and-scalable-data-science-deployments-anaconda
• ANNOUNCING ANACONDA PROJECT: DATA SCIENCE PROJECT ENCAPSULATION AND DEPLOYMENT, THE EASY WAY!
  Monday, March 20, 2017
  https://www.continuum.io/blog/developer-blog/%E2%80%8Banaconda-project-data-science-project-encapsulation-deployment

https://speakerdeck.com/chdoig
@ch_doig
Questions?