PyCon Colombia 2018

Data Science: Past, Present & Future. Keynote talk at PyCon Colombia 2018.

Christine Doig

February 09, 2018

Transcript

  1. DATA SCIENCE: PAST, PRESENT & FUTURE. PyCon Colombia 2018

  2. ABOUT ME. Timeline, 2011 to 2018 (PyCon Colombia 2018). GRAD STUDENT, Energy: E.ON / RWTH AACHEN. PROCESS ENGINEER, Manufacturing: PROCTER & GAMBLE. BUSINESS ANALYST / CONSULTANT, Banking: BLUECAP - LACAIXA. DATA SCIENCE CONSULTING, Professional Services, and DATA SCIENTIST & PRODUCT MANAGER, Tech: ANACONDA, DATNA, LLC.
  3. DATA SCIENCE. ACADEMIA & RESEARCH, INDUSTRY & ENTERPRISE, COMMUNITY & OPEN SOURCE, ME

  4. NEXT. DATA SCIENCE DEFINITIONS

  5. DATA SCIENCE WORKFLOW. 01 COLLECT: Gather, integrate and store data. 02 UNDERSTAND: Explore, clean, transform, visualize. 03 MODEL: Build and validate models. 04 DEPLOY: Communicate and integrate into systems. Areas: Data Engineering & Architecture, Data Analytics & Insights, Data Modeling.
  6. STEP 03: Modeling. MACHINE LEARNING, by TASKS, APPLICATIONS and ALGORITHMS.

    Unsupervised learning (no labels). Tasks: Exploring (clustering, dimensionality reduction). Applications: Market segmentation, Anomaly detection, Summarizing information. Algorithms: K-means, Hierarchical clustering, PCA, T-SNE.

    Supervised learning (labels). Tasks: Predicting (Classification, Regression). Applications: Spam detection, Object/face recognition, Recommender systems. Algorithms: Logistic Regression, SVM, Decision trees, k-NN, Linear Regression, Neural Networks.

    Reinforcement learning (reward). Tasks: Decision making. Applications: Robotics - Make Humanoid robot walk, Games - Defeat Go champion, Finance - Trading strategies. Algorithms: Q-learning, Policy gradient, REINFORCE, Dyna, Dynamic programming, MCTS.
  7. STEP 03: Modeling. The MACHINE LEARNING chart from slide 6 (unsupervised, supervised and reinforcement learning, with their TASKS, APPLICATIONS and ALGORITHMS), now with DEEP LEARNING added.
  8. DEFINITIONS. DATA SCIENCE, MACHINE LEARNING, DEEP LEARNING. MACHINE LEARNING ~ AI, DEEP LEARNING ~ AI, REINFORCEMENT LEARNING ~ AI

  9. NEXT. DATA SCIENCE: THE PAST

  10. TERM: DATA SCIENCE. “It was coined in 2008 by one of us, D.J. Patil, and Jeff Hammerbacher, then the respective leads of data and analytics efforts at LinkedIn and Facebook”. OCTOBER, 2012. Source: https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century

  11. WE HAD ALREADY BEEN USING DATA. STATISTICS, OPERATIONS RESEARCH, BUSINESS INTELLIGENCE, DATA MINING, ANALYTICS, PROCESS ENGINEERING, QUANTITATIVE RESEARCH, OPTIMIZATION, REPORTS, DASHBOARDS, CRAWLING, OPEN DATA, DATA WAREHOUSE, INFORMATION RETRIEVAL

  12. THE DATA SCIENTIST. SKILLS, DATA, TOOLS. Source: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

  13. THE DATA SCIENTIST. If you are asking questions and using data to find answers, YOU ARE A DATA SCIENTIST. Period. ~ Brandon Rohrer. Source: https://brohrer.github.io/imposter_syndrome.html
  14. ANALYTICS IN THE ENTERPRISE. SAS + ORACLE + MATLAB + EXCEL. CHALLENGES: Transparency, Innovation, Reproducibility, Deployment

  15. 1994-2005: THE FOUNDATIONS. Python v1.0 1994, SciPy 1998, IPython 2001, matplotlib 2001, NumPy 2005

  16. 2006-2014: THE GROWTH. Scikit-learn 2007, Pandas 2008, Conda 2012, Jupyter 2014
  17. 2014-2017: THE “ENTERPRISE” OPEN SOURCE.

  18. BUILDING ON EACH OTHER’S WORK. Source: https://speakerdeck.com/jakevdp/pythons-data-science-stack-jsm-2016

  19. WE INCORPORATED DATA PRODUCTS IN OUR LIVES. WHAT WE BUY: AMAZON RECOMMENDATIONS. WHAT INFORMATION WE CONSUME: GOOGLE SEARCH. WHAT SHOWS WE WATCH: NETFLIX. HOW WE NAVIGATE: GOOGLE MAPS. HOW WE CONNECT: FACEBOOK, LINKEDIN. In 2009, Netflix awarded the $1M Grand Prize

  20. Every project started with a small first step.

  21. NEXT. DATA SCIENCE: THE PRESENT

  22. GARTNER HYPE CYCLE. Deep Learning, Machine Learning, Autonomous Vehicles

  23. A NEW PROFESSION & CAREER PATH.

    Data Engineering & Architecture: Data Engineer, Sr. Data Engineer, Data Architect, Director of Data Engineering.

    Data Analytics & Insights: Data Analyst, Sr. Data Analyst, Analytics Manager, Director of Analytics.

    Data Science: Data Scientist, Sr. Data Scientist, Data Science Manager, Director of Data Science.
  24. DATA SCIENCE MATURITY. ECOSYSTEM, SKILLS, OPEN SOURCE, COMPUTE, DATA + ALGORITHMS

  25. ECOSYSTEM. A mature product and vendor ecosystem to serve the early majority. M&A - CONSOLIDATION: TURI, YHAT, SENSE.IO, KAGGLE. IPOs: CLOUDERA, ALTERYX. Source: FirstMark Capital, Matt Turck, Jim Hao, http://mattturck.com/big-data-landscape-2016-v18-final/

  26. SKILLS. MIT - http://introtodeeplearning.com/

  27. SKILLS. http://www.mastersindatascience.org/

  28. OPEN SOURCE. DEEP LEARNING, IDEs, DATA MUNGING, DATA VIZ, MACHINE LEARNING, DATA WORKFLOWS, BIG DATA, NLP, STATISTICS
  29. COMPUTE. GCP, AZURE, AWS. Data Science Collaboration. ML / DL APIs: GOOGLE CLOUD VISION API, COMPUTER VISION API. Build, train, deploy API
  30. DATA + ALGORITHMS. IMAGES: http://www.image-net.org/ AUDIO: https://research.google.com/audioset/ WEBSITES: http://commoncrawl.org/ COMPETITIONS: https://www.kaggle.com/ DATA REPOSITORY: https://data.world/
  31. ALPHA GO.

    Oct. 2015 - Beats human professional Go player (v. Fan)

    Mar. 2016 - Beats Lee Sedol (9-dan professional) in five-game match (v. Lee)

    May 2017 - Beats Ke Jie, the world's top Go player (v. Master)

    October 2017 - AlphaGo Zero beats AlphaGo (v. Lee) (100-0) with an algorithm based solely on reinforcement learning, without human data.

    Source: "Mastering the game of Go without human knowledge". Nature. 19 October 2017. Retrieved 19 October 2017.
  32. DOG VS . . . Chihuahua or muffin? Sharpei or towel? Source: https://imgur.com/a/K4RWn

  33. Labradoodle or fried chicken? Sheepdog or mop? Source: https://imgur.com/a/K4RWn

  34. REAL APPLICATIONS. Source: http://observer.com/2017/05/artificial-intelligence-can-stop-elephant-rhino-poaching-in-africa/ Source: https://www.nature.com/articles/nature21056

  35. REAL APPLICATIONS. Source: https://cloud.google.com/blog/big-data/2016/08/how-a-japanese-cucumber-farmer-is-using-deep-learning-and-tensorflow

  36. DATA PRODUCTS - DEVICES. HOME ASSISTANTS, WEARABLES

  37. NEXT. DATA SCIENCE: THE FUTURE

  38. AUTONOMOUS VEHICLES.

  39. OPEN SOURCE DETECTRON. Source: https://research.fb.com/facebook-open-sources-detectron/

  40. CHALLENGES. SECURITY, OPEN SOURCE SUSTAINABILITY, ETHICS, INTERPRETABILITY

  41. SECURITY. https://www.nytimes.com/2018/01/29/world/middleeast/strava-heat-map.html https://qz.com/1042852/using-a-fitness-app-taught-me-the-scary-truth-about-why-privacy-settings-are-a-feminist-issue/

  42. SECURITY. HACKING AI. Source: https://steemit.com/security/@mrosenquist/researchers-hack-self-driving-cars-with-stickers-on-signs Source: https://www.theverge.com/2017/4/12/15271874/ai-adversarial-images-fooling-attacks-artificial-intelligence

  43. ETHICS. DEEPFAKE.

  44. ETHICS. BIAS. COMPAS predicts black defendants will have higher risks of recidivism than they actually do, while white defendants are predicted to have lower rates than they actually do. Source: https://www.nytimes.com/2017/10/26/opinion/algorithm-compas-sentencing-bias.html

  45. BIAS. [Machine-learned models] will learn what the data shows them, and then tell you what they’ve learned. They refuse to learn “the world as we wish it were”. The fact is that these biases do exist in our society, and they’re reflected in nearly any piece of data you look at. Source: https://medium.com/@yonatanzunger/asking-the-right-questions-about-ai-7ed2d9820c48

  46. ETHICS INITIATIVE. Source: https://www.bloomberg.com/company/announcements/bloomberg-brighthive-data-democracy-launch-initiative-develop-data-science-code-ethics/ Source: https://medium.com/@dpatil/a-code-of-ethics-for-data-science-cda27d1fac1

  47. INTERPRETABILITY. What people are good at, it turns out, isn’t explaining how they made decisions: it’s coming up with a reasonable-sounding explanation for their decision after the fact. Source: https://medium.com/@yonatanzunger/asking-the-right-questions-about-ai-7ed2d9820c48

  48. INTERPRETABILITY. Source: https://github.com/marcotcr/lime Source: https://arxiv.org/pdf/1311.2901.pdf Source: https://twitter.com/pmddomingos/status/956697536189800448

  49. OPEN SOURCE SUSTAINABILITY. DEVELOPER BURN OUT. RATIO DEVELOPER - USER. ENTERPRISES PROFITING WITHOUT GIVING BACK. USERS WITH HIGH EXPECTATIONS. Source: https://www.numfocus.org/blog/matplotlib-lead-developer-explains-why-he-cant-fix-the-docs-but-you-can/

  50. THANK YOU! @ch_doig christine@datna.io
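
Illustrative note (not part of the talk). The split between unsupervised learning (no labels) and supervised learning (labels) on the modeling slides can be sketched with two of the algorithms listed there, K-means and k-NN. This is a minimal sketch in plain Python, not code from the talk: the toy points, the labels "low" and "high", the starting centroids and the helper names are all made up for illustration.

```python
# Hypothetical toy data: six 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
# Labels are only used in the supervised example below.
labels = ["low", "low", "low", "high", "high", "high"]


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def kmeans(data, centroids, iterations=10):
    """Unsupervised: group points without using any labels (basic K-means loop)."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in data:
            clusters[nearest_index(p, centroids)].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, [nearest_index(p, centroids) for p in data]


def knn_predict(train_points, train_labels, query, k=3):
    """Supervised: predict a label by majority vote among the k nearest labeled points."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: squared_distance(pair[0], query))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


# Unsupervised: K-means sees only the points and discovers the two groups.
centroids, assignments = kmeans(points, centroids=[points[0], points[3]])
print(assignments)

# Supervised: k-NN uses the labels to classify a new, unseen point.
prediction = knn_predict(points, labels, query=(4.9, 5.0))
print(prediction)
```

K-means returns anonymous cluster indices, because it never sees the labels. k-NN returns one of the label names it was trained on.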