PyCon Colombia 2018

Data Science: Past, Present & Future. Keynote talk at PyCon Colombia 2018.

Christine Doig

February 09, 2018

Transcript

  1. 2.

    ABOUT ME. Timeline, 2011 to 2018:

    GRAD STUDENT (Energy, 2011): E.ON / RWTH AACHEN
    PROCESS ENGINEER (Manufacturing, 2012): PROCTER & GAMBLE
    BUSINESS ANALYST / CONSULTANT (Banking): BLUECAP - LACAIXA
    DATA SCIENCE CONSULTING (Professional Services): DATNA, LLC
    DATA SCIENTIST & PRODUCT MANAGER (Tech): ANACONDA
    PyCON Colombia 2018
  2. 5.

    DATA SCIENCE WORKFLOW

    01 COLLECT: Gather, integrate and store data
    02 UNDERSTAND: Explore, clean, transform, visualize
    03 MODEL: Build and validate models
    04 DEPLOY: Communicate and integrate into systems

    Areas: Data Engineering & Architecture, Data Analytics & Insights, Data Modeling
  3. 6.

    STEP 03: Modeling. MACHINE LEARNING

    Unsupervised learning (no labels)
    TASKS: Exploring (clustering, dimensionality reduction)
    APPLICATIONS: Market segmentation, Anomaly detection, Summarizing information
    ALGORITHMS: K-means, Hierarchical clustering, PCA, T-SNE

    Supervised learning (labels)
    TASKS: Predicting (Classification, Regression)
    APPLICATIONS: Spam detection, Object/face recognition, Recommender systems
    ALGORITHMS: Logistic Regression, SVM, Decision trees, k-NN, Linear Regression, Neural Networks

    Reinforcement learning (reward)
    TASKS: Decision making
    APPLICATIONS: Robotics - Make Humanoid robot walk; Games - Defeat Go champion; Finance - Trading strategies
    ALGORITHMS: Q-learning, Policy gradient, REINFORCE, Dyna, Dynamic programming, MCTS
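The "no labels" versus "labels" distinction on this slide can be illustrated with a small sketch. It is not from the talk: the data points, the label names, and the helper functions `kmeans_1d` and `knn_predict` are all hypothetical, and the code uses only plain Python rather than any particular library. K-means groups the points without seeing labels, while k-NN needs the labels to make a prediction.

```python
# Hypothetical 1-D measurements; values and labels are invented for illustration.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["low", "low", "low", "high", "high", "high"]  # used only in the supervised case


def kmeans_1d(data, centers, iterations=10):
    """Unsupervised: group points around centers without using any labels."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for x in data:
            # Assign each point to the index of its nearest center.
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers


def knn_predict(train_x, train_y, query, k=3):
    """Supervised: predict a label by majority vote of the k nearest labeled points."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


centers = kmeans_1d(points, centers=[0.0, 10.0])      # never sees `labels`
prediction = knn_predict(points, labels, query=4.5)   # relies on `labels`
print(centers, prediction)
```

In this toy setup the clustering step recovers two groups from the values alone, whereas the prediction step cannot run without the labeled examples.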
  4. 7.

    STEP 03: Modeling. MACHINE LEARNING, DEEP LEARNING

    Unsupervised learning (no labels)
    TASKS: Exploring (clustering, dimensionality reduction)
    APPLICATIONS: Market segmentation, Anomaly detection, Summarizing information
    ALGORITHMS: K-means, Hierarchical clustering, PCA, T-SNE

    Supervised learning (labels)
    TASKS: Predicting (Classification, Regression)
    APPLICATIONS: Spam detection, Object/face recognition, Recommender systems
    ALGORITHMS: Logistic Regression, SVM, Decision trees, k-NN, Linear Regression, Neural Networks

    Reinforcement learning (reward)
    TASKS: Decision making
    APPLICATIONS: Robotics - Make Humanoid robot walk; Games - Defeat Go champion; Finance - Trading strategies
    ALGORITHMS: Q-learning, Policy gradient, REINFORCE, Dyna, Dynamic programming, MCTS
  5. 8.

    DEFINITIONS. MACHINE LEARNING, DATA SCIENCE, DEEP LEARNING

    MACHINE LEARNING ~ AI; DEEP LEARNING ~ AI; REINFORCEMENT LEARNING ~ AI
  6. 10.

    TERM: DATA SCIENCE. “It was coined in 2008 by one of us, D.J. Patil, and Jeff Hammerbacher, then the respective leads of data and analytics efforts at LinkedIn and Facebook.”

    Source: https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century (OCTOBER, 2012)
  7. 11.

    WE HAD ALREADY BEEN USING DATA.

    STATISTICS, OPERATIONS RESEARCH, BUSINESS INTELLIGENCE, DATA MINING, ANALYTICS, PROCESS ENGINEERING, QUANTITATIVE RESEARCH, OPTIMIZATION, REPORTS, DASHBOARDS, CRAWLING, OPEN DATA, DATA WAREHOUSE, INFORMATION RETRIEVAL
  8. 13.

    THE DATA SCIENTIST. “If you are asking questions and using data to find answers, YOU ARE A DATA SCIENTIST. Period.” ~ Brandon Rohrer

    Source: https://brohrer.github.io/imposter_syndrome.html
  9. 14.

    ANALYTICS IN THE ENTERPRISE. SAS + ORACLE + MATLAB + EXCEL

    CHALLENGES: Transparency, Innovation, Reproducibility, Deployment
  10. 19.

    WE INCORPORATED DATA PRODUCTS IN OUR LIVES.

    WHAT WE BUY: AMAZON RECOMMENDATIONS
    WHAT INFORMATION WE CONSUME: GOOGLE SEARCH
    WHAT SHOWS WE WATCH: NETFLIX
    HOW WE NAVIGATE: GOOGLE MAPS
    HOW WE CONNECT: FACEBOOK, LINKEDIN

    In 2009, Netflix awarded the $1M Grand Prize
  11. 23.

    A NEW PROFESSION & CAREER PATH.

    Data Engineering & Architecture: Data Engineer, Sr. Data Engineer, Data Architect, Director of Data Engineering
    Data Analytics & Insights: Data Analyst, Sr. Data Analyst, Analytics Manager, Director of Analytics
    Data Science: Data Scientist, Sr. Data Scientist, Data Science Manager, Director of Data Science
  12. 25.

    ECOSYSTEM. A mature product and vendor ecosystem to serve the early majority.

    M&A - CONSOLIDATION: TURI, YHAT, SENSE.IO, KAGGLE
    IPOs: CLOUDERA, ALTERYX

    Source: FirstMark Capital, Matt Turck, Jim Hao http://mattturck.com/big-data-landscape-2016-v18-final/
  13. 28.

    OPEN SOURCE.

    DEEP LEARNING, IDEs, DATA MUNGING, DATA VIZ, MACHINE LEARNING, DATA WORKFLOWS, BIG DATA, NLP, STATISTICS
  14. 29.

    COMPUTE. GCP, AZURE, AWS

    Data Science Collaboration. ML / DL APIs: GOOGLE CLOUD VISION API, COMPUTER VISION API. Build, train, deploy API.
  15. 30.

    DATA + ALGORITHMS.

    IMAGES: http://www.image-net.org/
    AUDIO: https://research.google.com/audioset/
    WEBSITES: http://commoncrawl.org/
    COMPETITIONS: https://www.kaggle.com/
    DATA REPOSITORY: https://data.world/
  16. 31.

    ALPHAGO

    Oct. 2015 - Beats human professional Go player (v. Fan)
    Mar. 2016 - Beats Lee Sedol (9-dan professional) in five-game match (v. Lee)
    May 2017 - Beats Ke Jie, the world's top Go player (v. Master)
    Oct. 2017 - AlphaGo Zero beats AlphaGo (v. Lee) 100-0 with an algorithm based solely on reinforcement learning, without human data.

    Source: "Mastering the game of Go without human knowledge". Nature. 19 October 2017. Retrieved 19 October 2017.
  17. 44.

    ETHICS. BIAS. COMPAS predicts black defendants will have higher risks of recidivism than they actually do, while white defendants are predicted to have lower rates than they actually do.

    Source: https://www.nytimes.com/2017/10/26/opinion/algorithm-compas-sentencing-bias.html
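One common way to put numbers on this kind of disparity is to compare error rates per group: how often people who did not reoffend were predicted high risk (false positives), and how often people who did reoffend were predicted low risk (false negatives). The sketch below is an assumption for illustration only. It is not the methodology of the cited article, the records are invented, and the group names "A" and "B" and the function `error_rates_by_group` are hypothetical.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration only:
# (group, predicted_high_risk, actually_reoffended)
records = [
    ("A", True, False), ("A", True, True), ("A", True, False), ("A", False, False),
    ("B", False, False), ("B", True, True), ("B", False, True), ("B", False, False),
]


def error_rates_by_group(rows):
    """Return false positive and false negative rates for each group."""
    counts = defaultdict(lambda: {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
    for group, predicted, actual in rows:
        c = counts[group]
        if actual:
            c["pos"] += 1
            if not predicted:
                c["fn"] += 1  # reoffended but was predicted low risk
        else:
            c["neg"] += 1
            if predicted:
                c["fp"] += 1  # did not reoffend but was predicted high risk
    return {
        group: {
            "false_positive_rate": c["fp"] / c["neg"] if c["neg"] else None,
            "false_negative_rate": c["fn"] / c["pos"] if c["pos"] else None,
        }
        for group, c in counts.items()
    }


rates = error_rates_by_group(records)
for group, group_rates in sorted(rates.items()):
    print(group, group_rates)
```

With these made-up records, group A is over-predicted (a high false positive rate) and group B is under-predicted (a high false negative rate), which is the shape of disparity the slide describes.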
  18. 45.

    BIAS. “[Machine-learned models] will learn what the data shows them, and then tell you what they’ve learned. They refuse to learn ‘the world as we wish it were’. The fact is that these biases do exist in our society, and they’re reflected in nearly any piece of data you look at.”

    Source: https://medium.com/@yonatanzunger/asking-the-right-questions-about-ai-7ed2d9820c48
  19. 47.

    INTERPRETABILITY. “What people are good at, it turns out, isn’t explaining how they made decisions: it’s coming up with a reasonable-sounding explanation for their decision after the fact.”

    Source: https://medium.com/@yonatanzunger/asking-the-right-questions-about-ai-7ed2d9820c48