Introduction to (Spatial) Data Science

JD Godchaux

Transcript

  1. Texas GIS Forum Workshop, October 22, 2019: Introduction to (Spatial) Data Science

    JD Godchaux, Customer Success Manager @ CARTO
    https://tinyurl.com/carto-sds-workshop
    https://tinyurl.com/carto-sds-workshop-folder
  2. What is Data Science

    “What do you think of when you read the phrase ‘data science’? It’s probably some combination of keywords like statistics, machine learning, deep learning, and ‘sexiest job of the 21st century’. Or maybe it’s an image of a data scientist, sitting at her computer, putting together stunning visuals from well-run A/B tests. Either way, it’s glamorous, smart, and sophisticated. This is the narrative that data science has been selling since I entered the field almost ten years ago.” - Vicki Boykis
  3. "Data Scientist" most of the time means "Data Janitor"

    - Dongjie Fan, Data Scientist @ CARTO
  4. What Data Scientists Do

    • Prediction
    • Inference
    • Optimization
  5. What Data Scientists Do

    • A/B Testing on Website Paths
    • Image Recognition
    • Predictions (Supervised Learning)
    • Unsupervised Learning
    • Natural Language Processing
    • Recommendation Systems (Netflix)
    • Time Series Analysis (trading)
    • Operations Research (Uber)
  6. Who are "Data Scientists"?

    • Data Analyst
    • Business Analyst
    • Data Engineer
    • Machine Learning Engineer
    • Data Visualization
    • Developers
  7. Being able to work with data to understand problems
  8. Validate the methods used to test an assumption, even if the results look good
  9. Be able to explain the process of how you reached a conclusion
  10. Spatial data science contains at least two aspects

    One is adding spatial variables; the other is adding spatial terms to the model. For example, f(x) = aX + b + error is simple linear regression, and X can include as many self-created spatial variables as you like. In f(x) = aX + Sigma + b + error, we add a covariance term Sigma that represents the influence of the spatial relationships across the whole dataset (not necessarily related to a specific input x).
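
A minimal sketch of the first aspect (adding a self-created spatial variable to a simple linear regression), using only the Python standard library. The coordinates, prices, reference point, and the function names `distance_to_center` and `fit_simple_ols` are all hypothetical and made up for illustration; they are not taken from the workshop notebooks. The Sigma covariance term of the second aspect is not modelled here.

```python
import math

# Assumed reference point (e.g. a city centre) in planar kilometres.
CENTER = (0.0, 0.0)

def distance_to_center(x, y, center=CENTER):
    """Self-created spatial variable: straight-line distance to a reference point."""
    return math.hypot(x - center[0], y - center[1])

def fit_simple_ols(xs, ys):
    """Closed-form least squares for f(x) = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Hypothetical observation locations (x_km, y_km).
locations = [(3.0, 4.0), (6.0, 8.0), (0.0, 1.0), (5.0, 12.0), (8.0, 15.0)]

# Made-up prices that fall by exactly 20 per km from a base of 500 (no noise),
# so the fit below should recover those two numbers.
prices = [500.0 - 20.0 * distance_to_center(x, y) for x, y in locations]

# X is the spatial variable; the response is the price.
distances = [distance_to_center(x, y) for x, y in locations]
a, b = fit_simple_ols(distances, prices)
print(f"a = {a:.2f}, b = {b:.2f}")
```

Because the example data contain no noise, the fitted slope and intercept match the values used to build the prices. With real data the error term would be non-zero, and any spatial structure left in the residuals is what a term like Sigma is meant to capture.
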
  11. Bringing the spatial data in as an explicit part of the analysis, or taking "the where" (distances, spatial arrangement) into account
  12. If the location changes and the context of the data also changes, then you need spatial data science (SDS)
  13. Spatial Analysis Questions

    • Where are things and where do they happen: clusters, hot spots, disparities
    • Why do they happen where they happen: understanding location decisions and movement patterns
    • How does where things happen affect other things (context/environment), and how does context affect what happens: "I am my neighbor's neighbor"
    • Where should things be located: optimization
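
The last question ("where should things be located") can be illustrated with a very small site-selection sketch. It assumes a fixed list of candidate sites and picks the one with the smallest total straight-line distance to a set of demand points. The points, candidates, and function names are hypothetical, and real location optimization would usually account for travel networks, capacities, and costs.

```python
import math

def total_distance(site, demand_points):
    """Sum of straight-line distances from one candidate site to every demand point."""
    return sum(math.hypot(site[0] - x, site[1] - y) for x, y in demand_points)

def best_site(candidates, demand_points):
    """Return the candidate with the smallest total distance to the demand points."""
    return min(candidates, key=lambda site: total_distance(site, demand_points))

# Hypothetical demand points (e.g. customers) and candidate sites, planar units.
demand = [(0, 0), (2, 0), (0, 2), (2, 2)]
candidates = [(1, 1), (5, 5), (0, 0)]

chosen = best_site(candidates, demand)
print(chosen)
```
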
  14. What makes spatial data unique

    • Spatial Context: understanding the effect that your neighbors have on an observation, and vice versa
    • Spatial Support Problem: where scales of data do not match (zip codes and block groups)
    • Spatial Scale of Observations: behavior does not match the unit of observation (you know which neighborhood you live in, but not which block group)
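
One common way to put a number on spatial context is a spatial lag: for each observation, the average value of its neighbors. The sketch below assumes neighbors are defined by a simple distance threshold, which is only one of several possible neighbor definitions. The points, values, radius, and the name `spatial_lag` are hypothetical.

```python
import math

def spatial_lag(points, values, radius):
    """For each observation, average the values of all other observations within `radius`.

    Returns None for observations that have no neighbors inside the radius.
    """
    lags = []
    for i, (xi, yi) in enumerate(points):
        neighbor_values = [
            values[j]
            for j, (xj, yj) in enumerate(points)
            if j != i and math.hypot(xi - xj, yi - yj) <= radius
        ]
        if neighbor_values:
            lags.append(sum(neighbor_values) / len(neighbor_values))
        else:
            lags.append(None)
    return lags

# Hypothetical observations: three close together, one far away.
points = [(0, 0), (1, 0), (0, 1), (10, 10)]
values = [100.0, 200.0, 300.0, 400.0]

lags = spatial_lag(points, values, radius=1.5)
print(lags)
```

The lag can then be added to a model as one more explanatory variable, so that each observation carries information about its surroundings.
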
  15. What makes spatial data unique

    • Spatial Spillover: the activity of one location will impact the costs of other locations (closing a road will increase costs for a distribution center)
    • Spatial Multiplier: a successful store will not just impact that store, but the nearby stores as well
    • Spatial Decay: effects change and decay as you move away from an observation
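
Spatial decay is often modelled with a weight that shrinks as distance grows. The sketch below assumes an exponential form, w = exp(-d / bandwidth), which is one common choice among several (inverse distance is another). The bandwidth value and the function names are hypothetical.

```python
import math

def decay_weight(distance, bandwidth):
    """Exponential distance decay: 1.0 at distance 0, approaching 0 far away."""
    return math.exp(-distance / bandwidth)

def decayed_influence(source_value, distance, bandwidth):
    """Share of a source location's effect that reaches a point `distance` away."""
    return source_value * decay_weight(distance, bandwidth)

# Hypothetical example: weights at increasing distances with a bandwidth of 2 units.
weights = [decay_weight(d, bandwidth=2.0) for d in (0.0, 1.0, 2.0, 4.0)]
print([round(w, 3) for w in weights])
```

A larger bandwidth makes the influence reach further before fading, so choosing it is a modelling decision that depends on the process being studied.
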
  16. GIS v. Location Intelligence v. Spatial Data Science
  17. GIS

    GIS is the toolkit for working with spatial data and doing something with it: here are the tools, and we train people to use them. Much (but not all) of this work is descriptive in nature: where the things are. Traditional GIS programs trained you on these tools, and then you found a career using them. And with ESRI there are many, many tools.
  18. Location Intelligence

    Location Intelligence focuses on outcomes from location data: you take the data in and you get some valuable insight out. Think about human mobility: it is marketed as immediate insight from location data, when the real process is much more complex.
  19. Spatial Data Science

    Spatial Data Science focuses on the journey and the data, or the underlying conditions leading to the insight, and prescribes recommendations for optimization. It looks at the factors that cause something to occur and creates models for spatial "optimization".
  20. Data Science Takeaways

    • Data science is a broad space with many tools and skills
    • Spatial data science concepts are newer to traditional "data scientists"
    • There are many different roles and titles that all triangulate on "data science"
  21. Spatial Data Science Takeaways

    • Spatial Data Science is data science with spatial attributes (with a little more complexity)
    • In Spatial Data Science, the journey to discover the problem is just as important as the answer, if not more so
    • It is important to start with a problem, then work from the beginning to align the steps to get there
  22. Ways of Notebooking

    - Jupyter Notebook (installation in a virtual environment with CARTOframes)
    - Docker
    - Google Colaboratory
    - nteract
    - Kaggle
    - Glitch
  23. Notebooks for Today

    • Intro to Python
    • NumPy, Pandas, Seaborn
    • Introduction to CARTOframes
    • Helper Functions in CARTOframes
    • Los Angeles Real Estate Data Cleaning
    • Los Angeles Real Estate Exploratory Data Analysis