Slide 1

Slide 1 text

Texas GIS Forum Workshop, October 22, 2019 Introduction to (Spatial) Data Science JD Godchaux, Customer Success Manager @ CARTO https://tinyurl.com/carto-sds-workshop https://tinyurl.com/carto-sds-workshop-folder

Slide 2

Slide 2 text

Data Science

Slide 3

Slide 3 text

What is Data Science
"What do you think of when you read the phrase ‘data science’? It’s probably some combination of keywords like statistics, machine learning, deep learning, and ‘sexiest job of the 21st century’. Or maybe it’s an image of a data scientist, sitting at her computer, putting together stunning visuals from well-run A/B tests. Either way, it’s glamorous, smart, and sophisticated. This is the narrative that data science has been selling since I entered the field almost ten years ago." - Vicki Boykis

Slide 4

Slide 4 text

Well...

Slide 5

Slide 5 text

What is Data Science

Slide 6

Slide 6 text

What is Data Science

Slide 7

Slide 7 text

"Data Scientist" most of the time means "Data Janitor" - Dongjie Fan, Data Scientist @ CARTO

Slide 8

Slide 8 text

Many different tools

Slide 9

Slide 9 text

Storytelling is important

Slide 10

Slide 10 text

SQL is still important

Slide 11

Slide 11 text

What Data Scientists Do
● Prediction
● Inference
● Optimization

Slide 12

Slide 12 text

What Data Scientists Do
● A/B Testing on Website Paths
● Image Recognition
● Predictions (Supervised Learning)
● Unsupervised Learning
● Natural Language Processing
● Recommendation Systems (Netflix)
● Time Series Analysis (trading)
● Operations Research (Uber)

Slide 13

Slide 13 text

Who are "Data Scientists"?

Slide 14

Slide 14 text

Who are "Data Scientists"?
● Data Analyst
● Business Analyst
● Data Engineer
● Machine Learning Engineer
● Data Visualization
● Developers

Slide 15

Slide 15 text

What do Data Scientists do?

Slide 16

Slide 16 text

Being able to work with data to understand problems

Slide 17

Slide 17 text

Use different tools & methods to accomplish these tasks

Slide 18

Slide 18 text

Validate the methods used to test the assumption, even if the results look good

Slide 19

Slide 19 text

Be able to explain the process of how you reached a conclusion

Slide 20

Slide 20 text

(Spatial) Data Science

Slide 21

Slide 21 text

Spatial data science has at least two aspects: adding spatial variables, and adding spatial terms to the model. For example:
f(x) = aX + b + error
This is simple linear regression. You can add as many self-created spatial variables to X as you like.
f(x) = aX + Sigma + b + error
Here we add a covariance term, Sigma, which represents the influence of the spatial relationships across the whole dataset (not necessarily tied to a specific input x).
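The first aspect, adding a self-created spatial variable to X, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the home locations and prices are made up, the city-center coordinate is approximate, and `haversine_km` and `fit_line` are hypothetical helper names, not part of CARTOframes or any other library.

```python
import math

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two lon/lat points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Assumption: approximate lon/lat of downtown Los Angeles.
center = (-118.2437, 34.0522)

# Assumption: made-up home locations (lon, lat).
homes = [(-118.25, 34.05), (-118.30, 34.10), (-118.40, 34.00), (-118.15, 34.15)]

# The self-created spatial variable: distance from each home to the center.
dist = [haversine_km(lon, lat, *center) for lon, lat in homes]

# Assumption: synthetic prices generated from a known line with no error term,
# so the fitted coefficients can be checked against a = -20 and b = 900.
price = [900.0 - 20.0 * d for d in dist]

a, b = fit_line(dist, price)
print(f"a = {a:.3f}, b = {b:.3f}")
```

The second aspect, the covariance term Sigma, needs a spatial regression model and is beyond a sketch of this size.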

Slide 22

Slide 22 text

Bringing spatial data in as an explicit part of the analysis, or taking "the where" (distances, spatial arrangement) into account

Slide 23

Slide 23 text

If the location changes, and the context of the data also changes, then you need spatial data science (SDS)

Slide 24

Slide 24 text

Spatial Analysis Questions
● Where are things and where do they happen: clusters, hot spots, disparities
● Why do they happen where they happen: understanding location decisions and movement patterns
● How does where things happen affect other things (context/environment) and how does context affect what happens: "I am my neighbor's neighbor"
● Where should things be located: optimization

Slide 25

Slide 25 text

What makes spatial data unique
● Spatial Context: Understanding the effect that your neighbors have on an observation, and vice versa
● Spatial Support Problem: where scales of data do not match (zip codes and block groups)
● Spatial Scale of Observations: behavior does not match the unit of observation (you know which neighborhood you live in, but not which block group)
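Spatial context is often made concrete as a "spatial lag": for each area, the average value of its neighbors. The sketch below assumes a hand-written neighbor list and made-up values; `spatial_lag` is a hypothetical helper name used only for illustration.

```python
# Assumption: a tiny, made-up neighbor structure. Adjacency is symmetric,
# so if A lists B as a neighbor, B also lists A.
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

# Assumption: a made-up value observed in each area.
values = {"A": 10.0, "B": 20.0, "C": 30.0, "D": 40.0}

def spatial_lag(values, neighbors):
    """Average of the neighbors' values for every area that has neighbors."""
    return {
        area: sum(values[n] for n in nbrs) / len(nbrs)
        for area, nbrs in neighbors.items()
        if nbrs
    }

lag = spatial_lag(values, neighbors)
for area in sorted(lag):
    print(area, values[area], round(lag[area], 2))
```

Comparing each area's own value with its lag is one simple way to see whether an observation resembles its surroundings.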

Slide 26

Slide 26 text

What makes spatial data unique
● Spatial Spillover: the activity of one location will impact the costs of other locations (closing a road will have an increased cost for a distribution center)
● Spatial Multiplier: a successful store will not just impact that store, but the nearby stores as well
● Spatial Decay: Observations change and decay as you move away from an observation
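Spatial decay can be illustrated with a distance-based weight that shrinks as you move away from an observation. The sketch below assumes an exponential form, which is only one of several common choices (inverse distance is another); the bandwidth value, the sample data, and the function names are hypothetical.

```python
import math

def decay_weight(distance_km, bandwidth_km=1.0):
    """Exponential decay: weight 1 at distance 0, falling toward 0 with distance."""
    return math.exp(-distance_km / bandwidth_km)

def weighted_influence(observations, bandwidth_km=1.0):
    """Decay-weighted average of (distance_km, value) pairs."""
    weights = [decay_weight(d, bandwidth_km) for d, _ in observations]
    total = sum(w * v for w, (_, v) in zip(weights, observations))
    return total / sum(weights)

# Assumption: made-up observations as (distance from the point of interest, value).
nearby_and_far = [(0.5, 100.0), (5.0, 0.0)]

# The nearby observation dominates the result because its weight is much larger.
print(round(weighted_influence(nearby_and_far), 2))
```

A larger bandwidth makes the weights fall off more slowly, so distant observations count for more.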

Slide 27

Slide 27 text

80% of the effort is data preparation

Slide 28

Slide 28 text

GIS v. Location Intelligence v. Spatial Data Science

Slide 29

Slide 29 text

GIS
GIS is the toolkit for working with spatial data and doing something with it: here are the tools, and we train people to use them. Much (but not all) of this work is descriptive in nature: where the things are. Traditional GIS programs trained you on these tools, and then you found a career using them. And with Esri there are many, many tools.

Slide 30

Slide 30 text

Location Intelligence
Location Intelligence focuses on outcomes from location data: you take the data in and you get some valuable insight out. Think about human mobility data, which is marketed as immediate insight from location data when the real process is much more complex.

Slide 31

Slide 31 text

"Location Intelligence"

Slide 33

Slide 33 text

Spatial Data Science
Spatial Data Science focuses on the journey and the data, or the underlying conditions leading to the insight, and prescribes recommendations for optimization. It looks at the factors that cause something to occur and creates models for spatial "optimization".

Slide 34

Slide 34 text

USC Spatial Data Science Curriculum

Slide 35

Slide 35 text

Data Science Takeaways
● Data science is a broad space with many tools and skills
● Spatial data science concepts are newer to traditional ‘data scientists’
● There are many different roles and titles that all triangulate on ‘data science’

Slide 37

Slide 37 text

Spatial Data Science Takeaways
● Spatial Data Science is data science with spatial attributes (with a little more complexity)
● In Spatial Data Science, the journey to discover the problem is just as important as the answer, if not more so
● It is important to start with a problem, then work from the beginning to align the steps to get there

Slide 38

Slide 38 text

Now to the Notebooks!

Slide 39

Slide 39 text

Ways of Notebooking
- Jupyter Notebook (Installation in a virtual environment with CARTOframes)
- Docker
- Google Colaboratory
- nteract
- Kaggle
- Glitch

Slide 40

Slide 40 text

Notebooks for Today
● Intro to Python
● NumPy, Pandas, Seaborn
● Introduction to CARTOframes
● Helper Functions in CARTOframes
● Los Angeles Real Estate Data Cleaning
● Los Angeles Real Estate Exploratory Data Analysis