Slide 1

Slide 1 text

Geospatial Analysis with R Michael Harper, Luke Blunden September 9, 2017 University of Southampton 1 / 66

Slide 2

Slide 2 text

Uses of Spatial Data in Science • spatial analysis: using location or spatial relationships as an explanatory or predictive variable • examining the effects of scale • backdrops to show context Figure 1: Cholera Map of 1854 London Outbreak 2 / 66

Slide 3

Slide 3 text

GIS vs. R • Many tools options also available • Other tools avaialble (QGIS, Python) Which tool is best to use? Well, it depends on the task! 3 / 66

Slide 4

Slide 4 text

GIS vs. R GIS • Visual interaction • Data management • Geometric operations • Standard workflows • Single map production • Click, click, click, click • Speed of execution R • Data and model focused • Analysis • Attributes as important • Creativity and innovation • Many (simpler) maps • Repeatability (script) • Speed of development 4 / 66

Slide 5

Slide 5 text

R Examples (1) Figure 2: London Cycle Hire Journeys http://spatial.ly/2012/02/great-maps-ggplot2/ 5 / 66

Slide 6

Slide 6 text

R Examples (2) Figure 3: Wind Turbine Analysis of Great Britain 6 / 66

Slide 7

Slide 7 text

Challenges with Spatial Data in R Understanding and manipulating spatial data in R can be a real challenge! • Many types and formats of spatial data • R is not the easiest program to learn • There are a many packages from a very diverse user community 7 / 66

Slide 8

Slide 8 text

Presentation Goals 1. Help you get to the point where you can do basic analysis or visualization 2. Point out some of the commonly-used packages for spatial data manipulation and analysis 3. Be a resource you can come back to 4. Provide Guidance on useful resources to use after the course Beware of the learning curve! Takes patience to master. 8 / 66

Slide 9

Slide 9 text

Outline 1. Creating Spatial Objects 2. Data Table Operations 3. Coordinate Systems and Reprojection 4. Geoprocessing 5. Presenting Results Subjects Omitted More advanced visualization, Spatial statistics, Geodatabases, Network analysis, Many other... 9 / 66

Slide 10

Slide 10 text

Before We Start… • Many packages are used within the analysis. library(rgdal) # loads geospatial libraries library(rgeos) # dealing with geospatial files library(sp) # spatial projections library(raster) # loading rasters and geoprocessing library(ggplot2) # Creating maps library(spatstat) # Used for spatial statistics • Don’t worry for now. These will be explained in more detail later. 10 / 66

Slide 11

Slide 11 text

PART I: Creating Spatial Objects 11 / 66

Slide 12

Slide 12 text

Representing Physical Features Figure 4: Representation of Figure Layers Figure 5: Raster Features Figure 6: Vector Features 12 / 66

Slide 13

Slide 13 text

Creating SpatialPoints Objects • The package sp is primarily used for creating spatial data types • The code getClass('Spatial') can be run to see all types Three options to create spatial data in R: 1. Create from scratch 2. Promote a data frame with X and Y columns 3. Import a GIS file • Methods 2 and 3 are most commonly used! • Tutorial will focus on creation of Points 13 / 66

Slide 14

Slide 14 text

Method 1: Create a SpatialPoints Object from Scratch • Let’s create an example dataset! library(sp) # create a set of random coordinates xy <- matrix(data = runif(n = 100), ncol = 2) head(xy, n = 3) # show first three rows ## [,1] [,2] ## [1,] 0.3624658 0.1637974 ## [2,] 0.6252937 0.6651115 ## [3,] 0.8885394 0.8519063 • SpatialPoints function can be used to create a dataset from XY coordinates xySp <- SpatialPoints(xy) 14 / 66

Slide 15

Slide 15 text

Method 1: Create a SpatialPoints Object from Scratch SpatialPoints can be plotted as typical XY data using the plot() function plot(xySp, axes = TRUE, main = "Example Spatial Data Points") 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 Example Spatial Data Points 15 / 66

Slide 16

Slide 16 text

Method 2: Promote a DataFrame to a SpatialPointsDataFrame • It is unlikely you will use Method 1 often • You will more likely have an existing dataset with XY coordinates. • For the tutorial, We will used the canada.cities dataset from the maps package. library(maps) head(canada.cities, n = 3) ## name country.etc pop lat long capital ## 1 Abbotsford BC BC 157795 49.06 -122.30 0 ## 2 Acton ON ON 8308 43.63 -80.03 0 ## 3 Acton Vale QC QC 5153 45.63 -72.57 0 • Data already has X (long) and Y (lat) references. 16 / 66

Slide 17

Slide 17 text

Method 2: Promote a DataFrame to a SpatialPointsDataFrame • We must let R know which columns are the X and Y coordinates # Create a new variable to edit the data cities <- canada.cities # State which columns contain the x and y # coordinates coordinates(cities) <- ~long + lat • By assigning coordinates we create a SpatialPointsDataFrame: class(cities) ## [1] "SpatialPointsDataFrame" ## attr(,"package") ## [1] "sp" 17 / 66

Slide 18

Slide 18 text

Method 2: Promote a DataFrame to a SpatialPointsDataFrame We can map the results of the SpatialDataFrame as before: plot(cities, axes = TRUE, main = "Cities in Canada", xlab = "Longitude", ylab = "Latitude") −140 −120 −100 −80 −60 45 55 65 Cities in Canada Longitude Latitude • This map is not very useful yet! We will add to this map later! 18 / 66

Slide 19

Slide 19 text

Method 3: Import a GIS File • Package rgdal has a large set of import and export filters for both vector and raster GIS data (Shapefiles, KML, GPX etc.). • For the complete list of vector import filters, use the function ogrDrivers(). • The tutorial will focus on Shapefiles, a commonly used format in GIS software. • We will use the “scot_BNG” file. • readOGR() is the function you use to import a vector file. # Find the file path of the vectors vectorDir <- system.file("vectors", package = "rgdal") scotland <- readOGR(dsn = vectorDir, layer = "scot_BNG", verbose = FALSE) • Note: you don’t need to include the file extension when loading the object 19 / 66

Slide 20

Slide 20 text

Method 3: Import a GIS File • Shapefiles are typically easy to plot straight away plot(scotland, axes = T, asp = 1, col = palette()) −2e+05 0e+00 2e+05 4e+05 6e+05 8e+05 600000 800000 1000000 1200000 • We will come back to plotting again later! 20 / 66

Slide 21

Slide 21 text

PART II: Working With SpatialDataFrames 21 / 66

Slide 22

Slide 22 text

Working with the Data Frame • You can work with the data frame as you would with any data frame • To use the attribute table of a SpatialPointsDataFrame, use the data slot (with ‘@’ character): names(cities@data) ## [1] "name" "country.etc" "pop" "capital" • What happened to the coordinates? • We can read them with the coordinates() function coordinates(cities)[1, ] ## long lat ## -122.30 49.06 22 / 66

Slide 23

Slide 23 text

Calculating Summaries • SpatialDataframes can mostly be treated like a normal Dataframe mean(cities$pop) # Calculate Mean population ## [1] 27463.94 ggplot2::qplot(x = "", y = cities$pop, geom = "boxplot", main = "A questionable boxplot", ylab = "Population") + coord_flip() 0e+00 1e+06 2e+06 3e+06 4e+06 Population x A questionable boxplot 23 / 66

Slide 24

Slide 24 text

Adding a Column to the Data Frame of a SPDF Object • Calculate new columns in the data frame # Calculate the size order of cities cities$sizerank <- rank(-cities$pop) head(cities@data, n = 5) ## name country.etc pop capital sizerank ## 1 Abbotsford BC BC 157795 0 20 ## 2 Acton ON ON 8308 0 220 ## 3 Acton Vale QC QC 5153 0 314 ## 4 Airdrie AB AB 25863 0 86 ## 5 Aklavik NT NT 643 0 915 24 / 66

Slide 25

Slide 25 text

Filter on an Attribute • Standard R syntax can be used to filter SpatialDataFrames.1 # Filter cities by those with a population more # than 100,000 BigCities <- cities[cities$pop > 1e+05, ] # Count how many cities meet criteria nrow(BigCities) ## [1] 29 1Check out http://www.statmethods.net/management/subset.html 25 / 66

Slide 26

Slide 26 text

Filter on an Attribute par(mfrow = c(1, 2)) # Plot graphs side by side plot(cities, axes = T, asp = 1, pch = 16, main = "Original") plot(cities, axes = T, asp = 1, pch = 16, main = "Highlighing Big Cities plot(BigCities, pch = 1, col = "red", cex = 3, add = TRUE) −140 −120 −100 −80 −60 30 50 70 90 Original −140 −120 −100 −80 −60 30 50 70 90 Highlighing Big Cities Figure 7: Subsetting of spatial dataframes 26 / 66

Slide 27

Slide 27 text

Enriching Data • You may want to link data to your spatial data • e.g. deomgraphic data to census wards 27 / 66

Slide 28

Slide 28 text

Enriching Data: Example • Let’s join some census data with shapefiles for the Southampton region # Load Shapefile Solent <- readOGR(dsn = "../data/SolentLEP", "Solent_revised", verbose = FALSE) head(Solent[1:4], n = 2) ## OA11CD LAD11CD LAD11NM AREA_HECT ## 0 E00116566 E07000090 Havant 5.84000000 ## 1 E00116564 E07000090 Havant 9.75000000 nrow(Solent) ## [1] 3826 28 / 66

Slide 29

Slide 29 text

Enriching Data: Example • We can load some Census Data # Age data for all of the UK AgeData <- read.csv("../data/MedianAge.csv", stringsAsFactors = FALSE) head(AgeData,n=3) ## GeographyCode MedianAge ## 1 K04000001 39 ## 2 E92000001 39 ## 3 W92000004 41 nrow(AgeData) ## [1] 223722 • The age data has records for the whole of England 29 / 66

Slide 30

Slide 30 text

Enriching Data: Example • We can easily merge the age data with the boundary data using the merge() function # Merge Datasets SolentMerged <- merge(Solent, AgeData, by.x = "OA11CD", by.y = "GeographyCode") # Lets focus on Southampton Southampton <- SolentMerged[SolentMerged$LAD11NM == "Southampton", ] 30 / 66

Slide 31

Slide 31 text

Enriching Data: Example spplot(Southampton, "MedianAge", main = "Median Age (years)") Median Age (years) 20 30 40 50 60 70 80 Figure 8: Median Age with Output Areas in Southampton. Data from 2011 Census 31 / 66

Slide 32

Slide 32 text

PART III: Coordinate Reference Systems 32 / 66

Slide 33

Slide 33 text

Coordinate Reference Systems (CRS) • Very important! R won’t give you a warning when loading like in ArcGIS. • You don’t always need to give one, but it is recommended to always do. • If you create spatial data, you will usually have to specify the projection. • packages sp and rgdal are used for managing CRS. Find out more A useful guide is provided here about spatial projection in R https://goo.gl/FAsv4k 33 / 66

Slide 34

Slide 34 text

Coordinate Systems: Example • Earlier, we created the SpatialDataFrame of Canadian Cities called cities. • We can check the spatial projection using the function crs() crs(cities) ## CRS arguments: NA • Currently the data has no CRS! We need to assign one • Dataset has WGS84 projection, which is the mostly commonly used # Assign Projection to Cities crs(cities) <- CRS("+init=epsg:4326") Find out spatial projection codes An online registry of spatial projections is available at http://www.epsg-registry.org for Coordinate Reference Systems 34 / 66

Slide 35

Slide 35 text

Coordinate Systems: Example • We will load data for administrative borders in Canada using the function getData() from the raster package 2. # Load Canada Data canada <- raster::getData("GADM", country = "CAN", level = 0, download = FALSE, path = "../data") # Check Coordinate System crs(canada) ## CRS arguments: ## +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 2This provides global vector and raster data for countries, administrative boundaries, altitude models and climatic data. 35 / 66

Slide 36

Slide 36 text

Coordinate Systems: Example 36 / 66

Slide 37

Slide 37 text

Example: Reprojecting • The function spTransform() can be used to reproject a CRS GB <- raster::getData("GADM", country = "GBR", level = 1, path = "../data") GB_BNG <- spTransform(GB, CRS("+init=epsg:27700")) 15°W 10°W 5°W 0° 50°N 52°N 54°N 56°N 58°N 60°N WGS 1984 Coordinate System −4e+05 0e+00 4e+05 8e+05 0 200000 600000 1000000 British National Grid Figure 9: Comparison of coordinate systems 37 / 66

Slide 38

Slide 38 text

PART IV: Geoprocessing 38 / 66

Slide 39

Slide 39 text

Geoprocessing: rgeos package • rgeos is the main package used within R for geospatial analysis • interface to GEOS (Geometry Engine Open Source) Library Example Processes Union, distance, intersection, convex hull, envelope, buffer, simplify, polygon assembly, valid, area, length, intersects, touches, disjoint, crosses, within, contains, overlaps, equals, covers, etc. 39 / 66

Slide 40

Slide 40 text

Buffer • apply a 2km buffer to the county of Inverness inver <- scotland[scotland$NAME == "Inverness", ] inver_buf <- gBuffer(inver, width = 2000) 220000 240000 260000 280000 790000 810000 830000 850000 Buffer 40 / 66

Slide 41

Slide 41 text

Delaunay Triangulation • Form Delaunay triangulation for the cities in Canada delTri <- gDelaunayTriangulation(BigCities) plot(delTri, axes = TRUE, , main = "Delaunay Triangulation") plot(BigCities, col = "red", pch = 16, cex = 2, add = TRUE) 120°W 110°W 100°W 90°W 80°W 70°W 60°W 50°W 30°N 40°N 50°N 60°N 70°N Delaunay Triangulation 41 / 66

Slide 42

Slide 42 text

Spatial Proximity • How far is it to the nearest city? • spatstat provides adds a spatial proximity functions. library(spatstat) BigCities$Distance <- nndist(X = BigCities@coords) spplot(BigCities, "Distance") # Plot Results [0.1879,2.411] (2.411,4.633] (4.633,6.856] (6.856,9.078] (9.078,11.3] 42 / 66

Slide 43

Slide 43 text

PART V: Raster Data 43 / 66

Slide 44

Slide 44 text

Raster package • Most control through raster package • Read, write and manipulate gridded spatial data • Basic and high-level analysis functions • Processing of large files 44 / 66

Slide 45

Slide 45 text

Raster Layer: Create from Scratch myrast <- raster(extent(c(0, 20, 0, 20)), ncol = 10, nrow = 10) myrast[] <- 1:100 myrast ## class : RasterLayer ## dimensions : 10, 10, 100 (nrow, ncol, ncell) ## resolution : 2, 2 (x, y) ## extent : 0, 20, 0, 20 (xmin, xmax, ymin, ymax) ## coord. ref. : NA ## data source : in memory ## names : layer ## values : 1, 100 (min, max) 45 / 66

Slide 46

Slide 46 text

Raster Layer: Raster Algebra • Rasters can be added, multiplied etc. par(mfrow = c(1, 2)) # Plot graphs side by side plot(myrast, main = "Raster Value", asp = 1) myrastSquared <- myrast^2 plot(myrastSquared, main = "Raster Value Squared", asp = 1) 0 5 10 15 20 0 5 10 15 20 Raster Value 20 40 60 80 100 0 5 10 15 20 0 5 10 15 20 Raster Value Squared 2000 4000 6000 8000 10000 46 / 66

Slide 47

Slide 47 text

Raster Data: Import a GeoTiff • The raster() function from the raster package can be used WindSpeedRaster <- raster(x = "../data/WindSpeed45.tif") plot(WindSpeedRaster, "WindSpeed45") −2e+05 0e+00 2e+05 4e+05 6e+05 8e+05 0 200000 400000 600000 800000 1000000 1200000 0 2 4 6 8 10 12 14 Figure 10: UK Annualised Wind Speed Map at 45m above ground level 47 / 66

Slide 48

Slide 48 text

Raster Data: Crop a Raster • crop() function can be used to clip a raster to a spatial extent WindSpeedSolent <- crop(WindSpeedRaster, Solent) plot(WindSpeedSolent, asp = 1) 430000 440000 450000 460000 470000 80000 90000 100000 110000 120000 5 6 7 8 9 48 / 66

Slide 49

Slide 49 text

Overlay Points and Extract Values • Generate Some Points Within The Solent Region • The spsample() function from the package sp can to produce a sample within a given extent SolentPoints <- spsample(Solent, n = 25, type = "random") SolentPoints$ID <- 1:nrow(SolentPoints@coords) 430000 440000 450000 460000 470000 80000 90000 100000 110000 120000 5 6 7 8 9 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 49 / 66

Slide 50

Slide 50 text

Overlay Points and Extract Values • You can use the extract() function to calculate the value at points. extract(WindSpeedSolent, SolentPoints) ## [1] 6 7 NA 6 7 6 6 7 6 6 6 7 6 8 7 6 7 9 6 7 8 6 6 ## [24] 7 6 50 / 66

Slide 51

Slide 51 text

Create a Distance Surface • The function distance() can produce a proximity raster to the nearest spatial point • We must first rasterize() the Vector data (SpatialPoints) SolentPointsRast <- rasterize(SolentPoints, WindSpeedSolent, field = 1) dist2SamplePtsRast <- distance(SolentPointsRast) Original Points 430000 460000 80000 110000 Rasterized Points 430000 460000 80000 110000 Proximity Raster 0 5000 10000 15000 51 / 66

Slide 52

Slide 52 text

PART VI: Displaying Results 52 / 66

Slide 53

Slide 53 text

ggplot: the basics • Maps so far have been quite basic and produced with the functions plot() or spplot() • Easy to produce, but not the most attractive • Time to use ggplot2 • Same procedure as building graphs, with a few minor differences Warning Don’t worry if you don’t understand all the examples. They can be used for future reference. 53 / 66

Slide 54

Slide 54 text

ggplot Example: A World Map 54 / 66

Slide 55

Slide 55 text

ggplot Example: A World Map • Map made with less than 15 lines of code. library(mapdata) library(ggmap) # Load Data and select countries World <- map_data(map = "world") World <- World[World$region != "Antarctica", ] # Remove Antartica as it looks bad on the map # Select Highlight Regions StudyRegions <- World[World$region == c("UK", "USA", "UAE", "Egypt", "Kenya", "Cameroon", "Hungary", "France", "Thailand", "Vietnam"),] ggplot() + # Add the basemap geom_polygon(data = World, aes(x=long, y = lat, group = group), fill = "grey80", colour = "grey90", size = 0.25) + # Add the higlihhted Countries geom_polygon(data = StudyRegions, aes(x=long, y = lat, group = group), fill = "steelblue") + # Coordinates coord_fixed(1.3) + # Themes theme_nothing() + theme(panel.background = element_rect(fill = 'gray99'), plot.margin = unit(c(0,0,0,0), "cm")) 55 / 66

Slide 56

Slide 56 text

ggplot Examples: a choropleth map Another example from: http://docs.ggplot2.org/current/geom_map.html 25 30 35 40 45 50 −120 −100 −80 x y 5 10 15 Murder Murder Rates in the United States 56 / 66

Slide 57

Slide 57 text

ggplot Examples: a faceted map UrbanPop Rape Murder Assault −120 −100 −80 −120 −100 −80 25 30 35 40 45 50 25 30 35 40 45 50 x y 100 200 300 value 57 / 66

Slide 58

Slide 58 text

ggplot Examples: a choropleth map library(ggplot2) crimes <- data.frame(state = tolower(rownames(USArrests)), USArrests) library(reshape2) # for melt crimesm <- melt(crimes, id = 1) library(maps) states_map <- map_data("state") # Murder Map # Data & Aesthetics ggplot(crimes, aes(map_id = state, fill = Murder)) + # Geometries geom_map(aes(fill = Murder), map = states_map, colour = "black") + scale_fill_gradient(low = "#91cf60", high = "#fc8d59") + # Coordinates expand_limits(x = states_map$long, y = states_map$lat) + coord_map() + # Theme ggtitle("Murder Rates in the United States") # Faceted Map ggplot(crimesm, aes(map_id = state)) + geom_map(aes(fill = value), map = states_map) + expand_limits(x = states_map$long, y = states_map$lat) + facet_wrap(~variable) 58 / 66

Slide 59

Slide 59 text

ggplot Examples: spatial Points 40 50 60 −10 0 10 20 30 40 lon lat sqrt(flights) 10 20 30 Airports In Europe 59 / 66

Slide 60

Slide 60 text

ggplot Examples: spatial Points #data airports <- read.csv("https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat", header = FALSE) colnames(airports) <- c("ID", "name", "city", "country", "IATA_FAA", "ICAO", "lat", "lon", "altitude", "timezone", "DST") routes <- read.csv("https://raw.githubusercontent.com/jpatokal/openflights/master/data/routes.dat", header=F) colnames(routes) <- c("airline", "airlineID", "sourceAirport", "sourceAirportID", "destinationAirport", "destinationAirportID", "codeshare", "stops", "equipment") library(plyr) departures <- ddply(routes, .(sourceAirportID), "nrow") names(departures)[2] <- "flights" arrivals <- ddply(routes, .(destinationAirportID), "nrow") names(arrivals)[2] <- "flights" airportD <- merge(airports, departures, by.x = "ID", by.y = "sourceAirportID") airportA <- merge(airports, arrivals, by.x = "ID", by.y = "destinationAirportID") map <- get_map(location = 'Europe', zoom = 4, messaging = FALSE) # create the data set containing both departures and arrivals airportD$type <- "departures" airportA$type <- "arrivals" airportDA <- rbind(airportD, airportA) ggmap(map) + geom_point(aes(x = lon, y = lat, size = sqrt(flights)), data = airportDA, alpha = .5) + # panels according to type (departure/arrival) + ggtitle("Airports In Europe") + guides(fill=guide_legend(title="New Legend Title")) 60 / 66

Slide 61

Slide 61 text

Conclusions 61 / 66

Slide 62

Slide 62 text

When to Use GIS vs. R? GIS • no mapping experience • one-off maps • serious cartography • your data live in a geodatabase R • no money for commercial software • need repeated analysis • integration with other analyses • development of new methods • working with large raster files 62 / 66

Slide 63

Slide 63 text

Additional Resources Overview of Spatial Packages • http://cran.r-project.org/web/views/Spatial.html Books 63 / 66

Slide 64

Slide 64 text

Additional Resources Tutorials • http://rpubs.com/RobinLovelace/11869 • http://science.nature.nps.gov/im/datamgmt/statistics/r/advanced/ spatial.cfm • http://spatial.ly/r/ • http://gis.stackexchange.com/questions/45327/ tutorials-to-handle-spatial-data-in-r • https://pakillo.github.io/R-GIS-tutorial/ R Spatial Cheatsheet • http://www.maths.lancs.ac.uk/~rowlings/Teaching/UseR2012/ cheatsheet.html • https://github.com/wegmann/RSdocs/releases/download/ May2014_spatial/AniMove_refcard.pdf 64 / 66

Slide 65

Slide 65 text

Summary 1. Help you get to the point where you can do basic analysis or visualization 2. Point out some of the commonly-used packages for spatial data manipulation and analysis 3. Be a resource you can come back to 4. Provide Guidance on useful resources to use after the course Hopefully you have ticked a few of these boxes! 65 / 66

Slide 66

Slide 66 text

Thank You for Listening! 66 / 66