Shreya Vuttaluru, Tampa Bay Times
Ryan Little, The Baltimore Banner
This GitHub repository accompanies a class on spatial analysis in R. We'll cover basic spatial functions for transforming data and explore tools for exploratory mapping, spatial joins, buffering, calculating spatial distances between points and spatial indexing.
We'll be using the `sf` package for geospatial functions. This package has many of the same functions available in geospatial software like ArcGIS, QGIS and PostGIS.
Some advantages of using `sf` for spatial analysis:
- Easy to integrate with other data analysis and cleaning steps
- Reproducible: rerun the same script on new or updated data
- Fairly easy visualization
- Faster processing time for large datasets
- Free! 💰
Clone this repository locally to start!
- sf is a relatively intuitive R package with functions for spatial analysis that integrates well with other data analysis and cleaning steps.
- raster provides functions for reading, writing, processing, and analyzing gridded spatial data, such as satellite imagery, digital elevation models (DEMs), and climate data.
- mapview makes quick interactive web maps and is built on top of the leaflet R package. Consider using leaflet directly for more customizable maps.
- tidycensus and tigris provide spatial data for census geographies across the U.S.
- Vector Data:
- Represents geographic features like points, lines, and polygons.
- Example: points representing cities or landmarks, lines representing roads or rivers, and polygons representing boundaries such as counties.
- Raster Data:
- Represents geographic features as a grid of cells or pixels, where each cell has a value representing a specific attribute.
- Example: Satellite imagery, digital elevation models (DEMs), land cover classifications.
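Here's a minimal sketch of vector data in `sf`, assuming the package is installed: one point, one line and one polygon built from made-up coordinates near Tampa (illustrative values, not real features), stored together as rows of an `sf` data frame. Raster data needs a separate package such as raster, so it isn't shown here.

```r
library(sf)

# Made-up longitude/latitude coordinates (illustrative only).
landmark <- st_point(c(-82.46, 27.95))
road <- st_linestring(rbind(c(-82.50, 27.90), c(-82.45, 27.95), c(-82.40, 28.00)))
boundary <- st_polygon(list(rbind(
  c(-82.55, 27.85), c(-82.35, 27.85), c(-82.35, 28.05),
  c(-82.55, 28.05), c(-82.55, 27.85)  # a polygon ring must end where it starts
)))

# An sf object is a data frame: attribute columns plus one geometry column,
# here tagged with EPSG:4326 (WGS 84 longitude/latitude).
features <- st_sf(
  name = c("landmark", "road", "boundary"),
  geometry = st_sfc(landmark, road, boundary, crs = 4326)
)

print(features)
st_geometry_type(features)
```

Each row is one feature, so points, lines and polygons can live in the same table and be filtered or joined like any other data frame.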
- When filing records requests, it can help to ask for the data in one or more of these file types. You can also ask for polygon, multipolygon, or latitude/longitude fields.
- ESRI Shapefile (.shp):
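- Widely used format for geographic data. A "shapefile" is really a set of files (.shp for geometry, .dbf for attributes, .shx as an index, and usually .prj for the CRS), so keep them together in the same folder.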
- KML (.kml):
- XML-based format originally developed for Google Earth, used to display points, lines, and polygons.
- Compressed KML (.kmz):
- Compact version of KML, convenient for sharing as a single compressed file.
- ESRI Geodatabase (.gdb):
- ESRI's comprehensive spatial data management system, storing various data types within a structured folder.
- R Data Set (.rds):
- R's native format for saving a single R object, including an `sf` object with its geometry and CRS intact. Quick to save and reload, but only readable in R.
- GeoJSON (.geojson):
- Lightweight format for encoding geographic data structures in a human-readable text format, widely used for web mapping and interoperability purposes.
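A minimal sketch of reading and writing a few of these formats with `sf`, assuming the package is installed. The two-point `sites` layer is hypothetical, and everything is written to a fresh temporary folder so nothing gets overwritten.

```r
library(sf)

# Hypothetical two-point layer, just to demonstrate writing and reading files.
sites <- st_sf(
  site_id = 1:2,
  geometry = st_sfc(st_point(c(-82.46, 27.95)), st_point(c(-76.61, 39.29)), crs = 4326)
)

# Write into a fresh temporary folder.
out_dir <- tempfile("spatial_files_")
dir.create(out_dir)

# GeoJSON: one human-readable text file.
st_write(sites, file.path(out_dir, "sites.geojson"), quiet = TRUE)

# Shapefile: writing sites.shp also creates sidecar files (.shx, .dbf, .prj).
st_write(sites, file.path(out_dir, "sites.shp"), quiet = TRUE)

# RDS: R's native format; keeps the sf object exactly, but only R can read it.
saveRDS(sites, file.path(out_dir, "sites.rds"))

# st_read() works out the format from the file; readRDS() restores the R object.
from_geojson <- st_read(file.path(out_dir, "sites.geojson"), quiet = TRUE)
from_shp <- st_read(file.path(out_dir, "sites.shp"), quiet = TRUE)
from_rds <- readRDS(file.path(out_dir, "sites.rds"))

list.files(out_dir)
```

The same `st_read()` call can usually open KML files and file geodatabases (point it at the .gdb folder), depending on which GDAL drivers your sf installation includes.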
- Point-in-Polygon:
- Assigns attributes from polygons to points that fall within them. Implemented with `st_join` using `join = st_within`.
- Polygon-on-Polygon:
- Assigns attributes from one polygon layer to another where the polygons intersect, or where one polygon falls inside another. Implemented with `st_join` using `join = st_intersects` (or `join = st_within` when you only want polygons that fall entirely inside another).
- Intersect:
- Computes the intersection between geometries, returning the shared portions of intersecting geometries as a new `sf` object using `st_intersection`.
- Union:
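- Dissolves geometries into a single geometry with `st_union`, removing internal boundaries; attributes are dropped. To combine two layers into one `sf` object while keeping every feature and its attributes, stack them with `rbind` (the layers need matching columns and the same CRS).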
- Difference:
- Computes the geometric difference between two `sf` objects with `st_difference`, keeping only the parts of the first that don't overlap the second.
- Buffer:
- Creates buffer zones around spatial features with `st_buffer`, generating new polygons that extend a specified distance from the original features. In a projected CRS, that distance is in the CRS's units (meters for 5070).
- Nearest Neighbor:
- Determines the nearest feature in one layer to each feature in another layer. Implemented with `st_join` using `join = st_nearest_feature`, or with `st_nearest_feature` directly; pair it with `st_distance` to measure how far away that nearest feature is.
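A minimal sketch of the three join types with `st_join`, assuming sf is installed. The layers are made-up squares and points in EPSG:5070 (meters) so distances are easy to check by hand, and `square()` is a hypothetical helper written just for this example.

```r
library(sf)

# Hypothetical helper: a square polygon with its lower-left corner at (xmin, ymin).
square <- function(xmin, ymin, size) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin), c(xmin + size, ymin + size),
    c(xmin, ymin + size), c(xmin, ymin)
  )))
}

# Made-up layers in EPSG:5070 (meters): two side-by-side 1 km districts,
# three schools, and one neighborhood that straddles the district line.
districts <- st_sf(
  district = c("A", "B"),
  geometry = st_sfc(square(0, 0, 1000), square(1000, 0, 1000), crs = 5070)
)
schools <- st_sf(
  school = c("inside_A", "inside_B", "far_away"),
  geometry = st_sfc(st_point(c(500, 500)), st_point(c(1500, 200)),
                    st_point(c(5000, 5000)), crs = 5070)
)
neighborhood <- st_sf(
  hood = "riverside",
  geometry = st_sfc(square(800, 100, 400), crs = 5070)
)

# Point-in-polygon: each school gets the district it falls within.
# st_join keeps unmatched rows by default (left = TRUE), so "far_away" gets NA.
schools_in_district <- st_join(schools, districts, join = st_within)

# Polygon-on-polygon: one output row per district the neighborhood intersects.
hood_districts <- st_join(neighborhood, districts, join = st_intersects)

# Nearest neighbor: every school gets the closest district, even "far_away".
schools_nearest <- st_join(schools, districts, join = st_nearest_feature)

# st_distance measures how far each school is from that nearest district (meters).
nearest_idx <- st_nearest_feature(schools, districts)
schools_nearest$dist_m <- as.numeric(
  st_distance(schools, districts[nearest_idx, ], by_element = TRUE)
)
```

Note how the polygon-on-polygon join returns two rows for one neighborhood: a join can multiply rows when a feature matches more than one polygon, which is worth checking with `nrow()` after every join.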
Check out the documentation for st_join for more.
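The geometric operations above (intersection, difference, union and buffer) can be sketched on two made-up overlapping squares, assuming sf is installed. Coordinates are in EPSG:5070 so areas come out in square meters, and `square()` is again a hypothetical helper.

```r
library(sf)

# Hypothetical helper: a square polygon with its lower-left corner at (xmin, ymin).
square <- function(xmin, ymin, size) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin), c(xmin + size, ymin + size),
    c(xmin, ymin + size), c(xmin, ymin)
  )))
}

# Two made-up overlapping 1 km squares in EPSG:5070 (meters).
a <- st_sf(layer = "a", geometry = st_sfc(square(0, 0, 1000), crs = 5070))
b <- st_sf(layer = "b", geometry = st_sfc(square(500, 0, 1000), crs = 5070))

# Declare attributes constant across each geometry; this silences sf's warning
# when st_intersection/st_difference cut geometries into pieces.
st_agr(a) <- "constant"
st_agr(b) <- "constant"

overlap <- st_intersection(a, b)  # only the shared strip, with both layers' attributes
a_only <- st_difference(a, b)     # the part of a that b doesn't cover
stacked <- rbind(a, b)            # both features, attributes kept
dissolved <- st_union(stacked)    # one geometry, inner boundary gone, attributes dropped

# A 250 m buffer around a point: a polygon that approximates a circle.
well <- st_sfc(st_point(c(0, 0)), crs = 5070)
zone <- st_buffer(well, dist = 250)

st_area(overlap)    # 500 m x 1,000 m = 500,000 square meters
st_area(a_only)     # 500 m x 1,000 m = 500,000 square meters
st_area(dissolved)  # 1,500 m x 1,000 m = 1,500,000 square meters
```

The contrast between `stacked` (two rows, attributes kept) and `dissolved` (one geometry, no attributes) is the practical difference between `rbind` and `st_union`.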
- Choosing the right CRS is important for accurately representing spatial data on maps.
- There are two main types of CRS: geographic and projected.
- Geographic CRS (e.g., EPSG:4326 / WGS 84):
- Uses latitude and longitude coordinates, like a grid laid over a globe.
- Projected CRS (e.g., EPSG:5070 / NAD83 / Conus Albers):
- Flattens the Earth's surface onto a plane and uses linear units such as meters, which makes calculating distance, area, and direction straightforward. Stick with a projected system if you plan to do any math.
- When performing joins, make sure both layers use the same CRS; sf stops with an error if they don't. Pick one CRS, stick with it, and use `st_transform` to convert other layers to it.
- Popular choices for US-based spatial analysis include 4326 and 5070, but research the most appropriate system for your region!
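A minimal sketch of checking and converting a CRS with `sf`, assuming the package is installed. The city coordinates are approximate downtown locations for Tampa and Baltimore, used only for illustration.

```r
library(sf)

# Approximate downtown coordinates (longitude, latitude), for illustration only.
cities <- st_sf(
  city = c("Tampa", "Baltimore"),
  geometry = st_sfc(st_point(c(-82.46, 27.95)), st_point(c(-76.61, 39.29)), crs = 4326)
)

st_is_longlat(cities)  # TRUE: coordinates are in degrees

# Reproject to NAD83 / Conus Albers (EPSG:5070) before doing math; units become meters.
cities_5070 <- st_transform(cities, 5070)
st_is_longlat(cities_5070)  # FALSE

# Distance between the two cities, in meters.
st_distance(cities_5070)[1, 2]

# Layers must share a CRS before a join: compare them, then transform to match.
st_crs(cities) == st_crs(cities_5070)  # FALSE: different systems
cities_back <- st_transform(cities_5070, st_crs(cities))
st_crs(cities_back) == st_crs(cities)  # TRUE after transforming
```

Passing another layer's `st_crs()` to `st_transform` is a simple way to pick one CRS and stick with it across every layer in a project.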
Other Resources for Understanding CRS:
Here's this documentation in Google Docs.