## Shapefile type: Polygon, (5), # of Shapes: 10
## Shapefile type: Polygon, (5), # of Shapes: 155

In this session we explore the details of spatial modelling, mapping and more advanced GIS work as they support social and cultural analytics. We agree with https://pakillo.github.io/R-GIS-tutorial/ that R is also great for spatial analysis. Check out http://spatialanalysis.co.uk/wp-content/uploads/2012/02/bike_ggplot.png for an example visualisation.

Before we proceed, you need to register for a Google Maps API key. First, you need to go to https://developers.google.com/maps/documentation/maps-static/get-api-key and follow the instructions under Quick Guide. Once you have a key, follow the instructions at https://github.com/dkahle/ggmap to register the Google Map key in R. If you have not done this yet, please leave swirl at the next question and return once you have a registered Google API key.

We use a number of libraries, starting with maptools, which provides us with a functions that help with reading and processing spatial data. You can load the package with library(maptools). Please note that this also loads the standard sp package, which provides classes for storing and manipulating different types of spatial data. It is kind of a standard in R. So new spatial packages expect data to be an sp object.

library(maptools)

In your environment you should see spatial datasets related to Haiti. In particular, we will investigate a dataset related to the earthquake that took place there in 2010. The dataset was crowd-sourced using the Ushahidi platform (https://wiki.ushahidi.com/display/WIKI/Datasets), a non-profit software company that enables crowdsourcing of information related to natural disasters and geopolitical events via web form, text messages or social media. We have some data collected during the 2010 Haiti earthquake crisis and aftermath, which we borrowed from Wes Mckinney’s Python for Data Analysis. Let us take a look at the data with head(haiti_quake).

head(haiti_quake)
## # A tibble: 6 x 55
##   Serial INCIDENT_TITLE INCIDENT_DATE       LOCATION DESCRIPTION CATEGORY
##    <int> <chr>          <dttm>              <chr>    <chr>       <chr>
## 1   4052 * URGENT * Ty… 2010-07-05 17:26:00 Jacmel,… Birthing C… 1. Urge…
## 2   4042 Citi Soleil s… 2010-05-18 16:26:00 Citi So… "We are wo… 1. Urge…
## 3   4041 Radio Commerc… 2010-04-26 13:14:00 Radio C… "i'm Louin… 5e. Com…
## 4   4040 Contaminated … 2010-04-26 14:19:00 Marc ne… "How do we… 4. Mena…
## 5   4039 Violence at &… 2010-04-26 14:27:00 unable … "Goodnight… 4. Mena…
## 6   4038 No electricit… 2010-03-15 10:58:00 Pernier  "why the p… 2. Urge…
## # ... with 49 more variables: LATITUDE <dbl>, LONGITUDE <dbl>,
## #   APPROVED <chr>, VERIFIED <chr>, category_1 <int>, category_1a <int>,
## #   category_1b <int>, category_1c <int>, category_1d <int>,
## #   category_2 <int>, category_2a <int>, category_2b <int>,
## #   category_2c <int>, category_2d <int>, category_2e <int>,
## #   category_2f <int>, category_3 <int>, category_3a <int>,
## #   category_3b <int>, category_3c <int>, category_3d <int>,
## #   category_3e <int>, category_4 <int>, category_4a <int>,
## #   category_4c <int>, category_4e <int>, category_5 <int>,
## #   category_5a <int>, category_5b <int>, category_5c <int>,
## #   category_5d <int>, category_5e <int>, category_6 <int>,
## #   category_6a <int>, category_6b <int>, category_6c <int>,
## #   category_7 <int>, category_7a <int>, category_7b <int>,
## #   category_7c <int>, category_7d <int>, category_7g <int>,
## #   category_7h <int>, category_8 <int>, category_8a <int>,
## #   category_8c <int>, category_8d <int>, category_8e <int>,
## #   category_8f <int>

Each observation is a report by a mobile phone indicating an emergency or some other problem after the quake. Each has an associated timestamp and a location as latitude and longitude, which we will use to map the incidents. The many category fields might be confusing, but they relate to specific events such as occurrence of violence or food shortage.

haiti_quake is a normal data frame. But if you check your environment, you can see that it also contains other spatial data such as SpatialPointsDateFrames, SpatialPolygonsDataFrames and SpatialLinesDataFrames. The names should be quite telling but we will discuss them in detail, too. There are two spatial datasets for Haiti; one containing the departmental boundaries of Haiti and another one with all the roads.

As a first step, we transform haiti_roads into a data frame and take a look at it in a format we understand. Run head(data.frame(haiti_roads)) to print the first rows of the roads data. You can see that haiti_roads contains road names, but also additional information such as whether these are one-way, etc.

head(data.frame(haiti_roads))
##                       name        type   osm_id  ref oneway bridge
## 0                Rue Magny residential 23860949 <NA>      0      0
## 1   Boulevard Henri Truman     primary 23861001 <NA>      1      0
## 2   Boulevard Henri Truman     primary 23861018 <NA>      1      0
## 3   Boulevard Henri Truman     primary 23861022 <NA>      1      1
## 4           Nationale No 2     primary 23861052    2      1      0
## 5 Boulevard des Industries   secondary 23861200 <NA>      1      0

Let us also take a look at the departments of Haiti with head(data.frame(haiti_departments)). Again, we can see spatial information but also additional information per department such as the population density.

head(data.frame(haiti_departments))
##   ID_ADM1       ADM1 Shape_Leng Shape_Area        Capital Area Population
## 1       1      Ouest   7.216540  0.4267732 Port-au-Prince 4827    4029705
## 2       2    Sud-Est   4.042320  0.1740848         Jacmel 2023     632601
## 3       3       Nord   3.834594  0.1810655    Cap-Haïtien 2106    1067177
## 4       4   Nord-Est   2.478687  0.1400675   Fort-Liberté 1805     393967
## 5       5 Artibonite   4.993639  0.4176159       Gonaives 4984    1727524
## 6       6     Centre   3.601807  0.2969891         Hinche 3675     746236
##   Population_density
## 1           834.8260
## 2           312.7044
## 3           506.7317
## 4           218.2643
## 5           346.6140
## 6           203.0574

To proceed with the spatial analysis of the Ushahidi data, we would like to create a spatial dataset of points with additional information. This SpatialPointsDataFrame will contain the information in haiti_quake and is quite easy to produce in R. We first need to extract the longitude and latitude information from haiti_quake. Hopefully, you remember cbind and can run without problems coords_tmp <- cbind(haiti_quake\(LONGITUDE, haiti_quake\)LATITUDE).

coords_tmp <- cbind(haiti_quake$LONGITUDE, haiti_quake$LATITUDE)

Next we simply create the SpatialPointsDataFrame with these coordinates and the other information from haiti_quake. Please type haiti_quake_sdf <- SpatialPointsDataFrame(coords_tmp, data = data.frame(haiti_quake), proj4string=crswgs84). In case you wonder about proj4string, you should read through Part III of https://cran.r-project.org/doc/contrib/intro-spatial-rl.pdf. It explains that CRS stands for Coordinate Reference System of spatial objects. It sets where coordinates are placed on the Earth’s surface, which is difficult because we need to project from an almot spherical earth surface to a two-dimensional map. CRS information should always be provided, because the projections from the earth sphere to the map can be very different. Every spatial object should have a coordinate reference system (CRS) associated with it. In this case, we use the ‘British National Grid - WGS84’, a commonly used CRS.

haiti_quake_sdf <- SpatialPointsDataFrame(coords_tmp, data = data.frame(haiti_quake), proj4string=crswgs84)

I have created a little demo for you to demonstrate how projections can influence the way the world looks like on a map. Run demo_projections() in order to compare four common projections for mapping the world onto a two-dimensional surface. You can clearly see how different continents look. You might want to check online the details of the four projections.

demo_projections()
## OGR: Corrupt data
## Linking to GEOS 3.6.1, GDAL 2.1.3, PROJ 4.9.3
## OGR: Corrupt data
## OGR: Corrupt data
## OGR: Corrupt data