After the warm-up exercise, where we learned the nuts and bolts of spatial analysis, we are ready to face a complete case study in historical spatial analysis. In particular, we concentrate on places of memory, a term we use freely after the French historian Pierre Nora, who in his three-volume collection Les Lieux de Mémoire described them as symbolic elements of the memorial heritage of any community (Nora, Pierre, From lieux de memoire to realms of memory, 1996). While Nora was interested in monuments and museums, we would like to free memories from such formal institutions and simply understand which places play an important role in the memories of those who participated in events. We concentrate on memories from oral testimonies of Holocaust survivors.
There are many collections of testimonies on the Holocaust. Some, like the Shoah Foundation collections (https://sfi.usc.edu/), are among the largest digital cultural collections, mainly because they contain video recordings of the interviews. We will deal only with the textual transcripts of the interviews, which were provided to us by the United States Holocaust Memorial Museum (https://www.ushmm.org/) in the context of the European Holocaust Research Infrastructure (http://www.ehri-project.eu) project as a set of ASCII text files. EHRI was started in 2010 and is currently funded under the Horizon 2020 programme. It is a joint undertaking of Holocaust historians, archivists and specialists in the digital humanities.
In total, the collection contains 1,882 text files of widely varying quality, with recordings going back to the 1980s. All interviews are in English, but they are still multilingual: the interviewees are not native English speakers and often use their mother tongue for, for instance, place or organization names.
In this case study, we would like to map all the place names linked to one particular testimony. We have selected the interview with Ernst Weihs, conducted on 30 May 1989 at the United States Holocaust Memorial Museum. He was originally from Vienna but was deported to Poland and then to Dachau, where he was liberated by the Americans. At the time of the interview, he lived in Maryland. Let us create a pointer to the Weihs testimony with filename <- path_to_data_file("RG-50.030.0248_trs_en.txt").
filename <- path_to_data_file("RG-50.030.0248_trs_en.txt")
We have already learned that we can extract place names from text documents with information extraction. We have worked with OpenCalais but EHRI has developed its own dedicated service. This service is based on its highly curated list of relevant Holocaust locations, which makes the extraction much more accurate. I have created a wrapper function that you can use. To extract the Holocaust place names from the Weihs testimony, run memory_locations <- extract_places_EHRI(filename).
memory_locations <- extract_places_EHRI(filename)
Take a look at memory_locations by running View(memory_locations). You can see there are a couple of place names as well as their longitudes (lon) and latitudes (lat).
View(memory_locations)
As a first step, we use ggmap to plot locations. The library should already be loaded. We focus on the memory locations in Europe. With europe_gmap <- get_map(location = "Europe", zoom = 4), we download a map of Europe from Google.
europe_gmap <- get_map(location = "Europe", zoom = 4)
## Source : https://maps.googleapis.com/maps/api/staticmap?center=Europe&zoom=4&size=640x640&scale=2&maptype=terrain&language=en-EN&key=xxx-3VMyfPew
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Europe&key=xxx-3VMyfPew
To set up the map, we can simply type p <- ggmap(europe_gmap) + geom_point(aes(x=lon, y=lat), data=memory_locations, colour = "darkred", size = 2).
p <- ggmap(europe_gmap) + geom_point(aes(x=lon, y=lat), data=memory_locations, colour = "darkred", size = 2)
The map has various issues, such as a strong overlap of locations, which we will address next with a better mapping framework. But before that, we would like to introduce ggmap’s geocode function, with which you can use Google’s API to geocode locations in R. Try and execute geocode(c("Berlin", "Paris", "Piccadilly Circus", "Big Ben")) to get the locations’ latitudes and longitudes.
geocode(c("Berlin", "Paris", "Piccadilly Circus", "Big Ben"))
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Berlin&key=xxx-3VMyfPew
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Paris&key=xxx-3VMyfPew
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Piccadilly+Circus&key=xxx-3VMyfPew
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Big+Ben&key=xxx-3VMyfPew
## # A tibble: 4 x 2
## lon lat
## <dbl> <dbl>
## 1 13.4 52.5
## 2 2.35 48.9
## 3 -0.135 51.5
## 4 -0.125 51.5
You can see we can find the coordinates of cities but also monuments, etc. geocode is really very powerful. Still, EHRI needed a dedicated service, as many of the Holocaust-relevant locations are not listed by Google anymore. They are from a different time in history.
Using memory_locations, we continue with our exploration of the places of memory using thematic maps. As seen, thematic maps can be generated in R with the tmap package. It follows the grammar of graphics known from ggplot. As a warm-up, we follow a few examples from https://cran.r-project.org/web/packages/tmap/vignettes/tmap-getstarted.html. Load a shape object of the world (contained in the package) with data(World).
data(World)
qtm is tmap’s function to print out a quick map. Print out a map of the world with qtm(World).
qtm(World)
A simple choropleth is created in tmap with tm_shape(World) + tm_polygons("HPI").
tm_shape(World) + tm_polygons("HPI")
You can restrict the map to the European continent with the argument filter. Try tm_shape(World, filter = World$continent=="Europe") + tm_polygons("HPI", id = "name").
tm_shape(World, filter = World$continent=="Europe") + tm_polygons("HPI", id = "name")
Rather than learning all the details of tmap as we have done for ggplot, in this case we will simply adopt the existing examples of the vignette. You will often work like this as a data scientist and change example code. data(land, rivers, metro) adds information on land, rivers and metropolitan areas. Load them, please.
data(land, rivers, metro)
The main tmap function is tm_shape, which specifies the shape object. It is comparable to ggplot’s ggplot. Using this basic function, we can add any kind of spatial object such as polygons, lines, points or rasters. The derived layers then draw additional elements such as markers, borders or bubbles on top of the geographical objects. Multiple shapes and also multiple layers per shape can be plotted. To see them all in action, run tm_shape(land) + tm_raster("elevation", palette = terrain.colors(10)) + tm_shape(World) + tm_borders("white", lwd = .5) + tm_text("iso_a3", size = "AREA") + tm_shape(metro) + tm_symbols(col = "red", size = "pop2020", scale = .5) + tm_legend(show = FALSE). In this expression, there are shapes (such as land), an elevation raster, borders, text, bubbles and a legend setting, all stacked on top of each other!
tm_shape(land) + tm_raster("elevation", palette = terrain.colors(10)) + tm_shape(World) + tm_borders("white", lwd = .5) + tm_text("iso_a3", size = "AREA") + tm_shape(metro) + tm_symbols(col = "red", size = "pop2020", scale = .5) + tm_legend(show = FALSE)
With these examples, we can create excellent visualisations for the places of memory in the testimonies. But we first need to enrich memory_locations. In particular, we would like to count how often each location is mentioned. This can be done in R with plyr’s count function, which groups a data frame according to the specified features and counts the number of occurrences. Then, we left-join the result of this operation into the locations, as we did above to enrich the Haiti data: memory_locations_tmp <- dplyr::left_join(memory_locations, plyr::count(memory_locations, c("name")), by = c("name")). Please note that instead of loading plyr and dplyr, we use the :: operator to address only the functions we need, count and left_join.
memory_locations_tmp <- dplyr::left_join(memory_locations, plyr::count(memory_locations, c("name")), by = c("name"))
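To see what the count-then-join step does, here is a minimal sketch on a toy data frame. The name toy_locations and its values are made up for illustration and only stand in for memory_locations; we assume plyr and dplyr are installed.

```r
# Toy data standing in for memory_locations (values are illustrative only)
toy_locations <- data.frame(
  name = c("Wien", "Dachau", "Wien", "Opole", "Wien"),
  lon  = c(16.37, 11.43, 16.37, 17.93, 16.37),
  lat  = c(48.21, 48.26, 48.21, 50.67, 48.21),
  stringsAsFactors = FALSE
)

# plyr::count() returns one row per distinct name plus a freq column
name_counts <- plyr::count(toy_locations, "name")

# left_join() copies freq back onto every original row with the same name
toy_enriched <- dplyr::left_join(toy_locations, name_counts, by = "name")
toy_enriched$freq
```

Every row keeps its coordinates and gains a freq value, so repeated names carry the same count on each of their rows.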
Next we create a spatial points data frame. We start as above with defining the coordinates. Run coords_tmp <- cbind(memory_locations_tmp$lon, memory_locations_tmp$lat).
coords_tmp <- cbind(memory_locations_tmp$lon, memory_locations_tmp$lat)
Finally define memory_locations_sdf <- SpatialPointsDataFrame(coords_tmp, data = data.frame(memory_locations_tmp), proj4string = CRS("+init=epsg:4326")). This spatial data frame contains the information on memory locations and uses a spatial reference system from the EPSG registry (http://www.epsg.org/). The EPSG code 4326 refers to the World Geodetic System (WGS 84) (https://en.wikipedia.org/wiki/World_Geodetic_System), which EHRI uses for spatial referencing.
memory_locations_sdf <- SpatialPointsDataFrame(coords_tmp, data = data.frame(memory_locations_tmp), proj4string = CRS("+init=epsg:4326"))
We do not need memory_locations_tmp anymore and remove it with rm(memory_locations_tmp).
rm(memory_locations_tmp)
Let us plot all the places of memory in the world with tm_shape(World) + tm_polygons("MAP_COLORS", palette="Pastel2") + tm_shape(remove.duplicates(memory_locations_sdf)) + tm_bubbles("freq", title.size = "Places of Memory", "red").
tm_shape(World) + tm_polygons("MAP_COLORS", palette="Pastel2") + tm_shape(remove.duplicates(memory_locations_sdf)) + tm_bubbles("freq", title.size = "Places of Memory", "red")
This is a bubble map, where the size of the bubble represents the number of entries for a particular memory location. Have you noticed the remove.duplicates function in the last statement? With it we have removed all duplicate points from the spatial data frame and avoided over-plotting the same bubble again and again. remove.duplicates is part of the sp package. From the world we can zoom in onto Germany with the bbox argument of tm_shape. Run tm_shape(World, bbox = "Germany") + tm_polygons("MAP_COLORS", palette="Pastel2") + tm_shape(remove.duplicates(memory_locations_sdf)) + tm_bubbles("freq", title.size = "Places of Memory") + tm_text("name", size = "freq", legend.size.show = FALSE, root=8, size.lowerbound = .7, auto.placement = TRUE).
tm_shape(World, bbox = "Germany") + tm_polygons("MAP_COLORS", palette="Pastel2") + tm_shape(remove.duplicates(memory_locations_sdf)) + tm_bubbles("freq", title.size = "Places of Memory") + tm_text("name", size = "freq", legend.size.show = FALSE, root=8, size.lowerbound = .7, auto.placement = TRUE)
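The effect of remove.duplicates is easiest to see on a tiny example. The sketch below builds a toy spatial points data frame (all names and values are made up) in which one location appears twice. To keep the example small, we leave out the proj4string argument that the tutorial sets on the real data.

```r
library(sp)

# Toy table: the first and third rows describe the same point (illustrative values)
toy <- data.frame(
  name = c("Wien", "Dachau", "Wien"),
  lon  = c(16.37, 11.43, 16.37),
  lat  = c(48.21, 48.26, 48.21),
  freq = c(2, 1, 2)
)

# Coordinates first, attribute table second, as in the tutorial
toy_sdf <- SpatialPointsDataFrame(cbind(toy$lon, toy$lat), data = toy)

# remove.duplicates() drops points whose coordinates coincide with an earlier point
toy_unique <- remove.duplicates(toy_sdf)

length(toy_sdf)     # number of points before
length(toy_unique)  # number of points after
```

Because the frequency is already stored in freq, dropping the repeated point loses no information for the bubble map.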
We progress with our spatial exploration by comparing Germany and Austria, as Weihs was born in Vienna. First we need to load their shapefiles. Shapefiles are one of the most common ways spatial data are shared and are easily read into R using readOGR() from the rgdal package. Please, look it up online.
We learned earlier how to manually download shapefiles. But there is also an R function that does this for us. It is part of the raster package and is called getData. It gets geographic data for anywhere in the world. Data is read from files that are first downloaded if necessary. Run austria1 <- getData("GADM", country="AUT", level=1) to download the Global Administrative Areas (GADM) of Austria (AUT) at level 1, which is the first level of administrative boundaries below the country (the federal states). This is basically the Austrian version of the map we have used for Haiti earlier.
austria1 <- getData("GADM", country="AUT", level=1)
To count the memory places in the administrative areas of the Austria shapefile, we use poly.counts. Define austria1$memory_places <- poly.counts(memory_locations_sdf, austria1).
austria1$memory_places <- poly.counts(memory_locations_sdf, austria1)
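Counting points per polygon is a point-in-polygon test followed by a tally. The following sketch reproduces that idea with sp’s over function on two made-up squares. It is not the implementation of poly.counts, only an illustration of the principle, and all object names are hypothetical.

```r
library(sp)

# Two adjacent unit squares (clockwise rings), standing in for administrative areas
sq_a <- Polygons(list(Polygon(cbind(c(0, 0, 1, 1, 0), c(0, 1, 1, 0, 0)), hole = FALSE)), ID = "a")
sq_b <- Polygons(list(Polygon(cbind(c(1, 1, 2, 2, 1), c(0, 1, 1, 0, 0)), hole = FALSE)), ID = "b")
toy_polys <- SpatialPolygons(list(sq_a, sq_b))

# Four points: two in "a", one in "b", one outside both squares
toy_pts <- SpatialPoints(cbind(c(0.2, 0.5, 1.5, 5), c(0.5, 0.5, 0.5, 5)))

# over() gives, for each point, the index of the polygon containing it (NA if none)
hit <- over(toy_pts, toy_polys)

# Tally the hits per polygon, ignoring points that fall outside
toy_counts <- tabulate(hit[!is.na(hit)], nbins = length(toy_polys))
toy_counts
```

The result has one count per polygon, in polygon order, which is why it can be assigned directly as a new column of the polygon data frame.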
Print out head(austria1) to see that this has worked.
head(austria1)
## OBJECTID ID_0 ISO NAME_0 ID_1 NAME_1 HASC_1 CCN_1 CCA_1
## 1 1 16 AUT Austria 1 Burgenland AT.BU 0
## 2 2 16 AUT Austria 2 Kärnten AT.KA 0
## 3 3 16 AUT Austria 3 Niederösterreich AT.NO 0
## 4 4 16 AUT Austria 4 Oberösterreich AT.OO 0
## 5 5 16 AUT Austria 4 Oberösterreich AT.OO 4 4
## 6 6 16 AUT Austria 5 Salzburg AT.SZ 0
## TYPE_1 ENGTYPE_1 NL_NAME_1
## 1 Bundesländ|Länd State
## 2 Bundesländ|Länd State
## 3 Bundesländ|Länd State
## 4 Bundesländ|Länd State
## 5 Bundesländ|Länd State
## 6 Bundesländ|Länd State
## VARNAME_1
## 1 Burgenlândia
## 2 Carinthia|Caríntia|Carintia
## 3 Lower Austria|Baixa-Áustria|Baja Austria|Niederdonau|Österreich unter der Enns
## 4 Upper Austria|Alta-Áustria|Alta Austria|Österreich ober der Enns|Oberösterreich
## 5 Upper Austria|Alta-Áustria|Alta Austria|Österreich ober der Enns|Oberösterreich
## 6 Salzburgo
## memory_places
## 1 0
## 2 0
## 3 0
## 4 4
## 5 0
## 6 1
With austria1$short_name <- substring(austria1$HASC_1, 4), we extract a short name of HASC_1. Do you know how substring works?
austria1$short_name <- substring(austria1$HASC_1, 4)
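As a quick reminder, substring(x, 4) keeps everything from the fourth character onwards. A small sketch with three of the HASC_1 codes from the output above:

```r
# HASC_1 codes consist of a country prefix, a dot and the state abbreviation
hasc <- c("AT.BU", "AT.KA", "AT.NO")

# Start at character 4, i.e. directly after "AT."
short <- substring(hasc, 4)
short
# → "BU" "KA" "NO"
```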
Finally, we assign the qtm plot with austria1_map <- qtm(austria1, fill="memory_places", text="short_name", text.size="AREA", style="gray", text.root=5, fill.title="Memories").
austria1_map <- qtm(austria1, fill="memory_places", text="short_name", text.size="AREA", style="gray", text.root=5, fill.title="Memories")
Plot the map of Austria by simply typing austria1_map and you will see that Vienna is the dominant place of memory.
austria1_map
In order to compare Austria with Germany, we will repeat the process with Germany, starting with the download of the relevant shapefile. Run germany1 <- getData("GADM", country="DE", level=1).
germany1 <- getData("GADM", country="DE", level=1)
Count the memory places with germany1$memory_places <- poly.counts(memory_locations_sdf, germany1).
germany1$memory_places <- poly.counts(memory_locations_sdf, germany1)
Define the short administrative name with germany1$short_name <- substring(germany1$HASC_1, 4).
germany1$short_name <- substring(germany1$HASC_1, 4)
Create the map by typing in germany1_map <- qtm(germany1, fill="memory_places", text="short_name", text.size="AREA", style="gray", text.root=5, fill.title="Memories").
germany1_map <- qtm(germany1, fill="memory_places", text="short_name", text.size="AREA", style="gray", text.root=5, fill.title="Memories")
Finally, print out the result with germany1_map to see that the south of Germany dominates the memory locations.
germany1_map
In order to compare the places of memory within Austria, we can furthermore use faceting. Run tm_shape(austria1) + tm_fill("memory_places", legend.show = FALSE) + tm_facets("NAME_1", free.coords=TRUE, drop.units=TRUE) to compare all the federal states of Austria with regard to the memory locations they contain. Vienna is very dominant here.
tm_shape(austria1) + tm_fill("memory_places", legend.show = FALSE) + tm_facets("NAME_1", free.coords=TRUE, drop.units=TRUE)
Of course, we can also compare geographical objects with each other by, for instance, arranging them next to each other. To compare Germany and Austria, try tmap_arrange(germany1_map, austria1_map).
tmap_arrange(germany1_map, austria1_map)
One more feature of tmap to explore before we move on is interactive plotting. To start interactive plotting, please type in tmap_mode("view").
tmap_mode("view")
## tmap mode set to interactive viewing
If you now run germany1_map again, you can interactively explore the map of Germany. You can change the background to OpenStreetMap and explore many other features.
germany1_map
## Text size will be constant in view mode. Set tm_view(text.size.variable = TRUE) to enable variable text sizes.
Take a look at https://cran.r-project.org/web/packages/tmap/vignettes/tmap-modes.html and you will see that these interactive tmaps are produced with the JavaScript leaflet framework (http://leafletjs.com/) to publish online and mobile-friendly interactive maps. You can even export interactive tmaps to leaflets for external use in websites. But we want to move on and return to the normal plotting mode by running tmap_mode("plot").
tmap_mode("plot")
## tmap mode set to plotting
In this final part, we explore historical GIS and mapping work. For a quick overview, check https://en.wikipedia.org/wiki/Historical_geographic_information_system. It defines historical GIS as analysing historical geographic data and tracking geographical changes in time. We will complete our case study with a short exposé of a historical GIS analysis. We have downloaded a map of the German empire from the 1940s from http://www.censusmosaic.org/data/historical-gis-files. The service is free but you need to register. After this session, you might want to check out the maps you can find there.
The historical map of the German empire is preloaded into a spatial polygon data frame in your environment, which is called germany_empire. Check it out with head(data.frame(germany_empire)).
head(data.frame(germany_empire))
## AREA PERIMETER ID LAND NAME STATUS RB TYPE
## 0 823.15809 211.93819 11001 11000 ZERBST K / /
## 1 361.51713 165.80485 11005 11000 BERNBURG K / /
## 2 10.94269 12.86059 11010 11000 KOETHEN S / /
## 3 764.87856 210.39059 11007 11000 DESSAU-KOETHEN K / /
## 4 55.95969 32.40164 11012 11000 DESSAU S / /
## 5 23.01582 20.17352 11011 11000 ZERBST S / /
Print out this historical map with qtm(germany_empire).
qtm(germany_empire)
Next we would like to create 1,000 random locations within the boundaries of germany_empire. The function spsample can do this. Its parameters should be self-evident. Please create random_points <- spsample(germany_empire, n=1000, type="random").
random_points <- spsample(germany_empire, n=1000, type="random")
Let us count for each Nazi Gau (an administration area), how many of the random locations are part of it. We know how already and enter germany_empire$memory_places <- poly.counts(random_points, germany_empire).
germany_empire$memory_places <- poly.counts(random_points, germany_empire)
Let us create a choropleth of the result with qtm(germany_empire, fill="memory_places", fill.title="Memories").
qtm(germany_empire, fill="memory_places", fill.title="Memories")
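When reading such a choropleth of raw counts, it helps to remember that uniformly random points fall more often into larger polygons. The sketch below illustrates this with two made-up rectangles of different size. The object names are hypothetical and the exact counts depend on the random seed.

```r
library(sp)
set.seed(1)

# A small square (area 1) next to a larger rectangle (area 3); clockwise rings
small <- Polygons(list(Polygon(cbind(c(0, 0, 1, 1, 0), c(0, 1, 1, 0, 0)), hole = FALSE)), ID = "small")
large <- Polygons(list(Polygon(cbind(c(1, 1, 4, 4, 1), c(0, 1, 1, 0, 0)), hole = FALSE)), ID = "large")
toy_areas <- SpatialPolygons(list(small, large))

# Roughly 1,000 uniformly random points inside the two rectangles
pts <- spsample(toy_areas, n = 1000, type = "random")

# Count points per rectangle with a point-in-polygon test
hit <- over(pts, toy_areas)
toy_counts <- tabulate(hit[!is.na(hit)], nbins = length(toy_areas))
names(toy_counts) <- c("small", "large")
toy_counts
```

The larger rectangle should collect roughly three times as many points as the small one, so differences in raw counts do not by themselves indicate clustering.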
For a truly random distribution, there are some suspicious clusters of locations. But we ignore this for now, as we want to map memory_locations_sdf onto germany_empire. A straightforward mapping will not work in this case, as the two objects have different projections. You can find out about the projection of memory_locations_sdf by typing in proj4string(memory_locations_sdf).
proj4string(memory_locations_sdf)
## [1] "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
Now let us check proj4string(germany_empire). You will see that the two spatial objects have different EPSG codes: memory_locations_sdf has epsg:4326, while germany_empire has epsg:32633. By the way, the projection is generally stored in a file with the extension .prj that you download together with the shapefile. Check it out on any of the shapefile sources we have mentioned.
proj4string(germany_empire)
## [1] "+init=epsg:32633 +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0"
To count the memory locations, we therefore need to transform germany_empire first and recreate it according to the epsg:4326 projection. That can be easily done in R with spTransform, using epsg:4326 as the target projection argument. Run germany_empire_t <- spTransform(germany_empire, CRS("+init=epsg:4326")) to create a new spatial polygons data frame with the correct projection. Remember that the process of converting from one CRS or projection to another is handled by the spTransform() methods in the rgdal package. spTransform() has methods for all sp objects including SpatialPolygonsDataFrame, but does not work on raster objects.
germany_empire_t <- spTransform(germany_empire, CRS("+init=epsg:4326"))
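To get a feeling for what the transformation does to the coordinates, here is a minimal sketch with a single made-up point. It assumes that sp and a projection backend are installed (rgdal in this tutorial; as far as we know, newer sp versions rely on sf instead). The point lies on the central meridian of UTM zone 33N, which is at 15 degrees east.

```r
library(sp)

# One illustrative point in UTM zone 33N: easting 500000 m is the central meridian
utm_pt <- SpatialPoints(cbind(500000, 5500000),
                        proj4string = CRS("+init=epsg:32633"))

# Reproject to longitude/latitude on WGS 84
wgs_pt <- spTransform(utm_pt, CRS("+init=epsg:4326"))

coordinates(utm_pt)  # metres
coordinates(wgs_pt)  # degrees; the longitude should be very close to 15
```

The same place is described by numbers in the hundreds of thousands in one system and by two-digit degrees in the other, which is why objects in different projections cannot simply be overlaid.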
Now, we can count memory locations per Gau with germany_empire_t$memory_places <- poly.counts(memory_locations_sdf, germany_empire_t).
germany_empire_t$memory_places <- poly.counts(memory_locations_sdf, germany_empire_t)
Just to demonstrate that the transformation has changed the coordinates, we also try table(poly.counts(random_points, germany_empire_t)). The random locations from earlier were sampled in the original epsg:32633 coordinates and have not been transformed, so none of them falls into the administrative areas of germany_empire_t any more. The count is 0 for all 988 administrative areas.
table(poly.counts(random_points, germany_empire_t))
##
## 0
## 988
Let us create a choropleth map with germany_empire_map <- qtm(germany_empire_t, fill="memory_places", fill.title="Memories").
germany_empire_map <- qtm(germany_empire_t, fill="memory_places", fill.title="Memories")
Print it out in interactive mode but set tmap_mode("view") first.
tmap_mode("view")
## tmap mode set to interactive viewing
Now, create the interactive map with germany_empire_map. Et voilà.
germany_empire_map
Finally, set plotting back to normal with tmap_mode("plot").
tmap_mode("plot")
## tmap mode set to plotting
You can now easily export your interactive map germany_empire_map by clicking on Export in the Viewer window of RStudio, choosing ‘Save as Web Page’ and saving it somewhere on your machine. You can open this saved file in any web browser. If you run your own web page, you can easily embed the map as an HTML file.
On to some quick test questions.
What does historical spatial analysis do?
Uses historical maps
Get the geocode for "Kings College London" from Google.
geocode("Kings College London")
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Kings+College+London&key=xxx-3VMyfPew
## # A tibble: 1 x 2
## lon lat
## <dbl> <dbl>
## 1 -0.116 51.5
Change tm_shape(World) + tm_polygons("HPI") to plot life expectancy across the world. The fill attribute is "life_exp".
tm_shape(World) + tm_polygons("life_exp")
Next we create a memory count map for modern Poland. Define poland1 using getData. The country code is "POL".
poland1 <- getData('GADM', country='POL', level=1)
Count the memory places and define poland1$memory_places.
poland1$memory_places <- poly.counts(memory_locations_sdf, poland1)
We skip the step of creating short names and create the plot directly. Use the same qtm command again, but with text="HASC_1" and poland1.
qtm(poland1, fill="memory_places", text="HASC_1", text.size="AREA", style="gray", text.root=5, fill.title="Memories")
That’s it again. You have learned a lot of things about how to create maps and conduct spatial analysis. There is much more to learn. Check out spatial analysis with R online.