Introduction

I want to create an R object (scotland_la.rds) that is a shapefile that I can easily load into any analysis and join to datasets at Scotland Local Authority level.

Effictively all I am doing here is taking the data cleaning phase of the shapefile out of the equation so I can create maps a little quicker in future.

I’ve downloaded the shapefile from the ONS geoportal from this link. It is a big file so will not be pushing it to Github.

The file includes all UK Local Authority data so the main thing is trimming it down to Scotland level.

I’ve also downloaded the “Data zone to intermediate zone, local authority, health board….” excel file from this link to use as a lookup for local authority names.

Packages

Firstly I’ll load the required packages, some plot settings, and set the working directory using here()

library(tidyverse)
library(sf)
library(readxl)
library(extrafont)
theme_set(theme_minimal(base_size = 16, base_family = "Roboto"))
library(here)
here()
## [1] "C:/GitHub/clean_shapefile_scotland_LA"

Load data

Now I’ll import the shapefile and the lookup data

scotland_la <- st_read(here("raw_data/shapefiles/Local_Authority_Districts_December_2017_Full_Clipped_Boundaries_in_United_Kingdom_WGS84.shp"))
## Reading layer `Local_Authority_Districts_December_2017_Full_Clipped_Boundaries_in_United_Kingdom_WGS84' from data source `C:\GitHub\clean_shapefile_scotland_LA\raw_data\shapefiles\Local_Authority_Districts_December_2017_Full_Clipped_Boundaries_in_United_Kingdom_WGS84.shp' using driver `ESRI Shapefile'
## Simple feature collection with 391 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -163.4997 ymin: 5410.413 xmax: 655515.6 ymax: 1220231
## epsg (SRID):    NA
## proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
la_lookup <- read_xlsx(path = here("raw_data/00534447.xlsx"),
                       sheet = "SIMD16 DZ look-up data") 

Data wrangle

I need to clean up both files a little bit.

Local authority lookup

Starting with the la_lookup object

glimpse(la_lookup)
## Observations: 6,976
## Variables: 27
## $ DZ                             <chr> "S01006506", "S01006507", "S010...
## $ DZname                         <chr> "Culter - 01", "Culter - 02", "...
## $ SIMD16_Rank                    <dbl> 5272, 4838, 6321, 5363, 4049, 6...
## $ SIMD16_Vigintile               <dbl> 16, 14, 19, 16, 12, 20, 17, 17,...
## $ SIMD16_Decile                  <dbl> 8, 7, 10, 8, 6, 10, 9, 9, 10, 1...
## $ SIMD16_Quintile                <dbl> 4, 4, 5, 4, 3, 5, 5, 5, 5, 5, 5...
## $ SIMD16_Employment_Domain_Rank  <dbl> 4260, 4830, 6101, 4599, 4055, 6...
## $ SIMD16_Income_Domain_Rank      <dbl> 4513, 4334, 5266, 5002, 3341, 5...
## $ SIMD16_Education_Domain_Rank   <dbl> 5367, 4017, 6032, 4914, 3509, 5...
## $ SIMD16_Health_Domain_Rank      <dbl> 4906, 5070, 5837, 3958, 3028, 5...
## $ SIMD16_Access_Domain_Rank      <dbl> 5065, 2222, 3596, 4553, 5681, 5...
## $ SIMD16_Crime_Domain_Rank       <dbl> 5519.0, 6310.0, 6134.0, 6934.5,...
## $ SIMD16_Housing_Domain_Rank     <dbl> 3248.0, 3486.0, 5342.0, 4394.5,...
## $ Population                     <dbl> 904, 830, 694, 573, 676, 781, 5...
## $ Working_Age_Population_revised <dbl> 605, 491, 519, 354, 414, 464, 3...
## $ URclass                        <dbl> 3, 3, 3, 3, 3, 3, 5, 3, 3, 3, 3...
## $ URname                         <chr> "Accessible Small Towns", "Acce...
## $ IZcode                         <chr> "S02001236", "S02001236", "S020...
## $ IZname                         <chr> "Culter", "Culter", "Culter", "...
## $ LAcode                         <chr> "S12000033", "S12000033", "S120...
## $ LAname                         <chr> "Aberdeen City", "Aberdeen City...
## $ HBcode                         <chr> "S08000020", "S08000020", "S080...
## $ HBname                         <chr> "Grampian", "Grampian", "Grampi...
## $ MMWcode                        <chr> "S13002484", "S13002484", "S130...
## $ MMWname                        <chr> "Lower Deeside Ward", "Lower De...
## $ SPCcode                        <chr> "S16000076", "S16000076", "S160...
## $ SPCname                        <chr> "Aberdeen South and North Kinca...

I only need a fraction of this

la_lookup %<>% 
  select(LAname) %>% #Keep only the LAname variable
  distinct #Keep unique names 

#Change Western Isles nomenclature to match the shapefile
la_lookup$LAname <- str_replace(la_lookup$LAname, "Na h-Eileanan an Iar", "Na h-Eileanan Siar")

la_lookup
## # A tibble: 32 x 1
##    LAname               
##    <chr>                
##  1 Aberdeen City        
##  2 Aberdeenshire        
##  3 Angus                
##  4 Argyll and Bute      
##  5 Clackmannanshire     
##  6 Dumfries and Galloway
##  7 Dundee City          
##  8 East Ayrshire        
##  9 East Dunbartonshire  
## 10 East Lothian         
## # ... with 22 more rows

Shape file

The shape file has every local authority in the UK - I just want Scottish Local Authorities. Here I also rename the variable lad17nm to council. This is the variable name I tend to use for Local Authorites in my anlayses and so when I join other datasets to the shapefile it will be “by” this variable.

The shapefile imported factor labels for the council and lad17cd variables - sometimes over 200 levels (as the LA names can be associated with datazone or lower output area codes as well). For now I am going to convert these to character variables as I don’t need these labels and it could cause problems with differing datasets.

scotland_la %<>%
  filter(lad17nm %in% la_lookup$LAname) %>% #keep Scot LAs only 
  rename(council = lad17nm) %>%  #rename what will be the "joining" variable to "council"
  select(lad17cd, council, geometry) %>% #drop unneeded variables
  map_at(c(1,2), as.character) %>% #coerce factor variables to character
  as_data_frame() #coerce the whole object to a data_frame
## Warning in out[sel] <- map(.x[sel], .f, ...): number of items to replace is
## not a multiple of replacement length

A quick look at where we are now…

scotland_la
## # A tibble: 32 x 3
##    lad17cd  council                                               geometry
##    <chr>    <chr>                                       <MULTIPOLYGON [m]>
##  1 S120000~ Clackmannansh~ (((290589.8 703662.9, 290646.3 703525.9, 29091~
##  2 S120000~ Dumfries and ~ (((275161 620001.4, 275287.5 619798.8, 275443.~
##  3 S120000~ East Ayrshire  (((242992.7 655076.7, 243017.8 655070.1, 24303~
##  4 S120000~ East Lothian   (((351056.7 686132.9, 351068.9 686128.7, 35107~
##  5 S120000~ East Renfrews~ (((251681.9 660804.6, 251669.5 660689.8, 25166~
##  6 S120000~ Na h-Eileanan~ (((151850.5 966476.8, 151852.3 966468.9, 15186~
##  7 S120000~ Falkirk        (((287873.8 691904.6, 287889.9 691892.6, 28791~
##  8 S120000~ Fife           (((345087.6 729351.6, 345086.9 729327.9, 34509~
##  9 S120000~ Highland       (((320275.2 976744.3, 320281.5 976741.6, 32029~
## 10 S120000~ Inverclyde     (((224275.2 677939.5, 224281.1 677926, 224286.~
## # ... with 22 more rows

Alternative names

One of the most annoying things about working with public data is the differing nomenclature that can be used for Local Authorities. As you can see above, I already had to change the name of “Na h-Eileanan an Iar” to “Na h-Eileanan Siar” so it could be joined to the shapefile1. I have seen a third variation of this also. Furthermore, some offcial statistics are published with “Edinburgh, City of” as oppposed to what you can see in the table above. Throw in variaions with “&” and “and” and you can see the picture starts to become messy!

What I want to do is add rows with as many of these combination I can think of and copy the geometry from version that is in the shapefile. This way I should be able to left_join any dataset without having to rename indiviudal Local Authority names.

The geometry variable in scotland_la has a large list for every council with all the data required to draw the map. Duplicating is a little tricky but what I am going to do is extract the geometry for the councils that I am going to add additional variables for…

eilean_siar <- 
  scotland_la %>% 
  filter(council == "Na h-Eileanan Siar") %>% 
  .$geometry

edinburgh <- 
  scotland_la %>% 
  filter(council == "City of Edinburgh") %>% 
  .$geometry

dumfries <- 
  scotland_la %>% 
  filter(council == "Dumfries and Galloway") %>% 
  .$geometry

perth <-
  scotland_la %>% 
  filter(council == "Perth and Kinross") %>% 
  .$geometry

argyll <- 
  scotland_la %>% 
  filter(council == "Argyll and Bute") %>% 
  .$geometry

…now I can simply add rows to the dataframe with alternative names and plug in the extracted geometry variables where needed. Because all this wrangling has converted the scotland_la object to a data_frame under the hood, the last line in this chunk of code coerces it back to an sf object that can be mapped.

scotland_la %<>% 
  as_data_frame() %>% 
  add_row(lad17cd = "S12000013", 
          council = "Na h-Eileanan an Iar",
          geometry = eilean_siar) %>%
  add_row(lad17cd = "S12000013", 
          council = "Eilean Siar",
          geometry = eilean_siar) %>%
  add_row(lad17cd = "S12000036",
          council = "Edinburgh, City of",
          geometry = edinburgh) %>% 
  add_row(lad17cd = "S12000006",
          council = "Dumfries & Galloway",
          geometry = dumfries) %>% 
  add_row(lad17cd = "S12000024",
          council = "Perth & Kinross",
          geometry = perth) %>% 
  add_row(lad17cd = "S12000035",
          council = "Argyll & Bute",
          geometry = argyll) %>% 
  arrange(lad17cd) %>% 
  st_as_sf()

Just a quick check that this has worked.

scotland_la
## Simple feature collection with 38 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 5463.807 ymin: 530264.5 xmax: 470213.7 ymax: 1220231
## epsg (SRID):    NA
## proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
## # A tibble: 38 x 3
##    lad17cd  council                                               geometry
##    <chr>    <chr>                                       <MULTIPOLYGON [m]>
##  1 S120000~ Clackmannansh~ (((290589.8 703662.9, 290646.3 703525.9, 29091~
##  2 S120000~ Dumfries and ~ (((275161 620001.4, 275287.5 619798.8, 275443.~
##  3 S120000~ Dumfries & Ga~ (((275161 620001.4, 275287.5 619798.8, 275443.~
##  4 S120000~ East Ayrshire  (((242992.7 655076.7, 243017.8 655070.1, 24303~
##  5 S120000~ East Lothian   (((351056.7 686132.9, 351068.9 686128.7, 35107~
##  6 S120000~ East Renfrews~ (((251681.9 660804.6, 251669.5 660689.8, 25166~
##  7 S120000~ Na h-Eileanan~ (((151850.5 966476.8, 151852.3 966468.9, 15186~
##  8 S120000~ Na h-Eileanan~ (((151850.5 966476.8, 151852.3 966468.9, 15186~
##  9 S120000~ Eilean Siar    (((151850.5 966476.8, 151852.3 966468.9, 15186~
## 10 S120000~ Falkirk        (((287873.8 691904.6, 287889.9 691892.6, 28791~
## # ... with 28 more rows

Plot

And, at last, I can plot a blank map to see that it looks ok.

scot_plot <-
  scotland_la %>% 
  ggplot() +
  geom_sf() +
  theme(line = element_blank(),
        axis.text = element_blank(),
        panel.grid = element_line(colour = "transparent"),
        plot.caption = element_text(size = 10, colour = "#AAAAAA")) +
  labs(title = "Scotland",
       subtitle = "shown with local authority administrative areas",
       caption = "Contains National Statistics data © Crown copyright and database right 2018\nContains OS data © Crown copyright and database right 2018") 
scot_plot

Save

I now have the object I wanted: scotland_la . I’ll save this as a .rds object which can be imported to any r session with the code load("/path_to_file/scotland_la.rds")

save(scotland_la, file = here("assets/clean_data/scotland_la.rds"))

Session info

devtools::session_info()
## Session info -------------------------------------------------------------
##  setting  value                       
##  version  R version 3.5.1 (2018-07-02)
##  system   x86_64, mingw32             
##  ui       RStudio (1.1.456)           
##  language (EN)                        
##  collate  English_United Kingdom.1252 
##  tz       Europe/London               
##  date     2018-09-06
## Packages -----------------------------------------------------------------
##  package     * version date       source        
##  assertthat    0.2.0   2017-04-11 CRAN (R 3.5.0)
##  backports     1.1.2   2017-12-13 CRAN (R 3.5.0)
##  base        * 3.5.1   2018-07-02 local         
##  bindr         0.1.1   2018-03-13 CRAN (R 3.5.0)
##  bindrcpp    * 0.2.2   2018-03-29 CRAN (R 3.5.0)
##  broom         0.5.0   2018-07-17 CRAN (R 3.5.1)
##  cellranger    1.1.0   2016-07-27 CRAN (R 3.5.0)
##  class         7.3-14  2015-08-30 CRAN (R 3.5.1)
##  classInt      0.2-3   2018-04-16 CRAN (R 3.5.0)
##  cli           1.0.0   2017-11-05 CRAN (R 3.5.0)
##  colorspace    1.3-2   2016-12-14 CRAN (R 3.5.0)
##  compiler      3.5.1   2018-07-02 local         
##  crayon        1.3.4   2017-09-16 CRAN (R 3.5.0)
##  datasets    * 3.5.1   2018-07-02 local         
##  DBI           1.0.0   2018-05-02 CRAN (R 3.5.0)
##  devtools      1.13.6  2018-06-27 CRAN (R 3.5.1)
##  digest        0.6.16  2018-08-22 CRAN (R 3.5.1)
##  dplyr       * 0.7.6   2018-06-29 CRAN (R 3.5.1)
##  e1071         1.7-0   2018-07-28 CRAN (R 3.5.1)
##  evaluate      0.11    2018-07-17 CRAN (R 3.5.1)
##  extrafont   * 0.17    2014-12-08 CRAN (R 3.5.0)
##  extrafontdb   1.0     2012-06-11 CRAN (R 3.5.0)
##  fansi         0.3.0   2018-08-13 CRAN (R 3.5.1)
##  forcats     * 0.3.0   2018-02-19 CRAN (R 3.5.0)
##  ggplot2     * 3.0.0   2018-07-03 CRAN (R 3.5.1)
##  glue          1.3.0   2018-07-17 CRAN (R 3.5.1)
##  graphics    * 3.5.1   2018-07-02 local         
##  grDevices   * 3.5.1   2018-07-02 local         
##  grid          3.5.1   2018-07-02 local         
##  gtable        0.2.0   2016-02-26 CRAN (R 3.5.0)
##  haven         1.1.2   2018-06-27 CRAN (R 3.5.1)
##  here        * 0.1     2017-05-28 CRAN (R 3.5.0)
##  hms           0.4.2   2018-03-10 CRAN (R 3.5.0)
##  htmltools     0.3.6   2017-04-28 CRAN (R 3.5.0)
##  httr          1.3.1   2017-08-20 CRAN (R 3.5.0)
##  jsonlite      1.5     2017-06-01 CRAN (R 3.5.0)
##  knitr         1.20    2018-02-20 CRAN (R 3.5.0)
##  lattice       0.20-35 2017-03-25 CRAN (R 3.5.1)
##  lazyeval      0.2.1   2017-10-29 CRAN (R 3.5.0)
##  lubridate     1.7.4   2018-04-11 CRAN (R 3.5.0)
##  magrittr      1.5     2014-11-22 CRAN (R 3.5.0)
##  memoise       1.1.0   2017-04-21 CRAN (R 3.5.0)
##  methods     * 3.5.1   2018-07-02 local         
##  modelr        0.1.2   2018-05-11 CRAN (R 3.5.0)
##  munsell       0.5.0   2018-06-12 CRAN (R 3.5.0)
##  nlme          3.1-137 2018-04-07 CRAN (R 3.5.1)
##  pillar        1.3.0   2018-07-14 CRAN (R 3.5.1)
##  pkgconfig     2.0.2   2018-08-16 CRAN (R 3.5.1)
##  plyr          1.8.4   2016-06-08 CRAN (R 3.5.0)
##  purrr       * 0.2.5   2018-05-29 CRAN (R 3.5.0)
##  R6            2.2.2   2017-06-17 CRAN (R 3.5.0)
##  Rcpp          0.12.18 2018-07-23 CRAN (R 3.5.1)
##  readr       * 1.1.1   2017-05-16 CRAN (R 3.5.0)
##  readxl      * 1.1.0   2018-04-20 CRAN (R 3.5.0)
##  rlang         0.2.2   2018-08-16 CRAN (R 3.5.1)
##  rmarkdown   * 1.10    2018-06-11 CRAN (R 3.5.0)
##  rprojroot     1.3-2   2018-01-03 CRAN (R 3.5.0)
##  rstudioapi    0.7     2017-09-07 CRAN (R 3.5.0)
##  Rttf2pt1      1.3.7   2018-06-29 CRAN (R 3.5.0)
##  rvest         0.3.2   2016-06-17 CRAN (R 3.5.0)
##  scales        1.0.0   2018-08-09 CRAN (R 3.5.1)
##  sf          * 0.6-3   2018-05-17 CRAN (R 3.5.0)
##  spData        0.2.9.3 2018-08-01 CRAN (R 3.5.1)
##  stats       * 3.5.1   2018-07-02 local         
##  stringi       1.2.4   2018-07-20 CRAN (R 3.5.0)
##  stringr     * 1.3.1   2018-05-10 CRAN (R 3.5.0)
##  tibble      * 1.4.2   2018-01-22 CRAN (R 3.5.0)
##  tidyr       * 0.8.1   2018-05-18 CRAN (R 3.5.0)
##  tidyselect    0.2.4   2018-02-26 CRAN (R 3.5.0)
##  tidyverse   * 1.2.1   2017-11-14 CRAN (R 3.5.0)
##  tools         3.5.1   2018-07-02 local         
##  units         0.6-0   2018-06-09 CRAN (R 3.5.0)
##  utf8          1.1.4   2018-05-24 CRAN (R 3.5.0)
##  utils       * 3.5.1   2018-07-02 local         
##  withr         2.1.2   2018-03-15 CRAN (R 3.5.0)
##  xml2          1.2.0   2018-01-24 CRAN (R 3.5.0)
##  yaml          2.2.0   2018-07-25 CRAN (R 3.5.1)

  1. My wife’s first language is Scottish Gaelic and she assures me “Na h-Eileanan an Iar” is the grammatically correct way to pronounce “The Western Isles”, however somewhere in government “Na h-Eileanan Siar” has been deemed to be the offical spelling!