Where is it? Geocoding with OpenStreetMap and R

Reading Time: 2 minutes

In previous posts I’ve shared how to use APIs to retrieve webdata from a number of sources including Strava, Wikipedia, and Twitter. Recently I was looking to plot some location data, and wanted to join latitude and longitude to street addresses so they could be plotted in Tableau. The most well known geocoding service is probably Google, but I also found a number of others. I ended up settling on OSM (OpenStreetMap), which isn’t particularly fast (it limits you to one address per second), but it is free and super easy to use.

Retrieving geocoded addressess from OSM is simple and you can find the full details at the helpful Nominatim Wiki. Suppose we want to get the location of the home of Sherlock Holmes, we can simply use the following call:

https://nominatim.openstreetmap.org/search?q=221b+baker+street,+london&format=json&addressdetails=0&limit=1

What we get back will be something like this in JSON format:

[{"place_id":50843439,"licence":"Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright","osm_type":"node","osm_id":3916613190,"boundingbox":["51.5237104","51.5238104","-0.1585445","-0.1584445"],"lat":"51.5237604","lon":"-0.1584945","display_name":"Sherlock Holmes Museum, 221B, Baker Street, Marylebone, Westminster, London, Greater London, England, NW1 6XE, United Kingdom","class":"tourism","type":"museum","importance":0.401,"icon":"https://nominatim.openstreetmap.org/images/mapicons/tourist_museum.p.20.png"}]

You can see embedded in all that information are the values for ‘lat’ and ‘lon’. So we can create a short script in R to loop through a set of addresses and extract the data we need. As an example I am using a sample of 100 Toronto parking tickets that have addresses and we will add the longitude and latitude. First let’s install the packages and define a function to extract the JSON data using the OSM API

### PACKAGES ###
################

library(tidyverse)   # data manipulation
library(jsonlite)    # JSON

### FUNCTIONS ###
#################

nominatim_osm <- function(address = NULL)
{
  if(suppressWarnings(is.null(address)))
    return(data.frame())
  tryCatch(
    d <- jsonlite::fromJSON( 
      gsub('\\@addr\\@', gsub('\\s+', '\\%20', address), 
           'https://nominatim.openstreetmap.org/search/@addr@?format=json&addressdetails=0&limit=1')
    ), error = function(c) return(data.frame())
  )
  if(length(d) == 0) return(data.frame())
  return(data.frame(lon = as.numeric(d$lon), lat = as.numeric(d$lat)))
}

Next, let’s load the address data and prep by defining the lon and lat values.

### Load Sample Data ###
sample_addresses <- read.csv("sample_addresses.csv") %>%
                    mutate(lon = 0,
                           lat = 0)

Finally we can loop through each addresses and get the longitude and latitude using our new function. Note we add the city and province to the address for the API call, and we also add a 1 second delay to comply with OSM guidelines.

### Add Geocoding ###
for (i in 1:dim(sample_addresses)[1]) {
  print(i)
  long_lat <- nominatim_osm(paste0(sample_addresses$location2[i],", Toronto, ON"))
  Sys.sleep(1)  # ensure 1 second between API call as per OSM guidelines
  if (dim(long_lat)[1] != 0) {
    sample_addresses$lon[i] = long_lat$lon
    sample_addresses$lat[i] = long_lat$lat
  }
}

Simple as that, here’s a screenshot of our final dataframe.

As usual, I’ve put all the code and data on my Github. Thanks for reading!