For the data visualization project I made for the International Challenge BBVA I was in the situation of having a dataset with a large list of countries from whom I need their corresponding latitudes and longitudes. The list contains almost all the countries of the world, so I needed an automated solution to get this data.

Open Refine suits very well for this kind of tasks, as supports the ability to fetch JSON from any web service based on values in a OpenRefine project and create a new column out of it. So all we need is to use the existing data we already have and send it against a webservice. For that case we will use the Google’s Geocoding service.So the task described above can be accomplished as follows:

After importing your dataset to Open Refine, select the column with the country names and select “Edit Column” > “Add column by fetching URLs…”

operefine01

Add a name for the new column and a throttle delay between each call(we will be fetching the url for every row, so be kind and set a minimum delay between each call). In the Expresion field, we populate it with the url of the web service, setting the address parameter with the value of each row. Once you are done, press Ok.
Captura de pantalla 2013-08-30 a la(s) 14.18.21

What you can see in the new columns is the response of the web service in JSON format. So we have to inspect this JSON response, search for the latitude/longitude values and set them into new columns.
Captura de pantalla 2013-08-30 a la(s) 14.19.55



The JSON looks like this:

{
    "results": [{
        "address_components": [{
            "long_name": "Germany",
            "short_name": "DE",
            "types": ["country", "political"]
        }],
        "formatted_address": "Germany",
        "geometry": {
            "bounds": {
                "northeast": {
                    "lat": 55.058347,
                    "lng": 15.0418962
                },
                "southwest": {
                    "lat": 47.2701115,
                    "lng": 5.8663425
                }
            },
            "location": {
                "lat": 51.165691,
                "lng": 10.451526
            },
            "location_type": "APPROXIMATE",
            "viewport": {
                "northeast": {
                    "lat": 55.058347,
                    "lng": 15.0418962
                },
                "southwest": {
                    "lat": 47.2701115,
                    "lng": 5.8663425
                }
            }
        },
        "types": ["country", "political"]
    }],
    "status": "OK"
} 

In JSON notation, the values are accesible as:

results[0].geometry.location.lng
results[0].geometry.location.lat

Open Refine is able to parse json data, so now we can create now columns by selecting the column just created and selecting “EditColumn” > “Add column based on this column”:
Captura de pantalla 2013-08-30 a la(s) 14.38.31

On the new window, just set the name of the new column and populate the expresion field with:

value.parseJson().results[0].geometry.location.lng

Captura de pantalla 2013-08-30 a la(s) 14.40.20
As you will see, your already can see in the preview tab the longitudes value that will appear in the new column. So once we are done you will have a new fresh column containing the data:
Captura de pantalla 2013-08-30 a la(s) 14.46.11
Just repeat the last step for the latitude data and you are done! You can now delete the column containing the JSON response as it’s not necessary anymore.

Posted in Geocoding, Open Refine

Have your say

About me

Data Visualization · Interactive · HCI · Open Data · Data Science · Digital Humanities


More info here and here: