Andreas Beger

Data scientist

Code for blank state panel data


Tags: #Country Codes #R

2018-02-28 update: check out the states package on CRAN. It creates panel data for independent states and includes the G&W and COW state lists, along with some helper functions, like sfind(700) to quickly check what country 700 is. Here’s the package page.

The short version:

Here is some R code to create arbitrary country-year or country-month data sets reflecting the Gleditsch and Ward or COW state system membership lists, using the cshapes R package.

The longer version:

Anyone who has done some quantitative research in international relations probably has at some point dealt with creating accurate panel data for countries, or at least has heard the term “state system membership”. Basically, at any given point in time, there is some set of independent states in existence, and this set changes as you move through time. States disintegrate into successor states, or two states combine into a new state. To some extent there is standardization in international relations on this matter through the state system membership lists compiled by the Correlates of War project or Gleditsch and Ward, and a lot of other datasets generated by political scientists adhere to these standards.

This is less the case with other data sources. The Penn World Tables, for example, provided economic data for both West and East Germany at some point before the 1990 reunification. But if you look now you will find one Germany with a continuous series of data going back to the 1950s. So it seems that in more recent versions they have merged the historically separate data to back-code a single German state. It’s probably understandable that economists or government agencies are not as concerned about state system membership as political scientists, but it makes it painful to create accurate datasets when you have to merge several sources together. And so as not to understate things, this usually turns into a tedious and mind-numbing process. Germany, Yugoslavia, Yemen anyone?

So to start with something I know is accurate, I usually find myself creating a blank panel data set of the correct country-years or country-months, to which I then add on other data. To make this a bit easier I finally got around to writing a function that automates it, drawing on the cshapes R package, which provides shapefiles for any given point in time that accurately reflect state system membership.
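To make the idea concrete without cshapes, here is a minimal sketch in base R. It is not the function from this post: toy_state_panel() and the members table are hypothetical, and the codes and dates are approximate and for illustration only, so check the actual COW or G&W lists before relying on them. It keeps, for each year, the states whose membership spell covers June 30th.

```r
# Toy membership table: one row per spell of independence.
# Codes and dates are approximate and for illustration only.
members <- data.frame(
  ccode = c(255, 260, 265, 255),
  name  = c("Germany", "West Germany", "East Germany", "Germany"),
  start = as.Date(c("1816-01-01", "1955-05-05", "1954-03-25", "1990-10-03")),
  end   = as.Date(c("1945-05-08", "1990-10-02", "1990-10-02", "2010-12-31")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: country-year panel of states that are system
# members on June 30th of each year.
toy_state_panel <- function(members, start_year, end_year) {
  rows <- lapply(start_year:end_year, function(y) {
    ref_date <- as.Date(paste0(y, "-06-30"))
    active <- members[members$start <= ref_date & members$end >= ref_date, ]
    data.frame(ccode = active$ccode,
               date  = rep(ref_date, nrow(active)))
  })
  panel <- do.call(rbind, rows)
  # Unique row identifier built from date and country code
  panel$id <- paste(panel$date, panel$ccode)
  panel[order(panel$ccode, panel$date), ]
}

panel <- toy_state_panel(members, 1988, 1992)
print(panel)
```

With this toy table, the years 1988 to 1990 contain the two German states and 1991 onwards a single Germany, because reunification falls after the June 30th reference date in 1990.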

How to run it, after you’ve copied the code from GitHub:

example1 <- state_panel("2000", "2010", by="year")

head(example1[order(example1$ccode, example1$date), ])

This will produce a warning saying that state system membership on June 30th of a year was used for the data, and the following result:

     ccode       date           id
188      2 2000-06-30 2000-06-30 2
379      2 2001-06-30 2001-06-30 2
571      2 2002-06-30 2002-06-30 2
763      2 2003-06-30 2003-06-30 2
955      2 2004-06-30 2004-06-30 2
1147     2 2005-06-30 2005-06-30 2
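Adding other data to a blank panel like this is then a left join on the country code and time period. The sketch below uses made-up data frames (blank, other, and the variable x are all hypothetical) to show the pattern with base R's merge(); all.x = TRUE keeps every row of the blank panel and drops rows in the other data that are not in the state system.

```r
# Hypothetical blank panel: country codes 2 and 20, years 2000 to 2002
blank <- expand.grid(ccode = c(2, 20), year = 2000:2002)
blank$date <- as.Date(paste0(blank$year, "-06-30"))

# Made-up external data: some country-years are missing, and one row
# uses a code (999) that is not in the blank panel
other <- data.frame(
  ccode = c(2, 2, 20, 999),
  year  = c(2000L, 2001L, 2000L, 2000L),
  x     = c(1.5, 1.7, 0.9, 3.2)
)

# Left join: keep all rows of the blank panel, fill gaps with NA
merged <- merge(blank, other, by = c("ccode", "year"), all.x = TRUE)
merged <- merged[order(merged$ccode, merged$date), ]
print(merged)

# Quick checks: same number of rows as the blank panel, and a count
# of country-years with no external data
stopifnot(nrow(merged) == nrow(blank))
sum(is.na(merged$x))
```

Checking that the row count is unchanged after each merge is a cheap way to catch duplicated country-years in the source being added.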