Check the overlap in cases between two state lists or data frames, for example before merging them.

Usage

compare(
  df1,
  df2,
  state1 = "gwcode",
  time1 = "year",
  state2 = "gwcode",
  time2 = "year"
)

report(x)

Arguments

df1

data frame

df2

data frame

state1

(character(1)) Name of the country ID var in df1, default "gwcode"

time1

(character(1)) Name of the time ID var in df1, default "year"

state2

(character(1)) Name of the country ID var in df2, default "gwcode"

time2

(character(1)) Name of the time ID var in df2, default "year"

x

a "state_sets" object produced by compare()

Details

This is a helper for interactively debugging merges of data sets whose state lists may differ slightly, for example because the two sources use different country codes.
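
To make that kind of mismatch concrete, here is a minimal base R sketch of the underlying idea. It is not how compare() is implemented. The data frames `a` and `b`, their values, and the differing codes (260 vs. 255) are made up for illustration.

```r
# Toy data: two country-year tables whose state lists differ slightly.
# All values here are made up for illustration.
a <- data.frame(gwcode = c(2, 20, 260), year = 2018, x1 = c(1, NA, 3))
b <- data.frame(gwcode = c(2, 20, 255), year = 2018, x2 = c(5, 4, NA))

# Build one "code-year" key per row, similar in form to the labels
# that report() prints.
key_a <- paste(a$gwcode, a$year, sep = "-")
key_b <- paste(b$gwcode, b$year, sep = "-")

in_both <- intersect(key_a, key_b)  # cases that would match in a merge
only_a  <- setdiff(key_a, key_b)    # cases that appear only in a
only_b  <- setdiff(key_b, key_a)    # cases that appear only in b

# An outer merge keeps unmatched cases from both sides and fills NA
# into the columns of whichever table lacks the case.
merged <- merge(a, b, by = c("gwcode", "year"), all = TRUE)
```

In this toy case `only_a` and `only_b` each hold one unmatched key, so the outer merge has four rows rather than three. This is the sort of discrepancy that compare() and report() are meant to surface on real data.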

Examples

# df1 has all countries in 2018 but some values in x1 are missing
df1 <- state_panel(2018, 2018, partial = "any")
df1$x1 <- round(runif(nrow(df1))*5)
df1$x1[sample.int(nrow(df1), size = 20, replace = FALSE)] <- NA

# df2 is missing some countries and also has missing values in x2
df2 <- state_panel(2018, 2018, partial = "any")
df2 <- df2[sample.int(nrow(df2), size = 150), ]
df2$x2 <- round(runif(nrow(df2))*5)
df2$x2[sample.int(nrow(df2), size = 20, replace = FALSE)] <- NA

comp <- compare(df1, df2)
comp
#> # A tibble: 6 × 5
#>   case_in_df1 case_in_df2 missval_df1 missval_df2     n
#>         <int>       <int> <fct>       <fct>       <int>
#> 1           1           0 0           unknown        42
#> 2           1           0 1           unknown         5
#> 3           1           1 0           0             116
#> 4           1           1 0           1              19
#> 5           1           1 1           0              14
#> 6           1           1 1           1               1

report(comp)
#> 197 total rows
#> 197 rows in df1
#> 150 rows in df2
#> 
#> 116 rows match and have no missing values
#> 20-2018, 31-2018, 41-2018, 42-2018, 51-2018, 52-2018, 54-2018, 56-2018, 58-2018, 60-2018, and 106 more
#> 
#> 1 rows match but have missing values in both
#> 436-2018
#> 
#> 19 rows match but have missing values in df2
#> 53-2018, 70-2018, 80-2018, 93-2018, 130-2018, 135-2018, 160-2018, 210-2018, 232-2018, 452-2018, and 9 more
#> 
#> 14 rows match but have missing values in df1
#> 94-2018, 340-2018, 343-2018, 344-2018, 349-2018, 483-2018, 500-2018, 590-2018, 690-2018, 731-2018, and 4 more
#> 
#> 42 rows in df1 (no missing values) but not df2
#> 2-2018, 40-2018, 55-2018, 57-2018, 91-2018, 140-2018, 205-2018, 211-2018, 221-2018, 223-2018, and 32 more
#> 
#> 5 rows in df1 (with missing values) but not df2
#> 95-2018, 371-2018, 435-2018, 520-2018, 835-2018