Determine which data files are currently on Dataverse, in the local raw file directory, or in the local database.

get_db_state(db_path = find_db())

get_dvn_state(icews_doi = get_doi(), server = Sys.getenv("DATAVERSE_SERVER"))

get_local_state(raw_file_dir = find_raw())

Arguments

db_path

Path to SQLite database files.

icews_doi

DOI of the main ICEWS repo on Dataverse, see get_doi()

server

For unit tests only; the default matches the default of dataverse::get_dataset().

raw_file_dir

Directory containing raw data files

Value

For get_dvn_state, a tibble with the following columns:

  • dvn_repo: "historic" or "weekly", see get_doi()

  • dvn_file_label: the file label on dataverse, possibly non-unique

  • dvn_file_id: the integer file ID on dataverse

  • file_name: the normalized, unique file name, see normalize_label()

For get_local_state and get_db_state, a tibble with columns:

  • file_name: the full source data file name, e.g. "events.1995.20150313082510.tab"; see normalize_label()

Details

The data files (tab-separated files, ".tab") on dataverse that contain the raw event data follow a common format denoting the set of events contained in a file and which version of the event data and/or file dump they correspond to. For example, "events.1995.20150313082510.tab" contains events for 1995 and the version is denoted by the timestamp, "20150313082510".
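The naming scheme above can be taken apart with a regular expression. The following is a minimal sketch, assuming every file name follows the "events.<event set>.<14-digit timestamp>.tab" pattern of the example; parse_file_name() is a hypothetical helper for illustration and is not part of the package.

```r
# Hypothetical helper (not part of the package): split a normalized file
# name into the event set it covers and its version timestamp.
parse_file_name <- function(file_name) {
  # Assumed pattern: events.<event set>.<14-digit timestamp>.tab
  pattern <- "^events\\.(.+)\\.([0-9]{14})\\.tab$"
  matches <- regmatches(file_name, regexec(pattern, file_name))
  # Each match holds the full string plus two capture groups;
  # names that do not fit the pattern yield NA for both parts.
  parts <- do.call(rbind, lapply(matches, function(x) {
    if (length(x) == 3) x[2:3] else c(NA_character_, NA_character_)
  }))
  data.frame(
    file_name = file_name,
    event_set = parts[, 1],
    version   = parts[, 2],
    stringsAsFactors = FALSE
  )
}

parse_file_name("events.1995.20150313082510.tab")
```

Names that do not match the assumed pattern come back with NA in both columns rather than raising an error, which makes it easy to spot labels that would need normalize_label() first.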

The download and update functions (update_icews(), download_data()) use the state information returned by these functions to recognize which event sets are locally available or still need to be downloaded, and whether any local event sets have been superseded by a new version on Dataverse.
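As an illustration of the kind of comparison described here, the sketch below contrasts a remote and a local set of file names. It is not the package's actual implementation: the file names are invented for the example, event_set() is a hypothetical helper, and the file_name pattern is assumed to match the example given earlier.

```r
# Invented example data standing in for the file_name columns of the
# remote (Dataverse) and local state tibbles.
remote_files <- c("events.1995.20150313082510.tab",
                  "events.1996.20150313082528.tab")
local_files  <- c("events.1995.20140101000000.tab")

# Hypothetical helper: extract the event set from a file name, assuming
# the pattern events.<event set>.<14-digit timestamp>.tab
event_set <- function(x) {
  sub("^events\\.(.+)\\.[0-9]{14}\\.tab$", "\\1", x)
}

# Files on the remote that are not available locally under the same name
to_download <- setdiff(remote_files, local_files)

# Local files whose event set exists remotely, but under a different
# version (file name); these are treated as superseded
superseded <- local_files[
  !(local_files %in% remote_files) &
    event_set(local_files) %in% event_set(remote_files)
]

to_download
superseded
```

Comparing on the full file name catches both missing event sets and changed versions in one step; the event set is only needed to tell a superseded local file apart from one that has no remote counterpart at all.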

Examples

# Remote (DVN) state
# get_dvn_state()
#
# Local file state
# get_local_state()
#
# Database state
# get_db_state()