I’m a data scientist working on forecasting, rare-event prediction, and data infrastructure to support them. Over the past decade I’ve worked on client-driven projects in defense, security, academic, and non-governmental sectors. PhD in political science (2012), former US military intelligence officer (Army National Guard). American in Tallinn, Estonia.
On the side, I co-founded and organize PyData Tallinn, which has grown to more than 600 members and has collaborated with several global tech companies in Tallinn and Tartu. In 2026, I am also the data & ML program director for Digit.dev.
I’ve learned over the years that I have a particular flavor as a data scientist:
- Paying attention to data quality, feature engineering, and the larger design of a problem has often been more valuable to me than clever modeling. For example, if human annotators can’t agree on how to label the data, even a model performing at human level won’t be very accurate. I would usually rather use a well-understood model, such as gradient boosting or regularized regression, and spend my time on data quality, validation, and sanity-checking predictions than build a more complex bespoke model without having time to test it.
- I want my data science code to follow good engineering practices: modular structure, unit and integration tests, and type annotations. Lately, much of my work has involved refactoring research code and composing it into more complex systems, which I enjoy a lot.
- Modeling, design, and data choices should be reasonable and defensible to someone with domain expertise looking at the data or model output. If they are not, that’s a signal to change.
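To make the annotator point concrete, here is a small Python simulation. The setup is an illustrative assumption, not taken from any particular project: labels are binary, and both the human annotator and a "human-level" model independently get each item wrong with the same probability `e`. Scored against the human labels, the model only agrees when both are right or both are wrong, so its measured accuracy is `(1 - e)**2 + e**2`.

```python
import random


def human_level_accuracy(flip_rate: float, n: int = 100_000, seed: int = 0) -> float:
    """Simulate scoring a "human-level" model against human-annotated labels.

    Illustrative assumptions: binary labels, and both the annotator and the
    model independently flip the true label with probability `flip_rate`.
    """
    rng = random.Random(seed)
    agree = 0
    for _ in range(n):
        truth = rng.random() < 0.5
        label = truth != (rng.random() < flip_rate)       # annotator's (noisy) label
        prediction = truth != (rng.random() < flip_rate)  # model errs like a human
        agree += label == prediction
    return agree / n


def expected_accuracy(flip_rate: float) -> float:
    # With binary labels, agreement happens when both are right or both are wrong.
    return (1 - flip_rate) ** 2 + flip_rate ** 2


for e in (0.05, 0.1, 0.2):
    print(f"error rate {e:.2f}: simulated {human_level_accuracy(e):.3f}, "
          f"expected {expected_accuracy(e):.3f}")
```

With a 20% annotator error rate, even this human-level model scores only about 68% against the labels, which is one reason measuring annotator agreement early tends to be time well spent.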
Where you can find me
What else is here
For a list of academic publications, see my research page.
For a while, I blogged.
POLECAT event data: Some resources for the POLECAT event data, which is available at https://dataverse.harvard.edu/dataverse/POLECAT.
CAMEO Event Type Ontology: A web version of the CAMEO event type ontology, based on the official PDF codebook on Phil Schrodt’s website.
et1000: the 1,000 most common Estonian words: Estonian is a somewhat boutique language. When I made this, I couldn’t find a list of the most commonly used Estonian words online, so I compiled one.
R packages
I wrote and maintain several open source R packages:
icews: The ICEWS event data consists of more than 270 million event records extracted from global news stories. The raw data is delivered via Dataverse. The icews R package automates keeping an up-to-date local copy, using either a file- or SQLite-based storage backend.
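One simple way to keep a local copy current is to record which source files have already been ingested and load only the new ones. The Python sketch below illustrates that idea with the standard-library `sqlite3` module. It is not how the icews package is implemented: the `sync` function, the table layout, the column names, and the file names are all hypothetical, and the remote side is simulated with an in-memory dictionary instead of downloads from Dataverse.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical event row: (event_date, source_actor, target_actor)
Row = Tuple[str, str, str]


def sync(conn: sqlite3.Connection, remote_files: Dict[str, List[Row]]) -> List[str]:
    """Load every remote file not yet recorded as ingested; return their names."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(event_date TEXT, source TEXT, target TEXT, source_file TEXT)"
    )
    conn.execute("CREATE TABLE IF NOT EXISTS ingested (file TEXT PRIMARY KEY)")
    done = {name for (name,) in conn.execute("SELECT file FROM ingested")}
    new_files = sorted(set(remote_files) - done)
    for name in new_files:
        # One transaction per file: rows and bookkeeping commit together or not at all.
        with conn:
            conn.executemany(
                "INSERT INTO events VALUES (?, ?, ?, ?)",
                [(*row, name) for row in remote_files[name]],
            )
            conn.execute("INSERT INTO ingested (file) VALUES (?)", (name,))
    return new_files


conn = sqlite3.connect(":memory:")
remote = {"events.2018.tab": [("2018-03-01", "Actor A", "Actor B")]}
print(sync(conn, remote))  # ['events.2018.tab']
remote["events.2019.tab"] = [("2019-05-02", "Actor B", "Actor C")]
print(sync(conn, remote))  # ['events.2019.tab'] (only the new file is loaded)
print(sync(conn, remote))  # [] (already up to date)
```

Because each file is loaded in its own transaction together with its bookkeeping row, a failed load leaves no partial data behind and the next sync retries it. Tracking file names alone would miss files that are replaced upstream with corrected versions; if that can happen, storing a checksum or version per file would cover it.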
states: I used to work frequently with global data for independent states. This package provides utility functions that make it easier to work with the two major lists of state system membership: Gleditsch & Ward (G&W) and Correlates of War (COW).
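A typical task with these lists is deciding which states existed on a given date, or building a country-year panel from membership spells. Below is a minimal Python sketch of that logic. The names `Spell`, `members_on`, and `country_year_panel` are hypothetical rather than the states package's API, and the spells are toy values, not entries from either list. Spells are kept as a list rather than one interval per state because a state can have more than one membership spell, for example after losing and later regaining independence.

```python
from datetime import date
from typing import List, NamedTuple, Tuple


class Spell(NamedTuple):
    code: int    # numeric state code from whichever list is in use
    start: date
    end: date    # date.max for states that are still members


def members_on(spells: List[Spell], when: date) -> List[int]:
    """Codes of all states with a membership spell covering `when`."""
    return sorted({s.code for s in spells if s.start <= when <= s.end})


def country_year_panel(spells: List[Spell], first: int, last: int) -> List[Tuple[int, int]]:
    """(code, year) pairs for states that were members on December 31 of each year.

    A fixed reference day is only one convention; another is to include any
    state that was a member at some point during the year.
    """
    return [
        (code, year)
        for year in range(first, last + 1)
        for code in members_on(spells, date(year, 12, 31))
    ]


# Toy spells for illustration only (not taken from G&W or COW).
spells = [
    Spell(1, date(1900, 1, 1), date.max),
    Spell(20, date(1920, 1, 1), date(1940, 6, 30)),
    Spell(20, date(1991, 9, 1), date.max),
    Spell(30, date(1950, 1, 1), date(1990, 10, 3)),
]
print(members_on(spells, date(1970, 1, 1)))    # [1, 30]
print(country_year_panel(spells, 1990, 1991))  # [(1, 1990), (1, 1991), (20, 1991)]
```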
spduration: Implements a split-population duration regression model with time-varying covariates, for survival data in which an unknown fraction of cases is immune to failure. Such models are also known as cure models.
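For intuition about split-population models, here is a Python sketch of the log-likelihood of a basic version with a Weibull duration part and no covariates. Each case is at risk with probability `pi` and immune otherwise. A case observed to fail contributes `pi * f(t)`; a censored case contributes `(1 - pi) + pi * S(t)`, because it is either immune or at risk but not yet failed. This is a simplified illustration, not the spduration implementation: the Weibull parameterization `S(t) = exp(-(lam * t) ** p)` is an assumption, in a regression setting `pi` and `lam` would be linked to covariates (for example `pi` through a logit), and handling time-varying covariates requires splitting cases into episodes, which this sketch does not do.

```python
import math
from typing import Sequence


def weibull_survival(t: float, lam: float, p: float) -> float:
    # S(t) = exp(-(lam * t) ** p), with rate lam > 0 and shape p > 0
    return math.exp(-((lam * t) ** p))


def weibull_density(t: float, lam: float, p: float) -> float:
    # f(t) = lam * p * (lam * t) ** (p - 1) * S(t); assumes t > 0
    return lam * p * (lam * t) ** (p - 1) * weibull_survival(t, lam, p)


def split_pop_loglik(
    times: Sequence[float], failed: Sequence[bool], pi: float, lam: float, p: float
) -> float:
    """Log-likelihood of a split-population Weibull model without covariates.

    pi is the probability that a case is at risk; the other 1 - pi never fail.
    """
    ll = 0.0
    for t, d in zip(times, failed):
        if d:
            # Observed failure: the case must be at risk and fail at t.
            ll += math.log(pi * weibull_density(t, lam, p))
        else:
            # Censored: either immune, or at risk and still surviving at t.
            ll += math.log((1 - pi) + pi * weibull_survival(t, lam, p))
    return ll


# Two early failures and three long-censored cases (made-up numbers).
times = [1.0, 2.0, 50.0, 60.0, 70.0]
failed = [True, True, False, False, False]
for pi in (0.4, 0.9):
    print(pi, split_pop_loglik(times, failed, pi, lam=0.5, p=1.0))
```

Setting `pi = 1` recovers the ordinary Weibull log-likelihood. As `t` grows, a censored case's contribution approaches `log(1 - pi)`, so cases that survive a long time without failing are what push the estimated immune fraction up; in the usage example the lower `pi` fits better.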
Bonus trivia
The 5th most interesting thing about me is that at the 2017 MyFitness Madness City Race at the Tallinn Song Festival Grounds, I was, due to a clerical error, part of the best all-female team. (I had gone as a team with my wife and two other women. We have the trophy at home.)