Last validated: 5/22/2026
Methodology & Validation
UNSim's predictions are validated against 181,043 real recorded votes from the UN General Assembly. Here's how the engine works and how it performs.
01 · Performance at a glance
Validation Results
Per-Vote Accuracy
68.2%
123,510 / 181,043 correct
Resolution Outcome
99.7%
Pass/fail prediction
Resolutions Tested
1,077
Sessions 60–74 (2005–2019)
Yes-Vote F1
81.8%
P=83% R=81%
Per-Class Performance
Yes: F1 = 81.8%
Precision: 83.1%
Recall: 80.6%
TP: 117,925 | FP: 24,001
No: F1 = 17.2%
Precision: 16.9%
Recall: 17.6%
TP: 2,487 | FP: 12,224
Abstain: F1 = 13.8%
Precision: 12.7%
Recall: 15.0%
TP: 3,098 | FP: 21,308
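The per-class figures can be re-derived from the raw counts. The sketch below assumes the standard definitions (precision = TP / (TP + FP), F1 as the harmonic mean of precision and recall). The type and function names are illustrative and are not taken from the UNSim codebase. False negatives are not reported here, so recall is taken as given, and the size of each true class is only estimated as TP / recall.

```typescript
// Illustrative names; counts and recall values are copied from the figures above.
type ClassCounts = { tp: number; fp: number; recall: number };

const labels = ["yes", "no", "abstain"] as const;

const perClass: Record<(typeof labels)[number], ClassCounts> = {
  yes: { tp: 117925, fp: 24001, recall: 0.806 },
  no: { tp: 2487, fp: 12224, recall: 0.176 },
  abstain: { tp: 3098, fp: 21308, recall: 0.15 },
};

// Share of predictions for a class that were correct.
function precision(c: ClassCounts): number {
  return c.tp / (c.tp + c.fp);
}

// Harmonic mean of precision and the reported recall.
function f1(c: ClassCounts): number {
  const p = precision(c);
  return (2 * p * c.recall) / (p + c.recall);
}

// Estimated number of true votes in a class (approximate: recall is rounded).
function impliedSupport(c: ClassCounts): number {
  return c.tp / c.recall;
}

// Overall accuracy is the sum of true positives over all predictions.
const totalCorrect = labels.reduce((sum, l) => sum + perClass[l].tp, 0);
const accuracy = totalCorrect / 181043;

for (const l of labels) {
  const c = perClass[l];
  console.log(
    l,
    "precision %:", (100 * precision(c)).toFixed(1),
    "F1 %:", (100 * f1(c)).toFixed(1),
    "approx. true votes:", Math.round(impliedSupport(c))
  );
}
console.log("accuracy %:", (100 * accuracy).toFixed(1));
```

The three true-positive counts sum to the 123,510 correct predictions reported under Per-Vote Accuracy, which is a quick way to confirm that the cards describe the same validation run.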
02 · Breakdown by topic
Performance by Issue Area
Arms control and disarmament: 70.2% (n=47,251)
Human rights: 60.1% (n=42,745)
Economic development: 70.6% (n=29,877)
Colonialism: 70.2% (n=25,870)
Palestinian conflict: 68.9% (n=21,842)
Nuclear weapons and nuclear material: 76.8% (n=13,458)
03 · Regional performance
Performance by Regional Group
Asia-Pacific Group: 73.6% (n=52,714)
African Group: 84.8% (n=46,313)
Western European & Others: 38.4% (n=31,000)
Latin American & Caribbean: 85.4% (n=28,797)
Eastern European Group: 40.4% (n=22,219)
Accuracy for the Western European & Others Group (WEOG) and the Eastern European Group (EEG) is lower because these groups vote No or Abstain more frequently, and the current model (v0.1) predicts those minority classes poorly. This is a known limitation being addressed with topic-specific voting history and bilateral relation modeling.
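The topic and regional breakdowns each partition the same 181,043 votes, so the sample-weighted average of either breakdown should land close to the overall per-vote accuracy. The sketch below checks that using the figures quoted in this section. The names `Group`, `totalVotes`, and `pooledAccuracy` are illustrative, and small differences are expected because the published percentages are rounded.

```typescript
// Illustrative consistency check; accuracies and sample sizes are copied from this page.
type Group = { name: string; accuracy: number; n: number };

const byTopic: Group[] = [
  { name: "Arms control and disarmament", accuracy: 0.702, n: 47251 },
  { name: "Human rights", accuracy: 0.601, n: 42745 },
  { name: "Economic development", accuracy: 0.706, n: 29877 },
  { name: "Colonialism", accuracy: 0.702, n: 25870 },
  { name: "Palestinian conflict", accuracy: 0.689, n: 21842 },
  { name: "Nuclear weapons and nuclear material", accuracy: 0.768, n: 13458 },
];

const byRegion: Group[] = [
  { name: "Asia-Pacific Group", accuracy: 0.736, n: 52714 },
  { name: "African Group", accuracy: 0.848, n: 46313 },
  { name: "Western European & Others", accuracy: 0.384, n: 31000 },
  { name: "Latin American & Caribbean", accuracy: 0.854, n: 28797 },
  { name: "Eastern European Group", accuracy: 0.404, n: 22219 },
];

// Total number of votes covered by a breakdown.
function totalVotes(groups: Group[]): number {
  return groups.reduce((sum, g) => sum + g.n, 0);
}

// Sample-weighted mean accuracy across the groups.
function pooledAccuracy(groups: Group[]): number {
  const correct = groups.reduce((sum, g) => sum + g.accuracy * g.n, 0);
  return correct / totalVotes(groups);
}

console.log("topics:", totalVotes(byTopic), (100 * pooledAccuracy(byTopic)).toFixed(1));
console.log("regions:", totalVotes(byRegion), (100 * pooledAccuracy(byRegion)).toFixed(1));
```

Because the weights are the group sizes, a large, well-predicted group such as the African Group can offset a poorly predicted one in the headline figure, which is why the per-group numbers are reported separately.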
04 · Under the hood
How the Simulation Engine Works
Position Computation Pipeline
- 1. Resolution Analysis: AI parses the resolution into policy dimensions (sovereignty, human rights, development, security, environment, decolonization) with weighted emphasis.
- 2. Ideal Point Alignment (25%): Compares the country's empirical left-right position (from Voeten ideal point estimates) against the resolution's aggregate position.
- 3. Policy Dimension Matching (30%): Weighted dot product between the country's 6-dimensional policy profile and the resolution's dimensional emphasis. Dimensions with stronger resolution language contribute more.
- 4. Topic Voting History (20%): Historical Yes/No/Abstain rates for the country on the resolution's topic categories (6 Voeten issue areas).
- 5. Bloc Coordination (15%): Two-pass algorithm. First pass computes independent positions; second pass applies peer pressure from bloc partners weighted by bloc cohesion scores.
- 6. Bilateral Relations (10%): Alliance and rivalry modifiers based on voting similarity patterns. (Planned for v0.2; not yet implemented in v0.1.)
- 7. Vote Decision: The composite score is fed through softmax3 to produce a probability distribution [P(Yes), P(No), P(Abstain)]. The Abstain probability is boosted for countries with weak signals or cross-pressures.
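The weighting and the final softmax step can be sketched as follows. This is a minimal illustration under stated assumptions, not the engine's code. The weights come from the list above, but the component score range of [-1, 1], the normalisation in `policyMatch`, the `sharpness` and `abstainBoost` parameters, and the way the composite score is turned into three logits are all hypothetical. Only the name `softmax3` appears in the description.

```typescript
// Hypothetical sketch: score ranges, parameter names, and the logit mapping are
// assumptions for illustration, not the UNSim source.

// Component weights as listed in the pipeline above.
const WEIGHTS = {
  idealPoint: 0.25,
  policyMatch: 0.3,
  topicHistory: 0.2,
  bloc: 0.15,
  bilateral: 0.1, // planned for v0.2; pass 0 until it exists
};

type ComponentName = keyof typeof WEIGHTS;
type Components = Record<ComponentName, number>; // each assumed to lie in [-1, 1]

// Step 3: weighted dot product of a policy profile with the resolution's
// emphasis, normalised by total emphasis (the normalisation is an assumption).
function policyMatch(profile: number[], emphasis: number[]): number {
  const total = emphasis.reduce((a, b) => a + b, 0);
  if (total === 0) return 0;
  return profile.reduce((sum, p, i) => sum + p * emphasis[i], 0) / total;
}

// Weighted sum of the component scores.
function compositeScore(c: Components): number {
  const names = Object.keys(WEIGHTS) as ComponentName[];
  return names.reduce((sum, k) => sum + WEIGHTS[k] * c[k], 0);
}

// Numerically stable softmax over exactly three logits.
function softmax3(logits: [number, number, number]): [number, number, number] {
  const max = Math.max(logits[0], logits[1], logits[2]);
  const e0 = Math.exp(logits[0] - max);
  const e1 = Math.exp(logits[1] - max);
  const e2 = Math.exp(logits[2] - max);
  const total = e0 + e1 + e2;
  return [e0 / total, e1 / total, e2 / total];
}

// Assumed mapping: a positive composite favours Yes, a negative one favours No,
// and a composite near zero (weak signal or cross-pressure) favours Abstain.
function voteProbabilities(c: Components, sharpness = 4, abstainBoost = 0) {
  const s = compositeScore(c);
  const [yes, no, abstain] = softmax3([
    sharpness * s,
    -sharpness * s,
    abstainBoost - sharpness * Math.abs(s),
  ]);
  return { yes, no, abstain };
}

// Usage with made-up inputs for one country and one resolution.
const match = policyMatch([0.8, -0.2, 0.9, 0.1, 0.4, 0.7], [0.1, 0.3, 0.4, 0.05, 0.05, 0.1]);
console.log(
  voteProbabilities({ idealPoint: 0.6, policyMatch: match, topicHistory: 0.8, bloc: 0.5, bilateral: 0 })
);
```

Making the Abstain logit depend on the magnitude of the composite score is one simple way to express "boosted for weak signals". Components that pull in opposite directions cancel in the weighted sum, so cross-pressured countries also end up near zero.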
Data Sources
- Erik Voeten, "United Nations General Assembly Voting Data": Harvard Dataverse, doi:10.7910/DVN/LEJUQZ. 6,202 roll-call votes, 869,937 individual country-votes, 1946–2019. Provides ideal point estimates and per-resolution voting records.
- V-Dem (Varieties of Democracy) v14: Democracy indicators for 202 countries. Used for polyarchy scores, regime classification, and behavioral trait calibration.
- UN Digital Library: Official resolution texts and voting records for recent sessions (post-2019) used in targeted validation.
- Security Council Veto List: Complete veto history since 1946 for P5 behavioral calibration.
05 · Honest limitations
Known Limitations
Current Weaknesses
- No/Abstain prediction is weak (F1 ~14–17%): a minority-class problem
- WEOG countries are poorly predicted: they vote No more often on Global South resolutions
- Static ideal points: position drift over time is not captured
- No bilateral relations model: misses US-Israel alignment, Russia-Syria, etc.
- Resolution language is not analyzed per clause: every resolution in a category gets the same issue vector
- Cannot model last-minute diplomatic pressure or vote trading
Planned Improvements
- Per-resolution text analysis → unique policy vectors
- Temporal ideal point tracking (yearly drift detection)
- Full topic-specific voting history from Voeten data
- Bilateral similarity scores from vote-correlation matrices
- Knowledge graph with treaty obligations as hard constraints
- Clause-level sensitivity analysis (language strength → vote shifts)
06 · Reproducibility
Run It Yourself
# Clone the repo
git clone https://github.com/[your-repo]/unsim-v2
cd unsim-v2

# Build country profiles (193 nations)
npx tsx scripts/build-country-profiles.ts

# Download Voeten/TidyTuesday voting data (870K votes)
mkdir -p data/raw
curl -o data/raw/unvotes.csv https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-23/unvotes.csv
curl -o data/raw/roll_calls.csv https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-23/roll_calls.csv
curl -o data/raw/issues.csv https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-23/issues.csv

# Run large-scale validation (181K predictions)
npx tsx scripts/validate-large-scale.ts

# Run targeted validation (6 recent resolutions, manual comparison)
npx tsx scripts/validate-against-real-votes.ts
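Once the CSV files are downloaded, per-country Yes/No/Abstain rates of the kind used for voting history can be computed with a few lines of TypeScript. The sketch below is illustrative and is not one of the repository's scripts. It assumes `unvotes.csv` has the columns `rcid`, `country`, `country_code`, and `vote`, with `vote` taking the values `yes`, `no`, or `abstain`, so check the file header before relying on it. The parser is deliberately naive and assumes no quoted fields containing commas. A proper CSV library is the safer choice for the real files.

```typescript
// Illustrative sketch; column names are an assumption about unvotes.csv.
type VoteRow = { rcid: string; country: string; vote: string };
type Rates = { yes: number; no: number; abstain: number; total: number };

// Naive CSV parsing: expects a header row and no quoted commas.
function parseRows(csv: string): VoteRow[] {
  const lines = csv.trim().split(/\r?\n/);
  const header = lines[0].split(",");
  const iRcid = header.indexOf("rcid");
  const iCountry = header.indexOf("country");
  const iVote = header.indexOf("vote");
  return lines.slice(1).map((line) => {
    const cells = line.split(",");
    return { rcid: cells[iRcid], country: cells[iCountry], vote: cells[iVote] };
  });
}

// Count votes per country, then convert the counts to rates.
function voteRates(rows: VoteRow[]): Record<string, Rates> {
  const result: Record<string, Rates> = {};
  for (const row of rows) {
    if (!result[row.country]) {
      result[row.country] = { yes: 0, no: 0, abstain: 0, total: 0 };
    }
    const v = row.vote;
    if (v === "yes" || v === "no" || v === "abstain") {
      result[row.country][v] += 1;
      result[row.country].total += 1;
    }
  }
  for (const country of Object.keys(result)) {
    const r = result[country];
    if (r.total > 0) {
      r.yes /= r.total;
      r.no /= r.total;
      r.abstain /= r.total;
    }
  }
  return result;
}

// Usage with a tiny made-up sample in the assumed format.
const sample = [
  "rcid,country,country_code,vote",
  "3,United States,US,yes",
  "3,Canada,CA,no",
  "4,United States,US,abstain",
  "4,Canada,CA,no",
].join("\n");

const rates = voteRates(parseRows(sample));
console.log(rates);
```

Topic-specific rates would additionally need the rows filtered by roll call, for example by joining against `issues.csv` on `rcid`, which again assumes that column exists in both files.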