Last validated: 5/22/2026

Methodology & Validation

UNSim's predictions are validated against 181,043 real recorded votes from the UN General Assembly. Here's how the engine works and how it performs.

01 · Performance at a glance

Validation Results

Per-Vote Accuracy: 68.2% (123,510 / 181,043 correct)
Resolution Outcome: 99.7% (pass/fail prediction)
Resolutions Tested: 1,077 (sessions 60–74, 2005–2019)
Yes-Vote F1: 81.8% (P=83%, R=81%)

Per-Class Performance

Yes: F1 = 81.8% (precision 83.1%, recall 80.6%; TP 117,925, FP 24,001)
No: F1 = 17.2% (precision 16.9%, recall 17.6%; TP 2,487, FP 12,224)
Abstain: F1 = 13.8% (precision 12.7%, recall 15.0%; TP 3,098, FP 21,308)
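
The per-class figures can be cross-checked from the reported counts. The sketch below is not part of the UNSim codebase; the type and function names are illustrative. It derives precision from TP and FP, combines it with the reported recall to get F1 (false negatives are not listed, so recall is taken as given), and sums the true positives to recover per-vote accuracy.

```typescript
type VoteClass = "yes" | "no" | "abstain";

interface ClassStats {
  tp: number;     // true positives, as reported above
  fp: number;     // false positives, as reported above
  recall: number; // reported recall; false negatives are not published
}

const reported: Record<VoteClass, ClassStats> = {
  yes: { tp: 117_925, fp: 24_001, recall: 0.806 },
  no: { tp: 2_487, fp: 12_224, recall: 0.176 },
  abstain: { tp: 3_098, fp: 21_308, recall: 0.15 },
};

// Precision = TP / (TP + FP)
function precision(s: ClassStats): number {
  return s.tp / (s.tp + s.fp);
}

// F1 = harmonic mean of precision and recall
function f1(p: number, r: number): number {
  return p + r === 0 ? 0 : (2 * p * r) / (p + r);
}

const TOTAL_VOTES = 181_043;
// Every correct prediction is a true positive for exactly one class.
const correctVotes = reported.yes.tp + reported.no.tp + reported.abstain.tp;
const accuracy = correctVotes / TOTAL_VOTES;

for (const cls of ["yes", "no", "abstain"] as const) {
  const s = reported[cls];
  const p = precision(s);
  console.log(
    `${cls}: precision ${(100 * p).toFixed(1)}%, F1 ${(100 * f1(p, s.recall)).toFixed(1)}%`,
  );
}
console.log(`accuracy: ${correctVotes} / ${TOTAL_VOTES} = ${(100 * accuracy).toFixed(1)}%`);
```

Run with `npx tsx`, this should agree with the published precision, F1, and accuracy values to one decimal place.
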
02 · Breakdown by topic

Performance by Issue Area

Arms control and disarmament: 70.2% (n=47,251)
Human rights: 60.1% (n=42,745)
Economic development: 70.6% (n=29,877)
Colonialism: 70.2% (n=25,870)
Palestinian conflict: 68.9% (n=21,842)
Nuclear weapons and nuclear material: 76.8% (n=13,458)
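
As a consistency check, the issue-area sample sizes add up to the full 181,043 votes, so pooling the per-area accuracies weighted by n should land close to the overall per-vote accuracy. A minimal sketch of that arithmetic, with illustrative names that are not taken from the UNSim code:

```typescript
interface Slice {
  label: string;
  accuracy: number; // fraction correct within the slice
  n: number;        // number of predictions in the slice
}

const byIssueArea: Slice[] = [
  { label: "Arms control and disarmament", accuracy: 0.702, n: 47_251 },
  { label: "Human rights", accuracy: 0.601, n: 42_745 },
  { label: "Economic development", accuracy: 0.706, n: 29_877 },
  { label: "Colonialism", accuracy: 0.702, n: 25_870 },
  { label: "Palestinian conflict", accuracy: 0.689, n: 21_842 },
  { label: "Nuclear weapons and nuclear material", accuracy: 0.768, n: 13_458 },
];

// Total number of predictions across all slices.
function totalN(slices: Slice[]): number {
  return slices.reduce((sum, s) => sum + s.n, 0);
}

// Sample-size-weighted mean accuracy across slices.
function pooledAccuracy(slices: Slice[]): number {
  const total = totalN(slices);
  if (total === 0) return 0;
  const correct = slices.reduce((sum, s) => sum + s.accuracy * s.n, 0);
  return correct / total;
}

console.log(`total n = ${totalN(byIssueArea)}`);
console.log(`pooled accuracy = ${(100 * pooledAccuracy(byIssueArea)).toFixed(1)}%`);
```

Because the per-area percentages are rounded to one decimal place, the pooled value is only expected to match the headline accuracy approximately.
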
03 · Regional performance

Performance by Regional Group

Asia-Pacific Group: 73.6% (n=52,714)
African Group: 84.8% (n=46,313)
Western European & Others: 38.4% (n=31,000)
Latin American & Caribbean: 85.4% (n=28,797)
Eastern European Group: 40.4% (n=22,219)

Accuracy for the Western European & Others Group (WEOG) and the Eastern European Group (EEG) is lower because these groups vote No or Abstain more frequently, and the current model (v0.1) predicts those minority classes poorly. This is a known limitation; the planned fixes are topic-specific voting history and bilateral relation modeling.

04 · Under the hood

How the Simulation Engine Works

Position Computation Pipeline

  1. Resolution Analysis: AI parses the resolution into policy dimensions (sovereignty, human rights, development, security, environment, decolonization) with weighted emphasis.
  2. Ideal Point Alignment (25%): Compares the country's empirical left-right position (from Voeten ideal point estimates) against the resolution's aggregate position.
  3. Policy Dimension Matching (30%): Weighted dot product between the country's 6-dimensional policy profile and the resolution's dimensional emphasis. Dimensions with stronger resolution language contribute more.
  4. Topic Voting History (20%): Historical Yes/No/Abstain rates for the country on the resolution's topic categories (6 Voeten issue areas).
  5. Bloc Coordination (15%): Two-pass algorithm. The first pass computes independent positions; the second pass applies peer pressure from bloc partners, weighted by bloc cohesion scores.
  6. Bilateral Relations (10%): Alliance and rivalry modifiers based on voting similarity patterns. (Planned for v0.2)
  7. Vote Decision: The composite score is fed through softmax3 to produce a probability distribution [P(Yes), P(No), P(Abstain)]. Abstain probability is boosted for countries with weak signals or cross-pressures.
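
The weighting and decision steps above can be sketched as follows. This is an illustration under stated assumptions, not the engine's actual code: only the component weights and the name `softmax3` come from the text. The signed [-1, 1] score scale, the mapping from composite score to three logits, and the `sharpness` and `abstainBias` parameters are all hypothetical.

```typescript
// Component weights from steps 2-6 of the pipeline; they sum to 1.
const WEIGHTS = {
  idealPoint: 0.25,
  policyMatch: 0.30,
  topicHistory: 0.20,
  blocCoordination: 0.15, // would come from the second pass; here just an input
  bilateral: 0.10,        // planned for v0.2 per the text
} as const;

type Component = keyof typeof WEIGHTS;
// ASSUMPTION: each component is a signed score in [-1, 1]; positive leans Yes.
type ComponentScores = Record<Component, number>;
type Probabilities = { yes: number; no: number; abstain: number };

// Step 3: dot product of the country's profile with the resolution's emphasis
// weights. Normalizing by total emphasis is an assumption.
function policyMatch(profile: number[], emphasis: number[]): number {
  let dot = 0;
  let total = 0;
  for (let i = 0; i < Math.min(profile.length, emphasis.length); i++) {
    dot += profile[i] * emphasis[i];
    total += emphasis[i];
  }
  return total === 0 ? 0 : dot / total;
}

// Steps 2-6: weighted sum of the component scores.
function compositeScore(scores: ComponentScores): number {
  let sum = 0;
  for (const key of Object.keys(WEIGHTS) as Component[]) {
    sum += WEIGHTS[key] * scores[key];
  }
  return sum;
}

// Numerically stable softmax over three logits (signature is a guess).
function softmax3(yes: number, no: number, abstain: number): Probabilities {
  const max = Math.max(yes, no, abstain);
  const ey = Math.exp(yes - max);
  const en = Math.exp(no - max);
  const ea = Math.exp(abstain - max);
  const z = ey + en + ea;
  return { yes: ey / z, no: en / z, abstain: ea / z };
}

// Step 7: hypothetical logit mapping. The abstain logit grows as the composite
// approaches zero, which is also what cancelling cross-pressures produce.
function voteProbabilities(
  scores: ComponentScores,
  sharpness = 4,
  abstainBias = -0.5,
): Probabilities {
  const c = compositeScore(scores);
  const abstainLogit = abstainBias + (1 - Math.abs(c));
  return softmax3(sharpness * c, -sharpness * c, abstainLogit);
}

// Usage with made-up inputs for a single country and resolution.
const example: ComponentScores = {
  idealPoint: 0.6,
  policyMatch: policyMatch([0.8, 0.2, 0.5, -0.1, 0.3, 0.9], [0, 1, 2, 0, 0, 1]),
  topicHistory: 0.8,
  blocCoordination: 0.5,
  bilateral: 0,
};
console.log(voteProbabilities(example));
```

Collapsing everything into one signed composite keeps the sketch short; an implementation that tracks separate Yes, No, and Abstain evidence per component would feed three independent logits into the softmax instead.
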

Data Sources

Erik Voeten, "United Nations General Assembly Voting Data"
Harvard Dataverse, doi:10.7910/DVN/LEJUQZ. 6,202 roll-call votes, 869,937 individual country-votes, 1946–2019. Provides ideal point estimates and per-resolution voting records.
V-Dem (Varieties of Democracy) v14
Democracy indicators for 202 countries. Used for polyarchy scores, regime classification, and behavioral trait calibration.
UN Digital Library
Official resolution texts and voting records for recent sessions (post-2019) used in targeted validation.
Security Council Veto List
Complete veto history since 1946, used to calibrate the behavior of the five permanent Security Council members (P5).
05 · Honest limitations

Known Limitations

Current Weaknesses

  • No/Abstain prediction is weak (F1 ~14–17%): a minority-class problem
  • WEOG countries are poorly predicted: they vote No more often on Global South resolutions
  • Ideal points are static: the model doesn't capture position drift over time
  • No bilateral relations model: misses US-Israel alignment, Russia-Syria, etc.
  • Resolution language is not analyzed per clause: every resolution in a category gets the same issue vector
  • Cannot model last-minute diplomatic pressure or vote trading

Planned Improvements

  • Per-resolution text analysis → unique policy vectors
  • Temporal ideal point tracking (yearly drift detection)
  • Full topic-specific voting history from Voeten data
  • Bilateral similarity scores from vote-correlation matrices
  • Knowledge graph with treaty obligations as hard constraints
  • Clause-level sensitivity analysis (language strength → vote shifts)
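
To make the bilateral similarity item concrete, here is one possible starting point. It is a sketch of a planned feature, not existing UNSim code, and it substitutes a simple agreement rate (share of jointly cast votes that are identical) for a true correlation coefficient. The type names, function names, and country codes are hypothetical.

```typescript
type Vote = "yes" | "no" | "abstain";
// One country's record: roll-call id -> vote cast.
type VoteRecord = Map<number, Vote>;

// Share of roll calls where both countries voted and cast the same vote.
// Returns null when the two countries share no roll calls.
function agreementRate(a: VoteRecord, b: VoteRecord): number | null {
  let shared = 0;
  let agree = 0;
  a.forEach((voteA, rcid) => {
    const voteB = b.get(rcid);
    if (voteB === undefined) return; // b did not vote on this roll call
    shared += 1;
    if (voteA === voteB) agree += 1;
  });
  return shared === 0 ? null : agree / shared;
}

// Pairwise agreement for every pair of countries (symmetric, diagonal = 1
// for any country with at least one recorded vote).
function similarityMatrix(
  records: Map<string, VoteRecord>,
): Map<string, Map<string, number | null>> {
  const matrix = new Map<string, Map<string, number | null>>();
  records.forEach((recordA, countryA) => {
    const row = new Map<string, number | null>();
    records.forEach((recordB, countryB) => {
      row.set(countryB, agreementRate(recordA, recordB));
    });
    matrix.set(countryA, row);
  });
  return matrix;
}

// Usage with made-up votes for two placeholder countries.
const demo = new Map<string, VoteRecord>([
  ["AAA", new Map<number, Vote>([[1, "yes"], [2, "no"], [3, "abstain"]])],
  ["BBB", new Map<number, Vote>([[1, "yes"], [2, "yes"], [4, "no"]])],
]);
console.log(similarityMatrix(demo).get("AAA")?.get("BBB"));
```

Agreement rate treats Abstain as a distinct position; a variant that gives partial credit for Yes/Abstain or No/Abstain pairs may better reflect how close two countries actually are.
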
06 · Reproducibility

Run It Yourself

# Clone the repo
git clone https://github.com/[your-repo]/unsim-v2
cd unsim-v2

# Build country profiles (193 nations)
npx tsx scripts/build-country-profiles.ts

# Download Voeten/TidyTuesday voting data (870K votes)
mkdir -p data/raw
curl -o data/raw/unvotes.csv https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-23/unvotes.csv
curl -o data/raw/roll_calls.csv https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-23/roll_calls.csv
curl -o data/raw/issues.csv https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-23/issues.csv

# Run large-scale validation (181K predictions)
npx tsx scripts/validate-large-scale.ts

# Run targeted validation (6 recent resolutions, manual comparison)
npx tsx scripts/validate-against-real-votes.ts
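
Before running the validation scripts, it can help to sanity-check the downloaded vote file. The sketch below is not one of the repository's scripts. It assumes `unvotes.csv` has a header row whose final column is `vote` (the function throws if that assumption fails) and reads the last field of each row, so commas inside quoted country names do not matter.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Count rows per vote value. ASSUMPTION: the last CSV column is named "vote".
function countVotes(csv: string): Record<string, number> {
  const lines = csv.split(/\r?\n/).filter((line) => line.length > 0);
  if (lines.length === 0) return {};
  const header = lines[0].split(",");
  const lastColumn = header[header.length - 1].trim().replace(/^"|"$/g, "");
  if (lastColumn !== "vote") {
    throw new Error(`expected last column "vote", found "${lastColumn}"`);
  }
  const counts: Record<string, number> = {};
  for (const line of lines.slice(1)) {
    // Take the text after the final comma and strip optional quotes.
    const vote = line
      .slice(line.lastIndexOf(",") + 1)
      .trim()
      .replace(/^"|"$/g, "");
    counts[vote] = (counts[vote] ?? 0) + 1;
  }
  return counts;
}

// Only runs if the file from the download step is present.
const votesPath = "data/raw/unvotes.csv";
if (existsSync(votesPath)) {
  const counts = countVotes(readFileSync(votesPath, "utf8"));
  const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
  console.log(`rows: ${total}`, counts);
}
```

The printed row count can be compared against the roughly 870K country-votes mentioned in the download step, and the class shares give a quick view of how imbalanced Yes, No, and Abstain are in the raw data.
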