Last validated: 5/22/2026

Methodology & Validation

UNSim's predictions are validated against 181,043 real recorded votes from the UN General Assembly. Here's how the engine works and how it performs.

01 · Performance at a glance

Validation Results

Per-Vote Accuracy

68.2%

123,510 / 181,043 correct

Resolution Outcome

99.7%

Pass/fail prediction

Resolutions Tested

1,077

Sessions 60–74 (2005–2019)

Yes-Vote F1

81.8%

P=83% R=81%

Per-Class Performance

Yes: F1 = 81.8%

Precision: 83.1%

Recall: 80.6%

TP: 117,925 | FP: 24,001

No: F1 = 17.2%

Precision: 16.9%

Recall: 17.6%

TP: 2,487 | FP: 12,224

Abstain: F1 = 13.8%

Precision: 12.7%

Recall: 15.0%

TP: 3,098 | FP: 21,308
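The headline figures can be re-derived from the counts above as a consistency check. The following is a minimal sketch, not code from the UNSim repository: the type and function names are illustrative assumptions, and because the page reports no false-negative counts, recall is taken as published rather than recomputed.

```typescript
// Figures copied from the Per-Class Performance section; names are illustrative.
type ClassStats = { label: string; tp: number; fp: number; recall: number };

const perClass: ClassStats[] = [
  { label: "yes", tp: 117925, fp: 24001, recall: 0.806 },
  { label: "no", tp: 2487, fp: 12224, recall: 0.176 },
  { label: "abstain", tp: 3098, fp: 21308, recall: 0.15 },
];
const totalVotes = 181043;

// Precision: share of the predictions for a class that were correct.
function precision(s: ClassStats): number {
  const predicted = s.tp + s.fp;
  return predicted === 0 ? 0 : s.tp / predicted;
}

// F1: harmonic mean of precision and recall.
function f1(p: number, r: number): number {
  return p + r === 0 ? 0 : (2 * p * r) / (p + r);
}

// Every correct prediction is a true positive for exactly one class.
const correctVotes = perClass.reduce((sum, s) => sum + s.tp, 0);
const accuracy = correctVotes / totalVotes;

for (const s of perClass) {
  const p = precision(s);
  console.log(`${s.label}: precision ${(100 * p).toFixed(1)}%, F1 ${(100 * f1(p, s.recall)).toFixed(1)}%`);
}
console.log(`accuracy: ${(100 * accuracy).toFixed(1)}% (${correctVotes} / ${totalVotes})`);
```

The three true-positive counts sum to the 123,510 correct votes reported under Per-Vote Accuracy, and the precision and F1 values agree with the published ones to one decimal place.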

02 · Breakdown by topic

Performance by Issue Area

Arms control and disarmament

70.2% (n=47,251)

Human rights

60.1% (n=42,745)

Economic development

70.6% (n=29,877)

Colonialism

70.2% (n=25,870)

Palestinian conflict

68.9% (n=21,842)

Nuclear weapons and nuclear material

76.8% (n=13,458)

03 · Regional performance

Performance by Regional Group

Asia-Pacific Group

73.6% (n=52,714)

African Group

84.8% (n=46,313)

Western European & Others

38.4% (n=31,000)

Latin American & Caribbean

85.4% (n=28,797)

Eastern European Group

40.4% (n=22,219)

Accuracy for the Western European & Others Group (WEOG) and the Eastern European Group (EEG) is lower because these groups vote No or Abstain more frequently, and the current model (v0.1) predicts those minority classes poorly. This is a known limitation; topic-specific voting history and bilateral relation modeling are planned to address it.
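To see how much the two weaker groups pull down the overall number, the regional accuracies can be combined into a vote-weighted average. This is a small sketch using only the figures listed above; the names `GroupResult` and `weightedAccuracy` are illustrative and not taken from the UNSim codebase.

```typescript
// Accuracy and sample size per regional group, copied from the list above.
type GroupResult = { group: string; accuracy: number; n: number };

const regional: GroupResult[] = [
  { group: "Asia-Pacific Group", accuracy: 0.736, n: 52714 },
  { group: "African Group", accuracy: 0.848, n: 46313 },
  { group: "Western European & Others", accuracy: 0.384, n: 31000 },
  { group: "Latin American & Caribbean", accuracy: 0.854, n: 28797 },
  { group: "Eastern European Group", accuracy: 0.404, n: 22219 },
];

// Mean accuracy weighted by the number of votes in each group.
function weightedAccuracy(rows: GroupResult[]): number {
  const total = rows.reduce((sum, r) => sum + r.n, 0);
  const correct = rows.reduce((sum, r) => sum + r.accuracy * r.n, 0);
  return total === 0 ? 0 : correct / total;
}

const lowGroups = ["Western European & Others", "Eastern European Group"];
const overall = weightedAccuracy(regional);
const withoutLowGroups = weightedAccuracy(
  regional.filter((r) => lowGroups.indexOf(r.group) === -1),
);

console.log(`all groups: ${(100 * overall).toFixed(1)}%`);
console.log(`excluding WEOG and EEG: ${(100 * withoutLowGroups).toFixed(1)}%`);
```

On these rounded inputs the weighted average over all five groups comes back to about 68.2%, matching the per-vote accuracy, while the three remaining groups on their own average roughly 80%.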

04 · Under the hood

How the Simulation Engine Works

Position Computation Pipeline

1. Resolution Analysis: AI parses the resolution into six policy dimensions (sovereignty, human rights, development, security, environment, decolonization), each with a weighted emphasis.
2. Ideal Point Alignment (25%): Compares the country's empirical left-right position (from Voeten ideal point estimates) against the resolution's aggregate position.
3. Policy Dimension Matching (30%): A weighted dot product between the country's 6-dimensional policy profile and the resolution's dimensional emphasis. Dimensions with stronger resolution language contribute more.
4. Topic Voting History (20%): The country's historical Yes/No/Abstain rates on the resolution's topic categories (the 6 Voeten issue areas).
5. Bloc Coordination (15%): A two-pass algorithm. The first pass computes independent positions; the second pass applies peer pressure from bloc partners, weighted by bloc cohesion scores.
6. Bilateral Relations (10%): Alliance and rivalry modifiers based on voting similarity patterns. (Planned for v0.2; not part of the current v0.1 model.)
7. Vote Decision: The composite score is fed through softmax3 to produce a probability distribution [P(Yes), P(No), P(Abstain)]. The Abstain probability is boosted for countries with weak signals or cross-pressures.
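The steps above can be sketched roughly as follows. Only the component weights come from the pipeline description. The function names, the [-1, 1] scaling of each signal, the way bloc pressure is added, the `temperature` and `abstainBoost` parameters, and the mapping from composite score to three logits are all assumptions made for illustration, and `softmax3` here is a generic three-way softmax rather than UNSim's actual function.

```typescript
// Weights from the pipeline description. Bilateral relations (10%) are
// planned for v0.2 and are left out of this sketch.
const WEIGHTS = { idealPoint: 0.25, policy: 0.3, topic: 0.2, bloc: 0.15 };

type Probabilities = { yes: number; no: number; abstain: number };

// Step 3 (assumed form): dot product of the country profile with the
// resolution's emphasis, normalized by the total emphasis.
function policyMatch(profile: number[], emphasis: number[]): number {
  const totalEmphasis = emphasis.reduce((sum, e) => sum + Math.abs(e), 0);
  if (totalEmphasis === 0) return 0;
  const dot = profile.reduce((sum, p, i) => sum + p * (emphasis[i] ?? 0), 0);
  return dot / totalEmphasis;
}

// First pass: combine the independent signals, each assumed to lie in [-1, 1].
function independentScore(idealPoint: number, policy: number, topic: number): number {
  return WEIGHTS.idealPoint * idealPoint + WEIGHTS.policy * policy + WEIGHTS.topic * topic;
}

// Second pass (assumed form): add the bloc partners' mean first-pass score,
// scaled by the bloc weight and a cohesion score in [0, 1].
function applyBlocPressure(own: number, peerScores: number[], cohesion: number): number {
  if (peerScores.length === 0) return own;
  const peerMean = peerScores.reduce((sum, s) => sum + s, 0) / peerScores.length;
  return own + WEIGHTS.bloc * cohesion * peerMean;
}

// Numerically stable three-way softmax.
function softmax3(logits: [number, number, number]): [number, number, number] {
  const [a, b, c] = logits;
  const max = Math.max(a, b, c);
  const ea = Math.exp(a - max);
  const eb = Math.exp(b - max);
  const ec = Math.exp(c - max);
  const total = ea + eb + ec;
  return [ea / total, eb / total, ec / total];
}

// Step 7 (assumed mapping): Yes rises with the score, No falls with it, and
// Abstain gets a boost that matters most when the score is near zero.
function voteProbabilities(score: number, temperature = 0.25, abstainBoost = 0.5): Probabilities {
  const s = score / temperature;
  const [yes, no, abstain] = softmax3([s, -s, abstainBoost - Math.abs(s)]);
  return { yes, no, abstain };
}

// Example with made-up inputs for one country and one resolution.
const policy = policyMatch([1, 0.5, 0, 0, 0, 0], [2, 1, 0, 0, 0, 1]);
const firstPass = independentScore(0.6, policy, 0.4);
const finalScore = applyBlocPressure(firstPass, [0.5, 0.3], 0.8);
console.log(voteProbabilities(finalScore));
```

Under these assumptions, a score near zero (a weak or cross-pressured signal) makes Abstain the most likely outcome, while a clearly positive or negative score shifts the probability mass toward Yes or No.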

Data Sources

Erik Voeten, "United Nations General Assembly Voting Data": Harvard Dataverse, doi:10.7910/DVN/LEJUQZ. 6,202 roll-call votes, 869,937 individual country-votes, 1946–2019. Provides ideal point estimates and per-resolution voting records.
V-Dem (Varieties of Democracy) v14: Democracy indicators for 202 countries. Used for polyarchy scores, regime classification, and behavioral trait calibration.
UN Digital Library: Official resolution texts and voting records for recent sessions (post-2019) used in targeted validation.
Security Council Veto List: Complete veto history since 1946 for P5 behavioral calibration.

05 · Honest limitations

Known Limitations

Current Weaknesses

No/Abstain prediction is weak (F1 ~14–17%): a minority-class problem
WEOG countries are poorly predicted: they vote No more often on Global South resolutions
Static ideal points: the model doesn't capture position drift over time
No bilateral relations model: misses US-Israel alignment, Russia-Syria, etc.
Resolution language is not analyzed per clause: every resolution in a category gets the same issue vector
Cannot model last-minute diplomatic pressure or vote trading

Planned Improvements

Per-resolution text analysis → unique policy vectors
Temporal ideal point tracking (yearly drift detection)
Full topic-specific voting history from Voeten data
Bilateral similarity scores from vote-correlation matrices
Knowledge graph with treaty obligations as hard constraints
Clause-level sensitivity analysis (language strength → vote shifts)
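As an illustration of the bilateral similarity item, one entry of a vote-correlation matrix could be computed as a Pearson correlation over the roll calls where both countries voted. This is a hypothetical sketch, not the planned UNSim implementation: the numeric vote coding (Yes = 1, Abstain = 0, No = -1), the `VoteRecord` shape, and the function name are all assumptions.

```typescript
type Vote = "yes" | "no" | "abstain";

// Assumed numeric coding; the source does not specify one.
const VOTE_CODE: Record<Vote, number> = { yes: 1, abstain: 0, no: -1 };

// One country's votes keyed by roll-call id (hypothetical shape).
type VoteRecord = { [rcid: number]: Vote | undefined };

// Pearson correlation over the roll calls where both countries voted.
function voteCorrelation(a: VoteRecord, b: VoteRecord): number {
  const xs: number[] = [];
  const ys: number[] = [];
  for (const key of Object.keys(a)) {
    const va = a[Number(key)];
    const vb = b[Number(key)];
    if (va !== undefined && vb !== undefined) {
      xs.push(VOTE_CODE[va]);
      ys.push(VOTE_CODE[vb]);
    }
  }
  const n = xs.length;
  if (n < 2) return 0; // too little overlap to estimate anything
  const meanX = xs.reduce((s, x) => s + x, 0) / n;
  const meanY = ys.reduce((s, y) => s + y, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  // A country that always votes the same way has zero variance; treat as no signal.
  if (varX === 0 || varY === 0) return 0;
  return cov / Math.sqrt(varX * varY);
}

// Example with made-up votes; roll calls 4 and 5 are ignored (no overlap).
const countryA: VoteRecord = { 1: "yes", 2: "no", 3: "abstain", 4: "yes" };
const countryB: VoteRecord = { 1: "yes", 2: "abstain", 3: "abstain", 5: "no" };
console.log(voteCorrelation(countryA, countryB).toFixed(2));
```

Returning 0 for zero-variance or low-overlap pairs is a design choice in this sketch; a real matrix might instead mark such pairs as missing so they are not mistaken for genuinely uncorrelated countries.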

06 · Reproducibility

Run It Yourself

# Clone the repo
git clone https://github.com/[your-repo]/unsim-v2
cd unsim-v2

# Build country profiles (193 nations)
npx tsx scripts/build-country-profiles.ts

# Download Voeten/TidyTuesday voting data (870K votes)
# -f fails on HTTP errors instead of saving an error page; -L follows redirects
mkdir -p data/raw
curl -fL -o data/raw/unvotes.csv https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-23/unvotes.csv
curl -fL -o data/raw/roll_calls.csv https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-23/roll_calls.csv
curl -fL -o data/raw/issues.csv https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-23/issues.csv

# Run large-scale validation (181K predictions)
npx tsx scripts/validate-large-scale.ts

# Run targeted validation (6 recent resolutions, manual comparison)
npx tsx scripts/validate-against-real-votes.ts