Citation: Lyle Ungar, Barbara Mellers, Ville Satopää, Philip Tetlock, Jon Baron (2012) The Good Judgment Project: A Large Scale Test of Different Methods of Combining Expert Predictions. 2012 AAAI Fall Symposium Series
Internet Archive Scholar (search for fulltext): The Good Judgment Project: A Large Scale Test of Different Methods of Combining Expert Predictions
Download: https://www.aaai.org/ocs/index.php/FSS/FSS12/paper/view/5570/5871
Tagged: prediction

Summary

A study designed to answer three questions about how experts can best estimate event probabilities:

Alone, in prediction markets, or in teams? (Discussion among experts might improve accuracy, or harm it through groupthink. Prediction markets are zero-sum and thus discourage sharing information beyond the price, but the market price itself serves as a consensus. Organizations form teams in the belief that a team estimate will be more accurate.)
With or without training? (Even people with statistics degrees have been shown to follow incorrect heuristics.)
Which formula to use when combining individual estimates? (Many studies show that a uniform average of forecasts is hard to beat.)

About 2,000 forecasters were presented with dozens of possible events and scored on how closely their estimates, averaged over all days each question was open, matched the actual outcomes. Scores are reported as Brier scores (the sum of squared differences between forecast probabilities and outcomes).
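The scoring described above can be sketched as follows. This is a minimal illustration, assuming Brier's original multi-category form (for a yes/no question the squared error is summed over both the "yes" and "no" categories, so scores range from 0 to 2) and a simple mean over the days a question was open. The function names are hypothetical, not taken from the paper.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcome: int) -> float:
    """Brier score for one forecast over mutually exclusive categories.

    probs[i] is the forecast probability of category i; outcome is the index
    of the category that actually occurred. Lower is better (0 = perfect).
    """
    return sum((p - (1.0 if i == outcome else 0.0)) ** 2 for i, p in enumerate(probs))


def daily_averaged_brier(daily_yes_probs: Sequence[float], happened: bool) -> float:
    """Average a yes/no forecaster's Brier score over every day the question was open."""
    outcome = 0 if happened else 1  # category 0 = "yes", category 1 = "no"
    scores = [brier_score([p, 1.0 - p], outcome) for p in daily_yes_probs]
    return sum(scores) / len(scores)


# Hypothetical forecaster: said 60%, then 70%, then 90% on an event that occurred.
print(round(daily_averaged_brier([0.6, 0.7, 0.9], happened=True), 4))  # 0.1733
```

Under this definition a forecaster who always says 0.5 scores 0.5, which is a useful reference point when reading the scores listed below.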

Aggregation methods attempted included:

Weighting forecasters based on forecaster attributes
Weighting forecasts by recency
Transforming forecasts away from 0.5, toward more extreme values
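The three aggregation ideas above can be combined in one function. The sketch below is an illustration under stated assumptions, not the authors' fitted model: each forecast's weight is a hypothetical per-forecaster skill weight multiplied by an exponential decay in the forecast's age, and the weighted mean is then pushed away from 0.5 with the transform t(p) = p^a / (p^a + (1-p)^a), one common extremizing choice assumed here. The half-life and exponent `a` defaults are arbitrary placeholders.

```python
import math
from typing import Sequence


def extremize(p: float, a: float) -> float:
    """Push a probability away from 0.5: t(p) = p^a / (p^a + (1-p)^a).

    a > 1 makes forecasts more extreme; a = 1 leaves them unchanged.
    """
    return p ** a / (p ** a + (1.0 - p) ** a)


def aggregate(forecasts: Sequence[float], ages_days: Sequence[float],
              skill_weights: Sequence[float], half_life_days: float = 3.0,
              a: float = 2.0) -> float:
    """Weighted mean of forecasts, then extremized (hypothetical parameters).

    Each weight = forecaster skill weight * exponential recency decay,
    so a forecast half_life_days old counts half as much as a fresh one.
    """
    decay = math.log(2) / half_life_days
    weights = [w * math.exp(-decay * age) for w, age in zip(skill_weights, ages_days)]
    mean = sum(w * p for w, p in zip(weights, forecasts)) / sum(weights)
    return extremize(mean, a)


# Three hypothetical forecasts: fresh, one day old, and a week old.
print(aggregate([0.7, 0.6, 0.8], ages_days=[0, 1, 7], skill_weights=[1.0, 0.5, 1.0]))
```

With `a = 1` the function reduces to a recency- and skill-weighted average; raising `a` trades calibration of individual forecasts for a more decisive group estimate.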

Results:

Training in probability and scenario analysis was beneficial
Letting forecasters see each other's forecasts and explanations was beneficial
Pushing forecasts away from 0.5 was the most beneficial

Regarding the last result, the authors distinguish irreducible uncertainty (shared by the whole group) from personal uncertainty (individual ignorance). Because of personal uncertainty, aggregates of individual forecasts tend toward 0.5. Methods for accounting for this:

Ask forecasters how uncertain they are, and use the answer in weighting
Transform each individual forecast away from 0.5 before aggregation
Use the median of forecasts rather than the mean
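A toy example shows why the last two methods help. The pool below is hypothetical: three informed forecasters lean "yes" while two uninformed ones sit at 0.5 out of personal uncertainty. The plain mean is dragged toward 0.5, while the median and the mean of individually extremized forecasts stay closer to the informed view. The extremizing transform and the exponent `a = 2.0` are assumptions for illustration, not values from the paper.

```python
import statistics
from typing import Sequence


def extremize(p: float, a: float) -> float:
    """Push p away from 0.5 when a > 1: t(p) = p^a / (p^a + (1-p)^a)."""
    return p ** a / (p ** a + (1.0 - p) ** a)


def mean_of_transformed(forecasts: Sequence[float], a: float) -> float:
    """Transform each individual forecast away from 0.5, then take the plain mean."""
    return statistics.mean([extremize(p, a) for p in forecasts])


pool = [0.9, 0.85, 0.8, 0.5, 0.5]  # hypothetical forecasts for one question

print(statistics.mean(pool))            # pulled toward 0.5 by the two uncertain forecasters
print(statistics.median(pool))          # 0.8
print(mean_of_transformed(pool, a=2.0)) # above the plain mean
```

The uncertain forecasts at 0.5 are unchanged by the transform, so transforming before averaging only partly offsets their pull; the median ignores them entirely as long as they are a minority.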

Brier scores (lower is better; approximate values read from a chart in the paper):

0.36: pool of less skilled or less engaged forecasters, uniformly averaged
0.34: pool of better forecasters, uniformly averaged
0.26: teams
0.25: prediction market
0.23: teams, with weighting, exponential decay, and transformation away from 0.5

Conclusions:

Working in groups greatly improves accuracy
Transformation of weighted averages away from 0.5 improves accuracy
