The Brier Score is a widely used way of “scoring” predictive accuracy. For each prediction, take the difference between the predicted probability and the actual outcome (1 if the event occurred, 0 if it did not) and square it. The Brier Score is the sum of these squared differences divided by the number of predictions. The more accurate the predictor, the lower the Brier Score.
Consider two predictors trying to predict the following outcomes: whether there will be a recession, whether the Boston Celtics will win the NBA Championship, and whether the Republicans will control the Senate following the upcoming election. Predictor A gives the recession a 20% chance, the Celtics a 10% chance and the Republicans a 70% chance. Suppose none of the three events occur. In that case, the “actual outcome” is 0 for each event. So Predictor A’s Brier Score is (⅓)*((.2-0)^2 + (.1-0)^2 + (.7-0)^2) = (⅓)*(.04 + .01 + .49) = (⅓)*.54 = .18. Suppose Predictor B gives the recession a 30% chance, the Celtics a 20% chance and the Republicans a 50% chance. Predictor B’s Brier Score is thus (⅓)*((.3-0)^2 + (.2-0)^2 + (.5-0)^2) = (⅓)*(.09 + .04 + .25) = (⅓)*.38 ≈ .127. Even though in “absolute percentages” both were off by a total of 100 points, Predictor B’s Brier Score is lower, because squaring penalizes Predictor A’s single large miss (70%) more heavily than Predictor B’s more evenly spread misses.
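The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `brier_score` and the variable names are illustrative choices, not part of any particular library.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    squared_errors = [(f - o) ** 2 for f, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Recession, Celtics championship, Republican Senate: none of them occur.
outcomes = [0, 0, 0]
predictor_a = [0.20, 0.10, 0.70]
predictor_b = [0.30, 0.20, 0.50]

score_a = brier_score(predictor_a, outcomes)
score_b = brier_score(predictor_b, outcomes)

print(round(score_a, 3))  # 0.18
print(round(score_b, 3))  # 0.127
```

Lower is better, so Predictor B comes out ahead on this set of outcomes.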
The Brier Score’s appeal is that it is incentive-compatible. Incentive compatibility is a term from mechanism design meaning that the system (the “mechanism”) incentivizes the desired behavior. In this case, it means that the predictor’s optimal behavior is to input their true estimates instead of trying to game the system.
Consider an alternative system, where the score is just “on average, how much did you miss by”. In the example above, Predictor A’s score would be (⅓)*(.2+.1+.7) = .33 and Predictor B’s score would be (⅓)*(.3+.2+.5) = .33. This system is not incentive-compatible. Suppose you are trying to predict whether the Boston Celtics will beat the Denver Nuggets in their upcoming game, and you believe that the true probability is 60%. In an optimally designed system, you would input 60% as your prediction. In the simple system above, if you input 60%, your score is 0.4 if the Celtics win and 0.6 if they lose. Since they win 60% of the time, your expected score is 0.6*0.4 + 0.4*0.6 = 0.48. Now suppose you just input 100%. If they win, your score is 0. If they lose, your score is 1. Since they lose 40% of the time, your expected score is 0.4, which is lower (better) than 0.48.
As a result, under the simple “average miss” system, the optimal strategy is not to input your true belief, but instead to predict 100% any time you think the probability is greater than 50%, and 0% any time you think it is less than 50% (if it’s a true 50/50, you are indifferent between any probability). That is clearly undesirable for price discovery, since the inputs no longer reveal what predictors actually believe.
By contrast, the “squaring” component of the Brier Score makes the system incentive-compatible. In the Celtics example, your expected Brier Score from predicting 60% is 0.6*(0.4^2) + 0.4*(0.6^2) = 0.24. If you predict 100%, your expected Brier Score is 0.6*(0^2) + 0.4*(1^2) = 0.4. As a result, it’s better to input your true belief!
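The comparison between the two scoring rules can be checked directly. The sketch below assumes a single yes/no event and a predictor whose true belief is 60%; the helper names (`expected_score`, `absolute_miss`, `squared_miss`) are hypothetical and chosen only for this illustration.

```python
def absolute_miss(report, outcome):
    """The simple 'how much did you miss by' penalty."""
    return abs(report - outcome)


def squared_miss(report, outcome):
    """The Brier-style squared penalty."""
    return (report - outcome) ** 2


def expected_score(belief, report, penalty):
    """Expected penalty if the event happens with probability `belief`."""
    return belief * penalty(report, 1) + (1 - belief) * penalty(report, 0)


belief = 0.6  # your true probability that the Celtics win
for report in (0.6, 1.0):
    simple = expected_score(belief, report, absolute_miss)
    brier = expected_score(belief, report, squared_miss)
    print(f"report={report:.1f}  average-miss={simple:.2f}  brier={brier:.2f}")
```

Under the average-miss rule the exaggerated report of 100% has the lower expected penalty, while under the squared rule the honest report of 60% does.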
For the mathematically inclined, it may be interesting to see the proof that the Brier Score is incentive-compatible.
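Let $p_{\text{belief}}$ be your belief about the probability that the event occurs.
Let $p_{\text{inputted}}$ be the probability you input.
In an ideal system, we want $p_{\text{inputted}} = p_{\text{belief}}$.
The predictor's score for a single market, which they want to minimize, is:
$$(\text{outcome} - p_{\text{inputted}})^2$$
In your estimation, the outcome is 1 with probability $p_{\text{belief}}$ and 0 with probability $1 - p_{\text{belief}}$ (the probability the event does not occur), so the expected score is:
$$p_{\text{belief}}\,(1 - p_{\text{inputted}})^2 + (1 - p_{\text{belief}})\,(0 - p_{\text{inputted}})^2$$
We can simplify the above to:
$$p_{\text{belief}}\,(1 - p_{\text{inputted}})^2 + (1 - p_{\text{belief}})\,p_{\text{inputted}}^2$$
Expanding the first square gives:
$$p_{\text{belief}}\,(p_{\text{inputted}}^2 - 2p_{\text{inputted}} + 1) + p_{\text{inputted}}^2\,(1 - p_{\text{belief}})$$
Since $p_{\text{inputted}}$ is the choice variable, we differentiate with respect to $p_{\text{inputted}}$ and set the result equal to zero:
$$2p_{\text{belief}}\,p_{\text{inputted}} - 2p_{\text{belief}} + 2p_{\text{inputted}} - 2p_{\text{inputted}}\,p_{\text{belief}} = 0$$
The first and fourth terms cancel out, so we get:
$$-2p_{\text{belief}} + 2p_{\text{inputted}} = 0$$
Rearranging gives $2p_{\text{inputted}} = 2p_{\text{belief}}$. Dividing by 2, we get $p_{\text{inputted}} = p_{\text{belief}}$. The second derivative with respect to $p_{\text{inputted}}$ is $2 > 0$, so this point is a minimum of the expected score.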
As a result, the Brier score is incentive-compatible! The optimal choice is to input your true probabilities. This is desirable for predictors since it means that they do not have to think about how to optimize for the system and can focus on their main activity of accurately predicting.
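The result of the proof can also be sanity-checked numerically. The sketch below is not a substitute for the derivation: it simply searches a grid of candidate inputs (in steps of 0.01, an arbitrary choice) and reports which one gives the lowest expected Brier score for a few example beliefs. The function names are hypothetical.

```python
def expected_brier(belief, report):
    """Expected squared penalty when the event occurs with probability `belief`."""
    return belief * (1 - report) ** 2 + (1 - belief) * report ** 2


def best_report(belief, steps=100):
    """Grid-search the input in [0, 1] that minimizes the expected Brier score."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda report: expected_brier(belief, report))


for belief in (0.1, 0.5, 0.6, 0.9):
    print(f"belief={belief:.2f}  best input={best_report(belief):.2f}")
```

For each belief tried, the best input on the grid coincides with the belief itself, matching the proof.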
One common criticism of the Brier score is that it is difficult to compare when people are predicting different outcomes. One could easily game the Brier score by predicting hundreds of all-but-certain events (e.g., predicting 100% on “will the sun come up tomorrow”) and getting an ultra-low Brier score compared to a super-forecaster who predicts much more uncertain events. As a result, a Brier score is best when comparing two forecasters who are making predictions on the same set of outcomes.
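To see why question difficulty matters, consider a forecaster who always inputs the true probability $p$. Plugging $p_{\text{inputted}} = p_{\text{belief}} = p$ into the expected score from the proof gives $p(1-p)^2 + (1-p)p^2 = p(1-p)$, which depends only on how uncertain the event is. The sketch below evaluates this for a few illustrative probabilities (the values are assumptions chosen for the example, not data).

```python
def calibrated_expected_brier(p):
    """Expected Brier score when the input equals the true probability p.

    Algebraically this simplifies to p * (1 - p).
    """
    return p * (1 - p) ** 2 + (1 - p) * p ** 2


# Near-certain events versus a genuine coin flip.
for p in (0.999, 0.9, 0.5):
    print(f"true probability={p}  expected Brier score={calibrated_expected_brier(p):.4f}")
```

Even a perfectly honest and accurate forecaster expects a score of 0.25 on coin-flip questions but close to 0 on near-certain ones, which is why scores from different question sets are hard to compare.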