Bias, in statistics, is a measure of how far an estimator systematically differs from the true value it is estimating. It is not merely a measure of accuracy; it describes systematically “missing” in the same direction.

Consider an example in which you are trying to predict whether a recession will occur in a given year. Since there has been a recession in only 21 of the last 100 years, you always predict “100% NO”. You will be right most of the time (79% of years have no recession); however, this is a biased estimator, since whenever the prediction misses, it misses in the same direction. In contrast, someone who randomly chooses 100% YES 21% of the time and 100% NO 79% of the time is not biased: on average, their predictions match the 21% base rate. If those guesses are unrelated to what actually happens, though, they will be right only about 67% of the time (0.21 × 0.21 + 0.79 × 0.79 ≈ 0.67), which is less often than the always-NO forecaster. In short, bias is not the same thing as accuracy.
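The recession example can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up for this example, and it assumes the random guesser's calls are independent of whether a recession actually occurs. Bias is measured here as the average predicted probability minus the 21% base rate.

```python
# Base rate from the example: 21 recessions in the last 100 years.
base_rate = 0.21


def always_no():
    """Forecaster who always predicts NO (probability of recession = 0)."""
    mean_prediction = 0.0
    accuracy = 1 - base_rate  # correct in every non-recession year
    return mean_prediction, accuracy


def random_guesser(p_yes=base_rate):
    """Forecaster who says YES with probability p_yes, ignoring the data.

    Assumption: the guess is independent of the actual outcome.
    """
    mean_prediction = p_yes
    # Correct when (YES and recession) or (NO and no recession).
    accuracy = p_yes * base_rate + (1 - p_yes) * (1 - base_rate)
    return mean_prediction, accuracy


forecasters = {"always NO": always_no(), "random 21% YES": random_guesser()}
for name, (mean_prediction, accuracy) in forecasters.items():
    bias = mean_prediction - base_rate  # signed: negative means predicting too low
    print(f"{name}: bias = {bias:+.2f}, accuracy = {accuracy:.1%}")
```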

Bias is often discussed in connection with the bias-variance tradeoff in statistical models. Variance in this context refers to how sensitive a model is to small changes in its input.

Suppose you were trying to predict the number of riders using the New York Subway in a given week. A high-variance, low-bias model would be to take the previous week’s number and use it as the estimate for the next week. This model is high-variance because every additional rider in the previous week changes the prediction by one. However, it is unlikely to be very biased, since you are roughly as likely to miss high as to miss low. A low-variance, high-bias model would be to take the average of the previous 100 weeks. In that case, small changes in the input have little effect on the prediction, since one more rider in one week barely moves the 100-week average.

High-variance, low-bias models are often the product of “overfitting”: you take in so many parameters that small changes can have huge effects. Low-variance, high-bias models are often “underfit” and miss key features of the data. Because there is a tradeoff between bias and variance, a good modeler tries to strike a balance: reactive enough to new information without being over-reactive.
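The two subway models can be sketched in a few lines to show the difference in sensitivity. Everything here is hypothetical: the ridership figures are invented (a steady upward trend with no noise), and the function names are chosen only for illustration.

```python
def last_week_model(history):
    """High-variance model: predict next week as a copy of last week."""
    return history[-1]


def average_model(history, window=100):
    """Low-variance model: predict the mean of the last `window` weeks."""
    recent = history[-window:]
    return sum(recent) / len(recent)


# Made-up weekly ridership: 100 weeks, growing by 10,000 riders per week.
history = [30_000_000 + 10_000 * week for week in range(100)]
actual_next = 30_000_000 + 10_000 * 100  # what the trend implies for week 101

# Same history, but with one extra rider in the most recent week.
bumped = history[:-1] + [history[-1] + 1]

for model in (last_week_model, average_model):
    sensitivity = model(bumped) - model(history)  # effect of one extra rider
    miss = actual_next - model(history)           # how far the forecast lags
    print(f"{model.__name__}: change = {sensitivity:.2f}, miss = {miss:,.0f}")
```

In this toy series, one extra rider moves the last-week forecast by a full rider and the 100-week average by about a hundredth of one, while the long average trails the trend by a much wider margin.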
