7 Oct 2016
The Signal and the Noise, by Nate Silver, is a book about statistical predictions and forecasts. The primary question of the book is: how do we make predictions that capture more signal and less noise in a data set? This theme is similar to N. N. Taleb’s project in Anti-Fragile: how do we make decisions in a world we can’t fully understand? Silver’s answer to these questions involves a combination of empirical observation, statistical analysis appropriate to the data and question under review, and a prediction about future events that is as precise and accurate as possible. Silver recommends avoiding three mistakes: overconfidence, over-fitting one’s model to the data set, and making too vague a prediction because of insufficient analysis.
One way to make better predictions is to use Bayes’s Theorem:
f(x’) = xy / (xy + z(1 - x))
This theorem gives the posterior probability, f(x’): the probability that a hypothesis is true after we’ve considered some new evidence. x is the initial estimate (the prior) that the hypothesis is true, before considering the evidence. y is the probability of the evidence appearing if the hypothesis is true. z is the probability of the evidence appearing if the hypothesis is false.
For example, if you find a pair of underwear in your dresser drawer that does not belong to you or your spouse, and you suspect your spouse of cheating on you, one way you could use Bayes’s Theorem to produce a probabilistic prediction about the likelihood of your spouse cheating on you is as follows:
x, the initial estimate that your spouse is cheating — 4% (which is the national average of men cheating on their wives)
y, the probability of underwear appearing if he is cheating — 50% (essentially random, even odds)
z, the probability of underwear appearing if he is not cheating on you — 5% (is there some other, innocent explanation for the underwear’s appearance?)
Using Bayes’s Theorem, our prediction that our spouse is cheating on us goes from 4% to 29%:
f(x’) = (.04 × .5) / ((.04 × .5) + .05 × (1 - .04))
f(x’) = .02 / (.02 + .048)
f(x’) ≈ .2941
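The same arithmetic can be sketched in a few lines of Python. This is only an illustration, not code from Silver's book; the function name posterior and its argument names are assumptions made up for this example.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes's Theorem in the x, y, z form: xy / (xy + z(1 - x)).

    prior               -> x, initial estimate that the hypothesis is true
    p_evidence_if_true  -> y, chance of seeing the evidence if it is true
    p_evidence_if_false -> z, chance of seeing the evidence if it is false
    """
    numerator = prior * p_evidence_if_true
    denominator = numerator + p_evidence_if_false * (1 - prior)
    return numerator / denominator


# The underwear example: x = 4%, y = 50%, z = 5%.
cheating = posterior(prior=0.04, p_evidence_if_true=0.5, p_evidence_if_false=0.05)
print(f"{cheating:.4f}")  # prints 0.2941
```

Changing any one of the three inputs and rerunning is a quick way to see how sensitive the result is to the initial estimate.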
Silver’s analysis of predictions and forecasts echoes that of Taleb in Anti-Fragile. Both claim that financial analysts are generally overconfident in their ability to predict financial markets, given their actual track records. Both claim that our models for predicting most events aren’t as good as we say they are.
However, Silver notes some interesting exceptions where our predictions are successful: weather, baseball, and chess. These fields yield to statistical analysis because they have a few common features:
– First, we understand the principles that cause the events pretty well. In other words, we can avoid the problem of mistaking correlation for causation in these situations. Weather, chess and baseball have relatively simple sets of rules that create the complex situations we observe. Consequently, we can use models to predict how these complex situations will evolve with some success.
– Second, there is a long history of recorded observations about these games and the weather, so we have a good data set to use in making our next predictions.
– Third, these phenomena occur regularly and frequently, so we get feedback on our predictions, which allows us to learn from our mistakes.
Phenomena that don’t share these three features (well-understood causes, a large and accurate data set, and frequent events) are harder to predict. The stock market is hard to predict because we don’t understand the complex set of causes behind the movements we see on a price chart, and because extreme price changes don’t happen often enough for us to test our predictions and learn from them. Earthquakes and epidemics are hard to predict because they don’t happen very often and we have difficulty observing their causes: earthquakes are caused by forces hidden deep in the Earth’s crust, and epidemics have complicated generation and transmission paths.
The Signal and The Noise is a good foil to Taleb’s Anti-Fragile. It provides a useful introduction to statistical methods as well as many case studies where statistical analysis succeeds and fails. Taleb’s book focuses on the failures of statistical analysis and offers heuristics for navigating an uncertain world instead. Silver’s book echoes many of the same heuristics: prefer long-surviving patterns and events over newer ones; in the absence of convincing evidence to the contrary, assume the future will be like the past; and don’t assume that you are special or unique without convincing evidence (a la financial analysts). But Silver also illuminates areas where statistical analysis has succeeded, which is helpful for the beginning analyst to see.
Suggestions for predictions:
– Make probabilistic predictions, not specific ones.
– Don’t be overconfident in your skills or predictions; it’s okay to say you don’t know.
– Don’t focus too much on analysis at the expense of understanding your observed events, e.g. how much data do you have to analyze? Are the data linked in a time series, or is each event independent of the others?
– Be willing to change your predictions in light of new evidence.
– Try to be less wrong, as opposed to more right, i.e. Taleb’s “via negativa” epistemic method.
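The advice to change your predictions in light of new evidence can be made concrete with repeated Bayesian updating, where each posterior becomes the starting estimate for the next piece of evidence. Below is a rough sketch in Python. It assumes, purely for illustration, that a second, independent piece of evidence turns up with the same 50% and 5% likelihoods as in the underwear example; those second-round numbers and the function name update are hypothetical, not taken from the book.

```python
def update(prior, p_if_true, p_if_false):
    """One Bayesian update: xy / (xy + z(1 - x))."""
    return prior * p_if_true / (prior * p_if_true + p_if_false * (1 - prior))


belief = 0.04          # initial estimate from the underwear example
history = [belief]

# Each tuple is (y, z) for one piece of evidence.
# The second tuple is a made-up, illustrative repeat of the first.
evidence = [(0.5, 0.05), (0.5, 0.05)]

for p_if_true, p_if_false in evidence:
    belief = update(belief, p_if_true, p_if_false)  # posterior becomes the new prior
    history.append(belief)

print([round(b, 2) for b in history])  # prints [0.04, 0.29, 0.81]
```

Under these assumed numbers the estimate moves from 4% to about 29% and then to about 81%. The prediction is never treated as final, only as the best current estimate.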