While the excitement and emotions of this past election work themselves out in the populace at large, as a writer on and practitioner of predictive analytics I find the discussion about “where the polls went wrong” to be of most interest. It is an important discussion because the most reliable polling organizations, those that have proven themselves by being consistently right on a whole host of issues since most of the world moved to digitization and the Internet of Things in daily life, seemed to be dead wrong in certain of their predictions. I say certain because the polls were not completely wrong.
For partisans who point to Brexit and polling in the U.K., I hasten to add that this is comparing apples to oranges. The major U.S. polling organizations that use aggregation and Bayesian modeling did not poll Brexit. In fact, one reliable U.K. polling organization did note two factors: one was that the trend in the final days was toward Brexit, and the other was that the final result hinged on turnout, where greater turnout favored the “Remain” vote.
But aside from these general details, this issue is of interest in project management because, unlike national and state polling, where sample sizes are sufficient to support statistical significance, at the micro-economic level of project management we deal with very small datasets that expand the range of probable results. This is not an insignificant point, and it has been made time and again over the years, particularly regarding single-point estimates built from limited time-phased data, absent a general model that provides insight into the likeliest results. This last point is important.
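The effect of small datasets can be seen with a quick simulation. The sketch below (all numbers hypothetical) draws repeated small and large samples from a notional task-duration distribution and measures how wide the band of plausible estimates becomes:

```python
import random
import statistics

random.seed(42)

def mean_range(n, trials=2000):
    """Width of the 95% band of sample means for a hypothetical
    task duration (mean 10 days, sd 3) estimated from n data points."""
    means = sorted(statistics.mean(random.gauss(10, 3) for _ in range(n))
                   for _ in range(trials))
    return means[int(0.975 * trials)] - means[int(0.025 * trials)]

print(round(mean_range(5), 2))    # few data points: a wide band of probable results
print(round(mean_range(500), 2))  # many data points: a much narrower band
```

With only a handful of observations, the plausible range of the estimate is roughly an order of magnitude wider, which is exactly the situation a project manager faces with limited time-phased data.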
So let’s look at the national polls on the eve of the election according to RealClearPolitics. IBD/TIPP Tracking had it Trump +2 at +/-3.1% in a four-way race. LA Times/USC had it Trump +3, within the margin of error at the 95% confidence level, which essentially means tied. Bloomberg had Clinton +3, CBS had Clinton +4, Fox had Clinton +4, Reuters/Ipsos had Clinton +3, and ABC/WashPost, Monmouth, Economist/YouGov, Rasmussen, and NBC/SM had Clinton anywhere from +2 to +6. The margin of error for almost all of these polls ran from +/-3% to +/-4%.
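Those published margins of error follow directly from sample size. A minimal sketch of the standard calculation, assuming the classic normal approximation for a polled proportion and a sample of about 1,000 respondents:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Classic ~95% margin of error for a polled proportion p
    with n respondents (normal approximation, worst case p=0.5)."""
    return z * math.sqrt(p * (1 - p) / n)

# A sample of roughly 1,000 respondents yields the familiar +/-3.1 points:
print(round(100 * margin_of_error(1000), 1))  # → 3.1
```

This is why national polls with a point or two separating the candidates are, statistically speaking, ties.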
As of this writing Clinton sits at about +1.8% nationally; the votes still coming in continue to confirm her popular vote lead, currently standing at about 300,000 votes. Of the polls cited, Rasmussen was the closest to the final result. Every other poll except IBD/TIPP, however, was within its margin of error.
The polling that was off in predicting the election came from those organizations that aggregated national polls along with state polls, adjusted polling based on non-direct polling indicators, and/or then projected the chances of winning based on the probable electoral vote totals. This is where things went wrong.
Among the most popular of these sites is Nate Silver’s FiveThirtyEight blog. Silver established his bona fides in 2008 by picking winners with incredible accuracy, particularly at the state level, and subsequently in his work at the New York Times, which continued to prove the efficacy of data in predictive analytics in everything from elections to sports. Since that time his considerable reputation has only grown.
What Silver does is determine the probability of an electoral outcome by using poll results that are transparent in their methodologies and that have a high level of confidence. Silver’s was the most conservative of these types of polling organizations. On the eve of the election Silver gave Clinton a 71% chance of winning the presidency. The other organizations that use poll aggregation, poll normalization, or other adjusting indicators (such as betting odds, financial market indicators, and political science indicators) include the New York Times Upshot (Clinton 85%), HuffPost (Clinton 98%), PredictWise (Clinton 89%), Princeton (Clinton >99%), DailyKos (Clinton 92%), Cook (Lean Clinton), Rothenberg (Lean Clinton), and Sabato (Lean Clinton).
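To make the mechanics concrete, a toy version of poll aggregation might weight each poll’s reported margin by sample size and a pollster quality grade. All numbers below are invented for illustration; the actual weighting schemes these organizations use are far more elaborate:

```python
# Hypothetical recent polls: (clinton_margin_pct, sample_size, quality_weight)
polls = [
    (3.0, 1000, 0.9),
    (4.0, 800, 0.8),
    (-2.0, 900, 0.6),
]

# Each poll's influence grows with sample size and pollster quality.
weights = [n * q for _, n, q in polls]
aggregate = sum(m * w for (m, _, _), w in zip(polls, weights)) / sum(weights)
print(round(aggregate, 2))  # → 2.01
```

Note that a single outlier poll does not dominate the aggregate; the weighting pulls the estimate toward the better-sampled, higher-quality polls.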
To understand what probability means in this context: these forecasters combined bottom-up state polling, which tracks the electoral college, with national popular vote polling. But keep in mind that, as Nate Silver wrote over the course of the election, even a 17% chance of winning “is the same as your chances of losing a ‘game’ of Russian roulette” (one chamber in six, or about 16.7%). Few of us would take that bet, particularly since the result of losing the game is final.
Still, except for FiveThirtyEight, none of the other probabilistic methods got it right; only FiveThirtyEight left enough room for drawing the wrong chamber. In fairness, the Cook, Rothenberg, and Sabato projections also left enough room to see a Trump win if the state dominoes fell right.
The places where the models failed were the states of Florida, North Carolina, Pennsylvania, Michigan, and Wisconsin. In particular, even with Florida (result Trump +1.3%) and North Carolina (result Trump +3.8%), Trump could not win unless Pennsylvania (result Trump +1.2%), Michigan (result Trump +0.3%), and Wisconsin (result Trump +1.0%), the supposed Clinton firewall states, were breached. So what happened?
Among the possible factors are: the effect of FBI Director Comey’s public intervention, which came too close to the election to register in the polling; ineffective polling methods in rural areas (garbage in, garbage out); poor state polling quality; voter suppression, purging, and restrictions (among the battleground states this includes Florida, North Carolina, Wisconsin, Ohio, and Iowa); voter turnout and enthusiasm (apart from the factors of voter suppression); and the inability to peg which way the high number of undecided voters would break at the last minute.
In hindsight, the national polls were good predictors. The sufficiency of their data for drawing significance and the high level of confidence in their predictive power are borne out by the final national vote totals.
I think that where the polling failed in its electoral college projections was in the inability to take into account non-statistical factors and selection bias, and in state poll models that probably did not accurately reflect the electorate in those states given the lessons from the primaries. Along these lines, I believe that if pollsters look at the demographics in the respective primaries, they will find that both voter enthusiasm and composition provide the corrective to their projections. Given these factors, the aggregators and probabilistic models should all have called the race too close to call. I think both Monte Carlo and Bayesian methods in simulations will bear this out.
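A rough Monte Carlo sketch illustrates the point about correlated state errors. The Clinton-minus-Trump margins below approximate the final pre-election averages in the five decisive states; the error magnitudes, and the assumption that the rest of the map goes as it actually did (Clinton 232 electoral votes, Trump 216 outside these states), are mine:

```python
import random

random.seed(1)

# (pre-election Clinton margin in points, electoral votes) for the
# five decisive states: FL, NC, PA, MI, WI.
STATES = [(-0.2, 29), (-1.0, 15), (1.9, 20), (3.4, 16), (6.5, 10)]

def clinton_win_prob(shared_sd, state_sd=2.0, trials=20000):
    """Monte Carlo over polling errors. shared_sd scales a single error
    common to all five states; state_sd is independent per-state noise."""
    wins = 0
    for _ in range(trials):
        shared = random.gauss(0, shared_sd)  # systematic polling miss
        ev = 232  # Clinton's electoral votes outside these five states
        for margin, votes in STATES:
            if margin + shared + random.gauss(0, state_sd) > 0:
                ev += votes
        wins += ev >= 270
    return wins / trials

print(clinton_win_prob(0.0))  # independent state errors: high win probability
print(clinton_win_prob(2.0))  # one shared error component: markedly lower
```

When polling errors are treated as independent, the firewall almost never falls as a unit; adding a single shared error component, such as a systematic miss among rural or undecided voters, collapses the win probability toward a toss-up. This is the gap between the 98–99% forecasts and the more conservative ones.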
For example, as one who also holds a political science degree, I will put on that hat. It is a basic tenet that negative campaigns depress voter participation, pushing voters to select the lesser of two evils (or the lesser of two weevils). Voter participation was down significantly due to an unprecedentedly negative campaign. When this occurs, the most motivated base will usually determine the winner of an election. This is why midterm elections are so volatile, particularly after a presidential win that causes a rebound of the opposition party. Whether this trend continues with the reintroduction of gerrymandering remains to be seen.
What all this points to, from a data analytics perspective, is that one must have a model to explain what is happening. Statistics by themselves, while correct a good bit of the time, will make one overconfident in a result based solely on numbers and simulations that give a false impression of solidity, particularly in a volatile environment. This is known as reification, and it is a fallacious way of thinking. Combined with selection bias and the absence of a reasonable narrative model, one that introduces the social interactions necessary to understand the behavior of complex adaptive systems, one will often arrive at invalid results.