You Know I’m No Good: 2016 Election Polls and Predictive Analytics

While the excitement and emotions of this past election work themselves out in the populace at large, as a writer on and practitioner of predictive analytics I find the discussion about “where the polls went wrong” to be of particular interest. This is an important discussion, because the most reliable polling organizations–those that have consistently proven themselves right on a whole host of questions since most of the world moved its daily life to digitization and the Internet of Things–seemed to be dead wrong in certain of their predictions. I say certain because the polls were not completely wrong.

For partisans who point to Brexit and polling in the U.K., I hasten to add that this is comparing apples to oranges. The major U.S. polling organizations that use aggregation and Bayesian modeling did not poll Brexit. In fact, one reliable U.K. polling organization did note two factors: one was that the trend in the final days was toward Brexit, and the other was that the final result would hinge on turnout, with greater turnout favoring the “Remain” vote.

But aside from these general details, this issue is of interest in project management because, unlike national and state polling, where sample sizes are large enough to support statistical significance, at the microeconomic level of project management we deal with very small datasets that widen the range of probable results. This is not an insignificant point, and it has been made time and time again over the years, particularly with regard to single-point estimates built on limited time-phased data, absent a general model that provides insight into the likeliest outcomes. This last point is important.
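To make that contrast concrete, here is a minimal sketch using the standard margin-of-error approximation for a proportion; the sample sizes and the 50% proportion are made-up, illustrative inputs only:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an estimated proportion p from n observations."""
    return z * math.sqrt(p * (1 - p) / n)

# A national poll with roughly a thousand respondents yields the familiar band...
print(f"n=1000: +/-{margin_of_error(0.5, 1000):.1%}")   # about +/-3.1%
# ...while a sample of twenty observations yields a far wider one.
print(f"n=20:   +/-{margin_of_error(0.5, 20):.1%}")     # about +/-21.9%
```

The same arithmetic is why inferences drawn from a handful of time-phased project observations carry a much wider band of probable results than a well-sampled national poll.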

So let’s look at the national polls on the eve of the election according to RealClearPolitics. IBD/TIPP Tracking had Trump +2 with a margin of error of +/-3.1% in a four-way race. LA Times/USC had Trump +3, within its margin of error at the 95% confidence level, which essentially means a tie. Bloomberg had Clinton +3, CBS had Clinton +4, Fox had Clinton +4, Reuters/Ipsos had Clinton +3, and ABC/WashPost, Monmouth, Economist/YouGov, Rasmussen, and NBC/SM had Clinton +2 to +6. The margin of error for almost all of these polls ranged from +/-3% to +/-4%.

As of this writing Clinton sits at about +1.8% nationally; votes are still being counted and continue to confirm her popular vote lead, which currently stands at about 300,000 votes. Of the polls cited, Rasmussen was closest to the final result. Virtually every other poll except IBD/TIPP, however, was within its margin of error.

The projections that were off were those that aggregated national polls along with state polls, adjusted the polling based on indirect indicators, and then projected the chances of winning based on probable electoral vote totals. This is where things went wrong.

Among the most popular of these sites is Nate Silver’s FiveThirtyEight blog. Silver established his bona fides in 2008 by picking winners with remarkable accuracy, particularly at the state level, and subsequently in his work at the New York Times, which continued to demonstrate the efficacy of data-driven predictive analytics in everything from elections to sports. Since that time his considerable reputation has only grown.

What Silver does is determine the probability of an electoral outcome by aggregating poll results that are transparent in their methodologies and that carry a high level of confidence. Silver’s was the most conservative of these aggregators. On the eve of the election he gave Clinton a 71% chance of winning the presidency. The other organizations that use poll aggregation, poll normalization, or other adjusting indicators (such as betting odds, financial market indicators, and political science indicators) include the New York Times Upshot (Clinton 85%), HuffPost (Clinton 98%), PredictWise (Clinton 89%), Princeton (Clinton >99%), DailyKos (Clinton 92%), Cook (Lean Clinton), Rothenberg (Lean Clinton), and Sabato (Lean Clinton).
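The basic mechanics of turning state-level probabilities into a single chance of winning can be shown with a short Monte Carlo sketch. The probabilities, electoral vote counts, and “safe” total below are hypothetical placeholders, not FiveThirtyEight’s inputs or model:

```python
import random

# Hypothetical Clinton win probabilities and electoral votes for a handful of
# contested states -- placeholder numbers, not any aggregator's actual inputs.
states = {
    "FL": (0.55, 29), "NC": (0.50, 15), "PA": (0.77, 20),
    "MI": (0.79, 16), "WI": (0.83, 10), "OH": (0.35, 18),
}
SAFE_CLINTON_EV = 217   # electoral votes assumed not in doubt (an assumption)
TRIALS = 100_000

wins = 0
for _ in range(TRIALS):
    ev = SAFE_CLINTON_EV
    for p_win, votes in states.values():
        if random.random() < p_win:   # each contested state drawn independently
            ev += votes
    if ev >= 270:
        wins += 1

print(f"Simulated chance of a Clinton win: {wins / TRIALS:.0%}")
```

Each trial draws a win or loss for every contested state, sums the electoral votes, and the fraction of trials reaching 270 becomes the headline probability. Note that treating the states as independent tends to overstate certainty, a point I return to below.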

To understand what probability means in this context: these models combined bottom-up state polling, used to track the electoral college, with national popular vote polling. But keep in mind that, as Nate Silver wrote over the course of the election, even a 17% chance of winning “is the same as your chances of losing a ‘game’ of Russian roulette.” Few of us would take that bet, particularly since the result of losing the game is final.

Still, virtually none of the methods using probability got it right. Only FiveThirtyEight left enough room for drawing the wrong chamber. In fairness, though, the Cook, Rothenberg, and Sabato projections also left enough room to see a Trump win if the state dominoes fell right.

The places where the models failed were the states of Florida, North Carolina, Pennsylvania, Michigan, and Wisconsin. In particular, even with Florida (result Trump +1.3%) and North Carolina (result Trump +3.8%), Trump could not win unless Pennsylvania (result Trump +1.2%), Michigan (result Trump +0.3%), and Wisconsin (result Trump +1.0%)–the supposed Clinton firewall states–were breached. So what happened?

Among the possible factors are the effect of FBI Director Comey’s public intervention, which came too close to the election to register in the polling; ineffective polling methods in rural areas (garbage in, garbage out); poor state polling quality; voter suppression, purging, and restrictions (among the battleground states this includes Florida, North Carolina, Wisconsin, Ohio, and Iowa); voter turnout and enthusiasm (apart from the effects of voter suppression); and the inability to peg how the high number of undecided voters would break at the last minute.

In hindsight, the national polls were good predictors. The sufficiency of their data for drawing significance, and the high level of confidence in their predictive power, are borne out by the final national vote totals.

I think the polling failed in its electoral college projections because of the inability to take into account non-statistical factors and selection bias, and because the state poll models probably did not accurately reflect the electorates in those states, given the lessons from the primaries. Along these lines, I believe that if pollsters look at the demographics in the respective primaries they will find that both voter enthusiasm and composition provide the corrective to their projections. Given these factors, the aggregators and probabilistic models should all have called the race too close to call. I think both Monte Carlo and Bayesian methods in simulations will bear this out.
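As a sketch of why, consider what happens to the earlier simulation when the state errors are allowed to move together. The margins, error sizes, and safe electoral vote count below are made-up assumptions; the point is only the qualitative effect of a correlated polling error:

```python
import random

# Hypothetical Clinton-minus-Trump polling margins (in points) and electoral votes
# for the contested states -- illustrative assumptions only.
states = {"FL": (0.8, 29), "NC": (1.0, 15), "PA": (2.5, 20), "MI": (3.5, 16), "WI": (5.0, 10)}
SAFE_CLINTON_EV = 217   # electoral votes assumed safe for Clinton (an assumption)
TRIALS = 100_000

def win_probability(shared_sigma: float, state_sigma: float = 3.0) -> float:
    """Simulate the race with a polling error shared across all states plus state-level noise."""
    wins = 0
    for _ in range(TRIALS):
        shared = random.gauss(0, shared_sigma)   # systematic error hitting every state at once
        ev = SAFE_CLINTON_EV
        for margin, votes in states.values():
            if margin + shared + random.gauss(0, state_sigma) > 0:
                ev += votes
        wins += ev >= 270
    return wins / TRIALS

print(f"Independent state errors only: {win_probability(0.0):.0%}")
print(f"With a correlated error term:  {win_probability(3.0):.0%}")
```

Because a shared error can push all of the firewall states past their thin margins at once, the correlated case tends to land noticeably closer to a coin flip than the independent-error case, which is the “too close to call” result.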

For example, as one who also holds a political science degree, I will put on that hat. It is a basic tenet that negative campaigns depress voter participation, forcing voters to select the lesser of two evils (or the lesser of two weevils). Voter participation was down significantly due to an unprecedentedly negative campaign. When this occurs, the most motivated base will usually determine the winner of an election. This is why midterm elections are so volatile, particularly after a presidential win, which tends to produce a rebound of the opposition party. Whether this trend continues with the reintroduction of gerrymandering remains to be seen.

What all this points to from a data analytics perspective is that one must have a model to explain what is happening. Statistics by themselves, while correct a good bit of the time, will make one overconfident in a result based solely on the numbers, and simulations can give a false impression of solidity, particularly in a volatile environment. This is known as reification, and it is a fallacious way of thinking. Combined with selection bias and the absence of a reasonable narrative model–one that introduces the social interactions necessary to understand the behavior of complex adaptive systems–it will often produce invalid results.

I’ve Really Got to Use My (Determinism) to think of good reasons to keep on keeping on

Where do we go from here?

I made some comments on the use of risk in estimates to complete (ETC) and estimates at complete (EAC) in a discussion at LinkedIn’s Earned Value Management Group. This got me thinking about the current debate on free will between Daniel Dennett and Sam Harris. Jonathan MS Pearce at A Tippling Philosopher, who wrote a book about free will, has some interesting perspectives on the differences between them. Basically, the debate is over the amount of free will that any individual actually possesses.

The popular and somewhat naïve conception of free will assumes that the individual (or organizations, or societies, etc.) is an unhindered free agent and can act on his or her will as necessary.  We know this intuitively to be untrue but it still infects our value judgments and many legal, moral, ethical, and societal reactions to the concepts of causality, responsibility, and accountability.  We know from science, empiricism, and our day-to-day experience that the universe acts in a somewhat deterministic manner.  The question is: how deterministic is it?

In my youth I was a fan of science fiction in general and Isaac Asimov’s books in particular. One concept that intrigued me from his Foundation series was psychohistory, developed by the character Hari Seldon. Through the use of psychohistory Seldon could determine when a society was about to go into cultural fugue, and the best time to begin a new society in order to save civilization. This line of thought actually had a basis in many post-World War II hypotheses to explain the mass psychosis that seemed to grip Nazi Germany and Imperial Japan. The movie The White Ribbon explored such a proposition, seeming to posit that the foundation for the madness that was to follow had its roots much earlier, in German society’s loss of compassion, empathy, and sympathy. Perhaps the cataclysm that was to occur was largely inevitable given conditions that seemed too small and insignificant by themselves.

So in determining what will happen and where we will go, we must first determine where we are. Depending on what is being measured, there are many qualitative and quantitative ways to determine our position relative to society, relative to where we want to be, or against any other reference point. As I said in a post looking at the predictive measurements of the 2012 election through the lens of project management, especially the predictive methodology employed by Nate Silver: “we are all dealt a deck of cards by the universe regardless of what we undertake, whether an accident of birth, our socioeconomic position, family circumstance, or our role in a market, business or project enterprise.  The limitations on our actions—our free will—are dictated by the bounds provided by the deal.  How we play those cards within the free will provided will influence the outcome.  Sometimes the cards are stacked against us and sometimes for us.  I believe that in most cases the universe provides more than a little leeway that provides for cause and effect.  Each action during the play provides additional deterministic and probabilistic variables.  The implications for those who craft policy or make decisions in understanding these concepts are obvious.”

So how does this relate to project management, since many of these examples–even the imaginary one–deal with larger systems that have far less uncertainty and no paucity of data? Well, we do have sufficient data when we lengthen the timeframe and actually collect the data.

Dr. David Christensen, in looking at DoD programs, determined that the cumulative CPI at the 20% completion point did not change significantly by completion. This observation was later refined by looking at project performance after the disastrous Navy A-12 contract had had its remedial effects on project management. The conclusion provided both a confirmation of the validity of CPI and the EVM methods that undergird it, and a measure of how much influence our actions have in determining the ultimate success or failure of a project once the foundation has been laid. Subsequent studies have strengthened Dr. Christensen’s deterministic observation about project performance.
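For readers less steeped in EVM, here is the standard index-based arithmetic that makes the stability finding matter, with made-up figures for a program roughly 20% complete:

```python
# Earned value (BCWP), actual cost (ACWP), and budget at completion (BAC) --
# illustrative numbers for a program about 20% through its budgeted work.
bcwp = 2_000_000   # budgeted cost of work performed (earned value) to date
acwp = 2_300_000   # actual cost of work performed to date
bac  = 10_000_000  # budget at completion

cpi = bcwp / acwp   # cost performance index
eac = bac / cpi     # index-based estimate at completion

print(f"CPI at ~20% complete: {cpi:.2f}")
print(f"EAC if CPI holds:     ${eac:,.0f}")
```

If the CPI is as stable from this point forward as Christensen’s research suggests, this simple projection already carries most of the information we will ever have about the final cost.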

The models that I have used incorporate technical performance measures with EVM, cost, schedule, and risk in determining future project performance. But the basis for that determination is a measurement of the present condition at a point in time, usually tracked along a technical baseline. Thus, our assessment of future performance is anchored to where our present position is fixed and to the range of probable outcomes identified from it. The probabilities keep us grounded in reality. The results address both contingency and determinism in day-to-day analysis. This argues for a broader set of measurements, so that the window of influence in determining outcomes is maximized.
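A minimal sketch of what a range of probable outcomes can look like in practice, assuming (purely for illustration) that future cost efficiency is uncertain around the current CPI:

```python
import random

# Illustrative inputs only, not data from any real program.
bcwp, acwp, bac = 2_000_000, 2_300_000, 10_000_000
cpi = bcwp / acwp
TRIALS = 50_000

eacs = []
for _ in range(TRIALS):
    future_cpi = max(0.5, random.gauss(cpi, 0.08))    # assumed uncertainty in efficiency on remaining work
    eacs.append(acwp + (bac - bcwp) / future_cpi)      # actuals to date + remaining work at that efficiency

eacs.sort()
low, mid, high = eacs[int(0.1 * TRIALS)], eacs[TRIALS // 2], eacs[int(0.9 * TRIALS)]
print(f"EAC 10th/50th/90th percentile: ${low:,.0f} / ${mid:,.0f} / ${high:,.0f}")
```

Reporting the percentile band rather than a single EAC is what keeps the contingency visible alongside the deterministic trend.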

So are we masters of our own destiny? Not entirely, and not in the manner that the phrase suggests. Our options are limited by our present position and circumstances. Our outcomes are bounded by probability.