I wrote a quick post the other day, before Biden was declared the winner of the election by the major news agencies. I had a lot on my mind and referred to a few things that I was thinking about and wanted to write more about in the coming days. Some were short-lived, and I’ve lost the desire (need) to think about them more, but others still weigh on me. Perhaps the one that weighs most is the polling and how to adjust expectations in the future. I’m not thinking about this as a pollster, because I’m not and never will be. I’m thinking about this as a member of society who relies on polls to give me some semblance of predictability in an unpredictable world. I’m thinking about this as a consumer of information, asking how best to use that information to inform my decisions and expectations. I’m thinking about this as a lay person, which I absolutely am.
As of now, there are a few things that are clear: First, some polls were wrong, again, and second, we don’t yet know how wrong they were because we’re still waiting for plenty of final vote numbers. But let’s take a snapshot of today and pretend that there’s not much change left to happen and think about what the numbers actually mean.
To start to get some sense of how wrong the polls were and what they got wrong, I checked the numbers at Real Clear Politics. I know that some have criticized RCP for how they calculated their averages, but without having the time or energy to calculate my own averages, I’m going to set that aside and use their numbers for now. My quick “analysis” looked at the states in their “Battleground States” drop-down list, but only the states in that list that had calculated RCP averages (they didn’t bother calculating RCP averages in Virginia and others, probably because there was a clearer expectation of a winner and fewer polls to use in the calculation).
What I found was this: as of today, using today’s results from the Associated Press, returned by Google when searching “[state] election results,” a pattern that we saw in 2016 seems to have returned. As I noted in a previous post, a big part of the error in 2016 was an underestimate of support for Trump. The same was true this year. The average error for Biden was 0.78 points (with the RCP average underestimating Biden’s vote share). For Trump, however, the RCP average was off by 3.14 percentage points, again underestimating support for Trump far more than they missed support for Biden.
That’s the average error (not the average of absolute values), which washes out errors that occur in opposite directions. For predicting outcomes, we might like that, but for deciding how accurate the polls were, it can reduce the appearance of error. Looking at the absolute values of the errors instead gives a 1.33-point average error for Biden and a 3.14-point error for Trump. Although polling for the non-Trump candidate wasn’t perfect, it was within what would be a normal margin of error, but the error in predicting Trump’s vote share was more than double the error for Biden.
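The difference between the two summaries is easy to see in a few lines of code. This is a minimal sketch using made-up poll-versus-result numbers (not the actual RCP figures), just to show how signed errors cancel each other out while absolute errors do not:

```python
# Hypothetical final poll averages vs. actual vote shares for a few
# states (illustrative numbers only, not the real RCP data).
polls = {
    # state: (poll_average, actual_result), in percentage points
    "State A": (47.9, 47.8),
    "State B": (46.3, 45.2),
    "State C": (48.5, 49.6),
}

# Error = actual minus poll; positive means the poll underestimated.
errors = [actual - poll for poll, actual in polls.values()]

# Signed mean error: misses in opposite directions cancel out.
mean_error = sum(errors) / len(errors)

# Mean absolute error: every miss counts, regardless of direction.
mean_abs_error = sum(abs(e) for e in errors) / len(errors)

print(f"signed mean error:   {mean_error:+.2f} points")
print(f"mean absolute error: {mean_abs_error:.2f} points")
```

With these numbers the signed mean error comes out near zero (about -0.03 points) even though the mean absolute error is roughly 0.77 points, which is exactly the washing-out effect described above.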
Polling in some states was better than in others. Ohio polls were the worst in their error for Trump support. The final RCP average had Trump at 47.3%, but he ended up with 53.4%. But even there, the error wasn’t huge in predicting Biden’s support. The final RCP average had Biden at 46.3% and he ended up with 45.2% (a 1.1 percentage point error).
A couple of states stand out for me, for different reasons. Florida is an obvious one. The RCP average had Biden expected to win 47.9% to 47%. As of today’s count, Biden lost 47.8% to 51.2%. This is a really good example of the theme. The error in Biden’s final vote share was 0.1 percentage points. As a science guy who thinks about numbers and predictions, that seems amazingly close to me. But those same polls predicted 47% for Trump, and his final vote share was (as of today’s count) 51.2%. That’s a 4.2 percentage point error, and it doesn’t feel so great to me. It’s not as bad as the error in Ohio, but it’s pretty bad, and perhaps worse because it didn’t accurately predict the winner (which is arguably the most important thing we ask of any of these polls).
There are a couple of possible explanations for this. There’s the “shy Trump voter” idea that was pushed by lots of folks (all seemingly supporting Trump) in the days and weeks before the election. I find that argument more compelling now than I did before the election, but I’m still unsure why somebody would be equally shy about saying they were going to vote for Trump when talking to a stranger and when answering a computer-generated poll. I can kind of imagine the stranger (feeling judged by that stranger might be uncomfortable), but the same shyness showing up in computer polls makes me really skeptical. Another possibility is that, as unusual as it might be, nearly 100% of the undecided voters swung to Trump. That’s not as absurd as it might seem with a candidate like Trump. He’s toxic. Even among those who support him, there’s a clear acknowledgment that he’s abrasive. It might be that deciding to vote for an abrasive candidate like Trump is more likely to leave somebody undecided than deciding to vote for a more traditional candidate.
It’s also possible that the real fault is in the likely voter models. These polls, especially the later polls, use screens to rule out responses from unlikely voters. Perhaps the problem for pollsters is that having Trump on the ballot changes who is and who isn’t likely to vote. I don’t know enough about these screens to know if this is a genuine possibility or not, but it sure seems like there’s something special about Trump that makes the polling especially difficult.
And it’s not just polling for Trump, but polling for other elections when Trump is on the ballot. Susan Collins, for instance, was supposed to lose her election by around 4 points (based on my averaging the poll results listed at RCP, since no RCP average was calculated for that race). She won by more than 8 points. On the other hand, that may all be some serious confirmation bias, because the same voters who gave Susan Collins another term by an 8.8-point margin voted for Biden by an 8.7-point margin. So, it’s hard to say that Trump being on the ballot messed things up so badly for Gideon (Collins’s opponent), but not for Biden.
Why would confirmation bias matter here? Because having polls that can predict outcomes is important to me. It’s important to my sanity. It’s important because I see these folks who do this work like most others who are in the business of predicting things (that are predictable). Weather reporters get a lot of grief for not always being right, but they’re right a lot more than they’re wrong, and they’re working in an area with a lot of uncertainty. Stock market predictors, on the other hand, fall into a whole other category for me. I’m sure there are scientific ways to predict stock market movement, but it sure seems like something that’s simply unpredictable, and trying to predict the unpredictable is mixing oil and water in my worldview.
But confirmation bias notwithstanding, I’m going to run with the idea that polling Trump is just hard. Polling other candidates isn’t all that difficult, except when Trump is on the ballot. As long as he’s not on the ballot, then the polls will be right more often than they’re wrong, and I can sleep well at night knowing that there’s some predictability in this otherwise unpredictable world. Hopefully the ride is, indeed, over and not just getting started. Hopefully.