By Jeffery Scott Mio, Ph.D., California State Polytechnic University, Pomona
The results of the 2016 Presidential Election were a surprise to many, particularly, one might argue, to the organizations responsible for polling potential voters to get an accurate estimate of the outcome. While some might view this as a sign that polling is flawed, the problem may lie more specifically in how the poll samples were drawn than in the method itself. The discussion that follows aims to elucidate several of the problems with the polling techniques used to forecast the results of the 2016 election. This real-life example may serve as a useful demonstration for students of what can happen when proper sampling methods are not used, resulting in a non-representative sample.
First of all, the polling agencies do not, and probably cannot, draw a representative sample. They typically poll people who have landline, as opposed to cellular, telephone service. This skews toward older people, as many if not most young adults do not have landlines. However, if they poll only those who have cell phones, the sample will skew toward younger people, and older voters will be lost. The same is true of online polls such as SurveyMonkey, which skew toward younger voters because younger people tend to be more comfortable with computers than older people. Online polling also skews toward urban and suburban residents and away from rural ones.
Second, beyond the questionable coverage of each of these methods, there is also the problem of who answers the poll. For example, I have a landline, but I never answer it unless the call is from someone I know. If it is a pollster, I will not pick up. So what kind of people answer a pollster? We don't know, and we don't know how representative these people are. In addition, pollsters call multiple people at once, and when one person answers the phone, all of the other calls are dropped; so again, what kind of people are answering the poll, and how representative are they?
Third, and related to the second point, pollsters admit that even when they reach a live person, many people decline to take part, so they end up with only about a 10% response rate. (By the way, the absolute minimum response rate acceptable from a scientific perspective is 25%.) Who are these respondents, and are they representative of the voting public? Those who answer the polls may be good people who answer honestly, but they are not necessarily representative of the voting population. A "sample" is used to estimate the "population," but if the sample is skewed, we get an inaccurate picture of the population. Therefore, when pollsters say that polls are a "snapshot," they may mistakenly be pointing the camera in the wrong direction.
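For students, the effect of a low and uneven response rate can be made concrete with a short Python sketch of differential non-response. The numbers are hypothetical, not 2016 data: the population is split exactly 50-50 between candidates A and B, A's supporters agree to take the poll 12% of the time, and B's supporters 8% of the time. The overall response rate is then 10%, close to the figure cited above, yet the people who answer lean 60-40 toward A. The function names are illustrative only.

```python
import random


def respondent_share(p_a: float, rate_a: float, rate_b: float) -> float:
    """Expected share of respondents who back candidate A.

    p_a    -- true share of the population backing A
    rate_a -- probability that an A supporter answers the poll
    rate_b -- probability that a B supporter answers the poll
    """
    answered_a = p_a * rate_a
    answered_b = (1 - p_a) * rate_b
    return answered_a / (answered_a + answered_b)


def overall_response_rate(p_a: float, rate_a: float, rate_b: float) -> float:
    """Fraction of everyone contacted who completes the poll."""
    return p_a * rate_a + (1 - p_a) * rate_b


def simulate_poll(p_a: float, rate_a: float, rate_b: float,
                  contacts: int, seed: int = 0) -> float:
    """Contact `contacts` random people; return A's share among those who answer."""
    rng = random.Random(seed)
    yes_a = answered = 0
    for _ in range(contacts):
        backs_a = rng.random() < p_a          # does this person support A?
        rate = rate_a if backs_a else rate_b  # their chance of answering
        if rng.random() < rate:               # they agree to take the poll
            answered += 1
            yes_a += backs_a
    return yes_a / answered


if __name__ == "__main__":
    # Hypothetical: a 50-50 electorate where A's supporters respond more often.
    print(round(overall_response_rate(0.5, 0.12, 0.08), 3))
    print(round(respondent_share(0.5, 0.12, 0.08), 3))
    print(round(simulate_poll(0.5, 0.12, 0.08, contacts=100_000), 3))
```

Contacting more people only narrows the random scatter around 60%; it never moves the estimate back toward the true 50%, because the skew comes from who answers, not from how many.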
Fourth, the real question is, "How do we sample 'likely' voters when we do not know who is likely to vote?" As it turned out, Trump did turn out many first-time voters and people who had not voted in a long time. Clinton, on the other hand, did not excite enough of her voters. Especially because everyone thought she was going to win, many of her potential supporters did not show up, and many younger voters felt free to vote for third-party candidates. If only a very small percentage of those who voted for third-party candidates had voted for Clinton (especially in Pennsylvania, Ohio, Michigan, and Wisconsin), Clinton would now be president.
Finally, as polls indicated, the "undecided" share of the electorate was four times higher than in most other elections. Most people read "undecided" and assume that these voters will break about 50-50, so that Clinton's lead will remain the same in the final count. However, history tells us that undecided voters usually break disproportionately in one direction. In my estimation, most of the undecided voters were people who normally vote Republican, who were reluctant to support Trump but had a difficult time crossing party lines to vote for Clinton. Their indecision was mostly, "Should I vote for Clinton, or should I vote for a third-party candidate (or not vote at all)?" However, when then-FBI Director James Comey announced a review of a newly discovered batch of emails, I think that most of the undecideds said, "Oh, I can't deal with more Clinton scandals, so I will hold my nose and vote for my party." Earlier estimates were that Clinton had the support of over 90% of Democrats, while Trump's support among Republicans was only in the low-80-percent range; in the actual vote, however, Trump won over 90% of Republicans. This tells me that the undecideds came home to the Republican Party.
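The arithmetic behind the 50-50 assumption fits in a few lines of Python. The poll numbers here are hypothetical, chosen only to show the mechanism: Clinton at 44%, Trump at 41%, and 15% undecided, with third-party voters and non-voters ignored for simplicity. If the undecideds split evenly, Clinton finishes at 51.5% to 48.5% and keeps her 3-point lead; if they break 30-70 toward Trump, the result flips to 48.5% to 51.5%, a 3-point Trump lead. The function name is illustrative.

```python
def allocate_undecided(clinton: float, trump: float, undecided: float,
                       to_clinton: float) -> tuple[float, float]:
    """Split the undecided share between the two candidates.

    All shares are in percentage points; `to_clinton` is the fraction of
    undecideds (0 to 1) who end up voting for Clinton. Third-party votes and
    non-voters are ignored to keep the arithmetic simple (an assumption).
    """
    if not 0.0 <= to_clinton <= 1.0:
        raise ValueError("to_clinton must be between 0 and 1")
    return (clinton + undecided * to_clinton,
            trump + undecided * (1.0 - to_clinton))


if __name__ == "__main__":
    # Hypothetical poll: Clinton 44, Trump 41, undecided 15.
    for to_clinton in (0.5, 0.3):
        c, t = allocate_undecided(44.0, 41.0, 15.0, to_clinton)
        print(f"{to_clinton:.0%} of undecideds to Clinton: "
              f"Clinton {c:.1f}, Trump {t:.1f}, margin {c - t:+.1f}")
```

The larger the undecided share, the more an uneven break can swing the margin, which is why a fourfold increase in undecideds makes the 50-50 assumption so consequential.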
The bottom line is that polls are supposed to sample a population, and that sample is supposed to be representative of the population. If you do not have a representative sample, your poll will necessarily be inaccurate. Because some actual voters may have been more suspicious of polls, the media, and anything that smacks of tradition, they probably did not answer the polls in sufficient numbers, resulting in a biased sample. This is why all of the polls seemed to support the notion that Clinton was going to win, which in fact did not happen. It is widely accepted among social scientists that any one poll may be wrong but that an aggregate of polls is accurate. The problem with that line of thinking is that averaging many polls cancels out only their random errors, not an error that all of them share. The analysts were blind to the fact that all of the polls were skewed in Clinton's direction, so of course she was systematically projected to be the winner.
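The difference between random and systematic error can be shown with a small simulation. In this hypothetical Python sketch, a candidate's true share of the two-party vote is set to 50%, and every simulated poll of 1,000 respondents shares the same 3-point tilt toward that candidate (for example, because her supporters were more willing to respond). Averaging 200 such polls shrinks the random noise to a fraction of a point, but the average still lands near 53%, not 50%. The parameter values and function names are illustrative assumptions, not estimates of the actual 2016 polls.

```python
import random
import statistics


def run_poll(true_share: float, bias: float, n: int, rng: random.Random) -> float:
    """One poll: each of `n` respondents backs the candidate with probability
    true_share + bias, i.e. every respondent comes from the same skewed pool."""
    p = true_share + bias
    return sum(rng.random() < p for _ in range(n)) / n


def poll_average(true_share: float, bias: float, n: int, polls: int,
                 seed: int = 0) -> float:
    """Average the results of `polls` independent polls that share one bias."""
    rng = random.Random(seed)
    return statistics.mean(run_poll(true_share, bias, n, rng) for _ in range(polls))


if __name__ == "__main__":
    # Hypothetical: true share 50%; compare no shared bias with a +3-point bias.
    for bias in (0.0, 0.03):
        avg = poll_average(true_share=0.50, bias=bias, n=1000, polls=200)
        print(f"shared bias {bias:+.2f}: average of 200 polls = {avg:.3f}")
```

With no shared bias, the average of many polls converges on the true 50%; with a shared bias, it converges just as confidently on the wrong number.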