Perhaps the key quote is: “All polls have random sampling error, inherent in relying on a sample of the population. And they all have to figure out who will vote and what their preferences are. The best polls are transparent about this.”
Much of the rest of the article describes the sorts of “adjustments” that pollsters make to convert their raw data into publishable polls.
The Post’s discussion, which describes only the categories of adjustments rather than specifics, thus raises important questions: How much transparency is required for a poll to qualify as transparent? Do the Post’s own polls meet the transparency test that its polling director sets?
AAPOR is upfront about the sort of transparency it has in mind—and it’s not transparency to the public.
The “Transparency Initiative is designed to promote methodological disclosure ... that assists survey organizations. ... [It] is an approach to the goal of an open science of survey research.” AAPOR’s goal is to get professional pollsters to share the lessons they learn about methodology in order to improve future polls. Stale data, while of little interest to the broad public, remains interesting to professionals studying polling, statistics, survey methods, and public opinion.
Nothing is wrong—and a great deal is right—with that goal. AAPOR encourages proprietary organizations to share the lessons they’ve learned, without asking them to reveal data that may still have commercial value. There’s little doubt that allowing pollsters to keep their raw data and adjustments proprietary for a year increases participation.
With that in mind, it’s worth noting that on its way to cheering transparency, the Post referred to two very different phenomena: “random sampling error” and “[figuring] out who will vote.”
Random sampling error is a core concept in statistical science. Statistics teaches that the preferences of a randomly drawn, representative sample of a large population will approximate those of the entire population. Thus, it’s not necessary to ask everyone in California how they will vote to predict the California vote; a relatively small sampling of Californians will produce a pretty good approximation. Random sampling error is the gap between that approximation and the true figure that arises purely from the luck of who happened to be sampled; the familiar “margin of error” describes how large that gap is likely to be.
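To put rough numbers on that idea, here is a minimal sketch of the textbook margin-of-error formula for a simple random sample. The function name, the 50 percent support figure, and the sample sizes are illustrative assumptions, not anything drawn from the Post or AAPOR, and real polls with weighting will generally have larger margins than this idealized case.

```python
import math

def margin_of_error(share: float, sample_size: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion from a simple random sample.

    Assumes an idealized random sample; z = 1.96 is the usual 95% multiplier.
    """
    return z * math.sqrt(share * (1 - share) / sample_size)

# Hypothetical sample sizes, with support assumed to be 50%.
for n in (100, 1000, 10000):
    moe = margin_of_error(0.5, n)
    print(f"n = {n:>5}: about +/- {100 * moe:.1f} points")
```

Under these assumptions, a sample of about 1,000 people yields a margin of roughly three points either way, which is why a relatively small sample can stand in for a very large population.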
Because every poll incorporates sampling error, and because such errors are indeed random, it’s possible to shrink them. The simplest way to reduce random errors is to take many measurements of the same item, then average the results; errors that fall randomly on either side of the truth tend to cancel. The prominent RealClearPolitics average of polls follows the logic of this approach; it treats each published poll as a distinct attempt to measure public opinion, each prone to randomly distributed sampling errors.
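The logic of averaging can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true level of support, the sample size, and the number of polls per average are hypothetical, each simulated poll is an idealized random sample with no adjustments, and the plain average used here is not a claim about how RealClearPolitics computes its figure.

```python
import random
import statistics

def simulate_poll(true_share, sample_size, rng):
    """Share of simulated respondents backing the candidate in one poll."""
    hits = sum(rng.random() < true_share for _ in range(sample_size))
    return hits / sample_size

def poll_average(true_share, sample_size, n_polls, rng):
    """Plain average of several independent simulated polls."""
    polls = [simulate_poll(true_share, sample_size, rng) for _ in range(n_polls)]
    return statistics.mean(polls)

rng = random.Random(2020)  # fixed seed so the run is repeatable
TRUE_SHARE = 0.52          # hypothetical true support
SAMPLE_SIZE = 400          # hypothetical respondents per poll
N_POLLS = 10               # hypothetical number of polls in each average
TRIALS = 200               # how many times each experiment is repeated

single_polls = [simulate_poll(TRUE_SHARE, SAMPLE_SIZE, rng) for _ in range(TRIALS)]
averages = [poll_average(TRUE_SHARE, SAMPLE_SIZE, N_POLLS, rng) for _ in range(TRIALS)]

# Standard deviation measures how widely the results scatter from run to run.
single_spread = statistics.pstdev(single_polls)
average_spread = statistics.pstdev(averages)
print(f"spread of individual polls: {single_spread:.4f}")
print(f"spread of poll averages:    {average_spread:.4f}")
```

In this idealized setting the averages scatter far less than the individual polls do, which is the sense in which averaging tames random error.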
Figuring out who will vote is another matter.
As the Post makes clear, “figuring” is more art than science. When and how to adjust is a matter of pollster judgment, not statistical science. While good adjustments can debias raw data, the wrong adjustments can introduce non-random errors—“systematic bias.” Averaging can’t remove systematic biases that arise from human biases rather than from mathematics. If many pollsters draw their judgments from a shared conventional wisdom, their results will show the same bias, and averaging them can’t cancel it.
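A short simulation shows why averaging is no help here. It is a sketch, not a model of any actual pollster: the true level of support, the three-point shared bias, and the poll counts are hypothetical, and the bias is represented crudely as a fixed shift that every simulated pollster applies in the same direction.

```python
import random
import statistics

def poll_with_shared_bias(true_share, shared_bias, sample_size, rng):
    """One simulated poll whose adjustments shift the result by shared_bias.

    Assumption: the bias acts as a fixed shift in measured support,
    identical for every pollster.
    """
    measured_share = true_share + shared_bias
    hits = sum(rng.random() < measured_share for _ in range(sample_size))
    return hits / sample_size

rng = random.Random(2020)  # fixed seed so the run is repeatable
TRUE_SHARE = 0.50          # hypothetical true support
SHARED_BIAS = 0.03         # hypothetical bias common to all pollsters
SAMPLE_SIZE = 500          # hypothetical respondents per poll
N_POLLS = 200              # hypothetical number of polls averaged

polls = [
    poll_with_shared_bias(TRUE_SHARE, SHARED_BIAS, SAMPLE_SIZE, rng)
    for _ in range(N_POLLS)
]
average = statistics.mean(polls)

print(f"true support:     {TRUE_SHARE:.3f}")
print(f"average of polls: {average:.3f}")
print(f"remaining error:  {average - TRUE_SHARE:+.3f}")
```

However many polls go into the average, the result settles near the true figure plus the shared bias rather than near the true figure itself. The random scatter washes out; the common error does not.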
At the end of the day, polling organizations—as businesses—are well within their rights to withhold raw data and adjustments, and show only what they want the public to see. The suggestion that they’re being “transparent” when they do so, however, is misleading and hypocritical (at best). What the public sees is a polished number combining raw data and pollster judgment.
One consequence of the Transparency Initiative’s push toward sharing is that all pollsters will draw upon the same pooled conventional wisdom. They will thus all show systematic biases in the same direction. Past performance suggests that the direction is to the left, which in this case means toward Joe Biden. The few public pollsters deploying unique techniques and boasting superior track records are projecting a different race, one that leans, or in some cases leaps, toward a Donald Trump victory.
Actual answers won’t be known for a few more days. When the dust settles, however, don’t be shocked if systematic errors and pollster bias are responsible for much of Biden’s lead in the polls.