Tuesday, October 6, 2015

Polls, Damn Polls, and Statistics

tl;dr Polling method matters, probably most for potential Liberal supporters, who likely answer robocalls less often than other voters.

An interesting article showed up on CBC News suggesting that the discrepancy between polls showing either the PCs or the Liberals in the lead in the 42nd Canadian federal election may come down to methodology. My mind went right back to the statistical tests I ran for my (Computer Science) PhD thesis, and it seemed like a good time to quickly flex that old muscle and find out if we're really being misled by the pollsters. You see, all the "margins of error" for polls depend on the underlying assumption that you've sampled randomly... but there may be people who only do online polls, or who hang up when an automated poll call comes, and they may tend to vote for one party more than another.

CBC has a cool page listing all the polling results from different pollsters (9 in total) for this election. Five pollsters use online methods, three use robocalls (IVR), and one uses live telephone agents. Using a few non-parametric tests (to avoid issues with different margins of error for different polls), we can get an idea of whether the polling method is biased. Eight of the 9 pollsters produced numbers in the few days following the September 28th leaders' debate, so we'll be brash and say we're minimizing the risk of introducing a time bias into our sampling.

The first simple test that came to mind was the Friedman test, with which we can check whether the ranking of parties across all the different pollsters' results is random (the null hypothesis, H0). There's a cool tool on-line to do this quickly.
*Under the null hypothesis of a three-way tie just after the Munk Debate, rankings this consistent would occur only ~1% of the time (A=PCs, B=NDP, C=Liberals). The PCs are leading across the polls in aggregate (i.e. Group A has the largest rank). Note that I used the latest Nanos data (Sample 9, live telephone agent) for this test, but the Nanos rankings don't change for the Sep 28th period and so the result is the same.
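
If you'd rather script this than use the on-line tool, here's a sketch of the same test with scipy. The poll numbers below are made-up placeholders with a perfectly consistent PC > Liberal > NDP ordering, not the real tracker values:

```python
# Friedman test: is the ranking of the three parties random across pollsters?
# The eight columns stand in for the eight post-debate polls ("blocks");
# all percentages here are hypothetical, chosen only to illustrate the test.
from scipy.stats import friedmanchisquare

pc      = [33, 32, 34, 31, 33, 35, 32, 34]  # assumed PC toplines (%)
ndp     = [26, 27, 25, 28, 26, 24, 27, 25]  # assumed NDP toplines (%)
liberal = [30, 31, 29, 30, 31, 30, 29, 31]  # assumed Liberal toplines (%)

stat, p = friedmanchisquare(pc, ndp, liberal)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")
# A perfectly consistent ordering across all eight pollsters rejects the
# null of "random ranking" well below the 1% level.
```

With real poll numbers the ordering won't be this clean, but the mechanics are the same: each pollster contributes one rank per party, and the statistic asks how unlikely the aggregate ranks would be under random shuffling.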

Is that better ranking due to the uneven number of polling methods used? At this point we pull out the Kruskal-Wallis test and run it for each party.

Liberals:

Note that since there is only one sample in group C, its value has no effect on the test statistic, so in the end it doesn't matter what I put in that box and I can't say anything about live agent response bias. IVR vs online is another matter though! The result is significant at the 90% confidence level but just misses the conventional 95% ("nineteen times out of twenty") threshold: Liberal voters appear to answer substantially more online polls than PC voters, substantially fewer IVRs, or both.* The latter seems a bit more likely, since the former explanation would require major Liberal overrepresentation in both online polls and the Nanos live agent telephone poll (where they lead). Also, this Master's thesis (thanks to Dennis Stevens for pointing it out) argues that IVR catches fewer undecided voters, so voters supporting the one and only "right wing" party are more likely to respond.
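
For the curious, the per-party test looks like this in scipy. Again the values are hypothetical stand-ins, grouped the way the tracker is: five online polls, three IVR, one live-agent:

```python
# Kruskal-Wallis: are one party's poll numbers drawn from the same
# distribution regardless of polling method? All values are illustrative
# placeholders, not the real Liberal toplines.
from scipy.stats import kruskal

online = [31, 30, 32, 31, 29]  # assumed Liberal % from the five online pollsters
ivr    = [26, 27, 25]          # assumed Liberal % from the three IVR pollsters
live   = [33]                  # assumed Liberal % from the live-agent pollster

stat, p = kruskal(online, ivr, live)
print(f"H = {stat:.2f}, p = {p:.4f}")
# Caveat: scipy's p-value comes from the chi-square approximation, which
# (as discussed in the digression) is unreliable for groups this small.
```

As noted above, the single live-agent value cannot move the ranks of the other groups much on its own, so the test is really an online-vs-IVR comparison here.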

The test statistics for the Conservatives and NDP do not show a substantial bias toward responding to a particular polling methodology (p > 0.1). Of course, it could be that some party is systematically underrepresented in all the polling methods; we can't tell that from this data. Note that there are plenty of limitations to these tests (e.g. no multiple testing correction), and hopefully this will start a dialogue about how much we can trust polling numbers in the new millennium with smartphones, the Web, and jaded call recipients. Rerunning these tests to include all 143 poll results (where you could get a good q-value) in the poll tracker is left as an exercise to the reader, but the more important exercise is to get out and vote on October 19th, in the only poll that really matters.

[start digression]
*Note the warning at the bottom of the table about not trusting a p-value when a group has fewer than 5 items. Let me digress for a second and say that this awesome on-line tool is of course limited and uses an approximate method unsuitable for small samples. You can calculate an exact p-value for smaller samples using a permutation approach in a computer language like R. I started doing this...

Since I'm just doing a quick check here, I don't feel like installing the packages required to calculate an exact p-value, which naively requires running through 9! (i.e. 362,880) rank orderings, by the way (though only 9!/(5!·3!·1!) = 504 distinct group assignments actually matter). Luckily, I remembered that there are people who love doing these types of calculations, and I can just look up exact p-value thresholds for Kruskal-Wallis for groupings of 5, 3, and 1:

So a Kruskal-Wallis H of 4.02 corresponds to a 10% false positive rate, and 4.87 to a 5% rate. The Liberals' H statistic falls somewhere between these two, hence 0.05 < p < 0.1.
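
It turns out the exact null distribution for group sizes of 5, 3, and 1 is small enough to enumerate directly with nothing but the standard library (sketched in Python rather than R purely for convenience):

```python
# Exact Kruskal-Wallis null distribution for group sizes (5, 3, 1), built by
# enumerating every way to split the ranks 1..9 into the three groups.
# Pure standard library; nothing to install.
from itertools import combinations
from math import factorial

N = 9
SIZES = (5, 3, 1)

def h_stat(groups):
    """Kruskal-Wallis H for a partition of the ranks 1..N (no ties)."""
    return 12.0 / (N * (N + 1)) * sum(sum(g) ** 2 / len(g) for g in groups) \
        - 3 * (N + 1)

ranks = set(range(1, N + 1))
null_dist = []
for g1 in combinations(sorted(ranks), SIZES[0]):
    rest = ranks - set(g1)
    for g2 in combinations(sorted(rest), SIZES[1]):
        g3 = tuple(rest - set(g2))
        null_dist.append(h_stat((g1, g2, g3)))

# Only distinct group assignments matter: 9!/(5!*3!*1!) = 504 of them.
assert len(null_dist) == factorial(9) // (factorial(5) * factorial(3))

def exact_p(h_obs, eps=1e-9):
    """Exact p-value: fraction of assignments with H at least h_obs."""
    return sum(h >= h_obs - eps for h in null_dist) / len(null_dist)

print(f"exact p at H = 4.02: {exact_p(4.02):.3f}")
print(f"exact p at H = 4.87: {exact_p(4.87):.3f}")
```

This assumes no ties among the nine poll values; with ties you'd enumerate the observed (tied) ranks instead and apply the usual tie correction to H.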

[end digression]