An interesting article showed up on CBC News suggesting that the discrepancy between polls showing either the PCs or the Liberals leading in the 42nd Canadian federal election may come down to methodology. My mind went right back to the statistical tests I ran for my (Computer Science) PhD thesis, and it seemed like a good time to quickly flex that old muscle and find out whether the pollsters are really misleading us. You see, every poll's "margin of error" depends on the underlying assumption that you've sampled randomly... but there may be people who only answer online polls, or who hang up when an automated poll call comes, and they may tend to vote for one party more than another.
CBC has a cool page listing all the polling results from the different pollsters (9 in total) for this election. Five pollsters use online panels, three use automated phone calls (IVR, a.k.a. robocalls), and one uses live telephone agents. Using a few non-parametric tests (to avoid issues with different margins of error across polls), we can get an idea of whether the polling method is biased. Eight of the 9 pollsters produced numbers in the few days following the September 28th leaders' debate, so we'll be brash and say we're minimizing the risk of introducing a time bias into our sampling.
The first simple test that came to mind was the Friedman test, which checks whether the ranking of parties across all the different pollsters' results is random (the null hypothesis, H0). There's a cool online tool to run it quickly.
Munk Debate (A=PCs, B=NDP, C=Liberals). The PCs are leading across the polls in aggregate (i.e. Group A has the largest rank sum). Note that I used the latest Nanos data (Sample 9, live telephone agent) for this test, but the Nanos rankings don't change over the Sep 28th period, so the result is the same.
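To make the mechanics concrete, here's a minimal sketch of the Friedman statistic computed by hand in Python rather than via the online tool. The poll percentages below are made up for illustration; only the A > C > B ordering mimics the situation in the table above.

```python
# Friedman test sketch on made-up poll numbers (NOT the real Poll Tracker
# data). Rows = pollsters (blocks), columns = parties (A=PCs, B=NDP,
# C=Liberals, as in the table above).
def friedman_statistic(table):
    """Friedman chi-square statistic for a blocks-by-treatments table."""
    n = len(table)      # pollsters (blocks)
    k = len(table[0])   # parties (treatments)
    rank_sums = [0.0] * k
    for row in table:
        # Rank each pollster's numbers within the row, averaging ties.
        order = sorted(range(k), key=lambda j: row[j])
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1   # mean of the tied rank positions
            for m in range(i, j + 1):
                rank_sums[order[m]] += avg_rank
            i = j + 1
    # Approximately chi-square with k-1 degrees of freedom for large n;
    # large values mean the party ranking across pollsters is not random.
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)

# Hypothetical support percentages from five pollsters.
polls = [
    [33, 24, 31],
    [32, 25, 30],
    [34, 23, 29],
    [31, 26, 30],
    [33, 22, 31],
]
stat = friedman_statistic(polls)
print(stat)  # 10.0 here: party A ranks first with every pollster
```

Because party A ranks first in every one of the five made-up polls, the statistic is large; shuffling the rows' rankings would drive it toward zero.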
Is that better ranking due to the uneven mix of polling methods used? At this point we pull out the Kruskal-Wallis test and run it once for each party.
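A similar hand-rolled sketch of the Kruskal-Wallis statistic, with one party's numbers grouped by polling method. The percentages are made up; only the group sizes (5 online, 3 IVR, 1 live agent) mirror the actual pollster split.

```python
# Kruskal-Wallis sketch: one party's (made-up) poll numbers grouped by
# methodology -- 5 online, 3 IVR, 1 live-agent pollster.
def kruskal_wallis(groups):
    """Kruskal-Wallis H statistic (no tie correction factor)."""
    pooled = sorted((value, gi) for gi, group in enumerate(groups)
                    for value in group)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1   # average rank across any ties
        for m in range(i, j + 1):
            rank_sums[pooled[m][1]] += avg_rank
        i = j + 1
    # Large H => the groups' rank distributions differ, i.e. the polling
    # method may matter for this party's numbers.
    return 12.0 / (n * (n + 1)) * sum(
        rs * rs / len(g) for rs, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)

online = [32, 31, 33, 30, 34]   # hypothetical online-panel numbers (%)
ivr    = [29, 28, 30]           # hypothetical IVR numbers
live   = [35]                   # hypothetical live-agent number
h = kruskal_wallis([online, ivr, live])
print(round(h, 3))
```

With groups this small, comparing H against the usual chi-square approximation is unreliable, which is exactly the small-sample caveat raised in the footnote.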
The test statistics for the Conservatives and NDP do not show a substantial bias in respondents preferring a particular polling methodology (p > 0.1). Of course, it could be that some party is systematically underrepresented across all the polling methods; we can't tell that from these data. Note that there are plenty of limitations to these tests (e.g. no multiple testing correction), and hopefully this will start a dialogue about how much we can trust polling numbers in the new millennium of smartphones, the Web, and jaded call recipients. Rerunning these tests to include all 143 poll results in the poll tracker (where you could get a decent q-value) is left as an exercise for the reader, but the more important exercise is to get out and vote on October 19th, in the only poll that really matters.
*Note the warning at the bottom of the table about not trusting the p-value when a group has fewer than 5 items. Let me digress for a second and say that this awesome online tool is, of course, limited: it uses an approximate (chi-square) method unsuitable for small samples. You can calculate an exact p-value for smaller samples using a permutation/bootstrap approach in a language like R. I started doing this...
So a Kruskal-Wallis statistic of 4.02 has a 10% false positive rate, and 4.87 has a 5% false positive rate. The Liberals' Kruskal-Wallis statistic falls somewhere between these two.
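Small-sample critical values like these can be derived by brute force (assuming no ties): with 9 untied observations the data reduce to the ranks 1..9, so we can enumerate every way of splitting those ranks into groups of 5, 3 and 1 and read exact tail probabilities off the resulting null distribution. A sketch in Python rather than the R mentioned above; the observed value of 4.5 is a placeholder, not a real party's statistic.

```python
# Exact Kruskal-Wallis null distribution for group sizes (5, 3, 1),
# i.e. the online/IVR/live-agent split. Assumes no ties, so the 9
# observations reduce to the ranks 1..9.
from itertools import combinations

SIZES = (5, 3, 1)
N = sum(SIZES)

def h_from_rank_sums(rank_sums):
    """Kruskal-Wallis H given each group's rank sum (untied data)."""
    return 12.0 / (N * (N + 1)) * sum(
        rs * rs / size for rs, size in zip(rank_sums, SIZES)
    ) - 3 * (N + 1)

all_ranks = set(range(1, N + 1))
null_h = []
for online in combinations(all_ranks, SIZES[0]):
    rest = all_ranks - set(online)
    for ivr in combinations(rest, SIZES[1]):
        live = rest - set(ivr)   # the lone live-agent rank
        null_h.append(h_from_rank_sums((sum(online), sum(ivr), sum(live))))

# Exact p-value for an observed H: the fraction of equally likely
# rank assignments at least that extreme (4.5 is a placeholder).
observed = 4.5
p_exact = sum(h >= observed for h in null_h) / len(null_h)
print(len(null_h), round(p_exact, 4))
```

There are C(9,5)·C(4,3) = 504 equally likely assignments, so the loop is instant; sorting `null_h` and reading off the 90th and 95th percentiles gives the kind of small-sample critical values the table's warning tells you to use instead of the chi-square approximation.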