Monday, April 27, 2009

Efforts to Defeat Roll-Off

For a project that started out in my Bayesian statistics class and has found its way to my research paper on roll-off, I decided to try running a model to test the effectiveness of the NC Coordinated Campaign's efforts to fight roll-off in light of what happened in the Primary. Democrats won up and down the ballot, so it seems obvious to the casual observer that roll-off wasn't an issue in the General. I used non-informative priors on a regression for the Primary race roll-off (% fewer ballots cast in each contest relative to the Presidential contest) and calculated posterior slopes on some relevant variables. The real variable of interest here was minority registration growth rate since January. I then used the posterior estimates from that regression to put priors on a General election model.

This framework provides a strong analysis of the effectiveness of the Coordinated Campaign. It is also a great example of using Bayesian statistics to do something frequentist methods have no effective structure for - examining a dataset in light of prior beliefs to investigate changes from the prior. Using the posteriors from the Primary as the priors for the model provides a more rigorous way of establishing the change in effect than could be established otherwise. In quick summary, it seems as though the Coordinated Campaign was successful, as evidenced by the plot below on the Governor's race.
In terms of interpreting this picture, the red distribution is the posterior estimate for the primary. This means that in the primary, in the average county, for every ten newly registered minority voters, three of those votes had disappeared by the time they got to the Governor's race. Now is the point where the savvy reader might interject, "But, Will, roll-off is always less in the General race, so this is not a huge concern." While savvy, that reader is failing to realize that these are slope estimates. With this model we are explaining the variance that exists geographically in roll-off. There is always variance around any mean - no matter what the mean is. Our interest is in what correlates with this variance. 

This brings us to the blue density. This line is what the data in the General election do to the prior. For those unfamiliar with Bayesian analysis, I'll butcher it and say - Primary(Red Line) Slope Estimate + General Election Data = Blue Estimate of General Election Slope. As is readily visible, this parameter is zero for all intents and purposes. It also has a lot of certainty associated with it. Whereas there was a lot of spread on the Primary estimate, there is none here.
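For readers who want to see the "prior + data = posterior" arithmetic, here is a minimal sketch of that kind of update. It is not the model from the paper: it assumes a single slope, no intercept (centered variables), a Normal prior, and a known noise standard deviation, which gives a closed-form answer. The function name, the prior spread, the noise level, and the simulated county data are all hypothetical; only the prior mean of 0.3 echoes the three-in-ten figure above.

```python
import numpy as np


def update_slope(prior_mean, prior_sd, x, y, noise_sd):
    """Posterior mean and sd of slope b in y = b*x + noise.

    Assumes a Normal(prior_mean, prior_sd) prior on b and Normal noise
    with a known standard deviation (a simplifying assumption).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    prior_precision = 1.0 / prior_sd**2
    data_precision = np.sum(x**2) / noise_sd**2
    post_precision = prior_precision + data_precision
    post_mean = (
        prior_precision * prior_mean + np.sum(x * y) / noise_sd**2
    ) / post_precision
    return post_mean, post_precision**-0.5


# Hypothetical General election data: 100 counties, true slope of zero.
rng = np.random.default_rng(0)
x = rng.normal(size=100)                      # centered registration growth
y = 0.0 * x + rng.normal(scale=0.05, size=100)  # roll-off, unrelated to x

# Prior taken from a Primary-style posterior: mean 0.3, sd 0.15 (made up).
post_mean, post_sd = update_slope(0.3, 0.15, x, y, noise_sd=0.05)
print(f"posterior slope: {post_mean:.3f} (sd {post_sd:.3f})")
```

When the new data carry far more information than the prior, the posterior sits near the data's slope and is much narrower than the prior, which is the pattern the red and blue densities show.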

In sum, my conclusion is that the Coordinated Campaign did an excellent job stamping out the problem of newly registered minority voters falling off the ballot. Well done, Keith, if you're reading this.

Tuesday, April 14, 2009

Roll-Off, Straight Ticket Voting, and African Americans

So I have now finished playing with my data, and have decided I can put myself out there and place some confidence in the following findings. First, I've got a couple of definitions for reference:

AATOcomp – proportion of turnout in a county which is registered as African American
STVrate – proportion of ballots from a county which used the straight-ticket voting option
Roll-off – proportion of decreased cast ballots in a particular race as compared to the Presidential contest
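As a quick illustration of the roll-off definition, here is a small sketch. The function name and the ballot counts are hypothetical and are not taken from my data.

```python
def roll_off(presidential_ballots: int, contest_ballots: int) -> float:
    """Proportion of ballots lost in a contest relative to the Presidential contest."""
    if presidential_ballots <= 0:
        raise ValueError("presidential_ballots must be positive")
    return (presidential_ballots - contest_ballots) / presidential_ballots


# Hypothetical county: 10,000 Presidential ballots, 9,200 in a down-ballot race.
example = roll_off(10_000, 9_200)
print(f"roll-off: {example:.1%}")
```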

I've distilled my thoughts into five key observations. These will form the basis of the paper that I'm working on with Professor Aldrich. Please feel free to respond with any thoughts in the comments section. These are just the data findings - what I feel like the numbers tell me - not publishable research. I'm working on developing a political theoretical framework on how they all make sense together. I don't expect to come up with a single unifying theory, but I am going to try to make as much sense of it all as I can.

Overall

1. There is a significant positive correlation between AATOcomp and STVrate. This causes issues in examining the effect of AATOcomp on roll-off due to the necessary relationship between STVrate and roll-off.

On Partisan Races

2. Without controlling for STVrate, higher levels of AATOcomp are associated with higher amounts of roll-off in top-of-ticket races, but this effect reverses as we go down the ticket and overall roll-off increases. The effect is significant but weak at the top of the ticket and becomes much stronger down the ballot (presumably as a result of a higher proportion of the remaining votes coming from STV).

3. By subtracting STV vote totals from the overall totals, we can examine roll-off among non-STV ballots.* Using this method, we find that the same positive association between AATOcomp and roll-off still exists in the upper ticket. Continuing to control for STVrate, as we progress down the ticket, the association between AATOcomp and roll-off disappears entirely – with one exception. After the judicial contests, there is the Soil and Water Conservation District Supervisor. In this race there is a very strong positive exponential effect between AATOcomp and roll-off (controlled for STV).

On Judicial Races

4. Surprisingly, there is only a weak positive correlation between judicial roll-off and STVrate. All correlations are positive, with frequentist p-values on the bivariate regression slopes below .15 but all above .01. Controlling for the effects of multiple tests, it is difficult to assert this as a strong effect.

5. In the judicial races, there is no correlation between AATOcomp and roll-off. This makes sense when considered in conjunction with the facts that (1) STVrate has a minimal effect on roll-off in the nonpartisan races (see Observation 4) and that (2) this is a continuation of the trend controlled for STVrate as identified in Observation 3.

*It is important to remember here that we can only study the behavior in the aggregate. We can use this method to understand the voting behavior of the county, but not individuals.  
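The multiple-testing caveat in Observation 4 can be made concrete with a Bonferroni correction, which is one common adjustment and not necessarily the one the paper will use. In the sketch below the five p-values are hypothetical stand-ins chosen to fall in the reported .01 to .15 range, and the function name is my own.

```python
def bonferroni_survivors(p_values, alpha=0.05):
    """Return the p-values that stay significant after a Bonferroni correction."""
    threshold = alpha / len(p_values)  # each test must clear alpha / m
    return [p for p in p_values if p < threshold]


# Hypothetical p-values, one per judicial race, all between .01 and .15.
hypothetical_p = [0.02, 0.04, 0.07, 0.11, 0.14]
survivors = bonferroni_survivors(hypothetical_p)
print(f"tests surviving correction: {len(survivors)}")
```

Since every p-value is above .01, none can survive a 5% Bonferroni threshold once there are five or more tests, because the per-test cutoff is then .01 or lower.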

Tuesday, February 3, 2009

Cell Phone Only Population


Brian Schaffner over at Pollster.com provides some interesting insight into the Cell Phone Only population. He ran a multinomial logit model based on Pew Survey data to control for a host of demographic factors to see if preferences were the same. Despite an extraordinarily inclusive list of controls (even party ID), he still found differences between the CPO and landline populations (though not statistically significant if you look at his 90% CIs carefully).

The most interesting thing to me is the staggering difference in sampling errors for each of the groups. Look at how much wider the 90% intervals are for the CPOs than the landliners. I'm not sure what exactly we can take that to be indicative of in the populations, but it certainly is fascinating.

While I want to agree with the results on a gut level, I have some concerns about the structure of his model. When I checked out the Stata output Schaffner provides, one thing that stuck out to me was the p-value of .845 on the age variable predicting the McCain/lean outcome relative to the base of Obama. I'm not sure if there might be some multicollinearity issues here. I'm going to download the data myself and play around some. I'm off to lunch now, but soon I will revisit this. I'll report back later.

Sunday, February 1, 2009

Senate Debate on Defense

Harvard's Social Science Statistics Blog linked to this really cool visualization. It depicts the change in partisan frequency of different terms in Senate debates on defense from 1997-2004. I've posted the video below. The absence of partisanship post-9/11 and then surge around Iraq is really cool to see - even if it is hardly profound.


Saturday, January 31, 2009

America's Most Accurate Pollster

I missed this from mid-December, but I can't help but comment...

Investor's Business Daily put out an editorial declaring their IBD/TIPP poll the best in the nation - again. They called the popular vote margin to the decimal in 2008 and were the closest (off by 0.3%) in 2004. I extend all due congratulations - particularly for an astute allocation of the undecideds, but I just love this link...

Well they certainly were unique in calling the 18-24 year old vote for McCain 74%-22% in late October. Campus was CRAZY that week...

Saturday, January 24, 2009

Cell Phones and Surveys

Behind the Numbers ran a post by Jennifer Agiesta on cell phone users and surveys that caught my interest. She aggregated their tracking poll data and looked at the pool to study the effect of cell phones on sampling. She makes two fascinating statements that I've seen before, and I take issue with both:

1. The six versus nine point margin difference between the landline only sample and the complete one with cells is not statistically significant.

2. Weighting the landline sample to the age composition of the exit poll takes care of the discrepancy.

Both of these points are flawed. On the first point, I doubt that the difference is insignificant. The daily margin of error might be three points but these numbers are from aggregated tracking. The margin of error has to be pretty small. But even so the argument is flawed. If I didn't know the results but you asked me which candidate's supporters would be under-represented in a landline only sample, I would be a fool not to pick Obama. The margin of error is a function of the random and unbiased variation we can expect from the nature of proper sampling. Since we can predict the direction of effect of not sampling cell phones, it is specious to compare the size of the effect to the unbiased error we can expect to see. The margin works both ways. One day the sample might advantage the Democrat, and the next it could be the Republican. In a given election cycle, the cell vote does not change direction randomly as a bloc (per the currently accepted theories of the cell vote). Adding in the cell sample changed the margin of victory by three percentage points. That's a decent amount.

In sum, error is random, and bias is not. It is senseless to compare one to the other to downplay its importance.
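To put rough numbers on the first point, here is a sketch of how the sampling margin of error shrinks when daily tracking samples are pooled. The daily sample size and the number of days are hypothetical, since I don't have the actual tracking poll counts, and the formula is the textbook 95% margin for a single proportion under simple random sampling (the margin on the gap between two candidates is roughly double that).

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)


daily_n = 1067  # hypothetical daily sample giving about a 3-point margin
days = 10       # hypothetical number of pooled tracking days

daily = margin_of_error(daily_n)
pooled = margin_of_error(daily_n * days)
print(f"daily margin:  {daily:.1%}")
print(f"pooled margin: {pooled:.1%}")
```

Pooling shrinks the random error by the square root of the number of days, but it does nothing to a bias that points the same way every day. A three-point shift from leaving out cell phones stays three points no matter how many interviews are pooled.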

The second point is a dangerous one. It leads to practices that cause inaccuracy. It trends toward the idea that undersampling and weighting to counter is always an acceptable way of correcting for a cell bias. First, putting a particular weight on cell user turnout is a tricky game. Age corrected the discrepancy for McCain-Obama, but what about other candidate pairs, issues and other poll questions? Second, it justifies undersampling that leads to absurd results. Check that link out. McCain beating Obama 74-22 in the 18-24 vote? There's no way.
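Here is a toy calculation of why age weighting is not a guaranteed fix. Every number in it is made up for illustration: two age groups, the share of each that is cell-only, and support for a candidate within each age-by-phone cell. The assumption doing the work is that cell-only voters differ from landline voters even within the same age group.

```python
# All figures below are hypothetical, chosen only to illustrate the argument.
age_share = {"18-29": 0.20, "30+": 0.80}        # share of the electorate
cell_only_share = {"18-29": 0.50, "30+": 0.10}  # cell-only share within age group
support = {                                     # support for one candidate
    ("18-29", "landline"): 0.60, ("18-29", "cell"): 0.70,
    ("30+", "landline"): 0.45, ("30+", "cell"): 0.50,
}


def true_support() -> float:
    """Support in the full population (landline and cell-only)."""
    total = 0.0
    for age, share in age_share.items():
        cell = cell_only_share[age]
        total += share * ((1 - cell) * support[(age, "landline")]
                          + cell * support[(age, "cell")])
    return total


def landline_unweighted() -> float:
    """Support among landline respondents with no weighting."""
    reach = {a: age_share[a] * (1 - cell_only_share[a]) for a in age_share}
    hits = sum(reach[a] * support[(a, "landline")] for a in age_share)
    return hits / sum(reach.values())


def landline_age_weighted() -> float:
    """Landline respondents reweighted to the true age composition."""
    return sum(age_share[a] * support[(a, "landline")] for a in age_share)


print(f"full population:       {true_support():.3f}")
print(f"landline, unweighted:  {landline_unweighted():.3f}")
print(f"landline, age-weighted: {landline_age_weighted():.3f}")
```

With these made-up inputs, the age-weighted landline estimate (0.48) lands closer to the full-population value (0.494) than the unweighted one (about 0.468), but a gap remains. Weighting by age only closes the gap completely if cell-only and landline voters of the same age answer the question the same way, and that can differ from one question to the next.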