A week ago I predicted the second half of the Democratic primaries. Six states have voted since then. How did my predictions fare?
Well, not too great. Arizona was the only one that came close; Idaho was way off. I overestimated in Arizona and Hawaii and underestimated the rest. On the other hand, I did call the meaningless “win/lose” status correctly for all six. But those calls were pretty easy, with the possible exception of Arizona. I’m still not sure how much to believe the “actual” outcome in Arizona, given the widespread complaints of voter suppression (possible systematic election fraud?).
Incorporating the six new data points into my analysis and re-running it (with a few other small changes) shifted the variable-importance plot. Let’s see the plot and then break down a few of the changes.
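For anyone curious how a plot like this gets made, here is a minimal sketch. The post doesn’t say which model I fit, so the random forest and the toy data below are purely illustrative; the real data and code are in the GitHub repo.

```python
# Illustrative sketch of computing variable importances.
# The random forest and the synthetic data are assumptions, not my actual model.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
predictors = ["Date", "IndustryFinance", "RaceBlack", "FBshare"]
X = rng.normal(size=(40, len(predictors)))          # toy state-level predictors
y = X[:, 1] * -2 + X[:, 0] + rng.normal(size=40)    # toy vote-share outcome

model = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
for name, imp in sorted(zip(predictors, model.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:16s} {imp:.3f}")
```

The importances are normalized to sum to one, so the plot is really about relative ranking, not absolute effect sizes.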
- Date: Momentum? Probably not so much. A string of big wins following a much rougher early half is pretty much guaranteed to produce a positive time trend. I don’t really believe this effect will persist for the rest of the primary season. For that reason, for all predictions of future states I treat the date they vote as if it were today.
- IndustryFinance: Still the largest negative effect. Still the most neglected explanation of this election season?
- RaceBlack now appears relatively stronger, and RaceAsian less so. These estimates are probably closer to their true effects now, given what we know from exit polls. RaceHispanic now appears negative, likely because Nevada and Arizona went differently from the other western states. How Hispanic people will vote in California, for example, is a very big and important unanswered question.
- Poverty and wealth both seem to favor Clinton, while education beyond the high school level favors Sanders.
- Internet predictors: [high speed] InternetAccess, the share of Facebook likes FBshare, and Google relative search volume in the month before the election now appear stronger than before. One reason, for Google specifically, is that before I used relative search volume during one period of time for all states, whereas now I have limited it to the 30 days before the election. For states that have not voted yet, the time window is from 30 days ago to today. It’s important to note that this predictor might change before each election as candidates run ads in that state and some undecided voters go online to look up their choices (does anyone actually do that? Honest question).
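The moving search-volume window described above can be sketched as follows. The function name is mine, the dates are illustrative, and this is a simplification of the actual pipeline:

```python
from datetime import date, timedelta

def google_trends_window(election_day: date, today: date) -> tuple[date, date]:
    """Return the (start, end) window for relative search volume.

    States that have voted use the 30 days before their election;
    upcoming states use the trailing 30 days ending today.
    """
    if election_day <= today:  # state has already voted
        return election_day - timedelta(days=30), election_day
    return today - timedelta(days=30), today  # still upcoming

# Real 2016 dates: Arizona voted March 22; New York votes April 19.
today = date(2016, 4, 1)
print(google_trends_window(date(2016, 3, 22), today))  # fixed pre-election window
print(google_trends_window(date(2016, 4, 19), today))  # trailing window, moves daily
```

Because the window for unvoted states slides forward each day, this predictor gets re-pulled every time the model is re-run, which is exactly why it can change between now and election day.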
Given the time-sensitive nature of the model – now that Google search volume is restricted to specific time periods and the estimated Date trend is large – I’m limiting my predictions to contests occurring within the next month.
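The one-month horizon, combined with the "Date = today" assumption from the bullet above, amounts to a simple filter like this (the schedule dates are the real 2016 contest dates; the data structure and function name are just for illustration):

```python
from datetime import date, timedelta

# Real 2016 Democratic contest dates, toy structure for illustration.
schedule = {
    "Wisconsin": date(2016, 4, 5),
    "Wyoming": date(2016, 4, 9),
    "New York": date(2016, 4, 19),
    "California": date(2016, 6, 7),
}

def contests_within_a_month(today: date) -> dict[str, date]:
    """Keep contests in the next 30 days; freeze each one's Date
    predictor at today rather than its actual vote date."""
    horizon = today + timedelta(days=30)
    return {state: today  # Date predictor set to today, per the assumption above
            for state, vote_day in schedule.items()
            if today <= vote_day <= horizon}

print(contests_within_a_month(date(2016, 4, 1)))
```

Run from April 1, this keeps Wisconsin, Wyoming, and New York but drops California, which is why June states don’t appear in the predictions below.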
How believable are these? I’m pretty sure Sanders will win Wisconsin and Wyoming, though perhaps not by so large a margin in Wisconsin. After that, I think some things are being influenced too heavily by the IndustryFinance variable. Look at Delaware: that’s just not believable. New York is more than 20 days away, and the Sanders campaign just opened offices there. It’s possible the grassroots machine, with Eye-of-Sauron-level focus on New York for the 10 days before its election, will change things. That might be reflected in a change in relative Google search volume compared to the current time window.
Sanders supporters should not become complacent. If my predictions for all the remaining states are accurate, Sanders will still lose among pledged delegates by a wide enough margin that even if some superdelegates switch, it wouldn’t be enough. Whatever effect the movement had in influencing Michigan, it will need to work even harder for a large win in Wisconsin, and probably twice as hard again just to break even in New York.
(Note: interested parties can find my data and code here on GitHub).