Predicting the second half of the Democratic primary

I’m a statistician and I support Bernie Sanders.

So this is how I spent part of my Saturday.

I gathered data from sources including the US census, Wikipedia, WolframAlpha, and some others (the Facebook map data from FiveThirtyEight, Google trends data using the gtrendsR package). Using this data–and not any polls–I built a predictive model in a standard fashion using ridge regression with cross-validation to choose the level of regularization. Since I only had data for 26 states that have voted so far, there is reason to believe (7-fold) cross-validation will not be very stable, so I also averaged the predictions from 100 models randomly generated by using different splits for cross-validation. The plot below shows that the resulting predictions come pretty close to Senator Sanders’s actual share of the vote. (Important note: that is what I’m predicting, vote share, not probability of winning).
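
To make the procedure concrete, here is a minimal sketch in R of the fitting scheme described above, using the glmnet package. The data objects here (x, y, x_new) are simulated placeholders standing in for the real census/Facebook/Google-trends dataset, which I am not reproducing.

library(glmnet)
set.seed(1)

# Placeholder data: 26 rows of standardized predictors `x`, actual vote shares `y`,
# and predictors `x_new` for the states that have not voted yet (all simulated)
x     <- scale(matrix(rnorm(26 * 12), nrow = 26))
y     <- rnorm(26, mean = 45, sd = 15)
x_new <- scale(matrix(rnorm(26 * 12), nrow = 26))

# Average the predictions of 100 ridge fits (alpha = 0), each choosing the
# regularization level lambda by a different random 7-fold cross-validation split
preds <- replicate(100, {
  cv <- cv.glmnet(x, y, alpha = 0, nfolds = 7)
  as.numeric(predict(cv, newx = x_new, s = "lambda.min"))
})
pred_avg <- rowMeans(preds)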

[Plot: model predictions vs. Sanders's actual vote share in the 26 states that have voted]

The individual results for each state are listed below. My apologies for the absurdly large table… WordPress won’t let me change it (clearly they are part of the political establishment).

State Actual Predicted Error
Iowa 49.6 50.8 1.2
New Hampshire 60.4 60.8 0.4
Nevada 47.3 46.7 -0.6
South Carolina 26 25.4 -0.6
Alabama 19.2 22.6 3.4
Arkansas 29.7 31.2 1.5
Colorado 59 60.6 1.6
Georgia 28.2 26 -2.2
Massachusetts 48.7 51.1 2.4
Minnesota 61.7 61.9 0.2
Oklahoma 51.9 46.9 -5
Tennessee 32.4 32.7 0.3
Texas 33.2 33.2 0
Vermont 86.1 82.9 -3.2
Virginia 35.2 36.2 1
Kansas 67.7 62.4 -5.3
Louisiana 23.2 24.5 1.3
Nebraska 57.2 61.5 4.3
Maine 64.2 62.8 -1.4
Michigan 49.8 48.8 -1
Mississippi 16.5 20.2 3.7
Florida 33.3 36 2.7
Illinois 48.7 47.1 -1.6
Missouri 49.4 45.1 -4.3
North Carolina 40.8 38.3 -2.5
Ohio 42.7 46.3 3.6

Before we get to the predictions, let’s also look at how the models weighted each predictor variable. Below I show a boxplot for each predictor showing the weights given to that predictor by each of the 100 models (remember I’m averaging the predictions of these models).
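
One way such a boxplot could be produced, continuing the sketch above, is to keep the coefficient vector from each of the 100 fits and plot their spread (one box per predictor):

coef_mat <- replicate(100, {
  cv <- cv.glmnet(x, y, alpha = 0, nfolds = 7)
  as.numeric(coef(cv, s = "lambda.min"))[-1]   # drop the intercept
})
boxplot(t(coef_mat), las = 2, main = "Ridge weights across 100 CV splits")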

[Boxplot: weights given to each predictor by the 100 models]

Several things are worth noting here.

  • The census data I used had variables for portions of workers in all kinds of different industries and I threw most of these away. But I had a suspicion that IndustryFinance and IndustryManufacturing might be important. Apparently I was right about finance: states relying on that industry do not like Bernie. Surprise! Actually, the effect may not be as large as it appears in these models. Among states that have voted so far, the ones with the largest portion of finance industry are Florida and Massachusetts (tied) and the one with the lowest is Vermont. We’ll know more when New York votes on April 19.
  • Age effects: Age18to34 actually has a smaller effect than Age45to55, which is a little surprising. It might reflect the lower turnout rates among younger voters. And even though Bernie wins millennials by a larger margin, he actually wins people under 55 as well, all other things being equal. To answer TNR’s question “Who is the Hillary voter?”: mostly, someone fairly old.
  • None of the available exit poll data has information about Asians, so it is pretty surprising that RaceAsian is an important variable. It’s certainly not the narrative about race that we’ve been hearing.
  • Other things working in Sanders’s favor: education above the high school level, high-speed internet access, and a surplus of Facebook likes.
  • Things working in Clinton’s favor: poverty, high unemployment, lack of higher education, and proportions of population with high income or who are black or old.

And now for the predictions!

State Predicted
Arizona 44.8
Idaho 61
Utah 62.8
Alaska 66
Hawaii 79
Washington 65.5
Wisconsin 65.9
Wyoming 67.2
New York 46
Connecticut 44.6
Delaware 27.1
Maryland 38
Pennsylvania 48.7
Rhode Island 57.7
Indiana 58.7
West Virginia 45.5
Kentucky 49.8
Oregon 75.1
Puerto Rico 44
California 60.6
Montana 73
New Jersey 44.9
New Mexico 61.6
North Dakota 80.2
South Dakota 71.4
D.C. 26.6

These predictions are not great news for Sanders. They would translate to roughly 1923 delegates, not enough to win. Bernie will need to beat these expectations by about 5% across the board in order to win. This is nothing new: FiveThirtyEight has been singing this song since the day after the first Super Tuesday. However, I still think there is reason to hope, for three reasons.

First, I think the IndustryFinance and RaceAsian effects are probably not as large as they appear based on the previous elections. This means the predictions for California and Hawaii might be a little too high, and the predictions for Delaware, Connecticut, New Jersey, Arizona, and New York might be too low.

Second, in a word: momentum. If Bernie can hold his own in Arizona and win big in Utah and Idaho, which is certainly within reach, he’ll be set up for a long stretch of big wins before New York votes on April 19th. This could yield the all-important media coverage he has been denied since early March (with the one exception of the Michigan upset). Bernie is probably too old for the nickname “comeback kid,” so media people reading this have an action item: think of a better nickname before April 5-9 (Wisconsin and Wyoming, both predicted over 60%).

The last reason is also the only reason I have any hope left for democracy in this country. Billionaire donors and high-rolling campaign bundlers are one thing; an army of volunteers is another. The grassroots supporting Sanders have been growing and improving in organization. We’ve made over 30 million phone calls to voters, and the rate is increasing. When this effort was focused on Michigan, it was part of the reason the outcome swung by 20% from the polls. Spread out over 5 states on the last Super Tuesday, it wasn’t enough. It’s currently focused on 3 states, 2 of which are already favorable, then another group of 3 which are all favorable, and then one state at a time leading up to New York. It will then remain to be seen whether our organization has improved enough to handle 5 states at once on April 26th, and 6 on June 7th (including the crux: California).

Taken together, these three things give me hope despite the numbers. And I’m a numbers guy. So Sanders supporters: keep up the great work! It’s gonna take a lot more of it to beat these expectations by large enough margins to win.

Now I will predict my own actions

I predict that I will post again with an update after the votes are tallied next week, and get back to phonebanking and facebanking in the meanwhile.

Speculation: once more western states have voted, the West variable will become less negative and yield higher predictions in California. Also, once New York votes the IndustryFinance variable will probably be less negative.

Something I probably won’t do: aggregate data at the county or congressional district level. I’m pretty sure that would yield much more accurate predictions, but it’s too much work for me. I don’t know the API necessary to scrape Facebook so I entered the state numbers by hand. If someone hired me to do it I would (psst, hey Jeff Weaver…).

Note: A previous version of this post had different results because I had not standardized the predictor variables properly.

Despite lower turnout than 2008, Bernie’s political revolution is still on track

In the first three Democratic primary/caucus contests of 2016 voter turnout was lower than in 2008. Various news outlets have had articles saying this is a bad sign for Bernie’s political revolution, for example this VOX article by Jeff Stein or this MSNBC article by Steve Benen. In this post I present a counter-argument.

Comparison to 2008 is misleading.

It’s true that turnout so far in 2016 has been lower than it was in 2008. However, this is because 2008 was an outlier year.  Consider turnout in the Iowa caucus. If we only look at the years 2008 and 2016, as the aforementioned articles do, it would be a bleak picture for Bernie indeed (and Hillary…). But look what happens when we expand the window to include a few more presidential election years:

[Chart: Iowa Democratic caucus turnout by presidential election year]

Years not displayed were either midterms with lower turnout, or 1996, when Bill Clinton ran unopposed. These numbers were taken from the Des Moines Public Library. It is clear that 2008 was an outlier, so we should be careful when comparing it to the present. We need to understand why it was an outlier before drawing conclusions about 2016.

First and most importantly, the 2008 election followed one of the most disastrous presidencies in modern US history. At the end of his term, Dubya was one of the most unpopular presidents ever with an abysmal 22 percent final approval rating. Also, in 2008 the Iowa caucus was held early in January when most college students were still on break. Sure enough, in 2008 voters aged 18-27 were 22% of Iowa Democratic caucus attendees, compared to just 18% in 2016.

In New Hampshire, turnout was about 118,000 in 1988, 154,639 in 2000, 287,557 in 2008, and 250,974 in 2016. This is less damning than Iowa. There is less justification for making comparisons in Nevada because it was recently changed from a primary to an early caucus, so previous years could be different for any number of reasons. At any rate, the 2/3 Nevada turnout cited in the VOX article is also less damning than the Iowa numbers, where turnout was only about 60% of what it was in 2008.

In summary, 2008 was an outlier for Democratic turnout because of the unpopular Republican president in office at the time. Sure enough, there were only 118,000 Republicans participating in the Iowa caucus in 2008 compared to 180,000 this year. Arguably the fairest comparison for Democratic turnout in Iowa in 2016 was Iowa in 1980 or 2000, when the incumbent president was a Democrat. In that case, the political revolution effect is yooge, about the same size as the incumbent reversal effect Republicans experienced this year. Even compared to 2004, when Bush was still in office, 2016 had a significantly larger turnout.

The revolution really needs to happen in the general election, not the primary.

As long as Bernie can secure the nomination, it doesn’t matter if turnout in the primaries and caucuses is not at all-time record highs. The place where he needs to win big in order to have a shot at implementing his agenda is in the general election. One might worry that disappointing turnout in primaries will lead to disappointing turnout in the general election. That’s a valid concern, so let’s address it by asking the fundamental question: Can Bernie win the general election by a large margin?

My answer: all signs currently point to a resounding yes. First let’s consider what drives high voter turnout, specifically what happened in 2008. The answer is that young people, blacks, and Hispanics all had higher turnout that year. Bernie’s support among young people is higher than Obama’s was in 2008. Some analysts have persistently pointed to Clinton’s high levels of support among non-whites as a sign of trouble for Sanders. It’s true that Bernie will have to win over more non-whites to win the nomination, and I believe he can and will. But it’s naive to suggest that a large margin of support among minorities in primaries for Clinton would translate to the kind of minority turnout that Obama was able to generate in 2008. I don’t know if either candidate will bring as many minorities out to vote because they are both white. But it is abundantly clear that Clinton would have a big millennial problem in the general election.

Another demographic trend that could have a big impact in the general election is the rise of independents. Look at this Gallup polling data:

Registered Republicans have plummeted, Democrats have declined but not as much, and Independents now outnumber either group by a large margin. This trend also helps explain why this election cycle has political outsiders like Sanders and Trump doing so well. Clinton is perhaps the most Democratic party establishment figure to run for president in recent history, having picked up more insider endorsements earlier than anyone else:

[Chart: Clinton's insider endorsements over time]

Can she draw support from Independents in the general election? I strongly doubt it. Can Bernie? You better believe it. And dangerously, even Donald Trump and Ted Cruz get a much greater proportion of independent voters than Clinton:

In case you missed her in that graph, she’s right below Carly Fiorina. Clinton’s demographic issues with young voters and independents should be enough to make Democrats think twice before choosing her as a nominee. In fact, I would argue they should be downright terrified about her prospects. However, that’s not the point of this post… The demographic story here about Sanders is that he absolutely can carry the general election by a large margin. If he is the nominee, the voters who would have chosen Hillary will mostly still show up and vote for Bernie against Trump, Cruz, or Rubio. Some small minority will remain obsessed with the “socialist” label, and the 1% might be upset about their taxes. But I’m far less worried about those issues than I am about Clinton’s troubles, considering Obama has weathered the same labeling and ran on a similar platform of raising taxes on incomes over $250,000.

Aside from speculation about demographic issues there’s another way to look at electability: favorability. This chart shows that high net favorability is an excellent predictor of the general election outcome. In fact, the only counter-example was a case where Gore won the popular vote and the outcome was decided by the Supreme Court…

So what does the favorability picture look like for Clinton and Sanders? These graphs show HuffPost’s average of polls over time:

Bernie’s favorability is good and looks like it’s getting better. Hillary’s favorability is bad and looks like it’s getting worse. This isn’t absolute proof that Bernie would fare better than Hillary in the general. But there is still more evidence available in hypothetical matchup polls. In those polls Sanders does better than Clinton against Trump, Cruz, Rubio, and Kasich, and about the same versus Carson. Of course, such polls also have their flaws and are not absolute proof. However, given the earlier discussion of demographics, this paints a consistent picture. Even if it’s optimistic for me to believe Bernie’s political revolution will allow him to enact a significant portion of his agenda, it seems clear that he is at least better positioned to win the general election than Clinton. Considering the Republican field, that should be terrifying to Clinton’s supporters.

One more difference between primaries and the general election: time. Sanders started way, way behind Clinton, and has had to spend all his time catching up. He doesn’t have the advantage of a huge political machine working on his behalf, and still he is keeping up and giving them a run for their money (literally!). If he wins the nomination, his organization will have expanded significantly by then and he’ll be in a much better position to GOTV nationwide. Arguably, one more reason turnout is low this year is that the machine which generated Obama-level turnout in 2008 is actually not trying to generate such turnout now because that would hurt their candidate. Again, if Bernie gets the nomination then this machine can do general election GOTV efforts among young people, pulling out all the stops.

Low turnout is also a problem for Clinton.

Sanders is proposing a more ambitious agenda, so low turnout in the general would be a bigger problem for him. But if the drop in turnout from 2008 reflects poorly on anyone, it should be the political establishment. The Iowa and Nevada state Democratic Parties should be ashamed of how disorganized and chaotic their caucuses were. The DNC and all its superdelegates should reflect on how their actions make voters feel shut out of the decision process.

Focusing on Iowa again as an example, let’s ask which candidate is actually drawing in new voters. In 2008, 43% of Democratic caucus-goers were first-time participants. Given the incredibly high turnout that year, and the 4-point drop in the 18-27 age group this year, we should expect the proportion of first-time participants in 2016 to be much lower (most of them probably participated in 2008). In fact, it held steady at 44% this year, and Sanders carried the first-timer group with 59% to Clinton’s 37%. Even the Editorial Board of the NYTimes is feeling this Bern:

The youth vote’s biggest beneficiary by far is Bernie Sanders, who filled venues in Las Vegas with cheering young admirers last week, after winning more than 80 percent of this group in both Iowa and New Hampshire. On Saturday young people made up 18 percent of voters in Nevada’s Democratic caucus, five percentage points more than in 2008.

There is also evidence that Sanders drew out a greater proportion of Latino voters in Nevada. According to the WCV Institute, Latinos made up 19% of the caucus compared to just 13% in 2008, and Sanders won that demographic by about 8%, a larger margin than Clinton’s in the state overall.

In summary, Sanders really is drawing more people into the political process. Clinton has much less of a claim to that, so if anyone in this race is to blame for the drop in turnout since 2008 it’s her.

In conclusion…

Is it really optimistic for me to think Bernie’s political revolution could succeed? Consider how much success he has had already, coming from so far behind, struggling against a monolithic establishment that placed all its weight behind his opponent before he even announced. If he manages to overcome that and win the nomination, imagine what he can do in the general election with the Democratic establishment riding his coattails rather than actively opposing him.

Finding multiple roots of univariate functions in R

In this post I describe some methods for root finding in R and their limitations when there is more than one root. As an example, consider the following function f(x) = \cos(x)^4 - 4\cos(x)^3 + 8\cos(x)^2 - 5\cos(x) + 1/2

f <- function(x) cos(x)^4 - 4*cos(x)^3 + 8*cos(x)^2 - 5*cos(x) + 1/2

Here the function is plotted and we can see it has two roots on this interval.

[Plot: f(x) on the interval (0, 2), crossing zero at two points]
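
For reference, a plot like this can be produced with something along these lines:

curve(f, from = 0, to = 2, n = 500, ylab = "f(x)")
abline(h = 0, lty = 2)   # the roots are where the curve crosses this line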

Perhaps the most widely used root finding function in R is uniroot. However, this function has major limitations if there is more than one root (as implied by its name!).

uniroot(f, c(0,2))
Error in uniroot(f, c(0, 2)) : 
 f() values at end points not of opposite sign

As the error above shows, when there are an even number of roots on the interval, f has the same sign at both endpoints and uniroot fails outright. When there are an odd number of roots, uniroot will find and return only one of them. This is a serious problem arising from the underlying mathematical difficulty of finding multiple roots. If you need to find all the roots, you must call uniroot multiple times, each time specifying an interval containing exactly one root.

c(uniroot(f, c(0,1))$root, uniroot(f, c(1,2))$root)
[1] 0.6293433 1.4478550

The package rootSolve contains a function uniroot.all which attempts to automate this procedure. It divides the search interval into many subintervals (the default is 100) and uses uniroot on those intervals which have a sign change.

library(rootSolve)
uniroot.all(f, c(0,2))
[1] 0.6293342 1.4478556

However, this approach is not guaranteed to work unless the number of subintervals is sufficiently large (which depends on how close together or to the boundary the roots are). I was recently working on an algorithm that required finding multiple roots of many functions. I could not manually plot them or check to see if uniroot.all was using a sufficiently fine grid to capture all the roots. Luckily for me, I was able to transform the functions I was working with into trigonometric polynomials. And this brings us to another one of R’s great root finding functions, polyroot.

Note that the “poly” here does not stand for multiple, but rather for polynomial. Our running example happens to be a trigonometric polynomial (what a coincidence!). If we call polyroot with the coefficients (in order of increasing degree) it will return all roots, including complex valued ones.

coefs <- c(1/2, -5, 8, -4, 1)
roots <- polyroot(coefs)
roots
[1] 0.1226314+0.000000i 0.8084142-0.000000i
    1.5344772-1.639788i 1.5344772+1.639788i

Since the polynomial had degree 4, there are 4 roots. The first two are real. The complex roots correspond to roots of the polynomial x^4 - 4x^3 + 8x^2 - 5x + 1/2 which are not achieved by the trigonometric version because |\cos(t)| \leq 1. As long as we carefully select the correct polynomial roots and transform back to the trigonometric scale, we’ll get all the roots of the original function:

acos(Re(roots[2:1]))
[1] 0.6293434 1.4478554

Another word of caution regarding substitutions: they might be numerically unstable. Careful readers may have noticed that the roots above have slightly different digits depending on which function computed them. This error can be much larger if the substitution rule has a large derivative near one of the zeroes. This was the case for the functions that came up in my research, so I settled on the following scheme, which was much more stable and accurate.

  1. Use polyroot to find zeroes of transformed function.
  2. Eliminate infeasible roots that fall outside the domain/range of the substitution rule.
  3. Transform the remaining ones back to the original domain.
  4. Partition domain into intervals each containing one of the transformed roots.
  5. Call uniroot on each interval in the above partition.
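
Here is a minimal sketch of that scheme for the running example, using the substitution u = cos(x). The function name refine_roots and the tolerance are my own placeholder choices, and the feasibility check and back-transformation would change for other substitutions.

refine_roots <- function(f, coefs, lower = 0, upper = 2, tol = 1e-8) {
  u <- polyroot(coefs)                        # 1. roots of the transformed polynomial
  u <- Re(u[abs(Im(u)) < tol])                # keep the (numerically) real ones
  u <- u[u >= -1 & u <= 1]                    # 2. feasible for u = cos(x)
  x0 <- sort(acos(u))                         # 3. back to the original domain
  x0 <- x0[x0 > lower & x0 < upper]
  # 4. cut the domain at midpoints so each piece contains one approximate root
  cuts <- c(lower, (head(x0, -1) + tail(x0, -1)) / 2, upper)
  # 5. polish each root with uniroot on its own interval
  sapply(seq_along(x0), function(i) uniroot(f, c(cuts[i], cuts[i + 1]))$root)
}

refine_roots(f, c(1/2, -5, 8, -4, 1))   # approximately 0.629 and 1.448, as before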

The main limitation of this approach is that not all functions can be made into polynomials. Some functions, like sums of powers of \frac{1}{\sqrt{1+x^2}} for example, can be transformed by a trigonometric substitution into a complex polynomial using Euler’s famous formula. But there are still many more functions which cannot be transformed into polynomials by any stretch. And for these we are out of luck. We can only use something like uniroot.all and hope it works.

If you’re aware of other options I haven’t mentioned here, please let me know by commenting or contacting me directly!

Annals of Useless Statistics Theorems

When I learn of statistics theorems I often try to plug some numbers in to see roughly what kind of error rate is attained or how large of a sample size is necessary to get good results. I think this is a good practice for understanding the limitations of the theory. The title of this post may seem inflammatory, but I should clarify that I don’t think “useless” things have no value. A theorem which is practically useless may still be interesting for the mathematical techniques involved. It may show the way to refinements that become increasingly useful. Or it may suggest that a procedure is not unreasonable, and further empirical study can analyze the finite sample performance of said procedure.

All that being said, I also think that because mathematics is more technically difficult than empirical study it gets a bit too much prestige and respect. So, I intend to highlight some examples of theoretical results which are considered high impact, but which are severely limited in their practical usefulness.

My first example comes from a series of recent papers appearing in what is considered the most prestigious journal of statistical theory, the Annals of Statistics. These papers (referenced below) apply Stein’s method type arguments to obtain explicit bounds on the approximation error caused by assuming non-Gaussian data is actually Gaussian. The methods would be practically useful if the error bounds were small enough…

The papers are so technical that the forms of the error bounds are very complicated, so I’m referencing a version in a tutorial by Larry Wasserman instead of the more complicated version in the original papers. If there are n (i.i.d.) data points in d-dimensional space, and we approximate the distance of their sample mean from the true mean (and take the maximum over all coordinates) by assuming the corresponding probability can be computed from a multivariate Gaussian, then the absolute error between these two probabilities P_1 and P_2 is upper bounded by

|P_1 - P_2| \leq \left( \frac{(\log(dn))^7}{n} \right)^{1/8}

Here’s where it gets useless. Suppose we want the error to be less than or equal to 1/10, and we are only in d = 2 dimensions. How large does n have to be? Over 4.2 \times 10^{19}.
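
A quick numerical check of that claim, treating the logarithm as natural and the bound exactly as written above:

bound <- function(n, d) ((log(d * n))^7 / n)^(1/8)

n_grid <- 10^seq(15, 25, by = 0.01)        # log-spaced candidate sample sizes
min(n_grid[bound(n_grid, d = 2) <= 0.1])   # roughly 4e19, consistent with the claim above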

The power of these theorems is that they give asymptotic results when d increases with n. So, for example, suppose d = n^{1/10}. Then the error bound is less than or equal to 1/10 if n \geq 4.2 \times 10^{19}, and the corresponding value of d \approx 91.

Again, the theory in these papers is beautiful and probably setting a good direction for further inquiry. But we desperately need to supplement impressive looking math with more empirical study, and perhaps make it the norm for such empirical study to be included in the theoretical papers that make it into the top journals.

References

Larry Wasserman, Stein’s Method and The Bootstrap in Low and High Dimensions: A Tutorial, http://www.cs.cmu.edu/~aarti/SMLRG/Stein.pdf

Victor Chernozhukov, Denis Chetverikov, Kengo Kato (2012). Central Limit Theorems and Multiplier Bootstrap when p is much larger than n. http://arxiv.org/abs/1212.6906

Victor Chernozhukov, Denis Chetverikov, Kengo Kato (2013). Comparison and anticoncentration bounds for maxima of Gaussian random vectors. http://arxiv.org/abs/1301.4807.

What’s the rush?

Subtitle: Research under the advanced stages of capitalism, part 1: hypercompetitive labor markets.

There is a maxim about research (and work in general), often given in the context of writing a thesis: “it’s not a sprint, but a marathon.”

My main question is this: why is research a race, or a competition at all?

I can understand the metaphor of research as exploration. In fact, that’s not just a metaphor, it’s actually true that research is a form of exploration. Exploration yields knowledge of previously unexplored terrain and opens up new frontiers. What happens at the end of a race? A few winners get recognition, everyone else pats themselves on the back for proving their level of fitness by running in a race, and… that’s it.

In my neck of the woods, the “Stanford Data Science Challenge” has been announced and it’s hosted by “LearnFast.io.” It’s a hackathon, which I’m guessing will take place in the span of just hours or a few days. I bring this up only to draw attention to the name “Learn fast.” Why? To maximize the number of hasty mistakes? In contrast, I am currently working on a project in the Stanford DataLab with a team of other students where we are spending weeks getting our hands dirty just preparing the dataset before we can even start analyzing it. This is how any reasonable person would approach an important problem, and if the problem isn’t important, why waste so many people’s time with a competition to solve it? A few other costs of the rush:

  • Rapid-fire/scattershot research with low quality-control also increases false discoveries, contributing to one of the greatest problems in many scientific disciplines.
  • The library of human knowledge has an information retrieval problem. We need more cross-discipline work, more integration and organization, and all of this requires a lot of patience while people learn to speak the same language and do work that won’t be rewarded with trophies.
  • A rushed work environment stresses people out. This unnecessary stress lowers efficiency (and thus often productivity) and creativity, not to mention quality of life…

There is an obvious one word answer to why all of this occurs: capitalism. The artificially high levels of competition in capitalist labor markets set us all against each other. Instead of collaborating and advancing together, we’re all working almost alone, and might perceive each other mostly as threats or challenges to our own success.


How to predict anything*

There has been a lot of press ever since the election about how Nate Silver (and others) correctly predicted it. For whatever reason, I feel compelled to explain the minimal basics of how to go about making similar predictions. I will make this so simple that only knowledge of arithmetic is required. Here is my recipe:

  1. Make a list of all the outcomes that you care about. For example, do you want to predict if you will pass or fail a class, or do you want to predict what your grade will be? In the first case there are only 2 outcomes (pass/fail), so the prediction will likely be easier. In the second case your prediction gives you more information but will be harder to compute.
  2. Guess the chances of each of those outcomes. You can try to be fair and say all the outcomes are equally likely. Or you can try to use all the information you have, for example by putting a higher chance on failing the class if you already know you had a low grade on some homework. Each of the “chances” should be a number greater than zero but below one, and if you add the chances for all of the outcomes the answer should be one.
  3. Update the chances whenever you learn new information that’s relevant to the outcomes. For example, if you get another homework grade and it’s good then you should make your chance of passing a little bit higher. And remember that all the chances should add to one, so if one goes up then the others have to go down.
  4. Make predictions based on the chances. There’s more than one way to do this–if you only care about which one outcome is the most likely then you should see which of the chances is the largest. You might also want to know about a range of outcomes, like what is the chance that you get at least a B in the class? In that case just add the chance that you get a B and the chance that you get an A.
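
Here is a toy illustration of the recipe in R, with made-up numbers for the class-grade example:

# Step 2: guess the chances of each outcome; they must sum to one
chances <- c(A = 0.20, B = 0.30, C = 0.30, D = 0.10, F = 0.10)
sum(chances)                  # 1

# Step 3: a good homework grade comes in, so shift some chance toward better outcomes
chances <- c(A = 0.30, B = 0.35, C = 0.25, D = 0.05, F = 0.05)

# Step 4: make predictions from the chances
names(which.max(chances))     # single most likely outcome: "B"
sum(chances[c("A", "B")])     # chance of at least a B: 0.65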

That should probably be good enough for most peoples’ purposes. What Nate Silver and those other people did was more complicated. Here are a few bonus points, but they require a bit more knowledge than arithmetic (high school math and a little bit of computer programming).

  • Modeling: The above recipe describes a “multinomial” probability model, that is, one with a finite list of outcomes that you care about. But maybe there are too many outcomes to list; maybe the outcome is a count like number of goals scored in a game, or maybe the outcomes are numbers that can be ordered like the amount of money that you make by investing in a stock (it would be silly to list outcomes like $0-100, $100-200, etc). There are many standard probability models that are useful depending on the situation. A few of the most common are: Binomial for the number of “successes” in a given number of “independent trials” (e.g. number of heads when tossing a coin 100 times), Poisson for counting the number of times something happens when you know the rate (e.g. if you usually have 5 customers visiting your store every day, what is the chance of having 100 or fewer customers in the next month?), Exponential for how long it will take before something happens (e.g. how long until the next customer visits your store?), and Normal (aka “bell curve”) for almost everything else, especially if the middle outcome is more likely than all other outcomes.
  • Bayes’ formula for updating chances is perhaps one of the most important formulas ever written down. It tells you how to update your model after you learn new information. Depending on your model it might be difficult to calculate the formulas explicitly, but you can always use computer help- which brings us to the next point.
  • Simulation: The models mentioned above are very helpful because they come with standard formulas for all the kinds of predictions you might want to make, like what is the single most likely outcome, or what is the chance of being a certain amount higher or lower than the most likely outcome, and so on. But sometimes your situation is too complicated for any of the simple models listed above, or the assumptions needed to make those models work are not true for your situation (like the “independence” of trials for the Binomial model). In these cases it can be very helpful to write a computer program that randomly simulates the outcomes many times. For example, maybe I have a good guess of the chances that Obama will win each state in the electoral college, and I want to know the chances that he wins the election with at least 300 electoral college votes. Then I could write a computer program that simulates a thousand elections and records the percentage of those simulations in which he won 300 or more votes (a toy sketch follows after this list). Another very useful kind of simulation method was invented in the Statistics department here; it’s called bootstrapping and everybody’s doing it.
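
Here is a toy version of that electoral-college simulation in R. The win probabilities are invented, and the remaining states are lumped into a single block just to keep the sketch short:

set.seed(1)
ev    <- c(CA = 55, TX = 38, NY = 29, FL = 29, rest = 387)      # 538 electoral votes total
p_win <- c(CA = 0.90, TX = 0.20, NY = 0.85, FL = 0.55, rest = 0.50)

# Simulate 1000 elections: each block is won independently with its assumed probability
sims <- replicate(1000, sum(ev[runif(length(ev)) < p_win]))
mean(sims >= 300)   # estimated chance of winning with at least 300 electoral votes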

If you master all these skills then not only will you be able to predict anything*, but you’ll be able to do it better than everyone else!

* Of course “anything” is an exaggeration. These methods usually work better if you are predicting the kinds of outcomes that happen repeatedly, so there is a history of similar outcomes that suggest a type of model for you to use or allow you to make good guesses about the various chances. Sometimes the outcome is something that you didn’t even consider, like neither team wins a game because the game is canceled due to weather.


Bangladesh factory fire tragedy

Over 120 workers died in a factory fire in Bangladesh while working overtime to meet the demand for cheap goods on Black Friday. The building did not have proper fire escapes. Some died by jumping from the upper floors of the building.

See, worker safety measures like proper fire escapes would have cost more. The sourcing agency (Li & Fung) that buys goods for Wal-Mart already found the cheapest labor in the world, so they couldn’t threaten the Bangladeshi workers by saying they would move the factory to another country. They would have had to pass the cost on to Wal-Mart and then Wal-Mart would have no choice but to pass the cost on to us consumers.

This is late-stage global capitalism. Despite the fact that nobody would knowingly demand cheaper clothes if they knew people would die making them, capitalism finds a way. Endless decentralization, sub-contracting, and outsourcing mean that nobody has any clue who is making what and where and under what conditions. Supposedly this makes things more efficient… but then companies like Apple and Wal-Mart end up spending more money anyway, sending inspectors all over the world so they can claim innocence whenever they get bad publicity like this.

Here are some rational self-interested actors making informed decisions

Here is how Wal-Mart meets their insatiable demand for ever-cheaper sweat pants


Universities, the purpose and future of

MOOC stands for massive open online course. MOOCs may or may not drastically change higher education. There are some obvious good aspects to them, like making high quality learning material available to people who otherwise can’t access it. But some of my colleagues who, like me, aspire to be professors some day are worried that MOOCs may put them out of a job. Why should a university hire them to teach a class year after year when students can see recorded lectures instead? And the clincher: these lectures are given by top professors from top universities.

Discussing these concerns led me to think about the purpose of universities. I’m writing this post to gather my thoughts on that topic, sort of “thinking aloud.” Here are several different views on the purpose of universities and the resulting predictions each view gives about the effect of MOOCs on academia. (The views vary in how descriptive versus prescriptive they are.)

The classical economics view (circa the World Wars) is that universities train the labor force and conduct research that improves technology; both of these things increase productivity (hence GDP), and that’s (supposedly) best for everyone (the underlying utilitarian philosophy and assumptions of economic theory are just not open for discussion). In this case it is certainly true that the work force could be trained with far fewer professors than we currently have (if for no other reason than that we can cut out subjects/departments like philosophy which don’t improve worker productivity), so we aspiring academics should despair. I don’t find this view very convincing, especially the part about training the labor force. I think the vast majority of college graduates will end up working in positions that are not closely related to their degree and could have learned the relevant skills on the job.

The technological superiority view (circa the Cold War) is a modification of the above which places less emphasis on teaching and more on research. Here the purpose of universities is to advance technology, giving their host nations an advantage in arms races or space races, or improving medicine so we can all live longer, etc. In this view MOOCs have basically no negative effect on teacher employment. The number of years of required education is increasing as domains of knowledge become deeper. So even if all the intro courses are MOOCs, we will still need teachers for upper-level, highly specialized courses. And the number of specializations is growing fast. I find this view deficient because better technology doesn’t always leave us better off. Remember that time we almost wiped each other out with cutting-edge technology? I also have the opinion that much research is a waste of time, not because it isn’t eminently useful, but because it’s actually low quality scholarship done mostly for the sake of increasing the number of publications.

The humanist view (circa before the World Wars, but humanities profs/majors will never let go) is what most liberally-minded people want to believe about their own reason for going to college. “Higher education” doesn’t just mean it’s a level above “high” school; it means our minds or spirits or quality of life (or whatever) are improved by learning. People with this view usually espouse “liberal arts” education, because even if you’re studying to be an engineer you should take an art class and learn to better appreciate the arts, because that improves you as a human being (it might even inform your choices as an engineer, e.g. Steve Jobs and the Apple aesthetic). MOOCs are either bad because they place even less emphasis on the humanities (it’s difficult to grade things like long papers, art projects, etc. in a MOOC format), or they are good because they are making education free and more available. I am very sympathetic to this view, if for no other reason than that I am a contrarian at an engineering school. Some people with this view (but not all) tend to undervalue the other benefits of universities, like scientific research.

Before I offer my own view, notice some things about universities that none of the views above explain. None of those views mention anything about maximizing the university endowment, increasing the prestige of the school, or anything like that. However, all universities behave (organizationally) as though those types of things are their most important goals. This is despite the fact that most universities explicitly state in their charters that they exist to serve the betterment of humanity or something like that. The school could have more endowment money than it knows what to do with, donations constantly rolling in, and tuition rates already scheduled to increase, and it will still re-re-subcontract its custodial service, firing all the janitors and re-hiring them at an even lower wage. Why? (The cheap and easy explanation is that many university administrators are business school graduates and they are simply behaving the way they learned to behave in business school.)

Also note that the demand for education from “top” schools is much larger than those schools meet. Every year they receive far more qualified applications than they admit. Most of them could easily spend part of their large endowments to expand and accommodate more students, and also create more academic jobs in the process. Why don’t they?

My own view (descriptive) is that universities serve all the purposes above to varying degrees, but they are also a pyramid scheme of  cushy jobs and fierce guardians of elitist credentials. They provide security and leisure to the class of people who can succeed at academia. The pyramid-structure (both within and between schools) enhances prestige through intense competition at the lowest levels, and prestige is important because it justifies the credential elitism. MOOC certificates can never compete with actual degrees precisely because they are open (so they fail at being elitist). MOOCs may take jobs away from the people with less job security- adjuncts, for example. But they will not be allowed to threaten the job security of the people with cushy jobs, because those cushy jobs are one of the main points of the entire system. And though (top) universities could provide more cushy jobs by using their large endowments to grow, that might dilute their credentials. So my theory also explains why they don’t do that- they are protecting their elite status and therefore the class of people who already succeed in the system.

I am a big fan of cushy jobs, but elitism makes me sad (if I didn’t benefit from it, it would probably make me more mad than sad). I hope that the humanist view becomes more popular (it’s close to what I think the purpose of universities should be), because I think the lessons of great literature, for example, will help us choose systems that work much better for all of us. I want everyone to have nice jobs, not work too much, and have more time to spend enriching their lives (by taking MOOCs, for example). And I think these goals are realistic (they might all be accomplished by just having shorter work weeks).
