Monthly Archives: November 2012

How to predict anything*

There has been a lot of press since the election about how Nate Silver (and others) correctly predicted the outcome. For whatever reason, I feel compelled to explain the minimal basics of how to go about making similar predictions. I will make this so simple that only knowledge of arithmetic is required. Here is my recipe:

  1. Make a list of all the outcomes that you care about. For example, do you want to predict if you will pass or fail a class, or do you want to predict what your grade will be? In the first case there are only 2 outcomes (pass/fail), so the prediction will likely be easier. In the second case your prediction gives you more information but will be harder to compute.
  2. Guess the chances of each of those outcomes. You can try to be fair and say all the outcomes are equally likely. Or you can try to use all the information you have, for example by putting a higher chance on failing the class if you already know you had a low grade on some homework. Each of the “chances” should be a number greater than zero but below one, and if you add the chances for all of the outcomes the answer should be one.
  3. Update the chances whenever you learn new information that’s relevant to the outcomes. For example, if you get another homework grade and it’s good then you should make your chance of passing a little bit higher. And remember that all the chances should add to one, so if one goes up then the others have to go down.
  4. Make predictions based on the chances. There’s more than one way to do this: if you only care about which one outcome is the most likely, then you should see which of the chances is the largest. You might also want to know about a range of outcomes, like what is the chance that you get at least a B in the class? In that case just add the chance that you get a B and the chance that you get an A.
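The recipe above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the starting chances and the update weights are made up, the function names are hypothetical, and the "multiply by a weight, then rescale" update is just one reasonable way to nudge chances while keeping them adding to one.

```python
def normalize(chances):
    """Rescale the chances so that they add up to one."""
    total = sum(chances.values())
    return {outcome: c / total for outcome, c in chances.items()}


def update(chances, weights):
    """Multiply each chance by a weight, then rescale so they add to one.

    A weight above 1 makes an outcome more likely, a weight below 1 makes
    it less likely. Outcomes with no weight given are left at weight 1.
    """
    return normalize({o: c * weights.get(o, 1.0) for o, c in chances.items()})


def most_likely(chances):
    """Return the outcome with the largest chance."""
    return max(chances, key=chances.get)


def chance_of(chances, outcomes):
    """Add up the chances of a range of outcomes."""
    return sum(chances[o] for o in outcomes)


# Steps 1 and 2: list the outcomes and guess their chances (made-up guesses).
grades = normalize({"A": 0.2, "B": 0.3, "C": 0.3, "D": 0.1, "F": 0.1})

# Step 3: a good homework grade arrives, so shift weight toward the
# better grades. These weights are invented for illustration.
grades = update(grades, {"A": 1.5, "B": 1.2, "D": 0.7, "F": 0.5})

# Step 4: the single most likely grade, and the chance of at least a B.
best = most_likely(grades)
at_least_b = chance_of(grades, ["A", "B"])
print(best, round(at_least_b, 3))
```

Because `update` always rescales, raising one chance automatically lowers the others, which is the bookkeeping step 3 asks for.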

That should probably be good enough for most people’s purposes. What Nate Silver and those other people did was more complicated. Here are a few bonus points, but they require a bit more knowledge than arithmetic (high school math and a little bit of computer programming).

  • Modeling: The above recipe describes a “multinomial” probability model, that is, one with a finite list of outcomes that you care about. But maybe there are too many outcomes to list; maybe the outcome is a count like the number of goals scored in a game, or maybe the outcomes are numbers that can be ordered like the amount of money that you make by investing in a stock (it would be silly to list outcomes like $0-100, $100-200, etc). There are many standard probability models that are useful depending on the situation. A few of the most common are: Binomial for the number of “successes” in a given number of “independent trials” (e.g. number of heads when tossing a coin 100 times), Poisson for counting the number of times something happens when you know the rate (e.g. if you usually have 5 customers visiting your store every day, what is the chance of having 100 or fewer customers in the next month?), Exponential for how long it will take before something happens (e.g. how long until the next customer visits your store?), and Normal (aka “bell curve”) for almost everything else, especially if the middle outcome is more likely than all other outcomes.
  • Bayes’ formula for updating chances is perhaps one of the most important formulas ever written down. It tells you how to update your model after you learn new information. Depending on your model it might be difficult to calculate the formulas explicitly, but you can always use computer help, which brings us to the next point.
  • Simulation: The models mentioned above are very helpful because they come with standard formulas for all the kinds of predictions you might want to make, like what is the single most likely outcome, or what is the chance of being a certain amount higher or lower than the most likely outcome, and so on. But sometimes your situation is too complicated for any of the simple models listed above, or the assumptions needed to make those models work are not true for your situation (like the “independence” of trials for the Binomial model). In these kinds of cases it can be very helpful to write a computer program that randomly simulates the outcomes many times. For example, maybe I have a good guess of the chances that Obama will win each state in the electoral college, and I want to know what are the chances that he wins the election with at least 300 electoral college votes. Then I could write a computer program that simulates a thousand elections and records the percentage of those simulations in which he won 300 or more votes. Another very useful kind of simulation method was invented in the Statistics department here. It’s called bootstrapping and everybody’s doing it.
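Here is a rough sketch of the election simulation described in the last bullet, using only Python's standard library. Everything specific in it is an assumption made for illustration: the state names, vote counts, and win chances are invented, `simulate_elections` is a hypothetical helper, and each state is simulated independently of the others, which a serious forecast would probably not assume.

```python
import random


def simulate_elections(states, threshold=300, n_sims=1000, seed=0):
    """Return the fraction of simulated elections with at least `threshold` votes.

    `states` maps a state name to (electoral_votes, chance_of_winning).
    Each state is decided independently, a simplifying assumption.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_sims):
        # The candidate wins a state when a random draw falls below its chance.
        votes = sum(ev for ev, p in states.values() if rng.random() < p)
        if votes >= threshold:
            hits += 1
    return hits / n_sims


# Invented numbers, not real polling data or real electoral vote counts.
toy_states = {
    "Safe block": (250, 1.0),
    "State A": (30, 0.5),
    "State B": (20, 0.7),
    "State C": (15, 0.6),
    "State D": (10, 0.4),
}

chance = simulate_elections(toy_states, threshold=300, n_sims=1000)
print(f"Estimated chance of at least 300 votes: {chance:.1%}")
```

Running more simulations (a larger `n_sims`) makes the estimate less noisy, at the cost of a longer run.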

If you master all these skills then not only will you be able to predict anything*, but you’ll be able to do it better than everyone else!

* Of course “anything” is an exaggeration. These methods usually work better if you are predicting the kinds of outcomes that happen repeatedly, so there is a history of similar outcomes that suggests a type of model for you to use or allows you to make good guesses about the various chances. Sometimes the outcome is something that you didn’t even consider, like neither team winning a game because the game is canceled due to weather.


Bangladesh factory fire tragedy

Over 120 workers died in a factory fire in Bangladesh while working overtime to meet the demand for cheap goods on Black Friday. The building did not have proper fire escapes. Some died by jumping from the upper floors of the building.

See, worker safety measures like proper fire escapes would have cost more. The sourcing agency (Li & Fung) that buys goods for Wal-Mart already found the cheapest labor in the world, so they couldn’t threaten the Bangladeshi workers by saying they would move the factory to another country. They would have had to pass the cost on to Wal-Mart and then Wal-Mart would have no choice but to pass the cost on to us consumers.

This is late-stage global capitalism. Even though nobody would demand cheaper clothes if they knew people would die making them, capitalism finds a way. Endless decentralization, sub-contracting, and outsourcing mean that nobody has any clue who is making what and where and under what conditions. Supposedly this makes things more efficient… but then companies like Apple and Wal-Mart end up spending more money anyway, sending inspectors all over the world so they can claim innocence whenever they get bad publicity like this.

Here are some rational self-interested actors making informed decisions

Here is how Wal-Mart meets their insatiable demand for ever-cheaper sweat pants
