Bizarre Insights From Big Data

Sometimes unexpected data sources offer big insights.

On Sunday, The New York Times ran my article about Gilad Elbaz, who made a fortune helping Google map the Internet. He is now building a company called Factual, which he hopes will be one of the world’s largest and most accurate repositories of facts.

The idea is to have a lot of data of all kinds on hand, because sometimes unexpected combinations of information can lead to valuable insights.

For example, if you buy a used car, your best bet is an orange one. Data scientists at Kaggle, a pattern recognition start-up in which Mr. Elbaz has invested, have matched previously separate data sets on buyers, colors and after-purchase problems. They figured out that if a car’s original owner chose an odd color, the car was most likely a means of self-expression. That self-identification raises the odds that the owner cared more than usual for the vehicle.

In a similar way, says the Kaggle founder Anthony Goldbloom, the best sign that a passenger will make a flight is that he or she has preordered a vegetarian meal. Making the trip personal, by knowing your meal is on board, makes you more likely to get to the plane on time.

One more piece of nonobvious information from Kaggle: Smart specialists do not always win. “Every company is bending over backwards to hire a pedigreed Carnegie Mellon Ph.D. to unlock their data,” Mr. Goldbloom says. “The physicists and electrical engineers are better at it a lot of the time. They are grounded in real things, and they don’t obsess over which regression model they should use.” Indeed, a Singaporean actuary is currently leading a Kaggle competition to predict biological responses from molecular information.

Researchers at a San Francisco nonprofit called the Global Virus Forecasting Initiative look at things like blogs and traditional news outlets for clues to disease outbreaks. In the future, the founders say, information from mobile phones might prove useful.

A few years ago I was speaking with the founder of an African mobile phone company called CellTel. He told me that the company had realized it could predict the location of impending massacres in the Congo, because of spikes in the sale of prepaid phone cards.

At first the company’s researchers thought that this was because people were making more calls to plan a massacre or to flee one. In fact, the reason was that the prepaid cards were denominated in dollars, not the local currency. People, sensing impending chaos, wanted to have something valuable they could carry with them that was protected against local inflation.

We will probably see more strange correlations start to pop up as more behavior is stored in online databases. Here are some other unusual insights, courtesy of Kaggle:

– Traffic jams tend to propagate forward as well as backward. No one yet knows why this should be true.

– In the Eurovision Song Contest, it is a pretty safe bet that Israel will vote disproportionately for Belarus. There was a mass migration from that area to Israel under the Soviet Union, and the emigrants apparently still get sentimental for home.

– If you watch a movie whose title ends in a number, you will probably think less of it than if it had a different title. Numbers usually indicate sequels, and most sequels are bad.

– You can generally predict the quality of online photos by the words in their captions. Higher-rated photos tend to have captions containing the words Peru, Cambodia, Michigan, tombs, trails and boats. The caption words most likely to signify an uninspiring photo are San Jose, mommy, graduation and C.E.O.

On second thought, sometimes the data will yield insights that are not so surprising. Sorry, San Jose.