The Netflix prize teams have a large set of ratings from customers for films they’ve already seen, and they have been challenged to write software that predicts future ratings.
Part of this process is hypothesis testing, essentially an experimental approach to find out what might be important in the decision process. For example, a team might guess that women will rate musicals higher than men. They can then test this prediction out on the data, making further predictions based on past conclusions, theories or even just hunches.
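To make the hypothesis-testing idea concrete, here is a minimal sketch of how the musicals guess could be checked. The ratings below are invented purely for illustration and do not come from the Netflix data, and the permutation test is just one reasonable way of asking whether a gap between two groups could have arisen by chance.

```python
import random

# Hypothetical 1-5 star ratings of musicals, invented for illustration only.
ratings_women = [5, 4, 4, 5, 3, 4, 5, 4]
ratings_men = [3, 4, 2, 3, 4, 3, 2, 3]

def mean(values):
    return sum(values) / len(values)

def permutation_test(a, b, n_shuffles=10_000, seed=0):
    """Return (observed mean difference a - b, one-sided p-value)."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        # Shuffle the group labels: if group membership does not matter,
        # a random split should produce a gap this big fairly often.
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_shuffles

observed_diff, p_value = permutation_test(ratings_women, ratings_men)
print(f"difference in mean rating: {observed_diff:.2f}, p = {p_value:.4f}")
```

A small p-value says the gap is unlikely to be a fluke of who happened to be sampled. It says nothing about why the gap exists, which is still left to the person who proposed the hypothesis.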
The other approach is to use mathematical techniques that look for patterns in the data. To use the jargon, these procedures look for ‘higher order properties’ – in other words, patterns in the patterns of data.
Think of it like looking at the relationship between different forests rather than thinking of everything as individual trees.
The trouble is that these mathematical procedures can sometimes find reliable high-level patterns when it isn’t obvious to us what they represent. For example, the article discusses the use of a technique called singular value decomposition (SVD) to categorise movies based on their ratings:
There’s a sort of unsettling, alien quality to their computers’ results. When the teams examine the ways that singular value decomposition is slotting movies into categories, sometimes it makes sense to them — as when the computer highlights what appears to be some essence of nerdiness in a bunch of sci-fi movies. But many categorizations are now so obscure that they cannot see the reasoning behind them. Possibly the algorithms are finding connections so deep and subconscious that customers themselves wouldn’t even recognize them.
At one point, Chabbert showed me a list of movies that his algorithm had discovered share some ineffable similarity; it includes a historical movie, “Joan of Arc,” a wrestling video, “W.W.E.: SummerSlam 2004,” the comedy “It Had to Be You” and a version of Charles Dickens’s “Bleak House.” For the life of me, I can’t figure out what possible connection they have, but Chabbert assures me that this singular value decomposition scored 4 percent higher than Cinematch — so it must be doing something right. As Volinsky surmised, “They’re able to tease out all of these things that we would never, ever think of ourselves.” The machine may be understanding something about us that we do not understand ourselves.
In these cases, it’s tempting to think there’s some deeply psychological property of the film that’s been captured by the analysis. Maybe all trigger a wistful nostalgia, or perhaps each represents the same unconscious fantasy.
It could also be that each is under 90 minutes, or comes with free popcorn. It could even be that the grouping is entirely spurious and represents nothing significant. Importantly, the answer to these questions is not in the data waiting to be discovered; we have to make the interpretation ourselves.
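A toy example shows why the interpretation is left to us. The sketch below runs a plain SVD (NumPy's `np.linalg.svd`) on a tiny, fully filled-in ratings matrix that is invented for illustration; real rating data is mostly missing entries, and this is not a reconstruction of any team's actual method. The film names are placeholders.

```python
import numpy as np

# Hypothetical ratings: rows are customers, columns are films.
films = ["Film A", "Film B", "Film C", "Film D"]
ratings = np.array([
    [5, 5, 1, 1],
    [4, 5, 2, 1],
    [1, 2, 5, 4],
    [1, 1, 4, 5],
], dtype=float)

# Subtract each film's average so the factors describe differences in taste
# rather than overall popularity.
centred = ratings - ratings.mean(axis=0)

# Rows of Vt are the latent factors, ordered from strongest to weakest.
U, singular_values, Vt = np.linalg.svd(centred, full_matrices=False)

# Loadings of each film on the strongest factor. The overall sign is
# arbitrary; only which films share a sign is meaningful.
film_loadings = Vt[0]

for film, loading in zip(films, film_loadings):
    group = "group 1" if loading > 0 else "group 2"
    print(f"{film}: loading {loading:+.2f} -> {group}")
```

The output splits Films A and B from Films C and D, but the factor arrives as a column of numbers with no name attached. Whether it means "musicals versus action", "short versus long" or nothing at all is a judgement made by whoever reads the result.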
Experimental methods go from meaning to data, while exploratory methods go from data to meaning. Somewhere in the middle is our mind.