We hear a lot these days about something called "Big Data." It's a macho-sounding thing. Anything called "Big" usually is, like big business or the big bang theory. But what does it mean?
In the world of research everybody likes to claim that they have Big Data. It's not enough just to have a bright idea while standing in line at the supermarket checkout. You need a theory that is backed up by millions upon millions of numbers, which is what Big Data means.
Years ago when I was doing research in the social sciences I had a lot of small data: personal observations, interviews, newspaper clippings and so on, and I would juggle these odds and ends around until I reached some sort of conclusion. This is no longer good enough. Now I would need a mega gigabyte computer, a huge amount of stuff to put into it, and an algorithm to tell me what it all means. The result might be the same as what I could figure out with my scraps of paper, but when it comes to scientific credibility there's no comparison. Big Data wins every time.
What is rather creepy about this is the way we have all become gathered up into these big datasets without knowing it. Every purchase, every phone call, every social media encounter, every medical record, even every book taken out of the library becomes a grain of sand in the Big Data mountain. The government uses some of it for "security" reasons, as we now know all too well, and business uses most of the rest to sell stuff. Only a fraction of Big Data is used by researchers looking to answer Big Questions about the human condition and the meaning of life.
It's not so much a privacy issue, because most Big Data is anonymous. It's the uneasy feeling that, far from being unique individuals, we are just…well…data for somebody else to use for politics or profit. We're all statistics, we know that, but a datum seems smaller and more trivial than a statistic. Statistics are useful, and often amusing, and almost invariably wrong. They tell us, for example, that 70 percent of Americans who own running shoes never run, and that the average English person drinks 3.77 cups of tea a day. It was not for nothing that Mark Twain popularized the phrase "Lies, damned lies, and statistics."
Big Data, made possible by computing power, is old-fashioned statistics writ large and without the deniability. It brings together such gigantic numbers of facts that it may actually hit on the truth sometimes. For example, there's a new book called Dataclysm, by Christian Rudder, that explores what Big Data can tell us about human behavior, mostly dating behavior. It turns out that most of it is obvious: pretty women get more dates, people who share the same interests are more likely to get along, and so on. You could get the same answers by asking your mother. Other Big Data studies from the General Social Survey and the U.S. Census have shown that people are less trusting than they used to be, and slightly less happy, and that newspaper readership is declining. Who would have guessed it?
Now I have every respect and sympathy for researchers who spend their days laboring over screens full of long numbers. It's not half as much fun as research used to be, although I'm sure it is much more scientific. It is almost embarrassing to recall that what we used to call research involved things like reading books, going out into the real world, talking to people, and trying to figure out what on earth was going on.
Copyright: David Bouchier