In a prior post I alluded to the fact that the buzzword “Big Data” is just a new term for “data mining.” The potential for big data analytics to discover new things about us is frightening from a privacy perspective, as I discussed. But it can also be, let’s be honest, very cool.
For example, this Google Ngram, based on the texts of millions of books, charts the shifting usages of the phrase “the United States is” versus “the United States are.” The scientist Stephen Wolfram created a lot of buzz with a blog post about how he crunched over 20 years of his own e-mail traffic and other data to reveal patterns in his daily activities.
Such examples and others like them are definitely nifty. Yet they are also a bit... empty. They don’t quite seem revolutionary. And in fact a number of commentators have raised the question of whether big data is a trendy topic du jour that is being seriously over-hyped. The question is whether big data is just a compelling idea that will turn out to have few truly transformative applications, an intellectual trend that temporarily grips the imagination. (The fleeting fascination with “chaos theory” in the 1990s comes to mind.)
Or, like the Aztecs using wheels but only for children’s toys, have we stumbled upon a great tool that we are only starting to figure out how to really, fully exploit?
Ultimately we can’t know yet. But I recently came across this passage, which is very relevant to this question. It’s the great anthropologist Clifford Geertz paraphrasing the philosopher Susanne Langer; Geertz is talking about a concept in anthropology, but I suspect this pretty much captures the status of Big Data as well:
In her book, Philosophy in a New Key, Susanne Langer remarks that certain ideas burst upon the intellectual landscape with a tremendous force. They resolve so many fundamental problems at once that they seem also to promise that they will resolve all fundamental problems, clarify all obscure issues. Everyone snaps them up as the open sesame of some new positive science, the conceptual center-point around which a comprehensive system of analysis can be built. The sudden vogue of such a grande idée, crowding out almost everything else for a while, is due, she says, “to the fact that all sensitive and active minds turn at once to exploiting it. We try it in every connection, for every purpose, experiment with possible stretches of its strict meaning, with generalizations and derivatives.”
After we have become familiar with the new idea, however, after it has become part of our general stock of theoretical concepts, our expectations are brought more into balance with its actual uses, and its excessive popularity is ended. A few zealots persist in the old key-to-the-universe view of it; but less driven thinkers settle down after a while to the problems the idea has really generated. They try to apply it and extend it where it applies and where it is capable of extension; and they desist where it does not apply or cannot be extended. It becomes, if it was, in truth, a seminal idea in the first place, a permanent and enduring part of our intellectual armory. But it no longer has the grandiose, all-promising scope, the infinite versatility of apparent application, it once had. The second law of thermodynamics, or the principle of natural selection, or the notion of unconscious motivation, or the organization of the means of production does not explain everything, not even everything human, but it still explains something; and our attention shifts to isolating just what that something is, to disentangling ourselves from a lot of pseudoscience to which, in the first flush of its celebrity, it has also given rise.
We’ve certainly seen attempts to over-use the neat-o concepts of data mining in recent years—such as the embrace by parts of our security establishment of Total Information Awareness and related notions that pattern-based data mining can be used to identify terrorists. Here’s looking forward to the day when “less driven thinkers” within those agencies “settle down” to a realistic view of what data mining can do.
Which is not to say that it can’t actually do a lot of things, good and bad—and that we don’t need better privacy protections.