Simple Dataset About American Colonists Shows Power of Metadata

Jay Stanley,
Senior Policy Analyst,
ACLU Speech, Privacy, and Technology Project
June 11, 2013

In the best tradition of educators who manage to be both entertaining and enlightening, Duke sociology professor Kieran Healy has posted “Using Metadata to Find Paul Revere,” a fascinating demonstration of just how revealing metadata can be when subjected to some quite simple but powerful number-crunching techniques. Starting with basic information about 260 colonists in the years before the American Revolution (which organizations they belonged to), he shows step by step how even the lowliest analyst at the “Royal Security Agency” could use that data to build powerful insights into what might be going on among the rebellious colonists.
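To make the idea concrete, here is a minimal Python sketch of this kind of calculation. It is not Healy's own code: the people, the groups, and the function names are invented placeholders, and counting shared groups is just one simple way of measuring ties, assumed here for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each person mapped to the groups they belong to.
# These names are placeholders, not entries from the real dataset.
memberships = {
    "Person A": {"Group 1", "Group 2"},
    "Person B": {"Group 1"},
    "Person C": {"Group 2", "Group 3"},
    "Person D": {"Group 1", "Group 2", "Group 3"},
}


def shared_groups(memberships):
    """Return {(person, person): number of groups the pair has in common}."""
    ties = {}
    for a, b in combinations(sorted(memberships), 2):
        common = len(memberships[a] & memberships[b])
        if common:  # keep only pairs that share at least one group
            ties[(a, b)] = common
    return ties


def rank_by_connections(ties):
    """Rank people by the total weight of their ties, strongest first."""
    totals = Counter()
    for (a, b), weight in ties.items():
        totals[a] += weight
        totals[b] += weight
    return totals.most_common()


if __name__ == "__main__":
    ties = shared_groups(memberships)
    for person, score in rank_by_connections(ties):
        print(person, score)
```

In this toy example the hypothetical Person D, who belongs to all three groups, comes out as the most connected. Nothing about what anyone said or did goes into the calculation, only who belongs to what.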

The scariest thing about this is just how small and simple the starting data set is. Healy concludes:

I must ask you to imagine what might be possible if we were but able to collect information on very many more people, and also synthesize information from different kinds of ties between people! For the simple methods I have described are quite generalizable in these ways, and their capability only becomes more apparent as the size and scope of the information they are given increases. We would not need to know what was being whispered between individuals, only that they were connected in various ways. The analytical engine would do the rest!

In other words, this demonstration has shown us just a hint of what an organization like the NSA can probably do with metadata.

This is more evidence that (as we have argued at greater length elsewhere) those downplaying the intrusiveness of metadata are way behind the times.