Why ‘Just Metadata’ is not the whole story: MIT can show you

One of the key arguments that defenders of NSA surveillance (Obama, for instance) make is that “it’s just metadata”. This is in no way a valid argument. If you know the slightest thing about metadata, you know that a little (or a lot) of it can tell you a great deal about a person. Hey, it’s the era of Big Data, isn’t it?

One of the things I covered in my talk on Big Data & Privacy is that there isn’t such a thing as anonymous data. A lot of this has to do with patterns in metadata. Take this study, published in Scientific Reports (a Nature Publishing Group journal), about how much a little metadata can reveal, entitled Unique in the Crowd: The privacy bounds of human mobility. The researchers’ conclusion was quite simple: metadata reveals a ton.

A simply anonymized dataset does not contain name, home address, phone number or other obvious identifier. Yet, if individual’s patterns are unique enough, outside information can be used to link the data back to an individual. For instance, in one study, a medical database was successfully combined with a voters list to extract the health record of the governor of Massachusetts [27]. In another, mobile phone data have been re-identified using users’ top locations [28]. Finally, part of the Netflix challenge dataset was re-identified using outside information from The Internet Movie Database [29].

All together, the ubiquity of mobility datasets, the uniqueness of human traces, and the information that can be inferred from them highlight the importance of understanding the privacy bounds of human mobility. We show that the uniqueness of human mobility traces is high and that mobility datasets are likely to be re-identifiable using information only on a few outside locations. Finally, we show that one formula determines the uniqueness of mobility traces providing mathematical bounds to the privacy of mobility data. The uniqueness of traces is found to decrease according to a power function with an exponent that scales linearly with the number of known spatio-temporal points. This implies that even coarse datasets provide little anonymity.
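To get a feel for what “uniqueness” means here, a minimal Python sketch may help. It is not the researchers’ method or data: the traces, the user names and the helper functions `matching_users` and `fraction_unique` are all made up for illustration. The idea is simply to take a few known (place, hour) points and check how many people in the dataset are compatible with them.

```python
import random


def matching_users(traces, known_points):
    """Return the users whose trace contains every known (place, hour) point."""
    known = set(known_points)
    return [user for user, trace in traces.items() if known <= trace]


def fraction_unique(traces, p, seed=0):
    """Fraction of users singled out by p points drawn from their own trace."""
    rng = random.Random(seed)
    unique = 0
    for user, trace in traces.items():
        # Pretend an outsider learned p points about this user.
        sample = rng.sample(sorted(trace), min(p, len(trace)))
        if matching_users(traces, sample) == [user]:
            unique += 1
    return unique / len(traces)


# Hypothetical toy data: (antenna, hour of day) pairs per pseudonymous user.
traces = {
    "user_a": {("north", 8), ("center", 12), ("north", 19)},
    "user_b": {("north", 8), ("center", 12), ("south", 19)},
    "user_c": {("north", 8), ("east", 12), ("south", 19)},
}

# One common point matches everybody; adding a second point narrows it down.
print(matching_users(traces, [("north", 8)]))
print(matching_users(traces, [("north", 8), ("center", 12)]))
print(fraction_unique(traces, p=3))
```

Even in this tiny example no names are stored anywhere, yet two or three points are enough to narrow the candidates down to a single pseudonym. Anyone who knows where you were at those moments can then read the rest of your trace.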

Another cool case comes from GigaOM writer Derrick Harris. He spent some time digging into his own metadata from phone records, email and LinkedIn. His conclusion:

As you can see from just my few hours of tinkering with my personal data over the weekend, metadata can paint a pretty complete picture of a person’s habits and connections.

Find out for yourself
Researchers at the MIT Media Lab have built a web app that, once you grant it permission to do so, digs through your email history to piece together a “people-centric view of your email life”. It displays senders, recipients (including CCs) and timestamps. What first appears to be an arbitrary list of people you’ve contacted is actually linked together in logical ways. How much you see depends, of course, on how much you use Gmail and for how long.
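If you would rather tinker yourself, the same kind of picture can be roughed out with nothing but Python’s standard `email` module. The sketch below is my own illustration, not how the MIT app works. The sample messages, the example.com addresses and the helper names `contact_links` and `hours_active` are assumptions. Note that it only ever reads the From, To, Cc and Date headers and never the message body.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime
from itertools import combinations


def contact_links(raw_messages):
    """Count how often two addresses share the headers of one message."""
    links = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        fields = (msg.get_all("From", []) + msg.get_all("To", [])
                  + msg.get_all("Cc", []))
        people = sorted({addr.lower() for _, addr in getaddresses(fields) if addr})
        for pair in combinations(people, 2):
            links[pair] += 1
    return links


def hours_active(raw_messages):
    """Count messages per hour of day, using only the Date header."""
    hours = Counter()
    for raw in raw_messages:
        date = message_from_string(raw)["Date"]
        if date:
            hours[parsedate_to_datetime(date).hour] += 1
    return hours


# Hypothetical sample messages; the bodies are never inspected.
messages = [
    "From: me@example.com\nTo: ann@example.com\nCc: bob@example.com\n"
    "Date: Mon, 01 Jul 2013 09:15:00 +0000\n\nbody ignored",
    "From: ann@example.com\nTo: me@example.com, bob@example.com\n"
    "Date: Mon, 01 Jul 2013 22:40:00 +0000\n\nbody ignored",
]

print(contact_links(messages).most_common())
print(sorted(hours_active(messages).items()))
```

Run over a real mailbox export, the pair counts show who belongs together and the hour counts show when you are awake and working, all without reading a single word of what was written.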

So ‘just metadata’ does not justify anything, if you ask me.
