NSA can easily find individuals hidden in metadata - study

Published 26 Dec, 2013 22:13 | Updated 26 Dec, 2013 22:12

In defending the NSA's surveillance policies, many have cited the agency's claim that it merely collects phone numbers dialed, lengths of calls, and other metadata. Yet researchers now say the NSA can identify individuals in that vast collection of data.

Scholars at Stanford University in California set out to determine how, if at all, the NSA's metadata collection impacts the individual Americans whose information is swept up. The indiscriminate collection of phone records is one of the NSA's primary surveillance programs, and one of the first revealed by NSA whistleblower Edward Snowden. US President Obama sat down with Charlie Rose of PBS in June to defend the government's position.

“Program number one is called the 2015 program. What that does is it gets data from the service providers – like a Verizon – in bulk,” Obama said. “And basically you have call pairs. You have my telephone number connecting with your telephone number. There are no names, there's no content in that database. All it is, is the number pairs, when those calls took place, how long they took place. So that database is sitting there.”

That might be true, technically. But Stanford researchers Jonathan Mayer and Patrick Mutchler found that the agency does not need to collect names to identify an individual.

The pair built an app, known as MetaPhone, which Android users could voluntarily install to give the researchers access to their phone metadata. They assumed it would be simple to identify users from that metadata and, as it turned out, it took just hours.

“We randomly sampled 5,000 numbers from our crowdsourced MetaPhone dataset and queried the Yelp, Google Places, and Facebook directories. With little marginal effort and just those three sources – all free and public – we matched 1,356 (27.1%) of the numbers. Specifically, there were 378 hits (7.6%) on Yelp, 684 (13.7%) on Google Places, and 618 (12.3%) on Facebook,” Mayer and Mutchler wrote.
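The arithmetic behind those figures can be checked with a minimal Python sketch. The counts are the ones quoted above; the variable and function names are illustrative assumptions, not taken from the researchers' own code.

```python
# Counts quoted from the Stanford write-up; all names here are illustrative.
SAMPLE_SIZE = 5000
hits = {"Yelp": 378, "Google Places": 684, "Facebook": 618}
matched_any = 1356  # numbers matched by at least one of the three sources


def pct(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Per-source and overall match rates.
rates = {source: pct(n, SAMPLE_SIZE) for source, n in hits.items()}
overall = pct(matched_any, SAMPLE_SIZE)

# The per-source hits add up to more than the overall total because some
# numbers appear in more than one directory.
surplus = sum(hits.values()) - matched_any

for source, rate in rates.items():
    print(f"{source}: {rate}%")
print(f"Matched by any source: {overall}%")
print(f"Hits on numbers already found elsewhere: {surplus}")
```

The three sources produced 1,680 hits in total, so 324 of them fell on numbers already found in another directory. The Facebook figure works out to 12.36 percent, marginally above the 12.3 percent quoted.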

“What about if an organization were willing to put in some manpower? To conservatively approximate human analysis, we randomly sampled 100 numbers from our dataset, then ran Google searches on each. In under an hour, we were able to associate an individual or a business with 60 of the 100 numbers. When we added in our three initial sources, we were up to 73,” they continued.

One of the experiment's natural limitations was money. Mayer and Mutchler had shown how easy it was to put names to numbers with few resources, but the NSA has an annual budget reported to be in the billions of dollars.

“How about if money were no object? We don't have the budget or credentials to access a premium data aggregator, so we ran our 100 numbers with Intelius, a cheap consumer-oriented service. 74 matched. Between Intelius, Google search, and our three initial sources, we associated a name with 91 of the 100 numbers.”
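Each extra source raises the total by less than its own hit count because the sources overlap: the combined figure is the size of the union of the matched sets. The Python sketch below illustrates the idea under stated assumptions. The ten integer IDs are synthetic placeholders, not data from the study, and `combined_matches` is a hypothetical helper rather than the researchers' method.

```python
def combined_matches(total, *sources):
    """Count distinct IDs matched by at least one source.

    Returns a (count, percentage) pair; total is the sample size.
    """
    matched = set().union(*sources)
    return len(matched), 100 * len(matched) / total


# Synthetic placeholder sample: ten IDs numbered 0-9 stand in for phone numbers.
SAMPLE_TOTAL = 10
web_search = {0, 1, 2, 3, 4, 5}    # hypothetical manual search hits
directories = {4, 5, 6}            # hypothetical free-directory hits
paid_lookup = {1, 2, 6, 7, 8}      # hypothetical paid-service hits

count, share = combined_matches(
    SAMPLE_TOTAL, web_search, directories, paid_lookup
)
print(f"{count} of {SAMPLE_TOTAL} matched ({share:.0f}%)")
```

In this toy sample the three sources record 14 hits between them but identify only nine distinct IDs, the same pattern as the study's 74 Intelius matches contributing to a combined total of 91.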

This study is not the first of its kind. Since media outlets first began publishing the Snowden leaks six months ago, security experts and university professors have conducted tests to find out how much data is really collected and what the NSA is capable of doing with it. One study found that intelligence analysts could accurately guess a Facebook user's sexual orientation based on the pages they “liked.”

“If a few academic researchers can get this far this quickly, it's difficult to believe the NSA would have any trouble identifying the overwhelming majority of American phone numbers,” the Stanford team wrote.
