Friday, June 7, 2013

Not your granddaddy's metadata: don't believe the PRISM anti-hype

There's plenty of news and pundit noise being extruded on the PRISM break, i.e., on reports originating with Glenn Greenwald of the U.K.'s Guardian about massive, broad NSA surveillance of phone and internet activity.

The thing that's burning me up about the story is not so much that the government is spying on us. As many are pointing out, this is not news, even if the smoking gun is very newsworthy corroboration of hints and suspicions that many -- and many who have never in their lives worn tinfoil headgear -- have been describing, inferring, and warning about for years.

The thing that's burning me up is the misinformation being promulgated by President Obama, Senator Feinstein, the Wall Street Journal, and others asserting that mere data mining of "metadata" is minor and unimportant and non-invasive. This is a lie. We should make an effort to understand why. And we should also understand that social media 'miracles' and conveniences coming out of Silicon Valley and elsewhere have let a very powerful genie out of its bottle. We need to have a national and international conversation about what that really means.

It's not necessarily easy, especially for people who aren't data nerds, to understand what is even meant by "metadata" ... let alone to assess the degree to which its exposure compromises privacy and (when it is stored indefinitely, and perhaps abused by a government -- whether today's or some future administration), security.

As someone who has a day job as a data nerd, I'm going to give the topic a shot.

The truth about pervasive domestic spying, revealed

From Glenn Greenwald in the Guardian article that started it all, NSA collecting phone records of millions of Verizon customers daily:
The National Security Agency is currently collecting the telephone records of millions of US customers of Verizon, one of America's largest telecoms providers, under a top secret court order issued in April.

The order, a copy of which has been obtained by the Guardian, requires Verizon on an "ongoing, daily basis" to give the NSA information on all telephone calls in its systems, both within the US and between the US and other countries.

The document shows for the first time that under the Obama administration the communication records of millions of US citizens are being collected indiscriminately and in bulk – regardless of whether they are suspected of any wrongdoing.
And it ain't just metadata about phone calls. It's all you and I do on the intertubes as well. Again, the Guardian's Greenwald in NSA Prism program taps in to user data of Apple, Google and others:
The National Security Agency has obtained direct access to the systems of Google, Facebook, Apple and other US internet giants, according to a top secret document obtained by the Guardian.

The NSA access is part of a previously undisclosed program called Prism, which allows officials to collect material including search history, the content of emails, file transfers and live chats, the document says.

[...]

Although the presentation claims the program is run with the assistance of the companies, all those who responded to a Guardian request for comment on Thursday denied knowledge of any such program.
And from the Washington Post, U.S., British intelligence mining data from nine U.S. Internet companies in broad secret program, where you can see slides from the PowerPoint presentation that has made apparent the scope of government snooping:
The National Security Agency and the FBI are tapping directly into the central servers of nine leading U.S. Internet companies, extracting audio and video chats, photographs, e-mails, documents, and connection logs that enable analysts to track foreign targets, according to a top-secret document obtained by The Washington Post.

The program, code-named PRISM, has not been made public until now. It may be the first of its kind. The NSA prides itself on stealing secrets and breaking codes, and it is accustomed to corporate partnerships that help it divert data traffic or sidestep barriers. But there has never been a Google or Facebook before, and it is unlikely that there are richer troves of valuable intelligence than the ones in Silicon Valley.
Those are rich troves of intelligence about you. And you. And you. And me.

Smoke and mirrors: misdirection about metadata

In response to this inconvenient little infoleak? We're assured by our leaders in politics and business that the collection and analysis of "metadata" is nothing to be worried about.

(What is metadata? Data about data, in a word. In the case of those phone calls, it's the caller's number, the number called, the time the call started, and the call's duration -- for every call routed by Verizon.)

No names. No credit card numbers. And therefore, our political leaders tell us, there's no real problem.

Nothing to see here. Everybody just move along...

From Elspeth Reeve in The Atlantic Wire, Washington Is Trapped in Its Own Prism of Data-Mining Self-Defense:
At a press briefing ostensibly about his health-care program and its success on Friday afternoon, President Obama defended the specificity of the NSA program that has become "the most prolific contributor" to his daily intelligence briefings. Don't worry, the president said, "No one is listening to your phone calls," and the NSA is not looking at names or their content. But metadata reveals the phone numbers, and the time, length, and location of calls. "The program does not allow the Government to listen in on anyone's phone calls," Director of National Intelligence James Clapper wrote in his two-page response to The Guardian article on Thursday night, which President Obama largely echoed on Friday. California Sen. Dianne Feinstein assured reporters on Thursday, "As you know, this is just metadata. There is no content involved. In other words, no content of a communication." The Wall Street Journal's editorial board is sure there's nothing to be worried about. "We bow to no one in our desire to limit government power, but data-mining is less intrusive on individuals than routine airport security," the Journal says, in an editorial titled "Thank You for Data-Mining."
These assertions are not only ridiculous, they're clumsy.

On May 31st, Quentin Hardy of the NY Times hosted plenary sessions on the second day of the 2nd annual DataEdge Conference on the UC Berkeley campus, which is where I happen to work. Also, as it happens, I had the good fortune to attend DataEdge. It was an eye-opening couple of days, but not all in a good way.

Following the conference, the NY Times' Hardy wrote about the single most damning point (in my reckoning, anyway) made in the two days attendees spent geeking out at Sutardja Dai Hall. That point applies directly to the revelations published by the Guardian less than a week later, and goes right to the heart of the question of the innocuousness of "metadata."

Hardy's article is titled Why Big Data Is Not Truth. In it, he covers Kate Crawford's scintillating keynote address, The Raw and the Cooked: Mythologies of Big Data. (I'm told that video of the conference sessions will eventually be published on the web site of Berkeley's School of Information, which organized and hosted DataEdge; if you'd like to stay tuned, I'll circle back and link to it in the comments when I see the video come on-line.) Here's Hardy's rendering of Kate Crawford's myth #5:
Myth 5: Big Data Is Anonymous

A study published in Nature last March looked at 1.5 million phone records that had personally identifying information removed. It found that just four data points of when and where a call was made could identify 95 percent of individuals. “With just two, you can identify 50 percent of them,” Ms. Crawford said. “With a fingerprint, you need 12 data points to identify somebody.” Likewise, smart grids can spot when your friends come over. Search engine queries can yield health data that would be protected if it came up in a doctor’s office.
And here's the abstract of the study referenced by Crawford (it appeared in Scientific Reports, a journal from the publishers of Nature), Unique in the Crowd: The privacy bounds of human mobility:
We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. In fact, in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier's antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals. We coarsen the data spatially and temporally to find a formula for the uniqueness of human mobility traces given their resolution and the available outside information. This formula shows that the uniqueness of mobility traces decays approximately as the 1/10 power of their resolution. Hence, even coarse datasets provide little anonymity. These findings represent fundamental constraints to an individual's privacy and have important implications for the design of frameworks and institutions dedicated to protect the privacy of individuals.
In a soundbyte? A little metadata goes a long way. Perhaps a lot further than somebody whose activity is described by the metadata might like.
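
For the data nerds in the audience, here's a rough sketch of the kind of question the study asks: if somebody knows just k of the places-and-times in your trace, how often does that narrow the crowd down to you alone? To be clear, this is my own toy illustration on made-up random data, not the researchers' code, data, or method; the function names (make_traces, unicity) and every parameter value are assumptions I picked for the example.

```python
import random


def make_traces(n_users=500, n_antennas=50, n_hours=720, n_points=60, seed=0):
    """Build synthetic traces (an assumption, not real carrier data).

    Each trace is a set of (antenna, hour) observations for one person.
    """
    rng = random.Random(seed)
    traces = []
    for _ in range(n_users):
        # Assume each person mostly shows up near a handful of antennas.
        favorites = [rng.randrange(n_antennas) for _ in range(5)]
        trace = {
            (rng.choice(favorites), rng.randrange(n_hours))
            for _ in range(n_points)
        }
        traces.append(trace)
    return traces


def unicity(traces, k, trials=200, seed=1):
    """Estimate how often k known points single out exactly one trace."""
    rng = random.Random(seed)
    unique = 0
    for _ in range(trials):
        target = traces[rng.randrange(len(traces))]
        # Pretend an outside observer learned k points from this trace.
        known = set(rng.sample(sorted(target), min(k, len(target))))
        # Count every trace in the dataset consistent with those points.
        matches = sum(1 for trace in traces if known <= trace)
        if matches == 1:
            unique += 1
    return unique / trials


if __name__ == "__main__":
    traces = make_traces()
    for k in (1, 2, 4):
        print(k, "known points -> share uniquely identified:", unicity(traces, k))
```

Whatever numbers this prints describe my synthetic crowd, not real people. The mechanics are the point: no names appear anywhere in the data, and yet each additional known point shrinks the set of traces that could possibly be yours.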

Four spatio-temporal points (where you were when); a few "Likes" on Facebook; linguistic analysis of your tweets; a social-network analysis of the people with whom you exchange e-mail... If the "metadata" from all these can be resolved to a person, like a fingerprint, then "isolated" activities one might not worry about putting in the public record add up, in the aggregate, to a very great deal that can be known about who you are, what you do, with whom you do it, what you believe ... perhaps how you vote? Where you bank? With whom you had a "confidential" consultation or a "secret" affair (if, say, you both carried your cell phones into the same location on multiple occasions)?
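
That last parenthetical is less hypothetical than it might sound. Here's a minimal sketch, assuming nothing more than a list of anonymous location pings, of how little work it takes to spot two phones that keep turning up in the same place at the same time. The record layout (device, place, hour) and the function names are my own inventions for illustration; I have no idea what anybody's real analysis pipeline looks like.

```python
from collections import defaultdict
from itertools import combinations


def colocation_counts(pings):
    """Count how often each pair of devices shares a place and hour.

    pings: iterable of (device, place, hour) tuples (hypothetical layout).
    Returns a dict mapping (device_a, device_b) to the number of shared
    (place, hour) slots.
    """
    present = defaultdict(set)
    for device, place, hour in pings:
        present[(place, hour)].add(device)

    counts = defaultdict(int)
    for devices in present.values():
        # Every pair seen in the same slot gets one co-location.
        for a, b in combinations(sorted(devices), 2):
            counts[(a, b)] += 1
    return dict(counts)


def repeated_meetings(pings, min_count=3):
    """Keep only the pairs co-located at least min_count times."""
    return {
        pair: n
        for pair, n in colocation_counts(pings).items()
        if n >= min_count
    }
```

Feed it a month of pings and a threshold, and out come the pairs of "anonymous" devices that meet again and again. Putting names to them is a separate step, and as the study above suggests, not a hard one.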

The cult of social media and the generation of trackable data

George Packer wrote a fascinating article for the 27 May 2013 issue of The New Yorker, titled Change The World. His main thesis is that social media moguls in Silicon Valley are beginning to engage in politics to a degree they have not done before. (He also opens a window onto the changes Silicon Valley has undergone in the past 35 years, a nostalgic romp for me: Packer and I attended the same high school, he graduated a year after I did, and the bike route he described taking to school was the same route I rode.)

But more striking than the signs of incipient political engagement by the likes of Mark Zuckerberg, or nostalgia about 1970s Palo Alto, is Packer's portrayal of the overwhelming hubris of those driving some of the most powerful culture-inflecting companies of the 21st century. From the article:
A few years ago, when Barack Obama visited one Silicon Valley campus, an employee of the company told a colleague that he wasn't going to take time from his work to go hear the President's remarks, explaining, "I'm making more of a difference than anybody in government could possibly make." In 2006, Google started its philanthropic arm, Google.org, but other tech giants did not follow its lead. At places like Facebook, it was felt that making the world a more open and connected place could do far more good than working on any charitable cause. Two of the key words in industry jargon are "impactful" and "scalable" -- rapid growth and human progress are seen as virtually indistinguishable. One of the mottoes posted on the walls at Facebook is "Move fast and break things." Government is considered slow, staffed by mediocrities, ridden with obsolete rules and inefficiencies.
Itamar Rosen of Facebook -- who manages the company's Data Science team and was interviewed at the DataEdge conference I attended late last week -- admitted that breaking things is in fact fading as a Facebook ethos. By the time you've got a billion users, Rosen explained, keeping things working starts to matter.

And as we're seeing this week, Facebook and its most successful social media peers are rather more deeply embedded with certain segments of government than Silicon Valley's freewheeling, libertarian ethos suggests.

The thing is, all the groovy connectedness social media enables leaves a trackable spoor of metadata that is in actual fact the monetizable lifeblood of the corporate entities that permit us to like, friend, share, tweet, post, and tumbl (tumbl?).

That trackable spoor of metadata is how these corporations learn enough about us to enable targeted advertising that they turn around and sell to advertisers. And, as we learned this week, it's also what they turn over to the NSA so that the NSA too can target us for ... whatever they want, now and in the future.

By way of contextualizing the Facebook CEO's evolution into a more politically sophisticated animal, Packer describes an interview with Mark Zuckerberg five years ago:
In a 2008 interview, Mark Zuckerberg recounted how young Lebanese Muslims who might have been tempted by extremism broadened their views after going on Facebook and friending people "who have gone to Europe." He suggested that the social network could help solve the problem of terrorism. "It's not out of a deep hatred of anyone," Zuckerberg offered. "It comes from a lack of connectedness, a lack of communication, a lack of empathy, and a lack of understanding." Successive U.S. Administrations had failed to resolve the Israeli-Palestinian conflict; perhaps the answer was to get as many people as possible on Facebook.
Ouch. That last bit may have been a little heavy-handed; you can just about see the smoke coming out of the author's ears. But Packer's point about breathless glorification of some inherent good in mere "connectedness" is made more carefully at two later points in his article:
Technology can be an answer to incompetence and inefficiency. But it has little to say about larger issues of justice and fairness, unless you think that political problems are bugs that can be fixed by engineering rather than fundamental conflicts of interest and value.
And, a few pages later:
The idea of a frictionless world, in which technology is a force for progress as well as a source of wealth, leaves out the fact that politics inevitably means clashing interests, with winners and losers.
Indeed. In real life, it's complicated.

And connectedness has consequences.

So?

Quentin Hardy further quotes from Kate Crawford's DataEdge keynote in a finish to his article, words that aptly warn against the news that leaked out of NSA a few days after Crawford spoke them:
Before Big Data disappears into the background as another fact of life, Ms. Crawford said, "We need to think about how we will navigate these systems. Not just individually, but as a society."
I would say that now might be a very fine time to think about just exactly that.



Related posts on One Finger Typing:
Six ways your electronica owns you
Pimped by our own devices: electronica, the cloud, and privacy piracy
Four eyes: 4 ways Google Glass might change the world
Time, History, and Human Forgetting


Thanks to Camille Gévaudan via Wikimedia Commons for the image of Mark Zuckerberg in an interview with Maurice Lévy in May 2011.

2 comments:

  1. Thanks for this Steve. I've been thinking about this issue today, as a lot of people have I'm sure. Your info is helping me. I realize that I don't think of my internet usage as very private at all. I know it's monitored at work and when I search something, ads for that pop up for the next few weeks....and so forth. A friend said her kids know when she's about to get home using the "find my phone" app and I realized I could track my kid that way, which seems so creepy but also appealing. Just rambling here....

    1. O, I sure don't like the idea of my (hypothetical) kids -- or anybody else, for that matter -- tracking me using some device in my pocket as a beacon. That is creepy ... and I agree it's creepy that a parent would be (understandably!) tempted to answer that key question, Where's My Child, by any means necessary.

      It seems so quaint now to have been worried about those RFID beacons that the government now embeds in U.S. passports.......

      There's a healthy bloom of diaries posted on Daily Kos about PRISM today ... I've cross-posted this blog and it too is generating discussion.
