
Facebook Agrees to F.T.C. Settlement on Privacy

29 Nov

Not being careful about privacy is expensive. In a settlement with the FTC, Facebook agreed to “subject itself to an independent audit on consumer privacy for the next 20 years,” and to “pay $16,000 a day for each allegation if they violated the settlement in the future.”


Read more here: http://www.nytimes.com/2011/11/30/technology/facebook-agrees-to-ftc-settlement-on-privacy.html

Finishing the proof of the iterative construction mechanism

4 Oct

I didn’t quite get to finish the proof of the utility theorem for the iterative construction mechanism in class. It’s in the notes, though: check out the proof of Theorem 5 in the lecture notes if you want to see how to finish the calculations.

Class project

27 Sep

FYI: The list of project deadlines and some project ideas is now online: http://www.cis.upenn.edu/~aaroth/courses/projectideas.html

Talks instead of class on Thursday

20 Sep

As I announced in class today, I will not be lecturing this Thursday 9/22. This is because, in a stroke of good luck, we have two excellent talks on differential privacy, given by Adam Smith and Sofya Raskhodnikova. Note that Sofya’s talk will be at a different time and place than I announced in class!


Adam’s talk will be at noon on 9/22 in Levine 307.

Sofya’s talk will be at 1:30 on 9/22 in Towne 315. (This is where and when we usually have class!)


Speaker: Adam Smith

Title: “Differential Privacy in Statistical Estimation Problems”

Consider an agency holding a large database of sensitive personal information — medical records, census survey answers, or web search records, for example. The agency would like to discover and publicly release global characteristics of the data (say, to inform policy and business decisions) while protecting the privacy of individuals’ records. This problem is known variously as “statistical disclosure control”, “privacy-preserving data mining” or simply “database privacy”.

This talk will describe differential privacy, a notion which emerged from a recent line of work in theoretical computer science that seeks to formulate and satisfy rigorous definitions of privacy for such statistical databases.

After a brief introduction to the topic, I will discuss recent results on differentially private versions of basic algorithms from statistics. I will discuss both a general result that works for any asymptotically normal statistical estimator (based on a STOC 2011 paper) and some results tailored to convex optimization problems (unpublished work, joint with A. G. Thakurta).


Speaker: Sofya Raskhodnikova

Title: “Testing and Reconstruction of Lipschitz Functions with Applications to Differential Privacy”

A function f : D -> R has Lipschitz constant c if d_R(f(x), f(y)) <= c d_D(x, y) for all x, y in D, where d_R and d_D denote the distance functions on the range and domain of f, respectively. We say a function is Lipschitz if it has Lipschitz constant 1. (Note that rescaling by a factor of 1/c converts a function with a Lipschitz constant c into a Lipschitz function.) Intuitively, a Lipschitz constant of f is a bound on how sensitive f is to small changes in its input. Lipschitz constants are important in mathematical analysis, the theory of differential equations and other areas of mathematics and computer science. However, in general, it is computationally infeasible to find a Lipschitz constant of a given function f or even to verify that f is c-Lipschitz for a given number c.

We initiate the study of testing (which is a relaxation of the decision problem described above) and local reconstruction of the Lipschitz property of functions. A property tester, given a proximity parameter epsilon, has to distinguish functions with the property (in this case, Lipschitz) from functions that are epsilon-far from having the property, that is, differ from every function with the property on at least an epsilon fraction of the domain. A local filter reconstructs a desired property (in this case, Lipschitz) in the following sense: given an arbitrary function f and a query x, it returns g(x), where the resulting function g satisfies the property, changing f only when necessary. If f has the property, g must be equal to f.

We design efficient testers and local reconstructors for functions over domains of the form {1,…,n}^d, equipped with L1 distance, and give corresponding lower bounds. The testers that we developed have applications to program analysis. The reconstructors have applications to data privacy. The application to privacy is based on the fact that a function f of entries in a database of sensitive information can be released with noise of magnitude proportional to a Lipschitz constant of f, while preserving the privacy of individuals whose data is stored in the database (Dwork, McSherry, Nissim and Smith, TCC 2006). We give a differentially private mechanism, based on local filters, for releasing a function f when a purported Lipschitz constant of f is provided by a distrusted client.
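
To make the privacy application in Sofya’s abstract concrete, here is a minimal sketch (my own illustration, not the speakers’ code) of the Laplace-noise release of Dwork, McSherry, Nissim and Smith: brute-force a Lipschitz constant of f over a tiny grid domain {1,…,n}^d under L1 distance, then add Laplace noise scaled to that constant divided by epsilon. The particular function f, n, d, and epsilon below are made up for the example; real testers and reconstructors exist precisely because this brute-force step is infeasible for large domains.

import itertools
import math
import random

def lipschitz_constant(f, n, d):
    # Brute force: max |f(x) - f(y)| / ||x - y||_1 over all pairs in {1,...,n}^d.
    # Only feasible for tiny n and d.
    points = list(itertools.product(range(1, n + 1), repeat=d))
    best = 0.0
    for x, y in itertools.combinations(points, 2):
        dist = sum(abs(a - b) for a, b in zip(x, y))
        best = max(best, abs(f(x) - f(y)) / dist)
    return best

def laplace_release(value, lipschitz_c, epsilon):
    # Release value + Lap(c / epsilon), the calibration from Dwork-McSherry-Nissim-Smith,
    # sampling the Laplace noise by inverse transform.
    scale = lipschitz_c / epsilon
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return value + noise

# Toy example (made up): f is the average of d = 3 histogram counts, each in {1,...,4}.
f = lambda x: sum(x) / float(len(x))
c = lipschitz_constant(f, n=4, d=3)   # equals 1/3 for the average
database = (2, 4, 1)
print(laplace_release(f(database), c, epsilon=0.5))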

What your twitter profile reveals about you

20 Sep

Michael pointed me to the hunch.com “Twitter Predictor”. It is a remarkable piece of software — if you have a twitter account, I suggest you try it out.

What it does is take a look at the publicly available twitter graph: your twitter profile, the profiles of the people you follow, and those of the people who follow you. It then predicts how you will answer a wide variety of personal questions about yourself, ranging from your technological prowess to your beliefs about religion, the death penalty, abortion, firearms, superstitions, and all manner of other seemingly unrelated things.
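
hunch.com hasn’t said how the predictor actually works, but the general flavor of predicting attributes from a follow graph is easy to sketch. Here is a toy, hypothetical version (the function and data below are made up, and are not hunch.com’s method): guess each unknown answer by majority vote over the accounts a user follows whose answers are already known.

from collections import Counter

def predict_answer(user, question, follows, known_answers):
    # follows: dict mapping a user to the set of accounts they follow.
    # known_answers: dict mapping (user, question) to that user's known answer.
    votes = Counter(
        known_answers[(neighbor, question)]
        for neighbor in follows.get(user, set())
        if (neighbor, question) in known_answers
    )
    if not votes:
        return None  # no signal from the graph for this question
    return votes.most_common(1)[0][0]

# Toy example: two of the three accounts "aaron" follows answered "yes".
follows = {"aaron": {"alice", "bob", "carol"}}
known_answers = {("alice", "owns_smartphone"): "yes",
                 ("bob", "owns_smartphone"): "yes",
                 ("carol", "owns_smartphone"): "no"}
print(predict_answer("aaron", "owns_smartphone", follows, known_answers))  # -> "yes"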

So how well does it work? Remarkably well.

When I tried it out, the “Twitter Predictor” asked me 20 questions and guessed my answers correctly on all 20 of them. Many of these might be considered “sensitive information”, involving my views on religion, abortion, and other hot-button issues. This is all the more remarkable given how little information they really had about me. I am not what you would call an active twitter user. I have made a grand total of 26 tweets, the last of which was in 2008. The plurality of these tweets are invitations for people to join me for lunch. I only follow 23 people, and needless to say, I haven’t “followed” anyone that I have met since 2008.

So what does this mean for privacy? I think that it is a compelling demonstration of the power of correlations in data. You might think that making the twitter graph public is innocuous enough, without realizing how much other information is encoded in it. I might not mind if everyone knows which 23 people I knew in grad school signed up for twitter accounts in 2008, but I might mind if everyone knows who I voted for in the last election.


This all just serves to illustrate the problems with trying to partition the world into “sensitive information” and “non-sensitive information”, or with assuming that data analysts don’t know anything about the world beyond what you have told them…

Privacy breach at Stanford Hospital

8 Sep

A timely article in the New York Times about why it is dangerous to make private data sets available in their original form, even to contractors who have signed confidentiality agreements.



Welcome to the Algorithmic Foundations of Data Privacy

1 Sep

Welcome to CIS 800/002, The Algorithmic Foundations of Data Privacy at the University of Pennsylvania. The class will meet Tuesdays and Thursdays from 1:30-3:00 pm in Towne 315.

This class will serve as an introduction to research in differential privacy, and is designed for graduate students who want to familiarize themselves with, and contribute to, research in this area. The course will be graded on two components: a semester-long research project, and participation, which will include contributing posts to this blog. Posts will include math, so make sure you can see the embedded \LaTeX equations here:

\Pr[M(D) \in S] \leq \exp(\epsilon)\Pr[M(D') \in S] + \delta
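
As a quick illustration of what that definition says, here is a tiny sanity check (my own sketch, not course material) for the case \delta = 0: “randomized response” on a single bit reports a person’s true bit with probability e^\epsilon / (1 + e^\epsilon) and lies otherwise, and the inequality can be verified exactly over every output and every pair of neighboring one-bit databases.

import math

def response_prob(true_bit, reported_bit, epsilon):
    # Exact probability that randomized response on one bit outputs reported_bit:
    # report the truth with probability e^eps / (1 + e^eps), lie otherwise.
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return p_truth if reported_bit == true_bit else 1.0 - p_truth

epsilon = 0.5
for s in (0, 1):                          # every possible output
    for D, D_prime in ((0, 1), (1, 0)):   # neighboring one-bit databases
        lhs = response_prob(D, s, epsilon)
        rhs = math.exp(epsilon) * response_prob(D_prime, s, epsilon)
        assert lhs <= rhs + 1e-12         # the definition holds with delta = 0
print("randomized response is epsilon-differentially private")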

The first class will be on Thursday, September 8th. See you there!