Foundations of Data Privacy Course Blog

Lower Bound for Noise in a Private Mechanism (Justin Hsu)


Before differential privacy was formally defined, there was work done on privacy for statistical databases. A 2003 paper by Irit Dinur and Kobbi Nissim, titled Revealing Information while Preserving Privacy, considers many of the same issues as differential privacy. The paper defines a notion of privacy, and establishes a tight bound on how much noise must be added to protect database privacy.

The paper first models a statistical database as a string of bits, together with a query-responder that answers sum-subset queries against this database with some added perturbation, to protect privacy. Without a definition of differential privacy, the authors did not try to give a definition of privacy, but instead defined non-privacy, which happens if with high probability, an adversary can reconstruct almost all of the database by querying the database.

As expected, if the adversary can issue many queries, a large amount of noise is needed. The main result of the paper is that if less than noise is added, then the database is non-private given a polynomially bounded adversary. The main idea is for the adversary to draw a polynomial number of queries at random, and solve a linear program to find a database that is consistent with the responses, knowing that the noise is . The paper shows that with high probability, at least one of the queries chosen will disqualify every string that differs from the actual database on some fraction of the elements. In other words, once the potential strings that don’t agree with the query answer are weeded out, all the strings that almost entirely reconstruct the hidden database (with high probability).

A further wrinkle is that this lower bound on noise is tight. That is, the paper produces an algorithm with noise that is provably private. However, since there is no good definition of privacy here (only non-privacy), the only way to have a private algorithm is to have one that reveals almost no useful information at all.

This paper is interesting for a few other reasons. Firstly, the authors briefly explore a “CD” model, where the database is perturbed and “written on a CD” and distributed to the adversary, who can make arbitrary queries against this modified database. Secondly, because the paper investigates how much the noise can be reduced, if we are willing to constrain the adversary complexity further (say, for an adversary that can issue linear, or logarithmic number of queries). Finally, the paper indicates that a better definition of privacy is needed, otherwise very little usable information can be released.