Before differential privacy was formally defined, there was work done on privacy for statistical databases. A 2003 paper by Irit Dinur and Kobbi Nissim, titled Revealing Information while Preserving Privacy, considers many of the same issues as differential privacy. The paper defines a notion of privacy, and establishes a tight bound on how much noise must be added to protect database privacy.
The paper first models a statistical database as a string of bits, together with a query-responder that answers sum-subset queries against this database with some added perturbation, to protect privacy. Without a definition of differential privacy, the authors did not try to give a definition of privacy, but instead defined non-privacy, which happens if with high probability, an adversary can reconstruct almost all of the database by querying the database.
As expected, if the adversary can issue many queries, a large amount of noise is needed. The main result of the paper is that if less than
A further wrinkle is that this lower bound on noise is tight. That is, the paper produces an algorithm with
This paper is interesting for a few other reasons. Firstly, the authors briefly explore a “CD” model, where the database is perturbed and “written on a CD” and distributed to the adversary, who can make arbitrary queries against this modified database. Secondly, because the paper investigates how much the noise can be reduced, if we are willing to constrain the adversary complexity further (say, for an adversary that can issue linear, or logarithmic number of queries). Finally, the paper indicates that a better definition of privacy is needed, otherwise very little usable information can be released.