Archive | November, 2011

## Facebook Agrees to F.T.C. Settlement on Privacy

29 Nov

Not being careful about privacy is expensive. In a settlement with the FTC, facebook agreed to “subject itself to an independent audit on consumer privacy for the next 20 years,” and “pay \$16,000 a day for each allegation if they violated the settlement in the future.”

## An Unweighted Vertex Cover Algorithm: An Alternative View(Hallucinated edges) (Salar Moarref)

28 Nov

Remember that in class we talked about the $Vertex$ $Cover$ problem in the framework of differential privacy. In original form of the problem, we want to pick a set $S$ of vertices of minimal size so that every edge in the graph is incident to at least one vertex in $S$. In the privacy-preserving version of the problem, the private information we wish to conceal is the presence of absence of each edge. We talked about a randomized algorithm which outputs a permutation of all vertices of the graph, and the vertex cover will be defined by picking, for each edge, whichever of its endpoints appears first in the permutation. We saw that this vertex cover is $(2+O(1/\epsilon))$-approximate and $\epsilon$-differentially private. The algorithm was based on a simple non-private 2-approximation to vertex cover [2] that repeatedly selects an uncovered edge uniformly at random, and includes a random end-point of the edge which can be viewed as selecting a vertex at random with probability proportional to its uncovered degree. We took this formulation and mix in a uniform distribution over the vertices, using a weight that will grow as the number of remaining vertices decreases:

Algorithm 1: Unweighted Vertex Cover [1]
let $n \leftarrow |V|, V_1 \leftarrow V, E_1 \leftarrow E$
for $i = 1,2,..,n$
let $w_i \leftarrow (4/\epsilon) \times \sqrt{n/(n-i+1)}$
pick a vertex $v \in V_i$ with probability proportional to $d_{E_i}(v)+w_i$
output $v$. let $V_{i+1} \leftarrow V_i \backslash \{v\}, E_{i+1} \leftarrow E_i \backslash (\{v\} \times V_i)$
end for

As stated the algorithm outputs a sequence of vertices, one per iteration. As remarked above, this permutation defines a vertex cover  by picking the earlier occurring end point of each edge.

There is a slightly different way to implement the intuition behind the above algorithm: imagine adding $O(1/\epsilon)$ “hallucinated” edges to each vertex (the other endpoints of these hallucinated edges being fresh “hallucinated” vertices), and then sampling vertices without replacement proportional to these altered degrees. However, once (say) $n/2$ vetices have been sampled, output the remaining vertices in random order. More specifically, given a graph $G=(V,E)$, we mimic the randomized proportional-to-degree algorithm for $\alpha n$ rounds where $\alpha < 1$, and output the remaining vertices in random order. That is, in each of the first  $\alpha n$ rounds, we select the next vertex $i$ with probability proportional to $d(i)+1/\epsilon$: this is equivalent to imagining that each vertex has $1/\epsilon$ “hallucinated” edges in addition to its real edges. When we select a vertex, we remove it from the graph, together with the real and hallucinated edges adjacent to it. This is equivalent to picking a random (real or hallucinated) edge from the graph, and outputting a random real endpoint. Outputting a vertex affects the real edges in the remaining graph, but does not change the hallucinated edges incident to other vertices.

The privacy analysis is similar to that of theorem 5.1 of [1]: if we assume weights being $w_i = 1/\epsilon$ for the first $\alpha n$ rounds and $w_i=\infty$ for the remaining rounds, we will have $2\sum_{i=(1-\alpha)n}^{n}\frac{1}{iw_i} \leq \epsilon(\frac{2\alpha}{1-\alpha})$-differential privacy.

To analyze the utility, we couple the algorithm with a run of the 2-approximation non-private algorithm $A$ that at each step picks an arbitrary edge of the graph and then picks a random endpoint. We refer to vertices that have non-zero “real” degree at the time they are selected by our algorithm as interesting vertices: the cost of our algorithm is simply the number of interesting vertices it selects in the
course of its run. Let $I_1$ denote the number of interesting vertices it selects during the first $\alpha n$ steps, and $I_2$ denote the number of interesting vertices it selects during its remaining $(1 - \alpha)n$ steps, when it is simply ordering vertices randomly. Clearly, the total cost is $I_1 + I_2$. We may view the first phase of our algorithm as selecting an edge at random (from among both real and hallucinated ones) and then outputting one of its endpoints at random. Now, for the rounds in which our
algorithm selects a real edge, we can couple this selection with one step of an imagined run of $A$ (selecting the same edge and endpoint). Note that this run of $A$ maintains a vertex cover that is a subset of our vertex cover, and that once our algorithm has completed a vertex cover, no interesting vertices remain. Therefore, while our algorithm continues to incur cost, $A$ has not yet found a vertex cover. In the first phase of our algorithm, every interesting vertex our algorithm selects has at least one real edge adjacent to it, as well as $1/\epsilon$ hallucinated edges. Conditioned on selecting an interesting vertex, our algorithm had selected a real edge with probability at least $\epsilon' = 1/(1+1/\epsilon)$. Let $R$ denote the random variable that represents the number of steps $A$ is run for. $E[R] \leq 2OPT$ since $A$ is a 2-approximation algorithm. By linearity of expectation:

$2OPT \geq E[R] \geq \epsilon' . E[I_1]$

Gupta et. al in [1] show that most of the algorithm’s cost comes from the first phase and hence that $I_2$ is not much larger than $I_1$:

$E[I_1] \geq \ln(\frac{1}{1-\alpha}) . E[I_2]$

Combining these facts, we get that:

$\frac{E[cost]}{OPT} \leq \frac{2}{\epsilon'}(1+\frac{1}{\ln((1-\alpha)^-1)})$

[1] A. Gupta, K. Ligett, F. McSherry, A. Roth, and K. Talwar. Differentially private combinatorial optimization. Nov 2009

[2] L. Pitt. A simple probabilistic approximation algorithm for vertex cover. Technical report, Yale University, 1985

## Fuzz (Arjun Narayan)

22 Nov

Fuzz is a programming language and runtime for running differentially
private queries. This post is in two parts: the first deals with the
Fuzz programming language and how it certifies programs as
differentially private through the use of a novel statically checked
type system, and the second with a runtime that takes care of some of
the real world problems that arise when we try and achieve differential privacy in practice.

“[A type system is a] tractable syntactic method for proving the
absence of certain program behaviors by classifying phrases according
to the kinds of values they compute” [3]. Importantly, a sound type
system provides a mathematical proof of the absence of those program
behaviors. The type systems most programmers are familiar with are
fairly conservative — for example, the Java programming language has
a type system that certifies (among other things) that the return
value of a method is guaranteed to be a subtype of the reference that
it is assigned to. We can also have stronger type systems that check
more complex properties, but may run into issues such as the
properties being computationally hard or even undecidable.

Fuzz is a functional programming language with a static type
system. The high level goal is to have the type system check if a
given program is differentially private. This type system captures a
notion of function sensitivity — “A function is said to be
c-sensitive if it can magnify distances between inputs by a factor of
at most c” [1]. Thus a Fuzz program can get an upper bound on the
sensitivity of a given program from its input database. Once we know
the sensitivity of the function, we can use the Laplace mechanism to
add a sufficient amount of noise so that the output of the program is
differentially private.

In the general case, calculating the sensitivity of an arbitrary
program is non-trivial, so the Fuzz language employs a very restricted
set of primitives and operators for which static checking is
tractable. The sensitivities of these primitives are proven in [1],
and the type system is proved to be sound. This means that the when a
type checker admits a program as having at most sensitivity c, we have
a _proof_ of this property [4]. Importantly, these primitives are
compositional — if f(x) is 2-sensitive and g(y) is 3-sensitive, we
can say that g(f(x)) is 6-sensitive, which makes type checking
tractable.

Given a type system for Fuzz and a working type checker, we thus have
a system that accepts programs and certifies them as differentially
private. Given that the type checker has to be decidable, we only
admit a subset of all differentially private programs as well-typed
in Fuzz. This is a tradeoff that has to be made in all type systems
— for example, observe that the java program

if(complex expression that always evaluates to true)
return 1;
else return false;

is not well typed — but if run will always return an integer — so
we need not reject it. However, the type system cannot always perform
the sort of computation to prove that the complex expression in the
if-clause always takes the then-branch and never takes the
else-branch, so it conservatively rejects the program as
ill-typed. Similarly, the Fuzz type system is unable to certify all
differentially private programs as such.

Thus we have a few primitives in Fuzz who’s sensitivities are
verified, whic enables the programmer to construct larger programs
that the static type checker can verify as differentially private. In
practice Fuzz is in fact expressive enough to be practically useful,
and there are several examples given in [2].

Thus we have a programming language and a type system that compiles
programs that are differentially private. However, our programming
language lives in a restricted functional world with no side effects,
and unfortunately this translates poorly to the real world where we
can write programs in the vein of:

if victim.reallysecretproperty == true
then {calculate the value of pi to a billion digits; return 1}
else {return 1}

In terms of the output, the above code has sensitivity zero. But to an
adversary the secret bit of information is leaked by simply observing
how long the program takes to run. We cannot hope to statically
capture this property easily — we would have to ensure that every
program takes _exactly_ as long to run regardless of code branches,
and then we would still have to deal with memory effects, the fact
that different rows of the database occupy different cache lines (and
thus have different cache-eviction behavior that might be visible to
the adversary) and deal with programs that do not halt.

The Fuzz runtime employs a very restricted runtime environment to
prevent the leakage of information through these
side-channels. Remember, we cannot simply mitigate the flow — even a
single bit of leakage violates our differential privacy
guarantees. The usefulness of the type system and its guarantees now
becomes clear: because a type system performs its analysis on the
_structures_ of the database and the query without looking at the
contents of the database, any conclusions it reaches and reveals to
the user is always differentially private. Thus we avoid the side
channel concerns of dynamically typed systems: if our analyzer needs
to look at the contents of the database, certifying that any outputs
and side effects are differentially private requires a more complex
proof.

However we are still left to close the gap between side-effect free
functional world and the real world runtime environment. The three
main side effects are a privacy budget attack, state attacks and
timing attacks. The privacy budget attack takes the form:

If victim.reallysecretproperty == true
then {do something that uses up the entire privacy budget}
else {do nothing}

Thus by simply observing the privacy budget after executing the query,
we reveal a bit of secret information. This is fixed by disallowing
nested queries, which we can statically check at the compiler
level. The state attack is solved by not having any global variables,
given that we have a functional programming language.

The timing channel is the hardest to close — to do so, the Fuzz
runtime employs a high level strategy of running primitives that run
through the entire row of a database into “microqueries” — each
microquery only looks at a single row of the database — and the
results are then aggregated together. Microqueries are then run using
a new primitive called “predictable transactions” which take an exact
amount of time to run, down to the microsecond level, regardless of
their output value. Predictable transactions are implemented at the
runtime level: the runtime takes a default value and a timeout, and
executes the code for the microquery while timing the computation. If
the timeout is reached before computation is finished, the entire
computation is “rolled back” and the default value is returned. If
computation finishes early, the microquery is paused until the timeout
expires. Thus, the microquery always takes exactly the amount of time
specified in the timeout. This is transparent to the program: it never
knows if the default value was returned or the value was calculated by
the microquery. In the worst case where the adversary attempts to
deliberately engineer the return of the default value in the presence
of a single row, we still prevent all side channels as the “output
channel” is covered by the Laplace mechanism. The timing channel is
thus rendered non-informative.

There are additional details to make the runtime perform the rollback
step in a constant amount of time — this has to happen regardless of
what the program does within the microquery — no matter how much
memory it allocates, or what system call it attempts to trigger that
might stall us and make us miss our deadline. The important takeaway
is that the timeout deadline is a _hard_ deadline — the runtime must
be able to abort anything the adversary might do and return the
default value without delaying even a few microseconds. Thus, this
requires tight control of the garbage collector, the memory allocation
code, etc.

In conclusion, [1] demonstrates a novel type system that allows us to
automatically certify some programs as differentially private without
requiring an explicit proof with each new algorithm. [2] implements a
runtime that takes care of the real-world problems that arise when we
try and implement a programming language runtime that actually has to
seal off all the adversarial situations that could arise to break the
differential privacy guarantees when the code we are running is
untrusted.

[1] Jason Reed, Benjamin Pierce — Distance Makes the Types Grow
Stronger, ICFP 2010

[2] Andreas Haeberlen, Benjamin Pierce, Arjun Narayan — Differential
Privacy Under Fire, Usenix Security 2011

[3] Benjamin C. Pierce — Types and Programming Languages, MIT Press 2002

[4] This is not strictly true, as our implementation of the type
checker is not proven bug-free. An interesting work is [5] which
proves the differential privacy properties in Coq — an interactive
theorem prover — and builds a program checker based on Hoare Logic
in Coq and proves that the programs it admits are differentially
private. If we implemented the Fuzz type checker in Coq and proved it
correct we would be closer to a proof [6].

[5] Gilles Barthe, Boris Köpf, Federico Olmedo, and Santiago Zanella
Béguelin — Probabilistic Relational Reasoning for Differential Privacy, POPL 2012

[6] Conditioned on the Coq theorem prover being bugfree [7]

[7] And assuming ZF is consistent…[8]

## A Non-Interactive Algorithm for releasing accurate answers to m-sparse queries while preserving differential privacy (Rasul Tutunov)

8 Nov

On the lecture we saw the Sparse Multiplicative Weights (SMW) IDC Algorithm for releasing accurate answers to m-sparse queries while preserving differential privacy in the interactive setting (the data curator acts as an intermediary and must answer an adaptively chosen stream of queries as they arrive). Here I am going to describe another algorithm (proposed by Avrim Blum and Aaron Roth) for releasing accurate answers to m-sparse queries that works in the non-interactive setting, in which the data estimator must in one shot output a data-structure which encodes the answers to every query on interest. In this algorithm the entire computation must be performed in time polynomial in n, and the time required to evaluate any query on the output data structure must also be polynomial.

The non-interactive mechanism releases a random projection of the database into polynomially many dimensions, together with the corresponding projection matrix.Queries are evaluated by computing their projection using the public projection matrix , and then taking the inner product of the projected query and the projected database. The difficulty comes because of the projection matrix projects vectors from |X|-dimensional space to poly(n) dimensional space, and so normally would take  |X|poly(n) bits to represent. The algorithm are constrained to run in time poly(n), however, and so we need a concise representation of the projection matrix. This is achieved by using a matrix implicitly generated by a family of limited-independence hash functions which have concise representations. This requires using a limited independence version of the Johnson-Lindenstrauss  lemma, and of concentration bounds.

Before present the algorithm let me recall some basic concepts. A database is a multiset of elements from some (possibly) infinite abstract universe X.We write |D| = n  to denote the cardinality of D.For any $x \in X$ we can also write D[x] to denote:$D[x] = \{ x'\in D:x'=x\}$ the number of elements of type x in the database. Viewed this way, a database $D\in N^{|X|}$ is a vector with integer entries in the range [0,n]. A linear query $Q:X\to [0,1]$ is a function mapping elements in the universe to values on the real unit interval.The value of a linear query on a database is simply the average value of $Q$ on elements of the database:

$\frac{1}{n}\sum\limits_{x\in D}^{}Q(x) = \frac{1}{n}\sum\limits_{x\in X}^{}Q(x)D[x]$

Similarly to how we think of a database as a vector, we can think of a query as a vector $Q\in[0,1]^{|X|}$ with Q[x]. Viewed this way $Q(D) = \frac{1}{n}\langle Q,D\rangle$.

Definition 1 (Sparsity) The sparsity of a linear query Q is $|\{x\in X:Q(x) > 0\}|$, the number of elements in the universe on which it takes a non-zero value.We say that a query is m-sparse if its sparsity is at most m.We will assume that given an m-sparse query we ca quickly (in the polynomial in m) enumerate the elements $x\in X$ on which $Q(x) > 0$.

Definition 2 (Accuracy for non-Interactive Mechanism). Let $\mathcal{Q}$ be a set of queries. A non-interactive mechanism $M:X*\to R$ for some abstract range $R$ is $(\alpha,\beta)$-accurate for  $\mathcal{Q}$ if there exist a function $Eval:\mathcal{Q}\times R\to \mathbb{R}$ s.t. for every database $D\in X*$, with probability at least $1-\beta$ over the coins of $M, M(D)$ outputs $r\in R$ such that $max_{Q\in\mathcal{Q}}|Q(D) - Eval(Q,r)| \ge\alpha$.

Definition 3 (Neighboring databases) Two databases D and D’ are neighbors if they differ only in the data of a single individual: i. e. symmetric difference $|D\vartriangle D'|\ge 1$

Definition 4 (Differential Privacy) A randomized algorithm M acting on databases and outputting elements from some abstract range $R$ is $(\epsilon, \delta)$ differentially private if for all pairs of neighboring databases D,D’ and for all subsets of the range $S\subseteq R$ the following holds:

$Pr[M(D)\in S] \ge exp(\epsilon)Pr[M(D')\in S] + \delta$.

Definition 5(The Laplace Distribution). The Laplace Distribution (centered at 0) with scale b is the distribution with probability density function: $Lap(x|b) = \frac{1}{2b}exp(-\frac{|x|}{b})$.

Definition 6(Random Projection Data Structure). The random projection data structure $\mathcal{D}_{r}$ of size T is composed of two parts: we write $\mathcal{D}_{r} = (u,f)$.

1. $u\in \mathbb{R}^{T}$ is a vector of length T

2.$f:[|X|\cdot T] \to \{-\frac{1}{\sqrt{T}}, \frac{1}{\sqrt{T}} \}$ is a hash function implicitly representing a $T\times|X|$ projection matrix $A\in \{-\frac{1}{\sqrt{T}}, \frac{1}{\sqrt{T}} \}^{T\times |X|}$. For any $(i,j)\in T\times |X|$, we write A[i,j] for $f:(|X|\cdot (i-1) + j)$.

To evaluate a linear query Q on a random projection data structure $\mathcal{D}_{r} = (u,f)$ we first project the query and then evaluate the projected query. To project the query we compute a vector $\hat{Q}\in \mathbb{R}^T$ has follows. For each $i\in [T]$

$\hat{Q}[i] = \sum\limits_{x\in X:Q(x)>0}^{}Q[x]\cdot A[i,x]$

then we output: $Q(\mathcal{D}_{r}) =\frac{1}{n}\langle\hat{Q},u\rangle$.

ALGORITHM (SparseProject takes as input a private database D of size n, privacy parameter $\epsilon$ and $\delta$, a confidence parameter $\beta$, a sparsity parameter m and the size of the target query class k):

Sparse project($D, \epsilon,\delta, \beta, m, k$)

Let $\tau \gets \frac{\beta}{4k},T\gets 4\cdot 64^2\cdot log(\frac{1}{\tau})\Big(\frac{m^{3/2}}{2} + \frac{n^4}{2\sqrt{m}} +\sqrt{m}n^2\Big), \sigma \gets \frac{\epsilon}{\sqrt{8ln(1/\delta)}}$.

Let f be a randomly chose hash function from a family of $2log(kT/2\beta)$-wise independent hash functions mapping $[T\times|X|] \to \{-1/\sqrt{T}, 1/\sqrt{T}\}$. Write A[i,j] to denote $f:(|X|\cdot (i-1) + j)$.

Let $u,v \in \mathbb{R}^T$ be vectors of length T

for i=1 to T do

• Let $u_i \gets \sum\limits_{x:D[x]>0}^{max}D[x]\cdot A[i,x$
• Let $v_i\gets Lap(1/\sigma)$

end for

Output $\mathcal{D}_{r} = (u + v, f)$

Remark: There are various ways to select a hash function from a family of r-wise independent hash functions mapping $[T\times|X|] \to \{0,1\}$. The simplest, and one that suffice for our purposes, is to select the smallest integer s  such that $2^s \le T\times |X|$, and then to let f be a random degree r polynomial in the finite field $\mathbb{GF}[2^s]$.Selecting and representing such a function takes time space $O(r,s) = O(r(log|X| + logT))$. f is then unbiased r-wise independent hash function mapping $\mathbb{GF}[2^s] \to\mathbb{GF}[2^s]$.Taking only the last output bit gives an unbiased r-wise independent hash function mapping $[T\times|X|] \to \{0,1\}$.

Below I provide basic theorems about SparseProject Algorithm. (All proofs you can find in Fast Private Data Release Algorithms for Sparse Queries (Avrim Blum and Aaron Roth, 2011))

Theorem 1 Sparse Project is $(\epsilon, \delta)$-differentially private.

Theorem 2For any $0<\epsilon, \delta < 1$ and any $\beta < 1$, and with respect to any class of m-sparse linear queries $\mathcal{Q}\subseteq\mathcal{Q}_m$ of cardinality $|\mathcal{Q}|\ge k$, SparseProject is $(\alpha,\beta)$-accurate for:

$\alpha = \tilde{O}\Big(log(\frac{k}{\beta})\frac{\sqrt{mlog(\frac{1}{\delta})}}{\epsilon n}\Big)$

where the $\tilde{O}$ hides a term logarithmic in (m+n).