In the lecture we saw the Sparse Multiplicative Weights (SMW) IDC algorithm for releasing accurate answers to *m*-sparse queries while preserving differential privacy in the interactive setting, where the data curator acts as an intermediary and must answer an adaptively chosen stream of queries as they arrive. Here I describe another algorithm (proposed by Avrim Blum and Aaron Roth) for releasing accurate answers to *m*-sparse queries that works in the non-interactive setting, in which the data curator must output, in one shot, a data structure encoding the answers to every query of interest. The entire computation must be performed in time polynomial in *n*, and the time required to evaluate any query on the output data structure must also be polynomial.

The non-interactive mechanism releases a random projection of the database into polynomially many dimensions, together with the corresponding projection matrix. Queries are evaluated by computing their projection using the public projection matrix, and then taking the inner product of the projected query and the projected database. The difficulty is that the projection matrix maps vectors from *|X|*-dimensional space to *poly(n)*-dimensional space, and so would normally take *|X|·poly(n)* bits to represent. The algorithm is constrained to run in time *poly(n)*, however, so we need a concise representation of the projection matrix. This is achieved with a matrix implicitly generated by a family of limited-independence hash functions, which have concise representations. In turn, this requires limited-independence versions of the Johnson-Lindenstrauss lemma and of the relevant concentration bounds.

Before presenting the algorithm let me recall some basic concepts. A database *D* is a multiset of elements from some (possibly infinite) abstract universe *X*. We write *|D| = n* to denote the cardinality of *D*. For any $x \in X$ we write $D[x]$ to denote the number of elements of type $x$ in the database. Viewed this way, a database is a vector with integer entries in the range *[0, n]*. A linear query is a function $Q : X \to [0,1]$ mapping elements of the universe to values in the real unit interval. The value of a linear query on a database is simply the average value of $Q$ on the elements of the database: $Q(D) = \frac{1}{n}\sum_{x \in D} Q(x)$.

Similarly to how we think of a database as a vector, we can think of a query as a vector $Q \in [0,1]^{|X|}$ with entries $Q[x] = Q(x)$. Viewed this way, $Q(D) = \frac{1}{n}\langle Q, D\rangle = \frac{1}{n}\sum_{x \in X} Q[x]\,D[x]$.
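Viewed as vectors, evaluating a linear query is just a normalized inner product. A minimal sketch, representing an *m*-sparse query as a Python dict over its support (a hypothetical representation for illustration, not the paper's):

```python
from collections import Counter

def query_value(Q, D):
    """Evaluate a linear query Q (dict: element -> weight in [0,1]) on a
    database D (list of universe elements): the average of Q over D,
    i.e. <Q, D>/n in vector form.  Only Q's support is touched."""
    n = len(D)
    counts = Counter(D)                        # D[x]: multiplicity of x
    return sum(q * counts.get(x, 0) for x, q in Q.items()) / n

# Tiny example: universe {a, b, c}, database of size 4.
D = ["a", "a", "b", "c"]
Q = {"a": 1.0, "b": 0.5}                       # zero on every other element
print(query_value(Q, D))                       # (2*1.0 + 1*0.5 + 0)/4 = 0.625
```

The dict representation also makes Definition 1's assumption concrete: enumerating the support takes time polynomial in *m*, independent of *|X|*.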

**Definition 1** (Sparsity). The sparsity of a linear query $Q$ is $|\{x \in X : Q[x] \neq 0\}|$, the number of elements in the universe on which it takes a non-zero value. We say that a query is *m*-sparse if its sparsity is at most *m*. We will assume that given an *m*-sparse query we can quickly (in time polynomial in *m*) enumerate the elements $x$ on which $Q[x] \neq 0$.

**Definition 2** (Accuracy for a Non-Interactive Mechanism). Let $\mathcal{Q}$ be a set of queries. A non-interactive mechanism $M$ with some abstract range $\mathcal{R}$ is $(\alpha, \beta)$-accurate for $\mathcal{Q}$ if there exists a function $\mathrm{Eval} : \mathcal{Q} \times \mathcal{R} \to \mathbb{R}$ such that for every database $D$, with probability at least $1 - \beta$ over the coins of $M$, $M(D)$ outputs $r \in \mathcal{R}$ such that $\max_{Q \in \mathcal{Q}} |Q(D) - \mathrm{Eval}(Q, r)| \le \alpha$.

**Definition 3** (Neighboring databases). Two databases $D$ and $D'$ are neighbors if they differ only in the data of a single individual, i.e. their symmetric difference satisfies $|D \,\triangle\, D'| \le 1$.

**Definition 4** (Differential Privacy). A randomized algorithm $M$ acting on databases and outputting elements from some abstract range $\mathcal{R}$ is $(\varepsilon, \delta)$-differentially private if for all pairs of neighboring databases $D, D'$ and for all subsets $S \subseteq \mathcal{R}$ of the range the following holds:

$\Pr[M(D) \in S] \le e^{\varepsilon} \Pr[M(D') \in S] + \delta$.
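As a concrete instance of the definition with $\delta = 0$: a counting query changes by at most 1 between neighboring databases, so adding Laplace noise of scale $1/\varepsilon$ (density $\frac{1}{2b}e^{-|x|/b}$) keeps the two output densities within a factor $e^{\pm\varepsilon}$ of each other at every point. A small numeric check of that ratio (an illustration I am adding, not from the text):

```python
import math

def lap_pdf(x, mu, b):
    """Density at x of the Laplace distribution with mean mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2 * b)

# Neighboring databases shift the true count by at most 1, so compare the
# output densities of count + Lap(1/eps) centered at 0 versus at 1.
eps = 0.5
b = 1.0 / eps
for x in [0.1, 0.5, 0.9]:
    ratio = lap_pdf(x, 0.0, b) / lap_pdf(x, 1.0, b)
    assert math.exp(-eps) <= ratio <= math.exp(eps)   # the DP inequality, delta = 0
```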

**Definition 5** (The Laplace Distribution). The Laplace distribution (centered at 0) with scale *b* is the distribution $\mathrm{Lap}(b)$ with probability density function $p(x \mid b) = \frac{1}{2b}\exp\!\left(-\frac{|x|}{b}\right)$.
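Sampling from this distribution is easy: the difference of two independent exponential variables with mean *b* is exactly $\mathrm{Lap}(b)$. A short sketch (the helper name is mine):

```python
import math
import random

def sample_laplace(b, rng=random):
    """Draw one sample from Lap(b) as a difference of two independent
    Exp(mean b) variables; 1 - rng.random() lies in (0, 1], so log is safe."""
    e1 = -b * math.log(1.0 - rng.random())
    e2 = -b * math.log(1.0 - rng.random())
    return e1 - e2

# Sanity check: Lap(b) has mean 0 and variance 2*b^2.
rng = random.Random(0)
xs = [sample_laplace(2.0, rng) for _ in range(200_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)   # should be close to 8.0
```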

**Definition 6** (Random Projection Data Structure). The random projection data structure of size $T$ is composed of two parts: we write $DS = (v, f)$.

1. $v$ is a vector of length $T$.

2. $f : [T] \times X \to \{-1/\sqrt{T}, +1/\sqrt{T}\}$ is a hash function implicitly representing a projection matrix $A \in \{-1/\sqrt{T}, +1/\sqrt{T}\}^{T \times |X|}$. For any $i \in [T]$ and $x \in X$, we write $A[i, x]$ for $f(i, x)$.

To evaluate a linear query $Q$ on a random projection data structure we first project the query and then evaluate the projected query against the released vector. To project the query we compute a vector $v_Q$ of length $T$ as follows. For each $i \in [T]$:

$v_Q[i] = \sum_{x : Q[x] \neq 0} A[i, x]\, Q[x]$,

then we output: $\mathrm{Eval}(Q, DS) = \frac{1}{n}\langle v_Q, v \rangle$.
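The evaluation procedure can be sketched as follows, with the hash passed around as a plain Python function standing in for the concise limited-independence representation (function names are mine, for illustration):

```python
def project_query(Q, f, T):
    """Project a sparse query Q (dict: element -> nonzero value) through the
    implicit matrix A[i, x] = f(i, x).  Only Q's support is touched, so the
    cost is O(m*T) rather than O(|X|*T)."""
    return [sum(f(i, x) * q for x, q in Q.items()) for i in range(T)]

def evaluate(Q, ds, n):
    """Answer Q on a released data structure ds = (v, f): <v_Q, v> / n."""
    v, f = ds
    v_Q = project_query(Q, f, len(v))
    return sum(a * b for a, b in zip(v_Q, v)) / n
```

Note that the universe size *|X|* never appears: both projection and evaluation depend only on the query's support and on $T$.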

**ALGORITHM** (SparseProject takes as input a private database $D$ of size *n*, privacy parameters $\varepsilon$ and $\delta$, a confidence parameter $\beta$, a sparsity parameter *m*, and the size of the target query class *k*):

**SparseProject**$(D, \varepsilon, \delta, \beta, m, k)$

**Let** $T$ be the projection dimension, $r$ the independence parameter, and $b$ the Laplace noise scale, all chosen as functions of $(n, \varepsilon, \delta, \beta, m, k)$ as in the paper.

**Let** $f$ be a randomly chosen hash function from a family of $r$-wise independent hash functions mapping $[T] \times X \to \{-1/\sqrt{T}, +1/\sqrt{T}\}$. Write $A[i, x]$ to denote $f(i, x)$.

**Let** $u, v$ be vectors of length $T$.

**for** $i = 1$ to $T$ **do**

**Let** $u_i \leftarrow \sum_{x \in X} A[i, x]\, D[x]$. **Let** $v_i \leftarrow u_i + \mathrm{Lap}(b)$.

**end for**

**Output** $(v, f)$.
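The steps above can be sketched in a few lines of Python. This is an illustrative toy, not the paper's exact instantiation: the hash stand-in is a seeded `hash()` rather than a genuine *r*-wise independent family, and the noise scale uses the simple pure-$\varepsilon$ calibration (changing one element changes $u$ by one column of $A$, whose L1 norm is $T \cdot (1/\sqrt{T}) = \sqrt{T}$, so per-coordinate $\mathrm{Lap}(\sqrt{T}/\varepsilon)$ suffices); the paper's $(\varepsilon, \delta)$ calibration is tighter.

```python
import math
import random
from collections import Counter

def sparse_project(D, epsilon, T, seed=None):
    """Toy sketch of SparseProject: release (v, f), a noisy random
    projection of the database plus the function defining the matrix."""
    rng = random.Random(seed)
    counts = Counter(D)                    # D[x] for each x actually present

    # Stand-in for the r-wise independent family: a seeded hash with values
    # in {-1/sqrt(T), +1/sqrt(T)}.  (Use integer universe elements so that
    # Python's hash() is deterministic across runs.)
    salt = rng.getrandbits(64)
    def f(i, x):
        return (1.0 if hash((salt, i, x)) & 1 else -1.0) / math.sqrt(T)

    # L1 sensitivity of u = A*D is sqrt(T), so Lap(sqrt(T)/eps) per
    # coordinate gives pure epsilon-DP (an assumed, simple calibration).
    b = math.sqrt(T) / epsilon
    lap = lambda: -b * math.log(1.0 - rng.random()) + b * math.log(1.0 - rng.random())

    u = [sum(f(i, x) * c for x, c in counts.items()) for i in range(T)]
    v = [ui + lap() for ui in u]
    return v, f
```

A caller would answer a query $Q$ by projecting it with the released `f` and taking $\langle v_Q, v \rangle / n$, exactly as in the evaluation procedure above.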

**Remark**: There are various ways to select a hash function from a family of *r*-wise independent hash functions. The simplest, and one that suffices for our purposes, is to select the smallest integer *s* such that $2^s \ge |X| \cdot T$, and then to let $f$ be a random polynomial of degree at most $r-1$ over the finite field $GF(2^s)$. Selecting and representing such a function takes time and space $O(r \cdot s)$. $f$ is then an unbiased *r*-wise independent hash function mapping $GF(2^s) \to GF(2^s)$. Taking only the last output bit gives an unbiased *r*-wise independent hash function mapping $GF(2^s) \to \{0, 1\}$.
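The random-polynomial construction works over any finite field. A minimal sketch over a prime field $GF(p)$, which is simpler to implement than $GF(2^s)$ — the caveat being that the last bit of a uniform element of $GF(p)$ is only unbiased up to $O(1/p)$ for odd $p$, which is why the remark uses $GF(2^s)$:

```python
import random

def make_rwise_hash(r, p, seed=None):
    """A member of an r-wise independent family: a uniformly random
    polynomial of degree <= r-1 over GF(p) (p prime).  Its values at any r
    distinct points are mutually independent and uniform over GF(p).
    Storage is just the r coefficients, i.e. O(r log p) bits."""
    rng = random.Random(seed)
    coeffs = [rng.randrange(p) for _ in range(r)]   # r random coefficients

    def h(x):
        y = 0
        for c in coeffs:        # Horner evaluation of the polynomial mod p
            y = (y * x + c) % p
        return y & 1            # keep only the last output bit
    return h
```

For the algorithm, the domain element fed to `h` would encode the pair $(i, x)$ as a single index, and the output bit would be mapped to $\pm 1/\sqrt{T}$.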

Below I provide the basic theorems about the SparseProject algorithm. (All proofs can be found in *Fast Private Data Release Algorithms for Sparse Queries*, Avrim Blum and Aaron Roth, 2011.)

**Theorem 1** *SparseProject is $(\varepsilon, \delta)$-differentially private.*

**Theorem 2** *For any $\varepsilon, \delta \le 1$ and any $\beta > 0$, and with respect to any class of m-sparse linear queries of cardinality $k$, SparseProject is $(\alpha, \beta)$-accurate for:*

$\alpha = \tilde{O}\!\left(\frac{\sqrt{m \log(1/\delta)}\,\log(k/\beta)}{\varepsilon n}\right)$

*where the $\tilde{O}$ hides a term logarithmic in $(m+n)$.*
