Archive by Author

An Unweighted Vertex Cover Algorithm: An Alternative View(Hallucinated edges) (Salar Moarref)

28 Nov

Remember that in class we talked about the Vertex Cover problem in the framework of differential privacy. In original form of the problem, we want to pick a set S of vertices of minimal size so that every edge in the graph is incident to at least one vertex in S. In the privacy-preserving version of the problem, the private information we wish to conceal is the presence of absence of each edge. We talked about a randomized algorithm which outputs a permutation of all vertices of the graph, and the vertex cover will be defined by picking, for each edge, whichever of its endpoints appears first in the permutation. We saw that this vertex cover is (2+O(1/\epsilon))-approximate and \epsilon-differentially private. The algorithm was based on a simple non-private 2-approximation to vertex cover [2] that repeatedly selects an uncovered edge uniformly at random, and includes a random end-point of the edge which can be viewed as selecting a vertex at random with probability proportional to its uncovered degree. We took this formulation and mix in a uniform distribution over the vertices, using a weight that will grow as the number of remaining vertices decreases:

Algorithm 1: Unweighted Vertex Cover [1]
let n \leftarrow |V|, V_1 \leftarrow V, E_1 \leftarrow E
for i = 1,2,..,n
let w_i \leftarrow (4/\epsilon) \times \sqrt{n/(n-i+1)}
pick a vertex v \in V_i with probability proportional to d_{E_i}(v)+w_i
output v. let V_{i+1} \leftarrow V_i \backslash \{v\}, E_{i+1} \leftarrow E_i \backslash (\{v\} \times V_i)
end for

As stated the algorithm outputs a sequence of vertices, one per iteration. As remarked above, this permutation defines a vertex cover  by picking the earlier occurring end point of each edge.

There is a slightly different way to implement the intuition behind the above algorithm: imagine adding O(1/\epsilon) “hallucinated” edges to each vertex (the other endpoints of these hallucinated edges being fresh “hallucinated” vertices), and then sampling vertices without replacement proportional to these altered degrees. However, once (say) n/2 vetices have been sampled, output the remaining vertices in random order. More specifically, given a graph G=(V,E), we mimic the randomized proportional-to-degree algorithm for \alpha n rounds where \alpha < 1, and output the remaining vertices in random order. That is, in each of the first  \alpha n rounds, we select the next vertex i with probability proportional to d(i)+1/\epsilon: this is equivalent to imagining that each vertex has 1/\epsilon “hallucinated” edges in addition to its real edges. When we select a vertex, we remove it from the graph, together with the real and hallucinated edges adjacent to it. This is equivalent to picking a random (real or hallucinated) edge from the graph, and outputting a random real endpoint. Outputting a vertex affects the real edges in the remaining graph, but does not change the hallucinated edges incident to other vertices.

The privacy analysis is similar to that of theorem 5.1 of [1]: if we assume weights being w_i = 1/\epsilon for the first \alpha n rounds and w_i=\infty for the remaining rounds, we will have 2\sum_{i=(1-\alpha)n}^{n}\frac{1}{iw_i} \leq \epsilon(\frac{2\alpha}{1-\alpha})-differential privacy.

To analyze the utility, we couple the algorithm with a run of the 2-approximation non-private algorithm A that at each step picks an arbitrary edge of the graph and then picks a random endpoint. We refer to vertices that have non-zero “real” degree at the time they are selected by our algorithm as interesting vertices: the cost of our algorithm is simply the number of interesting vertices it selects in the
course of its run. Let I_1 denote the number of interesting vertices it selects during the first \alpha n steps, and I_2 denote the number of interesting vertices it selects during its remaining (1 - \alpha)n steps, when it is simply ordering vertices randomly. Clearly, the total cost is I_1 + I_2. We may view the first phase of our algorithm as selecting an edge at random (from among both real and hallucinated ones) and then outputting one of its endpoints at random. Now, for the rounds in which our
algorithm selects a real edge, we can couple this selection with one step of an imagined run of A (selecting the same edge and endpoint). Note that this run of A maintains a vertex cover that is a subset of our vertex cover, and that once our algorithm has completed a vertex cover, no interesting vertices remain. Therefore, while our algorithm continues to incur cost, A has not yet found a vertex cover. In the first phase of our algorithm, every interesting vertex our algorithm selects has at least one real edge adjacent to it, as well as 1/\epsilon hallucinated edges. Conditioned on selecting an interesting vertex, our algorithm had selected a real edge with probability at least \epsilon' = 1/(1+1/\epsilon). Let R denote the random variable that represents the number of steps A is run for. E[R] \leq 2OPT since A is a 2-approximation algorithm. By linearity of expectation:

2OPT \geq E[R] \geq \epsilon' . E[I_1]

Gupta et. al in [1] show that most of the algorithm’s cost comes from the first phase and hence that I_2 is not much larger than I_1:

E[I_1] \geq \ln(\frac{1}{1-\alpha}) . E[I_2]

Combining these facts, we get that:

\frac{E[cost]}{OPT} \leq \frac{2}{\epsilon'}(1+\frac{1}{\ln((1-\alpha)^-1)})

[1] A. Gupta, K. Ligett, F. McSherry, A. Roth, and K. Talwar. Differentially private combinatorial optimization. Nov 2009

[2] L. Pitt. A simple probabilistic approximation algorithm for vertex cover. Technical report, Yale University, 1985

Advertisements