Archive by Author

Fuzz (Arjun Narayan)

22 Nov

Fuzz is a programming language and runtime for running differentially
private queries. This post is in two parts: the first deals with the
Fuzz programming language and how it certifies programs as
differentially private through the use of a novel statically checked
type system, and the second with a runtime that takes care of some of
the real world problems that arise when we try and achieve differential privacy in practice.

“[A type system is a] tractable syntactic method for proving the
absence of certain program behaviors by classifying phrases according
to the kinds of values they compute” [3]. Importantly, a sound type
system provides a mathematical proof of the absence of those program
behaviors. The type systems most programmers are familiar with are
fairly conservative — for example, the Java programming language has
a type system that certifies (among other things) that the return
value of a method is guaranteed to be a subtype of the reference that
it is assigned to. We can also have stronger type systems that check
more complex properties, but may run into issues such as the
properties being computationally hard or even undecidable.

Fuzz is a functional programming language with a static type
system. The high level goal is to have the type system check if a
given program is differentially private. This type system captures a
notion of function sensitivity — “A function is said to be
c-sensitive if it can magnify distances between inputs by a factor of
at most c” [1]. Thus a Fuzz program can get an upper bound on the
sensitivity of a given program from its input database. Once we know
the sensitivity of the function, we can use the Laplace mechanism to
add a sufficient amount of noise so that the output of the program is
differentially private.

In the general case, calculating the sensitivity of an arbitrary
program is non-trivial, so the Fuzz language employs a very restricted
set of primitives and operators for which static checking is
tractable. The sensitivities of these primitives are proven in [1],
and the type system is proved to be sound. This means that the when a
type checker admits a program as having at most sensitivity c, we have
a _proof_ of this property [4]. Importantly, these primitives are
compositional — if f(x) is 2-sensitive and g(y) is 3-sensitive, we
can say that g(f(x)) is 6-sensitive, which makes type checking
tractable.

Given a type system for Fuzz and a working type checker, we thus have
a system that accepts programs and certifies them as differentially
private. Given that the type checker has to be decidable, we only
admit a subset of all differentially private programs as well-typed
in Fuzz. This is a tradeoff that has to be made in all type systems
— for example, observe that the java program

if(complex expression that always evaluates to true)
return 1;
else return false;

is not well typed — but if run will always return an integer — so
we need not reject it. However, the type system cannot always perform
the sort of computation to prove that the complex expression in the
if-clause always takes the then-branch and never takes the
else-branch, so it conservatively rejects the program as
ill-typed. Similarly, the Fuzz type system is unable to certify all
differentially private programs as such.

Thus we have a few primitives in Fuzz who’s sensitivities are
verified, whic enables the programmer to construct larger programs
that the static type checker can verify as differentially private. In
practice Fuzz is in fact expressive enough to be practically useful,
and there are several examples given in [2].

Thus we have a programming language and a type system that compiles
programs that are differentially private. However, our programming
language lives in a restricted functional world with no side effects,
and unfortunately this translates poorly to the real world where we
can write programs in the vein of:

if victim.reallysecretproperty == true
then {calculate the value of pi to a billion digits; return 1}
else {return 1}

In terms of the output, the above code has sensitivity zero. But to an
adversary the secret bit of information is leaked by simply observing
how long the program takes to run. We cannot hope to statically
capture this property easily — we would have to ensure that every
program takes _exactly_ as long to run regardless of code branches,
and then we would still have to deal with memory effects, the fact
that different rows of the database occupy different cache lines (and
thus have different cache-eviction behavior that might be visible to
the adversary) and deal with programs that do not halt.

The Fuzz runtime employs a very restricted runtime environment to
prevent the leakage of information through these
side-channels. Remember, we cannot simply mitigate the flow — even a
single bit of leakage violates our differential privacy
guarantees. The usefulness of the type system and its guarantees now
becomes clear: because a type system performs its analysis on the
_structures_ of the database and the query without looking at the
contents of the database, any conclusions it reaches and reveals to
the user is always differentially private. Thus we avoid the side
channel concerns of dynamically typed systems: if our analyzer needs
to look at the contents of the database, certifying that any outputs
and side effects are differentially private requires a more complex
proof.

However we are still left to close the gap between side-effect free
functional world and the real world runtime environment. The three
main side effects are a privacy budget attack, state attacks and
timing attacks. The privacy budget attack takes the form:

If victim.reallysecretproperty == true
then {do something that uses up the entire privacy budget}
else {do nothing}

Thus by simply observing the privacy budget after executing the query,
we reveal a bit of secret information. This is fixed by disallowing
nested queries, which we can statically check at the compiler
level. The state attack is solved by not having any global variables,
given that we have a functional programming language.

The timing channel is the hardest to close — to do so, the Fuzz
runtime employs a high level strategy of running primitives that run
through the entire row of a database into “microqueries” — each
microquery only looks at a single row of the database — and the
results are then aggregated together. Microqueries are then run using
a new primitive called “predictable transactions” which take an exact
amount of time to run, down to the microsecond level, regardless of
their output value. Predictable transactions are implemented at the
runtime level: the runtime takes a default value and a timeout, and
executes the code for the microquery while timing the computation. If
the timeout is reached before computation is finished, the entire
computation is “rolled back” and the default value is returned. If
computation finishes early, the microquery is paused until the timeout
expires. Thus, the microquery always takes exactly the amount of time
specified in the timeout. This is transparent to the program: it never
knows if the default value was returned or the value was calculated by
the microquery. In the worst case where the adversary attempts to
deliberately engineer the return of the default value in the presence
of a single row, we still prevent all side channels as the “output
channel” is covered by the Laplace mechanism. The timing channel is
thus rendered non-informative.

There are additional details to make the runtime perform the rollback
step in a constant amount of time — this has to happen regardless of
what the program does within the microquery — no matter how much
memory it allocates, or what system call it attempts to trigger that
might stall us and make us miss our deadline. The important takeaway
is that the timeout deadline is a _hard_ deadline — the runtime must
be able to abort anything the adversary might do and return the
default value without delaying even a few microseconds. Thus, this
requires tight control of the garbage collector, the memory allocation
code, etc.

In conclusion, [1] demonstrates a novel type system that allows us to
automatically certify some programs as differentially private without
requiring an explicit proof with each new algorithm. [2] implements a
runtime that takes care of the real-world problems that arise when we
try and implement a programming language runtime that actually has to
seal off all the adversarial situations that could arise to break the
differential privacy guarantees when the code we are running is
untrusted.

[1] Jason Reed, Benjamin Pierce — Distance Makes the Types Grow
Stronger, ICFP 2010

[2] Andreas Haeberlen, Benjamin Pierce, Arjun Narayan — Differential
Privacy Under Fire, Usenix Security 2011

[3] Benjamin C. Pierce — Types and Programming Languages, MIT Press 2002

[4] This is not strictly true, as our implementation of the type
checker is not proven bug-free. An interesting work is [5] which
proves the differential privacy properties in Coq — an interactive
theorem prover — and builds a program checker based on Hoare Logic
in Coq and proves that the programs it admits are differentially
private. If we implemented the Fuzz type checker in Coq and proved it
correct we would be closer to a proof [6].

[5] Gilles Barthe, Boris Köpf, Federico Olmedo, and Santiago Zanella
Béguelin — Probabilistic Relational Reasoning for Differential Privacy, POPL 2012

[6] Conditioned on the Coq theorem prover being bugfree [7]

[7] And assuming ZF is consistent…[8]

[8] http://xkcd.com/816/