Expanders, Property Testing and the PCP theorem

Hardness of approximation — Lecture 3 & 4

Posted in lectures by HQN on March 27, 2009

We proved

Theorem (Hastad 1997)

NP = PCP_{1-\epsilon, 1/2+\delta}[O(\log n), 3] for any given \epsilon, \delta > 0.

The outline of the proof is as follows. (This exact outline will be used at least one more time, starting from a slightly different version of LabelCover.)

  1. We start from the NP-hard problem Gap-Max-LabelCover_\Sigma(1,\tau), and design a 3-bit PCP verifier for it (with logarithmic randomness).
  2. The verifier expects labels to be encoded with the (binary) long code, which is a map LC: \Sigma \to \{0,1\}^{2^m}, where |\Sigma|=m. Each 01-vector of length 2^m can be viewed as the truth table of a function f: \{0,1\}^m \to \{0,1\}. Thus, the long code LC(a) of a symbol a is also one such function. In particular, it is the dictator function LC(a)(x_1,\dots,x_m) = x_a. Then, the verifier chooses an edge of the graph at random, picks 3 bits from the (supposed) long codes of the two labels, and performs a simple linear test on those bits. (A small code sketch of the long code appears after this outline.)
  3. The completeness of the verifier is straightforward.
  4. For soundness, we prove the contrapositive: if the verifier accepts with probability greater than 1/2+\delta, then there’s a labelling satisfying > \tau-fraction of the edges of the LabelCover instance. To show that such a labelling exists, we use the probabilistic method to choose a random labelling based on the Fourier coefficients of the functions representing (and perhaps pretending to be) long codes.
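To make step 2 concrete, here is a minimal Python sketch of the long code; the function name is ours, not part of the lecture. LC(a) is simply the truth table of the dictator function.

```python
# Minimal sketch of the (binary) long code, assuming the alphabet is {0, 1, ..., m-1}.
# LC(a) is the truth table of the dictator function f(x_1, ..., x_m) = x_a,
# i.e. a 01-vector of length 2^m indexed by all inputs x in {0,1}^m.
from itertools import product

def long_code(a, m):
    """Return the long code of symbol a as a list of 2^m bits."""
    return [x[a] for x in product([0, 1], repeat=m)]

# Example: with m = 3, LC(1) is the truth table of f(x0, x1, x2) = x1.
print(long_code(1, 3))  # [0, 0, 1, 1, 0, 0, 1, 1]
```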

Bellare-Goldreich-Sudan introduced the long code. Their paper is an excellent exposition of many ideas we have discussed and will discuss. For Fourier analysis of boolean functions, O’Donnell’s tutorial at STOC is a good starting point.

Hardness of approximation — Lecture 2

Posted in lectures by HQN on March 27, 2009

Actually part of the blog post on Lecture 1 was presented in Lecture 2. The main theme of lecture 2 was the following:

  • We showed that the PCP theorem is equivalent to the NP-hardness of several gap problems, Gap-Max-E3SAT and Gap-LabelCover in particular. The last post has shown that Gap-Max-E3SAT is NP-hard. To show that Gap-Max-LabelCover(1,\rho) is NP-hard for some constant \rho is not difficult: put all variables on the left, clauses on the right, and connect a variable and a clause if the variable belongs to the clause; labels for clauses are 001, 010, …, 111, corresponding to the combinations of literals which satisfy the clause; labels for variables are 001 or 010, which “stand for” TRUE or FALSE; finally, the constraint on an edge “projects” the clause’s label to the variable’s truth assignment. (A small code sketch of this reduction appears after the theorem below.)
  • The above reduction yields bipartite graphs which are 7-regular on the right, but may not be regular on the left, since each variable can appear in an arbitrary number of clauses. For our purposes, we also want left-regular bipartite instances, which can easily be obtained by reducing from Gap-Max-E3SAT(d), the variant in which every variable appears in a bounded number d of clauses. Check Luca Trevisan’s survey for a proof that Gap-Max-E3SAT(d) is NP-hard for some constant d. (Vazirani’s book also contains a proof with d=29, I think.) The proof involves a very nice (but standard) application of expanders.
  • A natural PCP verifier for the Gap-LabelCover problem can be viewed as a verifier of a 2-player 1-round game (2P1R).
  • Then, Raz’s Parallel Repetition theorem can be applied to drive the soundness error of the PCP verifier for Gap-LabelCover down exponentially. Since this result will be used to construct Hastad’s 3-bit PCP, we formally state it here.

Theorem (Raz’s Gap-LabelCover):

Given any \tau > 0, there exists an alphabet \Sigma with size |\Sigma| = poly(1/\tau) for which Gap-LabelCover(1,\tau) is NP-hard. Moreover, bipartite graph instances of this Gap-LabelCover problem can be assumed to be d_1-regular on the left and d_2-regular on the right where d_1,d_2 are constants. Furthermore, the constraint h_{u,v} for every edge (u,v) of the graph satisfies the projection property, i.e. it checks if L(u) = h_{u,v}(L(v)), where L(u), L(v) are the labels for u, v respectively.
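The following is a minimal Python sketch of the variable-clause reduction described in the first bullet above. The data representation is ours: a clause is a list of three literals, each a pair (variable, is_positive), and variable labels are plain TRUE/FALSE rather than the 001/010 encoding used in the text.

```python
# A minimal sketch (not the formal reduction) of turning an E3-CNF formula into a
# LabelCover instance: variables on the left, clauses on the right, and each edge
# constraint projects a clause label (a satisfying assignment) to a variable label.
from itertools import product

def e3sat_to_labelcover(clauses):
    edges = []  # entries: (variable, clause_index, projection: clause label -> variable label)
    for c, lits in enumerate(clauses):
        # Clause labels = the 7 assignments to the clause's three variables that satisfy it.
        sat_assignments = [a for a in product([False, True], repeat=3)
                           if any(a[i] == pos for i, (_, pos) in enumerate(lits))]
        for i, (v, _) in enumerate(lits):
            # The edge constraint "projects" the clause label to the value it gives variable v.
            proj = {label: a[i] for label, a in enumerate(sat_assignments)}
            edges.append((v, c, proj))
    return edges

# Example: (x0 or not x1 or x2) and (not x0 or x1 or x3) gives 2 * 3 = 6 edges.
print(len(e3sat_to_labelcover([[(0, True), (1, False), (2, True)],
                               [(0, False), (1, True), (3, True)]])))  # 6
```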

Hardness of approximation – Lecture 1

Posted in lectures, spr09 by HQN on March 16, 2009

The second half of this semester is devoted to proving hardness of approximation. For example, we will show that it is {\mathop{\mathbf{NP}}}-hard to approximate MAX-3SAT to within any constant better than {7/8} (of the optimal). In their FOCS 97 paper, Karloff and Zwick have shown us how to use SDP to design a {7/8}-approximation algorithm. Thus the above hardness result is essentially optimal.

I am typing this lecture to test Luca Trevisan’s latex2wp converter (thanks, Luca!). I probably won’t have the time to type lectures any more this semester. Here’s a brief outline of what I will be talking about in the next 7 lectures. I hope I can finish them on time:

Lecture 1: gap-producing reduction from PCP.

  • How do we show that an optimization problem is {\mathop{\mathbf{NP}}}-hard to approximate to within some ratio? Answer: design a gap-producing reduction from an {\mathop{\mathbf{NP}}}-hard problem, which is equivalent to showing that the corresponding gap-version of the problem is {\mathop{\mathbf{NP}}}-hard.
  • How do we design such a gap-producing reduction? There are two basic strategies:
    • Start from a problem which already has a gap, i.e. an {\mathop{\mathbf{NP}}}-hard gap-version of some problem. Then, the reduction has to be “gap-preserving” somehow. We will not discuss this strategy in Lecture 1. We will see many more examples along this line later.
    • Use the PCP theorem. In particular, use the PCP verifier for some/any {\mathop{\mathbf{NP}}}-complete problem as a subroutine in the gap-producing reduction. I have already given one example of this last semester. I will re-state the example again below. The FGLSS reduction will be the main example this time.

Lectures 2 + 3: gap-amplification.

  • The “reduction from PCP” strategy may not produce very good gap. To prove strong hardness results, we need to “amplify” the gap.
  • There are several ways of doing gap-amplification:
    • Repeat the verifier independently many times (at the expense of query and random bits)
    • Use expanders! (still too many query bits)
    • Use parallel-repetition and then alphabet reduction (somehow). We will discuss Hastad’s 3-bit PCP, its analysis, and some consequences.

Lectures 4 + 5: unique games conjecture (UGC)

  • UGC is a conjecture regarding the {\mathop{\mathbf{NP}}}-hardness of a certain gap problem. Using it, we can design nice gap-producing reductions.
  • There’ll be quite a bit of Fourier analysis of boolean functions: the Majority is Stablest theorem and hardness of approximating MAX-CUT.

Lectures 6 + 7: gap-preserving reductions + time filler.

1. How do we show that a problem is {\mathop{\mathbf{NP}}}-hard to approximate to within a certain ratio {\rho}?

To be concrete, take MAX-3SAT as an example. The general strategy is:

  • start from {\mathop{\mathbf{NP}}}-complete problem {\Pi}
  • let {\mathop{\mathsf{opt}}(I)} denote the optimal cost of an instance {I} of MAX-3SAT; design a polynomial-time (Karp/Cook) reduction {f: \Pi \rightarrow} MAX-3SAT such that, given any input {x} to {\Pi},
    • if {x} is a YES-instance of {\Pi}, then {\mathop{\mathsf{opt}}(f(x)) \geq g(|f(x)|)} for some function {g}
    • if {x} is a NO-instance of {\Pi}, then {\mathop{\mathsf{opt}}(f(x)) < \rho \cdot g(|f(x)|)}

Such a reduction is called a gap-producing reduction. A typical {\mathop{\mathbf{NP}}}-hardness reduction is too weak to produce any “good” gap (for example, with {\rho=7/8} for MAX-3SAT). Here, we use {|y|} to denote the length of an input {y} to the problem at hand (MAX-3SAT in this case).

Let {c,s: {\mathbb N} \rightarrow {\mathbb R}^+} be any two functions. Let Gap-MAX-3SAT{(c,s)} be the (decision) problem of distinguishing between

  • instances {\varphi} of MAX-3SAT for which {\mathop{\mathsf{opt}}(\varphi) \geq c(|\varphi|)}, and
  • instances {\varphi} of MAX-3SAT for which {\mathop{\mathsf{opt}}(\varphi) < s(|\varphi|)}

Proposition 1 The existence of a reduction as described above is equivalent to the fact that Gap-MAX-3SAT{(g,\rho\cdot g)} is {\mathop{\mathbf{NP}}}-hard.

Proposition 2 If Gap-MAX-3SAT{(c,s)} is {\mathop{\mathbf{NP}}}-hard then MAX-3SAT is {\mathop{\mathbf{NP}}}-hard to approximate to within {s/c}.

Proof: Suppose there is an approximation algorithm {A} with ratio {s/c}; namely, for any input {\varphi}, we always have {A(\varphi) \geq (s/c) \cdot \mathop{\mathsf{opt}}(\varphi)}. (Here, {A(\varphi)} is the number of satisfied clauses returned by {A}.)

If {\mathop{\mathsf{opt}}(\varphi) \geq c}, then certainly {A(\varphi) \geq s}. If {\mathop{\mathsf{opt}}(\varphi) < s}, then {A(\varphi) \leq \mathop{\mathsf{opt}}(\varphi) < s}. Thus, we can use {A} to decide in polynomial time whether {\varphi} is a YES- or a NO-instance of the gap problem, a contradiction to the fact that it is {\mathop{\mathbf{NP}}}-hard. \Box

Certainly, the above line of reasoning is not limited to MAX-3SAT. We could have replaced MAX-3SAT by MAX-{\Pi} for any problem {\Pi}, and Gap-MAX-3SAT by Gap-MAX-{\Pi}. It is also convenient to normalize the objective function of {\Pi} so that the cost is between {0} and {1}; then {0 < s < c \leq 1}. For example, for MAX-3SAT we can define the objective function to be the fraction of clauses of an input formula {\varphi} satisfied by an assignment. Last but not least, the same line of reasoning works for MIN-{\Pi} and Gap-Min-{\Pi} too! I’ll leave the technical details to you.

2. How do we design a gap-producing reduction?

Equivalently, how do we prove that a gap problem is {\mathop{\mathbf{NP}}}-hard? As we have mentioned, the typical {\mathop{\mathbf{NP}}}-hardness reduction is — in most cases — too weak for this purpose. Fortunately, the PCP theorem gives us precisely one such reduction. Moreover, this PCP “technology” is strong enough to serve as the basis for many other gap-producing reductions.

Note that it is somewhat misleading to talk about the PCP theorem. There are many PCP theorems, each with different parameters. Different PCP theorems give us different starting points for designing gap-producing reductions. When people say the PCP theorem, they mean the following theorem:

Theorem 3 (The PCP Theorem) {\mathop{\mathbf{NP}} = \mathop{\mathbf{PCP}}[O(\log n), O(1)]}

We will prove other PCP theorems in the next few weeks. To illustrate the PCP “technology”, we first show that it is actually equivalent to the hardness of some gap problem.

Theorem 4 The PCP theorem is equivalent to the fact that, there is some constant {\rho<1} for which Gap-MAX-E3SAT{(1,\rho)} is {\mathop{\mathbf{NP}}}-hard.

Proof: Let’s assume the PCP theorem first, and produce a reduction from an arbitrary {\mathop{\mathbf{NP}}}-complete language {L} to Gap-MAX-E3SAT{(1,\rho)}. The reduction works by constructing in polynomial time an E3-CNF formula {\varphi_x} with {m} clauses, given an input {x}. The construction satisfies the following properties, for some constant {\rho<1}:

\displaystyle  \begin{array}{rcl}  x \in L & \Longrightarrow & \mathop{\mathsf{opt}}(\varphi_x) = 1 \\ x \notin L & \Longrightarrow & \mathop{\mathsf{opt}}(\varphi_x) < \rho.  \end{array}

By the PCP theorem, there is some {(r, q)}-restricted verifier {V} recognizing {L}, where {r = O(\log n)} and {q} is a fixed constant. We will use {V} to construct {\varphi_x} for each input string {x}. In other words, {V} is a sub-routine in the gap-producing reduction we are designing.

Note that, when {V} is adaptive, the length of the proof does not need to be more than {2^r2^q}. When {V} is non-adaptive, the proof’s length does not need to be more than {q2^r}. In both cases, {V} only needs polynomial-size proofs. Let {p=2^{r+q} \geq q2^r} be the upper bound on proof sizes.

Construct {\varphi_x} as follows. Create {p} variables {x_1, \dots, x_p}, so that each truth assignment to these variables corresponds to a proof presented to {V}. For each random string {R} of length {r}, there are some combinations of the answers to {V}‘s queries that make {V} accept. We can model this fact by a CNF formula {\psi_R} on {\{x_1,\dots,x_p\}} such that {\psi_R(\mathbf x) = {\tt TRUE}} iff {V} accepts the proof {\mathbf x} on random string {R}. The formula {\psi_R} can be constructed in polynomial time by simulating {V} on the random string {R} and generating all possible combinations of answers. Since {q} is a constant, there are only constantly many ({2^q}) answer combinations. By adding a few auxiliary variables, we can convert {\psi_R} into E3-CNF form. Originally {\psi_R} has {\leq 2^q} clauses. Each clause gives rise to at most {q} size-{3} clauses. Hence, after the E3-CNF conversion {\psi_R} has at most {q2^q} clauses.

Finally, let {\varphi_x = \bigwedge_{R} \psi_R}, then {\varphi_x} itself can be constructed in polynomial time since there are only polynomially many random strings {R}. (This is why the randomness of {O(\log n)} is crucial!) Let {m} be the total number of {3}-CNF clauses of {\varphi_x}, then {m \leq 2^{r(|x|)}q2^q = {\tt poly}(|x|)}.

  • When {x\in L}, there is a proof {\pi} (a truth assignment) such that {V} always accepts. Hence, under this assignment {\varphi_x} is satisfiable.
  • When {x \notin L}, consider any truth assignment to {x_1,\dots,x_p} and view it as a proof {\pi} presented to {V}. By the soundness of {V}, the verifier accepts with probability {< 1/2}, i.e. fewer than half of the random strings {R} lead to acceptance. Hence, under this assignment more than half of the {\psi_R} are not satisfied, and each unsatisfied {\psi_R} contains at least one unsatisfied clause. The number of unsatisfied clauses is thus more than {\frac 1 2 2^{r(|x|)}}. Consequently, since {m \leq 2^{r(|x|)}q2^q}, setting {\rho = (1-\frac{1}{2q2^q})} we have

    \displaystyle  \mathop{\mathsf{opt}}(\varphi_x) < \frac 1 m \left(m - \frac 1 2 2^{r(|x|)}\right) \leq 1 - \frac{1}{2q2^q} = \rho.
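Here is a minimal sketch of the construction above, under the assumption of a hypothetical interface for a non-adaptive {(r,q)}-verifier; the function names are ours, not part of the lecture.

```python
# A minimal sketch of building phi_x from a non-adaptive (r, q)-verifier, assuming a
# hypothetical interface:
#   queries(x, R)          -> the q proof positions read on random string R
#   accepts(x, R, answers) -> True/False for a tuple of q answer bits
# The final conversion of width-q clauses to E3-CNF (via auxiliary variables) is omitted.
from itertools import product

def build_phi(x, r, q, queries, accepts):
    clauses = []  # each clause is a list of literals (position, required_bit)
    for R in product([0, 1], repeat=r):          # 2^r random strings; poly(n) since r = O(log n)
        pos = queries(x, R)
        for ans in product([0, 1], repeat=q):    # only 2^q answer combinations per R
            if not accepts(x, R, ans):
                # psi_R must rule out this rejecting assignment: add the clause saying
                # "the proof does not equal ans on the queried positions".
                clauses.append([(p, 1 - a) for p, a in zip(pos, ans)])
    return clauses
```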

Conversely, assume Gap-MAX-E3SAT{(1,\rho)} is {\mathop{\mathbf{NP}}}-hard for some constant {\rho<1}. Let us prove the PCP theorem. The fact that {\mathop{\mathbf{PCP}}[O(\log n), O(1)] \subseteq \mathop{\mathbf{NP}}} is easy. We show {\mathop{\mathbf{NP}} \subseteq \mathop{\mathbf{PCP}}[O(\log n), O(1)]} by designing an {(r,q)}-verifier {V} for some {\mathop{\mathbf{NP}}}-complete language {L}, with {r=O(\log n)} and {q=O(1)}.

Since Gap-MAX-E3SAT{(1,\rho)} is {\mathop{\mathbf{NP}}}-hard, there’s a poly-time reduction from {L} to Gap-MAX-E3SAT{(1,\rho)}. Consider any input string {x}. Use the assumed reduction to construct {\varphi_x}. The strategy for {V} is to pick a constant number {k} of clauses of {\varphi_x} at random, ask the prover for the values of (at most {3k}) variables in these clauses, and accept iff all the clauses are satisfied. Clearly {V} has perfect completeness. When {x\notin L}, at most {\rho m} clauses are satisfied. Hence, the probability that {V} accepts is at most

\displaystyle  \frac{\binom{\rho m}{k}}{\binom{m}{k}} = \frac{(\rho m)(\rho m-1)\dots(\rho m-k+1)} {m(m-1)\dots (m-k+1)} < \rho^k \leq 1/2

when {k \geq \ln 2/ \ln(1/\rho)}. Since {m = {\tt poly}(|x|)}, the number of random bits {V} uses is {O(\lg m) = O(\lg |x|)}, and the number of query bits needed is at most {3\ln 2/\ln (1/\rho)}, which is a constant. \Box
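Below is a minimal sketch of this verifier, assuming a hypothetical routine reduce_to_gap3sat(x) that runs the assumed reduction and returns the clauses of {\varphi_x}, with the proof given as a dictionary from variables to truth values.

```python
# A minimal sketch of the verifier in the converse direction above. Each clause of
# phi_x is represented as a list of literals (variable, required_value).
import math
import random

def gap3sat_verifier(x, proof, rho, reduce_to_gap3sat):
    clauses = reduce_to_gap3sat(x)                    # phi_x, with m = poly(|x|) clauses
    k = math.ceil(math.log(2) / math.log(1 / rho))    # chosen so that rho^k <= 1/2
    for idx in random.sample(range(len(clauses)), k): # O(k log m) random bits
        # query the (at most 3) variables of the clause and check that it is satisfied
        if not any(proof[v] == val for v, val in clauses[idx]):
            return False
    return True
```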

3. Max-Clique and the FGLSS Reduction

We give another example of a gap-producing reduction using a PCP verifier as a sub-routine.

The PCP connection refers to the use of a PCP characterization of {\mathop{\mathbf{NP}}} to show hardness results for optimization problems. This connection was first noticed via a reduction from interactive proofs to Max-Clique in the pioneering work of Feige, Goldwasser, Lovász, Safra, and Szegedy. Since then, the reduction has been referred to as the FGLSS reduction.

Consider an {(r,q)}-restricted verifier {V} for a language {L \in \mathop{\mathbf{PCP}}_{c,s}[r, q]}. On input {x}, a transcript is a tuple {T = \langle R,Q_1,a_1,\dots,Q_q,a_q \rangle} where {R} is a random string with {|R|=r}, and the {Q_i} and {a_i} are the queries and corresponding answers that {V} made and received, in that order, given the random string. {T} is an accepting transcript if {V} accepts {x} after seeing the answers.

Two transcripts {T = \langle R,Q_1,a_1,\dots,Q_q,a_q \rangle} and {T' = \langle R',Q'_1,a'_1,\dots,Q'_q,a'_q \rangle} are consistent with each other if {Q_i=Q'_j \Rightarrow a_i=a'_j \ \forall i,j}, i.e. if for the same questions we get the same answers.

Given an input {x} (on which {V} tries to verify whether {x \in L}), we will construct a graph {G_x} in polynomial time such that

\displaystyle  \begin{array}{rcl}  x \in L & \Rightarrow & \mathop{\mathsf{opt}}(G_x) \geq \frac{c}{2^q}|V_x| \\ x \notin L & \Rightarrow & \mathop{\mathsf{opt}}(G_x) \leq \frac{s}{2^q}|V_x|. \end{array}

Let {G_x =(V_x,E_x)}, where {V_x} represents all accepting transcripts of {V} on {x} and {E_x} consists of edges connecting consistent pairs of transcripts. It follows that {|V_x| \leq 2^{r+q}}. We can add dummy vertices so that {|V_x| = 2^{r+q}}.

Note that the first question {V} asks is deterministic, knowing {x} and {R}. Then, knowing the first answer the second question is known, etc. Thus, the questions in a transcript are in fact redundant for the encoding of transcripts. Consequently, two distinct accepting transcripts with the same random string {R} must give different answers to some common question, so they are inconsistent; the vertices of {G_x} sharing the same random string {R} therefore form an independent “cluster”.
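The construction can be summarized by the following minimal sketch, assuming a hypothetical enumeration accepting_transcripts(x) of the accepting transcripts of {V} on {x}, each given as a random string together with its question-answer pairs.

```python
# A minimal sketch of the FGLSS graph G_x: vertices are accepting transcripts, and
# edges connect consistent pairs. A transcript is assumed to be a pair
# (R, ((Q_1, a_1), ..., (Q_q, a_q))).
from itertools import combinations

def consistent(t1, t2):
    answers1, answers2 = dict(t1[1]), dict(t2[1])     # question -> answer maps
    # consistent iff they agree on every question asked in both transcripts
    return all(answers1[Q] == answers2[Q] for Q in answers1.keys() & answers2.keys())

def fglss_graph(x, accepting_transcripts):
    vertices = list(accepting_transcripts(x))
    edges = [(i, j) for (i, t1), (j, t2) in combinations(enumerate(vertices), 2)
             if consistent(t1, t2)]
    return vertices, edges
```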

If {x \in L}, then there is some proof {\pi} such that {\mathop{\mathbf{Prob}}[V^{\pi}(x) \ {\tt accepts}] \geq c}. Consider the set of all accepting transcripts whose answers come from {\pi}; all these transcripts are consistent with each other. In other words, they form a clique. The fact that {\mathop{\mathbf{Prob}}[V^{\pi}(x) {\tt accepts}] \geq c} implies that the clique size is at least {c2^r}. Hence,

\displaystyle  \mathop{\mathsf{opt}}(G_x) \geq c2^r = \frac{c}{2^q}|V_x|.

Conversely, from a clique of {G_x} of size {k}, say, we can construct a proof {\pi} for which {V^{\pi}} accepts with probability at least {k/2^r}. The proof is constructed by taking the union of the answers of the transcripts from the clique, filling in arbitrary answers for positions not queried by any transcript in the clique. Consequently, when {x \notin L} there cannot be a clique of size more than {s2^r}, otherwise there would be a proof {\pi} for which {V^\pi} accepts with probability more than {s}. Hence, in this case

\displaystyle  \mathop{\mathsf{opt}}(G_x) \leq s2^r = \frac{s}{2^q}|V_x|.

Remark: The FGLSS reduction runs in time {poly(|x|) \cdot poly(2^{r+q})}.

Lemma 5 If {\mathop{\mathbf{NP}} \subseteq \mathop{\mathbf{PCP}}_{c,s}[r,q]}, and if {2^{r+q} = {\tt poly}(n)}, then Max-Clique is hard to approximate to within {\frac s c+\epsilon} for any \epsilon>0.

Theorem 6 It is {\mathop{\mathbf{NP}}}-hard to approximate Max-Clique to within any constant {\rho > \frac 1 2}.

Next time, we will see how to “amplify” the gap to prove stronger inapproximability results for Max-Clique.

Welcome to part II!

Posted in lectures, spr09 by atri on January 13, 2009

In today’s lecture Hung and I did a quick recap on what we covered in the last semester. On Wednesday, we will start with Dinur’s proof of the PCP theorem.

I finally put up the summaries of your talks that you sent on the blog. Sorry for the delay.

Property Testing Lecture 4

Posted in lectures by atri on November 17, 2008

In the last lecture, we finished the proof of the soundness for the natural tester for testing graph bipartiteness. The main idea in the proof was to divide up the set of random vertices S into two subsets U and W. The analysis proceeded by showing that with high probability over the choices of U and W, W forms a “witness” to the fact that no bipartition of U can be extended to a bipartition of the input graph (which in turn implies that the input graph is not bipartite). The gain in this approach over the naive analysis is that U has constant size and hence, one can take a union bound over all possible partitions of U.

In the second part of the lecture, we studied Locally Decodable Codes (LDCs). In particular a (q,\delta,\epsilon)-locally decodable code C\subseteq\Sigma^m has a local decoder with the following property. For any input \mathbf{y}\in\Sigma^m (such that \Delta(\mathbf{y},\mathbf{c})\le \delta m for a unique \mathbf{c}\in C) and i\in [m], the decoder makes at most q queries to \mathbf{y} and outputs the value c_i with probability at least \epsilon. We had already seen that the Hadamard code is a (2,\delta,1-2\delta)-LDC for any \delta<1/4. The Hadamard code has the optimal block length for any 2-query LDC. We then saw a construction of a (q,\delta,1-q\delta)-LDC over \mathbb{F}_r for r\ge q+1 based on Reed-Muller codes. The first chapter in Sergey Yekhanin‘s thesis has a nice introduction to these results.
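For concreteness, here is a minimal sketch of the 2-query local decoder for the Hadamard code, assuming oracle access to the received word as a callable g on n-bit tuples; the helper name is ours.

```python
# A minimal sketch of 2-query local decoding for the Hadamard code: the received word
# g is delta-close to the truth table of some linear function f, and we want f(e_i).
import random

def hadamard_decode_bit(g, n, i):
    """Recover f(e_i), i.e. the i-th message bit, with probability >= 1 - 2*delta."""
    r = tuple(random.randint(0, 1) for _ in range(n))
    r_plus_ei = tuple(b ^ (1 if j == i else 0) for j, b in enumerate(r))
    # Each of the two queried positions is individually uniform, so each is corrupted
    # with probability at most delta; by a union bound both answers are correct with
    # probability at least 1 - 2*delta, in which case g(r) + g(r + e_i) = f(e_i).
    return g(r) ^ g(r_plus_ei)
```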

Property Testing Lecture 3

Posted in lectures by atri on November 9, 2008

On Friday, we started  talking about property testing and in particular, testing whether the input graph (represented by a vector corresponding to its adjacency matrix) is bipartite.  (The queries ask whether an edge exists between a pair of vertices in the graph.)

Before talking about the tester for graph bipartiteness, we discussed some high level connections of property testing to learning and decision tree complexity. In particular, property testing is easier than learning as it is only a decision problem (whereas a learning algorithm has to output an approximation to the function being learnt), whereas it is harder than learning as it also has to deal with inputs that are far from having the property (while learning only has to deal with functions that are from a fixed concept class). The decision tree complexity of a property corresponds to a stricter version of property testing, where the algorithm has to determine exactly whether the input satisfies the property or not. Graph (non)-bipartiteness is a monotone graph property. The Aanderaa-Karp-Rosenberg conjecture states that all non-trivial monotone graph properties require any (deterministic) algorithm to probe all possible edges (this conjecture is also known as the graph evasiveness conjecture). For a quick overview of the problem and notes on a result by Kahn, Saks and Sturtevant that proves the conjecture for a prime power number of vertices and uses topology, refer to these notes by James Lee.

We then looked at the following very natural tester for graph bipartiteness: pick a random subgraph of the input graph and accept if and only if the subgraph is bipartite. This tester obviously has perfect completeness. We saw that the naive analysis of the tester does not work well. In the next lecture, we will see a more sophisticated analysis of the soundness.
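Here is a minimal sketch of that natural tester, assuming adjacency-matrix query access adj(u, v) and leaving the sample size s as a parameter; the soundness analysis in the next lecture determines how s depends on \epsilon.

```python
# A minimal sketch of the natural bipartiteness tester: sample a random set of
# vertices, query all pairs, and accept iff the induced subgraph is bipartite.
import random
from itertools import combinations

def is_bipartite(vertices, edges):
    """Standard 2-coloring check of the sampled subgraph."""
    adj_list = {v: [] for v in vertices}
    for u, v in edges:
        adj_list[u].append(v)
        adj_list[v].append(u)
    color = {}
    for start in vertices:
        if start in color:
            continue
        color[start] = 0
        stack = [start]
        while stack:
            u = stack.pop()
            for w in adj_list[u]:
                if w not in color:
                    color[w] = 1 - color[u]
                    stack.append(w)
                elif color[w] == color[u]:
                    return False                  # an odd cycle was found
    return True

def bipartiteness_tester(n, adj, s):
    sample = random.sample(range(n), s)           # a random subset of s vertices
    sampled_edges = [(u, v) for u, v in combinations(sample, 2) if adj(u, v)]
    return is_bipartite(sample, sampled_edges)    # perfect completeness is immediate
```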

The notion of property testing was first defined in the classic paper by Oded Goldreich, Shafi Goldwasser and Dana Ron titled Property Testing and its Connection to Learning and Approximation. This paper also pioneered testing of graph properties. The tester above was first analyzed in this paper. I followed the presentation from these notes by Dana Ron. For more on property testing (especially on graph property testing) see this page by Oded Goldreich.

Property Testing Lectures 1 and 2

Posted in lectures by atri on November 9, 2008

In the first two lectures on property testing, we started with codeword testing. A codeword tester for a code C\subseteq \Sigma^m, with query complexity q, is a randomized algorithm that given an input \mathbf{y}\in\Sigma^m does the following:

  1. If \mathbf{y}\in C, then the tester accepts with probability 1,
  2. If \delta(\mathbf{y},C)\ge \epsilon then the tester rejects with probability at least s>0 (where s can depend on \epsilon).
  3. Makes at most q oracle queries to \mathbf{y}.

We studied the Hadamard code Had:\{0,1\}^n\rightarrow\{0,1\}^{2^n}, where every codeword corresponds to the evaluation vector of a linear function f:\{0,1\}^n\rightarrow\{0,1\} (f being linear implies that for every \mathbf{x},\mathbf{y}\in\{0,1\}^n, f(\mathbf{x})+f(\mathbf{y})=f(\mathbf{x}+\mathbf{y})). This definition suggests a natural tester for the Hadamard code: think of the input from \{0,1\}^{2^n} as a function g:\{0,1\}^n\rightarrow\{0,1\}, pick random inputs \mathbf{x},\mathbf{y}\in\{0,1\}^n and accept if and only if g(\mathbf{x})+g(\mathbf{y})=g(\mathbf{x}+\mathbf{y}). The tester makes only 3 queries and obviously has perfect completeness. In the lecture we saw a Fourier-analytic proof that this tester rejects with probability at least \delta(g,Had), the relative distance of g from the code.
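For reference, the tester itself is just three queries and one comparison; a minimal sketch, assuming oracle access to g as a callable on n-bit tuples:

```python
# A minimal sketch of the 3-query linearity (Hadamard codeword) tester described above.
import random

def blr_test(g, n):
    x = tuple(random.randint(0, 1) for _ in range(n))
    y = tuple(random.randint(0, 1) for _ in range(n))
    x_plus_y = tuple(a ^ b for a, b in zip(x, y))   # coordinate-wise sum over F_2
    # accept iff g(x) + g(y) = g(x + y) over F_2
    return g(x) ^ g(y) == g(x_plus_y)
```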

The “linearity tester” above was first analyzed in a paper titled Self-Testing/Correcting with Applications to Numerical Problems by Manuel Blum, Mike Luby and Ronitt Rubinfeld. The analysis of the soundness of the tester using Fourier analysis first appeared in the paper by Mihir Bellare, Don Coppersmith, Johan Hastad, Marcos Kiwi and Madhu Sudan titled Linearity Testing in Characteristic Two. I followed the presentation from these notes by Venkat Guruswami and Ryan O’Donnell.

Lecture 9: Zig-Zag Product

Posted in lectures by HQN on October 9, 2008

We talked about the zig-zag product today. Let G, H be an (n,m,\alpha)-spectral expander and an (m,d,\beta)-spectral expander, respectively. Let’s use \star to denote the zig-zag product. The graph G\star H is an (mn, d^2, f(\alpha,\beta))-spectral expander where f(\alpha,\beta)<1 if \alpha<1 and \beta<1. Moreover, f(\alpha,\beta) \leq \alpha+\beta+\beta^2. For better upper-bounds on f(\alpha,\beta), see the original zig-zag product paper. For a recent new construction of expanders, see this paper. Perhaps one of you can present this paper.

For each vertex u \in V(G), label all edges incident to u arbitrarily by e_u^1, \dots, e_u^m.  The product is defined as follows:

  • The graph G\star H has vertex set V = \{(u,i) \ | \ u\in V(G), i \in [m]=V(H)\}.
  • There is an edge from (u,i) to (v,j) iff there are k, l \in [m] such that ik \in E(H), e_u^k = e_v^l, and lj \in E(H). It is not difficult to see that the product has mn vertices and degree d^2 (see the sketch below).
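Here is a minimal code sketch of this definition, assuming G is given in rotation-map style by a function edge_of(u, k) returning the pair (v, l) such that the k-th edge at u equals the l-th edge at v (so e_u^k = e_v^l), and H is given by a symmetric set E_H of pairs over [m].

```python
# A minimal sketch of the zig-zag product G (star) H on explicit representations.
def zigzag(n, m, edge_of, E_H):
    vertices = [(u, i) for u in range(n) for i in range(m)]
    edges = set()
    for u in range(n):
        for i in range(m):
            for k in range(m):
                if (i, k) not in E_H:        # "zig": a step inside u's cloud, i -> k
                    continue
                v, l = edge_of(u, k)         # cross the G-edge e_u^k = e_v^l
                for j in range(m):
                    if (l, j) in E_H:        # "zag": a step inside v's cloud, l -> j
                        edges.add(frozenset({(u, i), (v, j)}))
    return vertices, edges
```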

Let \mathbf{\hat A}, \mathbf{\hat B} be the normalized adjacency matrices of G, H, respectively. Define an mn \times mn matrix \mathbf P where p_{(u,k),(v,l)} = 1 iff e_u^k = e_v^l. Then, \mathbf P is a symmetric permutation matrix. Now, let \mathbf{\tilde B} = \mathbf I_n \otimes \mathbf{\hat B} (where \otimes denotes the tensor product of two matrices). Then, it follows that \mathbf M = \mathbf{\tilde B P \tilde B} is the normalized adjacency matrix of the product graph G\star H. So,

f(\alpha,\beta) = \max_{\mathbf{x \perp u}} \frac{|\langle \mathbf{\tilde B P \tilde B x, x} \rangle|}{\langle x, x \rangle}

Fix a vector \mathbf{x \perp u} \in \mathbb R^{mn}. Define a “collection vector” C(\mathbf x) \in \mathbb R^n as follows: C(\mathbf x)_u = \frac 1 m \sum_{i\in [m]} x_{(u,i)}. So the uth entry of C(\mathbf x) is just the average over all entries x_{(u,i)} in the uth “cloud” of the vector \mathbf x. Also define

\mathbf x^{\|} := C(\mathbf x) \otimes \mathbf 1_m where \mathbf 1_m \in \mathbb R^m is the all-1 vector

\mathbf x^\perp = \mathbf x - \mathbf x^\|

We want to derive something like |\langle \mathbf{\tilde B P \tilde B x, x} \rangle| \leq f(\alpha, \beta) \langle \mathbf{x,x} \rangle for this fixed \mathbf x, so let’s start with the left hand side. Note that, since \mathbf x^\| is uniform on each “cloud”, we have \mathbf{\tilde Bx^\| = x^\|}.

|\langle \mathbf{\tilde B P \tilde B (x^\|+x^\perp), (x^\|+x^\perp)} \rangle| \leq |\langle \mathbf{Px^\|, x^\|} \rangle| + 2|\langle \mathbf{P\tilde B x^\perp, \tilde B x^\|} \rangle| + \|\mathbf{\tilde Bx^\perp} \|^2

Consider the last term. Since \mathbf x^\perp is anti-uniform on each “cloud” (i.e. the sum of \mathbf x^\perp‘s coordinates on each cloud is zero), we have \|\mathbf{\tilde Bx^\perp}\| \leq \beta \|\mathbf x^\perp\|.

We now bound the second term. Since \mathbf P is a permutation matrix, it does not change a vector’s norm. We have

2|\langle \mathbf{P\tilde B x^\perp, \tilde B x^\|} \rangle| \leq 2\| \mathbf{P\tilde Bx^\perp}\| \|\mathbf{\tilde B x^\|}\| \leq 2\beta \|x^\perp\| \|x^\|\|

Finally, we bound the first term. A relatively straightforward computation leads to

\langle \mathbf{Px^\|, x^\|} \rangle = m \langle \mathbf{\hat A} C(\mathbf x), C(\mathbf x)\rangle \leq m\alpha \|C(\mathbf x)\|^2 = \alpha \|\mathbf x^\|\|^2.

Consequently,

|\langle \mathbf{\tilde B P \tilde B x, x} \rangle| \leq \alpha \|\mathbf x^\|\|^2 + 2\beta \|x^\perp\| \|x^\|\| + \beta^2 \|\mathbf x^\perp\|^2

Noting that \|\mathbf x\|^2 = \|\mathbf x^\|\|^2 + \|\mathbf x^\perp\|^2, it is easy to show that

|\langle \mathbf{\tilde B P \tilde B x, x} \rangle| \leq \max(\alpha+\beta, \beta+\beta^2)\|\mathbf x\|^2 \leq (\alpha+\beta+\beta^2)\|\mathbf x\|^2.

Now, suppose we start from a (d^4,d,1/5)-spectral expander H. Let G_1=H^2 and G_n = G_{n-1}^2 \star H. It is not difficult to prove by induction that G_n is a (d^{4n}, d^2, 2/5)-spectral expander.
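This iteration is easy to phrase as code; a minimal sketch, assuming square(G) and zigzag(G, H) implement graph squaring and the zig-zag product (e.g. as sketched above):

```python
# A minimal sketch of the iterated zig-zag construction of an expander family.
def expander_family(H, square, zigzag, N):
    G = square(H)                     # G_1 = H^2
    family = [G]
    for _ in range(2, N + 1):
        G = zigzag(square(G), H)      # G_n = (G_{n-1})^2 (zig-zag) H
        family.append(G)
    return family                     # G_n is a (d^{4n}, d^2, 2/5)-spectral expander
```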

The zig-zag construction can also be made strongly explicit.

Lecture 8: Confining a random walk on an expander is hard

Posted in lectures by HQN on September 23, 2008

Let G be an (n,d,\alpha)-spectral expander, B be a subset of vertices of G of size \beta n. Suppose we uniformly choose a random vertex of G and walk randomly for t steps.

Theorem 1 (Confining a random walk is hard). Let (B,t) be the event that the walk is confined within B the entire time. Then, Prob\left[(B,t)\right] \leq \sqrt\beta (\alpha + (1-\alpha)\beta)^t. In particular, if \alpha, \beta<1 then this confinement probability is exponentially small.

Proof. Let \mathbf P = (p_{ij}) be the “projection into B matrix“, i.e. p_{ii} = 1, \forall i \in B and p_{ij}=0 for all other i,j. Noting that \mathbf P is idempotent, it is not difficult to see that

Prob\left[(B,t)\right] = \|(\mathbf{P\hat A})^t\mathbf{Pu}\|_1 = \|(\mathbf{P\hat AP})^t\mathbf{Pu}\|_1 \leq \sqrt n \|(\mathbf{P\hat AP})^t\mathbf{Pu}\|_2

Thus, to bound Prob\left[(B,t)\right] we need to know how much the matrix \mathbf{M = P\hat AP} shrinks a vector after each multiplication. The same trick we did in the last lecture gives, for any non-zero vector \mathbf y

\frac{\|\mathbf{My}\|}{\|\mathbf y\|} \leq \lambda_1(\mathbf M) = \max_{\mathbf z\neq \mathbf 0} \frac{\mathbf z^T\mathbf{Mz}}{\mathbf z^T\mathbf z}

Now, consider any non-zero vector \mathbf z. Let \mathbf{x = Pz}. Then, \mathbf z^T\mathbf{Mz} = \mathbf z^T\mathbf{P\hat APz} = \langle \mathbf{\hat Ax, x} \rangle. First, express \mathbf x as a linear combination of the orthonormal eigenvectors of \mathbf{\hat A}:

\mathbf x = c_1\mathbf u_1 + c_2 \mathbf u_2 + \cdots + c_n \mathbf u_n

Let \mathbf x^{\|} = c_1\mathbf u_1 and \mathbf x^{\perp} = \mathbf{x-x^{\|}}. Then,

\langle \mathbf{\hat Ax, x} \rangle = \langle \mathbf{\hat A x^{\|}} + \mathbf{\hat A x^{\perp}}, \mathbf x^{\|} + \mathbf x^{\perp} \rangle = \|\mathbf x^{\|}\|^2 + \langle \mathbf{\hat A x^\perp, x^\perp} \rangle

The usual trick gives

\langle \mathbf{\hat A x^\perp, x^\perp} \rangle = \|\mathbf x^\perp\|^2 \frac{\langle \mathbf{\hat A x^\perp, x^\perp} \rangle}{\|\mathbf x^\perp\|^2} \leq \hat\lambda \|\mathbf x^\perp\|^2 = \hat\lambda \left(\|\mathbf x\|^2 - \|\mathbf x^{\|}\|^2\right)

By Cauchy-Schwarz, \|\mathbf x^{\|}\|^2 = c_1^2 = \langle \mathbf x, \mathbf u_1\rangle^2 \leq \beta \|\mathbf x\|^2, since \mathbf{x = Pz} is supported on the \beta n coordinates in B and \mathbf u_1 = \frac{1}{\sqrt n}\mathbf 1. We conclude that

\langle \mathbf{\hat Ax, x} \rangle \leq \hat\lambda\|\mathbf x\|^2 + (1-\hat\lambda)\|\mathbf x^\|\|^2 \leq (\alpha + (1-\alpha)\beta)\|\mathbf x\|^2 \leq (\alpha + (1-\alpha)\beta)\|\mathbf z\|^2

Consequently, each time we apply \mathbf M to a vector \mathbf y we reduce its length by a ratio of at least (\alpha + (1-\alpha)\beta). There are t applications, and the initial vector \mathbf{Pu} has length \sqrt{\beta/n}. The theorem follows.

Theorem 2 (Saving random bits for RP). Recall that a language L is in the class RP if there exists a polynomial-time randomized algorithm A satisfying the following conditions:

  • x \in L \Rightarrow Prob(A accepts x) \geq 1/2
  • x \notin L \Rightarrow Prob(A accepts x) = 0

Suppose A uses r random bits. To reduce the error probability to 1/2^k, the easy way is to run the algorithm k times and accept the input if the algorithm accepts it at least once. However, this approach requires kr random bits. By imposing a (strongly explicit) (n,d,\alpha)-spectral expander on the space of random strings of length r, we can obtain the same effect with only r + O(k\log d) = r + O(k) random bits (since d is a constant). Here, we apply Theorem 1 with \beta = 1/2 and require (\alpha + (1-\alpha)\cdot 1/2)<1, which holds whenever \alpha<1. A strongly explicit expander is an expander for which, given (the name of) a vertex, there is a polynomial-time algorithm computing its neighboring vertices.
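A minimal sketch of this amplification, assuming a hypothetical strongly explicit expander given by neighbor(vertex, step_label) on the vertex set \{0,1\}^r, and A(x, randomness) returning True/False:

```python
# A minimal sketch of error reduction for RP via a random walk on an expander:
# the start vertex costs r random bits, and each of the O(k) steps costs log(d) bits.
import random

def amplified(x, A, r, d, k, neighbor):
    v = random.getrandbits(r)                  # random starting vertex = random string
    for _ in range(k):
        if A(x, v):
            return True                        # one accepting run certifies x in L (no false positives)
        v = neighbor(v, random.randrange(d))   # take one random step on the expander
    return A(x, v)

# If x is in L, the walk stays confined to the "bad" random strings (at most half of
# all strings, so beta <= 1/2) with probability exponentially small in k, by Theorem 1.
```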

Some linear algebra

Posted in lectures by HQN on September 21, 2008

I mentioned the Perron-Frobenius theorem in the lecture. There are two versions of the theorem, one for positive matrices and one for non-negative matrices. Define the spectral radius \rho(\mathbf A) = \max\{|\lambda|\}, where the max goes over all eigenvalues \lambda of \mathbf A. Here |\lambda| is the modulus of \lambda, as we allow complex eigenvalues.

Theorem 1 (Perron-Frobenius, positive matrices). Let \rho be the spectral radius of a positive matrix \mathbf A. Then,

  1. \rho is positive
  2. \rho is a (real) eigenvalue of \mathbf A
  3. there exists a positive \rho-eigenvector \mathbf x, i.e. \mathbf{Ax} =\rho \mathbf{x, x>0}
  4. \rho has geometric and algebraic multiplicity 1, i.e. \rho is a simple root of \det(\lambda\mathbf{I-A})=0 and the \rho-eigenspace has dimension 1
  5. |\lambda|<\rho for all other eigenvalues \lambda \neq \rho of \mathbf A, i.e. \rho is the unique eigenvalue of maximum modulus

Theorem 2 (Perron-Frobenius, non-negative matrices). Let \rho be the spectral radius of a non-negative matrix \mathbf A. Then,

  1. \rho is non-negative
  2. \rho is a (real) eigenvalue of \mathbf A
  3. there exists a non-negative \rho-eigenvector \mathbf x, i.e. \mathbf{Ax} =\rho \mathbf{x, x\geq 0}

Theorem 2 follows from Theorem 1 by perturbing the non-negative matrix (so that it becomes positive) and then taking a limit. See my favourite linear algebra book for the proofs of these theorems.
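As a small numerical illustration (not part of the lecture), power iteration on a non-negative matrix converges to a non-negative eigenvector whose eigenvalue is the spectral radius:

```python
# Power iteration illustrating Perron-Frobenius on a small non-negative matrix.
import numpy as np

A = np.array([[0.0, 2.0],
              [1.0, 1.0]])                     # non-negative; eigenvalues are 2 and -1
x = np.ones(2)
for _ in range(100):
    x = A @ x
    x /= np.linalg.norm(x)                     # power iteration
rho = x @ A @ x / (x @ x)                      # Rayleigh quotient approximates rho(A)
print(rho, np.max(np.abs(np.linalg.eigvals(A))))   # both are ~2.0; x is entrywise >= 0
```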