I wrote down a guidelines/advice post for the presentations in my coding theory course. Please follow the guidelines while preparing for your talk in the seminar.
Theorem (Hastad 1997)
NP = PCP$_{1-\epsilon,\,1/2+\epsilon}[O(\log n), 3]$ for any given $\epsilon > 0$.
The outline of the proof is as follows. (This exact outline will be used at least one more time, starting from a slightly different version of LabelCover.)
- We start from the NP-hard problem Gap-Max-LabelCover, and design a 3-bit PCP verifier for it (with logarithmic randomness).
- The verifier expects labels to be encoded with the (binary) long code, which is a map LC $: \Sigma \to \{-1,1\}^{2^m}$, where $m = |\Sigma|$. Each $\{-1,1\}$-vector of length $2^m$ can be viewed as the truth table of a function $f : \{-1,1\}^m \to \{-1,1\}$. Thus, the long code LC$(a)$ of a symbol $a \in \Sigma$ is also one such function. In particular, it is the dictator function LC$(a)(x) = x_a$. Then, the verifier chooses an edge of the graph at random, picks 3 bits from the (supposed) long codes of the two labels, and performs a simple linear test on those bits.
- The completeness of the verifier is straightforward.
- For soundness, we prove the contrapositive: if the test passes with high probability, then there’s a labelling satisfying a sufficiently large fraction of the edges of the LabelCover instance. To show that such a labelling exists, we use the probabilistic method to choose a random labelling based on the Fourier coefficients of the functions representing (and perhaps pretending to be) long codes.
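To make the long code and the linear test concrete, here is a minimal Python sketch. It assumes the $\{-1,1\}$ convention, encodes a label as the truth table of its dictator function, and runs only a noise-free 3-query linearity check on a single table. This is a simplification: the verifier outlined above works across an edge of the LabelCover instance, and the names `long_code` and `linear_test` are illustrative, not from the lecture.

```python
import itertools
import random


def long_code(label, m):
    """Truth table of the dictator function x -> x[label] over {-1,1}^m."""
    points = itertools.product((1, -1), repeat=m)
    return {x: x[label] for x in points}


def linear_test(table, m, trials=200, seed=0):
    """Fraction of random triples (x, y, x*y) with f(x) * f(y) * f(x*y) == 1."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(trials):
        x = tuple(rng.choice((1, -1)) for _ in range(m))
        y = tuple(rng.choice((1, -1)) for _ in range(m))
        xy = tuple(a * b for a, b in zip(x, y))  # coordinate-wise product
        if table[x] * table[y] * table[xy] == 1:
            passed += 1
    return passed / trials


m = 4
codeword = long_code(2, m)  # long code of the label with index 2
constant_minus_one = {x: -1 for x in codeword}  # not a long code

print(len(codeword))                       # the table has 2**m entries
print(linear_test(codeword, m))            # a dictator passes every trial
print(linear_test(constant_minus_one, m))  # the constant -1 table fails every trial
```

A dictator passes because $x_a \cdot y_a \cdot (x_a y_a) = 1$ always. The hard part of the soundness analysis is the converse direction, which is where the Fourier coefficients come in.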
Bellare-Goldreich-Sudan introduced the long code. This is an excellent expository paper on many ideas we have and will discuss. For Fourier analysis of boolean functions, O’Donnell’s tutorial at STOC is a good starting point.
Actually part of the blog post on Lecture 1 was presented in Lecture 2. The main theme of lecture 2 was the following:
- We showed that the PCP theorem is equivalent to the NP-hardness of several gap problems, Gap-Max-E3SAT and Gap-LabelCover in particular. The last post has shown that Gap-Max-E3SAT is NP-hard. Showing that Gap-Max-LabelCover is NP-hard for some constant $\rho$ is not difficult: put all variables on the left and all clauses on the right, and connect a variable and a clause if the variable belongs to the clause. Labels for clauses are 001, 010, …, 111, corresponding to the combinations of literal values which satisfy the clause; labels for variables are 001 or 010, which “stand for” TRUE or FALSE. Finally, the constraint on an edge “projects” the clause’s label to the literal’s truth assignment.
- The above reduction yields bipartite graphs which are 3-regular on the right (each clause contains exactly three variables), but may not be regular on the left, since each variable can appear in an arbitrary number of clauses. For our purposes, we also want left-regular bipartite instances, which can easily be obtained by reducing from Gap-Max-E3SAT(d). Check Luca Trevisan’s survey for a proof that Gap-Max-E3SAT(d) is NP-hard for some constant d. (Vazirani’s book also contains a proof with d=29, I think.) The proof involves a very nice (but standard) application of expanders.
- A natural PCP verifier for the Gap-LabelCover problem can be viewed as a verifier of a 2-player 1-round game (2P1R).
- Then, Raz’s Parallel Repetition theorem can be applied to exponentially reduce the soundness of the PCP verifier for Gap-LabelCover. Since this result will be used to construct Hastad’s 3-bit PCP, we formally state it here.
Theorem (Raz’s Gap-LabelCover):
Given any , there exists an alphabet with size for which Gap-LabelCover is NP-hard. Moreover, bipartite graph instances of this Gap-LabelCover problem can be assumed to be -regular on the left and -regular on the right where are constants. Furthermore, the constraint for every edge of the graph satisfies the projection property, i.e. it checks if , where are the labels for respectively.
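The variable/clause reduction from E3SAT to LabelCover described in the list above can be sketched in a few lines of Python. The sketch makes some assumptions that are not in the lecture: clauses are DIMACS-style triples of signed integers, clause labels are the seven non-zero triples of literal values, variable labels are plain 0/1 rather than the 001/010 encoding, and all function names are hypothetical.

```python
from itertools import product

# The seven clause labels 001, 010, ..., 111: truth values of the three literals.
CLAUSE_LABELS = [bits for bits in product((0, 1), repeat=3) if any(bits)]


def build_labelcover(clauses):
    """One edge (variable, clause index, position, literal is positive) per literal."""
    edges = []
    for j, clause in enumerate(clauses):
        for pos, lit in enumerate(clause):
            edges.append((abs(lit), j, pos, lit > 0))
    return edges


def project(clause_label, pos, positive):
    """Projection constraint: the variable value implied by the clause label."""
    lit_value = clause_label[pos]
    return lit_value if positive else 1 - lit_value


def satisfied_fraction(edges, var_labels, clause_labels):
    """Fraction of edges whose projection agrees with the variable's label."""
    ok = sum(
        1
        for (v, j, pos, positive) in edges
        if project(clause_labels[j], pos, positive) == var_labels[v]
    )
    return ok / len(edges)


clauses = [(1, 2, 3), (-1, 2, -3)]      # (x1 or x2 or x3) and (not x1 or x2 or not x3)
edges = build_labelcover(clauses)
var_labels = {1: 1, 2: 0, 3: 0}         # x1 = TRUE, x2 = x3 = FALSE
good = {0: (1, 0, 0), 1: (0, 0, 1)}     # literal values under that assignment
bad = {0: (1, 0, 0), 1: (1, 0, 1)}      # second clause label disagrees on x1

print(satisfied_fraction(edges, var_labels, good))
print(satisfied_fraction(edges, var_labels, bad))
```

A satisfying assignment yields a labelling that satisfies every edge. A clause that the assignment falsifies has no label consistent with all three of its variables, so at least one of its three edges is violated.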
The second half of this semester is devoted to proving hardness of approximation. For example, we will show that it is NP-hard to approximate MAX-3SAT to within any constant better than $7/8$ (of the optimal). In their FOCS 97 paper, Karloff and Zwick showed how to use SDP to design a $7/8$-approximation algorithm. Thus the above hardness result is essentially optimal.
I am typing this lecture to test Luca Trevisan’s latex2wp converter (thanks, Luca!). I probably won’t have the time to type lectures any more this semester. Here’s a brief outline of what I will be talking about in the next 7 lectures. I hope I can finish them on time:
Lecture 1: gap-producing reduction from PCP.
- How do we show that an optimization problem is NP-hard to approximate to within some ratio? Answer: design a gap-producing reduction from an NP-hard problem, which is equivalent to showing that the corresponding gap-version of the problem is NP-hard.
- How do we design such a gap-producing reduction? There are two basic strategies:
- Start from a problem which already has a gap, i.e. an NP-hard gap-version of some problem. Then, the reduction has to be “gap-preserving” somehow. We will not discuss this strategy in Lecture 1. We will see many more examples along this line later.
- Use the PCP theorem. In particular, use the PCP verifier for some/any NP-complete problem as a subroutine in the gap-producing reduction. I have already given one example of this last semester. I will re-state the example again below. The FGLSS reduction will be the main example this time.
Lectures 2 + 3: gap-amplification.
- The “reduction from PCP” strategy may not produce a very good gap. To prove strong hardness results, we need to “amplify” the gap.
- There are several ways of doing gap-amplification:
- Repeat the verifier independently many times (at the expense of query and random bits)
- Use expanders! (still too many query bits)
- Use parallel-repetition and then alphabet reduction (somehow). We will discuss Hastad’s 3-bit PCP, its analysis, and some consequences.
Lectures 4 + 5: unique games conjecture (UGC).
- UGC is a conjecture regarding the NP-hardness of a certain gap problem. Using it, we can design nice gap-producing reductions.
- There’ll be quite a bit of Fourier analysis of boolean functions. Majority is stablest theorem. Hardness of approximating MAX-CUT.
Lectures 6 + 7: gap-preserving reductions + time filler.
1. How do we show that a problem is NP-hard to approximate to within a certain ratio $\rho$?
To be concrete, take MAX-3SAT as an example. The general strategy is:
- start from an NP-complete problem $A$
- let $\mathrm{OPT}(\varphi)$ denote the optimal cost of an instance $\varphi$ of MAX-3SAT; design a polynomial-time (Karp/Cook) reduction $A \le_p$ MAX-3SAT such that, given any input $x$ to $A$, the reduction produces an instance $\varphi$ for which
- if $x$ is a YES-instance of $A$, then $\mathrm{OPT}(\varphi) \ge g(|\varphi|)$ for some function $g$
- if $x$ is a NO-instance of $A$, then $\mathrm{OPT}(\varphi) < g(|\varphi|)/\rho$
Such a reduction is called a gap-producing reduction. A typical NP-hardness reduction is too weak to produce any “good” gap (for example, the standard reduction to 3SAT only separates formulas whose clauses can all be satisfied from formulas in which at least one clause must fail). Here, we use $|\varphi|$ to denote the length of an input to the problem at hand (MAX-3SAT in this case).
Let $g(n) > f(n)$ be any two functions. Let Gap-MAX-3SAT$_{g,f}$ be the (decision) problem of distinguishing between
- instances $\varphi$ of MAX-3SAT for which $\mathrm{OPT}(\varphi) \ge g(|\varphi|)$, and
- instances $\varphi$ of MAX-3SAT for which $\mathrm{OPT}(\varphi) < f(|\varphi|)$.
Proposition 1 The existence of a reduction as described above, with $f = g/\rho$, is equivalent to the fact that Gap-MAX-3SAT$_{g,f}$ is NP-hard.
Proposition 2 If Gap-MAX-3SAT$_{g,f}$ is NP-hard then MAX-3SAT is NP-hard to approximate to within $g/f$.
Proof: Suppose there is a polynomial-time approximation algorithm $\mathcal{A}$ with ratio $\rho = g/f$; namely, for any input $\varphi$, we always have $\mathcal{A}(\varphi) \ge \mathrm{OPT}(\varphi)/\rho$. (Here, $\mathcal{A}(\varphi)$ is the number of satisfied clauses returned by $\mathcal{A}$.)
If $\mathrm{OPT}(\varphi) < f$, then certainly $\mathcal{A}(\varphi) < f$. If $\mathrm{OPT}(\varphi) \ge g$, then $\mathcal{A}(\varphi) \ge g/\rho = f$. Thus, we can use $\mathcal{A}$ to decide in polynomial time if $\varphi$ is a YES- or a NO-instance of the gap problem, which is impossible unless P = NP, since the gap problem is NP-hard.
Certainly, the above line of reasoning is not limited to MAX-3SAT. We could have replaced MAX-3SAT by MAX-$\Pi$ for any problem $\Pi$, and Gap-MAX-3SAT$_{g,f}$ by Gap-MAX-$\Pi_{g,f}$. It is also convenient to normalize the objective function of $\Pi$ so that the cost is between $0$ and $1$, so that $0 \le f < g \le 1$. For example, for MAX-3SAT we can define the objective function to be the fraction of satisfied clauses of an input formula $\varphi$. Last but not least, the same line of reasoning works for MIN-$\Pi$ and Gap-MIN-$\Pi$ too! I’ll leave the technical details to you.
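The argument in the proof of Proposition 2 can be traced on a toy example. The Python sketch below is only an illustration under stated assumptions: `max_sat_opt` is a brute-force stand-in for the optimum, `worst_case_approx` is a hypothetical adversarial approximation algorithm that returns the smallest integer allowed by its ratio, and the thresholds (8 and 7.5) are made-up values for two 8-clause formulas.

```python
import math
from fractions import Fraction
from itertools import product


def max_sat_opt(clauses, n_vars):
    """Brute-force optimum: max number of satisfied clauses (tiny inputs only)."""
    best = 0
    for bits in product((False, True), repeat=n_vars):
        satisfied = sum(
            any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        )
        best = max(best, satisfied)
    return best


def worst_case_approx(opt, rho):
    """Hypothetical rho-approximation: the smallest integer value >= opt / rho."""
    return math.ceil(Fraction(opt) / rho)


def decide_gap(value, lower_threshold):
    """YES iff the approximate value reaches the NO-threshold f."""
    return "YES" if value >= lower_threshold else "NO"


g, f = Fraction(8), Fraction(15, 2)   # YES: OPT >= 8, NO: OPT < 7.5
rho = g / f                           # approximation ratio g / f

yes_instance = [(1, 2, 3)] * 8        # all 8 clauses satisfiable
no_instance = [                       # all 8 sign patterns: OPT = 7
    tuple(v if s else -v for v, s in zip((1, 2, 3), signs))
    for signs in product((True, False), repeat=3)
]

for formula in (yes_instance, no_instance):
    opt = max_sat_opt(formula, 3)
    print(opt, decide_gap(worst_case_approx(opt, rho), f))
```

Even the most pessimistic output allowed by the ratio $g/f$ stays at or above $f$ on the YES-instance and below $f$ on the NO-instance, which is exactly what the proof uses.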
2. How do we design a gap-producing reduction?
Equivalently, how do we prove that a gap-problem is NP-hard? As we have mentioned, the typical NP-hardness reduction is, in most cases, too weak for this purpose. Fortunately, the PCP theorem gives us precisely one such reduction. Moreover, this PCP “technology” is sufficiently strong that it can be used to design many gap-producing reductions based on it.
Note that it is somewhat misleading to talk about “the” PCP theorem. There are many PCP theorems, each with different parameters. Different PCP theorems give us different starting points for designing gap-producing reductions. When people say the PCP theorem, they mean the following theorem:
Theorem 3 (The PCP Theorem)
NP = PCP$[O(\log n), O(1)]$.
We will prove other PCP theorems in the next few weeks. To illustrate the PCP “technology”, we first show that it is actually equivalent to the hardness of some gap problem.
Theorem 4 The PCP theorem is equivalent to the fact that there is some constant $\rho < 1$ for which Gap-MAX-E3SAT$_{1,\rho}$ is NP-hard.
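Proof: Let’s assume the PCP theorem first. We will produce a reduction from an NP-complete language to Gap-MAX-E3SAT$_{1,\rho}$. More concretely, consider any NP-complete language $L$. The reduction works by constructing in polynomial time an E3-CNF formula $\varphi_x$ with $m$ clauses, given an input $x$. The construction satisfies the following properties, for some constant $\rho < 1$:
- if $x \in L$, then all $m$ clauses of $\varphi_x$ can be satisfied simultaneously;
- if $x \notin L$, then every truth assignment satisfies at most $\rho m$ clauses of $\varphi_x$.
By the PCP theorem, there is some $(r,q)$-restricted verifier $V$ recognizing $L$, where $r = O(\log n)$ and $q$ is a fixed constant. We will use $V$ to construct $\varphi_x$ for each input string $x$. In other words, $V$ is a sub-routine in the gap-producing reduction we are designing.
Note that when $V$ is adaptive, the length of the proof does not need to be more than $2^q 2^r$. When $V$ is non-adaptive, the proof’s length does not need to be more than $q 2^r$. In both cases, $V$ only needs polynomial-size proofs. Let $p$ be the upper bound on proof sizes.
Construct $\varphi_x$ as follows. Create $p$ variables $y_1, \dots, y_p$, so that each truth assignment to these variables corresponds to a proof presented to $V$. For each random string $R$ of length $r$, there are some combinations of the answers to $V$’s queries that make $V$ accept. We can model this fact by a CNF formula $\varphi_R$ on $y_1, \dots, y_p$ such that $\varphi_R(y) = \mathrm{TRUE}$ iff $V$ accepts the proof $y$ on random string $R$. The formula $\varphi_R$ can be constructed in polynomial time by simulating $V$ on the random string $R$ and generating all possible combinations of answers. Since $q$ is a constant, there are only constantly (at most $2^q$) many answer combinations. By adding a few auxiliary variables, we can convert $\varphi_R$ into E3-CNF form. Originally $\varphi_R$ has at most $2^q$ clauses. Each clause gives rise to at most $q$ size-$3$ clauses (assuming, without loss of generality, $q \ge 4$). Hence, after the E3-CNF conversion $\varphi_R$ has at most $q 2^q$ clauses.
Finally, let $\varphi_x = \bigwedge_R \varphi_R$; then $\varphi_x$ itself can be constructed in polynomial time since there are only polynomially many ($2^r$) random strings $R$. (This is why the randomness of $O(\log n)$ is crucial!) Let $m$ be the total number of E3-CNF clauses of $\varphi_x$, then $m \le q 2^q 2^r$.
- When $x \in L$, there is a proof (a truth assignment) such that $V$ always accepts. Hence, under this assignment (suitably extended to the auxiliary variables) $\varphi_x$ is satisfied.
- When $x \notin L$, take any truth assignment, set $\pi_i = y_i$ for all $i$ and feed $\pi$ as a proof to $V$. In this case, $V$ only accepts with probability at most $1/2$. Hence, at least half of the $\varphi_R$ are not satisfied by this truth assignment. For each $\varphi_R$ that is not satisfied, there is at least one clause that is not satisfied. The number of non-satisfied clauses is thus at least $\frac{1}{2} 2^r$. Consequently, setting $\rho = 1 - \frac{1}{2 q 2^q}$, the maximum number $\mathrm{OPT}(\varphi_x)$ of simultaneously satisfied clauses obeys
$$\mathrm{OPT}(\varphi_x) \le m - \tfrac{1}{2} 2^r \le m - \frac{m}{2 q 2^q} = \rho m.$$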
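The step that turns the verifier’s behaviour on one random string into a CNF formula can be sketched as follows. This is a minimal Python illustration under assumptions that go beyond the text: the verifier is non-adaptive, its decision for a fixed random string is given as a Python predicate on the $q$ answer bits, and the toy predicate (a parity check on three proof bits) is made up. With $q = 3$ the clauses are already of size 3, so no auxiliary variables are needed in this toy case.

```python
from itertools import product


def cnf_for_random_string(positions, accepts):
    """CNF over the queried proof positions that is true iff the verifier accepts.

    positions: 1-based proof positions queried for this random string.
    accepts:   predicate on the tuple of answer bits.
    A literal +p means "proof bit p is 1", and -p means "proof bit p is 0".
    """
    clauses = []
    for answers in product((0, 1), repeat=len(positions)):
        if not accepts(answers):
            # Forbid this rejecting combination: some queried bit must differ.
            clauses.append([p if a == 0 else -p for p, a in zip(positions, answers)])
    return clauses


def evaluate(clauses, proof):
    """proof maps a position to a bit; returns the truth value of the CNF."""
    return all(
        any((proof[abs(lit)] == 1) == (lit > 0) for lit in clause)
        for clause in clauses
    )


def parity_ok(answers):
    """Toy acceptance predicate: the three answers XOR to 0."""
    return sum(answers) % 2 == 0


positions = (1, 2, 3)
phi_R = cnf_for_random_string(positions, parity_ok)
print(len(phi_R))  # one clause per rejecting answer combination

for bits in product((0, 1), repeat=3):
    proof = dict(zip(positions, bits))
    assert evaluate(phi_R, proof) == parity_ok(bits)
```

Each rejecting combination contributes one clause, so the formula has at most $2^q$ clauses of size $q$, matching the count used in the proof.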
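Conversely, assume Gap-MAX-E3SAT$_{1,\rho}$ is NP-hard for some constant $\rho < 1$. Let us prove the PCP theorem. The fact that PCP$[O(\log n), O(1)] \subseteq$ NP is easy. We show NP $\subseteq$ PCP$[O(\log n), O(1)]$ by designing an $(r,q)$-verifier $V$ for some NP-complete language $L$, with $r = O(\log n)$ and $q = O(1)$.
Since Gap-MAX-E3SAT$_{1,\rho}$ is NP-hard, there’s a poly-time reduction from $L$ to Gap-MAX-E3SAT$_{1,\rho}$. Consider any input string $x$. Use the assumed reduction to construct an E3-CNF formula $\varphi_x$ with $m$ clauses. The strategy for $V$ is to pick a constant number $k$ of clauses of $\varphi_x$ independently at random, ask the prover for the values of the (at most $3k$) variables in these clauses, and accept iff all the clauses are satisfied. Clearly $V$ has perfect completeness. When $x \notin L$, at most $\rho m$ clauses are satisfied by any assignment. Hence, the probability that $V$ accepts is at most
$$\rho^k \le 1/2$$
when $k \ge 1/\log_2(1/\rho)$. Since $m = \mathrm{poly}(n)$, the number of random bits used is $k \lceil \log_2 m \rceil = O(\log n)$, and the number of query bits needed is at most $3k$, which is a constant.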
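The clause-sampling verifier from the converse direction is easy to simulate. The following Python sketch is illustrative only: it assumes clauses are triples of signed integers, that the proof is a truth assignment given as a dictionary, and that clauses are sampled independently with replacement. The helper names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import product


def repetitions_needed(rho):
    """Smallest k with rho**k <= 1/2, for a Fraction 0 < rho < 1."""
    k = 1
    while rho ** k > Fraction(1, 2):
        k += 1
    return k


def clause_satisfied(clause, assignment):
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def acceptance_probability(clauses, assignment, k):
    """Exact probability that k independently sampled clauses are all satisfied."""
    sat = sum(clause_satisfied(c, assignment) for c in clauses)
    return Fraction(sat, len(clauses)) ** k


def run_verifier(clauses, assignment, k, rng):
    """One run: sample k clauses, read their (at most 3k) variables, check them."""
    return all(clause_satisfied(rng.choice(clauses), assignment) for _ in range(k))


# Toy NO-like formula: all 8 sign patterns on x1, x2, x3, so exactly 7/8 of the
# clauses are satisfied by every assignment.
formula = [
    tuple(v if s else -v for v, s in zip((1, 2, 3), signs))
    for signs in product((True, False), repeat=3)
]
assignment = {1: True, 2: True, 3: True}

k = repetitions_needed(Fraction(7, 8))
print(k, float(acceptance_probability(formula, assignment, k)))
print(run_verifier([(1, 2, 3)], assignment, k, random.Random(0)))
```

For $\rho = 7/8$ the loop stops at $k = 6$, so the verifier reads at most 18 proof bits in this toy setting.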
3. Max-Clique and the FGLSS Reduction
We give another example of a gap-producing reduction using a PCP verifier as a sub-routine.
The PCP connection refers to the use of a PCP characterization of NP to show hardness results for optimization problems. This connection was first noticed via a reduction from interactive proofs to Max-Clique in the pioneering work of Feige, Goldwasser, Lovász, Safra, and Szegedy. Since then, the reduction has been referred to as the FGLSS reduction.
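Consider an $(r,q)$-restricted verifier $V$ for a language $L$, with completeness $c$ and soundness $s$. On input $x$, a transcript is a tuple $T = \langle R, Q_1, a_1, \dots, Q_q, a_q \rangle$ such that $R \in \{0,1\}^r$ is a random string, and the $Q_i$ and $a_i$ are the queries and corresponding answers that $V$ made and received, in that order, given the random string $R$. $T$ is an accepting transcript if $V$ accepts $x$ after seeing the answers.
Two transcripts $T = \langle R, Q_1, a_1, \dots, Q_q, a_q \rangle$ and $T' = \langle R', Q'_1, a'_1, \dots, Q'_q, a'_q \rangle$ are consistent with each other if $Q_i = Q'_j \Rightarrow a_i = a'_j$ for all $i, j$, i.e. if for the same questions we get the same answers.
On an input $x$, for which $V$ tries to verify whether $x \in L$ or not, we will construct a graph $G_x$ in polynomial time such that
$$x \in L \Rightarrow \omega(G_x) \ge c\, 2^r, \qquad x \notin L \Rightarrow \omega(G_x) \le s\, 2^r,$$
where $\omega(G_x)$ denotes the size of a maximum clique of $G_x$.
Let $G_x = (\mathcal{V}, E)$, where $\mathcal{V}$ represents all accepting transcripts of $V$ on $x$ and $E$ consists of edges connecting consistent pairs of transcripts. It follows that $|\mathcal{V}| \le 2^{r+q}$. We can add dummy vertices so that $|\mathcal{V}| = 2^{r+q}$.
Note that the first question $V$ asks is deterministic, knowing $x$ and $R$. Then, knowing the first answer the second question is known, etc. Thus, the questions in a transcript are in fact redundant for the encoding of transcripts. Consequently, the vertices of $G_x$ with the same random string $R$ form a cluster of independent vertices: two distinct transcripts with the same $R$ must give different answers to some common question.
If $x \in L$, then there is some proof $\pi$ such that $\Pr_R[V^\pi(x) \text{ accepts}] \ge c$. Consider the set of all accepting transcripts whose answers come from $\pi$; then all these transcripts are consistent with each other. In other words, they form a clique. The fact that $\Pr_R[V^\pi(x) \text{ accepts}] \ge c$ implies that the clique size is at least $c\, 2^r$. Hence,
$$\omega(G_x) \ge c\, 2^r.$$
Conversely, from a clique of $G_x$ of size $k$, say, we can construct a proof $\pi$ for which $V$ accepts with probability at least $k/2^r$. The proof is constructed by taking the union of the answers of the transcripts from the clique, adding dummy answers if they were not part of any transcript in the clique. Consequently, when $x \notin L$ there cannot be a clique of size more than $s\, 2^r$, otherwise there would be a proof for which $V$ accepts with probability more than $s$. Hence, in this case
$$\omega(G_x) \le s\, 2^r.$$
Remark: The FGLSS reduction runs in time $2^{O(r+q)} \cdot \mathrm{poly}(|x|)$.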
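To see the FGLSS construction in action, here is a small Python sketch on a made-up verifier. Everything specific is an assumption for illustration: the verifier is non-adaptive, uses $r = 2$ random bits, queries two positions of a 3-bit proof per random string, and its acceptance predicate is a toy one. The clique number is computed by brute force, which is only feasible because the graph has 8 vertices.

```python
from itertools import combinations, product


def accepting_transcripts(queries, accepts):
    """Vertices: (R, ((position, answer), ...)) for every accepting answer tuple."""
    vertices = []
    for R, positions in enumerate(queries):
        for answers in product((0, 1), repeat=len(positions)):
            if accepts(R, answers):
                vertices.append((R, tuple(zip(positions, answers))))
    return vertices


def consistent(t1, t2):
    """Same question implies same answer."""
    answers1 = dict(t1[1])
    return all(answers1.get(pos, ans) == ans for pos, ans in t2[1])


def clique_number(vertices):
    """Brute-force maximum clique size of the consistency graph."""
    n = len(vertices)
    adjacent = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if consistent(vertices[i], vertices[j])
    }
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(pair in adjacent for pair in combinations(subset, 2)):
                return size
    return 0


def max_accepting_strings(queries, accepts, proof_len):
    """2^r times the best acceptance probability, by trying every proof."""
    return max(
        sum(accepts(R, tuple(proof[p] for p in positions))
            for R, positions in enumerate(queries))
        for proof in product((0, 1), repeat=proof_len)
    )


# Toy verifier: random strings 0..3 query these pairs of proof positions.
queries = [(0, 1), (1, 2), (0, 2), (0, 1)]


def accepts(R, answers):
    # Strings 0, 1, 2 want the two bits to differ; string 3 wants them equal.
    return (answers[0] == answers[1]) if R == 3 else (answers[0] != answers[1])


vertices = accepting_transcripts(queries, accepts)
print(len(vertices), clique_number(vertices), max_accepting_strings(queries, accepts, 3))
```

In this toy instance the clique number equals the largest number of random strings that a single proof can make accept, which is the correspondence the two directions of the argument establish.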
Lemma 5 If NP $\subseteq$ PCP$_{c,s}[r,q]$, and if $2^{r+q} = \mathrm{poly}(n)$, then Max-Clique is hard to approximate to within $\rho$ for any $\rho < c/s$.
Theorem 6 It is NP-hard to approximate Max-Clique to within any constant $\rho < 2$.
Next time, we will see how to “amplify” the gap to prove stronger in-approximation results for Max-Clique.
We have two confirmed talks in the theory seminar for this semester. The first one is on March 2nd and the next one is on May 4th. For the latter please use the comments section to let me know what times work for you to attend the talk (it is the finals week).
Due to the March 2nd theory seminar, the first four student presentation dates have been moved up. I have updated the dates in the presentation schedule accordingly.
Here is the schedule of the first set of presentations:
- Thanh (Feb 16): Lecture notes on Parallel repetition from Venkat and Ryan’s course.
- Steve (Feb 18): Ben-Sasson, Sudan: Short PCPs with polylog query complexity.
- Nathan (Feb 23): Continue with lecture notes on Parallel repetition from Venkat and Ryan’s course.
- Swapnoneel (Feb 25)
- Yang (Mar 4)
Please let us know once you have chosen your paper so that we can make a note of it above. For your reference here is a link to the suggested list of papers.
The list of suggested papers for the first set of presentations is now up on the webpage. Some of them have some caveats, so read them carefully. These papers are probably much much harder than any you presented last semester, so we highly encourage you to start early on picking/preparing for your presentations.
In today’s lecture Hung and I did a quick recap on what we covered in the last semester. On Wednesday, we will start with Dinur’s proof of the PCP theorem.
I finally put up the summaries of your talks that you sent on the blog. Sorry for the delay.
(Guest post by Swapnoneel Roy)
I presented the paper titled Bounds on 2-Query Codeword Testing by Eli Ben-Sasson, Oded Goldreich, and Madhu Sudan. In the paper, the authors study $2$-query codeword testers. The main results in the paper are upper bounds on the size of linear (respectively binary) codes that admit such testers (respectively testers of perfect completeness).
In other words, it was showed, if be a -locally testable linear code with minimal relative distance , for , we have .
(Guest post by Steve Uurtamo)
I presented the result (by Noga Alon, Eldar Fischer, Ilan Newman and Asaf Shapira) that, in the dense graph model and allowing for two-sided error, the set of graph properties that can be tested using a constant number of queries to the adjacency matrix of a graph (constant for any fixed error distance $\epsilon$) corresponds exactly to those that can be determined using a set of Szemeredi regularity constraints.
Three examples of such reductions are given in the paper: vertex $k$-colorability (testable), co-subgraph isomorphism (testable) and graph isomorphism (not testable).