CS385, Theory of Computation
NP Completeness

This page is located at http://www.cs.bc.edu/~alvarez/Theory/npcompleteness.html


The complexity classes P and NP

Polynomial time solvability

A computational problem encoded as a language L is solvable in polynomial time if and only if there is a (deterministic) Turing decider machine M that decides L and that runs in time O(nk) for some integer k, where n is the length of the input string.

P denotes the set of all languages that are solvable in polynomial time.

Nondeterministic polynomial time solvability

A computational problem encoded as a language L is solvable in nondeterministic polynomial time if and only if there is a non-deterministic Turing decider machine M that decides L and that runs in time O(nk) for some integer k, where n is the length of the input string.

NP denotes the set of all languages that are solvable in nondeterministic polynomial time.

Verifier viewpoint for NP

A verifier for a language L is a deterministic Turing machine V such that, for any string w, w is in L if and only if there is a string z (known as a certificate for w) such that V accepts on input <w, z>.

Theorem. A language L is in NP if and only if there is a verifier machine for L that runs in polynomial time.

Idea of proof: A certificate for w relative to L is essentially equivalent to an accepting computation branch of a nondeterministic decider machine on input w.


NP-completeness

Polynomial time reducibility

If A and B are languages over a common alphabet Sigma, then a polynomial time reduction of A to B is a function reduce: Sigma -> Sigma that can be computed by a Turing machine in polynomial time and such that for any string w over Sigma, w belongs to A if and only if its image reduce(w) belongs to B.

Note: A function f on Sigma can be represented as a language:

Lf = { <w, f(w)> | w is in Sigma }

Saying that f can be computed by a Turing machine in polynomial time just means that the associated language Lf belongs to the complexity class P, that is, it can be decided in polynomial time.

NP-complete problems

A language L is called NP-complete if and only if the following two conditions hold:

  1. L belongs to NP.

  2. Every problem in NP can be reduced to L in polynomial time.


Boolean satisfiability

Consider the following language version of a computational problem:

3-SAT = { phi | phi is a satisfiable Boolean propositional formula in conjunctive normal form, with 3 literals per clause }

Here's an example of a formula phi that belongs to 3-SAT:

phi = (x OR NOT(y) OR z) AND (NOT(a) OR z OR NOT(x)) AND (y OR a OR NOT(z))

This phi belongs to 3-SAT because it is in conjunctive normal form (it's an AND of OR clauses), has 3 literals per clause (for instance, the first clause contains the literals x, NOT(y), z; literal just means a variable or its negation), and phi is satisfiable, that is, there is a truth assignment to the variables of phi that makes phi true.

For instance, the following truth assignment makes phi true:

x -> TRUE
y -> TRUE
z -> TRUE
a -> FALSE

Because phi is in conjunctive normal form, saying that this truth assignment satisfies phi just means that each clause of phi contains at least one literal that is TRUE under this assignment.

There are other satisfying truth assignments as well, but we only care that there is at least one satisfying truth assignment in order for a formula to qualify for membership in 3-SAT.

Here's an example of a formula that does not belong to 3-SAT:

phi = (x OR y OR y) AND (NOT(x) OR NOT(y) OR NOT(x)) AND (x OR x OR x) AND (y OR y OR y)

Cook-Levin Theorem. 3-SAT is NP-complete.


NP-completeness proofs

A general strategy for showing that a language L is NP-complete is the following.

  1. First, show that L is in NP.

  2. Second, show that another problem H that is known to be NP-complete can be reduced to L in polynomial time.
These two steps imply that L is NP-complete because they show that the two conditions in the definition of NP-completeness hold for L. If A is any NP problem, then since H is NP-complete, A can be reduced to H in polynomial time. If H can be reduced to L in polynomial time too, then by performing the two reductions one after the other, A can be reduced to L in polynomial time.

Example: proving that VERTEX-COVER is NP-complete

We will now show that the NP-complete problem 3-SAT can be reduced in polynomial time to the following problem concerning graphs:

VERTEX-COVER = { <G, k> | G is a graph for which there exists a size k vertex cover, that is, some set of k vertices of G such that every edge of G touches one of these vertices }

Notice that the above problem VERTEX-COVER is in NP because a size k vertex cover provides a polynomial time certificate of membership of (G, k) in VERTEX-COVER. Therefore, showing that 3-SAT can be reduced in polynomial time to VERTEX-COVER will prove that VERTEX-COVER is NP-complete (see the strategy described above).

In order to show that a polynomial time reduction of 3-SAT to VERTEX-COVER is possible, we need to "translate" a satisfiable Boolean formula phi into a pair (G, k) such that the graph G has some vertex cover containing k vertices. Do this as follows.

First construct G by defining its vertices and edges. Add a fully connected triangle to G for each clause of phi. Also, add a fully connected pair to G for each variable of phi; label its ends x and NOT(x), where x is the associated variable. For each vertex in each triangle, add an edge to the corresponding end of the dumbbell gadget associated with the variable that appears in that literal.

Next, define k as 2*(# of clauses) + (# of variables).

Finally, given any string s over Sigma, let Reduce(s) be the encoding <G, k> of the objects constructed as described, assuming that s is of the form <phi> for some Boolean propositional formula phi. Otherwise, let Reduce(s) be epsilon. This completes the definition of a function Reduce: Sigma -> Sigma.

I claim that the function Reduce defined above reduces 3-SAT to VERTEX-COVER in polynomial time. We now prove this.

Let's get the running time issue out of the way from the beginning. The running time of Reduce is roughly the number of nodes of G plus the number of edges of G. This is O(# variables + # literals), which is linear (therefore polynomial) in the size of phi. This shows that Reduce can be computed in polynomial time.

Now the hard part. Let's show that Reduce reduces 3-SAT to VERTEX-COVER.

First, assume that phi belongs to 3-SAT. I claim that Reduce(phi) (by which I really mean Reduce(s), where s is the string encoding of phi) belongs to VERTEX-COVER. That is, I claim that if (G, k) is the pair represented by Reduce(phi), then G is a graph that has a vertex cover of size k. Let's construct such a vertex cover. Since phi is satisfiable, there is a truth assignment that satisfies phi. This means that for each variable x of phi there is a choice of either x=TRUE or NOT(x)=TRUE that makes phi true; add the corresponding end of the x-gadget to the vertex cover. Now scan the clause-gadgets. For a given clause, there must be a literal that is true under the given truth assignment; there will therefore be an edge in G connecting that literal to one of the ends of the associated variable-gadget; add the other two literal-nodes of that clause to the vertex cover. Once this has been done for all clauses, the total number of nodes selected will be exactly the value k that's included in Reduce(phi). The set of all selected nodes is a vertex cover for G. Why? Well, each edge of G is either a variable-gadget or else it is one of the three edges of a clause-gadget or else it connects a clause-gadget to a variable-gadget. In the first case, the edge touches the cover because one of the two ends of each variable-gadget belongs to it. In the second case, the edge must touch one of the two nodes of its clause gadget that were added to the cover. In the last case, the edge either connects from one of the two clause nodes that were added, in which case it touches the cover, or else it connects from the special node in the clause gadget that connects to the selected end of its variable-gadget, which was added to the cover. Thus, the node set is indeed a size k vertex cover of G.

Second, we must show that if Reduce(phi) belongs to VERTEX-COVER then phi is satisfiable. To see this, write Reduce(phi) as (G, k). Assume that G has a size k vertex cover, where k is as in the construction of Reduce, that is,

k = 2*(# of clauses in phi) + (# of variables in phi).

I claim that any size k vertex cover of G contains exactly 2 nodes from each clause-gadget and exactly one node from each variable-gadget. To see this, notice that any vertex cover of G must include at least 2 nodes from each clause-gadget; otherwise, one of the three edges in the clause-gadget would not touch the cover. Also, every vertex cover of G must include at least one node from each variable gadget. However, by the definition of k any size k vertex cover can't include any more nodes, which means that its membership is exactly as claimed.

So, pick a size k vertex cover of G. Define a truth assignment to the variables of phi based on this vertex cover as follows. For each variable x, let x=TRUE if the x end of the x-gadget is in the cover, and let x=FALSE if the NOT(x) end of the gadget is in the cover. I claim that this truth assignment makes phi true. This just means that at least one literal is true in each clause of phi. Which one? Easy: the one that is not in the vertex cover, since it connects to the TRUE end of the associated variable gadget. Therefore, phi is true under this truth assignment. This shows that phi is satisfiable, that is, phi belongs to 3-SAT.

This completes the proof that the function Reduce that we defined above reduces 3-SAT to VERTEX-COVER in polynomial time. Therefore, we have proved that VERTEX-COVER is NP-complete.