CS385, Fall 2010
Theory of Computation
Problem Set 8 Solutions


  1. For each of the following context-free languages L, find the smallest pumping length that will satisfy the statement of the context-free pumping lemma discussed in class. In each case, your answer should include a number (the minimum pumping length), a detailed explanation of why that number is indeed a valid pumping length for the given language L, and a detailed explanation of why no smaller number qualifies as a valid pumping length for that particular language L.

    a) L = {w in {a,b}* | w has twice times as many a's as b's}

    Solution

    This L has a minimum pumping length of 3. The argument requires the observation of PS7 task 1a), namely that every nonempty string that has twice times as many a's and b's must contain one of the strings aab, aba, baa as a substring.

    To see that any pumping length of L must be at least 3, consider any nonempty string w of L. Given any split of w as w = uvxyz such that vy is nonempty and the pumped strings u vi x yi z all belong to L, vy must contain at least two a's and one b. Therefore, the length of vxy will be at least 3. Since vxy should have length p (a pumping length) or less, this forces p to be at least 3.

    Let's see whether 3 is a valid pumping length of L. Any w of length 3 or greater will contain one of the substrings aab, aba, baa. Assume for concreteness that w contains aab (the other cases are similar). Split w into uvxyz with

    u=prefix before the selected aab occurrence
    v=aa,
    x=epsilon,
    y=b,
    z=suffix after selected aab occurrence

    Notice for future reference that since vxy contains twice as many a's as b's, so must the rest, uz.

    The middle string vxy has length 3 (which is 3 or less), vy is nonempty, and the pumped string

    u vi x yi z

    equals

    u a2i bi z,

    which belongs to L for all non-negative integers i since both uz and the pumped middle portion contain exactly twice as many a's as b's. Therefore, 3 is indeed a valid pumping length of L. Above we showed that no number smaller than 3 can be a valid pumping length of L. This proves that 3 is the smallest pumping length of L.

    b) L = {w in {a,b}* | w "reads the opposite" in both directions, i.e., the reverse of w is the (bitwise) ones complement of w}

    Solution

    Because of the nonemptiness requirement for the vy portion of the uvxyz split in the statement of the pumping lemma together with the requirement that the middle portion vxy have length p or less, it follows that the pumping length p of any language must be 1 or greater.

    Notice that p=1 is not a pumping length of this language. To see this, consider the string 01. Any split w = uvxyz with |vxy| ≤ 1 will lead to pumping either only the 0 or the 1, yielding strings not in L.

    We will show that p=2 is a pumping length of L. Take any nonempty string w of L. Then w will be of the form s sR', where s is some nonempty string over {a,b} and sR is the bitwise complement of the reverse of s. Write w as t cc' tR', where c is a single symbol of the alphabet (either 0 or 1 here) and c' is the bitwise complement of c (0'=1, 1'=0). We can then split w as uvxyz with u=t, v=c, x=epsilon, y=c', z=tR', which satisfies

    |vxy| = 2

    vy nonempty

    u vn x yn z = s cn c'n sR, which belongs to L.

    This proves that 2 is a pumping length of L, hence 2 is also the shortest pumping length of L.


  2. Use the context-free pumping lemma to show that the language of all strings over the alphabet {0,1,2} that contain the same number of 0's and 1's and exactly as many 2's as 0's and 1's combined (like 21012022, 0212, and 12022120) is not context-free. Be sure to argue your case rigorously, not leaving any gaps nor relying on mere examples that may not cover all of the possible situations. As usual, your argument must be sufficiently clear and general so as to convince a reasonable but skeptical jury of its correctness.

    Solution

    Assume that the target language L dscribed in the statement is context-free. Then, by the CF pumping lemma, L has some finite pumping length, p. Choose the string w = 0p 1p 2p, which has length p or greater and belongs to L. If w is split as uvxyz, with |vxy| ≤ p and vy ≠ ε, then vxy will only be long enough to contain at most two different symbol types, either 0's and 1's, or 1's and 2's (or just one symbol type). Thus, pumping v and y will necessarily lead to strings that do not belong to L: if 2's are omitted from vxy, then pumping will increase the number of 0's and/or 1's without balancing the increase with a greater number of 2's, while if 0's are omitted from vxy, then pumping will either produce more 1's than 0's, or (if vxy omits the 1's also) a greater number of 2's than 0's and 1's combined. This contradicts the pumping lemma. The contradiction shows that the assumption that L is CF must be incorrect.


  3. For each of the following languages, determine whether the given language is regular, context-free but not regular, or not context-free (choose only one of the preceding). Justify your answer in detail, aiming to convince the usual skeptical but reasonable jury.

    a) L = {w in {0,1,2}* | each 0 in w is immediately followed by a sequence of length 3 or more containing only 1's and/or 2's}. For example, the strings 1211212121, 1210112012121022201211, and 01112012121 belong to L, but 121212011, 01011212, and 01111011 do not.

    Solution

    This language is regular. It is decribed by the regular expression
    (1 U 2)*(0(1U2)(1U2)(1U2)(1U2)*)*.
    

    b) L = {w in {0,1}* | the length of w is a prime number}. For example, the strings 01011, 10, and 1110001 belong to L, but the strings 0101, 11110000, and 110010 do not.

    Solution

    This language L is not context-free. We show this by using the Context-Free Pumping Lemma. If L were context-free, it would have some finite pumping length, p. However, consider the specific string w = 0q, where q is a prime number not smaller than p. This string w belongs to L since its length is prime. By the Context-Free Pumping Lemma, it should be possible to split w as w = uvxyz, where vxy has length p or less, vy is nonempty, and the pumped strings uvixyiz belong to L for all non-negative powers i. Let a denote the length of vy, so that the length of uxz is q - a. Then, for any given i, the corresponding pumped string uvixyiz has length ai + (q - a) = a(i-1) + q. I claim that this length fails to be prime for some value of i. The reason for this is that the values a(i-1) + q are evenly spaced as i varies: the distance between the ith and the (i+1)st values is exactly a. On the other hand, there are pairs of consecutive primes that are arbitrarily far apart. Therefore, eventually (for sufficiently large i), the distance between consecutive pumped string lengths will not be large enough to bridge the gap between consecutive primes, leading to a pumped string that does not belong to L. Some details to back up the preceding intuitive sketch: pick i to be the specific value i = q+1. The corresponding pumped string has length aq + q = (a+1)q, which is a product of nontrivial factors (since a is nonzero) and hence is not prime, so that this pumped string does not belong to L. This contradicts the Pumping Lemma. We conclude that L is not context-free, as claimed.

    c) L = {w in {0,1,2}* | each maximal subsequence s of consecutive 0's in w is immediately followed by a sequence of only 1's and/or 2's of exactly the same length as s}. For example, the strings 121212121, 01000012120022, and 12100220000021211 belong to L, but the strings 00110, 12011, and 121000 do not.

    Solution

    This language L is context-free but not regular.

    To see that L is not regular, apply the Pumping Lemma. For a would-be pumping length p, consider the string w = 0p 1p. If w = xyz, with |xy| ≤ p and y nonempty, then y consists of 1 or more 0's. Hence, pumping increases the number of 0's without changing the number of 1's, leading to strings that are not in L. This contradicts the Pumping Lemma, implying that L is not regular.

    The fact that L is context-free may be shown by constructing a CFG that generates L:

    S -> SS | PA
    P -> 1P | 2P | ε
    A -> 0A1 | 0A2 | ε
    


  4. Follow the depth-first parsing procedure discussed in class to find a derivation in the context-free grammar below of the string "the red car is big". Describe each step in detail.
    	S -> NVD
    	N -> AM
    	M -> DL | L
    	L -> car | house
    	A -> the | a
    	D -> big | red
    	V -> is | looks
    

    Solution

    Here's a summary that shows the string q from the pseudocode during the derivation. At each stage, the first available rule for the leftmost variable in q is chosen.
    push (S, rules minus S -> NVD) onto stack
    S => NVD
    push (NVD, rules minus N-> AM) onto stack
    NVD => AMVD
    push (NVD, rules minus A-> the) onto stack
    AMVD => theMVD
    push (NVD, rules minus M-> DL) onto stack
    theMVD => theDLVD
    push (theDLVD, rules minus D-> big) onto stack
    theDLVD => the big LVD
    (prefix doesn't match input string - backtrack)
    pop (theDLVD, rules minus D-> big) 
    push (theDLVD, rules minus D-> red) onto stack
    theDLVD => the red LVD
    push (the red LVD, rules minus L-> car) onto stack
    the red LVD => the red car VD
    push (the red car VD, rules minus V-> is) onto stack
    the red car VD => the red car is D
    push (the red car is D, rules minus D-> big) onto stack
    the red car is D => the red car is big
    


  5. The following bottom-up parsing procedure was discussed in class.
    Inputs: CFG G, target string w[0...n-1].
    Outputs: true if w is in L(G), false otherwise.
    Procedure:
    
    allocate table[0...n, 0...n]
    set each cell value of table to an empty set
    for i=0 to n-1
    	table[i,i+1] = set of all variables A of G for which there is a rule
    	A -> w[i] that has the i-th symbol of w on the right-hand side
    for length=2 to n
    	for first=0 to n-length
    		for split=first+1 to first+length-1
    			for each rule A -> BC of G
    				if table[first,split] contains B and
    				table[split,first+length] contains C
    					add A to table[first,first+length]
    return truth value of statement "table[0,n] contains the start symbol of G"
    
    Consider the following CFG G.
    S -> AC | BD | a | b
    A -> a
    B -> b
    C -> SA
    D -> SB
    
    For each part below, follow this procedure step by step for the grammar G and the given target string w. Document each step in detail. Based on the results, can the given string be generated by G? Explain.

    a) w = abba (notice that this string is slightly different than in the original statement)

    Solution

    The table resulting from following the procedure is shown below. Details of the procedure will be given in class.
    	a	b	b	a
    
    a	S,A	D	empty	empty
    
    b		S,B	D	empty
    
    b			S,B	C
    
    a				S,A
    
    
    Since S does not appear in the top right corner of the table, the string abba cannot be generated by G.

    b) w = babab (again, this is not exactly the same string as in the statement)

    Solution

    	b	a	b	a	b
    
    b	S,B	C	S	C	S
    
    a		S,A	D	S	D
    
    b			S,B	C	S
    
    a				S,A	D
    
    b					S,B
    
    Since S appears in the top right corner of the table, the string babab can be generated by G.