CS 1102 Computer Science II
Fall 2016

Computer Science Department
The College of Arts and Sciences
Boston College

Problem Set 8: Puff

Assigned: Wednesday November 16, 2016
Due: Friday December 2, 2016, midnight
Points: 10

In the previous problem set you teamed up with one other person to design and develop a program Huff.java that performed Huffman encoding of text files. If you didn't get that program working, you can download one right here.

Huffman coding is a lossless compression algorithm that achieves a very respectable rate of compression. In this problem set you are to design and develop the inverse program Puff. Your team's pair of programs should have the property that for every text file F.txt, feeding F.txt as a command line argument to Huff should produce a compressed file F.zip. Then feeding F.zip as a command line argument to Puff should produce a file that is indistinguishable from the original F.txt.
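
One way to check the round-trip property is to compare the original file and the file produced by Puff byte for byte. The following is a minimal sketch, not part of the assignment; the class name RoundTripCheck and the method name sameBytes are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper (not part of the assignment): checks whether
// the file restored by Puff matches the original text file.
final class RoundTripCheck {

    // Returns true when the two files hold exactly the same bytes.
    // Reads both files fully into memory, which is fine for small text files.
    static boolean sameBytes(Path original, Path restored) throws IOException {
        return Arrays.equals(Files.readAllBytes(original),
                             Files.readAllBytes(restored));
    }
}
```

After running Huff and then Puff, a call such as RoundTripCheck.sameBytes(Paths.get("F.txt"), Paths.get("restored.txt")) should return true; the name restored.txt is only a placeholder for whatever output file your Puff writes.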

Refer to the statement of problem set 7 for a description of the data structures required for Huffman coding.

Puff: the Huffman Decoding Algorithm

The Huffman algorithm has a coding part and a decoding part. The basic ingredients needed for decoding have all been described in the statement of problem set 7. The algorithm works as follows.
  1. Open the zip file provided as a command-line argument. Confirm that the file was produced by the associated Huff program by reading two bytes and confirming that they have the integer value 0x0BC0. If not, print an error message and exit.

  2. Read one 4-byte integer from the zip file. For the purposes of this discussion, let's call it N. The integer N specifies the size of the character frequency table in the zip file. Each entry has a 1-byte character code followed by a 4-byte integer representing the frequency of the character in the input text.

  3. Create a new symbol table. For each of the N entries in the table in the zip file, read character c and frequency f. Enter c and f in the symbol table.

  4. Use the symbol table to construct the Huffman coding tree. (See the description of the Huff algorithm in the statement of problem set 7.) For the purposes of this discussion let's call it hct.

  5. Open the output text file.

  6. For each coded letter (the frequencies in the table sum to the number of letters in the original text), do the following:
    1. Set temporary variable t to point to the root of the Huffman coding tree hct.

    2. Until t is pointing to a leaf node, do the following.
      1. Read one bit b from the zip file.

      2. Advance t to either the left or the right child depending on b and the convention that you adopted (e.g., left = 1, right = 0, or vice versa).

    3. Write the character found at the leaf node to the output file.

  7. Close the output file.
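
The steps above can be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the reference solution: the names PuffSketch, Node, and decode are made up; multi-byte integers are assumed to be stored big-endian (as java.io.DataInputStream reads them); bits are assumed to be consumed most-significant first within each byte; and the convention left = 0, right = 1 is chosen arbitrarily. The sketch also assumes that Huff built its tree with the same priority-queue procedure, since ties between equal weights must be broken identically on both sides for the two trees to match. The number of letters to decode is taken from the weight of the root, which equals the sum of the frequencies.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical sketch of Puff; names and file-format details beyond
// the problem statement are assumptions (see the paragraph above).
final class PuffSketch {
    static final int MAGIC = 0x0BC0;

    // Node of the Huffman coding tree; leaves carry a character code.
    static final class Node {
        final int symbol;        // 0..255 for a leaf, -1 for an interior node
        final long weight;       // frequency (leaf) or sum of children (interior)
        final Node left, right;

        Node(int symbol, long weight, Node left, Node right) {
            this.symbol = symbol;
            this.weight = weight;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }
    }

    static void decode(InputStream raw, OutputStream out) throws IOException {
        DataInputStream in = new DataInputStream(raw);

        // Step 1: the first two bytes must have the value 0x0BC0.
        if (in.readUnsignedShort() != MAGIC) {
            throw new IOException("not produced by Huff: bad signature");
        }

        // Steps 2-3: read N, then N (character, frequency) entries.
        // For brevity the entries go straight into the priority queue
        // instead of a separate symbol table.
        int n = in.readInt();
        PriorityQueue<Node> pq =
            new PriorityQueue<>(Comparator.comparingLong((Node x) -> x.weight));
        for (int i = 0; i < n; i++) {
            int c = in.readUnsignedByte();
            int f = in.readInt();
            pq.add(new Node(c, f, null, null));
        }
        if (pq.isEmpty()) return;          // empty table: nothing to decode

        // Step 4: repeatedly merge the two lightest trees.
        while (pq.size() > 1) {
            Node a = pq.poll();
            Node b = pq.poll();
            pq.add(new Node(-1, a.weight + b.weight, a, b));
        }
        Node hct = pq.poll();

        // Step 6: decode hct.weight letters, one root-to-leaf walk each.
        int buffer = 0, bitsLeft = 0;
        for (long k = 0; k < hct.weight; k++) {
            Node t = hct;
            while (!t.isLeaf()) {
                if (bitsLeft == 0) {       // refill the one-byte bit buffer
                    buffer = in.read();
                    if (buffer < 0) throw new EOFException("zip file ended mid-code");
                    bitsLeft = 8;
                }
                bitsLeft--;
                int bit = (buffer >> bitsLeft) & 1;
                t = (bit == 0) ? t.left : t.right;   // assumed: left = 0, right = 1
            }
            out.write(t.symbol);
        }
        out.flush();
    }
}
```

Opening and closing the files (steps 1, 5, and 7) is left to the caller, for example with buffered file streams in a try-with-resources statement. Counting letters by the root weight, rather than reading until end of file, keeps any padding bits in the last byte from being decoded as extra characters.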

You can test your Puff program to see if it is working correctly by trying to decompress this file.
Created on 11-14-2016 14:03.