Problem Set 8: Puff
Assigned: Wednesday November 16, 2016
Due: Friday December 2, 2016, midnight
Points: 10
In the previous problem set you teamed up with one other person to
design and develop a program Huff.java that performed
Huffman encoding of text files. If you didn't get that program
working, you can download one right
here.
Huffman coding is a lossless compression algorithm that
achieves a very respectable rate of compression. In this problem
set you are to design and develop the inverse program Puff.
Your team's pair of programs should have the property that for
every text file F.txt, feeding F.txt as a command
line argument to Huff should produce a compressed
file F.zip. Then feeding F.zip as a command line
argument to Puff should produce a file that is
indistinguishable from the original F.txt.
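One way to check the "indistinguishable" property is to compare the original and the decoded file byte for byte. The following is only a minimal sketch: the class name RoundTripCheck and the method sameBytes are hypothetical helpers, not part of the assignment, and the two file names are taken from the command line because the handout does not say what Puff names its output.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for testing a Huff/Puff pair; not part of the assignment.
final class RoundTripCheck {

    /** Returns true when the two files contain exactly the same bytes. */
    static boolean sameBytes(Path a, Path b) throws IOException {
        // Reads both files fully into memory, which is fine for small text files.
        return Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java RoundTripCheck original.txt decoded.txt");
            System.exit(1);
        }
        boolean same = sameBytes(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(same ? "identical" : "DIFFERENT");
    }
}
```

Run it on F.txt and on the file that Puff produced from F.zip. Any difference, including a missing final newline, means the round trip is not yet lossless.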
Refer to the statement of problem set 7
for a description of the data structures required for Huffman
coding.
Puff: the Huffman Decoding Algorithm
The Huffman algorithm has a coding part and a decoding part. The basic
ingredients needed for decoding have all been described in the
statement of problem set 7. The algorithm works as follows.
- Open the zip file provided as a command-line argument. Confirm
that the file was produced by the associated Huff program
by reading two bytes and confirming that they have the integer
value 0x0BC0. If not, print an error message and exit.
- Read one 4-byte integer from the zip file. For the purposes of
this discussion, let's call it N. The integer N specifies the size
of the character frequency table in the zip file. Each entry has a
1-byte character code followed by a 4-byte integer representing
the frequency of the character in the input text.
- Create a new symbol table. For each of the N entries in the table
in the zip file, read character c and frequency f. Enter c and f in
the symbol table.
- Use the symbol table to construct the Huffman coding tree. (See the
description of the Huff algorithm in the statement of problem set 7.)
For the purposes of this discussion let's call it hct.
- Open the output text file.
- For each coded letter, do the following:
  - Set temporary variable t to point to the root of the
    Huffman coding tree hct.
  - Until t is pointing to a leaf node, do the following.
    - Read one bit b from the zip file.
    - Advance t to either the left or the right child
      depending on b and the convention that you
      adopted. (I.e., left = 1, right = 0 or vice-versa).
  - Write the character found at the leaf node to the output file.
- Close the output file.
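The steps above can be sketched in Java roughly as follows. This is an illustration under several assumptions, not the required design: the names PuffSketch, Node, BitReader, buildTree and puff are hypothetical; multi-byte integers are assumed to be big-endian (as java.io.DataInputStream reads them); bits are assumed to be packed most significant bit first; a 0 bit is assumed to mean "go left"; and the number of coded letters is taken to be the sum of the frequencies, which is the weight of the root. A plain int array stands in for the symbol table. The tie-breaking rule in buildTree is also an assumption. Whatever rule and conventions your Huff uses, Puff must mirror them exactly or the two trees will differ.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.PriorityQueue;

// Hypothetical sketch of the decoding steps; all names are illustrative.
final class PuffSketch {
    static final int MAGIC = 0x0BC0; // two-byte signature written by Huff

    /** Tree node: leaves carry a character code, internal nodes only a weight. */
    static final class Node {
        final int ch, weight, order;
        final Node left, right;
        Node(int ch, int weight, int order, Node left, Node right) {
            this.ch = ch; this.weight = weight; this.order = order;
            this.left = left; this.right = right;
        }
        boolean isLeaf() { return left == null && right == null; }
    }

    /** Reads single bits; ASSUMES the most significant bit of each byte comes first. */
    static final class BitReader {
        private final InputStream in;
        private int current, remaining;
        BitReader(InputStream in) { this.in = in; }
        int readBit() throws IOException {
            if (remaining == 0) {
                current = in.read();
                if (current < 0) throw new EOFException("ran out of coded bits");
                remaining = 8;
            }
            remaining--;
            return (current >> remaining) & 1;
        }
    }

    /** Builds the tree from frequencies; ties are broken by creation order (an assumption). */
    static Node buildTree(int[] freq) {
        PriorityQueue<Node> pq = new PriorityQueue<>((a, b) ->
                a.weight != b.weight ? Integer.compare(a.weight, b.weight)
                                     : Integer.compare(a.order, b.order));
        int order = 0;
        for (int c = 0; c < freq.length; c++)
            if (freq[c] > 0) pq.add(new Node(c, freq[c], order++, null, null));
        while (pq.size() > 1) {
            Node a = pq.poll(), b = pq.poll(); // the two lightest subtrees
            pq.add(new Node(-1, a.weight + b.weight, order++, a, b));
        }
        return pq.poll(); // null when the table is empty
    }

    /** Decodes a Huff-format stream from zip and writes the text to out. */
    static void puff(InputStream zip, OutputStream out) throws IOException {
        DataInputStream in = new DataInputStream(zip);
        if (in.readUnsignedShort() != MAGIC)
            throw new IOException("not a Huff file: bad signature");
        int n = in.readInt();               // number of table entries
        int[] freq = new int[256];          // stand-in for the symbol table
        for (int i = 0; i < n; i++) {
            int c = in.readUnsignedByte();  // 1-byte character code
            freq[c] = in.readInt();         // 4-byte frequency
        }
        Node hct = buildTree(freq);
        if (hct == null) return;            // empty input text
        BitReader bits = new BitReader(in);
        for (int written = 0; written < hct.weight; written++) {
            Node t = hct;
            while (!t.isLeaf())             // ASSUMED convention: 0 = left, 1 = right
                t = bits.readBit() == 0 ? t.left : t.right;
            out.write(t.ch);
        }
        out.flush();
    }
}
```

The sketch throws an IOException on a bad signature. A command-line Puff would catch it, print an error message and exit, as the first step requires. Counting letters by the root weight also means that padding bits at the end of the last byte are never read.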
You can test your Puff program to see if it is working
correctly by trying to decompress this file.