Cosc 2P03 Fall 2011
Assignment#4
(Due date for assignment -->TBA)
LZW compression
For this assignment you will be implementing the LZW compressor and
de-compressor. To make this a realistic assignment, consider
compressing a text file to a stream of binary short integers (16
bits). The code book will need to be initialized with the
standard ASCII alphabet, this will give 127 possible initial
characters. To aid in the look-up of the strings we will use a hash
table where an entry to the table will consist of a record (object)
consisting of the string S
(String) and the code C
(short). This will allow efficient look-up of the character strings.
To cover the range of a short int
make the table 65537 which should be prime, open addressing
can be used (k==+1).
Design a hash function f(S)
to map the string S into the hash table. Once possible function can
be:
f(S) = [1*char(1) + 2*char(2) + 3*char(3) etc....] mod table_size.
Implement the compressor, reading from some text file character by
character. Note that you will need also treat end of line and
carriage returns as characters to be encoded. These special
characters will need to be escaped when added into the hash table.
For each S encoded, output C as a short to a binary file.
The decompressor will read shorts from the compressed file and build
the original text file. The code book will be the ASCII char set.
This code book will need to be 65536 in length to accommodate all
possible codes.
Show that your solution works by downloading some text files from
the internet, compressing them and decompressing them.
Submission Requirements:
- Cover Sheet completely filled out, available
from: "http://www.cosc.brocku.ca/coverpage"
Note: your assignment will not be marked unless one is submitted
with the assignment on the assignment due date.
- Commented and properly documented source code
listing.
- Listing of any input you used to test your program.
- Listing of your output. Be sure to adequately show that the
input, operations and output show the marker that your program
is working. Don't be afraid to tell the marking if something is
not working correctly.
- Source code is to be Java.
- Disk, CD containing source, input, output and executables.
- Electronic submission, run the script "submit2p03" from
sandcastle.
- Statement on coversheet with following information.
- Platform, e.g. Mac PP6100, PC, Commodor 64, my Java enabled
wrist watch.
- Compiler Version, e.g. Java 1.5, Java 1.6 e.g.
Good Luck