
Homework #5

New York University, Computer Science Department

Data Structures, Fall 2008
Eugene Weinstein


Homework #4 Review

• Huffman coding is a variable-length binary encoding for text
• We implemented Huffman's optimal code-finding algorithm (book, pp. 389-395)
  o Builds a tree representing the shortest possible code
• Input for HW#4: letters and their frequencies
  o A 20 E 24 ...
• Construct the Huffman tree
• Navigate the tree to find the code:
  o c: 0, a: 10, b: 11
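A minimal sketch of the HW#4 steps (build the tree, then walk it to read off codes), assuming `java.util.PriorityQueue` as a stand-in for the course's BinaryHeap and a hypothetical `HuffNode` class in place of HuffmanNode, whose real definitions are not shown here. The frequencies a:1, b:1, c:2 are made up; they yield the same code lengths as the example above (one bit for c, two bits for a and b), although which child ends up with 0 and which with 1 depends on how the heap breaks ties.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical stand-in for HuffmanNode: leaves hold a letter, internal nodes
// hold two children; weight is the total frequency of the subtree.
class HuffNode implements Comparable<HuffNode> {
    final char letter;
    final int weight;
    final HuffNode left, right;

    HuffNode(char letter, int weight) {
        this(letter, weight, null, null);
    }

    HuffNode(char letter, int weight, HuffNode left, HuffNode right) {
        this.letter = letter;
        this.weight = weight;
        this.left = left;
        this.right = right;
    }

    boolean isLeaf() { return left == null && right == null; }

    public int compareTo(HuffNode other) {
        return Integer.compare(weight, other.weight);
    }
}

class HuffmanSketch {
    // Huffman's algorithm: repeatedly merge the two lightest trees.
    static HuffNode buildTree(Map<Character, Integer> freq) {
        PriorityQueue<HuffNode> heap = new PriorityQueue<HuffNode>();
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            heap.add(new HuffNode(e.getKey(), e.getValue()));
        }
        while (heap.size() > 1) {
            HuffNode a = heap.poll();
            HuffNode b = heap.poll();
            heap.add(new HuffNode('\0', a.weight + b.weight, a, b));
        }
        return heap.poll();
    }

    // Walk the tree: going left appends '0', going right appends '1'.
    static void collectCodes(HuffNode node, String prefix, Map<Character, String> out) {
        if (node.isLeaf()) {
            out.put(node.letter, prefix);
            return;
        }
        collectCodes(node.left, prefix + "0", out);
        collectCodes(node.right, prefix + "1", out);
    }

    public static void main(String[] args) {
        Map<Character, Integer> freq = new HashMap<Character, Integer>();
        freq.put('a', 1);
        freq.put('b', 1);
        freq.put('c', 2);
        Map<Character, String> codes = new HashMap<Character, String>();
        collectCodes(buildTree(freq), "", codes);
        System.out.println(codes);
    }
}
```

With only one distinct letter the root is itself a leaf and its code comes out as the empty string, so a real implementation may want a special case there.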


Homework #5 Overview

• Given a document
  o Calculate letter frequencies
  o Construct a Huffman code
  o Encode the document
  o Calculate the memory savings of Huffman binary encoding vs 8-bit ASCII
  o Correctly decode the document
• We can use the Huffman code building algorithm from HW#4
  o So we will keep HuffmanTree and HuffmanNode
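One plausible way to measure the memory savings, assuming it means comparing total bits: 8 bits per character for ASCII versus the sum of frequency times code length for Huffman. The class and method names (`SavingsSketch`, `asciiBits`, `huffmanBits`) are hypothetical, and the arrays follow the ASCII-indexed `count[]`/`code[]` layout used in this assignment, with the small code from the HW#4 review as sample data.

```java
// Hypothetical helper comparing 8-bit ASCII size with Huffman-encoded size.
class SavingsSketch {
    // count[c] = occurrences of character c; code[c] = its Huffman bit string.
    static long huffmanBits(int[] count, String[] code) {
        long bits = 0;
        for (int c = 0; c < count.length; c++) {
            if (count[c] > 0) {
                bits += (long) count[c] * code[c].length();
            }
        }
        return bits;
    }

    // Every character costs 8 bits in 8-bit ASCII.
    static long asciiBits(int[] count) {
        long chars = 0;
        for (int n : count) {
            chars += n;
        }
        return 8 * chars;
    }

    public static void main(String[] args) {
        int[] count = new int[256];
        String[] code = new String[256];
        count['a'] = 1; code['a'] = "10";
        count['b'] = 1; code['b'] = "11";
        count['c'] = 2; code['c'] = "0";
        long ascii = asciiBits(count);          // 4 chars * 8 bits = 32
        long huff = huffmanBits(count, code);   // 1*2 + 1*2 + 2*1 = 6
        System.out.println(ascii + " bits vs " + huff + " bits, saving "
                + (100.0 * (ascii - huff) / ascii) + "%");
    }
}
```

For this toy input the sketch reports 32 bits versus 6 bits, an 81.25% saving.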


Organization

• The new code for this assignment should go into HuffmanConverter.java
  o The name of the file to encode is passed as a parameter on the command line
  o So if my file is foo.txt, I should be able to run
    java HuffmanConverter foo.txt
  o Then foo.txt shows up in args[0]
  o If you use an IDE, specify command-line options through the menus
• Test inputs and outputs are linked from the assignment page (2007 version)
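A sketch of picking up the file name in main(), using a hypothetical class `ArgsSketch` and helper `fileNameFrom` (in the assignment this logic would live in HuffmanConverter.main()). It prints a usage message instead of throwing ArrayIndexOutOfBoundsException when no argument is given.

```java
// Hypothetical helper class; in the assignment this belongs in HuffmanConverter.main().
class ArgsSketch {
    // Returns the first command-line argument, or null if none was supplied.
    static String fileNameFrom(String[] args) {
        return args.length > 0 ? args[0] : null;
    }

    public static void main(String[] args) {
        String fileName = fileNameFrom(args);
        if (fileName == null) {
            System.err.println("Usage: java HuffmanConverter <file>");
            System.exit(1);
        }
        // When run with foo.txt as its only argument, fileName is "foo.txt".
        System.out.println("File to encode: " + fileName);
    }
}
```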


HuffmanConverter Instance Vars

• String contents - stores the file to process
  o Lines are separated by '\n', the line break character
  o e.g., twoLines = line1 + '\n' + line2;
• HuffmanTree huffmanTree - output of HW4
• int count[] - frequencies in the input file
  o Indexed on ASCII value of characters, e.g., count[(int)'a'] is the frequency of 'a'
• String code[] - binary string per character
  o Also indexed on ASCII value, e.g., code[(int)'a'] might hold "10001"
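A minimal sketch of these instance variables in a hypothetical class `ConverterState`. The array size of 256 is an assumption (it covers every 8-bit character code, not just 7-bit ASCII), and the HuffmanTree field is omitted because its definition comes from HW4 and is not shown here.

```java
// Hypothetical class mirroring the instance variables above (HuffmanTree omitted).
class ConverterState {
    String contents = "";              // whole file, lines joined by '\n'
    int[] count = new int[256];        // count[(int) ch] = frequency of ch
    String[] code = new String[256];   // code[(int) ch] = bit string for ch (null if unused)

    public static void main(String[] args) {
        ConverterState s = new ConverterState();
        String line1 = "roses are red", line2 = "violets are blue";
        s.contents = line1 + '\n' + line2; // '\n' is ASCII 10
        s.count[(int) 'a'] = 2;            // 'a' is ASCII 97
        s.code[(int) 'a'] = "10001";
        System.out.println((int) 'a' + " " + s.count[97] + " " + s.code[97]); // prints 97 2 10001
    }
}
```

The explicit `(int)` cast is optional in Java (a `char` index is widened to `int` automatically), but it makes the ASCII indexing visible.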


To Implement

• readContents() - reads in a file and stores it in String contents
• recordFrequencies() - processes the file stored in contents and stores the frequencies in count[]
• frequenciesToTree() - uses HW4 code to produce the Huffman tree
• treeToCode() - a slight modification of HW4: traverses the Huffman tree and populates code[]
• encodeMessage() - uses code[] to encode
• decodeMessage() - uses the inverse of code[]


Implementation Notes

• readContents() can use Scanner
  o Read a line at a time and append it to contents, inserting '\n' to separate lines
• recordFrequencies(): iterate over contents one character at a time
• frequenciesToTree()
  o Very similar to the main() method of HW4
  o Create a BinaryHeap object
  o For every letter with a non-zero count, create a HuffmanNode object and insert it into the heap
  o Then run the Huffman algorithm
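A sketch of readContents() and recordFrequencies() as static methods of a hypothetical class `ReadSketch`, returning values instead of setting fields. It assumes every character code is below 256; a character outside that range would throw ArrayIndexOutOfBoundsException. Since Scanner.nextLine() drops the line terminator, the sketch re-inserts '\n' only between lines, as described above.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Hypothetical class; the assignment puts these methods in HuffmanConverter.
class ReadSketch {
    // Read one line at a time, putting back the '\n' between lines.
    static String readContents(Scanner in) {
        StringBuilder sb = new StringBuilder();
        while (in.hasNextLine()) {
            sb.append(in.nextLine());
            if (in.hasNextLine()) {
                sb.append('\n');
            }
        }
        return sb.toString();
    }

    // Count each character, indexed by its character code (assumed < 256).
    static int[] recordFrequencies(String contents) {
        int[] count = new int[256];
        for (int i = 0; i < contents.length(); i++) {
            count[contents.charAt(i)]++;
        }
        return count;
    }

    public static void main(String[] args) throws FileNotFoundException {
        Scanner in = new Scanner(new File(args[0])); // e.g. java ReadSketch foo.txt
        String contents = readContents(in);
        in.close();
        int[] count = recordFrequencies(contents);
        for (int c = 0; c < count.length; c++) {
            if (count[c] > 0) {
                System.out.println(c + " " + count[c]); // ASCII value, frequency
            }
        }
    }
}
```

Note that '\n' itself gets counted, which is one of the newline special cases the tips slide warns about.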


Implementation Notes, Cont'd

• treeToCode()
  o Similar to printCode() of HW4
  o Instead of printing the code, store it in code[]
• encodeMessage()
  o For each character of contents, look up its binary string in code[] and append it
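A sketch of treeToCode() and encodeMessage(), assuming a hypothetical minimal node class `CodeNode` (the real HuffmanNode from HW4 may differ). The tree is built by hand to match the HW#4 review example c: 0, a: 10, b: 11, so the encoding of "cab" can be checked by hand.

```java
// Hypothetical minimal tree node; the course's HuffmanNode may look different.
class CodeNode {
    char letter;
    CodeNode left, right;

    CodeNode(char letter) { this.letter = letter; }
    CodeNode(CodeNode left, CodeNode right) { this.left = left; this.right = right; }

    boolean isLeaf() { return left == null && right == null; }
}

class EncodeSketch {
    // Like printCode(), but stores each leaf's path in code[] instead of printing it.
    static void treeToCode(CodeNode node, String path, String[] code) {
        if (node.isLeaf()) {
            code[node.letter] = path;
            return;
        }
        treeToCode(node.left, path + "0", code);
        treeToCode(node.right, path + "1", code);
    }

    // Replace every character of contents by its bit string.
    static String encodeMessage(String contents, String[] code) {
        StringBuilder bits = new StringBuilder();
        for (int i = 0; i < contents.length(); i++) {
            bits.append(code[contents.charAt(i)]);
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        // Tree for the example code c: 0, a: 10, b: 11.
        CodeNode root = new CodeNode(new CodeNode('c'),
                new CodeNode(new CodeNode('a'), new CodeNode('b')));
        String[] code = new String[256];
        treeToCode(root, "", code);
        System.out.println(encodeMessage("cab", code)); // prints 01011
    }
}
```

A StringBuilder is used because repeated `+=` on a String copies the whole encoding each time, which gets slow for large files.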


Implementation Notes, Cont'd

• decodeMessage()
  o Need to implement the inverse mapping of code[]: binary strings to characters
  o Several possible implementations:
    - Traverse the Huffman tree as you read the binary string, outputting a character when you reach a leaf
    - Build a HashMap mapping strings to ASCII values of characters
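A sketch of the first option, tree traversal, using a hypothetical node class `DecNode` and the same hand-built example tree (c: 0, a: 10, b: 11). It assumes the tree has at least two leaves; if the root itself is a leaf, following a child pointer would hit null.

```java
// Hypothetical minimal tree node for this sketch.
class DecNode {
    char letter;
    DecNode left, right;

    DecNode(char letter) { this.letter = letter; }
    DecNode(DecNode left, DecNode right) { this.left = left; this.right = right; }

    boolean isLeaf() { return left == null && right == null; }
}

class DecodeSketch {
    // '0' moves left, '1' moves right; at a leaf, emit its letter and restart at the root.
    static String decodeMessage(String bits, DecNode root) {
        StringBuilder out = new StringBuilder();
        DecNode node = root;
        for (int i = 0; i < bits.length(); i++) {
            node = (bits.charAt(i) == '0') ? node.left : node.right;
            if (node.isLeaf()) {
                out.append(node.letter);
                node = root;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        DecNode root = new DecNode(new DecNode('c'),
                new DecNode(new DecNode('a'), new DecNode('b')));
        System.out.println(decodeMessage("01011", root)); // prints cab
    }
}
```

Because Huffman codes are prefix-free, reaching a leaf always marks the end of exactly one codeword, so no lookahead is needed.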


HashMap

• An array maps integers to Objects
  o e.g., String args[]: args[i] returns the i-th String
• A HashMap maps Objects to Objects
• Access with put() and get(), e.g.,
  o import java.util.HashMap; // at the top of the file
  o HashMap<String, Integer> ids = new HashMap<String, Integer>();
  o ids.put("Alice", 123456789);
  o ids.put("Ben", 321654987);
  o int id = ids.get("Alice"); // id gets 123456789
• For decode, map bit Strings to characters
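A sketch of the second option: invert code[] into a HashMap and decode by growing a prefix one bit at a time. The class and method names are hypothetical, and the map stores the character itself (as a Character) rather than its int ASCII value; either works. Because Huffman codes are prefix-free, the first prefix found in the map is always a complete codeword.

```java
import java.util.HashMap;

// Hypothetical class illustrating HashMap-based decoding.
class MapDecodeSketch {
    // Invert code[]: map each bit string back to its character.
    static HashMap<String, Character> invert(String[] code) {
        HashMap<String, Character> decode = new HashMap<String, Character>();
        for (int c = 0; c < code.length; c++) {
            if (code[c] != null) {
                decode.put(code[c], (char) c);
            }
        }
        return decode;
    }

    // Append bits to a prefix until it matches a codeword, then emit and reset.
    static String decodeMessage(String bits, HashMap<String, Character> decode) {
        StringBuilder out = new StringBuilder();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < bits.length(); i++) {
            current.append(bits.charAt(i));
            Character c = decode.get(current.toString());
            if (c != null) {
                out.append(c.charValue());
                current.setLength(0);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] code = new String[256];
        code['c'] = "0";   // example code from the HW#4 review
        code['a'] = "10";
        code['b'] = "11";
        System.out.println(decodeMessage("01011", invert(code))); // prints cab
    }
}
```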


Homework #5 Tips

• Keep checking intermediate results
• Make use of the sample outputs linked from the assignment page
• Print out intermediate results!
• You might need special cases for newline ('\n')
• Your encoding might differ from the examples
  o It depends on the BinaryHeap implementation
  o Same-frequency items are returned in arbitrary order (e.g., in love_poem_58, 'N', '-', '.', 'W', and 'p' all have frequency one)
• However, the total length of your Huffman encoding must match!
  o Huffman's algorithm is guaranteed to produce the shortest-length prefix code
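A small experiment illustrating why codes may differ while the total length does not. It assumes `java.util.PriorityQueue` with a custom comparator as a stand-in for BinaryHeap, a hypothetical `seq` field that records creation order, and made-up frequencies a:1, b:1, c:2, d:2. Breaking ties oldest-first gives every letter a 2-bit code; breaking them newest-first gives c a 1-bit code and a and b 3-bit codes. Both trees encode the input in 12 bits.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class TieBreakSketch {
    // Hypothetical node; seq records creation order so ties can be broken two ways.
    static class Node {
        final char letter;
        final int weight;
        final int seq;
        final Node left, right;

        Node(char letter, int weight, int seq, Node left, Node right) {
            this.letter = letter;
            this.weight = weight;
            this.seq = seq;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }
    }

    // Huffman's algorithm; newerFirst decides which equal-weight node comes out first.
    static Node build(char[] letters, int[] weights, boolean newerFirst) {
        Comparator<Node> order = (x, y) -> {
            if (x.weight != y.weight) return Integer.compare(x.weight, y.weight);
            return newerFirst ? Integer.compare(y.seq, x.seq) : Integer.compare(x.seq, y.seq);
        };
        PriorityQueue<Node> heap = new PriorityQueue<Node>(order);
        int seq = 0;
        for (int i = 0; i < letters.length; i++) {
            heap.add(new Node(letters[i], weights[i], seq++, null, null));
        }
        while (heap.size() > 1) {
            Node a = heap.poll();
            Node b = heap.poll();
            heap.add(new Node('\0', a.weight + b.weight, seq++, a, b));
        }
        return heap.poll();
    }

    // Total encoded length in bits: each leaf contributes weight * depth.
    static int totalBits(Node node, int depth) {
        if (node.isLeaf()) return node.weight * depth;
        return totalBits(node.left, depth + 1) + totalBits(node.right, depth + 1);
    }

    static void collectCodes(Node node, String path, Map<Character, String> out) {
        if (node.isLeaf()) {
            out.put(node.letter, path);
            return;
        }
        collectCodes(node.left, path + "0", out);
        collectCodes(node.right, path + "1", out);
    }

    public static void main(String[] args) {
        char[] letters = {'a', 'b', 'c', 'd'};
        int[] weights = {1, 1, 2, 2};
        for (boolean newerFirst : new boolean[] {false, true}) {
            Node root = build(letters, weights, newerFirst);
            Map<Character, String> codes = new HashMap<Character, String>();
            collectCodes(root, "", codes);
            System.out.println(codes + " total bits: " + totalBits(root, 0));
        }
    }
}
```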