Lecture 21: Indexed Files
![Page 1: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/1.jpg)
LECTURE 21: INDEXED FILES
CSC 213 – Large Scale Programming
![Page 2: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/2.jpg)
Today’s Goals
- Look at how Dictionarys used in real world
  - Where this would occur & why they are used there
  - In real world setting, what problems can/do occur
- Indexed file usage presented and shown
  - How & why we split index & data files
  - Formatting of each file and how they get used
- Describe what problems solved using indexed files
  - Java coding techniques that simplify using these files
  - Idea needed when using multiple indexes shown
![Page 3: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/3.jpg)
Dictionaries in Real World
- Often need large database on many machines
  - Split search terms across machines
  - Updating & searching work split between machines
  - Database way too large for any single machine
- If you think about it, this is incredibly common
  - Where?
![Page 4: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/4.jpg)
Split Dictionaries
![Page 5: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/5.jpg)
Split Dictionaries
![Page 6: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/6.jpg)
Splitting Keys From Values
- In real world, we often have many indices
  - Simple units measure where we can find values
  - Values could be searched for in multiple ways
![Page 8: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/8.jpg)
Index & Data Files
- Split information into two (or more) files
  - Data file uses fixed-size records to store data
  - Index files contain search terms & data locations
- Fixed-size records usually used in data file
  - Each record will use exactly that much space
  - Extra space wasted if the value is smaller
  - But limits data size, cannot get more space
  - Makes it far easier to reuse space & rebuild index
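The trade-off of fixed-size records can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the 20-byte record size, the zero-byte padding, and the names `FixedRecords`, `write`, and `read` are hypothetical and do not come from the lecture.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class FixedRecords {
    // Hypothetical size: every record occupies exactly this many bytes.
    static final int RECORD_SIZE = 20;

    // Record n always starts at n * RECORD_SIZE, so it can be found without scanning.
    static long positionOf(int recordNumber) {
        return (long) recordNumber * RECORD_SIZE;
    }

    // Stores a value in record n; unused bytes are zero padding (the wasted space).
    static void write(RandomAccessFile file, int recordNumber, String value) throws IOException {
        byte[] raw = value.getBytes(StandardCharsets.US_ASCII);
        if (raw.length > RECORD_SIZE) {
            // The size limit: a record cannot get more space than it was given.
            throw new IllegalArgumentException("value does not fit in a fixed-size record");
        }
        file.seek(positionOf(recordNumber));
        file.write(Arrays.copyOf(raw, RECORD_SIZE));
    }

    // Reads record n back and strips the zero padding.
    static String read(RandomAccessFile file, int recordNumber) throws IOException {
        byte[] padded = new byte[RECORD_SIZE];
        file.seek(positionOf(recordNumber));
        file.readFully(padded);
        int length = 0;
        while (length < RECORD_SIZE && padded[length] != 0) {
            length++;
        }
        return new String(padded, 0, length, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("records", ".dat");
        tmp.deleteOnExit();
        try (RandomAccessFile file = new RandomAccessFile(tmp, "rw")) {
            write(file, 0, "IBM");
            write(file, 1, "Ford");
            write(file, 0, "AT & T"); // reusing slot 0 never moves record 1
            System.out.println(read(file, 0) + " / " + read(file, 1));
        }
    }
}
```

Because every slot has the same size, overwriting slot 0 leaves every other record where it was, which is what makes reusing space cheap.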
![Page 9: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/9.jpg)
Index File Format
- No standard format – depends on type of data
  - Often variable sized, but this is not a specific requirement
  - Each entry in index file begins with exact search term
  - Followed by position containing matching data
  - As a result, often find indexes smushed together
- Can read indexes at start of program execution
  - Reasonably assumes index file smaller than data file
  - Changes written immediately, however
- When program starts, do NOT read data file
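Since there is no standard index format, the sketch below has to assume one: each entry is the search term written with `writeUTF`, followed directly by the position written with `writeLong`, with entries packed back to back. The class name `IndexFile` and its methods are hypothetical. It shows the whole index being read into memory at startup without opening the data file.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

class IndexFile {
    // Assumed entry layout: search term (writeUTF), then the record's position (writeLong).
    static void save(File indexFile, Map<String, Long> index) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(indexFile)))) {
            for (Map.Entry<String, Long> entry : index.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeLong(entry.getValue());
            }
        }
    }

    // Reads every entry into memory at startup; the data file is not opened here.
    static TreeMap<String, Long> load(File indexFile) throws IOException {
        TreeMap<String, Long> index = new TreeMap<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(indexFile)))) {
            while (true) {
                String term;
                try {
                    term = in.readUTF();
                } catch (EOFException endOfIndex) {
                    break; // no more entries
                }
                index.put(term, in.readLong());
            }
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("ticker", ".idx");
        tmp.deleteOnExit();
        Map<String, Long> index = new TreeMap<>();
        index.put("F", 224L);
        index.put("IBM", 0L);
        index.put("T", 112L);
        save(tmp, index);
        System.out.println(load(tmp));
    }
}
```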
![Page 10: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/10.jpg)
Never Read Entire Data File
![Page 11: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/11.jpg)
Indexed Files
- Enables splitting search terms across computers
  - Alphabetical split searches faster on many servers

[Figure: servers labeled A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z]
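One way to picture the alphabetical split is a lookup from a term's first letter to the range that owns it. The sketch below uses the eight ranges from the slide; the class name `AlphabeticalSplit`, the method `rangeFor`, and the choice of a `TreeMap` are assumptions for illustration, not the lecture's design.

```java
import java.util.Map;
import java.util.TreeMap;

class AlphabeticalSplit {
    // First letter of each range on the slide, mapped to that range's label.
    private static final TreeMap<Character, String> RANGES = new TreeMap<>();
    static {
        RANGES.put('A', "A-C");
        RANGES.put('D', "D-E");
        RANGES.put('F', "F-H");
        RANGES.put('I', "I-P");
        RANGES.put('Q', "Q-R");
        RANGES.put('S', "S-T");
        RANGES.put('U', "U-X");
        RANGES.put('Y', "Y-Z");
    }

    // Picks the range (and so the server) responsible for a search term.
    static String rangeFor(String term) {
        if (term.isEmpty()) {
            throw new IllegalArgumentException("term must not be empty");
        }
        char first = Character.toUpperCase(term.charAt(0));
        if (first < 'A' || first > 'Z') {
            throw new IllegalArgumentException("term must start with a letter A-Z");
        }
        // floorEntry finds the closest range start at or before this letter.
        Map.Entry<Character, String> owner = RANGES.floorEntry(first);
        return owner.getValue();
    }

    public static void main(String[] args) {
        System.out.println("IBM is handled by " + rangeFor("IBM"));
        System.out.println("Ford is handled by " + rangeFor("Ford"));
    }
}
```

Each server then only needs the index entries for its own range, so a search touches one machine instead of all of them.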
![Page 12: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/12.jpg)
Indexed Files
- Enables splitting search terms across computers
  - Create indexes for different types of searching

[Figure: indexes labeled Song name, Song length]
![Page 13: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/13.jpg)
How Does This Work?
- Using index files simplified using positions
  - Look in index structure to find position of data in file
  - With this position can then seek to specific record
  - Create instance & initialize by reading data from file
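The three steps above can be sketched as follows. The record layout (a ticker written with `writeUTF`, then an `int` price), the 112-byte record size (chosen to match the positions 0, 112, 224 shown in the slides), and the names `IndexedLookup`, `Quote`, `find`, and `append` are all assumptions made for this example.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;
import java.util.TreeMap;

class IndexedLookup {
    static final int RECORD_SIZE = 112; // assumed record size

    // Hypothetical in-memory instance built from one record.
    static class Quote {
        final String ticker;
        final int price;

        Quote(String ticker, int price) {
            this.ticker = ticker;
            this.price = price;
        }
    }

    // Look up the position in the index, seek to the record, then build an instance.
    static Quote find(Map<String, Long> index, RandomAccessFile data, String term) throws IOException {
        Long position = index.get(term);
        if (position == null) {
            return null; // term is not indexed, so the data file is never touched
        }
        data.seek(position);
        String ticker = data.readUTF();
        int price = data.readInt();
        return new Quote(ticker, price);
    }

    // Demo helper: appends one fixed-size record and indexes it under the given term.
    static void append(Map<String, Long> index, RandomAccessFile data,
                       String term, String ticker, int price) throws IOException {
        long position = data.length();
        data.setLength(position + RECORD_SIZE);
        data.seek(position);
        data.writeUTF(ticker);
        data.writeInt(price);
        index.put(term, position);
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("stocks", ".dat");
        tmp.deleteOnExit();
        Map<String, Long> byName = new TreeMap<>();
        try (RandomAccessFile data = new RandomAccessFile(tmp, "rw")) {
            append(byName, data, "International Business Machines", "IBM", 106);
            append(byName, data, "American Telephone & Telegraph", "T", 23);
            Quote quote = find(byName, data, "American Telephone & Telegraph");
            System.out.println(quote.ticker + " " + quote.price);
        }
    }
}
```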
![Page 14: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/14.jpg)
Starting with Indexed Files
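[Figure: two indexes pointing at positions in the data file]
- Name index: American Telephone & Telegraph 112; International Business Machines 0; Ford Motorcars, Inc. 224
- Ticker index: F 224; IBM 0; T 112
- Data file records (name, price, ticker): IBM, 106, IBM; AT & T, 23, T; Ford, 2, F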
![Page 15: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/15.jpg)
Where Was "Searching" Used?
- Indexed files used in Maps and Dictionarys
  - Read data into searchable object after opening file
  - For each record, Entry uses indexed data as its key
- Single data file has multiple indexes to search it
  - Not a problem, each index has own Collection
  - Cannot have multiple instances for each data item
  - Cannot have single instance for each data item
- Then how can we construct each Entry's value?
![Page 16: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/16.jpg)
Proxy Pattern For The Win!
![Page 17: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/17.jpg)
Proxy Pattern For The Win!
- Create proxy instances to use as Entry's value
  - Proxy pretends it has data by defining getters & setters
  - Data's position & file are the only fields these objects have
  - Whenever a method is called, it looks up & returns data
- Other classes will think proxy has fields declared
  - Simplifies using class & ensures up-to-date data used
  - But little memory needed, since data resides on disk!
![Page 18: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/18.jpg)
Starting with Indexed Files
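[Figure: two indexes pointing at positions in the data file]
- Name index: American Telephone & Telegraph 112; International Business Machines 0; Ford Motorcars, Inc. 224
- Ticker index: F 224; IBM 0; T 112
- Data file records (name, price, ticker): IBM, 106, IBM; AT & T, 23, T; Ford, 12, F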
![Page 19: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/19.jpg)
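Coding

    import java.io.RandomAccessFile;

    public class Stock {
        // Each field has a fixed offset within the record and a fixed max. size (in bytes).
        private static final int NAME_OFF = 0;
        private static final int NAME_SZ = 50;
        private static final int PRC_OFF = NAME_OFF + NAME_SZ;
        private static final int PRC_SZ = 4;
        private static final int TICK_OFF = PRC_OFF + PRC_SZ;
        private static final int TICK_SZ = 6;
        // Total size of one record in the data file.
        private static final int SIZE = TICK_OFF + TICK_SZ;

        // The proxy's only fields: where its record starts & which file holds it.
        private long position;
        private RandomAccessFile theFile;

        public Stock(long pos, RandomAccessFile file) {
            position = pos;
            theFile = file;
        }
        // Getters & setters continue on a later slide.
    }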
![Page 20: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/20.jpg)
Coding
Callouts on the Stock code: each _SZ constant is the fixed max. size of its field; SIZE is the fixed size of a record in the data file
![Page 21: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/21.jpg)
Coding
Callout on the Stock code: each _OFF constant is the offset in the record to the start of its field
![Page 22: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/22.jpg)
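Coding

    import java.io.IOException;

    public class Stock { // Continues from last time

        // Seek to this record's price field, then read it straight from disk.
        public int getStockPrice() throws IOException {
            theFile.seek(position + PRC_OFF);
            return theFile.readInt();
        }

        public void setStockPrice(int price) throws IOException {
            theFile.seek(position + PRC_OFF);
            theFile.writeInt(price);
        }

        // writeUTF stores a 2-byte length before the text; both must fit in TICK_SZ bytes.
        public void setTickerSymbol(String sym) throws IOException {
            theFile.seek(position + TICK_OFF);
            theFile.writeUTF(sym);
        }
        // More getters & setters from here…
    }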
![Page 23: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/23.jpg)
Visualizing Indexed Files
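[Figure: two indexes pointing at positions in the data file]
- Name index: American Telephone & Telegraph 112; International Business Machines 0; Ford Motorcars, Inc. 224
- Ticker index: F 224; IBM 0; T 112
- Data file records (name, price, ticker): IBM, 106, IBM; AT & T, 23, T; Ford, 12, F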
![Page 24: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/24.jpg)
How Do We Add Data?
- Adding new records takes only a few steps
  - Add space for record with setLength on data file
  - Update index structure(s) to include new record
  - Records in data file updated at each change
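The steps for adding a record can be sketched as below. The layout is an assumption: 112-byte records (matching the positions in the slides' figures) with the price stored in the first four bytes. The names `AddRecord` and `add` are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;
import java.util.TreeMap;

class AddRecord {
    static final int RECORD_SIZE = 112; // assumed record size
    static final int PRICE_OFFSET = 0;  // assumed: price is the first field

    static long add(RandomAccessFile data, Map<String, Long> tickerIndex,
                    String ticker, int price) throws IOException {
        // Step 1: the new record goes at the current end of the file.
        long position = data.length();
        data.setLength(position + RECORD_SIZE);
        // Step 2: update the in-memory index so the record can be found.
        tickerIndex.put(ticker, position);
        // Step 3: write the field right away. The RandomAccessFile Javadoc leaves
        // the contents of the extended region undefined, so do not rely on zeros.
        data.seek(position + PRICE_OFFSET);
        data.writeInt(price);
        return position;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("stocks", ".dat");
        tmp.deleteOnExit();
        Map<String, Long> byTicker = new TreeMap<>();
        try (RandomAccessFile data = new RandomAccessFile(tmp, "rw")) {
            add(data, byTicker, "IBM", 106);
            add(data, byTicker, "T", 23);
            add(data, byTicker, "F", 12);
            long position = add(data, byTicker, "C", -2);
            System.out.println("C stored at " + position + ", index: " + byTicker);
        }
    }
}
```

Only the in-memory index changes here; when the index file itself is rewritten is a separate decision.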
![Page 25: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/25.jpg)
Adding New Data To The Files
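[Figure: index entries added for Citibank; new, still empty record in the data file]
- Ticker index: C 336; F 224; IBM 0; T 112
- Name index: American Telephone & Telegraph 112; Citibank 336; International Business Machines 0; Ford Motorcars, Inc. 224
- Data file records (name, price, ticker): IBM, 106, IBM; AT & T, 23, T; Ford, 12, F; new empty record: 0, Ø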
![Page 26: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/26.jpg)
Adding New Data To The Files
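[Figure: new record filled in with Citibank's data]
- Ticker index: C 336; F 224; IBM 0; T 112
- Name index: American Telephone & Telegraph 112; Citibank 336; International Business Machines 0; Ford Motorcars, Inc. 224
- Data file records (name, price, ticker): IBM, 106, IBM; AT & T, 23, T; Ford, 12, F; Citibank, -2, C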
![Page 27: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/27.jpg)
How Does This Work?
- Removing records even easier
  - To prevent using record, remove items from indexes
  - Do NOT update index file(s) until program completes
  - Use impossible magic numbers for record in data file
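Removal can be sketched the same way. The magic number here (`Integer.MIN_VALUE` in the price field), the position of the price at the start of the record, and the names `RemoveRecord`, `remove`, and `isRemoved` are assumptions for illustration; the lecture does not say which value it uses.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RemoveRecord {
    // Hypothetical magic number: a price no real record is expected to hold.
    static final int REMOVED = Integer.MIN_VALUE;

    static void remove(RandomAccessFile data, long position,
                       List<Map<String, Long>> indexes) throws IOException {
        // Drop the record from every in-memory index so no search can reach it.
        // The index files on disk are left alone until the program completes.
        for (Map<String, Long> index : indexes) {
            index.values().removeIf(p -> p == position);
        }
        // Mark the slot in the data file so a later index rebuild can skip or reuse it.
        data.seek(position);
        data.writeInt(REMOVED);
    }

    static boolean isRemoved(RandomAccessFile data, long position) throws IOException {
        data.seek(position);
        return data.readInt() == REMOVED;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("stocks", ".dat");
        tmp.deleteOnExit();
        Map<String, Long> byTicker = new TreeMap<>();
        Map<String, Long> byName = new TreeMap<>();
        byTicker.put("F", 224L);
        byName.put("Ford Motorcars, Inc.", 224L);
        List<Map<String, Long>> indexes = Arrays.asList(byTicker, byName);
        try (RandomAccessFile data = new RandomAccessFile(tmp, "rw")) {
            data.seek(224);
            data.writeInt(12);
            remove(data, 224L, indexes);
            System.out.println(byTicker + " " + byName + " removed=" + isRemoved(data, 224L));
        }
    }
}
```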
![Page 28: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/28.jpg)
Removing Data As We Go
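[Figure: indexes and data file before removing Ford]
- Ticker index: C 336; F 224; IBM 0; T 112
- Name index: American Telephone & Telegraph 112; Citibank 336; International Business Machines 0; Ford Motorcars, Inc. 224
- Data file records (name, price, ticker): IBM, 106, IBM; AT & T, 23, T; Ford, 12, F; Citibank, -2, C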
![Page 29: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/29.jpg)
Removing Data As We Go
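[Figure: indexes and data file after removing Ford]
- Ticker index: C 336; IBM 0; T 112
- Name index: American Telephone & Telegraph 112; Citibank 336; International Business Machines 0
- Data file records (name, price, ticker): IBM, 106, IBM; AT & T, 23, T; removed record: 0, Ø; Citibank, -2, C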
![Page 30: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/30.jpg)
Using Multiple Indexes
- Multiple indexes for data file very often needed
  - Provides many ways of searching for important data
  - Since each index file is read individually, this could also create a problem
- Multiple proxy instances for data could be created
  - Duplicates of instance are created for each index
  - Makes removing them all difficult, since not linked
- Very easy to solve: use Map while loading index
  - Converts positions in file to proxy instances to solve this
![Page 31: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/31.jpg)
Linking Multiple Indexes
- Use one Map instance while reading all indexes
  - For each position in file, check if already in Map
  - Use existing proxy instance, if position already in Map
  - If a search in Map returns null, create new instance
  - Make sure to call put() when we must create proxy
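A minimal sketch of this linking step, assuming each index has already been loaded as a map from search term to file position. The `Proxy` class is a trimmed-down stand-in for the lecture's proxy (a real one would also hold the data file), and `LinkIndexes` and `link` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class LinkIndexes {
    // Stand-in for the proxy class; it remembers only the record's position.
    static class Proxy {
        final long position;

        Proxy(long position) {
            this.position = position;
        }
    }

    // Converts one loaded index (term -> position) into term -> proxy,
    // reusing proxies already created while loading earlier indexes.
    static Map<String, Proxy> link(Map<String, Long> loadedIndex,
                                   Map<Long, Proxy> proxiesByPosition) {
        Map<String, Proxy> linked = new HashMap<>();
        for (Map.Entry<String, Long> entry : loadedIndex.entrySet()) {
            Long position = entry.getValue();
            Proxy proxy = proxiesByPosition.get(position);
            if (proxy == null) {
                // First index to mention this position: create the proxy and remember it.
                proxy = new Proxy(position);
                proxiesByPosition.put(position, proxy);
            }
            linked.put(entry.getKey(), proxy);
        }
        return linked;
    }

    public static void main(String[] args) {
        Map<String, Long> names = new HashMap<>();
        names.put("International Business Machines", 0L);
        names.put("American Telephone & Telegraph", 112L);
        Map<String, Long> tickers = new HashMap<>();
        tickers.put("IBM", 0L);
        tickers.put("T", 112L);

        Map<Long, Proxy> shared = new HashMap<>(); // one Map for all indexes
        Map<String, Proxy> byName = link(names, shared);
        Map<String, Proxy> byTicker = link(tickers, shared);
        System.out.println(byName.get("International Business Machines") == byTicker.get("IBM"));
    }
}
```

Because both indexes end up holding the same proxy instance for a record, removing that record means removing one object rather than hunting for unlinked duplicates.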
![Page 32: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/32.jpg)
What to Study for Midterm
- Study your Maps and Dictionarys
  - When would we use each of the ADTs? Why?
  - What do their methods do? Why do they differ?
- Consider each implementation of these ADTs
  - Explain why method has its given big-Oh complexity
  - Why use an implementation? Where is it used?
  - What are negatives or limitations of implementation?
  - What fields needed by implementation? Why is this?
![Page 34: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/34.jpg)
What to Study for Midterm
- List-based approaches – Why? When?
- Hash tables
  - How do hash functions work? What does mod do?
  - How do we add & remove data from hash table?
  - What are collisions & how do we handle them?
  - What is real & pretend big-Oh complexity? Why?
- Binary Search Trees
  - How do we add, remove, & search in these trees?
  - How are data in BSTs organized? Tricks to their use?
  - How do we code & use BSTs? What methods exist?
![Page 35: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/35.jpg)
What to Study for Midterm
- AVL Trees
  - How do we add, remove, & search in these trees?
  - How are data in them organized? Tricks to their use?
  - When must we reorganize tree? How is this done?
- Splay Trees
  - How do we add, remove, & search in these trees?
  - For each method is node splayed & which one?
  - How to chain splayings together? When do we stop?
![Page 36: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/36.jpg)
What to Study for Midterm
- Class selection & design
  - Where do classes come from? How do we know?
  - When to use each connection between classes?
  - How to list methods & fields in UML class diagram?
- Comments & Outlines
  - When, where, and how much?
  - What should & should not be included?
![Page 37: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/37.jpg)
Midterm Process
- Open-book & open-note test; do not memorize
  - But have methods & information at your fingertips
  - Use my slides ONLY with note(s) on that day's slides
  - Cannot use daily or weekly activities
  - Must submit all printed pages along with test
- Problems resemble tone of those already seen
  - All new problems, however; do not memorize answers
  - Includes tracing, showing state of ADT, method returns
  - Coding, big-Oh analysis, and more can be asked
![Page 38: Lecture 21: Indexed Files](https://reader036.vdocument.in/reader036/viewer/2022062315/56815d8c550346895dcb9877/html5/thumbnails/38.jpg)
For Next Lecture
- Midterm #1 in class week on Friday
- Project #2 available on Angel on Friday, too
- Lab phase #2 due on Friday at midnight
  - I still will be out of town, but lab activity will be posted
  - Due week from Friday; chance to use indexed files
- No class on Monday; take some time to relax
  - I will be out-of-town serving on an NSF grant panel
  - Updated schedule on Angel accounts for change