
Long tails and Archive systems

Elliot Jaffe

FDIS 2005

Archive Metrics

• What
– Distribution of file sizes
– Distribution of occupied storage
– How are files accessed

• Why
– System architecture
– Scaling for access

File size studies

UFS93 (1993)

• 12 million files

• UNIX only

• Avg. file size is 2k

• 90% of storage in 11% of files

HUJI (2005)

• 4 million files

• UNIX + Windows

• Avg. file size is 8k

• 90% of storage in 5.5% of files
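The "90% of storage in N% of files" figures above can be recomputed for any list of file sizes. The following is a minimal sketch of one way to do that, not the method used in either study; the function name and the sample sizes are made up for illustration.

```python
def fraction_of_files_holding(sizes, storage_share=0.90):
    """Return the smallest fraction of files (largest first) whose
    combined size reaches `storage_share` of the total bytes."""
    if not sizes:
        raise ValueError("sizes must not be empty")
    total = sum(sizes)
    if total == 0:
        raise ValueError("total size is zero")
    target = storage_share * total
    running = 0
    # Walk the files from largest to smallest, accumulating bytes.
    for count, size in enumerate(sorted(sizes, reverse=True), start=1):
        running += size
        if running >= target:
            return count / len(sizes)
    return 1.0


# Illustrative, made-up sizes in bytes (not data from UFS93 or HUJI):
# two large files and 98 small ones.
example_sizes = [1_000_000, 500_000] + [1_000] * 98
print(fraction_of_files_holding(example_sizes))
```

A smaller result means storage is more concentrated in a few large files, which is the direction of the change from 11% (1993) to 5.5% (2005).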

What’s Changed

Then

• JAWS, NOW

• Online was expensive

• Offline tape storage

Now

• Central File Servers

• Digital Libraries

• Online is cheap

• No offline storage

• XML

• Multimedia

Empirical Data

Questions

• What is the future of these distributions?

• Are the changes power-law extensions of the tails, so that the 10/90 and 20/80 rules no longer hold and are the wrong way to think about these distributions?

• Are the changes based on external factors that are unpredictable?
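One way to probe the power-law question is to estimate a tail exponent from measured file sizes and compare it across data sets. The sketch below uses the Hill estimator, a standard technique swapped in here for illustration; the slides do not say which method, if any, was used. The function name and the choice of `k` (how many of the largest files count as "the tail") are assumptions.

```python
import math


def hill_tail_exponent(sizes, k):
    """Hill estimate of the power-law tail exponent alpha, using the
    k largest positive values in `sizes`.

    alpha_hat = k / sum(ln(x_i / x_threshold)) over the k largest x_i,
    where x_threshold is the (k+1)-th largest value.
    """
    ordered = sorted((s for s in sizes if s > 0), reverse=True)
    if not 0 < k < len(ordered):
        raise ValueError("k must satisfy 0 < k < number of positive sizes")
    threshold = ordered[k]  # (k+1)-th largest value
    log_sum = sum(math.log(x / threshold) for x in ordered[:k])
    if log_sum == 0:
        raise ValueError("the k largest values are all equal to the threshold")
    return k / log_sum
```

A smaller exponent indicates a heavier tail. The estimate is sensitive to `k`, so it is usually worth checking several values before drawing conclusions.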

The Long Tail

• Chris Anderson (2004)
– http://www.wired.com/wired/archive/12.10/tail.html

• The long tail of a distribution has tremendous mass and creates new market opportunities

• Amazon, Netflix, Wikipedia

Today’s landscape

[Figure: NOW, File Servers, Sarbanes Oxley, and Digital Libraries, shown in relation to Storage Capacity and Access Frequency]

Next Steps

• Collecting data from large storage systems
– File Sizes, Created, Last Modified, Last Access, Frequency of Reads

• Goal: New architectures for Digital libraries
– Focus on Operations
– Store large and small files differently
– Store very-low access files in slow access
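The data collection step could look roughly like the sketch below, which walks a directory tree and records per-file size and timestamps. It is an illustration under stated assumptions, not the tooling used for the study: `FileRecord` and `scan_tree` are hypothetical names, and frequency of reads is not available from file metadata, so it would have to come from server access logs instead.

```python
import os
import stat
from typing import Iterator, NamedTuple


class FileRecord(NamedTuple):
    path: str
    size_bytes: int
    ctime: float  # creation time on Windows; inode change time on UNIX
    mtime: float  # last modified
    atime: float  # last access (may be stale on noatime/relatime mounts)


def scan_tree(root: str) -> Iterator[FileRecord]:
    """Yield one FileRecord per regular file under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, devices
            yield FileRecord(path, st.st_size, st.st_ctime,
                             st.st_mtime, st.st_atime)
```

The list of `size_bytes` values from such a scan is the input needed for the file-size and occupied-storage distributions listed under Archive Metrics.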
