Long tails and Archive systems Elliot Jaffe FDIS 2005


Page 1

Long tails and Archive systems

Elliot Jaffe

FDIS 2005

Page 2

Archive Metrics

• What
  – Distribution of file sizes
  – Distribution of occupied storage
  – How are files accessed

• Why
  – System architecture
  – Scaling for access

Page 3

File size studies

UFS93 (1993)

• 12 million files

• UNIX only

• Avg. file size is 2k

• 90% of storage in 11% of files

HUJI (2005)

• 4 million files

• UNIX + Windows

• Avg. file size is 8k

• 90% of storage in 5.5% of files
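The "90% of storage in N% of files" figures above can be reproduced from any list of file sizes. A minimal sketch, assuming the sizes are already available as a list of byte counts; the function name and the sample sizes are illustrative and do not come from the UFS93 or HUJI data:

```python
def fraction_of_files_for_share(sizes, share=0.90):
    """Smallest fraction of files (largest first) holding at least `share` of all bytes."""
    if not sizes:
        raise ValueError("sizes must be non-empty")
    total = sum(sizes)
    if total <= 0:
        raise ValueError("total size must be positive")
    target = share * total
    running = 0
    # Walk the files from largest to smallest until the target is covered.
    for count, size in enumerate(sorted(sizes, reverse=True), start=1):
        running += size
        if running >= target:
            return count / len(sizes)
    return 1.0


if __name__ == "__main__":
    # Made-up sizes in bytes: a few large files and many small ones.
    sample = [50_000_000, 20_000_000, 5_000_000] + [2_000] * 500
    frac = fraction_of_files_for_share(sample, share=0.90)
    print(f"90% of storage is held by {frac:.1%} of files")
```

Sorting largest-first makes the result the minimum number of files needed to reach the share, which matches how the slide phrases the statistic.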

Page 4

What’s Changed

Then

JAWS, NOW

Online was expensive

Offline tape storage

Now

Central File Servers

Digital Libraries

Online is cheap

No offline storage

XML

Multimedia

Page 5

Empirical Data

Page 6

Questions

• What is the future of these distributions?

• Are the changes extensions of power-law tails, so that the 10/90 and 20/80 rules no longer hold and are the wrong way to think about these distributions?

• Are the changes based on external factors that are unpredictable?
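One way to explore the power-law question is to ask which tail exponent each rule of thumb implies. A sketch under a strong assumption that the slides do not make: sizes follow a pure Pareto distribution with shape alpha > 1, for which the largest fraction p of items holds p^(1 - 1/alpha) of the total. The function names are illustrative.

```python
import math


def top_share(p, alpha):
    """Share of total mass held by the largest fraction p of items, Pareto shape alpha > 1."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if alpha <= 1:
        raise ValueError("the mean is infinite for alpha <= 1")
    return p ** (1 - 1 / alpha)


def alpha_for_rule(p, share):
    """Pareto shape alpha at which the largest fraction p holds exactly `share`."""
    # Solve share = p ** (1 - 1/alpha) for alpha.
    return 1 / (1 - math.log(share) / math.log(p))


if __name__ == "__main__":
    # The two rules of thumb, plus the two measured points from the file size studies.
    for p, share in [(0.20, 0.80), (0.10, 0.90), (0.11, 0.90), (0.055, 0.90)]:
        alpha = alpha_for_rule(p, share)
        print(f"top {p:.1%} holds {share:.0%} -> alpha = {alpha:.3f}")
```

Under this assumption a heavier tail shows up as alpha moving toward 1, so a single fixed rule such as 20/80 describes only one particular exponent.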

Page 7

The Long Tail

• Chris Anderson (2004)
  – http://www.wired.com/wired/archive/12.10/tail.html

• The long tail of a distribution has tremendous mass and creates new market opportunities

• Amazon, Netflix, Wikipedia

Page 8

Today’s landscape

[Figure: NOW, File Servers, Sarbanes-Oxley, and Digital Libraries placed on axes of Storage Capacity and Access Frequency]

Page 9

Next Steps

• Collecting data from large storage systems
  – File Sizes, Created, Last Modified, Last Access, Frequency of Reads

• Goal: New architectures for Digital libraries
  – Focus on Operations
  – Store large and small files differently
  – Store very-low access files in slow access
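The data collection and the placement goals above could be prototyped as follows. This is a rough sketch, not the architecture the talk proposes: the tier names and both thresholds are hypothetical, `st_ctime` is creation time only on Windows (on Unix it is the last metadata change), access times may be unreliable on volumes mounted with `noatime` or `relatime`, and frequency of reads is not available from file metadata at all.

```python
import os
import stat
import time
from dataclasses import dataclass

SMALL_FILE_BYTES = 64 * 1024   # hypothetical cutoff between small and large files
COLD_AFTER_DAYS = 365          # hypothetical idle period before a file counts as cold


@dataclass
class FileRecord:
    path: str
    size: int        # bytes
    created: float   # st_ctime (see caveat in the lead-in)
    modified: float  # st_mtime
    accessed: float  # st_atime


def collect(root):
    """Walk `root` and record size and timestamps for every regular file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                records.append(
                    FileRecord(path, st.st_size, st.st_ctime, st.st_mtime, st.st_atime)
                )
    return records


def choose_tier(record, now=None):
    """Pick a storage tier from last access time first, then from file size."""
    now = time.time() if now is None else now
    idle_days = (now - record.accessed) / 86400
    if idle_days > COLD_AFTER_DAYS:
        return "slow"
    return "small" if record.size < SMALL_FILE_BYTES else "large"


if __name__ == "__main__":
    counts = {}
    for rec in collect("."):
        tier = choose_tier(rec)
        counts[tier] = counts.get(tier, 0) + 1
    print(counts)
```

Checking idleness before size means a rarely read file goes to the slow tier regardless of how large it is; the opposite ordering would be an equally plausible reading of the slide.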