An Experiment To Characterize Videos On The Web
Soam Acharya
Brian Smith
Cornell University
MMCN 1998
Overview
• Designed and implemented an experiment to search and analyze videos on the web
• 22500 HTML documents
• 57000 movies
• 100 Gbytes of data
Why?
• Codec Designers
• Network Engineers
• Other Multimedia Researchers
  • MM file systems
• Webmasters
Questions We Asked
• How many movies are out there?
– Not all that many. We found 57,000.
• What are their basic properties?
– 90% last 45 seconds or less. Their median size is 1.1 Mbytes.
• What compression formats are popular?
– QuickTime accounts for about 53%, followed by MPEG (30%) and AVI (17%).
• How well do the formats compare?
– MPEG compresses best. QuickTime and AVI are similar.
• Are standard modem rates enough?
– No. Rates of 28.8 - 128 Kilobits/sec (Kbps) are useless for real-time download and display of movies.
Roadmap
• Data Collection Methodology
• Analysis
• Results
• Conclusion
• Future Work
• Open Questions
Data Collection Methodology
• Hunting Phase
– get links to movies
• Gathering Phase
– download movies and gather raw statistics
• Sifting Phase
– eliminate outliers
Hunting Phase: Early April 1997
• Milked AltaVista for documents dated
– January 1995 - March 1997
• Looked for MPEG, QuickTime, AVI
  • no streaming video format
Gathering Phase: mid April 1997 - May 1997
LDG: movie link distributor/gatherer
LP: link processor
[Figure: the LDG holds a list of page URLs (http://www.eg.com/movie.html, http://www.cnn.com/pepe.html, …..) and works with link processors LP0, LP1 and LP2. Numbered labels in the figure: 1. http://www.eg.com/movie.html (at LP1); 2. movie.html (www.eg.com); 3. my.mov (www.vid.com); 4. summary statistics.]
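The slides do not show how a link processor turns a fetched page such as movie.html into movie URLs such as my.mov. A minimal sketch of that step, assuming movies are recognized by file extension; the extension list, class name and function name are illustrative assumptions, not taken from the original system:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed extensions for MPEG, QuickTime and AVI files (not from the slides).
MOVIE_EXTENSIONS = (".mpg", ".mpeg", ".mov", ".qt", ".avi")


class MovieLinkParser(HTMLParser):
    """Collects absolute URLs of <a href> links that look like movie files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                url = urljoin(self.base_url, value)
                # Compare the path only, case-insensitively.
                if urlparse(url).path.lower().endswith(MOVIE_EXTENSIONS):
                    self.links.append(url)


def movie_links(html, base_url):
    """Return movie URLs found in an HTML document, in document order."""
    parser = MovieLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

Each returned URL would then be downloaded and measured, which corresponds to labels 3 and 4 in the figure.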
Sifting Phase
• Processed 100 Gbytes of data and 57,000 titles
– used mpegstat and modified xanim
• 4 < frames/sec < 40 {5000 titles}
• duration > 0.5 seconds {1000 titles}
• 0.6 < aspect ratio < 1.667 {1000 titles}
• bitrate < 10 Mbps {1000 titles}
– bitrate = (movie size)/(movie duration)
• duplicate URL detection {1500 titles}
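The sifting rules above can be sketched as a single filter. This is an illustration only: the `Movie` record, its field names, the order in which the rules run, and the reading of 1 Mbps as 10**6 bits/sec are assumptions, and the actual statistics came from mpegstat and a modified xanim.

```python
from dataclasses import dataclass


@dataclass
class Movie:
    # Hypothetical record of the raw statistics gathered per title.
    url: str
    size_bytes: int
    duration_s: float
    frames: int
    width: int
    height: int


MAX_BITRATE_BPS = 10_000_000  # 10 Mbps, assuming 1 Mbps = 10**6 bits/sec


def passes_sift(m: Movie) -> bool:
    """Apply the frame rate, duration, aspect ratio and bitrate rules."""
    if m.duration_s <= 0.5 or m.height <= 0:
        return False
    fps = m.frames / m.duration_s
    aspect = m.width / m.height
    bitrate_bps = m.size_bytes * 8 / m.duration_s  # size / duration
    return (4 < fps < 40
            and 0.6 < aspect < 1.667
            and bitrate_bps < MAX_BITRATE_BPS)


def sift(movies):
    """Drop duplicate URLs, then keep only titles that pass every rule."""
    seen = set()
    kept = []
    for m in movies:
        if m.url in seen:
            continue  # duplicate URL detection
        seen.add(m.url)
        if passes_sift(m):
            kept.append(m)
    return kept
```

The bracketed counts on the slide sum to 9,500, which equals 57,000 minus the 47,500 titles reported as remaining, so they appear to be the number of titles each rule eliminated.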
Analysis
• 47500 titles remained
– 53% QuickTime, 30% MPEG, 17% AVI
• Can be divided into two categories
– Distributions:
  • by date
  • fps
  • size
  • duration
  • aspect ratio
  • bitrate
– Comparing movie formats against each other
Movie Growth
[Figure: number of movies by month, Jan-94 to Apr-97. X-axis: Month. Y-axis: Number of movies (0 to 3500).]
Breakdown of Movie Growth By Type
[Figure: number of movies by month, Jan-94 to Apr-97, with separate series for QuickTime, MPEG and AVI. X-axis: Month. Y-axis: Number of movies (0 to 2000).]
FPS Distribution
[Figure: number of movies at each frame rate from 4 to 31, with separate series for AVI, MPEG and QuickTime. X-axis: Frame Rate. Y-axis: Number of movies (0 to 12000).]
Movie Size (In bytes)
• 70% of movies are 2 Mbytes or less
• Median movie size is about 1.1 Mbytes
Overall Duration Distribution
• 90% of the movies are 45 sec or less, 50% are under 15 sec
[Figure: histogram of movie duration. X-axis: Length (in seconds), 5 to 115. Y-axis: Number of Movies (0 to 12000).]
Aspect Ratio
• 74% of all files had an aspect ratio of 1.333
– 320 x 240
– 160 x 120
• 89% had aspect ratios of 1.2 - 1.5
• Movie Bitrate = movie size / movie duration
Overall Average Bitrate Distribution
[Figure: histogram of average bitrate with a cumulative percentage curve. X-axis: Kbits/sec, bins from 28.8 to 7000 plus "More". Y-axis: # of movies (0 to 6000). Secondary axis: 0% to 100%. Series: Frequency, Cumulative %.]
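A small worked example of the bitrate definition (movie size / movie duration) and what it implies for modem links. The function names are illustrative, 1 Mbyte is taken as 10**6 bytes and 1 Kbps as 1000 bits/sec, and pairing the median size with a 15 second duration is an assumption made for illustration, not a statistic from the study.

```python
def average_bitrate_kbps(size_bytes, duration_s):
    """Movie bitrate = movie size / movie duration, in Kbits/sec."""
    return size_bytes * 8 / duration_s / 1000


def download_time_s(size_bytes, link_kbps):
    """Seconds needed to transfer the whole file over a link."""
    return size_bytes * 8 / (link_kbps * 1000)


def playable_in_real_time(size_bytes, duration_s, link_kbps):
    """True if the link can deliver the movie at least as fast as it plays."""
    return average_bitrate_kbps(size_bytes, duration_s) <= link_kbps


if __name__ == "__main__":
    size = 1_100_000   # median movie size from the slides, 1.1 Mbytes
    duration = 15.0    # assumed duration, for illustration only
    print(f"bitrate: {average_bitrate_kbps(size, duration):.1f} Kbps")
    for link in (28.8, 128.0):  # modem rates named on the slides
        print(f"{link} Kbps link: {download_time_s(size, link):.1f} s to download, "
              f"real time: {playable_in_real_time(size, duration, link)}")
```

Under these assumptions the movie averages roughly 587 Kbps, so a 15 second clip takes about 306 seconds to fetch at 28.8 Kbps and about 69 seconds at 128 Kbps.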
So Far ...
• Distributions:
– by date
– fps
– size
– duration
– aspect ratio
– bitrate
• Comparing movie formats
AVI/QuickTime Comparison
Video Codec          AVI    QuickTime
Radius Cinepak       43%    60%
Intel Indeo R3.2     25%    2%
Microsoft Video 1    26%    0%
Apple Video-RPZA     0%     22%

• 25% of AVI, 33% of QuickTime: video only

Audio Codec          AVI         QuickTime
                     PCM         PCM
                     MS-ADPCM    TWOS
How To Compare Compression?
• Bits/pixel = (video size in bits) / (width * height * # of frames)

Format    Mean    Median    (bits/pixel)
AVI       2.51    2.14
QT        2.16    1.82
MPEG      0.72    0.51
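The bits/pixel measure is a direct computation. A minimal sketch, where the function name and the sample numbers are illustrative assumptions rather than values from the study:

```python
def bits_per_pixel(video_bytes, width, height, num_frames):
    """Bits/pixel = (video size in bits) / (width * height * # of frames)."""
    pixels = width * height * num_frames
    if pixels <= 0:
        raise ValueError("width, height and num_frames must be positive")
    return video_bytes * 8 / pixels


if __name__ == "__main__":
    # Hypothetical title: 1.1 Mbytes of video data, 320 x 240, 150 frames.
    print(round(bits_per_pixel(1_100_000, 320, 240, 150), 2))
```

Only the video portion of the file should be counted, so any audio track has to be excluded from `video_bytes` before calling the function.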
MPEG Bits/pixel Distribution
• Size of I:P:B frames ~ 5 : 2 : 1
• 90% of MPEG files were video only
Frame Type    Mean bits/pixel    Median bits/pixel
I             1.25               1.10
P             0.76               0.54
B             0.31               0.19
MPEG Frame Patterns
Frame Pattern       % Distribution    Mean bits/pixel
I                   27.1              1.17
IBBPBB              15.7              0.7
IBBPBBPBBPBBPBB     10.4              0.31
IBBPBBPBBPBB        8.1               0.5
IBBBPBBBPBBB        4.4               0.66
IPBBIBB             4.2               0.39
IIP                 3.5               0.7

• 80% of MPEG files use some recurring frame pattern
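The slides do not say how recurring patterns were identified. One plausible approach, sketched here as an assumption, is to take the sequence of frame types in a file and look for the shortest unit that repeats through the whole sequence, allowing a truncated final repetition. The function name and the `min_repeats` parameter are hypothetical.

```python
def smallest_repeating_unit(frame_types, min_repeats=2):
    """Return the shortest prefix that tiles the whole frame-type string.

    frame_types is a string such as "IBBPBBIBBPBB". The final repetition may
    be cut short. Returns None if no unit fits at least min_repeats times
    into the sequence length.
    """
    n = len(frame_types)
    for period in range(1, n // min_repeats + 1):
        # A period works if every frame equals the frame one period earlier.
        if all(frame_types[i] == frame_types[i - period]
               for i in range(period, n)):
            return frame_types[:period]
    return None


if __name__ == "__main__":
    print(smallest_repeating_unit("IBBPBBIBBPBBIBB"))  # IBBPBB
    print(smallest_repeating_unit("IIII"))             # I
    print(smallest_repeating_unit("IPBBIBB"))          # None
```

Counting how often each returned unit occurs across files would give a table like the one above, although the study's own tooling may have defined patterns differently.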
Recap
• Number of movies coming online - exponential, then flat
• MPEG higher fps, QuickTime/AVI lower
• Median size of movies: 1.1 Mbytes
• 90% of movies last 45 seconds or less
• 1.333 is the most common aspect ratio
• 28.8 - 128 Kbps modem rates useless for real-time downloads
• Radius Cinepak is widely used by QuickTime and AVI
• MPEG compresses better than QuickTime and AVI
• 80% of MPEGs have some sort of recurring pattern
Conclusion
• Existing compression technologies not enough for transmission over standard modems
– explains rise of streaming video technologies
– users cope by making file sizes, duration smaller
– but not by throttling the bitrate
– perceptual threshold?
Future Work
• How do videos age?
• Another study to confirm findings
– Brewster Kahle, www.archive.org
• Develop tools to automate the process
Open Questions
• What are video access patterns on the Web?
• How to analyze streaming video files?