Characterizing User Access To Videos On The World Wide Web
MMCN 2000
Brian Smith
Department of Computer Science
Cornell University
Ithaca, NY
Peter Parnes
Center For Distance Spanning Technology
Luleå University of Technology
Sweden
Soam Acharya
Inktomi Corporation
Foster City, CA
Overview
• Analysis of traces from an ongoing VoW trial (VoD over the Web)
• 2 year period
• 13100 requests
• 246 titles
Why?
• Audio/Video content:
  – coming online rapidly
  – constitutes a large percentage (17%) of bytes transferred online
• Useful to:
  – Cache Designers
  – Codec Engineers
  – Network Engineers
  – Other Multimedia Researchers:
    • MM Storage Systems
Questions We Asked
• Do accesses to videos exhibit temporal locality?
• How frequently are videos accessed?
• Do users exhibit specific browsing patterns when viewing videos?
• What are the file size trends?
Roadmap
• VoW Setup
• Analysis Methodology
• Results
• Conclusion
• Future Work
VoW Setup
[Figure: video server and client domains cdt.luth.se, campus.luth.se, sm.luth.se, luth.se, others]
• Lulea University, Sweden
• Center for Distance Spanning Technology
• High speed network (34 Mbps)
• mMOD software system
VoW Setup II
• Two years (end of Aug ‘97 - mid Oct ‘99)
• 246 video titles
  – encoded using H.261 (CIF - 320x240)
• ~500 campus machines involved in access, ~1400 outside
• title categories
  – general
    • movies
  – educational
    • courses
    • tutorials, seminars
Analysis
• Video file characteristics
  – size
  – duration
  – bitrate distribution
• Trace access analysis
  – Trace refinement
  – Actual analysis on refined data
Median Movie Size: 96 MBytes
[Figure: Lulea University File Size Distribution. x-axis: Movie Size (in Mbytes), 25 to 325; y-axis: Number of Movies, 0 to 50]
Median Duration ~ 70 minutes
[Figure: Duration Distribution of Lulea Univ. Movies. x-axis: Movie Length (minutes), 10 to 140; y-axis: Number of Movies, 0 to 50]
Video Bitrate Distribution
[Figure: x-axis: kBits/sec, 50 to 550; y-axis: Frequency, 0 to 120]
• Quality of video streams deliberately kept low (for external users)
• Compression scheme designed to produce lower bitrates
Trace Access Analysis - Log Filtering
• Initially eliminate from the trace:
  – HTML documents
  – Java applet requests
  – images
  – Joining a session already in progress
02:01:33 salt.cdt.luth.se GET Movie1
02:03:23 spock.cdt.luth.se GET TVSerial_970206
03:04:12 aniara.cdt.luth.se GET Movie2
03:10:11 aniara.cdt.luth.se STOP Movie2
Log Filtering II
• Eliminate from trace:
  – requests from demo machines
  – resolve IP addresses for machine names
  – reduce user errors
    • hitting STOP button too many times
    • hitting GET requests too many times
• Removed 1160 requests, 11965 remaining
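The slides do not say how repeated GET and STOP clicks were detected. As a minimal sketch of one possible approach, the Python below parses entries in the `time host action title` layout of the sample log and drops a record when the same host repeats the same action on the same title within a short window. The function names and the 10-second window are assumptions for illustration, not values from the trial.

```python
from datetime import datetime, timedelta

def parse_line(line):
    """Split 'HH:MM:SS host ACTION title' into (time, host, action, title)."""
    time_str, host, action, title = line.split()
    # The sample entries carry only a time of day, so the date defaults to 1900-01-01.
    return datetime.strptime(time_str, "%H:%M:%S"), host, action, title

def drop_repeats(records, window=timedelta(seconds=10)):
    """Keep a record unless the same (host, action, title) was seen within `window`.

    `window` is a hypothetical threshold; the slides give no value.
    """
    last_seen = {}
    kept = []
    for when, host, action, title in records:
        key = (host, action, title)
        previous = last_seen.get(key)
        if previous is None or when - previous > window:
            kept.append((when, host, action, title))
        # Track every occurrence so a burst of clicks collapses to one record.
        last_seen[key] = when
    return kept
```

Applied to a trace in which a user hits STOP twice within a second, only the first STOP survives; requests from different hosts or for different titles are never merged.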
Trace Analysis Methodology
• General:
  – How do video requests vary by day?
  – Mathematical distributions?
  – Do some machines request more than others?
• Pattern Detection:
  – Inter-access times
  – Do users access videos all the way?
  – Type of file
  – Temporal locality
11965 accesses over twenty five months
[Figure: Overall Accesses To The Lulea Server. x-axis: Month, Aug-97 to Sep-99; y-axis: Daily Accesses, 0 to 250]
Movie Popularity
Movie popularity did not follow Zipf’s law -- P ~ 1/p^(1-t)
P = freq. of access to a document, p = its rank in popularity
[Figure: Popularity Ranking. x-axis: Rank of Movie, 1 to 1000 (log scale); y-axis: # of accesses, 1 to 1000 (log scale)]
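One common way to check a popularity curve against Zipf's law is to fit a straight line to log(accesses) versus log(rank). Reading the slide's relation as P ~ 1/p^(1-t), a Zipf-like trace would give a slope near -(1-t), and a poor linear fit would suggest the law does not hold. The sketch below is an illustration of that check, not the authors' stated procedure; `loglog_slope` is a hypothetical helper name.

```python
import math

def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank).

    `counts` holds per-title access counts in any order; titles with
    zero accesses are ignored because log(0) is undefined.
    Needs at least two titles with nonzero counts.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

For synthetic counts proportional to 1/rank the function returns a slope very close to -1; real traces would also need a goodness-of-fit measure before drawing any conclusion.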
Distribution of Requests By Machine
• About 73% of all requests from campus and surrounding community
• For requests from within campus:
  – 2% of all machines (11) => 21% of requests
  – 10% of machines (53) => 50% of requests
    • Lab machines
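Concentration figures of this kind can be derived by counting requests per machine and summing the busiest fraction. The following is a minimal sketch under the assumption that each request is tagged with a host name; `share_from_top` is a hypothetical helper, not part of the trial's tooling.

```python
from collections import Counter

def share_from_top(hosts, fraction):
    """Fraction of all requests issued by the busiest `fraction` of machines.

    `hosts` is an iterable with one host name per request.
    """
    per_host = sorted(Counter(hosts).values(), reverse=True)
    # Always include at least one machine, even for very small fractions.
    top_n = max(1, round(fraction * len(per_host)))
    return sum(per_host[:top_n]) / sum(per_host)
```

Calling `share_from_top(hosts, 0.02)` and `share_from_top(hosts, 0.10)` on the campus subset of a trace would yield the two kinds of percentages quoted on the slide.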
Inter-Access Time
[Figure: x-axis: Seconds Bin (100 seconds each), 100 to 3100; y-axis: Number of Accesses, 0 to 3000; secondary axis: 0% to 80%]
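The slide does not define inter-access time precisely. Assuming it means the gap between consecutive requests in the trace, a histogram with the chart's 100-second bins could be built as sketched below; timestamps are taken to be seconds since an arbitrary origin, and `inter_access_histogram` is a hypothetical name.

```python
from collections import Counter

def inter_access_histogram(timestamps, bin_width=100):
    """Count gaps between consecutive request times.

    Keys are the lower edge of each bin in seconds, so key 0 covers
    gaps in [0, 100), key 100 covers [100, 200), and so on.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return Counter(int(gap // bin_width) * bin_width for gap in gaps)
```

If inter-access time were instead measured per title or per machine, the same function could be applied to each group's timestamps separately.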
Partial Access
• 61% of accesses went to completion
  – 39% stopped early
• Suggests browsing pattern
[Figure: Percentage of Movie Seen. x-axis: Percentage, 5 to 95; y-axis: Number, 0 to 2500; secondary axis: 0% to 100%]
File Category Variations
• Access patterns vary by file category
  – Lectures have temporal locality of access
    • Many accesses shortly after going online
  – Entertainment videos do not
[Figure: Access Patterns of Various Titles. x-axis: Number of days elapsed, 0 to 200; y-axis: Number of accesses, 0 to 50; series: FeatureFilm1, SMD074_980210, SMD104_971028]
Temporal Locality
• LRU stack analysis
• Trace: GET Movie1, GET Movie2, GET Movie2, GET Movie2, GET Movie3, GET Movie1, ...
• Previous Stack Position Counter (increment previous location of currently referenced document)
• Walkthrough (stack listed top first; counters for positions 1, 2, 3):
  – Start: stack empty; counters 0 0 0
  – GET Movie1: stack Movie1; counters 0 0 0
  – GET Movie2: stack Movie2, Movie1; counters 0 0 0
  – GET Movie2: stack Movie2, Movie1; counters 1 0 0
  – GET Movie2: stack Movie2, Movie1; counters 2 0 0
  – GET Movie3: stack Movie3, Movie2, Movie1; counters 2 0 0
  – GET Movie1: stack Movie1, Movie3, Movie2; counters 2 0 1
• Plot this after running through the entire trace
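The LRU stack analysis stepped through on the preceding slides can be sketched in a few lines of Python. This is an illustrative reading of the slides, not the authors' code: each referenced title moves to the top of the stack, and if it was already present, the counter for its previous position is incremented. The function name `lru_stack_counts` is an assumption.

```python
from collections import Counter

def lru_stack_counts(trace):
    """Return {stack position: hits} for re-referenced titles in `trace`.

    Position 1 is the top of the stack (most recently used title).
    First-time references are pushed without touching any counter.
    """
    stack = []
    counts = Counter()
    for title in trace:
        if title in stack:
            position = stack.index(title) + 1  # previous location, 1-based
            counts[position] += 1
            stack.remove(title)
        stack.insert(0, title)  # the referenced title becomes most recent
    return counts

example = ["Movie1", "Movie2", "Movie2", "Movie2", "Movie3", "Movie1"]
print(lru_stack_counts(example))
```

On the six-request example trace from the slides this gives two hits at position 1 and one hit at position 3, matching the final counter state 2 0 1. Dividing each count by the total number of references gives the percentages that the result plot shows per stack position.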
Temporal Locality: Result
[Figure: Temporal Locality Characteristics. x-axis: Position in LRU Stack, 0 to 70; y-axis: Percentage of References, 0 to 35]
Conclusion
• Videos are relatively large (to capture entire lectures, movies)
• Users browse portions of video
• A small number of machines accounted for a large number of accesses
• High temporal locality of trace accesses
Future Work
• Further analysis on inter-access patterns
• Repeat analysis on traces from other VoW type experiments, cache traces ...