HPL- 03/22/22 Lucy Cherkasova H 1 Characterizing Locality, Evolution, and Life Span of Accesses in Enterprise Media Server Workloads Ludmila Cherkasova and Minaxi Gupta Hewlett-Packard Labs


Characterizing Locality, Evolution, and Life Span of Accesses in

Enterprise Media Server Workloads

Ludmila Cherkasova and Minaxi Gupta

Hewlett-Packard Labs


Introduction

Streaming media: a new wave of rich Internet content

Video is popular for:
• News
• Sports
• Entertainment
• Education
• Training

Enterprise media servers:
• Online advertisement
• Web marketing
• Customer interaction centers
• Collaboration
• Training


Challenges

Streaming media delivery challenges:
• Real time
• High bandwidth
• Large amount of storage
• Sensitivity to network congestion

Understanding the nature of media server workloads is crucial for properly provisioning current and future services


Related Work

Studies of educational workloads:
• Non-streaming multimedia stored on web servers (Acharya et al., 1998)
• mMod (multicast Media on demand) with a mix of educational and entertainment content (Acharya et al., 2000)
• eTeach and BIBS (Almeida et al., 2001)

Media proxy analysis (University of Washington, Chesire et al., 2001):
• Results showed very little locality: 78% of files are accessed once


Goals of Our Study

• Characterize access patterns for enterprise media servers
• Extract some QoS-related metrics for media sites (from the logs)
• Characterize locality properties and compare them with traditional web workload characterization
• Characterize evolution of site content and rate of changes on the site
  Two new metrics: new files impact and life span
• Characterize dynamics of the sites and growth trends
• Design a tool (MediaMetrics) for service providers


Data Collection Sites

HP Corporate Media Solutions server (HPC), for over 2.5 years: November 1998 to April 2001 (Windows Media Server)
• Video coverage of major events
• Keynote speeches, addresses, and presentations
• Meetings with industry analysts
• Promotional events and product introductions
• Demos of product usage

HPLabs Media Server (HPLabs), for 1 year 9 months: July 1999 to April 2001 (RealServer G2), internal server
• Coffee talks, prominent presentations, seminars, meetings
• Cooltown videos
• HP-wide business events, etc.


Media Server Log Formats

Media access logs record information about all requests and responses processed by the media server

Windows Media Server and RealServer G2 have different log formats

Typical (common) fields:
• Client IP address
• Timestamp of the request
• File name of the requested video
• The advertised duration of the video (in sec)
• The size of the requested file (in bytes)
• The elapsed time of the requested media file when the play ended
• The average bandwidth available to a client in Kb/sec (during the session)
• Number of bytes sent by the server
• Number of bytes received by the client, etc.


Media Sessions

• Clients can pause, rewind, fast forward, and skip using a slide bar
• A session is a sequence of client requests corresponding to the same file access
• Windows Media Server logs contain a separate entry for each client request (a session = multiple requests)
• RealServer logs did not have this information
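
As a rough illustration of how per-request log entries might be folded into sessions, the sketch below groups requests by client and file. The `Request` record, its field names, and the `max_gap` idle threshold are assumptions made for this example; the slides do not say how MediaMetrics delimits sessions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Request:
    client_ip: str
    file_name: str
    timestamp: float  # seconds; hypothetical field for this sketch


def group_sessions(requests: List[Request], max_gap: float = 1800.0) -> List[List[Request]]:
    """Group requests into sessions: same client, same file, idle gaps <= max_gap."""
    sessions: List[List[Request]] = []
    open_session: Dict[Tuple[str, str], int] = {}  # (client, file) -> index into sessions
    for req in sorted(requests, key=lambda r: r.timestamp):
        key = (req.client_ip, req.file_name)
        idx = open_session.get(key)
        if idx is not None and req.timestamp - sessions[idx][-1].timestamp <= max_gap:
            sessions[idx].append(req)  # continue the open session
        else:
            sessions.append([req])     # start a new session
            open_session[key] = len(sessions) - 1
    return sessions
```

The idle threshold is only there to separate repeat visits by the same client to the same file; any real tool would need to pick it from the data.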


Summary Statistics

                        HPC          HPLabs
Duration                29 months    21 months
Total sessions          666,074      14,489
Total requests          1,179,814    NA
Unique files            2,999        412
Unique clients          131,161      2,482
Storage requirement     42 GB        48 GB
Bytes transferred       2,664 GB     172 GB

In HPC, 471 files corresponded to live streams: we excluded them from further analysis


Files and Session Characteristics

Distribution of stored videos and percentage of corresponding client accesses to those files

• 42% - short videos (less than 10 min)
• 23% - medium video group (10-30 min)
• 34% - long videos (longer than 30 min)


• 11% - short videos (less than 10 min)
• 10% - medium video group (10-30 min)
• 79% - long videos (longer than 30 min)

Interesting observation: the client accesses are almost uniformly distributed across the 6 analyzed classes for both workloads

This is a very useful property for synthetic workload generation.


Session Duration Characterization

• 77-79% of sessions were less than 10 min long
• 7-12% of sessions were 10-30 min long
• 6-13% of sessions were longer than 30 min

Despite a significant difference in the type of content between the two workloads (in terms of file duration distribution), client viewing behavior was almost identical in both, which reflects the browsing nature of client behavior


Client Interactivity

Percentage of sessions with interactive requests for different file size classes.

99.9% of sessions with interactive requests were high-bandwidth sessions with available bandwidth greater than 56 Kb/s

Interactivity: 15.3% for short sessions, 22.6% for medium sessions, 62.2% for long sessions.


Encoding Rates and Available Bandwidth

• 59% of files are encoded at 56 Kb/s and lower
• 1999: 1.7% of the files encoded at a rate between 128 and 256 Kb/s
• 2001: 27.8% of the files encoded at a rate between 128 and 256 Kb/s

For most of the files, the encoding rate and the corresponding average bandwidth available to the user are well aligned.


67% of the files are encoded at 256Kb/s and higher.

The gap between the demand and the available bandwidth per session is very high.

The information provided by MediaMetrics could be used by service providers for choosing the right encoding rates.


Completed and Aborted Sessions

Completed sessions:
• 29% for HPC
• 12.6% for HPLabs

However, the available bandwidth did not differ much between completed and aborted sessions.

Most of the aborted sessions accessed initial segments of media files.

Incomplete sessions accessing a segment other than the beginning:
• 1.5% in the short video group
• 2.4% in the medium video group
• 4-7% in the long video group


QoS Related Observations

Media access logs report:
• Number of bytes sent by the server
• Number of bytes received by the client

MediaMetrics estimates the percentage of bytes lost during the file transfer to implicitly judge the QoS observed by the client

The lost-bytes estimate produces useful results when data is transmitted over UDP (the HPC server uses UDP, the HPLabs server uses TCP)

It might be less accurate for data transmitted over TCP:
• in the presence of congestion, the media server will retransmit part of the data to compensate for lost packets
• a difference between server-sent bytes and client-received bytes does not always result in worse QoS (due to buffering on the client side)

Two groups of media sessions:
• low-bandwidth sessions (with available bandwidth less than 56 Kb/s)
• high-bandwidth sessions (with available bandwidth greater than 56 Kb/s)
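
A minimal sketch of the two per-session computations described above, assuming the loss estimate is simply (bytes sent - bytes received) / bytes sent; the slides do not give the exact formula used by MediaMetrics. The function names are hypothetical, and a session at exactly 56 Kb/s is placed in the low-bandwidth group by assumption, since the slides leave that boundary open.

```python
def bytes_lost_percent(bytes_sent: int, bytes_received: int) -> float:
    """Estimated percentage of bytes lost in a session (assumed formula)."""
    if bytes_sent <= 0:
        return 0.0
    # Clamp at zero: a client may report more bytes than the server counted.
    lost = max(bytes_sent - bytes_received, 0)
    return 100.0 * lost / bytes_sent


def bandwidth_group(avg_bandwidth_kbps: float, threshold_kbps: float = 56.0) -> str:
    """Classify a session as 'low' or 'high' bandwidth around the 56 Kb/s threshold."""
    return "high" if avg_bandwidth_kbps > threshold_kbps else "low"
```

For TCP sessions the result should be read with the caveats listed above, because retransmissions inflate the server-side byte count.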


QoS Related Observations

• HPC had 61% of high-bandwidth sessions
• HPLabs had 23% of high-bandwidth sessions
• High-bandwidth sessions transferred 4-6 times more bytes
• HPC workload: QoS observed by low- and high-bandwidth sessions was practically the same:
  • 96.5% of low-bandwidth sessions had 0-5% of bytes lost per session
  • 97.1% of high-bandwidth sessions had 0-5% of bytes lost per session
• HPLabs workload QoS:
  • 64.6% of low-bandwidth sessions had 0-5% of bytes lost per session
  • 88.8% of high-bandwidth sessions had 0-5% of bytes lost per session
• This stresses the essential role of available bandwidth for media sessions over TCP


Locality Characterization

Locality invariant for web server workloads: 10% of most popular files account for 90% of all requests and 90% of all bytes transferred

HPC: 90% of media sessions target 14% of the files
HPLabs: 90% of media sessions target 30% of the files

HPC: sessions to 14% of most popular files transfer 94% of bytes
HPLabs: sessions to 30% of most popular files transfer 92% of bytes

Conclusion: locality invariant is applicable for media workloads too!

HPC: 14% of the most popular files are accessed by 96% of clients
HPLabs: 30% of the most popular files are accessed by 97% of clients
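
The locality figures above can be reproduced from a list of sessions with a short computation. The sketch below is one possible way to do it, not the MediaMetrics implementation: it takes the file name of every session (a hypothetical input format) and returns the smallest fraction of accessed files, most popular first, that covers a target share of sessions.

```python
from collections import Counter
from typing import Iterable


def files_fraction_for_sessions(session_files: Iterable[str], target: float = 0.90) -> float:
    """Fraction of accessed files (most popular first) needed to cover `target` of sessions."""
    counts = Counter(session_files)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = 0
    for n_files, (_, sessions) in enumerate(counts.most_common(), start=1):
        covered += sessions
        if covered >= target * total:
            return n_files / len(counts)
    return 1.0
```

Note that the denominator here is the number of files that appear in the log; whether the slides count accessed files or all stored files is not stated.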


Locality from System Resource Usage Angle

Let us define the active storage set as the combined size of all the media files accessed in the logs

80% to 88% of sessions are to files that constitute only 20% of the active storage set

82% to 92% of all transferred bytes are due to the “most popular” files that constitute only 20% of the active storage set

These normalized metrics are useful for estimating storage requirements and potential bandwidth savings when designing or applying optimization techniques
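
One way to compute the storage-normalized metric is sketched below. It assumes files are taken in order of popularity until the next file would exceed the storage budget (20% of the active storage set); the slides do not spell out this selection rule, and the input mapping of file name to (size, session count) is a hypothetical format.

```python
from typing import Mapping, Tuple


def session_share_for_storage(files: Mapping[str, Tuple[int, int]],
                              storage_fraction: float = 0.20) -> float:
    """files: name -> (size_bytes, session_count).

    Returns the share of sessions going to the most popular files that
    together fit within `storage_fraction` of the active storage set.
    """
    total_size = sum(size for size, _ in files.values())
    total_sessions = sum(n for _, n in files.values())
    if total_size == 0 or total_sessions == 0:
        return 0.0
    budget = storage_fraction * total_size
    used = 0
    covered = 0
    # Walk files from most to least popular; stop at the first file that does not fit.
    for size, n in sorted(files.values(), key=lambda f: f[1], reverse=True):
        if used + size > budget:
            break
        used += size
        covered += n
    return covered / total_sessions
```

Replacing the session count with bytes transferred per file gives the corresponding bytes-based figure.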


Zipf or Not a Zipf?

Zipf-like distributions were observed for web server and web proxy workloads, and were also reported in a recent study of a media proxy workload

Zipf-like distribution: the popularity of the i-th most popular file is proportional to $1/i^{\alpha}$

Distribution of the file access frequencies (file popularities) for entire duration of the log – not a Zipf!

Question: does it depend on log duration?


Web servers: typical value of alpha varies between 1.4 and 1.6
Web proxies: typical value of alpha is less than 1; it varies between 0.64 and 0.83
Media proxies: alpha = 0.47

HPLabs media server: six month periods can be approximated with Zipf-like distribution and alpha=1.6


HPC media server: file popularity on a monthly basis can be approximated with a Zipf-like distribution and alpha=1.5

For different months, alpha varies between 1.4 and 1.6.

These observations are very useful for synthetic workload generation.
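
To make the role of alpha concrete, the sketch below estimates it from per-file access counts with an ordinary least-squares fit of log(frequency) against log(rank). This is a common and simple fitting approach chosen for illustration; the slides do not say which method was used to obtain the reported values, and the function name is hypothetical.

```python
import math
from typing import Sequence


def estimate_zipf_alpha(access_counts: Sequence[float]) -> float:
    """Estimate alpha assuming frequency of the i-th most popular file ~ 1 / i**alpha."""
    freqs = sorted((c for c in access_counts if c > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two files with non-zero access counts")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The slope of the log-log line is -alpha.
    return -sxy / sxx
```

Running the fit separately per month or per six-month window, rather than over the whole log, matches the way the results above are reported.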


File Sharing Statistics

Both workloads exhibit a high degree of file sharing among clients!

HPC: 70 most popular files are accessed by more than 1000 clients, with some most popular files accessed by 10,000-12,000 clients

HPLabs: 17 most popular files are accessed by 113-341 unique clients


Rarely Accessed Files Statistics

            Files requested up to     Storage requirements for
            1 / 5 / 10 times          corresponding files
HPC         16% / 38% / 47%           10% / 26% / 34%
HPLabs      19% / 45% / 59%           17% / 39% / 52%

• These numbers are lower than similar statistics for web server workloads

• For web server workloads, “onetimers” may account for 20% to 40% of the files and 20% to 40% of the active storage
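
The two columns of the table above can be derived from per-file sizes and request counts. The following is a small sketch under the assumption that both shares are taken relative to the files seen in the log; the input format and the function name are hypothetical.

```python
from typing import Mapping, Tuple


def rarely_accessed_share(files: Mapping[str, Tuple[int, int]],
                          max_requests: int) -> Tuple[float, float]:
    """files: name -> (size_bytes, request_count).

    Returns (share of files requested at most `max_requests` times,
             share of storage those files occupy).
    """
    if not files:
        return 0.0, 0.0
    rare = [(size, n) for size, n in files.values() if n <= max_requests]
    total_size = sum(size for size, _ in files.values())
    file_share = len(rare) / len(files)
    storage_share = sum(size for size, _ in rare) / total_size if total_size else 0.0
    return file_share, storage_share
```

Calling it with `max_requests` set to 1, 5, and 10 yields the three values reported per workload.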


Dynamics and Evolution of Media Sites

Burstiness

On some days the number of sessions is two orders of magnitude higher, for both workloads

For enterprise web server workloads, the daily traffic volume is much more predictable

Studies of educational media server workloads showed a lower degree of burstiness, more correlated with the day of the week


New Files Impact (HPC)

We define a file as new if it was never accessed before (based on the information in access logs)

Our intent: to observe the site's dynamics and evolution due to new files

The HPC site shows a clear growth trend in the total number of files accessed per month, and a consistently steady number of new files added to the site monthly.
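
The definition of a new file translates directly into a per-month computation. The sketch below treats a file as new in the month of its first logged access and reports, for each month, how many files were new and what share of that month's sessions went to them. The (month, file name) input pairs and the function name are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def new_files_impact(sessions: List[Tuple[str, str]]) -> Dict[str, Tuple[int, float]]:
    """sessions: (month, file_name) pairs, month as a sortable string such as '1999-07'.

    Returns month -> (number of new files, share of that month's sessions to new files).
    """
    # First pass: the month in which each file is first seen in the log.
    first_month: Dict[str, str] = {}
    for month, name in sessions:
        if name not in first_month or month < first_month[name]:
            first_month[name] = month

    # Second pass: count all sessions and the sessions that target new files.
    totals = defaultdict(int)
    to_new = defaultdict(int)
    new_files = defaultdict(set)
    for month, name in sessions:
        totals[month] += 1
        if first_month[name] == month:
            to_new[month] += 1
            new_files[month].add(name)

    return {m: (len(new_files[m]), to_new[m] / totals[m]) for m in sorted(totals)}
```

The first month of any log is trivially all new files, so it is usually sensible to ignore it when reading trends.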


New Files Impact (HPLabs)

The growth of total number of files accessed each month for HPLabs is negative!?

We asked the support team: any specific reasons?

Our suspicion: is there a significant number of files that “nobody watches”? Or did the amount of new media content on that site actually decrease over time?

The team confirmed that only a limited number of new files had been added lately because of a transition plan to upgrade the entire site design and equipment

So, the negative trend was observed correctly.


New Files Impact (Unique Clients)

These graphs are again correlated with the trends of the sessions to new files!

Conclusion: the number of new files added per month plays a crucial role in defining the site dynamics, evolution, and growth rates!


New Trends Over Time

Analysis of HPC workload over time revealed interesting overall trends in site media content and session characteristics

The total number of unique clients accessing media content in each 6-month period doubled over the duration of our logs.

The total number of sessions in each 6-month period also doubled over the duration of our logs.

The average file size in each 6-month period increased from less than 7MB to more than 20MB in our logs.

Bytes transferred per session increased from just over 1MB to over 6MB in our logs.


New Files Impact (conclusion)

• The access pattern of enterprise media servers resembles the access patterns of news web sites: most of the client monthly accesses (50-80%) target newly added information.

• The dynamics of enterprise web sites exhibit much more stability: only 2% of monthly requests are to the new files.


Life Span of File Accesses

Question: how much do the popularity of a file and the frequency of accesses to it change over time?

Enterprise media server workloads exhibit high locality of references: 90% of media sessions target only 14%-30% of the files

We define the core-90% as the set of most frequently accessed files that accounts for 90% of all the media sessions (this is the performance-critical set of files)

Life duration of a file: time between the first and the last accesses to this file in the considered workload.
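
The life duration definition can be sketched in a few lines. The input here is a hypothetical iterable of (file name, timestamp) pairs, one per access, with timestamps in any numeric unit.

```python
from typing import Dict, Iterable, Tuple


def life_durations(accesses: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Return file name -> time between its first and last access in the log."""
    first: Dict[str, float] = {}
    last: Dict[str, float] = {}
    for name, t in accesses:
        first[name] = min(t, first.get(name, t))
        last[name] = max(t, last.get(name, t))
    return {name: last[name] - first[name] for name in first}
```

A file accessed only once has a life duration of zero under this definition.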


Life Duration of the Files

High percentage of short-lived files:
• HPC: 37% of all files live less than a month
• HPLabs: 50% of all files live less than a month
• 73% of the files live less than 6 months for both workloads
• Only 8-10% of the files live longer than a year

Question: what is the density of accesses over time?

The plotted histograms for the most frequently accessed files had a lognormal-like curve, with most accesses occurring during the first 1-3 weeks after the file's introduction.


Life Span Metric

Life span metric: cumulative distribution of accesses to the files since their introduction at a site.

                      HPC     HPLabs
First week:           52%     51%
Second week:          16%     10%
Third week:           6%      5%
4th and 5th weeks:    3%      1%

Enterprise media servers exhibit access patterns similar to news web sites:
• most accesses are to new documents, and
• after a certain time period these documents are accessed very rarely
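
A sketch of how the per-week shares behind the life span metric might be computed. It assumes a file's introduction time is its first logged access and that timestamps are in seconds; both are assumptions of this example, as are the names used. The function returns the share of all accesses falling in each week since introduction (week 0 is the first week); a running sum of these shares gives the cumulative form.

```python
from collections import Counter
from typing import Dict, List, Tuple

SECONDS_PER_WEEK = 7 * 24 * 3600


def life_span_distribution(accesses: List[Tuple[str, float]]) -> Dict[int, float]:
    """accesses: (file_name, timestamp_seconds) pairs.

    Returns week index since the file's first access -> share of all accesses.
    """
    first: Dict[str, float] = {}
    for name, t in accesses:
        first[name] = min(t, first.get(name, t))
    weeks = Counter(int((t - first[name]) // SECONDS_PER_WEEK) for name, t in accesses)
    total = sum(weeks.values())
    if total == 0:
        return {}
    return {week: n / total for week, n in sorted(weeks.items())}
```

Because every file is aligned to its own introduction time, files added at different dates contribute to the same week buckets.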


Rate of Change

Life span is a normalized metric: the files could have been individually introduced at different times.

The metric reflects the rate of change of the files during their existence at the site.

The life span metric reflects the timeliness of the introduced files: a longer life span means that information at the site is less timely and has a more consistent share of accesses over time.

Life span metric allows one to interpolate the intensity of the client accesses over time to the new and existing files over a future period of time.


Conclusion

Media server access logs are an invaluable source of information about traffic access patterns and system resource requirements

MediaMetrics was specially designed for service providers and system administrators to understand the nature of traffic to their media sites

Our analysis established a set of invariants specific to enterprise media server workloads and compared them with well-known related invariants and observations for web server workloads


Acknowledgments

Both the tool and the study would not have been possible without the media access logs and help provided by Nic Lyons, Wray Smallwood, Brett Bausk, Magnus Karlsson, Wenting Tang, Yun Fu, John Apostolopoulos, and Susie Wee.

Their help is highly appreciated.