
MoBed: A Mobile Test-Bed for Investigating Web Access Solutions for J2ME-enabled devices

By Mildred Ambe

Supervisors: Eleni Stroulia, Yannis Nikolaidis

January 23, 2004

Presentation Outline

Motivation

Introduction

Related Research

Client Baseline Architecture

MoBed Client-Proxy Architecture

Empirical Evaluation

Research contributions

Future work

Conclusion

Life is becoming more mobile!

Wireless devices

Features

Compact, mobile/wireless, customizable to user needs, no time/place restrictions

Use

Instant messaging, voicemail, digital/video recording, calendar, email, Web browsing (access to news, corporate data), etc.

Constraints

Small screen/keypad, limited memory and processing speed, unreliable connections (high bit error rate, low bandwidth)

Manufacturers

Nokia, Motorola, Samsung, Mitsubishi, Palm, Sony Ericsson, Siemens, etc.

Why J2ME?

J2ME (Java 2 Micro Edition):

Version of Java targeting software development for smaller devices, e.g. pagers, PDAs, phones, etc.

Increasing in popularity and widely adopted as a platform for delivering wireless Web services. Why?

– Uses Java as the programming language for software development

– Addresses a wide range of devices:

Low-end devices: e.g. cell phones, pagers, PDAs, etc.

High-end devices: e.g. Internet TVs, high-end wireless entertainment/navigation systems, etc.

Basics of J2ME Architecture

[Diagram: the Java editions side by side. J2EE (server) and J2SE (desktop) run on the JVM. J2ME covers high-end devices (profiles on top of CDC, running on the JVM) and low-end devices (MIDP on top of CLDC, running on the KVM).]

J2ME Concepts: Profile and Configuration

Profile: Vertical device family. Extends a configuration.

Configuration: Horizontal grouping of devices

KVM: Compact JVM, small processor/memory footprint

J2ME vs. Other Java Editions:

[Diagram: J2ME stack for low-end devices, top to bottom: MIDP, CLDC, K Virtual Machine, Operating System.]

The Research Problem

Java 2 Platform, Micro Edition (J2ME)

– Emerges as the standard for the fast-growing wireless Web industry

– Positions itself as the best solution for an extremely wide range of small devices [LG-J2ME].

A general test-bed is required for J2ME

– Addressing low bandwidth connectivity.

– Improving performance by locally caching accessed and anticipated web-based content, for future requests.

“an intelligent ‘Client-Proxy-Server’ test-bed architecture for flexibly combining caching and prefetching schemes while separating the mobile-resident and proxy-resident functionality”

Related Research

Web Caching:

– Soft Caching Proxy System [KKO98] – Caching a ‘lower version’ of a web object at a proxy.

– Temporal locality exploitation of HTTP requests [Dej99] – Estimating probabilities for predicting the time of next access of same web object.

Transcoding Web Data (via distillation/refinement):

– Load balancing resource locator for proxies [Cha95] – Centralized server maintained for intelligently managing proxies and allocating transcoders to different proxies.

– Using real-time distillation to reduce WWW latency and bandwidth requirements [FB95] – Different data representations created from multiple distillers.

Related Research

J2ME and the Web:

– J2ME and J2EE client-server applications [Hem02] – J2ME applications that interact with an enterprise server face challenges that traditional client/server and browser applications do not:

• Mobile applications show dependence on the network

• Mobile device constraints: high latency, low bandwidth, poor connectivity.

– Using J2ME / J2EE together [SUN03] – Guidelines for designing wireless clients for enterprise applications using J2ME and J2EE technology:

• HTTP networking bridges gap between J2ME device and J2EE application server

• Three aspects to networked wireless applications: client-side architecture, messaging and presentation.

Related Research (c.)

Web access on Mobile devices:

– ‘Scalable Browser’ [CM03] – Fetch on-demand; progressive content rendering to client; display on-demand navigation style

– ‘m-Links’ [STHK03] – Browsing architecture designed to achieve web-navigation via “dig and do” model; partial content delivery via link hierarchies.

– Text summarization for Web browsing [BGMP00] – Summarizing web pages into ‘STUs’ by extraction/summary techniques; display via progressive disclosure.

– Proxies for mobile Web access [Sab97] – Use proxies to control information flow from mobile client to server.

Related Research (c.)

Caching and/or prefetching using proxies:

– Prefetching based on content analysis [HBA99] – ‘Proxy-initiated prefetching’ through collected user access logs and user interest profiles.

– Top-10 approach to prefetching [MC98] – Client-proxy-server framework; prefetching based on clients’ access profiles & their top 10 popular docs.

– Predictive prefetching [PM96] – Study showing benefits of making prefetching-related decisions from proxy (with local cache).

– Coordinated data prefetching [CZ01] – Coordinated proxy prefetching technique using reference access information; prefetching at proxy or server level.

Client Baseline Architecture

A Web browser application (MIDlet) was developed to access and display web content (static pages and dynamic forms).

The following browser functionality was implemented entirely on the mobile device:

• user interface generation/maintenance

• web data retrieval

• HTML parsing of data

Very inefficient due to limited device memory and the slow connections to the network.

Demonstrates that simply porting an application suitable for wired desktops is not sufficient for wireless clients – re-architecture is needed.

Assumed as the “reference point” for future research comparisons.

Client Baseline Architecture (c.)

[Diagram: the Browser MIDlet runs on an emulated client in the J2ME Wireless Toolkit. Flow: run the application → generate user interface → Request (url): connect to WWW server and retrieve URL → read in URL bytes from the Web Server → parse the HTML data → render parsed HTML data into displayable format → update user interface to display Web page.]

[Screenshots: a) Initial User Interface; b) Interface with requested URL (enter URL); c) Web Content displayed (fetch, parse, render Web content); d) Possible Browser Operations (content navigation).]


HTML Parser

Easily adaptable, open source Java HTML Parser from Kizna Corp. [HP1.1]

Main classes:

– HTML Node interface

• Implemented by all types of HTML tags

– HTML Parser & HTML Reader classes

• Parser ⇒ Opens connection to resource; invokes HTML Reader

• Reader ⇒ Reads data from input stream; invokes HTML Tag class

– HTML Tag classes

• Represent given HTML tags e.g. links, images, titles, etc.

– HTML Scanner classes

• Work with HTML Tag classes – identify & extract relevant information from tag

HTML Parser (c.)

Each HTML scanner class is matched to its corresponding HTML Tag class. The scanners parse through all tags in the requested HTML page source, identify the tags, and extract information.

Example

HTML Link tag: <a href = http://www.ualberta.ca> University of Alberta </a>

Parsed, information extracted ⇒ HTML Link Node

Tag Contents: “University of Alberta”

Link Destination: “http://www.ualberta.ca”
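The link example can be reproduced in a few lines. The sketch below is not the Kizna parser used in MoBed; it uses Python's standard `html.parser` as a stand-in, and the class name `LinkScanner` is hypothetical. The attribute value is quoted here for simplicity.

```python
from html.parser import HTMLParser


class LinkScanner(HTMLParser):
    """Collects (tag contents, link destination) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished link nodes
        self._href = None    # destination of the <a> tag being read
        self._text = []      # text seen inside the current <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with a destination.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


scanner = LinkScanner()
scanner.feed('<a href="http://www.ualberta.ca"> University of Alberta </a>')
scanner.close()
print(scanner.links)
```

Each tuple plays the role of one HTML Link Node: tag contents first, link destination second.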

Baseline Architecture Evaluation

Dataset:

– Small set of 116 distinct URLs obtained from server logs from the U. of Alberta CS website.

Experiment:

– ‘Driver MIDlet’: designed to show the state of an emulated mobile client after a request is issued from dataset.

– MIDlet monitored client available heap size before/after resource fetching and rendering on-device.

Sample Driver output:

Starting Driver ... 482568 bytes
URI = http://www.cs.ualberta.ca/
Number of HTMLNodes = 389
Start of FETCH method = 464848 FREE bytes
End of FETCH Method = 209932 USED bytes
After Fetch and Render = 259916 USED bytes.

Annotations:

– MIDlet startup: Heap ≅ 500 kB

– URI: requested URL (from dataset)

– Number of HTMLNodes: “parse-able” tags from requested page

– Bytes consumed in FETCH process = (464,848 – 209,932) B = 254,916 B

– Bytes consumed in RENDER process = (259,916 – 209,932) B = 49,984 B

Baseline Architecture Evaluation (c.)

Results and Observations

Required heap size = ƒ(# HTML nodes)

– Required memory increases with # HTML nodes processed for each page.

– More memory required for “fetch” process as compared to “render” stage.

– Heap size → only an estimate of memory cost.

Fetch & render time = ƒ(# HTML nodes)

– Required “fetch/render” time increases with # HTML nodes for each page.

– ‘Render time’ → much less than ‘fetch time’ (only a subset of created HTMLNodes are useful for display).

MoBed Client-Proxy Architecture

Two main components: Mobile client and Proxy server.

(a) Mobile Client:

[Diagram: the Mobile Client runs as an emulated client in the J2ME Wireless Toolkit and contains a Request Dispatcher and a GUI Builder. Flow: display browser screen → get URL → Request (URL) to the Proxy Server (P) → transcoded content bytes returned → decode & render page bytes → UI update.]

MoBed Client-Proxy Architecture

(b) Proxy Server:

[Diagram: the Proxy sits between the emulated client and the Web Server and contains a Proxy Controller, Session Tracker, Client Session Registry, Proxy Cache, HTML Parser and Transcoder.]

Proxy Controller:

• Open socket connection to client.

• Listen, retrieve client requests

• Initiate Cache, Client Registry

Handling a Request (URL):

• Session Tracker: obtain client IP → update client session

• Check Cache (URL)

• Y: transcoded requested bytes are returned to the client

• N: init Parser → connect to Web → retrieve URL (data bytes) → HTMLNodes creation → collection → transcode HTML nodes → cache update → transcoded requested bytes are returned to the client

Proxy Transcoder

“Transcoding” – conversion of selected HTML nodes into a compressed representation for mobile client

HTML Parser (IN: data bytes; OUT: HTML Node)

– Parse data

– Build HTML nodes

Transcoder (IN: HTML Node; OUT: persisted byte[ ])

– If ( HTMLNode is ‘useful’ ): extract from HTML Node

– Build list of ‘PageNodes’

– ‘Persist’ the list

PageNodes:

- Small package of classes residing on Client & Proxy.

- Every Page node → corresponds to a given HTML node.

- Page nodes → stored in Vector

- Serialize (Vector) → write contents to byte array output stream → persisted byte[ ].
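A minimal sketch of the transcoding steps above, under stated assumptions: nodes are modelled as plain dictionaries, the set of ‘useful’ node kinds is invented for illustration (the slides do not list it), and JSON stands in for the Java serialization of the Vector.

```python
import json

# Assumption: which node kinds count as 'useful' is not given in the slides.
USEFUL_KINDS = {"title", "text", "link"}


def transcode(html_nodes):
    """Keep useful HTML nodes, turn them into page nodes, persist as bytes."""
    page_nodes = [
        {"kind": n["kind"], "text": n["text"].strip(), "href": n.get("href", "")}
        for n in html_nodes
        if n["kind"] in USEFUL_KINDS
    ]
    return json.dumps(page_nodes).encode("utf-8")


def decode(persisted):
    """Client side: rebuild the page node list from the persisted bytes."""
    return json.loads(persisted.decode("utf-8"))


html_nodes = [
    {"kind": "title", "text": "University of Alberta"},
    {"kind": "remark", "text": "<!-- layout table -->"},
    {"kind": "link", "text": " University of Alberta ", "href": "http://www.ualberta.ca"},
    {"kind": "endtag", "text": "</a>"},
]
persisted = transcode(html_nodes)
page = decode(persisted)
print(len(html_nodes), "HTML nodes ->", len(page), "page nodes,", len(persisted), "bytes")
```

Because the same small node format is understood on both sides, the client only needs `decode`, not a full HTML parser.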

Proxy Cache

Collection of previously accessed resources from client requests

– Updated with new client requests over time

– Could be managed by various cache replacement policies (e.g. LRU, LFU)

– Resources stored → static documents, images, NO dynamic content

Adding an ‘element’ to cache:

– URL: “http://www.cs.ualberta.ca/”; transcoded bytes: 2342

– MD5 Hasher: URL → hash value “7j7gh7f755f554g6d4”

– Store hash → bytes: “7j7gh7f54g6d4” → Byte[2342]

Hashing algorithm ⇒ converts URL to unique hash code.
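As an illustration of the hash-to-bytes store, here is a small sketch using Python's `hashlib`. The class and method names are hypothetical, and a real MD5 digest is 32 hexadecimal characters rather than the shortened string shown on the slide.

```python
import hashlib


class ProxyCache:
    """Maps the MD5 hash of a URL to its transcoded bytes."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key_for(url):
        # MD5 is used only to derive a compact key, not for security.
        return hashlib.md5(url.encode("utf-8")).hexdigest()

    def put(self, url, transcoded):
        self._store[self.key_for(url)] = transcoded

    def get(self, url):
        # Returns None on a cache miss.
        return self._store.get(self.key_for(url))


cache = ProxyCache()
url = "http://www.cs.ualberta.ca/"
cache.put(url, bytes(2342))  # placeholder payload of 2342 bytes
print(ProxyCache.key_for(url), len(cache.get(url)))
```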

Proxy Cache

Cache management:

– Cope with limited resources of a caching proxy using cache eviction policies → to determine what documents should be replaced when cache is full.

– Variety of eviction parameters, e.g. document size, access frequency, etc.

– 2 replacement schemes investigated in MoBed:

• Least Recently Used (LRU)

– When cache is full, the least recently used document is replaced with the new document to be stored.

– Cache → collection of documents sorted by cache access times.

• Least Frequently Used (LFU)

– When cache is full, the least frequently accessed document is replaced with the new document to be stored.

– Cache → collection of documents sorted by total number of accesses.
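The two policies can be contrasted with a short sketch. This is an illustrative implementation, not MoBed's code; capacity is counted in documents, and ties in LFU are broken by insertion order, which is an assumption.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._docs = OrderedDict()  # least recently used entry first

    def get(self, key):
        if key not in self._docs:
            return None
        self._docs.move_to_end(key)  # mark as most recently used
        return self._docs[key]

    def put(self, key, doc):
        if key in self._docs:
            self._docs.move_to_end(key)
        elif len(self._docs) >= self.capacity:
            self._docs.popitem(last=False)  # evict least recently used
        self._docs[key] = doc


class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._docs = {}
        self._hits = {}  # key -> total number of accesses

    def get(self, key):
        if key not in self._docs:
            return None
        self._hits[key] += 1
        return self._docs[key]

    def put(self, key, doc):
        if key not in self._docs and len(self._docs) >= self.capacity:
            victim = min(self._hits, key=self._hits.get)  # fewest accesses
            del self._docs[victim], self._hits[victim]
        self._docs[key] = doc
        self._hits[key] = self._hits.get(key, 0) + 1


lru, lfu = LRUCache(2), LFUCache(2)
for cache in (lru, lfu):
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")
    cache.get("a")
    cache.get("b")
    cache.put("c", 3)  # cache is full, so one document must go

print("LRU keeps b:", lru.get("b"), "and drops a:", lru.get("a"))
print("LFU keeps a:", lfu.get("a"), "and drops b:", lfu.get("b"))
```

With the same access sequence, LRU evicts "a" (touched longest ago) while LFU evicts "b" (touched least often).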

Session Tracker

Maintains a registry of all clients and their corresponding access histories (sessions) within a pre-defined time frame.

– Useful heuristic for prefetching based on users’ access histories.

Example: Updating a user’s access history:

Incoming request: << IP - 123.4.567; time - t1; URL 4 >>

– Check IP_Session_Registry for IP 123.4.567 and update its session:

From:
Client IP    Session
23.45.123    34 56 78
123.4.567    100

To:
Client IP    Session
23.45.123    34 56 78
123.4.567    100 4

– IP_Time_Registry: update the latest request time for the IP (IP 123.4.567 requests URL # 4 at time t1).
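The registry update can be sketched as two dictionaries keyed by client IP. The class name, method names and the 30-minute time frame are assumptions made for illustration; the slide only says the time frame is pre-defined.

```python
SESSION_TIMEOUT = 30 * 60  # seconds; assumed value for the pre-defined time frame


class SessionTracker:
    def __init__(self, timeout=SESSION_TIMEOUT):
        self.timeout = timeout
        self.ip_session_registry = {}  # IP -> list of URL numbers
        self.ip_time_registry = {}     # IP -> time of latest request

    def record(self, ip, url_id, now):
        """Append a request to the client's session, starting a new one if stale."""
        last = self.ip_time_registry.get(ip)
        if last is None or now - last > self.timeout:
            self.ip_session_registry[ip] = []
        self.ip_session_registry[ip].append(url_id)
        self.ip_time_registry[ip] = now

    def current_session(self, ip):
        return list(self.ip_session_registry.get(ip, []))


tracker = SessionTracker()
for url_id in (34, 56, 78):
    tracker.record("23.45.123", url_id, now=0)
tracker.record("123.4.567", 100, now=50)
tracker.record("123.4.567", 4, now=60)  # the request from the slide, at time t1
print(tracker.current_session("123.4.567"))
```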

Prediction Engine

Prediction → An informed guess of future request(s).

Prefetching scheme investigated - Prediction using Path Profiles [SKS98].

Algorithm → generates predictions based on efficient path profiles built from previous user requests (obtained from standard HTTP logs).

Algorithm terminology:

– Path → ordered sequence of URLs accessed by a single user

– User session → Path describing ordered, full set of requests within time frame

– Path profile → Set of pairs, each pair holding a path and its occurrence frequency

– Path Tree → Generated from user paths.

Recall: “One main goal of MoBed → determine how prefetching can improve mobile Web access → implement a prediction scheme to achieve this.”

Prediction Engine

3 main steps in using this prediction scheme:

1. Generate user sessions

2. Construct Path tree from user sessions (storing profiles).

3. Make predictions using path tree

– Condense path profiles from path tree

– Use condensed path profiles for predictions

Example illustrating user sessions and path profiles:

User sessions (combine requests):
User1: 1.html, 3.html, 2.html
User2: 1.html, 3.html, 3.html

Session traces:
User1: 1→3→2
User2: 1→3→3

Path profile (∀ session traces), built from all user sessions:
Path      Frequency
1→3       2
1→3→2     1
1→3→3     1
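The example profile can be reproduced by counting every session prefix of at least two requests. That reading is an interpretation of the slide's table, not a statement taken from [SKS98]; the function name is hypothetical.

```python
from collections import Counter


def build_path_profile(sessions, min_len=2):
    """Count each session prefix of at least min_len requests."""
    profile = Counter()
    for trace in sessions:
        for end in range(min_len, len(trace) + 1):
            profile[tuple(trace[:end])] += 1
    return profile


sessions = [[1, 3, 2], [1, 3, 3]]  # User1 and User2 from the example
profile = build_path_profile(sessions)
for path, freq in sorted(profile.items()):
    print("→".join(map(str, path)), freq)
```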

Step 1: Generating User sessions

Perform server-side profiling → generate sessions from standard HTTP server logs with large number of user accesses.

– Most HTTP log requests contain → date, access time, client IP address, requested file name, request parameter fields.

– Requested URLs are compressed → mapped to a unique integer (Session = collection of integers).

– Concept of a ‘user’ → clients identified by their IP addresses.

– User session generation → requests separated by 30 minutes are NOT included in the same session:

• Why?: Handle cases where users browse pages on other sites in between accesses to the server.
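A sketch of session generation under these rules. The log entries below are made up, the tuple layout (IP, timestamp, URL) is a simplification of a real HTTP log line, and treating a gap of exactly 30 minutes as a session break is an assumption.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)


def generate_sessions(log_entries, gap=SESSION_GAP):
    """Split (ip, time, url) log entries into per-client sessions of URL numbers."""
    url_ids = {}    # URL -> unique integer
    last_seen = {}  # IP -> time of that client's previous request
    current = {}    # IP -> the session list still being extended
    sessions = []   # (ip, [url numbers]) in order of session start
    for ip, when, url in sorted(log_entries, key=lambda e: e[1]):
        url_id = url_ids.setdefault(url, len(url_ids) + 1)
        if ip not in last_seen or when - last_seen[ip] >= gap:
            current[ip] = []
            sessions.append((ip, current[ip]))
        current[ip].append(url_id)
        last_seen[ip] = when
    return sessions, url_ids


t0 = datetime(2001, 9, 1, 3, 0, 0)
log = [  # hypothetical entries
    ("10.0.0.1", t0, "/a.html"),
    ("10.0.0.1", t0 + timedelta(minutes=5), "/b.html"),
    ("10.0.0.2", t0 + timedelta(minutes=6), "/a.html"),
    ("10.0.0.1", t0 + timedelta(minutes=50), "/a.html"),
]
sessions, url_ids = generate_sessions(log)
print(sessions)
```

The last request from 10.0.0.1 arrives 45 minutes after its previous one, so it opens a new session.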

Step 2: Constructing the Path tree

Use the generated user sessions to construct the path tree (shown in [SKS98]).

Path tree characteristics:

– Begins with root node, and holds nodes of varied number of children.

– Downward traversal of path tree ≡ walking through a path of URLs

– To control tree size → set a threshold value (T-value) that restricts node expansion

Example:

Session traces:
User1: 1→3→2
User2: 1→3→5
User3: 3→5

Build profile (profile stored in Path Tree):

Path Profile
Path      Frequency
1→3       2
1→3→2     1
1→3→5     1
3→5       1

[Diagram: PathTree starting at Root*; each node holds a URL and an occurrence count. For example, the node reached by the path 1→3→5 has occurrence count 1, and the node reached by the path 3→5 has occurrence count 1.]
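One way to hold the profile is a trie in which every node counts how many sessions passed through it. The sketch below reproduces the example table; the class and function names are hypothetical, and the T-value threshold is deliberately left out because the slides do not define how it restricts expansion.

```python
class PathNode:
    def __init__(self):
        self.count = 0      # number of sessions that reached this node
        self.children = {}  # URL number -> PathNode


def build_path_tree(sessions):
    """Insert each session trace from the root, counting visits per node."""
    root = PathNode()
    for trace in sessions:
        node = root
        for url in trace:
            node = node.children.setdefault(url, PathNode())
            node.count += 1
    return root


def extract_profiles(node, prefix=(), min_len=2):
    """Walk the tree downwards and return {path: occurrence count}."""
    profiles = {}
    for url, child in node.children.items():
        path = prefix + (url,)
        if len(path) >= min_len:
            profiles[path] = child.count
        profiles.update(extract_profiles(child, path, min_len))
    return profiles


tree = build_path_tree([[1, 3, 2], [1, 3, 5], [3, 5]])
profiles = extract_profiles(tree)
for path, count in sorted(profiles.items()):
    print("→".join(map(str, path)), count)
```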

Step 2.1 - Condensing Path profiles

Motivation: Remove paths not needed for prediction.

Method:

– Extract all profiles from path tree [ path + occurrence count ]

– ∀ paths → separate last URL (prediction) from rest of path (history path).

– Reverse the history paths & sort all entries by reversed path

– If (2 entries have same history path) → remove path with smaller count.

Example:

Path profiles (path, count):
a b c d, 5
z b, 20
d e f, 7

Separate prediction from path (history path, prediction, count):
a b c, d, 5
z, b, 20
d e, f, 7

Reverse history paths:
c b a, d, 5
z, b, 20
e d, f, 7

Sort all entries → condensed path profiles:
c b a, d, 5
e d, f, 7
z, b, 20
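The condensing method maps directly onto a few lines of code. This is a sketch with a hypothetical function name; paths are tuples of URL labels, and when two entries share a history path and a count, the first one seen is kept (an assumption).

```python
def condense(profiles):
    """Return sorted (reversed history path, prediction, count) entries."""
    best = {}
    for path, count in profiles.items():
        history = tuple(reversed(path[:-1]))  # rest of path, reversed
        prediction = path[-1]                 # last URL
        # Same history path twice: keep the entry with the larger count.
        if history not in best or count > best[history][1]:
            best[history] = (prediction, count)
    return sorted((h, p, c) for h, (p, c) in best.items())


profiles = {
    ("a", "b", "c", "d"): 5,
    ("z", "b"): 20,
    ("d", "e", "f"): 7,
}
condensed = condense(profiles)
for history, prediction, count in condensed:
    print(" ".join(history), prediction, count, sep=", ")
```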

Step 2.2 – Predicting using Path profiles

Use condensed path profiles for making predictions:

– Obtain user’s current session

– Reverse the current session trace

– Find the path in the list of condensed profiles that matches the most consecutive characters in the reversed user session trace ⇒ best guess.

– Binary search finds best path in O( log2( # Paths))

Example:

– MoBed Session Tracker: IN: client’s IP; OUT: current user session for IP: x y z

– Reverse user session: z y x

– Search the condensed profiles for best match:

Condensed Profile List
(z, a, 20)
(z o j, k, 11)
(z y r, g, 1)
(z z z, e, 10)
( … )

– Select chosen entry: (z y r, g, 1)

– Predict: g
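The lookup can be sketched with Python's `bisect`: in a sorted list, the entry sharing the longest prefix with the query sits next to the query's insertion point, so only two neighbours need checking. Function names are hypothetical, and returning no prediction when nothing matches is an assumption.

```python
from bisect import bisect_left


def common_prefix_len(a, b):
    """Number of leading items that two sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def predict(condensed, session):
    """condensed: sorted list of (reversed history, prediction, count)."""
    query = tuple(reversed(session))
    histories = [entry[0] for entry in condensed]
    pos = bisect_left(histories, query)  # binary search
    best, best_len = None, 0
    for i in (pos - 1, pos):  # the two neighbours of the insertion point
        if 0 <= i < len(condensed):
            n = common_prefix_len(histories[i], query)
            if n > best_len:
                best, best_len = condensed[i], n
    return best[1] if best else None


condensed = [
    (("z",), "a", 20),
    (("z", "o", "j"), "k", 11),
    (("z", "y", "r"), "g", 1),
    (("z", "z", "z"), "e", 10),
]
print(predict(condensed, ["x", "y", "z"]))
```

For the session x y z, the reversed trace z y x shares two items with (z y r), more than with any other entry, so that entry's prediction is returned.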

Empirical evaluation

MoBed Experiments – simulation study used to study performance below:

Experiment 1: Caching at Proxy level
– Factors: proxy location; caching scheme; proxy cache size
– Response variable: proxy-processing latency
– # Runs: 8

Experiment 2: Proxy data compression
– Factor: original number of data bytes from Web Server
– Response variable: number of Proxy-Transcoded data bytes for client
– # Runs: N/A

Experiment 3: Client Caching & Proxy Prefetching
– 3-1, factor: client cache size
– 3-1, response variables: % of Proxy accesses; % of Prefetch-interrupts from large predicted files; # files prefetched to clients; # clients prefetched to
– 3-2, factors: PathTree size (vary T-value); using a Retraining phase; workload size
– 3-2, response variables (prediction accuracy): % of Predicted File Hit Ratios; % of Predicted Byte Hit Ratios
– # Runs: 51

Experiment 1

Caching at Proxy level:

– Measure → delay at proxy-level only, when satisfying a client request

Workload:

– HTTP server logs from C301 website (14,000 requests with 116 distinct URLs)

– Requests: URL + timestamp ← fired from J2ME client

Experiment design:

– ‘Passive client’: it fired off requests to proxy

– Proxy: maintains a cache managed by LRU and LFU eviction policies

– 3 factors under evaluation:

• Physical location of proxy server

• Cache eviction policy used

• Size of proxy cache


Factor 1: Physical location of the proxy server

[Diagram, remote proxy server: Mobile Client → mobile request → Proxy Server (“local” provider) → request → Web Server (“remote” provider); the Web Server responds with HTML content and the proxy returns simplified HTML content to the client. High latency between proxy and Web server.]

[Diagram, proxy on Web server machine: same request/response flow, with the Proxy Server and the Web Server both at the “remote” provider.]


Factor 2: Cache eviction policies

– LRU and LFU

Factor 3: Cache size

– Cache: designed to store ‘Cacheable’ objects (URL + transcoded Web content)

– 2 cache sizes evaluated: 50 and 100.

Simulation runs for Experiment 1:

Run   Proxy location   Caching policy   Cache size
1     Remote           LRU              50
2     Remote           LRU              100
3     Web server       LRU              50
4     Web server       LRU              100
5     Remote           LFU              50
6     Remote           LFU              100
7     Web server       LFU              50
8     Web server       LFU              100

Experiment 1 - Results

Table shows: % of proxy process times that fall within the specified time range

Recall: Measure → delay at proxy-level only, when satisfying a client request

Proxy Latency: Delays → due to fetching (from cache/Web server) and transcoding web content

Percentage of Proxy processing latencies within the given time range:

Run  Description         < 1 ms   1 to 10 ms   <= 500 ms   > 500 ms   Average latency
1    LRU_Remote_50       14.5     67.9         99.8        0.2        23.05
2    LRU_Remote_100      3        77.9         99.7        0.3        54.34
3    LRU_WebServer_50    0        96.2         99.9        0.1        10.03
4    LRU_WebServer_100   0        93.5         99.9        0.1        11.6
5    LFU_Remote_50       5.5      73.8         99.8        0.2        22.17
6    LFU_Remote_100      11.5     69.7         99.8        0.2        22.78
7    LFU_WebServer_50    0        91.9         99.8        0.2        12.32
8    LFU_WebServer_100   0        93.7         99.8        0.2        11.42

Annotations:

– Under __ second latency for over 99% of requests

– Remote proxy: higher latencies

– Proxy on Web server: lower latencies; reduced latency (resources are closer to Proxy)

– LFU outperforms LRU

Experiment 1 - Results

[Figure: LRU and LFU latency plots, proxy separate from server. Annotation: latency encountered when page is not cached.]

Experiment 1 - Results

[Figure: LRU and LFU latency plots, proxy on same machine as server.]


Experiment 2

Data compression using Proxy Transcoder:

– Measure → usefulness of Transcoder as a data compression tool

– Calculate the difference between: # bytes from Web server vs. # bytes transcoded

Workload:

– HTTP server logs from C301 website (116 distinct URLs)

Experiment design:

– Gather data based on:

• Byte size received from Web server

• Transcoded bytes for the client


Comparison to Client baseline Architecture:

[Figure: Original bytes fetched from WWW servers vs. bytes processed by Proxy server for Mobile client. X axis: Number of URLs (1 to 115). Y axis: Number of Bytes (-20000 to 140000). Series: Proxy-generated bytes; Original bytes from WWW.]

Observed trend:

– ‘Original’ data content size > 20 kB ⇒ the effect of transcoding is more noticeable.

– Smaller data content sizes ⇒ transcoded byte sizes can sometimes be larger than the original

– Encapsulation overhead – due to the representation of transcoded content for mobile clients (e.g. use of String objects)

Recall: In baseline Architecture ⇒ HTML Parser was resident on the mobile client.

– Parser generates HTML nodes that are potentially useless to client, e.g. HTML remark and end tags.

– Proxy Transcoder filters out such nodes at proxy.

Experiment 3

Client caching & Prefetching Proxy:

– Experiment 3 is sub-divided as follows:

Experiment 3-1
– Factor: client cache size; levels: 8, 30, 60 kB
– Response variables: % of Proxy accesses; % of Prefetch-interrupts from large predicted files; # files prefetched to clients

Experiment 3-2 (accuracy of Prediction Engine)
– Factor: PathTree size (determined by T-value); levels: T = 2, 3
– Factor: using a Retraining; levels: Y (yes), N (no)
– Factor: workload size; levels: partitions C301/1_2_3 & CS/1_2
– Response variables: % of Predicted File Hit Ratios; % of Predicted Byte Hit Ratios

Workload: (2 workloads used in both 3-1 and 3-2)

– C301 ⇒ C301 website (over 140,000 requests with 405 distinct URLs)

Small course website, small client population, less dynamic content

– CS ⇒ CS Dept. website (60,000 requests with 24,772 distinct URLs)

Busier of 2 servers, much larger group of clients, more dynamic content

Experiment 3 - Workloads

Each workload:

– Divided into varied-sized partitions. Why?

– Each partition: used to create Training and Testing sets.

• Training traces were recorded before testing traces

• Testing set = 1/3 of Training set size

Training set (session for one IP):

81 405 38 322 1 1 121 74 101 97 85 971 1 74 101 87…

Testing set (IP address, request timestamp, unique URL #):

216.35.103.58 [01/Sep/2001:03:01:42 -0600] 59
216.35.116.89 [01/Sep/2001:04:15:28 -0600] 159
216.35.103.74 [01/Sep/2001:06:51:57 -0600] 306
…

Workload partitions: CS

Partition  Size (# of requests)  # TRAIN requests  # TEST requests  # Unique URLs from Training set  MAX # unique URLs from Testing set  Total # clients
1          6,276                 4,677             1,599            3014                             898                                 251
2          48,704                36,528            12,176           17,286                           5944                                1196

Workload partitions: C301

Partition  Size (# of requests)  # TRAIN requests  # TEST requests  # Unique URLs from Training set  MAX # unique URLs from Testing set  Total # clients
1          1211                  909               302              77                               49                                  65
2          69,580                52,185            17,395           263                              227                                 1129
3          109,523               82,356            27,167           307                              241                                 1458
4          137,186               103,162           34,024           318                              276                                 1416

Experiment 3 - Design

Time sequence illustration: captures transfer of a requested/prefetched file to client before another client request.

Participants: Mobile Client, Proxy Server, WWW Server

– Mobile Client → Proxy Server: REQ (url, M, t)

( 1 ) t + request / 9.6

– Proxy Server → WWW Server: REQ (URL); WWW Server → Proxy Server: Response (bytes)

( 2 ) t + request / 9.6 + doc_transfer_time

– Proxy Server → Mobile Client: Response (L bytes)

( 3 ) t + request / 9.6 + doc_transfer_time + proxy_transform_time + L / 9.6

– In parallel with sending the response, Proxy starts prefetching

( 4 ) t + request / 9.6 + doc_transfer_time + proxy_transform_time + L / 9.6 + prefetch_doc_transfer_time + prefetched_doc_transform_time + prefetched_doc_size / 9.6
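The four time points can be computed with a small helper. This is a sketch: the unit behind the divisor 9.6 is not stated on the slide and is treated as a generic link rate, `response_size` stands for L, and the completion rule in `prefetch_completes` is an assumption about how an interrupt would be detected.

```python
LINK_RATE = 9.6  # divisor used on the slide; its unit is an assumption here


def transfer_timeline(t, request, doc_transfer_time, proxy_transform_time,
                      response_size, prefetch_doc_transfer_time,
                      prefetched_doc_transform_time, prefetched_doc_size,
                      rate=LINK_RATE):
    """Return the time points (1) to (4) of the sequence diagram."""
    t1 = t + request / rate
    t2 = t1 + doc_transfer_time
    t3 = t2 + proxy_transform_time + response_size / rate
    t4 = (t3 + prefetch_doc_transfer_time + prefetched_doc_transform_time
          + prefetched_doc_size / rate)
    return t1, t2, t3, t4


def prefetch_completes(t4, next_request_time):
    """Assumed rule: the prefetch finishes only if it ends before the next request."""
    return t4 <= next_request_time


# Made-up inputs chosen so the divisions come out round.
points = transfer_timeline(t=0.0, request=9.6, doc_transfer_time=2.0,
                           proxy_transform_time=0.5, response_size=96.0,
                           prefetch_doc_transfer_time=3.0,
                           prefetched_doc_transform_time=0.5,
                           prefetched_doc_size=48.0)
print([round(p, 2) for p in points])
print(prefetch_completes(points[3], next_request_time=30.0))
```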

Experiment 3 - Design

Complete Simulation Methodology on MoBed Proxy:

Start → Setup

- Initialize: Cache, Session tracker, Mobile Cache Manager

TRAIN

- Read user sessions from file
- Build Path Tree
- Condense Path profiles

Testing reqs.

- Read in test requests from file
- Build MobileRequest objects

TEST (iterate ∀ requests in Testing set)

- Iterate for all mobile reqs.
- Check cache for request
- If Proxy is Idle, start prefetch
- Interrupt prefetch if necessary

RE-TRAIN (if (re-training = T), after X time interval, suspend Testing)

- Use test reqs from last interval ⇒ New Training set
- Build profiles from new Train set
- Update current path profile list
- Build new Condensed Path profiles
- Resume testing phase

Experiment 3 - Results

Experiment 3-1: C301 workload (Recall: Traces obtained from small, course website)

Columns: T-value; client cache size (kB); retrain; workload partition; % of NO-Proxy accesses; % of prefetch-interrupts from large files; total # of files prefetched to all clients; total # of clients.

T-value  Cache (kB)  Retrain  Partition  % NO-Proxy  % Interrupts  # Files  # Clients
T=2      8           N        1          5.3         8             43       36
T=2      8           N        2          23.1        29.1          1762     896
T=2      8           N        3          25.2        34.8          2695     1160
T=2      8           Y        1          5.3         8             43       36
T=2      8           Y        2          23.3        30            1803     883
T=2      8           Y        3          25.6        35.4          2724     1156
T=2      30          N        1          6           1.4           46       38
T=2      30          N        2          24.1        12.7          2147     920
T=2      30          N        3          27.9        11.4          3434     1222
T=2      30          Y        1          6           1.4           46       38
T=2      30          Y        2          24.5        15.8          2149     915
T=2      30          Y        3          28.4        12            3483     1223
T=2      60          N        1          6           1.4           46       38
T=2      60          N        2          25          3.1           2293     942
T=2      60          N        3          28.3        3.6           3652     1253
T=2      60          Y        1          6           1.4           46       38
T=2      60          Y        2          25.4        3.4           2379     951
T=2      60          Y        3          28.8        4             3708     1254
T=3      8           N        1          5.3         7.2           42       35
T=3      8           N        2          23.7        25.9          1771     883
T=3      8           N        3          25.4        32            2624     1151
T=3      8           Y        1          5.3         7.2           42       35
T=3      8           Y        2          23.8        28.2          1786     865
T=3      8           Y        3          25.9        32.6          2628     1151
T=3      30          N        1          5.6         1.5           45       37
T=3      30          N        2          25          6.1           2155     911
T=3      30          N        3          27.4        8.5           3336     1213
T=3      30          Y        1          5.6         1.5           45       37
T=3      30          Y        2          25.4        9.3           2158     892
T=3      30          Y        3          28.1        9             3357     1213
T=3      60          N        1          5.6         1.5           45       37
T=3      60          N        2          25          1.6           2236     932
T=3      60          N        3          27.7        3.5           3464     1244
T=3      60          Y        1          5.6         1.5           45       37
T=3      60          Y        2          25.7        2.2           2276     938
T=3      60          Y        3          28.4        3.8           3487     1244

Best results: 28.8% of NO-Proxy accesses (T=2, client cache 60 kB, retrain Y, partition 3).


Experiment 3-1: CS workload (Recall: Traces obtained from large, dense departmental server.)

Columns: T-value; client cache size (kB); retrain; workload partition; % of NO-Proxy accesses; % of prefetch-interrupts from large files; total # of files prefetched to all clients; total # of clients prefetched to.

T-value  Cache (kB)  Retrain  Partition  % NO-Proxy  % Interrupts  # Files  # Clients
T=2      8           N        1          23.2        0             1        1
T=2      8           N        2          0.04        0.6           24       23
T=2      30          N        1          23.2        0             1        1
T=2      30          N        2          0.04        0.1           24       23
T=2      60          N        1          23.2        0             1        1
T=2      60          N        2          0.04        0             24       23
T=3      8           N        1          23.2        0             1        1
T=3      8           N        2          0.07        0.7           22       22
T=3      8           Y        2          0.07        0.8           22       22
T=3      30          N        1          23.2        0             1        1
T=3      30          N        2          0.07        0.1           22       22
T=3      30          Y        2          0           0.07          22       22
T=3      60          N        1          23.2        0             1        1
T=3      60          N        2          0.07        0             22       22
T=3      60          Y        2          0           0             22       22

Annotations:

– Prefetched 1 file to 1 client, file repeatedly requested

– Prefetch-interrupts were not from large files but little/no proxy idle time

– Smaller partition

– Little chance for prefetching – busy server!

Experiment 3.1 - Summary

Impact of client caches:
– Measure → with respect to: % of requests satisfied from client caches, % of prefetch-interrupts from large files, # of clients serviced, # of files prefetched.

Observations:

– Of the 2 workloads used, the C301 dataset is closer to the likely access behavior of mobile clients than the CS dataset (less dense URL structure; sparser access pattern over time).

– From 3-1, the best results show that 28.8% of all client requests were satisfied from client caches (as a result of prefetching).

– Caching the results of popular accesses in addition to prefetched documents could be effective in reducing mobile client-perceived latency and overall proxy/server loads.
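To make the "% of requests satisfied from client caches" measure concrete, here is a minimal Python sketch that replays a request trace against a small client cache fed by prefetching. The LRU eviction policy, the byte-based capacity, the `cache_on_miss` switch and every name below are illustrative assumptions, not MoBed's actual implementation.

```python
from collections import OrderedDict


class ClientCache:
    """Byte-bounded LRU cache (hypothetical stand-in for a client cache)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.items = OrderedDict()  # url -> size in bytes, oldest first

    def put(self, url, size):
        if size > self.capacity:
            return False  # document can never fit in this cache
        if url in self.items:
            self.used -= self.items.pop(url)
        while self.used + size > self.capacity:
            _, evicted_size = self.items.popitem(last=False)  # evict oldest
            self.used -= evicted_size
        self.items[url] = size
        self.used += size
        return True

    def get(self, url):
        if url in self.items:
            self.items.move_to_end(url)  # mark as most recently used
            return True
        return False


def percent_served_from_cache(trace, cache, prefetch, cache_on_miss=False):
    """Replay `trace` (list of (url, size)) and return the % of cache hits.

    `prefetch` maps a requested url to the (url, size) pushed to the client
    after that request. With cache_on_miss=True, demand-fetched documents
    are cached as well (the "popular accesses" idea from the summary).
    """
    hits = 0
    for url, size in trace:
        if cache.get(url):
            hits += 1
        elif cache_on_miss:
            cache.put(url, size)
        if url in prefetch:
            cache.put(*prefetch[url])
    return 100.0 * hits / len(trace) if trace else 0.0
```

With prefetching only, a request counts as a hit when the document was pushed to the client beforehand. Turning on `cache_on_miss` lets repeated requests hit as well, which is the effect the last observation anticipates.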


Experiment 3-2: Accuracy of Prediction Engine

– Predicted File Hit:

• Recorded when the predicted item is actually the next request

• Defined by: (Total # of correct guesses ÷ total # of guesses)

– Predicted Byte Hit:

• Recorded when there is a prefetch file hit (bytes for the correct guess)

• Defined by: (Total # of correct file bytes ÷ total # of bytes guessed)

– Record only ‘client-based prediction rates’:

• Do not include predictions made at the end of a user’s session

• The request at the start of a user’s session is not predicted

Important!

Predictions are made only when the user’s current session has been previously ‘learned’ (even though this reduces the # of times prefetching occurs).
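The two ratios defined on this slide can be sketched in a few lines of Python. The record layout (predicted URL, predicted size in bytes, actual next URL) is an assumption made for illustration. Predictions made at the end of a session are assumed to have been filtered out before the call, in line with the "client-based prediction rates" rule.

```python
def prediction_hit_ratios(predictions):
    """Return (file_hit_ratio, byte_hit_ratio) for a list of predictions.

    Each prediction is a tuple (predicted_url, predicted_bytes, actual_next_url).
    """
    guesses = correct = 0
    bytes_guessed = bytes_correct = 0
    for predicted_url, predicted_bytes, actual_next_url in predictions:
        guesses += 1
        bytes_guessed += predicted_bytes
        if predicted_url == actual_next_url:
            # A file hit: the guessed document is the next request.
            correct += 1
            bytes_correct += predicted_bytes
    file_hit = correct / guesses if guesses else 0.0
    byte_hit = bytes_correct / bytes_guessed if bytes_guessed else 0.0
    return file_hit, byte_hit
```

The byte hit ratio can be much lower than the file hit ratio when the wrong guesses happen to be the large documents, which is why both are reported.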


Experiment 3-2: a) Retrain phase or not; b) T-value variation (C301 workload)

% ↑ as T-value ↑

Small % ↑

Columns: T = T-value; Cache = client cache size (kB); Retrain = Y/N; Part. = workload partition; %FileHits = % of Predicted File Hits; %ByteHits = % of Predicted Byte Hits; Re-trains = # of re-trains; SetSize = size of retraining set.

T  Cache  Retrain  Part.  %FileHits  %ByteHits  Re-trains  SetSize
2  8      N        1      37.1       29.6       -          -
2  8      N        2      24         16         -          -
2  8      N        3      20.8       14.3       -          -
2  8      Y        1      37.1       30         1          136
2  8      Y        2      24.1       16.5       69         17014
2  8      Y        3      21.3       14.8       103        26689
2  30     N        1      37.1       29.6       -          -
2  30     N        2      24.4       16.1       -          -
2  30     N        3      21         14         -          -
2  30     Y        1      37.1       30         1          136
2  30     Y        2      24.6       16.7       69         17014
2  30     Y        3      21.5       14.4       103        26689
2  60     N        1      37.1       29.6       -          -
2  60     N        2      24.3       16.1       -          -
2  60     N        3      21         13.6       -          -
2  60     Y        1      37.1       29.6       1          136
2  60     Y        2      24.5       16.2       69         17014
2  60     Y        3      21.6       14         103        26689

3  8      N        1      40.6       32.3       -          -
3  8      N        2      24.1       24.1       -          -
3  8      N        3      21.6       21.6       -          -
3  8      Y        1      40.7       32.2       1          136
3  8      Y        2      26.4       18.9       69         17014
3  8      Y        3      22.1       15.4       103        26689
3  30     N        1      40.6       32.3       -          -
3  30     N        2      24.4       16.3       -          -
3  30     N        3      21.6       14.5       -          -
3  30     Y        1      40.6       32.3       1          136
3  30     Y        2      26.6       18.9       69         17014
3  30     Y        3      22.2       15         103        26689
3  60     N        1      40.6       32.3       -          -
3  60     N        2      24.4       16.4       -          -
3  60     N        3      21.6       14.1       -          -
3  60     Y        1      40.6       32.3       1          136
3  60     Y        2      26.5       18.7       69         17014
3  60     Y        3      22.2       14.7       103        26689


Experiment 3-2: a) Retrain phase or not; b) T-value variation (CS workload)

% ↑ as Size ↑

Columns: Retrain = Y/N; Part. = workload partition; %FileHits = % of Predicted File Hits; %ByteHits = % of Predicted Byte Hits; Re-trains = # of re-trains; SetSize = size of re-training set.

Settings: T=2 and T=3, each with client cache sizes 8, 30 and 60 kB. Within each group below, the rows are the same for all three cache sizes.

First group:

Retrain  Part.  %FileHits  %ByteHits  Re-trains  SetSize
N        1      36.8       35.8       -          -
N        2      67.9       73.6       -          -

Second group:

Retrain  Part.  %FileHits  %ByteHits  Re-trains  SetSize
N        1      35.7       22.3       -          -
N        2      71.3       78.1       -          -
Y        2      71.5       78.1       3          7065


Experiment 3-2: c) Workload size (varied train/test sizes) (C301 & CS workloads)

Range of %s for all T-values and client cache sizes

% ↓ as Size ↑

Due to different nature of workloads

C301

Partition  # of TRAIN requests  # of TEST requests  Retrain  % of Predicted File Hits  % of Predicted Byte Hits
1          909                  302                 N        37.1 – 40.6               29.6 – 32.3
                                                    Y        37.1 – 40.7               30.0 – 32.3
2          52,185               17,395              N        24.0 – 24.4               16.0 – 16.4
                                                    Y        24.1 – 26.6               16.5 – 18.9
3          82,356               27,167              N        20.8 – 21.6               14.0 – 15.0
                                                    Y        21.3 – 22.2               14.4 – 15.4
4          103,162              34,024              N        27.8                      17.8 – 19.1
                                                    Y        28.5                      19.4 – 22.6

CS

Partition  # of TRAIN requests  # of TEST requests  Retrain  % of Predicted File Hits  % of Predicted Byte Hits
1          4,677                1,599               N        36.8                      35.8
                                                    Y        -                         -
2          36,528               12,176              N        71.3                      78.1
                                                    Y        71.5                      78.1

Experiment 3.2 – Summary

Accuracy of the Proxy Prediction Engine:
– Measure → the Predicted File Hit and Byte Hit ratios; prediction using Path Profiles [SKS98].

– File hits = (Total # of correct guesses ÷ total # of guesses)

– Byte hits = (Total # of correct file bytes ÷ total # of bytes guessed)

Observations:
– Prediction algorithm evaluation:

• Inclusion of a re-training phase → very slight increase in performance (only 0.1 to 0.6% increase at best).
• Prediction accuracy decreased with increased workload size.
• Increasing T-value → better accuracy (max. accuracy of 40.6%).
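As a rough illustration of path-based prediction, the Python sketch below counts which request followed each recent path in the training sessions. It then predicts from the longest path that matches the end of the current session. This is a simplified sketch in the spirit of path profiles, not the [SKS98] algorithm or the MoBed prediction engine. The class name, the `max_path_len` parameter and the most-frequent-successor rule are assumptions, and the T-value is not modelled.

```python
from collections import Counter, defaultdict


class PathPredictor:
    """Longest-matching-path next-request predictor (illustrative only)."""

    def __init__(self, max_path_len=3):
        self.max_path_len = max_path_len
        # Maps a path (tuple of URLs) to a Counter of the URLs that followed it.
        self.successors = defaultdict(Counter)

    def train(self, session):
        """Learn from one session, given as an ordered list of URLs."""
        for i in range(1, len(session)):
            for k in range(1, self.max_path_len + 1):
                if i - k < 0:
                    break
                path = tuple(session[i - k:i])
                self.successors[path][session[i]] += 1

    def predict(self, current_path):
        """Return the most likely next URL, or None if nothing was learned."""
        longest = min(self.max_path_len, len(current_path))
        for k in range(longest, 0, -1):
            counts = self.successors.get(tuple(current_path[-k:]))
            if counts:
                return counts.most_common(1)[0][0]
        return None
```

Returning `None` for an unseen path mirrors the rule that predictions are made only when the current session has been previously learned: no match, no prefetch. Calling `train` again on sessions from the test period is one way to picture a re-training phase.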

Research Contributions

MoBed objective achieved:

– Combination of caching and prefetching schemes

– Client and proxy levels allow for separation of mobile-resident from proxy-resident functionality.

Enhanced J2ME awareness:
– J2ME: fairly new specification with rapidly growing popularity

– MoBed introduces a fresh perspective on mobile web access targeting the J2ME platform.

MoBed architecture:
– Shows that a client-proxy architecture is invaluable for wireless web access
– Speculates that caching and prefetching at proxy-level could be particularly advantageous to J2ME devices with more than the minimum CLDC/MIDP requirements.

– Even though it is not yet extensible, this architecture design is configurable and modular.

Future Work

Architecture deployment:
– Currently, application development has been restricted to the J2ME Wireless Toolkit.

– Deployment on an actual J2ME-enabled device will reveal issues/concerns not apparent in the architecture at the moment.

Improvements to some MoBed components:
– Improved HTML parsing scheme

– Data transcoding technique with reduced overhead, efficient HTML compression

– Implement an accurate model for simulating transfer delays on the mobile-to-proxy link.

– Investigate additional caching/prefetching schemes

Future Work (cont.)

Building a MoBed framework:
– Such a framework would be ideal for flexibly combining caching and prefetching schemes for investigating Web access solutions.

– Possible framework hooks: caching and/or prefetching extension points, data compression/transcoding and HTML parsing extension points.

– Such hooks would support framework adaptation by enabling/disabling, replacing or augmenting the extensions.

Servicing multiple, diverse clients:
– Proxy services available to J2ME clients with different device capabilities/constraints.

– ‘Discrimination’ between clients: maintain a knowledge base of all clients & their device capabilities; prefetch differently (liberal/conservative) to clients based on their capabilities.
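The capability-based prefetching idea above could look roughly like the following Python sketch. The proxy keeps a knowledge base of device profiles and derives a liberal or conservative prefetch budget from each one. All names, the 30 kB threshold, the file limits and the budget fractions are hypothetical values chosen for illustration; none of them come from MoBed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceProfile:
    """Capabilities the proxy records for one client (hypothetical fields)."""
    cache_kb: int  # client cache size in kB


def select_prefetch(profile, candidates, liberal_threshold_kb=30):
    """Pick which predicted documents to push to a client.

    `candidates` is a list of (url, size_bytes), most likely first.
    Clients at or above the threshold are treated liberally (more files,
    larger share of the cache); others conservatively.
    """
    liberal = profile.cache_kb >= liberal_threshold_kb
    max_files = 3 if liberal else 1
    budget = profile.cache_kb * 1024 // (2 if liberal else 4)
    chosen = []
    for url, size in candidates:
        if len(chosen) == max_files:
            break
        if size <= budget:
            chosen.append(url)
            budget -= size
    return chosen


# Knowledge base: client id -> device profile (example entries).
knowledge_base = {
    "phone-1": DeviceProfile(cache_kb=8),
    "pda-1": DeviceProfile(cache_kb=60),
}
```

A replaceable policy function of this kind is also a natural candidate for the prefetching extension point mentioned under the framework hooks.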

Conclusion

Wireless devices like cell phones, pagers, etc. are increasingly used nowadays:
– They provide convenient services: email, instant messaging, Internet access, etc.

– They are not restricted by place or time and are easily customizable to user needs.

A lot of research has been reported on the performance of Web caching/prefetching for wired Internet access; challenges arise in a wireless network.

Objective achieved through the implementation of, and experimentation with, the MoBed Client-Proxy-Server architecture.

Results show promise ⇒ encouraging further research in this area.

References

[Hem02] - David Hemphill; J2ME and J2EE: Together At Last – “Sun has developed a blueprint for creating mobile and wireless applications that access enterprise services - where do we go from here?”

[SUN03] - Java Blueprints for a Wireless white paper - Designing Wireless Clients for Enterprise Applications with Java Technology.

[CM03] - H. Chen and P. Mohapatra; A Novel Navigation and Transmission Technique for mobile handheld devices.

[STHK03] - Bill N. Schilit, Jonathan Trevor, David M. Hilbert, and Tzu Khiau Koh; m-links: An infrastructure for very small Internet devices.

[BGMP00] - O. Buyukkokten, H. Garcia-Molina, and A. Paepcke; Seeing the Whole in parts: Text Summarization for Web Browsing on Handheld devices.

[Sab97] - K. Sabnani; Wireless Data Services.

[HP1.1] - HTML Parser version 1.1; http://htmlparser.sourceforge.net/

[SKS98] - S.E. Schechter, M. Krishnan, M.D. Smith; Using Path Profiles to Predict HTTP Requests.

Questions & Answers
