web site optimization
Welcome
Web Site Optimization
Presentation By:Sunil Patil
Our sponsors:
Name of Presentation
Page 4
AGENDA
AGENDA FOR THE SESSION
What is web site optimization?
Why should you worry about web site optimization?
Suggestions for optimizing web site
o Make fewer requests
o Use Caching
o Minimize request overhead
o Minimize response size
o Optimize browser rendering
Tools
What is web site optimization?
WHAT IS WEB SITE OPTIMIZATION
End users care about how long a page takes to render in their browser
(perceived performance) and how quickly they can move from one page to another
When you access a page in a browser, it performs the following steps to render the page
o Make request
o Get HTML response (performance work usually focuses mostly on this step)
o Parse HTML response
o Find out which resources (JS, CSS, images) are required on the page
o Download resources
o Parse resources
o Execute resources
During web site optimization, we try to optimize each of the above steps to
improve the perceived performance of the web site
Connected.atech.com
Time to generate HTML: 0.9 sec. Time to render page: 40 sec.
Advantages of web site optimization
WHY YOU SHOULD THINK ABOUT OPTIMIZING YOUR WEBSITE
Less than 15-20% of the time is spent generating and downloading HTML
o Improving this part is not easy. It might require
• Creating a new architecture
• Refactoring code, introducing caching
• Tuning the backend
o If you improve this part by, say, 50%, the overall gain will be 8-10%
More than 80% of the time is spent downloading, parsing and executing resources
o Improving this part is comparatively easy
• Configuration changes at infrastructure level
• Additional tasks and guidelines during the development and build phases
• Additional components at infrastructure level
o If you improve this part by 50%, the overall gain will be 40%
Web pages are getting richer and more complex (50+ resources, Ajax, ...)
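The gain figures on this slide follow from simple arithmetic: the overall saving is the share of total time spent in a phase multiplied by how much that phase improves. A minimal sketch, assuming the upper bound of 20% for the back-end share (the function name is illustrative, not from the presentation):

```python
def overall_gain(share_of_total: float, improvement: float) -> float:
    """Fraction of total page time saved when one phase gets faster.

    share_of_total: fraction of page load time spent in the phase (0..1)
    improvement:    fraction by which that phase is sped up (0..1)
    """
    return share_of_total * improvement


# Back end: at most about 20% of total time, improved by 50%
backend_gain = overall_gain(0.20, 0.50)
# Front end: about 80% of total time, improved by 50%
frontend_gain = overall_gain(0.80, 0.50)

print(f"back-end gain:  {backend_gain:.0%}")
print(f"front-end gain: {frontend_gain:.0%}")
```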
Lessons Learned
LESSONS LEARNED FROM WEBSITE OPTIMIZATION EXPERIENCE AT CLIENT
Load testing does not fully capture all performance-related problems
o Business users often use older browsers than technical users
o Location of users matters
o Network speed matters
Changing HTTP server level configuration takes time
o HTTP servers are normally shared across different teams
o Application teams may be on different release cycles, so they might not make changes
in their code
We underestimate the impact of web site optimization
As sites get richer and more complex, there is a greater need for web site optimization,
and a lot of research is happening in this area
Make fewer requests
1
What is a parallel connection?
HOW PARALLEL CONNECTIONS IN THE BROWSER WORK
The HTTP 1.1 specification says that a browser should use at most two parallel
connections per host name. So if your web page has 50 resources, the browser will
start 2 downloads and queue the rest. Once a download finishes, it starts the next
download from the queue
The total number of round trips is roughly N/X, where N is the number of resources to
fetch from a host and X is the number of parallel connections the browser opens to that
host
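The N/X estimate can be sketched as a small calculation. This assumes every resource takes one round trip and that X is the number of parallel connections per host; both are simplifications, and the function name is illustrative:

```python
import math


def round_trips(resources: int, parallel_connections: int) -> int:
    """Approximate number of sequential download rounds for one host."""
    return math.ceil(resources / parallel_connections)


# 50 resources on a single host, with different connection limits
for connections in (2, 4, 6):
    print(connections, "connections ->", round_trips(50, connections), "rounds")
```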
Number of parallel connections
NUMBER OF PARALLEL CONNECTIONS DEPENDS ON THE BROWSER
Older browsers follow the rule of 2 parallel connections per host, but newer browsers
use more parallel connections.
o IE 6/7 -> 2
o IE 8 -> 6
o Firefox 2 -> 2
o Firefox 3 -> 6
o Safari 3/ 4 -> 4
o Opera -> 4
o Chrome -> 4
A browser can lower the number of parallel connections in special cases
o If you use IE 8 on a dial-up connection, it will use 2 parallel connections
Effect of number of parallel connections
2 parallel connections
6 parallel connections
Script blocks parallel download
SCRIPT DOWNLOAD IS A STOP-ALL EVENT IN SOME BROWSERS
When a browser encounters a <script> tag in HTML, it stops everything until it has
downloaded, parsed and executed the script
o A script might call document.write(), which could change the page content, so the
browser waits for the script to download and execute
• If your script performs a long-running operation on load, it could cause issues
o Scripts on a page must be executed in the proper order
• second.js might depend on first.js, so first.js must be executed before second.js
Some of the newer browsers download scripts in parallel but execute them in order
Effect of browsers that block everything for script
Browsers that block everything while downloading a script
How to improve parallelization
WHAT CAN WE DO TO ACHIEVE MORE PARALLELIZATION
Browsers limit the number of parallel connections per host name, so the easiest way to
get around this limit is to use multiple host names for downloading resources.
You can use one host name to download HTML and up to 4 host names for
downloading other resources
o You can use www.static-atech.com for downloading resources. The
www.static-atech.com host name can point to the same server
Combine files of similar type
o Use tools like Dojo ShrinkSafe or YUI Compressor to combine multiple JS files
• Create a custom Dojo build with additional classes, widgets, etc.
o Use YUI Compressor to combine multiple .css files
o Use image maps and CSS sprites
Inline smaller/non-cacheable resources
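When resources are spread over several host names, each resource should always map to the same host name, otherwise the browser cache is defeated. A minimal sketch of one way to do that, assuming two hypothetical host names (static1.example.com and static2.example.com are placeholders, not hosts from the presentation):

```python
import zlib

# Hypothetical shard host names that all point to the same server
SHARD_HOSTS = ["static1.example.com", "static2.example.com"]


def shard_url(path: str, hosts=SHARD_HOSTS) -> str:
    """Pick a host for a resource path using a stable checksum.

    zlib.crc32 is used instead of hash() because hash() of a string
    changes between Python processes, which would change the URL.
    """
    index = zlib.crc32(path.encode("utf-8")) % len(hosts)
    return f"http://{hosts[index]}{path}"


for resource in ("/js/app.js", "/css/site.css", "/img/logo.gif"):
    print(shard_url(resource))
```

Because the mapping depends only on the path, the same resource gets the same URL on every page, so it is cached once.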
Use caching
2
Expiry based caching
WHAT IS EXPIRY BASED CACHING ?
Setting an expiry caching header instructs the browser to load a resource from disk
instead of the network. You can let the browser know that it can cache a response for a
certain period of time
o The HTTP 1.1 specification introduced the Cache-Control header. You can set
Cache-Control: max-age=<noofseconds> and the browser will cache the resource for
<noofseconds>. If the resource is requested again during that time, the browser will
just use it from disk
o The HTTP 1.0 specification had the Expires header. You can set Expires: Fri, 01 Oct
2010 12:00:00 GMT (date in GMT). The browser will cache the resource and use it
until 1st October
If you set both Cache-Control and Expires headers, Cache-Control: max-age takes
precedence. Expires is still useful because older HTTP clients don’t understand
Cache-Control
A resource might get purged from the cache if the browser’s cache size limit is reached
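The two headers can be generated together so that HTTP 1.1 and HTTP 1.0 clients agree on the expiry. A minimal sketch, assuming a 30-day lifetime and a fixed "now" so the result is reproducible (the function name and the chosen dates are illustrative):

```python
import calendar
from email.utils import formatdate


def expiry_headers(max_age_seconds: int, now: float) -> dict:
    """Build matching Cache-Control and Expires headers.

    now is a Unix timestamp; Expires is rendered as an HTTP date in GMT.
    """
    return {
        "Cache-Control": f"max-age={max_age_seconds}",
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


# Fixed "now": 1 September 2010, 12:00:00 GMT
now = calendar.timegm((2010, 9, 1, 12, 0, 0))
headers = expiry_headers(30 * 24 * 3600, now)
for name, value in headers.items():
    print(f"{name}: {value}")
```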
What happens if you don’t set caching headers
HOW BROWSERS AND CACHES DEAL WITH ABSENCE OF EXPIRY RELATED HEADER
If you don’t want the browser to reuse a cached resource without checking, you must set
Cache-Control: no-cache (the browser must revalidate before reuse). To stop the
browser from storing the resource at all, set Cache-Control: no-store
If you don’t set either the Expires or the Cache-Control header, browsers and caching
proxies can use heuristic expiration
o HTTP clients read the value of Last-Modified; if the resource has not changed for 10
months, they cache it for 1 month (Expiration Time = Now + 0.1 * (Time since
Last-Modified))
• Firefox
• IE 7
• Caching proxies
o The basic idea is that if a resource has not changed for a long time, it is less likely
to change in the future
Different clients might use different algorithms to come up with an expiration time, so
the result can be unpredictable
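The heuristic formula above can be sketched directly. This assumes the 0.1 factor quoted on the slide; real clients may use a different factor or cap the result, and the function name is illustrative:

```python
from datetime import datetime, timedelta, timezone


def heuristic_expiry(now: datetime, last_modified: datetime,
                     factor: float = 0.1) -> datetime:
    """Expiration Time = Now + factor * (time since Last-Modified)."""
    return now + factor * (now - last_modified)


now = datetime(2010, 10, 1, 12, 0, 0, tzinfo=timezone.utc)
last_modified = now - timedelta(days=300)  # unchanged for about 10 months

expires = heuristic_expiry(now, last_modified)
print("cached for", (expires - now).days, "days")
```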
What can you do to improve caching ?
USE AGGRESSIVE CACHING OF STATIC RESOURCES
If you don’t know when a resource will be updated, you should configure your site so
that HTML never gets cached and other resources get cached for a long time (months
or years)
o The HTML document has references to all the resources on the page, so when a
resource changes, change its reference/URL in the HTML
• Change the file name, e.g. from test.js to test_v1.js
• Change the folder, e.g. from test.js to v1/test.js
• Create a mod_rewrite rule, e.g. v1/test.js, v2/test.js and v3/test.js all get mapped to
test.js
If you know precisely when a resource will be updated, set Expires to that date
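One way to change a resource URL whenever the resource changes is to derive the version from the file content. This is a variation on the test_v1.js naming above, not the scheme the presentation prescribes; the function name and the 8-character digest length are assumptions made for the sketch:

```python
import hashlib
import posixpath


def versioned_url(path: str, content: bytes) -> str:
    """Insert a short content digest before the file extension.

    /js/test.js -> /js/test_<digest>.js, where <digest> changes
    whenever the file content changes.
    """
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, ext = posixpath.splitext(path)
    return f"{stem}_{digest}{ext}"


old = versioned_url("/js/test.js", b"var a = 1;")
new = versioned_url("/js/test.js", b"var a = 2;")
print(old)
print(new)
```

With this approach the build step, rather than a developer, decides when a URL has to change, so long expiry times stay safe.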
Caching static resources
HOW TO CONFIGURE CACHING AT HTTP SERVER LEVEL
Apache HTTP Server has a mod_expires module that you can use to generate
expiry based caching headers in the response
o Sets both Cache-Control and Expires headers
o Can set headers for static content served by the HTTP server as well as static content
returned by WebSphere’s File Serving Servlet
o Granular control: can set headers globally or at URL or directory level
o Can set different expiry rules based on response content type, file extension, etc.
o This configuration says that GIF images should be cached for 3 months and other
resources should be cached for 1 month
ExpiresActive On
ExpiresDefault "access plus 1 month"
ExpiresByType image/gif "access plus 3 months"
Caching dynamic resources
HOW TO CONFIGURE CACHING OF RESOURCES SERVED BY WEBSPHERE
The file serving servlet (used for serving static files) does not set Expires or
Cache-Control headers. You can add a ServletFilter to your web application to set them
You can set Expires/Cache-Control headers in a Servlet
WebSphere Portal Server has a navigatorservice.properties file that lets you configure
overall portal level caching and caching for ATOM feeds
You can configure WebSphere Portal Server (WPS) to make anonymous pages cacheable,
but the process is complicated
The Portlet Specification 2.0 has the concept of an expiration cache, which you can use
for setting the Cache-Control max-age and public/private directives
o Set expiration-cache and cache-scope in portlet.xml
o Use ResourceResponse.getCacheControl() to get a javax.portlet.CacheControl object
and call its setExpirationTime() and setPublicScope() methods
o Use ResourceURL.setCacheability() so that WPS generates cache friendly URLs
Validation based caching
WHAT IS VALIDATION BASED CACHING ?
When a static HTML file is served (Apache HTTP Server, WebSphere’s File Serving
Servlet), the server sends a Last-Modified header with a value equal to the date when
the file was last modified (OS date)
Apache HTTP Server can generate an ETag for static files based on modification
time, size, etc.
If you don’t set Cache-Control: no-store, the browser will store the response in its cache
But every time you request the resource (and the cached copy has no expiry or is stale),
the browser sends a conditional GET request with If-Modified-Since and If-None-Match
headers
The server checks whether the resource was actually modified; if not, it returns HTTP 304
with no body (a response of about 250 bytes on average) to indicate that the browser can
use the cached response
Validation based caching is better than getting a full HTTP 200 response with a full body
but worse than a cached resource that does not require an HTTP request
Validation based caching
HOW CACHE VALIDATION WORKS
The HTTP specification has the concept of a conditional GET, which helps a client avoid
downloading the same resource repeatedly
The server can send Last-Modified and ETag headers in the response
The HTTP client (browser, caching proxy) stores the resource in its disk cache along with
the headers
The next time you request that resource, the client adds If-Modified-Since and
If-None-Match headers to the request with the values that it has on disk
The server compares these values to the version it has. If the resource has changed, it
sends HTTP 200 OK with the full resource in the body of the response; if the resource has
not changed, it sends HTTP 304 Not Modified with headers only
o The original resource could be, say, 100 KB, but the HTTP 304 response will be 200-250
bytes, so you save on download size
o The client still has to make a request using one of the connections from the parallel
connection pool
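The server side of this exchange can be sketched as a small decision function. It is a simplified model, not the logic of any particular server: it ignores weak ETags and the "*" wildcard, and it returns 304 only when every validator the client sent still matches. Function and variable names are illustrative:

```python
from email.utils import parsedate_to_datetime


def is_not_modified(request_headers: dict, etag: str, last_modified: str) -> bool:
    """True when all validators sent by the client still match."""
    inm = request_headers.get("If-None-Match")
    ims = request_headers.get("If-Modified-Since")
    if inm is None and ims is None:
        return False  # unconditional request, send the full resource
    if inm is not None and etag not in [t.strip() for t in inm.split(",")]:
        return False  # ETag changed
    if ims is not None and (
        parsedate_to_datetime(last_modified) > parsedate_to_datetime(ims)
    ):
        return False  # modified after the client's copy
    return True


def respond(request_headers: dict, etag: str, last_modified: str, body: bytes):
    """Return (status, headers, body) for a GET request."""
    validators = {"ETag": etag, "Last-Modified": last_modified}
    if is_not_modified(request_headers, etag, last_modified):
        return 304, validators, b""  # headers only, no body
    return 200, validators, body


etag, modified = '"v1"', "Fri, 01 Oct 2010 12:00:00 GMT"
body = b"x" * 100_000

status, headers, _ = respond({}, etag, modified, body)  # first visit
revisit = {"If-None-Match": headers["ETag"],
           "If-Modified-Since": headers["Last-Modified"]}
print(status, respond(revisit, etag, modified, body)[0])
```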
How validation caching works
Configure ETag
WHY YOU SHOULD CONSIDER DISABLING ETAG
ETags were introduced as a more flexible validator than Last-Modified, but they need care
in environments with multiple HTTP servers
The HTTP server can generate an ETag (similar to a version number) for static
resources. It’s enabled by default. The default format of the ETag is INode MTime Size
Apache HTTP Server sends both Last-Modified and ETag headers. You can’t disable
Last-Modified. The browser sends both If-Modified-Since and If-None-Match headers to
check whether the resource is still valid
As per the HTTP specification, both the If-Modified-Since (IMS) and If-None-Match (INM)
conditions should be met for the server to return HTTP 304 (the desired behavior, with a
smaller response)
If your request goes to an HTTP server where the same file has a different inode but the
same modification date and size, the ETag will not match and the server will return
HTTP 200 instead of HTTP 304
You can disable ETags by adding FileETag None to httpd.conf, or at least
configure it to FileETag MTime
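The multi-server problem can be illustrated with a toy ETag builder. It only mimics the idea of combining inode, modification time and size; it is not Apache's exact ETag encoding, and all values below are made up for the example:

```python
def file_etag(inode: int, mtime: int, size: int, use_inode: bool = True) -> str:
    """Toy ETag built from file attributes, joined as hex fields."""
    parts = ([inode] if use_inode else []) + [mtime, size]
    return '"' + "-".join(f"{part:x}" for part in parts) + '"'


# The same file deployed on two servers: same mtime and size, different inode
mtime, size = 1285934400, 10240
server_a = file_etag(inode=1001, mtime=mtime, size=size)
server_b = file_etag(inode=2002, mtime=mtime, size=size)
print(server_a == server_b)  # validators differ, so a revisit gets HTTP 200

# Leaving the inode out makes both servers agree
print(file_etag(1001, mtime, size, use_inode=False)
      == file_etag(2002, mtime, size, use_inode=False))
```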
Leverage proxy caching
HOW TO CACHE RESOURCE ACROSS USERS
A big portion of internet traffic goes through caching proxies
o Proxy provided by the ISP
o Proxy provided by a corporate network for outbound connections
o Proxy in front of your web server for inbound connections
Enabling public caching in the HTTP headers for static resources allows the browser
to download resources from a nearby proxy server rather than from a remote origin
server
o A proxy shares cached resources across its users
You use the Cache-Control: public header to indicate that a resource can be cached
by public web proxies in addition to the browser that issued the request.
Set an appropriate Vary header (Vary: Accept-Encoding, User-Agent)
Minimize request overhead
3
HTTP Request
WHAT HAPPENS WHEN A BROWSER REQUESTS A RESOURCE
When you access a resource in your browser, it performs the following steps
o DNS resolution
o Establish TCP connection
o Send request
o Receive response
You should try to reduce the overhead of each of these steps
Reduce DNS resolution time
REDUCE DNS RESOLUTION TIME
Before a browser establishes a connection with a server, it must resolve the host name
into an IP address. This value is cached by
o Operating system
o Browser
Cached DNS records have a short lifetime, and a lookup might have to traverse the DNS
hierarchy to get the record
Reducing the number of unique host names from which resources are served cuts
down on the number of DNS resolutions that the browser has to make
Don't use more than 1 host for fewer than 5 resources, and balance resources across host
names
Serve early loaded JavaScript from the same host name as the main document
o Browsers block parallel downloads while downloading JavaScript, so it should be as
fast as possible
Use HTTP Persistent Connection
WHAT IS HTTP PERSISTENT CONNECTION AND WHY YOU SHOULD CARE
Web clients often open connections to the same site for downloading HTML and related
resources. HTTP 1.1 (and Keep-Alive in HTTP 1.0) allows HTTP devices to keep a TCP
connection open after a transaction completes and to reuse the existing connection
for future HTTP requests. Connections that are kept open after a transaction are
called persistent connections
o You can avoid slow connection setup
o You can avoid the slow-start congestion adaptation phase.
Persistent connections are more efficient when used in conjunction with parallel
connections.
Starting with HTTP 1.1, connections are persistent by default unless you set
Connection: close
You can set “KeepAlive On” in Apache to turn on persistent connections
Persistent Connection
Size of HTTP Request
WHY SIZE OF HTTP REQUEST MATTERS ?
Most users have an asymmetric connection, with upload to download speed in a ratio of
1:4 to 1:20. At 1:20, uploading 500 bytes takes as long as downloading 10 KB. We can’t
compress the headers of an HTTP request. You should try to keep your request small
enough to fit in one packet of 1500 bytes
The initial HTTP request also suffers from startup throttling (TCP slow start)
An HTTP request is made up of the following
o Request headers set by the browser
o URL, Referer URL
o Cookies
You should try to reduce the size of each of these request components
Request for static resource
Minimize cookie size
HOW YOU CAN REDUCE COOKIE SIZE
Enterprise applications need at least a few big cookies that we can’t avoid
o LTPA token, JSESSIONID, SSO related cookies
Every time a client sends an HTTP request, it has to send along all cookies that
have been set for that domain and path.
o Keep most of the cookie payload in server side storage and send only a key
in the cookie.
o Serving static resources from a cookieless domain reduces the total size of requests
made for a page
• Static resources do not need cookies
• A typical static file is less than 10 KB, so more time is spent making the request than
getting the response
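The effect of cookies on request size can be estimated by building the request text and counting bytes. A minimal sketch; the host names, header values and cookie values are placeholders invented for the example:

```python
def request_size(path: str, host: str, headers: dict, cookies: dict) -> int:
    """Size in bytes of a simple HTTP 1.1 GET request."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    if cookies:
        pairs = "; ".join(f"{name}={value}" for name, value in cookies.items())
        lines.append(f"Cookie: {pairs}")
    raw = "\r\n".join(lines) + "\r\n\r\n"
    return len(raw.encode("ascii"))


browser_headers = {"User-Agent": "ExampleBrowser/1.0", "Accept": "*/*"}
# Placeholder values standing in for large enterprise cookies
cookies = {"LtpaToken": "A" * 400, "JSESSIONID": "B" * 40}

with_cookies = request_size("/img/logo.gif", "www.example.com",
                            browser_headers, cookies)
cookieless = request_size("/img/logo.gif", "static.example.com",
                          browser_headers, {})
print(with_cookies, "bytes with cookies,", cookieless, "bytes without")
```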
Minimize response size
4
Compress response
USE GZIP FOR COMPRESSING RESPONSE
Compressing text resources with GZip typically reduces their size by about 70%
Most modern browsers support compressed data. The browser sends an Accept-Encoding
header to specify which encodings it supports
You can configure the HTTP server to compress both static files that it serves and
dynamic content that goes through it
o You should compress only text files such as HTML, JavaScript, CSS
o You should not compress binary files such as images or PDFs. They are already
compressed and their size might increase after GZip
o You should not compress resources smaller than 150 bytes
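The difference between compressing text and compressing very small responses can be checked with Python's gzip module. A minimal sketch; the sample markup is deliberately repetitive, so the saving it shows is higher than for typical pages:

```python
import gzip


def gzip_saving(payload: bytes) -> float:
    """Fraction of bytes saved by GZip (negative if the result is larger)."""
    compressed = gzip.compress(payload)
    return 1 - len(compressed) / len(payload)


# Repetitive HTML-like text compresses well
html = b"<li class='item'><a href='/page'>link</a></li>\n" * 200
# A tiny response grows, because the GZip header and trailer cost more
# than the compression saves
tiny = b"ok"

print(f"html saving: {gzip_saving(html):.0%}")
print(f"tiny saving: {gzip_saving(tiny):.0%}")
```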
Configure GZip on Apache HTTP Server
HOW TO CONFIGURE APACHE HTTP SERVER FOR GZIP
Apache HTTP Server has a mod_deflate module that you can use to GZip the
response
You can use it to GZip both static files served by Apache and dynamic responses
that are proxied through Apache HTTP Server
o It checks whether the browser supports GZip and compresses the response only if it does
o It allows you to configure GZip by content type
• LoadModule deflate_module modules/mod_deflate.so
• AddOutputFilterByType DEFLATE text/html text/plain text/xml
o Make sure that you set Vary: Accept-Encoding so that proxies can correctly handle
clients that do not support GZip
Minification
MINIFY TEXT FILES
Minification is the practice of removing unnecessary characters from the code to
reduce its size
o Extra spaces
o Line breaks
o Indentation
o Comments
You can use tools to minify
o JavaScript
o CSS
o HTML
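What a minifier removes can be shown with a deliberately naive CSS example. This is a toy sketch for illustration only: it does not handle strings, url() values or selectors where whitespace is significant, so real projects should use one of the dedicated tools:

```python
import re


def minify_css(css: str) -> str:
    """Naive CSS minifier: drops comments and unnecessary whitespace."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)  # comments
    css = re.sub(r"\s+", " ", css)                        # line breaks, indentation
    css = re.sub(r"\s*([{};:,])\s*", r"\1", css)          # spaces around punctuation
    return css.strip()


source = """
/* page defaults */
body {
    margin: 0;
    color: #333;
}
"""

minified = minify_css(source)
print(minified)
print(len(source), "->", len(minified), "characters")
```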
Minify JavaScript
WHY MINIFY JAVASCRIPT
Compacting JavaScript code can save many bytes of data and speed up
downloading, parsing, and execution time.
Minification can reduce size by up to 30%
There are several tools that you can use for minifying JavaScript
o Dojo ShrinkSafe
o YUI Compressor
o Google’s Closure Compiler
The task to minify JavaScript should be part of your build script
You can also minify JavaScript on the fly using a Servlet Filter
Minify CSS
WHY MINIFY CSS
Compacting CSS code can save many bytes of data and speed up downloading,
parsing, and execution time.
Minifying CSS has the same advantages as minifying JavaScript
There are several tools for minifying CSS
o YUI Compressor
o Cssmin.js
You can add a task to minify CSS to the build script
You can minify CSS on the fly using a Servlet Filter
Minify HTML
WHY COMPACT/MINIFY HTML
Compacting HTML code, including any inline JavaScript and CSS contained in it, can
save many bytes of data and speed up downloading, parsing, and execution time.
There are YUI tag libraries that you can use to compress inline JavaScript and CSS
WebSphere generates quite a few blank lines and white spaces in HTML
o Set the com.ibm.wsspi.jsp.usecdatatrim property to true in the Web Container custom
settings to bring down the size of generated HTML by up to 15%
Optimize Images
WHY OPTIMIZE IMAGES
Properly formatting and compressing images can save many bytes of data
Images saved from programs like Fireworks can contain kilobytes of extra comments
and use too many colors, even though a reduction in the color palette may not
perceptibly reduce image quality
Choose an appropriate image file format
o PNGs are almost always superior to GIFs and are usually the best choice
o Use GIFs for very small or simple graphics and for images which contain animation.
o Use JPGs for all photographic-style images.
o Do not use BMPs or TIFFs.
Use an image compressor
Optimize browser rendering
5
What is optimizing browser rendering
OPTIMIZE BROWSER RENDERING
Once resources have been downloaded to the client, the browser still needs to load,
interpret, and render HTML, CSS, and JavaScript code. By simply formatting your
code and pages in ways that exploit the characteristics of current browsers, you can
enhance performance on the client side.
o Put CSS at the top of the document
o Always specify the content type and character encoding
• Specifying a character set early for your HTML documents allows the browser to begin
executing scripts immediately
o Put JavaScript at the end of the document
o Avoid CSS expressions
Tools
6
Testing tools
WHAT TOOLS SHOULD YOU USE FOR TESTING
Traditional load testing tools like LoadRunner are not well suited for capturing
browser performance data
o They take a simplistic view of an HTTP transaction
o Browsers have a lot of logic and variations
Use testing tools that run inside the browser
o iOpus iMacros
o Selenium
o Gomez
Yahoo YSlow
Google Page speed
Charles Web Debugging Proxy
Reference
MORE INFORMATION
My blog (http://wpcertification.blogspot.com/search/label/clientsideperformance)
High Performance Web Sites, O'Reilly
Even Faster Web Sites, O'Reilly
THANK YOU FOR WATCHING
CONTACT INFO:
ASCENDANT TECHNOLOGY, LLC
8601 Ranch Road 2222
Building I, Suite 205
Austin, TX 78730
Phone (512) 346-9580
Thank You
April 10, 2023