Page 1: CS193H: High Performance Web Sites Lecture 14: Rule 11 – Avoid Redirects Steve Souders Google souders@cs.stanford.edu

CS193H: High Performance Web Sites

Lecture 14: Rule 11 – Avoid Redirects

Steve Souders
Google

souders@cs.stanford.edu

Page 2:

announcements

midterm Friday 10/31 3:15-4:05pm

30-40 short answer questions

10/29 3:15pm (now) – check five Web 100 Performance Profile sites

11/3 – Doug Crockford from Yahoo! will be guest lecturer talking about "Ajax Performance"

Page 3:

3xx status codes

"further action needs to be taken by the user agent in order to fulfill the request"

• 300 Multiple Choices (based on Content-Type)
• 301 Moved Permanently (most popular)
• 302 Moved Temporarily (aka, Found) (most popular)
• 303 See Other (clarification of 302; HTTP/1.1)
• 304 Not Modified (response for conditional GET request)
• 305 Use Proxy
• 306 (no longer used)
• 307 Temporary Redirect (clarification of 302; HTTP/1.1)

http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3

Page 4:

redirect example

go to the new location instead of the original one

why use redirects?
• prettier URLs
• track traffic
• authentication

Request:
GET / HTTP/1.1
Host: astrology.yahoo.com

Response:
HTTP/1.1 301 Moved Permanently
Location: http://shine.yahoo.com/astrology/
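To make the exchange on this slide concrete, here is a minimal sketch of how a client could pull the status code and Location header out of a raw response. The name `parseRedirect` is hypothetical, and the parsing is deliberately simplified (no folded headers, no body handling); it is an illustration, not how any particular browser does it.

```javascript
// Hypothetical helper: extract the status code and redirect target from
// the head of a raw HTTP response. Simplified for illustration only.
function parseRedirect(rawResponse) {
  const lines = rawResponse.split(/\r?\n/);
  // The status line looks like "HTTP/1.1 301 Moved Permanently".
  const match = /^HTTP\/\d\.\d (\d{3})/.exec(lines[0]);
  if (!match) {
    throw new Error("not an HTTP response");
  }
  const status = Number(match[1]);

  let location = null;
  for (const line of lines.slice(1)) {
    if (line === "") break; // a blank line ends the headers
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const name = line.slice(0, colon).trim().toLowerCase();
    if (name === "location") {
      location = line.slice(colon + 1).trim();
    }
  }

  // 301, 302, 303, and 307 are the redirect codes this lecture focuses on.
  const isRedirect = [301, 302, 303, 307].includes(status);
  return { status, location, isRedirect };
}

const exampleResponse =
  "HTTP/1.1 301 Moved Permanently\r\n" +
  "Location: http://shine.yahoo.com/astrology/\r\n" +
  "\r\n";
const exampleParsed = parseRedirect(exampleResponse);
console.log(exampleParsed.status, exampleParsed.location);
```

The browser only learns the real destination after this first round trip completes, which is why the redirect delays everything that follows.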

Page 5:

worst blocker

inserting a redirect before the HTML document is worse than how stylesheets and scripts block
• all resources in the page are delayed
• the user gets very little feedback (nothing in the page)
• rendering, even the HTML text, is delayed

2nd worst – redirecting to a script

Page 6:

caching redirects

"Moved Permanently" – is it cached? no

spec: "cacheable if indicated by a Cache-Control or Expires header field"

Request:
GET / HTTP/1.1
Host: astrology.yahoo.com

Response:
HTTP/1.1 301 Moved Permanently
Date: Tue, 28 Oct 2008 07:39:53 GMT
Location: http://shine.yahoo.com/astrology/
Cache-Control: private
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
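The response on this slide has `Cache-Control: private` and no `Expires` header, so it carries no explicit freshness lifetime. The sketch below shows one simplified way to check for that. The name `explicitFreshnessSeconds` is hypothetical, the header object is assumed to use lowercase keys, and only `max-age` and `Expires` are considered; real browser caching rules are more involved.

```javascript
// Hypothetical helper: how many seconds of explicit freshness a response
// grants, judging only by Cache-Control: max-age and Expires.
// `headers` is assumed to be an object with lowercase header names.
function explicitFreshnessSeconds(headers, now) {
  const cacheControl = (headers["cache-control"] || "").toLowerCase();
  if (/\bno-store\b|\bno-cache\b/.test(cacheControl)) {
    return 0;
  }
  const maxAge = /\bmax-age=(\d+)/.exec(cacheControl);
  if (maxAge) {
    return Number(maxAge[1]); // max-age takes precedence over Expires
  }
  if (headers["expires"]) {
    const expires = Date.parse(headers["expires"]);
    // Measure from the server's Date header when present, else from `now`.
    const date = headers["date"] ? Date.parse(headers["date"]) : now;
    if (!Number.isNaN(expires) && !Number.isNaN(date)) {
      return Math.max(0, Math.round((expires - date) / 1000));
    }
  }
  return 0; // no explicit freshness information
}

// Headers from the astrology.yahoo.com response on this slide.
const yahooHeaders = {
  "date": "Tue, 28 Oct 2008 07:39:53 GMT",
  "cache-control": "private",
};
// Headers from the aol.com response shown later in the lecture.
const aolHeaders = {
  "date": "Tue, 28 Oct 2008 23:01:42 GMT",
  "expires": "Tue, 28 Oct 2008 23:31:42 GMT",
};
console.log(explicitFreshnessSeconds(yahooHeaders, Date.now())); // 0
console.log(explicitFreshnessSeconds(aolHeaders, Date.now()));   // 1800
```

Whether a given browser actually honors that lifetime for a redirect is a separate question, which the browser comparison in this lecture addresses.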

Page 7:

caching: 301, 302, Expires

301 Moved Permanently
• past Expires – don't cache: all
• no Expires – don't cache: IE, FF3, Safari, Opera; cache: FF2, Chrome
• future Expires – don't cache: IE, FF3*, Safari, Opera; cache: FF2, Chrome

302 Moved Temporarily
• past Expires – don't cache: all
• no Expires – don't cache: IE, FF3, Safari, Opera, Chrome; cache: FF2
• future Expires – don't cache: IE, FF3*, Safari, Opera; cache: FF2, Chrome

browsers tested: IE 7 & 8 (beta 2), Firefox 2.0 & 3.0, Safari 4, Opera 9.61, Chrome 0.2

FF2 and Chrome – only browsers to cache redirects

* Firefox 3.1 fixes regression from FF2 to FF3

Page 8:

redirect alternatives

JavaScript

document.location = "destination.php";

what if JavaScript is disabled or not present?

meta refresh – put in document's HEAD

<meta http-equiv="refresh" content="0; url=destination.php">

(the "0" is the # of seconds)

in IE, causes conditional GET requests for all resources (similar to Reload button)

Page 9:

cache workaround

<html><head>
<script type="text/javascript">
window.onload = function () { document.location = "destination.php"; }
</script>
<noscript><meta http-equiv="refresh" content="0; url=destination.php"></noscript>
</head>

one last thing – make this document cacheable!

need to let the page load so it can be cached
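As a rough sketch of the workaround on this slide, a server could generate the forwarding page together with caching headers like this. The function names, the returned object shape, and the choice to send both `Cache-Control: max-age` and `Expires` are assumptions for illustration; the slide itself only says to make the document cacheable.

```javascript
// Escape text for use inside an HTML attribute value.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: build a cacheable HTML page that forwards the
// browser to `destination`, using the JavaScript + meta refresh pattern.
// `now` is a timestamp in milliseconds; `lifetimeSeconds` is how long
// the page may be cached.
function buildRedirectPage(destination, lifetimeSeconds, now) {
  // JSON.stringify yields a valid JavaScript string literal; escaping "<"
  // keeps a destination containing "</script>" from ending the script early.
  const jsLiteral = JSON.stringify(destination).replace(/</g, "\\u003c");
  const body = [
    "<html><head>",
    '<script type="text/javascript">',
    // Redirect on load so the page finishes loading and can be cached.
    "window.onload = function () { document.location = " + jsLiteral + "; };",
    "</script>",
    '<noscript><meta http-equiv="refresh" content="0; url=' +
      escapeHtml(destination) + '"></noscript>',
    "</head><body></body></html>",
  ].join("\n");
  const headers = {
    "Content-Type": "text/html; charset=utf-8",
    "Cache-Control": "max-age=" + lifetimeSeconds,
    "Expires": new Date(now + lifetimeSeconds * 1000).toUTCString(),
  };
  return { status: 200, headers, body };
}

const examplePage = buildRedirectPage("destination.php", 1800, Date.now());
console.log(examplePage.headers["Expires"]);
console.log(examplePage.body);
```

Because the response is an ordinary 200 document with a future expiration date, a repeat visit can be served from the browser cache and the forward happens without a network request for this page.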

Page 10:

redirects in the top 10

site                       # redirects
www.aol.com                5
www.ebay.com
www.facebook.com
www.google.com/search
search.live.com/results
www.msn.com                1
www.myspace.com
en.wikipedia.org/wiki
www.yahoo.com              2
www.youtube.com

mostly ads

Page 11:

common uses

1. redirect from blah.com to www.blah.com
2. missing trailing slash
3. tracking internal traffic
4. tracking outbound traffic
5. prettier URLs, preserve old URLs
6. connecting web sites
7. ads
8. authentication

Page 12:

use 1: www

Request:
GET / HTTP/1.1
Host: aol.com

Response:
HTTP/1.x 301 Moved Permanently
Date: Tue, 28 Oct 2008 23:01:42 GMT
Expires: Tue, 28 Oct 2008 23:31:42 GMT
Location: http://www.aol.com/

why redirect from http://aol.com/ to http://www.aol.com/?
• set cookies on www domain – non-issue
• cache resources once regardless of which URL is used
  http://aol.com/logo.gif
  http://www.aol.com/logo.gif

Page 13:

use 1: www in the top 10

which top 10 sites redirect from blah.com to www.blah.com?

site               status   Expires
aol.com            301      +30 mins
ebay.com           301
facebook.com       301      July 1997
google.com         301      +30 days
search.live.com    na
msn.com            301
myspace.com        301
wikipedia.org      200
yahoo.com          301
youtube.com        303      Apr 1971

303 See Other – "MUST NOT be cached"

how is this possible?!

Page 14:

use 1: www & Wikipedia

all resources referenced via full URLs

easy, if you're doing
• CDN
• domain sharding
• cookieless domain

another alternative

<base href="http://www.wikipedia.org">

Page 15:

use 2: missing trailing slash

Request:
GET /msn HTTP/1.1
Host: astrocenter.astrology.msn.com

Response:
HTTP/1.x 301 Moved Permanently
Location: http://astrocenter.astrology.msn.com/msn/

reasons to redirect for missing trailing slash:
• autoindexing
  workaround: don't use autoindexing
• relative URLs for resources
  workaround: use base href, full URLs, or URLs relative to root
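The relative-URL problem is easy to see by resolving a resource against both forms of the page URL. This sketch uses the standard `URL` constructor; the resource name `logo.gif` is a hypothetical example, not something taken from the MSN page.

```javascript
// Resolve a resource URL against the page URL, the way a browser does.
function resolveResource(resource, pageUrl) {
  return new URL(resource, pageUrl).href;
}

const withoutSlash = "http://astrocenter.astrology.msn.com/msn";
const withSlash = "http://astrocenter.astrology.msn.com/msn/";

// Without the trailing slash, "msn" is treated as a file name, so the
// relative URL resolves against the root directory.
console.log(resolveResource("logo.gif", withoutSlash));
// http://astrocenter.astrology.msn.com/logo.gif

// With the trailing slash, it resolves inside /msn/.
console.log(resolveResource("logo.gif", withSlash));
// http://astrocenter.astrology.msn.com/msn/logo.gif

// A URL relative to the root resolves the same way in both cases,
// which is why it works as a redirect-free workaround.
console.log(resolveResource("/msn/logo.gif", withoutSlash));
// http://astrocenter.astrology.msn.com/msn/logo.gif
```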

Page 16:

use 3: internal tracking

Request:
GET /_ylt=Al…ume/**http%3A//tools.search.yahoo.com/about/forsearchers.html HTTP/1.1
Host: m.www.yahoo.com

Response:
HTTP/1.x 302 Moved Temporarily
Location: http://tools.search.yahoo.com/about/forsearchers.html

"More" link on Yahoo! front page

workaround: track referer [sic] on internal servers
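The referer workaround moves the counting to the destination server's logs. A minimal sketch, assuming the access log has already been parsed into objects with `path` and `referer` fields (that shape, and the function name, are hypothetical):

```javascript
// Count requests for `targetPath` whose Referer header points at
// `refererHost`. `entries` is a hypothetical array of parsed access log
// records of the form { path, referer }.
function countClicksFrom(entries, targetPath, refererHost) {
  let count = 0;
  for (const entry of entries) {
    if (entry.path !== targetPath || !entry.referer) continue;
    let host;
    try {
      host = new URL(entry.referer).hostname;
    } catch (e) {
      continue; // skip malformed Referer values
    }
    if (host === refererHost) count += 1;
  }
  return count;
}

const sampleEntries = [
  { path: "/about/forsearchers.html", referer: "http://www.yahoo.com/" },
  { path: "/about/forsearchers.html", referer: "http://www.google.com/search?q=x" },
  { path: "/about/forsearchers.html", referer: "" },
  { path: "/other.html", referer: "http://www.yahoo.com/" },
];
console.log(countClicksFrom(sampleEntries, "/about/forsearchers.html", "www.yahoo.com")); // 1
```

The link on the front page can then point straight at the destination, with no intermediate 302.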

Page 17:

use 4: outbound tracking

Request:
GET /url?…url=http%3A%2F%2Fwww.npr.org/… HTTP/1.1
Host: www.google.com

Response:
HTTP/1.x 302 Found
Location: http://www.npr.org/

clicking on a Google search result

workarounds:
• image beacon – race conditions
• XMLHttpRequest readyState 2 – faster, more complex
• HTML 5
  <a ping="http://...">
  <link rel=pingback href="http://...">
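The image beacon workaround can be sketched as follows. The endpoint `/beacon.gif`, the query parameter names `url` and `t`, and both function names are assumptions made for this example. The image factory is passed in so the logic can be run outside a browser.

```javascript
// Hypothetical helper: build the URL of an image beacon that records an
// outbound click. The timestamp makes each beacon URL unique so the
// request is not answered from cache.
function buildBeaconUrl(endpoint, targetUrl, timestamp) {
  return endpoint +
    "?url=" + encodeURIComponent(targetUrl) +
    "&t=" + timestamp;
}

// Fire the beacon. In a browser, `createImage` would be
// function () { return new Image(); }
function trackOutboundClick(targetUrl, endpoint, createImage, timestamp) {
  const beacon = createImage();
  beacon.src = buildBeaconUrl(endpoint, targetUrl, timestamp);
  return beacon;
}

// Browser usage (not run here):
//   link.onclick = function () {
//     trackOutboundClick(this.href, "/beacon.gif",
//       function () { return new Image(); }, Date.now());
//   };

console.log(buildBeaconUrl("/beacon.gif", "http://www.npr.org/", 123));
// /beacon.gif?url=http%3A%2F%2Fwww.npr.org%2F&t=123
```

The race condition mentioned on the slide comes from the click itself: the browser may unload the page and navigate to the destination before the beacon request has been sent.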

Page 18:

use 5: prettier URLs

Request:
GET / HTTP/1.1
Host: music.myspace.com

Response:
HTTP/1.x 302 Moved
Location: http://profile.myspace.com/index.cfm?fuseaction=music

prettier URLs are easier to remember

also, preserve old URLs when code changes

workaround: mod_rewrite, cacheable redirects
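The idea behind the mod_rewrite workaround is that the server maps the pretty URL to the internal one itself, instead of sending the browser a 302 and waiting for a second request. The sketch below imitates that mapping in JavaScript purely for illustration; it is not mod_rewrite syntax, the table and function name are hypothetical, and it assumes the same server can serve both URLs.

```javascript
// Hypothetical rewrite table: pretty URL (host + path) -> internal path.
const rewriteTable = new Map([
  ["music.myspace.com/", "/index.cfm?fuseaction=music"],
]);

// Return the internal path to serve for a request. The browser keeps
// showing the pretty URL and makes only one request.
function rewriteInternally(host, path) {
  const key = host.toLowerCase() + path;
  return rewriteTable.has(key) ? rewriteTable.get(key) : path;
}

console.log(rewriteInternally("music.myspace.com", "/"));
// /index.cfm?fuseaction=music
console.log(rewriteInternally("music.myspace.com", "/other"));
// /other
```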

Page 19:

use 6: connecting sites

http://toolbar.google.com/
http://toolbar.google.com/index.html
http://toolbar.google.com/T5/
http://toolbar.google.com/T5/intl/en/index.html
http://www.google.com/tools/firefox/toolbar/FT3/intl/en/index.html

redirects are an easy way to "integrate" separate teams (T4, T5), separate code bases (IE, FF), separate servers (toolbar, www)

workarounds: CNAMEs, mod_rewrite
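Chains like the one on this slide add one round trip per hop. A small sketch for counting the hops, using an in-memory table as a stand-in for real HTTP requests; the `example.com` URLs, the table shape, and the function name are all hypothetical.

```javascript
// Follow redirects starting at `startUrl` and report how many hops it
// takes to reach a non-redirect response. `responses` maps a URL to
// { status, location } and stands in for real network lookups.
// At most `maxHops` redirects are followed.
function followRedirects(startUrl, responses, maxHops) {
  const chain = [startUrl];
  let url = startUrl;
  while (chain.length <= maxHops) {
    const response = responses[url];
    if (!response || ![301, 302, 303, 307].includes(response.status)) {
      break; // reached a final document (or an unknown URL)
    }
    // Location may be relative, so resolve it against the current URL.
    url = new URL(response.location, url).href;
    chain.push(url);
  }
  return { finalUrl: url, redirects: chain.length - 1, chain };
}

const sampleResponses = {
  "http://tools.example.com/": { status: 301, location: "/index.html" },
  "http://tools.example.com/index.html": { status: 302, location: "http://tools.example.com/v5/" },
  "http://tools.example.com/v5/": { status: 302, location: "http://www.example.com/tools/" },
};
const sampleResult = followRedirects("http://tools.example.com/", sampleResponses, 10);
console.log(sampleResult.redirects, sampleResult.finalUrl);
// 3 http://www.example.com/tools/
```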

Page 20:

use 7: ads

specifically, counting ad impressions

advertisers and publishers have a hard time reconciling the count

when do you count an ad impression?
• when a page containing an ad is served?
  what if the page never arrives?
• when a page containing an ad arrives at the client?
  what if the ad request fails, or the user stops the page?
• when the content of the ad (image, Flash) is requested from the advertiser?
  what if the user leaves the page before the content arrives?
• after the content arrives?
  is it the publisher's fault if the content is slow?

Page 21:

use 7: ads

how do you count an ad impression?
• when a page containing an ad is served?
  count it on the publisher's backend
• when a page containing an ad arrives at the client?
  send a beacon from the client
• when the content of the ad (image, Flash) is requested from the advertiser?
  count it on the advertiser's backend
• after the content arrives?
  send a beacon from the client

redirects can help count when content is served and reconcile the two parties

Page 22:

use 7: ads

from http://www.aol.com/

http://ad.doubleclick.net/im…5|1;;cs=o%3fhttp://ad.doubleclick.net/dot.gif?258824979
http://ad.doubleclick.net/dot.gif?258824979

http://eatps.web.aol.com:9000/open_web_adhoc?subtype=40…458
http://www.aolcdn.com/pops_promo/pixel

http://ad.doubleclick.net/ad/N553.AEAOLService/B2775919.11;dcadv=1297440;sz=1x1;ord=4613012?
http://m1.2mdn.net/viewad/1297440/1x1.gif

double logging?

Page 23:

use 7: ads

from http://www.cnn.com/

http://ads.cnn.com/event.ng/Type=count&Clie…ARd
http://i.cdn.turner.com/cnn/images/1.gif

http://ad.doubleclick.net/ad/N3880.SD146.3880/B3107454.25;dcove=o;sz=1x1;ord=dwgksue,beqpWcytARh?
http://m1.2mdn.net/viewad/1139835/67-1x1.gif

double logging?

Page 24:

use 8: authentication

cookies are used for authentication

cookies can only be set on the page's domain

how do you authenticate someone on domain A if they're currently on domain B? redirects

authentication is often on https servers

how do you authenticate someone on https if they're currently on http? redirects

Page 25:

use 8: authentication

https://www.google.com/accounts/ServiceLoginBoxAuth
https://www.google.com/accounts/CheckCookie?continue=http%3A%2F%2Fgroups.google.com%2Fgroups%2Fauth%3F_done…
http://groups.google.com/groups/auth?_done…
http://groups.google.com/

one reason why redirects with Set-Cookie are sometimes not cached

Page 26:

avoid redirects

eliminate the need
• base href or full URLs for resources
• referer tracking
• HTML 5 – A ping and LINK pingback
• CNAMEs
• mod_rewrite
• no autoindex

make them cacheable
• 301 with future Expires
• JavaScript & meta refresh with future Expires

Page 27:

Homework

study for midterm – 10/31 3:15-4:05pm

11/7 11:59pm – rules 4-10 applied to your "Improving a Top Site" class project

Page 28:

Questions

• What's the status text for 301 and 302?
• What HTTP response header contains the URL the user is redirected to?
• Why are redirects worse than stylesheets and scripts in terms of blocking?
• If a redirect is "Moved Permanently", does that mean it's cached?
• Which browsers today cache redirects?
• What are two other techniques for doing redirects? How do they compare to the 301/302 status approach?