Automatic Data Collection: Server Logs

Upload: bayle

Post on 12-Jan-2016



TRANSCRIPT

Page 1: Automatic Data Collection: Server Logs

Automatic Data Collection: Server Logs

As with all methods, have to ask:

• What are the goals for your system?
  – What constitutes success, or good quality service?
  – How can you conceptualize and operationalize quality?

• What information can you get using this method?

• How will this info help you evaluate performance?

Sources of data about visits and visitors

• Provided by users
  – Registration, and whatever demographics and preferences are asked about

• Captured by system
  – Server log files
  – Cookies

Benefits of monitoring data

• Can yield lots of data for relatively low investment

• Unobtrusive; “outcroppings”

• Numbers communicate well

• Numbers are useful for comparisons
  – “hits are up 20% over this time last year”

However: what do the data mean?

One example of simple stats

• Compare January (with DWR photos) to March (DWR photos removed)

• http://elib.cs.berkeley.edu/webstats/Mar2002.html

Common measures

• “According to Forrester Research, many companies still use hits as the primary measurement of website success, followed by page views and session length.”

Hit

“The retrieval of any item, like a page or a graphic, from a Web server. For example, when a visitor calls up a Web page with four graphics, that's five hits, one for the page and four for the graphics. For this reason, hits often aren't a good indication of Web traffic. See page view.”

http://www.webopedia.com/TERM/h/hit.html
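
The gap between hits and page views in the definition above can be shown with a small sketch. The request list is hypothetical (one page plus four graphics, as in the quoted example; `/images/logo.gif` is made up), and treating any object ending in `.html` or `.htm` as a "page" is a simplifying assumption, not a standard rule.

```python
# Hypothetical requests generated by one visitor calling up one page
# that contains four graphics.
requests = [
    "/pdq.html",
    "/images/pdq.gif",
    "/images/banner1.gif",
    "/images/news.gif",
    "/images/logo.gif",  # made-up fourth graphic
]

# Assumption: an object counts as a "page" if its name ends in .html or .htm.
PAGE_SUFFIXES = (".html", ".htm")

# Every retrieved item is a hit; only the page itself is a page view.
hits = len(requests)
page_views = sum(1 for path in requests if path.endswith(PAGE_SUFFIXES))

print(f"hits={hits} page_views={page_views}")
```

Under these assumptions, one visit to one page is recorded as five hits but a single page view, which is why hit counts alone can overstate traffic.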

Measuring success

• “Companies sometimes make the mistake of buying elaborate software packages that analyze data a million ways, and then neglect to look at the most basic, day-to-day measurements of how a site is doing in its primary function….

• For an e-commerce site, those basic measurements are conversion rate—that is, the ratio of buyers to visitors—and average order size.

• For sites that make money via advertising banners… the number of ad banners viewed;

• other sites can measure traffic from return visitors versus traffic from new visitors.

• Remember one of the most basic elements of delivering a good customer experience: making sure that pages load quickly, even when the site is barraged with traffic.”

http://www.cio.com/archive/051500_parade.html
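
The two e-commerce measurements named in the quote are simple ratios. A minimal sketch follows, with made-up visitor and order figures and the simplifying assumption that each buyer places exactly one order; the function names are illustrative only.

```python
def conversion_rate(buyers: int, visitors: int) -> float:
    """Ratio of buyers to visitors; 0.0 when there were no visitors."""
    if visitors == 0:
        return 0.0
    return buyers / visitors


def average_order_size(order_totals) -> float:
    """Mean order value; 0.0 when there were no orders."""
    if not order_totals:
        return 0.0
    return sum(order_totals) / len(order_totals)


# Hypothetical figures for one day.
visitors = 2000
orders = [25.0, 40.0, 55.0]  # assumed: one order per buyer

rate = conversion_rate(len(orders), visitors)
avg = average_order_size(orders)
print(f"conversion rate: {rate:.2%}, average order size: {avg:.2f}")
```

Note that "visitors" is itself an estimate: the count depends on how visitors are identified in the logs.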

Server log contents

• Time

• IP Address

• Server

• Action

• Object

• Result code and size

• Browser version and platform

• Referring URL

Server log contents

• Time | IP Address | Server | Action | Object | Result code and size | Browser version and platform | Referring URL

• 01:50:17 216.126.148.89 - ICICWEB1 GET /images/pdq.gif - 200 793 290 Mozilla/4.0+(compatible;+MSIE+4.01;+Windows+98) http://128.231.164.190/pdq.html

• 01:50:18 216.126.148.89 - ICICWEB1 GET /images/banner1.gif - 200 4067 294 Mozilla/4.0+(compatible;+MSIE+4.01;+Windows+98) http://128.231.164.190/pdq.html

• 01:50:18 216.126.148.89 - ICICWEB1 GET /images/news.gif - 200 1054 291 Mozilla/4.0+(compatible;+MSIE+4.01;+Windows+98) http://128.231.164.190/pdq.html
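
Lines like the ones above can be split into named fields with a few lines of code. This is a sketch under assumptions: the entries are taken to be whitespace-separated with exactly 12 fields, the two `-` placeholders are guessed to be a user name and a query string, and because the slide labels only "result code and size", the two numbers after the result code are kept under the neutral name `sizes`. `LogEntry` and `parse_line` are illustrative names, not part of any log tool.

```python
from typing import NamedTuple, Tuple


class LogEntry(NamedTuple):
    time: str
    ip: str
    user: str       # assumed meaning of the first "-" placeholder
    server: str
    action: str
    obj: str
    query: str      # assumed meaning of the second "-" placeholder
    status: int
    sizes: Tuple[int, int]  # the two numbers after the result code
    browser: str
    referrer: str


def parse_line(line: str) -> LogEntry:
    """Split one whitespace-separated log line into named fields."""
    parts = line.split()
    if len(parts) != 12:
        raise ValueError(f"expected 12 fields, got {len(parts)}")
    (time, ip, user, server, action, obj, query,
     status, size1, size2, browser, referrer) = parts
    return LogEntry(
        time, ip, user, server, action, obj, query,
        int(status), (int(size1), int(size2)),
        browser.replace("+", " "),  # the log writes spaces as "+"
        referrer,
    )


sample = (
    "01:50:17 216.126.148.89 - ICICWEB1 GET /images/pdq.gif - 200 793 290 "
    "Mozilla/4.0+(compatible;+MSIE+4.01;+Windows+98) "
    "http://128.231.164.190/pdq.html"
)
entry = parse_line(sample)
print(entry.time, entry.ip, entry.action, entry.obj, entry.status)
print(entry.browser)
```

Real servers let administrators choose which fields are logged and in what order, so the field list would need to be checked against the server's own configuration before reusing this.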

Some issues in using log data

• Differentiating users from machines or proxies
  – Cookies and registration

• Relating IP addresses, user locations, user characteristics

• Identifying sessions
  – Cookies; assumptions about nature of sessions

• Measuring hits
  – Cached pages?

• Interpreting results relative to your goals
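
The "assumptions about nature of sessions" can be made concrete with a small sketch. It assumes, for illustration only, that one IP address equals one visitor (which the first bullet warns is unreliable behind proxies) and that a gap of more than 30 minutes starts a new session. The timeout value, the sample records, and the function names are all hypothetical.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed cutoff, not a standard


def to_seconds(hms: str) -> int:
    """Convert 'HH:MM:SS' to seconds since midnight."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s


def sessionize(records):
    """Group requests into sessions per IP address.

    records: iterable of (time 'HH:MM:SS', ip, object) from a single day,
    already sorted by time. Returns {ip: [[object, ...], ...]}.
    """
    sessions = defaultdict(list)
    last_seen = {}
    for hms, ip, obj in records:
        t = to_seconds(hms)
        # Start a new session on first sight or after a long silence.
        if ip not in last_seen or t - last_seen[ip] > SESSION_TIMEOUT:
            sessions[ip].append([])
        sessions[ip][-1].append(obj)
        last_seen[ip] = t
    return dict(sessions)


# Hypothetical records; only the first IP address appears in the slides.
records = [
    ("01:50:17", "216.126.148.89", "/pdq.html"),
    ("01:52:40", "216.126.148.89", "/news.html"),
    ("03:10:05", "216.126.148.89", "/pdq.html"),
    ("03:11:00", "10.0.0.7", "/index.html"),
]
result = sessionize(records)
for ip, ip_sessions in result.items():
    print(ip, ip_sessions)
```

Changing the timeout changes the session count, so any "number of sessions" figure is only as meaningful as this assumption.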

One source recommends:

– Who is visiting your site
  – Unique visitor identification so you know whether a visitor is returning to your site.

– The path visitors take through your pages -- “visitor trails”
  – Knowing each page a visitor viewed and the order, you can identify trends in how visitors navigate through your pages.
  – What element (link, icon) a visitor clicked on each page to go to the next page.

– How much time visitors spend on each page
  – They say: “A pattern of lengthy viewing time on a page might lead you to deduce the page is very interesting or very confusing.”
  – But… How do you know what (else) the user is doing?
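
A rough sketch of how "time spent on each page" is usually derived from a visitor trail: the log records only request times, so viewing time is estimated as the gap until the visitor's next request. The trail below is hypothetical, and `time_on_pages` is an illustrative name.

```python
def time_on_pages(trail):
    """Estimate viewing time per page from one visitor's trail.

    trail: list of (seconds_since_start, page) in request order.
    Returns a list of (page, seconds_until_next_request); the last
    page gets None because the log has no later request to compare with.
    """
    estimates = []
    for (t, page), (t_next, _) in zip(trail, trail[1:]):
        estimates.append((page, t_next - t))
    if trail:
        estimates.append((trail[-1][1], None))
    return estimates


# Hypothetical trail for a single visitor.
trail = [(0, "/index.html"), (45, "/search.html"), (400, "/results.html")]
estimates = time_on_pages(trail)
for page, seconds in estimates:
    print(page, seconds)
```

The 355 seconds attributed to `/search.html` could be careful reading, confusion, or a phone call; the log cannot tell these apart, and the time on the final page is not recorded at all.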

Recommendations, cont.

– Where visitors are leaving your site
  – The last page a visitor viewed before leaving your site might be a logical place to end the visit, or it might be a place where the visitor bailed out.

– The success of users’ experiences at your site
  – Purchases transacted, downloads completed, and information viewed are concrete indicators of tasks accomplished.

From Tec-Ed, Inc., "Assessing Web Site Usability from Server Log Files," on the Tec-Ed, Inc. Web site: http://www.teced.com/c_and_p.html#WU
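
Once requests have been grouped into sessions, counting exit pages is straightforward. A minimal sketch with hypothetical sessions and page names follows; how the sessions were identified is assumed to have been settled already.

```python
from collections import Counter

# Hypothetical sessions: each is the ordered list of pages one visitor viewed.
sessions = [
    ["/index.html", "/search.html", "/results.html"],
    ["/index.html", "/search.html"],
    ["/index.html", "/order.html", "/thanks.html"],
    ["/search.html"],
]


def exit_page_counts(sessions):
    """Count how often each page was the last one viewed in a session."""
    return Counter(s[-1] for s in sessions if s)


exits = exit_page_counts(sessions)
for page, n in exits.most_common():
    print(page, n)
```

The counts alone do not say whether an exit was a success: leaving from `/thanks.html` probably follows a completed order, while leaving from `/search.html` may mean the visitor bailed out.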

Another example promises statistics about:

• Web server activity
  – Number of visitors, number of unique IPs, bandwidth used, and number of hits received, broken down by Time Increment, Day of the Week, and Hour of the Day

• Type of data visitors access on your site
  – Web pages viewed, files downloaded, directories accessed, and images accessed during a time period, broken down by Page Views, Browsing Sequences, Downloaded Files, Accessed Directories, and Accessed Images

• Referrer information
  – Referring Domains and Referring URLs (referrers are sites with links to your site)

Promises, cont.

• Search engine performance
  – The search engines that referred visitors to the site, and the phrases and keywords visitors searched for, broken down by Top Search Engines, Keywords, and Each Search Engine

• Visitors' geographic region
  – Displays a Most Active Countries graph and a table showing which countries your visitors come from

• Browsers and platforms visitors used

• Errors visitors encountered at the site

Promises, cont.

• Advanced visitor filters
  – Visitors who accessed specific pages or files
  – Visitors who came from specific referring URLs
  – Day of Week (example: see what happened on a specific day); Hour of Day
  – Visitors whose first visit is a specific page
  – Visitors' countries or regions
  – Visitors who make purchases on your web site: see information on visitors who actually buy something from your web site

Source: http://www.123loganalyzer.com/features.htm
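
Referrer and search-keyword reports of the kind promised above can be approximated from the referring-URL field alone. The sketch below is not how the cited product works; it assumes hypothetical referrer URLs and assumes the search phrase is carried in a query parameter named `q`, which varies between search engines.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical referring URLs; the last one is taken from the sample log lines.
referrers = [
    "http://www.google.com/search?q=server+logs",
    "http://www.google.com/search?q=web+usability",
    "http://128.231.164.190/pdq.html",
]

# Referring domains: count the host part of each referring URL.
domains = Counter(urlparse(r).netloc for r in referrers)

# Search keywords: assumed to live in a "q" query parameter.
keywords = Counter()
for r in referrers:
    query = parse_qs(urlparse(r).query)
    for phrase in query.get("q", []):
        keywords.update(phrase.lower().split())

print(domains.most_common())
print(keywords.most_common())
```

Referrers are supplied by the browser and are often missing, so these counts describe only the visits that carried one.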

Cookies

• Simulate continuous connection, session

• Identify user

• Store info about user, preferences, past activity

http://www.netscape.com/newsref/std/cookie_spec.html

Cookies

• “the server nytimes.com wishes to set a cookie that will be sent to any server in the domain nytimes.com

• The name and value of the cookie are nytime-s …

• The cookie will persist until Tues April 8 14:25:04 2003”

Set-Cookie: NAME=VALUE; expires=DATE; path=PATH; domain=DOMAIN_NAME; secure

• NAME=VALUE : a sequence of characters. The only required attribute.

• expires=DATE : valid lifetime of the cookie. Once the date is reached, the cookie is no longer stored or given out.

• domain=DOMAIN_NAME : when searching the cookie list for valid cookies, the domain attribute of each cookie is compared with the domain name of the host from which the URL will be fetched. Default is the host name of the server that generated the cookie response.

• path=PATH : the subset of URLs in a domain for which the cookie is valid. If not specified, it is assumed to be the same path as the document described by the header that contains the cookie.

• secure : the cookie will only be transmitted if the communications channel with the host is a secure one.
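
As an illustration of the header format above, Python's standard `http.cookies` module can assemble a `Set-Cookie` line from the same attributes. The cookie name `session_id` and its value are hypothetical; the domain and expiry date echo the nytimes.com example, with the GMT time zone being an assumption.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session_id"] = "abc123"  # hypothetical NAME=VALUE pair

# Optional attributes, matching the fields described above.
cookie["session_id"]["expires"] = "Tue, 08 Apr 2003 14:25:04 GMT"  # GMT assumed
cookie["session_id"]["path"] = "/"
cookie["session_id"]["domain"] = "nytimes.com"
cookie["session_id"]["secure"] = True

# The header line a server would send with its response.
header = cookie.output()
print(header)

# On a later request the browser returns only NAME=VALUE, which the
# server can parse to recognize the returning visitor.
returned = SimpleCookie()
returned.load("session_id=abc123")
print(returned["session_id"].value)
```

The module chooses its own attribute order and capitalization, so the printed header will not match the template character for character.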

Other methods

• Analyses of queries on site search engines

• Emails:
  – Customer queries and requests for more information
  – Customer complaints

• Suggestion boxes

Analyses

• Frequencies

• Cross tabulations
  – Page visited by IP address

• Correlations
  – Beware of assumptions about causality

• Graphics

• Exponential distributions
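
Frequencies and a "page visited by IP address" cross tabulation need nothing more than counting. A minimal sketch over hypothetical (IP address, page) pairs follows; the second IP address and the page names are made up for illustration.

```python
from collections import Counter

# Hypothetical (ip, page) pairs extracted from a log.
records = [
    ("216.126.148.89", "/pdq.html"),
    ("216.126.148.89", "/pdq.html"),
    ("216.126.148.89", "/news.html"),
    ("10.0.0.7", "/pdq.html"),
]

# Frequencies: how often each page was requested.
page_freq = Counter(page for _, page in records)

# Cross tabulation: requests per (ip, page) combination.
crosstab = Counter(records)

# Print the cross tabulation as a small table, pages down, IPs across.
ips = sorted({ip for ip, _ in records})
pages = sorted(page_freq)
print("page".ljust(12) + "".join(ip.rjust(16) for ip in ips))
for page in pages:
    row = "".join(str(crosstab[(ip, page)]).rjust(16) for ip in ips)
    print(page.ljust(12) + row)
```

A `Counter` returns 0 for combinations that never occurred, which keeps the table complete without special cases.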