Weblog Analysis
This presentation gives an overview of web log analysis tools.
Presented by: Somnath Mazumdar
https://www.csi.ucd.ie/users/somnath-mazumdar
• Introduction
• Pros & Cons of Methods
• AWStats
• Google Analytics
• AWStats vs. Google Analytics
• Packet Sniffing
• Approach
• Conclusion
• Weblogs: activity/transaction information recorded by web servers.
• Earlier, weblogs were used to count visitors.
• Web analysis: off-site and on-site.
• On-site information retrieval:
  1. Page tagging
  2. Historical web data analysis
• Usages:
  1. Performance
  2. Security
  3. Prediction (regression/CART)
  4. Reporting and profiling:
     4.1. Web statistics
     4.2. Business analytics (K-means, MC)
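To make "counting visitors from weblogs" concrete, here is a minimal sketch. It assumes the server writes the Apache/NCSA Combined Log Format; the sample lines, the `parse_line` and `count_visitors` names, and the use of the client host as a stand-in for a "visitor" are all illustrative assumptions, not something taken from the slides.

```python
import re
from collections import Counter

# Assumed layout: Apache/NCSA Combined Log Format (referrer and agent optional).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no body was sent; treat it as zero bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def count_visitors(lines):
    """Count requests per client host, a rough stand-in for 'visitors'."""
    hosts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            hosts[record["host"]] += 1
    return hosts


# Made-up sample lines using documentation-only IP ranges.
SAMPLE_LINES = [
    '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '192.0.2.1 - - [10/Oct/2023:13:55:40 +0000] "GET /about.html HTTP/1.1" 200 1200 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET /missing.html HTTP/1.1" 404 - "-" "ExampleBot/1.0"',
]

if __name__ == "__main__":
    print(count_visitors(SAMPLE_LINES))
```

Counting by host is only an approximation: several users behind one proxy share an address, which is one reason log-based visitor counts can be inaccurate.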
Page tagging
• Pros:
  1. Accuracy: data comes from the end user.
  2. Speed of data reporting.
  3. Data collection flexibility.
  4. No need for your own web server.
• Cons:
  1. Users or firewalls can block the tag.
  2. Every page must be tagged.
  3. Cannot report on non-page hits.
  4. Cannot track bandwidth, server response time or completed downloads.
Log file analysis
• Pros:
  1. Non-invasive data collection.
  2. Can track bandwidth and completed downloads.
  3. Helps to optimize for search engines.
  4. Securely captures HTTP user names.
  5. Can track “spiders” or robots.
  6. Exact content delivery information.
  7. Time to serve website content.
  8. Information on missing or broken pages.
• Cons:
  1. Proxy/caching inaccuracies.
  2. No event (JavaScript, Flash or AJAX) tracking.
  3. Log management: log generation, log storage and log file transfer.
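Several of the log-based pros above (bandwidth, broken pages, robots) reduce to simple aggregations over parsed records. The sketch below assumes the log has already been parsed into dictionaries; the field names, the sample records and the substring-based robot check are hypothetical simplifications (real analyzers rely on maintained robot lists).

```python
from collections import Counter

# Hypothetical, already-parsed log records; field names are assumptions.
RECORDS = [
    {"path": "/index.html", "status": 200, "size": 2326, "agent": "Mozilla/5.0"},
    {"path": "/old-page.html", "status": 404, "size": 0, "agent": "Mozilla/5.0"},
    {"path": "/robots.txt", "status": 200, "size": 120, "agent": "ExampleBot/1.0"},
    {"path": "/old-page.html", "status": 404, "size": 0, "agent": "ExampleBot/1.0"},
]

# Crude heuristic for illustration only.
BOT_MARKERS = ("bot", "spider", "crawler")


def is_robot(agent):
    """Guess whether a User-Agent string belongs to an automated client."""
    agent = agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def summarize(records):
    """Aggregate bandwidth, 404 paths and robot hits from parsed records."""
    total_bytes = sum(r["size"] for r in records)
    broken = Counter(r["path"] for r in records if r["status"] == 404)
    robot_hits = sum(1 for r in records if is_robot(r["agent"]))
    return {"bytes": total_bytes, "broken_pages": broken, "robot_hits": robot_hits}


if __name__ == "__main__":
    print(summarize(RECORDS))
```

None of these figures are visible to a page tag, because robots rarely execute scripts and a 404 response usually carries no tag.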
• Goal: system based or product based.
• Cost: freeware or commercial.
• Storage: log storage (third party).
• Report/tips: generate static or real-time reports with tips.
AWStats is a powerful log analyzer that creates advanced web, FTP, mail and streaming server statistics reports.
Google Analytics provides in-depth product marketing information and tips (Google AdWords/AdSense).
AWStats
• Freeware
• Graphically presented reports
• Customizable reports
• Reports based on users, OS, browser, location, data transfer, bookmarks, total visits and so on
• Standard and custom log formats supported
• Works from the CLI as well as a CGI (flexibility)
• Written in Perl
• Many desired features
• But less visual/interactive than Google Analytics (GA)
• Issues:
  1. DNS lookup and the full-year view cost time.
  2. Database format: using the "xml" format makes the database 3 times larger than the default.
  3. The feature that excludes records from SPAM referrers is 5 times slower.
  4. Differentiating URLs of dynamic pages costs memory.
  5. Accuracy hampers speed: keywords (1%), search engines (9%), worms detection (15%), OS (2%).
  6. Each extra section reduces AWStats speed by 8%. A wrong setup may eat all memory.
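The percentages above can be combined into a rough estimate of how much throughput is left once several options are enabled. This is a back-of-the-envelope sketch: it assumes the slowdowns compound multiplicatively, which the slides do not state, and the `relative_speed` helper and its dictionary keys are hypothetical names.

```python
# Per-feature slowdown figures quoted on the slide (fraction of speed lost).
FEATURE_COST = {
    "keywords": 0.01,
    "search_engines": 0.09,
    "worms_detection": 0.15,
    "os": 0.02,
}
# Slowdown quoted for each extra report section.
EXTRA_SECTION_COST = 0.08


def relative_speed(features, extra_sections=0):
    """Estimate remaining speed (1.0 = baseline), assuming costs compound."""
    speed = 1.0
    for name in features:
        speed *= 1.0 - FEATURE_COST[name]
    speed *= (1.0 - EXTRA_SECTION_COST) ** extra_sections
    return speed


if __name__ == "__main__":
    everything = relative_speed(FEATURE_COST, extra_sections=2)
    print(f"Estimated relative speed: {everything:.2f}")
```

Under this assumption, enabling all four lookups plus two extra sections leaves roughly two thirds of the baseline speed, so it is worth enabling only the sections a report actually needs.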
• Session "unknown"
• AWStats counts everything as pages.
• Reports cannot be generated for the current/custom date.
• Reports cannot be generated for a custom date range or on a weekly basis.
• On a few Intel Pentium4 / Xeon4 based host systems, log file time cannot be computed correctly.
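Where a custom date range or weekly breakdown is needed, one option is a separate pass over the raw log timestamps. The sketch below is not an AWStats feature; it assumes Apache-style timestamps, an English/C locale for month names, and uses made-up sample data and a hypothetical `hits_per_week` helper.

```python
from collections import Counter
from datetime import date, datetime

# Timestamp layout assumed from Apache logs, e.g. 10/Oct/2023:13:55:36 +0000.
# %b parses month abbreviations according to the current locale.
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def hits_per_week(timestamps, start, end):
    """Count hits per ISO (year, week) for dates with start <= date <= end."""
    weeks = Counter()
    for raw in timestamps:
        moment = datetime.strptime(raw, TIME_FORMAT)
        if start <= moment.date() <= end:
            iso = moment.isocalendar()
            weeks[(iso[0], iso[1])] += 1
    return weeks


# Made-up timestamps for illustration.
SAMPLE_TIMES = [
    "02/Oct/2023:09:00:00 +0000",
    "08/Oct/2023:18:30:00 +0000",
    "09/Oct/2023:07:15:00 +0000",
    "20/Oct/2023:12:00:00 +0000",  # outside the requested range
]

if __name__ == "__main__":
    print(hits_per_week(SAMPLE_TIMES, date(2023, 10, 1), date(2023, 10, 15)))
```

ISO weeks start on Monday, so the first two sample hits fall in the same week and the third starts the next one.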
• “Google Analytics shows you how people found your site, how they explored it, and how you can enhance their visitor experience.” (Google)
• Free
• Helps visitors by providing better keyword search
• Provides information related to website design
• Tagging: automatic for content management systems or blogging platforms, but manual for customized websites
• Confidentiality: third-party data processing
Name             AWStats                     Google Analytics
Based on logs    Yes                         Site Search data
Page tagging     No                          Yes
Hits count       Counts everything as page   IP address and cookies
Confidentiality  Not an issue                Issue (if not owner)
Meant for        Website traffic analysis    Website traffic and marketing effectiveness
Market share     NA                          Around 49.95% of top 1,000,000 hosts
• The power of analysis is limited by the information in the logs.
• Extensive logging consumes resources.
... the more we measure, the less accurately we understand ...
Results from AWStats, Webalizer and Google Analytics always differ because they use different techniques.
Use AWStats together with Google Analytics for better prediction.
• A packet sniffer can capture and decode data streams passing over a digital network.
• Non-intrusive technology: no log, no page tag.
• Deploy the sniffer in the local network of the servers to be tracked.
• Completely transparent for the tracked website(s).
• Supports multiple servers without affecting server response time.
[Figure: Block Diagram of Packet Sniffing]
• Client communication disconnect information
• Server-side timing information
• Website content delivery information
• Full spectrum of hits, including non-pages
• Copes with proxy or browser caching
• Robots and automated agents data available
• Time to serve website content
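As an illustration of the server-side timing a sniffer can provide, the sketch below pairs each captured request with the next response on the same connection and reports the elapsed time. The `Packet` record, its fields and the sample capture are hypothetical simplifications; a real sniffer would decode these values from raw network traffic.

```python
from collections import namedtuple

# Hypothetical, simplified view of what a sniffer might decode from the wire.
Packet = namedtuple("Packet", "timestamp connection kind detail")


def time_to_serve(packets):
    """Pair each request with the next response on the same connection."""
    pending = {}  # connection -> (request timestamp, requested path)
    timings = []
    for packet in sorted(packets, key=lambda p: p.timestamp):
        if packet.kind == "request":
            pending[packet.connection] = (packet.timestamp, packet.detail)
        elif packet.kind == "response" and packet.connection in pending:
            started, path = pending.pop(packet.connection)
            timings.append((path, round(packet.timestamp - started, 3)))
    return timings


# Made-up capture: timestamps in seconds, connections as (client IP, port).
CAPTURE = [
    Packet(0.000, ("192.0.2.1", 50312), "request", "/index.html"),
    Packet(0.050, ("198.51.100.7", 41000), "request", "/logo.png"),
    Packet(0.120, ("192.0.2.1", 50312), "response", "200"),
    Packet(0.065, ("198.51.100.7", 41000), "response", "200"),
]

if __name__ == "__main__":
    for path, seconds in time_to_serve(CAPTURE):
        print(path, seconds)
```

Because the timing is taken on the network rather than on the server or in the browser, the tracked website needs neither logging nor page tags for this measurement.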