![Page 1: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/1.jpg)
Building an Intelligent Web: Theory and Practice
Pawan Lingras, Saint Mary’s University
Rajendra Akerkar, American University of Armenia and SIBER, India
![Page 2: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/2.jpg)
![Page 3: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/3.jpg)
Discipline
Computer Science Mathematics and Statistics Management
Research Graduate Research Graduate
Chapters 1 – 8 excluding shaded portion related to
mathematics and implementation.
Complete Book Information Retrieval Web Mining
Chapters 2, 4 – 8 excluding shaded portion related to
implementation.
Chapters 1 – 8 excluding shaded portion related to
implementation.
Chapters 1, 2, 3, 7 and 8 Chapters 4 - 8
![Page 4: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/4.jpg)
Information Retrieval
![Page 5: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/5.jpg)
![Page 6: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/6.jpg)
Create a list of words
Remove stop words
Stem words
Calculate frequency of each stemmed word
Figure 2.1 Transforming text document to a weighted list of keywords
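The four steps in Figure 2.1 can be sketched in a few lines of Python. This is only an illustration: the stop-word list and the `naive_stem` suffix stripper are simplifying assumptions, standing in for a full stop list and a proper stemmer such as Porter's.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "the", "to", "in"}

def naive_stem(word):
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def keyword_frequencies(text):
    words = re.findall(r"[a-z]+", text.lower())        # 1. create a list of words
    words = [w for w in words if w not in STOP_WORDS]   # 2. remove stop words
    stems = [naive_stem(w) for w in words]              # 3. stem words
    return Counter(stems)                               # 4. frequency of each stem

freqs = keyword_frequencies("The crawler crawls pages and the crawled pages are indexed")
print(freqs.most_common())
```

The resulting counts can serve as raw keyword weights; other weighting schemes could be applied on top of them.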
![Page 7: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/7.jpg)
![Page 8: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/8.jpg)
Data Mining has emerged as one of the most exciting and dynamic fields in computing science. The driving force for data mining is the presence of petabyte-scale online archives that potentially contain valuable bits of information hidden in them. Commercial enterprises have been quick to recognize the value of this concept; consequently, within the span of a few years, the software market itself for data mining is expected to be in excess of $10 billion. Data mining refers to a family of techniques used to detect interesting nuggets of relationships/knowledge in data. While the theoretical underpinnings of the field have been around for quite some time (in the form of pattern recognition, statistics, data analysis and machine learning), the practice and use of these techniques have been largely ad-hoc. With the availability of large databases to store, manage and assimilate data, the new thrust of data mining lies at the intersection of database systems, artificial intelligence and algorithms that efficiently analyze data. The distributed nature of several databases, their size and the high complexity of many techniques present interesting computational challenges.
![Page 9: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/9.jpg)
![Page 10: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/10.jpg)
![Page 11: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/11.jpg)
![Page 12: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/12.jpg)
![Page 13: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/13.jpg)
![Page 14: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/14.jpg)
![Page 15: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/15.jpg)
![Page 16: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/16.jpg)
Figure 2.43 Relationship between precision and recall (x-axis: Recall, y-axis: Precision, both from 0 to 1)
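A small sketch of how points on a precision-recall curve could be computed, using the usual definitions (precision = relevant retrieved / retrieved, recall = relevant retrieved / relevant). The document identifiers and the ranking are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical relevance judgements and ranked result list.
relevant = {"d1", "d2", "d3", "d4"}
ranking = ["d1", "d5", "d2", "d6", "d3", "d7", "d8", "d4"]

# One (precision, recall) point per cut-off k in the ranking.
curve = [precision_recall(ranking[:k], relevant) for k in range(1, len(ranking) + 1)]
for k, (p, r) in enumerate(curve, start=1):
    print(f"top {k}: precision={p:.2f} recall={r:.2f}")
```

In this example recall rises as more results are retrieved, while precision falls from 1.0 at the first cut-off to 0.5 once every document is returned.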
![Page 17: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/17.jpg)
![Page 18: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/18.jpg)
Semantic Web
![Page 19: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/19.jpg)
Semantic Web
The layer language model
(Berners-Lee, 2001; Broekstra et al., 2001)
![Page 20: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/20.jpg)
<h1>Student Service Centre</h1>
Welcome to the home page of the Student Service Centre.
The centre is located in the main building of the University.
You may visit us for assistance during working days.
<h2>Office hours</h2>
Mon to Thu 8am - 6pm<br>
Fri 8am - 2pm<p>
But note that centre is not open during the weeks of the
<a href=". . .">State Of Origin</a>.
Figure 3.2 Example of a Web page of a Student Service Centre
![Page 21: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/21.jpg)
<organization>
<serviceOffered>Admission</serviceOffered>
<organizationName>Student Service Centre</organizationName>
<staff>
<director>John Roth</director>
<secretary>Penny Brenner</secretary>
</staff>
</organization>
Figure 3.3 Example of a Web page of a Student Service Centre
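Because the markup in Figure 3.3 names each piece of information, a program can pick out individual fields directly. The sketch below uses Python's standard `xml.etree.ElementTree` module; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

xml_text = """
<organization>
  <serviceOffered>Admission</serviceOffered>
  <organizationName>Student Service Centre</organizationName>
  <staff>
    <director>John Roth</director>
    <secretary>Penny Brenner</secretary>
  </staff>
</organization>
""".strip()

root = ET.fromstring(xml_text)

# Element names act as machine-readable labels for the content.
name = root.findtext("organizationName")
director = root.findtext("staff/director")
staff = {member.tag: member.text for member in root.find("staff")}

print(name, director, staff)
```

The same questions asked of the HTML page in Figure 3.2 would require guessing from headings and free text.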
![Page 22: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/22.jpg)
Figure 3.4 Representing classes and instances (Noy et al., 2001)
![Page 23: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/23.jpg)
![Page 24: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/24.jpg)
[Figure: XML tree rooted at college, with lecturer nodes (@name: Edward Bunker, Daniela Frost, Sam Hoofer), course nodes (@title: Algorithms, Computational Algebra, Nonlinear Analysis, Discrete Structures, Modern Algebra) and a location node (Innsbruck)]
![Page 25: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/25.jpg)
![Page 26: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/26.jpg)
Queries 1 and 2
[Figure: the college XML tree of lecturer, course and location nodes, as used for Queries 1 and 2]
![Page 27: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/27.jpg)
Queries 3 and 4
[Figure: the college XML tree of lecturer, course and location nodes, as used for Queries 3 and 4]
![Page 28: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/28.jpg)
![Page 29: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/29.jpg)
![Page 30: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/30.jpg)
![Page 31: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/31.jpg)
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="">
<dc:title>
Building an Intelligent Web: Theory and Practice
</dc:title>
<dc:creator> Rajendra Akerkar and Pawan Lingras </dc:creator>
</rdf:Description>
</rdf:RDF>
Figure 3.26 Fragment of RDF
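The fragment in Figure 3.26 can be read as a set of (subject, property, value) triples. The sketch below extracts them with the standard `xml.etree.ElementTree` module. This is an assumption-laden shortcut: ElementTree is a plain XML parser, not an RDF parser, so it only handles this simple `rdf:Description` layout.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

rdf_text = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="">
    <dc:title>Building an Intelligent Web: Theory and Practice</dc:title>
    <dc:creator>Rajendra Akerkar and Pawan Lingras</dc:creator>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(rdf_text)
triples = []
for desc in root.findall(f"{{{RDF_NS}}}Description"):
    subject = desc.get(f"{{{RDF_NS}}}about")  # "" refers to the document itself
    for prop in desc:
        # prop.tag is the expanded name "{namespace}local", i.e. the property URI parts
        triples.append((subject, prop.tag, prop.text.strip()))

for triple in triples:
    print(triple)
```

Each child of `rdf:Description` becomes one statement about the resource named by `rdf:about`.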
![Page 32: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/32.jpg)
An RDF model for automobiles
![Page 33: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/33.jpg)
<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:my="http://www.myvehicle.com/vehicle-schema/">
<rdfs:Class rdf:about="#Vehicle"/>
<rdfs:Class rdf:about="#Car">
<rdfs:subClassOf rdf:resource="#Vehicle"/>
</rdfs:Class>
<rdf:Property rdf:about="#name">
<rdfs:domain rdf:resource="#Vehicle"/>
</rdf:Property>
<rdf:Description rdf:about="#Ford">
<rdf:type rdf:resource="#Car"/>
<my:name>Ford Icon</my:name>
</rdf:Description>
<my:Truck rdf:about="#Mitsubishi">
<my:name>Mitsubishi</my:name>
<my:carry rdf:resource="#Mitsubishi"/>
</my:Truck>
</rdf:RDF>
Figure 3.29 RDF/XML file for the automobile example
![Page 34: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/34.jpg)
<?xml version="1.0"?>
<topicMap id="tmrf"
xmlns = 'http://www.topicmaps.org/xtm/1.0/'
xmlns:xlink = 'http://www.w3.org/1999/xlink'>
<!--
The map contains information about Technomathematics Research Foundation.
We can include comment and narrative here…
-->
... here my topics and my associations go ...
</topicMap>
Figure 3.30 A Topic Map document (Adapted from http://topicmaps.bond.edu.au/docs/6/1)
![Page 35: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/35.jpg)
Classification and Association
![Page 36: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/36.jpg)
Data Preparation
• Database Theory
• SQL
• Data Transformation
• http://www.ecn.purdue.edu/KDDCUP/data/
![Page 37: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/37.jpg)
Classification
• Find a rule, a formula, or black box classifier for organizing data into classes.
  – Classify clients requesting loans into categories based on the likelihood of repayment
  – Classify customers into Big or Moderate Spenders based on what they buy
  – Classify the customers into loyal, semi-loyal, infrequent based on the products they buy
• The classifier is developed from the data in the training set
• The reliability of the classifier is evaluated using the test set of data
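The split between training and test data can be illustrated with a very small rule-based classifier. Everything here is hypothetical: the records, the single `purchase category` attribute, and the "most common class per attribute value" rule are chosen only to show a classifier being built on one set and evaluated on another.

```python
from collections import Counter, defaultdict

# Hypothetical records: (purchase category, spender class).
train = [("electronics", "Big"), ("electronics", "Big"), ("groceries", "Moderate"),
         ("groceries", "Moderate"), ("electronics", "Moderate"), ("groceries", "Moderate")]
test = [("electronics", "Big"), ("groceries", "Moderate"), ("groceries", "Big")]

def fit_rule(records):
    """Learn, for each attribute value, the most common class in the training set."""
    counts = defaultdict(Counter)
    for value, label in records:
        counts[value][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}

def accuracy(rule, records):
    """Fraction of records whose class the rule predicts correctly."""
    correct = sum(rule.get(value) == label for value, label in records)
    return correct / len(records)

rule = fit_rule(train)                 # developed from the training set
test_accuracy = accuracy(rule, test)   # reliability estimated on the test set
print(rule, round(test_accuracy, 2))
```

Measuring accuracy on records the classifier has not seen gives a more honest estimate than re-using the training data.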
![Page 38: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/38.jpg)
Classification
• ID3 Algorithm
  – Numerical Illustration
  – Application to a Small E-commerce Dataset
• C4.5 for Experimentation
• Other approaches
  – Neural Networks
  – Fuzzy Classification
  – Rough Set Theory
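ID3 chooses the attribute to split on by information gain, that is, the reduction in entropy of the class labels. The sketch below computes that quantity for a made-up four-row e-commerce table; the attribute names `returning`, `coupon` and `buys` are assumptions for illustration, and only the attribute-selection step is shown, not the full tree construction.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical visitor records.
rows = [
    {"returning": "yes", "coupon": "yes", "buys": "yes"},
    {"returning": "yes", "coupon": "no", "buys": "yes"},
    {"returning": "no", "coupon": "yes", "buys": "no"},
    {"returning": "no", "coupon": "no", "buys": "no"},
]
gains = {a: information_gain(rows, a, "buys") for a in ("returning", "coupon")}
best = max(gains, key=gains.get)   # attribute ID3 would split on first
print(gains, best)
```

Here `returning` separates the classes perfectly, so it has the highest gain and would become the root of the tree.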
![Page 39: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/39.jpg)
Association
• Market basket analysis
  – determine which things go together
• Transactions might reveal that
  – customers who buy bananas also buy candles
  – cheese and pickled onions seem to occur frequently in a shopping cart
• Information can be used for
  – arranging a physical shop or structuring the Web site
  – targeted advertising campaigns
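Two standard measures make "things that go together" concrete: support (how often an itemset appears) and confidence (how often the consequent appears when the antecedent does). The transactions below are invented to echo the examples above.

```python
# Hypothetical shopping carts.
transactions = [
    {"banana", "candles", "milk"},
    {"banana", "candles"},
    {"cheese", "pickled onions"},
    {"banana", "cheese", "pickled onions"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated probability of consequent given antecedent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

s = support({"banana", "candles"})
c = confidence({"banana"}, {"candles"})
print(f"support={s:.2f} confidence={c:.2f}")
```

A rule such as banana → candles is usually reported only when both measures exceed chosen thresholds.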
![Page 40: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/40.jpg)
Association
• Apriori Algorithm
• Demonstration for an E-commerce Application
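A compact sketch of the Apriori idea: count itemsets level by level, and build candidates of size k+1 only from frequent itemsets of size k. The baskets and the `min_support` threshold (an absolute count here) are illustrative assumptions, and no attempt is made at efficiency.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

# Hypothetical baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]
frequent = apriori(baskets, min_support=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With these baskets every single item and every pair is frequent, while the three-item set appears in only two baskets and is dropped.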
![Page 41: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/41.jpg)
Clustering
![Page 42: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/42.jpg)
Clustering
• Breaks a large database into different subgroups or clusters
• Unlike classification there are no predefined classes
• The clusters are put together on the basis of similarity to each other
• The data miners determine whether the clusters offer any useful insight
![Page 43: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/43.jpg)
![Page 44: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/44.jpg)
Statistical Methods
• k-means
  – Numerical Example
  – Implementation
    • Data Preparation
    • Clustering
• Other Methods
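A minimal k-means sketch on two-dimensional points, alternating the assignment and centroid-update steps until the centroids stop moving. The points and starting centroids are invented, and the caller is assumed to supply the initial centroids; this is not the book's implementation.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points; initial centroids are supplied by the caller."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            (sum(x for x, _ in members) / len(members),
             sum(y for _, y in members) / len(members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:   # converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data with two visible groups.
points = [(1, 1), (1, 2), (2, 1), (4, 4), (4, 5), (5, 4)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (5, 4)])
print(centroids)
print(clusters)
```

The result depends on the starting centroids, so in practice the algorithm is often run several times with different initial values.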
![Page 45: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/45.jpg)
Neural Network Based Approaches
• Kohonen Self Organising Maps
  – Numerical Demonstration
  – Application to Web Data Collection
• Other Neural Network Based Approaches
![Page 46: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/46.jpg)
Clustering of customers
![Page 47: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/47.jpg)
Web Mining
• Web Content Mining
  – Web Page Content Mining
  – Search Result Mining
• Web Structure Mining
• Web Usage Mining
  – General Access Pattern Tracking
  – Customized Usage Tracking
![Page 48: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/48.jpg)
Web Usage Mining
![Page 49: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/49.jpg)
High level web usage mining process (Srivastava et al., 2000)
![Page 50: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/50.jpg)
Applications of web usage mining
(Romanko, 2006; Srivastava et al., 2000)
![Page 51: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/51.jpg)
140.14.6.11 - pawan [06/Sep/2001:10:46:07 -0300] "GET /s.htm HTTP/1.0" 200 2267
140.14.7.18 - raj [06/Sep/2001:11:23:53 -0300] "POST /s.cgi HTTP/1.0" 200 499
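The two entries above follow the Common Log Format, so a first preprocessing step for usage mining is to split each line into fields. The regular expression and field names below are one possible sketch, not the only way to parse such logs.

```python
import re
from datetime import datetime

# host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

lines = [
    '140.14.6.11 - pawan [06/Sep/2001:10:46:07 -0300] "GET /s.htm HTTP/1.0" 200 2267',
    '140.14.7.18 - raj [06/Sep/2001:11:23:53 -0300] "POST /s.cgi HTTP/1.0" 200 499',
]
entries = [parse_line(line) for line in lines]
for e in entries:
    print(e["host"], e["user"], e["method"], e["path"], e["status"], e["size"])
```

Once parsed, entries can be grouped by host or user to reconstruct sessions for later mining steps.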
![Page 52: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/52.jpg)
![Page 53: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/53.jpg)
![Page 54: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/54.jpg)
![Page 55: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/55.jpg)
![Page 56: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/56.jpg)
![Page 57: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/57.jpg)
![Page 58: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/58.jpg)
![Page 59: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/59.jpg)
![Page 60: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/60.jpg)
![Page 61: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/61.jpg)
![Page 62: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/62.jpg)
![Page 63: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/63.jpg)
![Page 64: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/64.jpg)
![Page 65: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/65.jpg)
![Page 66: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/66.jpg)
![Page 67: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/67.jpg)
Clustering exercise
![Page 68: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/68.jpg)
![Page 69: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/69.jpg)
![Page 70: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/70.jpg)
Classification exercise
| Channel | Recall | Precision |
|---|---|---|
| Finance | 44.3% | 98.27% |
| Health | 52.3% | 89.66% |
| Market | 49.1% | 83.34% |
| News | 44.1% | 89.27% |
| Shopping | 31.5% | 91.31% |
| Specials | 60.2% | 92.86% |
| Sport | 50.0% | 91.93% |
| Surveys | 21.9% | 92.66% |
| Theatre | 54.8% | 94.63% |

Table 6.8 Precision and recall for predicting user’s interest in channels
(Baglioni, et al., 2003)
![Page 71: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/71.jpg)
Association exercise
| News Section | Minimum Requests | Maximum Requests | Mean Requests | Standard Deviation |
|---|---|---|---|---|
| Science | 1 | 97 | 2.3034 | 2.8184 |
| Culture | 1 | 208 | 3.7878 | 5.9742 |
| Sports | 1 | 318 | 5.6985 | 10.8360 |
| Economics | 1 | 258 | 3.9335 | 7.2341 |
| International | 1 | 208 | 3.3823 | 5.5540 |
| Local Lisbon | 1 | 460 | 5.6883 | 11.5650 |
| Local Port | 1 | 256 | 7.5984 | 13.2351 |
| Politics | 1 | 208 | 3.3577 | 5.4101 |
| Society | 1 | 367 | 4.2673 | 7.9853 |
| Education | 1 | 90 | 2.6496 | 3.2909 |

Table 6.9 Summary statistics of requests to the Publico on-line newspaper (Batista and Silva, 2002)
![Page 72: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/72.jpg)
The association mining showed strong associations between the following pairs:
Politics and Society
Politics and International News
Politics and Sports
Society and International News
Society and Local Lisbon
Society and Sports
Society and Culture
Sports and International News
![Page 73: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/73.jpg)
Sequence Pattern Analysis of Web Logs
![Page 74: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/74.jpg)
![Page 75: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/75.jpg)
![Page 76: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/76.jpg)
![Page 77: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/77.jpg)
Web Content Mining
![Page 78: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/78.jpg)
Data Collection
• Web Crawlers
• Public Domain Web Crawlers
• An Implementation of a Web Crawler
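The core loop of a crawler can be sketched as a breadth-first traversal: fetch a page, extract its links, and queue the ones not yet seen. To keep the example runnable without network access, pages are served from a hypothetical in-memory dictionary through a `fetch` callback; a real crawler would download pages and would also need politeness rules such as honouring robots.txt, which are omitted here.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl; fetch(url) returns HTML text or None."""
    visited, queue, seen = [], deque([seed]), {seed}
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:          # page could not be retrieved
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)   # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://example.org/": '<a href="a.html">A</a> <a href="b.html">B</a>',
    "http://example.org/a.html": '<a href="b.html">B</a> <a href="/">home</a>',
    "http://example.org/b.html": '<a href="missing.html">broken</a>',
}
order = crawl("http://example.org/", PAGES.get)
print(order)
```

The `seen` set is what prevents the crawler from looping on pages that link back to each other.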
![Page 79: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/79.jpg)
Architecture of a search engine (Romanko, 2006)
![Page 80: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/80.jpg)
![Page 81: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/81.jpg)
![Page 82: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/82.jpg)
![Page 83: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/83.jpg)
Other topics in Web Content Mining
• Search Engines
  – How to prepare for and set up a search engine
  – Types and listings of search engines (freeware, remote hosting services, commercial)
• Multimedia Information Retrieval
![Page 84: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/84.jpg)
Web Structure Mining
![Page 85: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/85.jpg)
0/10: The site or page is probably new.
3/10: The site is perhaps new, small in size and has very little or no worthwhile arriving links. The page gets very little traffic.
5/10: The site has a fair amount of worthwhile arriving links and traffic volume. The site might be larger in size and gets a good amount of steady traffic with some return visitors.
8/10: The site has many arriving links, probably from other high PageRank pages. The site perhaps contains a lot of information and has a higher traffic flow and return visitor rate.
10/10: The Web site is large, popular and has an extremely high number of links pointing to it.
![Page 86: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/86.jpg)
http://www.iprcom.com/papers/pagerank/
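The link-based score behind such ratings can be sketched with the commonly published PageRank iteration. The damping factor of 0.85, the four-page graph, and the even redistribution of rank from pages without outlinks are assumptions of this sketch. It produces raw scores that sum to 1, not the 0 to 10 scale, and every link target is assumed to appear as a key in `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration; links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page shares its rank equally among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: C receives links from A, B and D.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page C ends up with the highest score because it has the most arriving links, while D, which nothing links to, keeps only the baseline share.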
![Page 87: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/87.jpg)
![Page 88: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/88.jpg)
![Page 89: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/89.jpg)
![Page 90: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/90.jpg)
![Page 91: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/91.jpg)
Index quality for different search engines
(Henzinger, et al., 1999)
![Page 92: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/92.jpg)
Index quality per page for different search engines
(Henzinger, et al., 1999)
![Page 93: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/93.jpg)
| Page | Freq. Walk 2 | Freq. Walk 1 | Rank Walk 1 |
|---|---|---|---|
| www.microsoft.com/ | 3172 | 1600 | 1 |
| www.microsoft.com/windows/ie/default.htm | 2064 | 1045 | 3 |
| www.netscape.com/ | 1991 | 876 | 6 |
| www.microsoft.com/ie/ | 1982 | 1017 | 4 |
| www.microsoft.com/windows/ie/download/ | 1915 | 943 | 5 |
| www.microsoft.com/windows/ie/download/all.htm | 1696 | 830 | 7 |
| www.adobe.com/prodindex/acrobat/readstep.html | 1634 | 780 | 8 |
| home.netscape.com/ | 1581 | 695 | 10 |
| www.linkexchange.com/ | 1574 | 763 | 9 |
| www.yahoo.com/ | 1527 | 1132 | 2 |
Table 8.2 Most frequently visited pages (Henzinger, et al., 1999)
![Page 94: Building an Intelligent Web: Theory & Practice](https://reader034.vdocument.in/reader034/viewer/2022050707/5485ad48b4af9f9b0d8b4ef4/html5/thumbnails/94.jpg)
| Site | Frequency Walk 2 | Frequency Walk 1 | Rank Walk 1 |
|---|---|---|---|
| www.microsoft.com | 32452 | 16917 | 1 |
| home.netscape.com | 23329 | 11084 | 2 |
| www.adobe.com | 10884 | 5539 | 3 |
| www.amazon.com | 10146 | 5182 | 4 |
| www.netscape.com | 4862 | 2307 | 10 |
| excite.netscape.com | 4714 | 2372 | 9 |
| www.real.com | 4494 | 2777 | 5 |
| www.lycos.com | 4448 | 2645 | 6 |
| www.zdnet.com | 4038 | 2562 | 8 |
| www.linkexchange.com | 3738 | 1940 | 12 |
| www.yahoo.com | 3461 | 2595 | 7 |
Table 8.3 Most frequently visited hosts (Henzinger, et al., 1999)