The Information School of the University of Washington
Web Mining
Meliha Yetisgen-Yildiz
Outline
• Definition of Web Mining
• Web Mining Taxonomy
• Examples
  – Mining Topic Specific Concepts and Definitions
Web Mining
• The Web is a huge collection of
  – Documents
  – Hyperlink information
  – Access and usage information
• Mining the enormous wealth of information on the Web
  – Financial information (e.g. stock quotes)
  – Book stores (e.g. Amazon)
WWW Facts
• Unstructured: Heterogeneous, with no standards
• Dynamic: Growing and changing very rapidly
• Size: Too large for effective data warehousing and mining
Related Fields
• Natural Language Processing
• Information Retrieval
• Machine Learning
• Statistics
• Information Visualization
Web Mining Taxonomy
• Web Mining
  – Web Structure Mining
  – Web Content Mining
    • Web Page Content Mining
    • Search Result Mining
  – Web Usage Mining
    • General Access Pattern Tracking
    • Customized Usage Tracking
Web Content Mining
Web Page Content Mining
• Web Page Summarization: WebLog (Lakshmanan et al., 1996), WebOQL (Mendelzon et al., 1998) …: Web structuring query languages; can identify information within given web pages
• Ahoy! (Etzioni et al., 1997): Uses heuristics to distinguish personal home pages from other web pages
• ShopBot (Etzioni et al., 1997): Looks for product prices within web pages
Web Content Mining
Search Result Mining
Search Engine Result Summarization
• Clustering Search Results (Leouski and Croft, 1996; Zamir and Etzioni, 1997): Categorizes documents using phrases in titles and snippets
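To illustrate grouping search results by phrases in titles and snippets, here is a minimal sketch. It is not the method of the cited papers: it only indexes word bigrams and keeps those shared by at least two results. The function name, the sample results, and the choice of bigrams are all assumptions made for illustration.

```python
from collections import defaultdict


def shared_phrase_clusters(results, n=2, min_docs=2):
    """Group result ids by word n-grams that occur in their title/snippet text.

    results: dict mapping a result id to its title and snippet text.
    Returns a dict mapping each shared phrase to the sorted ids containing it.
    """
    phrase_to_docs = defaultdict(set)
    for doc_id, text in results.items():
        words = text.lower().split()  # naive tokenization, punctuation is kept
        for i in range(len(words) - n + 1):
            phrase_to_docs[" ".join(words[i:i + n])].add(doc_id)
    # Keep only phrases that appear in at least min_docs different results.
    return {
        phrase: sorted(docs)
        for phrase, docs in phrase_to_docs.items()
        if len(docs) >= min_docs
    }


# Hypothetical titles standing in for real search engine output.
results = {
    "d1": "web mining tutorial for beginners",
    "d2": "introduction to web mining",
    "d3": "neural network tutorial",
}
clusters = shared_phrase_clusters(results)
print(clusters)
```

A result may fall into several groups, since each shared phrase defines its own group.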
Web Structure Mining
Using Links
• PageRank (Brin et al., 1998)
• CLEVER (Chakrabarti et al., 1998)
Use interconnections between web pages to give weight to pages.
Using Generalization
• MLDB (1994), VWV (1998)
Uses a multi-level database representation of the Web. Counters (popularity) and link lists are used for capturing structure.
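As a rough illustration of using interconnections between pages to weight them, the sketch below runs a simplified PageRank-style power iteration. The damping factor of 0.85, the fixed iteration count, the even spreading of rank from pages without outgoing links, and the toy link graph are assumptions for this example, not details taken from the slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links: dict mapping a page to the list of pages it links to.
    Returns a dict mapping each page to a score; scores sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, page C collects links from three pages and ends up with the highest score, while D, which nothing links to, gets only the baseline share.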
Web Usage Mining
General Access Pattern Tracking
• Web Log Mining (Zaïane, Xin and Han, 1998): Uses KDD techniques to understand general access patterns and trends. Can shed light on better structure and grouping of resource providers.
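A first step toward general access pattern tracking is aggregating a server log. The sketch below is not the OLAP and data mining pipeline of the cited work; it only counts successful requests and distinct hosts per page. It assumes log lines in Common Log Format, and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumed input: Common Log Format, e.g.
# host ident user [timestamp] "METHOD path protocol" status bytes
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+'
)


def access_summary(lines):
    """Return (hits per page, distinct hosts per page) for 2xx responses."""
    page_hits = Counter()
    hosts_per_page = {}
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip lines that do not look like log entries
        host, _timestamp, _method, path, status = match.groups()
        if status.startswith("2"):
            page_hits[path] += 1
            hosts_per_page.setdefault(path, set()).add(host)
    visitors = {path: len(hosts) for path, hosts in hosts_per_page.items()}
    return page_hits, visitors


# Hypothetical log excerpt.
sample_log = [
    '10.0.0.1 - - [07/Aug/1998:10:00:01 -0700] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.2 - - [07/Aug/1998:10:00:05 -0700] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.1 - - [07/Aug/1998:10:01:10 -0700] "GET /papers.html HTTP/1.0" 200 512',
    '10.0.0.3 - - [07/Aug/1998:10:02:00 -0700] "GET /missing.html HTTP/1.0" 404 0',
    "malformed line",
]
page_hits, visitors = access_summary(sample_log)
print(page_hits.most_common())
print(visitors)
```

Counts like these are the raw material for the trend analysis the slide describes; real log mining would also clean the data and group requests into sessions.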
Web Usage Mining
Customized Usage Tracking
• Adaptive Sites (Perkowitz and Etzioni, 1997): Analyzes the access patterns of one user at a time. The web site restructures itself automatically by learning from user access patterns.
Web Content Mining Example
B. Liu, C. W. Chin, and H. T. Ng (2003). “Mining Topic-Specific Concepts and Definitions on the Web”. WWW 2003, Budapest, Hungary.
Mining Topic Specific Concepts and Definitions
• Goal:
  – “To help people learn in-depth knowledge of a topic systematically on the Web”
• Main Assumption:
  – The typical path of a person who wants to learn more on a new topic
    • First: Definitions and/or descriptions of the topic
    • Second: Sub-topics and/or salient concepts of the topic
System Architecture
Evaluation
• 28 search topics from Computer Science
• Precision comparison based on
  – Top 10 results returned by WebLearn, Google, AskJeeves
  – Related pages = Pages with definitions
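The precision measure used in this comparison can be sketched as follows. This is a minimal illustration, assuming precision is the percentage of the top 10 returned pages that contain definitions, and that the denominator is the number of pages actually returned when fewer than 10 come back. The page ids below are hypothetical.

```python
def precision_at_k(ranked_pages, relevant_pages, k=10):
    """Percentage of the top-k returned pages that are relevant.

    ranked_pages: list of page ids in the order the system returned them.
    relevant_pages: set of page ids judged to contain definitions.
    """
    top = ranked_pages[:k]
    if not top:
        return 0.0  # nothing returned, so nothing relevant was found
    hits = sum(1 for page in top if page in relevant_pages)
    # Assumption: divide by the number of pages actually returned (at most k).
    return 100.0 * hits / len(top)


# Hypothetical run: 9 pages returned, 7 of them contain definitions.
returned = [f"page{i}" for i in range(1, 10)]
with_definitions = {f"page{i}" for i in range(1, 8)}
score = precision_at_k(returned, with_definitions)
print(round(score, 2))
```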
Results
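Precision (%) per search topic:

| Search Topic | WebLearn | Google | AskJeeves |
|---|---|---|---|
| 1. Artificial Intelligence | 50.00 | 0.00 | 0.00 |
| 2. Data Mining | 70.00 | 30.00 | 10.00 |
| 3. Web Mining | 75.00 | 37.50 | 50.00 |
| 4. Machine Learning | 77.78 | 22.22 | 11.11 |
| 5. Computer Vision | 33.33 | 0.00 | 0.00 |
| 6. Relational Calculus | 83.33 | 33.33 | 50.00 |
| 7. Linear Algebra | 40.00 | 0.00 | 0.00 |
| 8. Neural Network | 80.00 | 30.00 | 20.00 |
| 9. Fuzzy logic | 90.00 | 20.00 | 40.00 |
| 10. Time Series | 50.00 | 0.00 | 0.00 |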
Salient Concepts for Information Retrieval
1. Digital Libraries
2. Modern Information Retrieval
3. Indexing
4. Images
5. Relevance Feedback
6. Internet
7. Modeling
8. Search Engines
9. Information Processing
10. Machine Learning
References
• D. Backman and J. Rubbin. Web log analysis: Finding a recipe for success. In http://techweb.comp.com/nc/811/811cn2.html, 1997.
• O. Etzioni. The world-wide web: Quagmire or gold mine? Communications of ACM, 39:65-68, 1996.
• U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996.
• C. Faloutsos. Access methods for text. ACM Comput. Surv., 17:49-74, 1985.
• R. Feldman and I. Dagan. Knowledge discovery in textual databases (KDT). Proc. 1st Int. Conf. Knowledge Discovery and Data Mining, Montreal, Canada, Aug. 1995.
• J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000.
• T. Imielinski and H. Mannila. A database perspective on knowledge discovery. Communications of ACM, 39:58-64, 1996.
• R. Meo, G. Psaila, and S. Ceri. A new SQL-like operator for mining association rules. In VLDB'96, 122-133, Bombay, India, Sept. 1996.
References (continued)
• J. Graham-Cumming. Hits and miss-es: A year watching the web. In Proc. 6th Int. World Wide Web Conf., Santa Clara, California, April 1997.
• M. Perkowitz and O. Etzioni. Adaptive sites: Automatically learning from user access patterns. In Proc. 6th Int. World Wide Web Conf., Santa Clara, California, April 1997.
• J. Pitkow. In search of reliable usage data on the www. In Proc. 6th Int. World Wide Web Conf., Santa Clara, California, April 1997.
• T. Stabin and C. E. Glasson. First impression: 7 commercial log processing tools slice & dice logs your way. In http://www.netscapeworld.com/netscapeworld/nw-08-1997/nw-08-loganalysis.html, 1997.
• T. Sullivan. Reading reader reaction: A proposal for inferential analysis of web server log files. In Proc. 3rd Conf. Human Factors & the Web, Denver, Colorado, June 1997.
• L. Tauscher and S. Greenberg. How people revisit web pages: Empirical findings and implications for the design of history systems. International Journal of Human Computer Studies, Special issue on World Wide Web Usability, 47:97-138, 1997.
References (continued)
• G. Salton, J. Allen, C. Buckley, and A. Singhal. Automatic analysis, theme generation, and summarization of machine-readable texts. Science, 264:1421-1426, 1994.
• O. R. Zaïane, M. Xin, and J. Han. Discovering Web access patterns and trends by applying OLAP and data mining technology on Web logs. In Proc. Advances in Digital Libraries Conf. (ADL'98), pages 19-29, Santa Barbara, CA, April 1998.
Questions?