information retrieval
DESCRIPTION
Information Retrieval. For the MSc Computer Science Programme. Lecture 7 Introduction to Information Retrieval (Manning et al. 2007) Chapter 17. Dell Zhang Birkbeck , University of London. … (30). agriculture. biology. physics. CS. space. dairy. botany. cell. AI. courses. crops. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Information Retrieval](https://reader035.vdocument.in/reader035/viewer/2022081603/56815981550346895dc6c0e6/html5/thumbnails/1.jpg)
Information Retrieval
Lecture 7Introduction to Information Retrieval (Manning et al. 2007)
Chapter 17
For the MSc Computer Science Programme
Dell ZhangBirkbeck, University of London
![Page 2: Information Retrieval](https://reader035.vdocument.in/reader035/viewer/2022081603/56815981550346895dc6c0e6/html5/thumbnails/2.jpg)
Yahoo! Hierarchy
http://dir.yahoo.com/science
dairycrops
agronomyforestry
AI
HCIcraft
missions
botany
evolution
cellmagnetism
relativity
courses
agriculture biology physics CS space
... ... ...
… (30)
... ...
![Page 3: Information Retrieval](https://reader035.vdocument.in/reader035/viewer/2022081603/56815981550346895dc6c0e6/html5/thumbnails/3.jpg)
Hierarchical Clustering
Build a tree-like hierarchical taxonomy (dendrogram) from a set of unlabeled documents. Divisive (top-down)
Start with all documents belong to the same cluster. Eventually each node forms a cluster on its own.
Recursive application of a (flat) partitional clustering algorithm, e.g., kMeans (k=2) Bi-secting kMeans.
Agglomerative (bottom-up) Start with each document being a single cluster.
Eventually all documents belong to the same cluster.
![Page 4: Information Retrieval](https://reader035.vdocument.in/reader035/viewer/2022081603/56815981550346895dc6c0e6/html5/thumbnails/4.jpg)
Dendrogram
Clustering is obtained by cutting the dendrogram at a desired level: each connected component forms a cluster.
The number of clusters k is not required in advance.
![Page 5: Information Retrieval](https://reader035.vdocument.in/reader035/viewer/2022081603/56815981550346895dc6c0e6/html5/thumbnails/5.jpg)
Dendrogram – Example
Clusters of News Stories:Reuters RCV1
![Page 6: Information Retrieval](https://reader035.vdocument.in/reader035/viewer/2022081603/56815981550346895dc6c0e6/html5/thumbnails/6.jpg)
Dendrogram – Example
Clusters of Things that People Want: ZEBO
![Page 7: Information Retrieval](https://reader035.vdocument.in/reader035/viewer/2022081603/56815981550346895dc6c0e6/html5/thumbnails/7.jpg)
HAC
Hierarchical Agglomerative Clustering Starts with each doc in a separate cluster. Repeat until there is only one cluster:
Among the current clusters, determine the pair of clusters, ci and cj, that are most similar. (Single-Link, Complete-Link, etc.)
Then merges ci and cj to a single cluster. The history of merging forms a binary tree or
hierarchy.
![Page 8: Information Retrieval](https://reader035.vdocument.in/reader035/viewer/2022081603/56815981550346895dc6c0e6/html5/thumbnails/8.jpg)
Single-Link
The similarity between a pair of clusters is defined by the single strongest link (i.e., maximum cosine-similarity) between their members:
After merging ci and cj, the similarity of the resulting cluster to another cluster, ck, is:
),(max),(,
yxsimccsimji cycx
ji
),(),,(max)),(( kjkikji ccsimccsimcccsim
![Page 9: Information Retrieval](https://reader035.vdocument.in/reader035/viewer/2022081603/56815981550346895dc6c0e6/html5/thumbnails/9.jpg)
HAC – Example
![Page 10: Information Retrieval](https://reader035.vdocument.in/reader035/viewer/2022081603/56815981550346895dc6c0e6/html5/thumbnails/10.jpg)
HAC – Example
![Page 11: Information Retrieval](https://reader035.vdocument.in/reader035/viewer/2022081603/56815981550346895dc6c0e6/html5/thumbnails/11.jpg)
HAC – Example
d1
d2
d3
d4
d5
d1,d2
d4,d5
d3
d3,d4,d5
As clusters agglomerate, docs are likely to fall into a dendrogram.
![Page 12: Information Retrieval](https://reader035.vdocument.in/reader035/viewer/2022081603/56815981550346895dc6c0e6/html5/thumbnails/12.jpg)
HAC – Example Single-Link
![Page 13: Information Retrieval](https://reader035.vdocument.in/reader035/viewer/2022081603/56815981550346895dc6c0e6/html5/thumbnails/13.jpg)
Take Home Message
Single-Link HAC Dendrogram
),(max),(,
yxsimccsimji cycx
ji
),(),,(max)),(( kjkikji ccsimccsimcccsim