textrank: bringing order into texts

TextRank: Bringing Order into Texts Rada Mihalcea and Paul Tarau Presented by : Sharath T.S Shubhangi Tandon

Upload: sharath-ts

Post on 03-Mar-2017


TRANSCRIPT

Page 1: TextRank: Bringing Order into Texts

TextRank: Bringing Order into Texts

Rada Mihalcea and Paul Tarau

Presented by:

Sharath T.S

Shubhangi Tandon

Page 2: TextRank: Bringing Order into Texts

The TextRank Algorithm

1. Identify text units that best define the task at hand, and add them as vertices in the graph.

2. Identify relations that connect such text units, and use these relations to draw edges between vertices in the graph. Edges can be directed or undirected, weighted or unweighted.

3. Iterate the graph-based ranking algorithm until convergence.

4. Sort vertices based on their final score. Use the values attached to each vertex for ranking/selection decisions.
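
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `textrank_scores` and `undirected`, the edge-dictionary layout, the toy graph, and the tolerance and iteration cap are all assumptions made for the example. The damping factor of 0.85 is the value used in the paper.

```python
def textrank_scores(edges, d=0.85, tol=1e-4, max_iter=100):
    """Rank vertices of a weighted directed graph.

    edges: dict mapping (u, v) -> weight for a directed edge u -> v.
    """
    vertices = set()
    out_weight = {}   # total weight leaving each vertex
    incoming = {}     # vertex -> list of (predecessor, edge weight)
    for (u, v), w in edges.items():
        vertices.update((u, v))
        out_weight[u] = out_weight.get(u, 0.0) + w
        incoming.setdefault(v, []).append((u, w))

    # Step 3: start from arbitrary scores and iterate until they stop changing.
    scores = {v: 1.0 for v in vertices}
    for _ in range(max_iter):
        new_scores = {}
        for v in vertices:
            rank_sum = sum(w / out_weight[u] * scores[u]
                           for u, w in incoming.get(v, []))
            new_scores[v] = (1 - d) + d * rank_sum
        delta = max(abs(new_scores[v] - scores[v]) for v in vertices)
        scores = new_scores
        if delta < tol:
            break
    return scores


def undirected(pairs):
    """Represent each undirected edge as two directed edges."""
    edges = {}
    for u, v, w in pairs:
        edges[(u, v)] = w
        edges[(v, u)] = w
    return edges


# Steps 1 and 2: a toy graph with four text units, "b" linked to all others.
graph = undirected([("a", "b", 1.0), ("b", "c", 1.0), ("b", "d", 1.0)])
scores = textrank_scores(graph)

# Step 4: sort vertices by their final score.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0])
```

In this toy graph the hub vertex `b` receives a vote from every other vertex, so it ends up ranked first.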

Page 3: TextRank: Bringing Order into Texts

The TextRank Model

■ G = (V, E)

■ V = set of vertices, E = set of edges

■ V(in) = for a given vertex, the set of vertices that point to it (predecessors)

■ V(out) = for a given vertex, the set of vertices that it points to (successors)

■ d = damping factor

■ In addition, W = set of edge weights

■ Note: for undirected graphs, V(in) = V(out)
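
For reference, the weighted score defined in the TextRank paper can be written with these symbols as follows. Here In(V_i) and Out(V_j) play the role of V(in) and V(out) on the slide, w_{ji} is the weight of the edge from V_j to V_i, and WS is the paper's name for the weighted score.

```latex
\[
WS(V_i) = (1 - d) + d \sum_{V_j \in \mathit{In}(V_i)}
          \frac{w_{ji}}{\sum_{V_k \in \mathit{Out}(V_j)} w_{jk}} \, WS(V_j)
\]
```

Setting every weight to 1 gives the unweighted case, where each vertex splits its score equally among the vertices it points to.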

Page 4: TextRank: Bringing Order into Texts

Convergence

Convergence of 4 different kinds of graphs with respect to directed/undirected and weighted/unweighted.

Page 5: TextRank: Bringing Order into Texts

Keyword Extraction

How is the graph built?

● Each word (lexical unit) is a node.

● Edges come from a co-occurrence relation: two vertices are connected if their corresponding lexical units co-occur within a window of at most N words, where N can be set anywhere from 2 to 10.
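
A small sketch of how such a co-occurrence graph might be built. The function name `cooccurrence_graph`, the whitespace tokenisation, and the reading of the window size as "a word plus the tokens that follow it" are assumptions for illustration. The paper also filters candidate words by part of speech before adding them as vertices; that step is left out here.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Undirected, unweighted graph linking words that co-occur within `window` tokens."""
    neighbours = defaultdict(set)
    for i, word in enumerate(tokens):
        # Look ahead at the next window-1 tokens, so window=2 links adjacent words.
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)
    return neighbours


tokens = "compatibility of systems of linear constraints".split()
graph = cooccurrence_graph(tokens, window=2)
wide_graph = cooccurrence_graph(tokens, window=3)

print(sorted(graph["of"]))
print(sorted(wide_graph["systems"]))
```

A larger window connects more word pairs, so the graph becomes denser as the window grows.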

Page 6: TextRank: Bringing Order into Texts

Example

Page 7: TextRank: Bringing Order into Texts

Results for Keyword Extraction

Page 8: TextRank: Bringing Order into Texts

Sentence Extraction

● The goal is to rank entire sentences, so each vertex is a sentence.

● Co-occurrence cannot be used. Why?

● We need a new relation for our edges: similarity.

● Similarity is measured as the content overlap between two sentences (nodes).
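
The paper measures this overlap as the number of words two sentences share, divided by the sum of the logarithms of their lengths so that long sentences are not favoured. A minimal sketch of that measure follows. The function name `sentence_similarity`, the lowercase whitespace tokenisation, and the handling of empty or one-word sentences are assumptions for the example.

```python
import math


def sentence_similarity(s1, s2):
    """Shared-word count normalised by log(len(s1)) + log(len(s2))."""
    words1, words2 = s1.lower().split(), s2.lower().split()
    if not words1 or not words2:
        return 0.0
    denom = math.log(len(words1)) + math.log(len(words2))
    if denom == 0:
        # Both sentences have a single word; the normaliser is undefined.
        return 0.0
    overlap = len(set(words1) & set(words2))
    return overlap / denom


sim = sentence_similarity("the cat sat on the mat", "the dog sat on the rug")
print(round(sim, 4))
```

These similarity values would then serve as edge weights between sentence vertices, giving a weighted graph for the ranking step.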

Page 9: TextRank: Bringing Order into Texts

Evaluation

● Single-document summarisation

● Data: DUC (2002), 567 news articles

● Evaluation metric: ROUGE

● Compared against 15 systems, including the baseline provided by DUC

Page 10: TextRank: Bringing Order into Texts

Results

● Highly dense graph

● Output compared to human summaries

Page 11: TextRank: Bringing Order into Texts

Comparison - TextRank and Opinosis

● Both are unsupervised graph-based algorithms.

● Both try to identify the most traversed nodes or paths in a graph (the topics or content discussed most).

● TextRank uses node importance (of a word or a sentence) for keyword extraction and summarization, whereas Opinosis uses path weights across nodes (words) to generate fine-grained summaries.

Page 12: TextRank: Bringing Order into Texts

Observations

1. Common pattern: text-unit co-occurrence is used as a feature in all of these unsupervised algorithms (LDA, BTM, TextRank).

2. Future work: http://web.fi.uba.ar/~fbarrios/tprofesional/articulo-en.pdf

3. Industry adoption: included as a module in gensim.