Prefix Tag Clouds

Post on 02-Jan-2016

Page 1: Prefix Tag Clouds

Prefix Tag Clouds

Michael Burch, Steffen Lohmann, Daniel Pompe, Daniel Weiskopf

BY: Farah Kamw

Page 2: Prefix Tag Clouds

Introduction

Tag clouds are a popular way to visually represent word frequencies. A major limitation of tag clouds is that they treat different forms of the same word as individual tags.

For example, the singular and plural forms of the same word are treated as individual tags.

Page 3: Prefix Tag Clouds

Introduction (cont.)

There are two cases:

In the first case, all variations of a word appear in the tag cloud. They then take up screen space that could better be used to display other information or additional tags.

In the second case, such variations are not shown in the tag cloud because they are not frequent enough. The viewer then has no indication whether other forms of a word occur in the text.

Page 4: Prefix Tag Clouds

Introduction (cont.)

To overcome this limitation, the authors developed the prefix tag cloud.

It creates a prefix tree that groups different word forms and visualizes the subtrees as a tag cloud.

Color is used to emphasize the grouping, while the relative frequencies of the words are indicated by font size.

The grouped word forms are arranged in a circular tag cloud layout that supports the quick identification of the most frequent words.

Page 5: Prefix Tag Clouds

CREATION OF PREFIX TAG CLOUDS

The process of creating prefix tag clouds consists of three main components:

1. A prefix tree is generated from a set of tags. This requires ordering the tags lexicographically.

2. The prefix subtrees are rendered as node-link diagrams. The font sizes of the prefixes in the diagrams are scaled according to the tag frequencies.

3. The tag cloud is composed from the subtrees and placed in a given drawing area.

Page 6: Prefix Tag Clouds

Prefix Tree Generation

The initial input is a set of tags T = {t1, t2, …, tn} with individual tags ti. The tags are composed of a finite sequence of characters from the alphabet.

Each tag ti contains at least one character, i.e., it is not the empty string.

Page 7: Prefix Tag Clouds

Prefix Tree Generation (cont.)

Algorithm 1 provides the pseudocode for the prefix tree generation, which consists of the following steps:

1. Adding the empty string λ to T: it serves as a prefix for all tags.

2. Ordering T lexicographically: fast sorting algorithms such as merge sort can do this in O(n log n) time. As a result, we obtain the lexicographically ordered list Tlex with the empty string λ as its first element.

3. Generating the prefix tree from Tlex.
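The three steps can be sketched in Python as follows. This is a minimal sketch and not the authors' Algorithm 1, which is not reproduced on the slides. It assumes that each tag becomes a child of its longest proper prefix that is itself contained in T, with λ (the empty string) as the root. The function name `build_prefix_tree` and the parent-map representation are hypothetical choices made for illustration.

```python
def build_prefix_tree(tags):
    """Return (ordered, parent) for a set of tags.

    Assumption (not from the slides): the parent of a tag is its longest
    proper prefix that is also in the tag set; the empty string is the root.
    """
    # Step 1: add the empty string; step 2: order lexicographically.
    ordered = sorted(set(tags) | {""})

    # Step 3: one pass over the sorted list. The stack holds the chain of
    # nested prefixes that leads to the previously visited tag.
    parent = {}
    stack = []
    for tag in ordered:
        # Drop stack entries that are not a prefix of the current tag.
        while stack and not tag.startswith(stack[-1]):
            stack.pop()
        if stack:
            parent[tag] = stack[-1]
        stack.append(tag)
    return ordered, parent


ordered, parent = build_prefix_tree(["tags", "gene", "tag", "general", "genes"])
for tag in ordered[1:]:
    print(repr(parent[tag]), "->", repr(tag))
```

Because the list is sorted, all tags that share a prefix are adjacent, so a single pass with a stack is enough once the O(n log n) sort is done.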

Page 8: Prefix Tag Clouds
Page 9: Prefix Tag Clouds

Subtree Rendering

Before generating the prefix tag cloud, the authors split the prefix tree P at its root node vroot into a set of subtrees.

The subtrees are visualized as node-link diagrams with a left-to-right orientation. Parent nodes are placed in the vertical center and to the left of their child nodes.

The font sizes of the prefixes in the subtrees are scaled according to the frequency values.

Since a linear change of the font size has a quadratic effect on the text area, they use the square root of the frequency values for scaling.
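The square-root scaling can be illustrated with a short Python sketch. The text area grows roughly with the square of the font size, so a font size proportional to the square root of the frequency keeps the area roughly proportional to the frequency. The function name `font_size`, the interpolation between a minimum and a maximum size, and the default point sizes are assumptions for illustration; the slides only state that the square root of the frequency is used.

```python
import math


def font_size(freq, min_freq, max_freq, min_size=10.0, max_size=48.0):
    """Map a tag frequency to a font size using square-root scaling.

    Assumption: sizes are interpolated linearly between min_size and
    max_size on the square-root scale; the bounds are illustrative.
    """
    if max_freq == min_freq:
        return max_size  # all tags are equally frequent
    lo, hi = math.sqrt(min_freq), math.sqrt(max_freq)
    share = (math.sqrt(freq) - lo) / (hi - lo)
    return min_size + share * (max_size - min_size)


for f in (1, 25, 100):
    print(f, round(font_size(f, 1, 100), 1))
```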

Page 10: Prefix Tag Clouds
Page 11: Prefix Tag Clouds

Tag Cloud Generation

The subtrees are visualized in a circular tag cloud layout with the most frequent tags in the center and tags with decreasing frequencies towards the boundary.

Drawing starts with the subtree that contains the tag with the highest frequency value. This first subtree is placed in the center of the drawing area.

Subtrees with decreasing frequencies are placed along a spiral path.
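A rough Python sketch of such a spiral placement is given below. The slides do not specify the type of spiral or how collisions are detected, so the Archimedean spiral, the bounding-box overlap test, and all names and parameters (`spiral_points`, `place_on_spiral`, `step`, `growth`, `max_steps`) are assumptions made for illustration.

```python
import math


def spiral_points(cx, cy, step=0.1, growth=2.0):
    """Yield points on an Archimedean spiral r = growth * theta around (cx, cy)."""
    theta = 0.0
    while True:
        r = growth * theta
        yield cx + r * math.cos(theta), cy + r * math.sin(theta)
        theta += step


def overlaps(a, b):
    """Overlap test for boxes (x, y, width, height) with (x, y) as top-left corner."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def place_on_spiral(sizes, area_w, area_h, max_steps=20000):
    """Place boxes, given as (width, height) in decreasing frequency order.

    Each box is centered on the first spiral point where it lies inside
    the drawing area and does not overlap an already placed box.
    """
    placed = []
    for w, h in sizes:
        for i, (px, py) in enumerate(spiral_points(area_w / 2, area_h / 2)):
            if i >= max_steps:
                return placed  # no free position found; stop drawing
            box = (px - w / 2, py - h / 2, w, h)
            inside = (box[0] >= 0 and box[1] >= 0
                      and box[0] + w <= area_w and box[1] + h <= area_h)
            if inside and not any(overlaps(box, other) for other in placed):
                placed.append(box)
                break
    return placed


boxes = place_on_spiral([(100, 40), (60, 20), (60, 20)], 400, 300)
print(boxes[0])  # the first (most frequent) subtree is centered in the area
```

Here every subtree is reduced to the bounding box of its node-link diagram, which keeps the overlap test simple at the cost of some unused space.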

Page 12: Prefix Tag Clouds

Tag Cloud Generation (cont.)

Subtree drawing continues until one of three conditions is met:

1. All subtrees of P have been rendered.

2. A user-defined number of subtrees of P has been rendered.

3. The drawing area is completely filled with subtrees.

The first two conditions result in a circle-shaped tag cloud due to the spiral placement of the subtrees. The third condition results in a tag cloud with the shape of the drawing area.
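The three stopping conditions can be expressed as a simple drawing loop. The sketch below is illustrative only: `draw_subtrees`, `try_place`, and `max_subtrees` are hypothetical names, and "the drawing area is completely filled" is approximated by the first subtree for which the placement routine reports that it found no free position.

```python
def draw_subtrees(subtrees, try_place, max_subtrees=None):
    """Draw subtrees in decreasing frequency order until a stop condition holds.

    try_place(subtree) is a hypothetical callback that returns True if the
    subtree could be placed and False if no free position was found.
    Returns the placed subtrees and the reason why drawing stopped.
    """
    placed = []
    for subtree in subtrees:
        # Condition 2: the user-defined number of subtrees has been rendered.
        if max_subtrees is not None and len(placed) >= max_subtrees:
            return placed, "limit reached"
        # Condition 3 (approximation): no room is left in the drawing area.
        if not try_place(subtree):
            return placed, "area full"
        placed.append(subtree)
    # Condition 1: all subtrees have been rendered.
    return placed, "all rendered"


print(draw_subtrees(["tag", "gene", "cloud"], lambda s: True, max_subtrees=2))
```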

Page 13: Prefix Tag Clouds

APPLICATION EXAMPLE

The investigated text corpus is a large dataset containing more than 2.1 million publications in the field of computer science.

The authors transformed the publication titles into individual tags. First, they removed special characters and separated words by spaces. They then converted the tags to lowercase and removed stop words.

Finally, they counted the frequency of each tag.
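The preprocessing steps can be sketched in Python as follows. The stop word list, the regular expression that defines "special characters", and the function name `count_tags` are assumptions for illustration; the slides do not state which stop word list or character set the authors used.

```python
import re
from collections import Counter

# Small illustrative stop word list (an assumption, not the authors' list).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def count_tags(titles, stop_words=STOP_WORDS):
    """Turn publication titles into tags and count the tag frequencies."""
    counts = Counter()
    for title in titles:
        # Replace special characters by spaces so that words stay separated.
        cleaned = re.sub(r"[^A-Za-z0-9\s]", " ", title)
        # Convert to lowercase, split on whitespace, and drop stop words.
        for word in cleaned.lower().split():
            if word not in stop_words:
                counts[word] += 1
    return counts


counts = count_tags(["Prefix Tag Clouds", "A Survey of Tag Clouds!"])
print(counts.most_common())
```

The resulting mapping from tag to frequency is the kind of input that the prefix tree generation and the font size scaling operate on.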

Page 14: Prefix Tag Clouds
Page 15: Prefix Tag Clouds
Page 16: Prefix Tag Clouds

Conclusion

The authors introduced the prefix tag cloud, which makes use of prefix trees. Different word forms are visually grouped by color and space, facilitating their identification and comparison in the tag cloud.

The left-to-right orientation of the prefix subtrees leads to a readable tag cloud layout.

The method has limitations; for example, words that are not related may be grouped together (like "gene" and "general").

Page 17: Prefix Tag Clouds

Thank You