Seeing Things in the Clouds
over concept lattices with tag clouds
browsing semi-structured data
Bernd Fischer
[Title slide keywords: object, attribute context table, relation, Galois connection, knowledge discovery, mining, software repositories, join, focus, visualization, information retrieval, meet, navigation]
Stellenbosch
Computer Science
How do you find stuff on the Internet?
[Figure labels: concept-based browsing, query]
Yikes! 3 370 000 results!
[Figure labels: concept-based browsing, query, lattice]
How do you find stuff you didn’t look for?
Retrieval: extract objects that satisfy a pre-defined criterion
• query describes criterion
• main operation is matching: check satisfaction against query
• main goal is precision: show only relevant objects
Browsing: spontaneously explore a collection
• focus describes current position and selection
• main operation is navigation: change the focus
• main goal is recall: show all relevant objects
How do you browse?
[Figure labels: (hierarchical) navigation structure, focus, selection]
How do you browse semi-structured data?
What is semi-structured data?
What is structured data?
Structured data has...
• ... a very high degree of regularity
• ... an explicit, tight format (schema)
Typical examples:
• spreadsheets
• relational databases (SQL: structured query language)
What is semi-structured data?
Semi-structured data ...
• ... contains both free-text and formatted fields
• ... has large structural variance
• ... is implicitly formatted
Typical examples:
• product reviews
• newspaper articles
+ meta-data
• revision control logs
Approach:
• find a suitable abstract data representation
– bag-of-words, graphs, binary relations, RDF triples, XML, ...
• find a suitable hierarchy
– metric spaces, graphs, concept lattices, ...
• find a suitable visual representation
– lists, graphs, tag clouds, city scapes, ...
• find a navigation algorithm
How do you represent data?
Structured data is represented by n-ary relations or tables:
• each object becomes a row
• each column represents an attribute type
• text remains unstructured
• set-valued attributes require normalization

author    title                             year  venue
Fischer   Specification-based browsing...   2000  J. ASE
van Zijl  Supernondeterministic finite...   2001  CIAA
Greene    ConceptCloud: A Tag-cloud...      2014  FSE
Fischer   ConceptCloud: A Tag-cloud...      2014  FSE
Semi-structured data can be represented by binary relations:
• text is split into words
• each occurring value and word becomes an attribute
• build context table: add cross if attribute applies to object
– word appears in document, meta-data, references ...

Normalized tables:

id  title                             year  venue
08  Specification-based browsing...   2000  J. ASE
15  Supernondeterministic finite...   2001  CIAA
42  ConceptCloud: A Tag-cloud...      2014  FSE

id  author
08  Fischer
15  van Zijl
42  Greene
42  Fischer

Context table:

    Fischer  Greene  van Zijl  browsing  tag  2000  2001  2014
08     ×                          ×            ×
15                      ×                            ×
42     ×       ×                  ×       ×                ×
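The construction of the context table can be sketched in Python. This is a minimal illustration, not the ConceptCloud implementation: the record layout, the names `RECORDS`, `build_context` and `has_cross`, and the naive tokenizer are all assumptions. Because the titles are truncated on the slide, the extracted words differ from the hand-picked attributes in the table above (for example, every title word is kept, and "browsing" is not found for object 42).

```python
import re

# Bibliographic records as they appear on the slide. The titles are truncated
# there, so only the visible words can be extracted (assumed record layout).
RECORDS = {
    "08": {"authors": ["Fischer"], "title": "Specification-based browsing", "year": "2000"},
    "15": {"authors": ["van Zijl"], "title": "Supernondeterministic finite", "year": "2001"},
    "42": {"authors": ["Greene", "Fischer"], "title": "ConceptCloud: A Tag-cloud", "year": "2014"},
}


def build_context(records):
    """Return the binary relation as a dict: object id -> set of attributes."""
    context = {}
    for obj_id, record in records.items():
        attributes = set(record["authors"])  # each occurring value is an attribute
        attributes.add(record["year"])
        # Split the free text into lowercase words; each word is an attribute.
        attributes.update(re.findall(r"[a-z0-9]+", record["title"].lower()))
        context[obj_id] = attributes
    return context


def has_cross(context, obj_id, attribute):
    """True if the context table has a cross for (object, attribute)."""
    return attribute in context[obj_id]


context = build_context(RECORDS)
print(has_cross(context, "08", "browsing"))  # True
print(has_cross(context, "15", "Fischer"))   # False
```

Storing the relation as a mapping from object to attribute set keeps set-valued attributes (two authors for object 42) in one place, so no normalization step is needed.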
How do you find hierarchy in relations?
Formal concept analysis:
• formal context: (O, A, ~ₓ)
• common attributes:
α(O) = { a ∈ A | ∀o ∈ O : o ~ₓ a }
α({08, 42}) = {Fischer, browsing}
• common objects:
ω(A) = { o ∈ O | ∀a ∈ A : o ~ₓ a }
ω({Fischer, browsing}) = {08, 42}
• concept:
(O, A) s.t. α(O) = A ∧ ω(A) = O
– O is the extent, A is the intent
Examples (extent, intent):
({08}, {F, browsing, ’00})
({42}, {F, G, browsing, tag, ’14})
({08, 42}, {F, browsing})
({42}, {tag}) is not a concept: ω({tag}) = {42}, but α({42}) = {F, G, browsing, tag, ’14} ≠ {tag}
• sub-concept ordering:
(O₁, A₁) ≤ (O₂, A₂) iff O₁ ⊆ O₂ iff A₁ ⊇ A₂
– e.g. ({08}, {F, browsing, ’00}) ≤ ({08, 42}, {F, browsing})
• concept lattice: concepts of a context form a complete lattice
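The definitions of α, ω, concept and sub-concept ordering can be checked on the example context with a short Python sketch. The function names and the brute-force enumeration in `all_concepts` are assumptions for illustration; the enumeration is exponential in the number of objects and is not how a real tool would build the lattice.

```python
from itertools import combinations

# Example context from the slides: object id -> attributes with a cross.
CONTEXT = {
    "08": {"Fischer", "browsing", "2000"},
    "15": {"van Zijl", "2001"},
    "42": {"Fischer", "Greene", "browsing", "tag", "2014"},
}
ALL_ATTRIBUTES = set().union(*CONTEXT.values())


def alpha(objects):
    """Common attributes: all attributes shared by every given object."""
    result = set(ALL_ATTRIBUTES)
    for obj in objects:
        result &= CONTEXT[obj]
    return result


def omega(attributes):
    """Common objects: all objects that carry every given attribute."""
    return {obj for obj, attrs in CONTEXT.items() if set(attributes) <= attrs}


def is_concept(extent, intent):
    """(O, A) is a concept iff alpha(O) = A and omega(A) = O."""
    return alpha(extent) == set(intent) and omega(intent) == set(extent)


def all_concepts():
    """Enumerate all concepts by closing every subset of objects."""
    concepts = set()
    objects = sorted(CONTEXT)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = alpha(subset)
            concepts.add((frozenset(omega(intent)), frozenset(intent)))
    return concepts


def is_subconcept(c1, c2):
    """(O1, A1) <= (O2, A2) iff O1 is a subset of O2."""
    return c1[0] <= c2[0]


print(sorted(alpha({"08", "42"})))             # ['Fischer', 'browsing']
print(sorted(omega({"Fischer", "browsing"})))  # ['08', '42']
print(is_concept({"42"}, {"tag"}))             # False
print(len(all_concepts()))                     # 6
```

The six concepts are the top ({08, 15, 42}, ∅), the bottom (∅, all attributes), one concept per single object, and ({08, 42}, {Fischer, browsing}).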
Are we there yet?
Nope.
Concept lattices induce
• enough structure for navigation...
• ... but too much to show directly!
How do you visualize concept lattices?
Approach:
• don’t show the lattice
• use concepts as focus
• visualize only focus concept
– but in relation to lattice
How do you visualize concepts?
• use extent to derive tag cloud
How do you build tag clouds for concepts?
What is a tag cloud?
• visual representation of text data
– summarize large data set
– emphasize important tags
• single words or short phrases
• importance reflected as size
– frequency in document
– number of tagged items
– number of page hits
• different layout methods
• intent looks like tag cloud...
• ... but is common to all objects
⇒ all tags same size
• instead: collect all attributes from all objects in extent
– can be expressed in concept lattice: [formula not preserved in transcript]
– also add extent via object identifiers
• intent shown as largest tags
– smaller tags are related information

Example: focus concept ({08, 42}, {Fischer, browsing})
Tag counts over the extent: Fischer 2, Greene 1, van Zijl -, browsing 2, tag 1, 2000 1, 2001 -, 2014 1
Resulting tag cloud: 08 42 2000 2014 browsing Fischer Greene tag
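The tag counts for the focus concept ({08, 42}, {Fischer, browsing}) can be reproduced with a small Python sketch. The names `tag_cloud` and `font_size`, and the linear size scale with its point sizes, are assumptions for illustration and are not taken from the ConceptCloud browser.

```python
from collections import Counter

# Example context from the slides: object id -> attributes with a cross.
CONTEXT = {
    "08": {"Fischer", "browsing", "2000"},
    "15": {"van Zijl", "2001"},
    "42": {"Fischer", "Greene", "browsing", "tag", "2014"},
}


def tag_cloud(extent):
    """Count, for every tag, how many objects of the extent carry it."""
    counts = Counter()
    for obj in extent:
        counts[obj] += 1             # object identifiers are shown as tags too
        counts.update(CONTEXT[obj])  # every attribute of the object is a tag
    return counts


def font_size(count, extent_size, smallest=10, largest=30):
    """Hypothetical linear scale: tags shared by the whole extent are largest."""
    return smallest + (largest - smallest) * count / extent_size


extent = {"08", "42"}
cloud = tag_cloud(extent)
for tag in sorted(cloud):
    print(tag, cloud[tag], font_size(cloud[tag], len(extent)))
```

Attributes that occur in every object of the extent (here Fischer and browsing, with count 2) are exactly the intent, so they receive the largest size; van Zijl and 2001 do not occur in the extent and are absent from the cloud.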
The ConceptCloud Browser
by: Gillian Greene, US
[Screenshot labels: file, message, date, author, controls]
[Screenshot label: most prolific contributor]
How do you navigate with tag clouds?
Navigation modes:
• refinement: narrow the selection
– select a new tag
• widening: extend the selection
– remove a selected tag
How do you navigate with concept lattices?
Navigation modes:
• refinement: narrow the selection
– select a new tag: f’ = f ∧ δ(t)
• widening: extend the selection
– remove a selected tag: f’ = ⋀ { δ(i) | i ∈ π(f) \ {t} }
– join-based widening, f’ = f ∨ δ(t), can be useful as well
Tag concept δ(t):
δ(t) = (ω({t}), α(ω({t})))   if t ∈ A
δ(t) = (ω(α({t})), α({t}))   if t ∈ O
[Figure: lattice diagrams; labels: focus concept, tag concept]
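Refinement and widening can be sketched on the example context in Python. Several points are assumptions for illustration: concepts are (extent, intent) pairs, π(f) is taken to be the list of currently selected tags, and the meet and join are computed with the standard formal concept analysis constructions (intersect the extents and close, or intersect the intents and close). For an object tag, `delta` uses the object concept (ω(α({t})), α({t})).

```python
# Example context from the slides: object id -> attributes with a cross.
CONTEXT = {
    "08": {"Fischer", "browsing", "2000"},
    "15": {"van Zijl", "2001"},
    "42": {"Fischer", "Greene", "browsing", "tag", "2014"},
}
ATTRIBUTES = set().union(*CONTEXT.values())


def alpha(objects):
    """Common attributes of a set of objects."""
    result = set(ATTRIBUTES)
    for obj in objects:
        result &= CONTEXT[obj]
    return result


def omega(attributes):
    """Common objects of a set of attributes."""
    return {obj for obj, attrs in CONTEXT.items() if set(attributes) <= attrs}


def delta(tag):
    """Tag concept: attribute concept or object concept, as (extent, intent)."""
    if tag in ATTRIBUTES:
        extent = omega({tag})
        return extent, alpha(extent)
    intent = alpha({tag})  # otherwise the tag is an object identifier
    return omega(intent), intent


def meet(c1, c2):
    """Greatest common sub-concept: intersect the extents, then close."""
    extent = c1[0] & c2[0]
    return extent, alpha(extent)


def join(c1, c2):
    """Least common super-concept: intersect the intents, then close."""
    intent = c1[1] & c2[1]
    return omega(intent), intent


TOP = (set(CONTEXT), alpha(set(CONTEXT)))  # focus with nothing selected


def refine(focus, tag):
    """Select a new tag: f' = f meet delta(t)."""
    return meet(focus, delta(tag))


def widen(selected_tags, tag):
    """Remove a selected tag: meet over the remaining selected tags."""
    focus = TOP
    for other in selected_tags:
        if other != tag:
            focus = meet(focus, delta(other))
    return focus


focus = refine(TOP, "Fischer")
print(sorted(focus[0]), sorted(focus[1]))  # ['08', '42'] ['Fischer', 'browsing']
focus = refine(focus, "tag")
print(sorted(focus[0]))                    # ['42']
focus = widen(["Fischer", "tag"], "tag")
print(sorted(focus[0]))                    # ['08', '42']
```

Recomputing the meet over the remaining tags is used for widening because f ∨ δ(t) equals δ(t) whenever t is already selected (then f ≤ δ(t)), which would discard the other selected tags.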
Navigation in the ConceptCloud Browser
The Percept Browser
by: Carl Kritzinger, Fireworks
Conclusions & Future Work
• Semi-structured data is common but hard to analyze
• Tag clouds are a good visualization approach...
• ... and the combination with concept lattices makes it easy to navigate and find related information
• Flexible approach, generic tool
– different data sets
– different types of contexts (⇒ different types of analysis)
• Scalability
– DBLP, IMDb, Wikipedia?
• Customizability
– context extraction
– tool scripting