Project Update
Matt Williams
XML Document Visualization and Retrieval

Background: XML vs. Web Documents
Added structure

<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>

Can we take advantage of this structure when searching for documents?

Information Retrieval
Standard Information Retrieval (IR): tf*idf
  tf – frequency of a term in a document
  idf – inverse document frequency, derived from the number of documents containing the term
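
To make the tf*idf weighting concrete, here is a minimal sketch in Python. It assumes whitespace tokenization and the common variant idf = log(N / df), where N is the collection size and df is the number of documents containing the term; the slides do not say which variant is intended, and the function name `tf_idf` is illustrative only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumption: idf = log(N / df). Other variants (smoothing,
    length normalization) are equally common.
    """
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # term frequency within this document
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights


if __name__ == "__main__":
    collection = ["xml xml syntax", "html syntax", "xml elements"]
    for doc, w in zip(collection, tf_idf(collection)):
        print(doc, w)
```

Under this variant a term that occurs in every document gets weight zero, so it cannot distinguish one document from another.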

Information Retrieval
A fair bit of previous work on adding structure to IR queries.
Examples
  XIRQL – Fuhr and Großjohann
    //book/chapter[heading $cw$ "InfoVis"]
  XXL – Theobald and Weikum
    Select Z From Index Where zoos.~animal.~cougar as Z
But…
  What if we are unsure of the structure?
  What if we have variability in the structure?

Information Retrieval
My goal is to provide an interface to explore the XML collection with limited information
  Meta-Schema Information – Element Index
  Visual Clustering – Multidimensional Scaling
  Visual Queries – Element Selection
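
One way to picture the element index is as a map from each element path to the documents that contain it. The sketch below uses only the Python standard library; `build_element_index` and the sample document ids are hypothetical names chosen for illustration, and this is not the eXist API.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_element_index(docs):
    """Map each element path (e.g. 'book/chapter/para') to the set of
    document ids in which that path occurs.

    `docs` is a {doc_id: xml_string} dict (a hypothetical input shape).
    """
    index = defaultdict(set)

    def walk(elem, prefix, doc_id):
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        index[path].add(doc_id)
        for child in elem:
            walk(child, path, doc_id)

    for doc_id, xml_text in docs.items():
        walk(ET.fromstring(xml_text), "", doc_id)
    return dict(index)


if __name__ == "__main__":
    sample = {
        "a": "<book><title>T</title><chapter><para>p</para></chapter></book>",
        "b": "<book><chapter><heading>h</heading></chapter></book>",
    }
    for path, ids in sorted(build_element_index(sample).items()):
        print(path, sorted(ids))
```

An index like this lets a user see which elements exist in the collection before writing a query, which is the point of element selection when the structure is unknown or varies.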

Related Work
Visual Information Seeking
  Homefinder / Periodic Table – Ahlberg and Shneiderman

Related Work
Galaxies
  Wise et al.
Visual Web Retrieval
  Lighthouse – Leuski

Related Work
ZUI – Pad, Jazz, and Piccolo
  Ben Bederson
SpaceTree
  Jesse Grosjean et al.
TreeMaps ??
  Ben Shneiderman

Multidimensional Scaling
Document Similarity
Dimensionality Reduction
  From a full-dimensional distance measure to a 2-dimensional distance measure
Problems – Speed?
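
As an illustration of the reduction step, the sketch below implements classical (Torgerson) MDS in plain Python: it takes a symmetric matrix of pairwise document distances and returns 2-D coordinates whose distances approximate the originals. The slides do not say which MDS variant the project uses, so this choice is an assumption; the function name, the power-iteration eigen-solver, and the parameter defaults are likewise illustrative.

```python
import math
import random


def classical_mds(dist, dims=2, iters=200, seed=0):
    """Embed n items in `dims` dimensions from a symmetric n x n
    distance matrix using classical MDS (an assumed variant)."""
    n = len(dist)

    # Double-center the squared distances: B = -1/2 * J * D^2 * J.
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Power iteration finds the dominant eigenvector of b.
        v = [rng.random() for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]

        # Rayleigh quotient gives the matching eigenvalue.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][d] = v[i] * scale

        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords
```

Each matrix-vector product costs O(n²), and building the distance matrix itself needs every pairwise document distance, which is one reason speed becomes a concern for interactive re-clustering.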

Test Environment
eXist – open source native XML database
  Wolfgang M. Meier
  http://exist-db.org/
I am working on a front end to the database that provides:
  A selectable element index
  Interactive results that dynamically cluster and zoom

Thus Far
Lots of learning!!
  XML Databases
  Multidimensional Scaling
  XML Queries
  XML Information Retrieval
  Zoomable Interfaces
  Treemaps
Added a basic GUI to eXist
Added a service to offer the element index as part of the API