Project Update Matt Williams XML Document Visualization and Retrieval

Upload: letitia-conley

Post on 18-Jan-2018

Background

XML vs. web documents: added structure

    <book>
      <title>My First XML</title>
      <prod id="33-657" media="paper"></prod>
      <chapter>Introduction to XML
        <para>What is HTML</para>
        <para>What is XML</para>
      </chapter>
      <chapter>XML Syntax
        <para>Elements must have a closing tag</para>
        <para>Elements must be properly nested</para>
      </chapter>
    </book>

Can we take advantage of this structure when searching for documents?
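The following minimal Python sketch makes "added structure" concrete using only the standard library. It walks the book example above and lists every element's root-to-element path together with the element's own text. The helper name `element_paths` is illustrative and does not come from any system described here. A flat web-document index would see only the text. The paths are the extra information that a structure-aware search could exploit.

```python
import xml.etree.ElementTree as ET

# The <book> sample from the slide, as a string.
SAMPLE = """<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>"""


def element_paths(elem, prefix=""):
    """Yield (path, text) for every element in document order, where path is
    the chain of tags from the root and text is the element's own leading
    text with surrounding whitespace stripped."""
    path = f"{prefix}/{elem.tag}"
    yield path, (elem.text or "").strip()
    for child in elem:
        yield from element_paths(child, path)


paths = list(element_paths(ET.fromstring(SAMPLE)))
for path, text in paths:
    print(path, repr(text))
```

Each chapter heading, such as "Introduction to XML", is attached to the `/book/chapter` path. Each paragraph is attached to `/book/chapter/para`. A query can therefore distinguish a word in a heading from the same word in body text.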

Information Retrieval

Standard information retrieval (IR): tf*idf

tf – frequency of a term in a document

idf – inverse document frequency, based on the number of documents containing the term
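The slide does not say which tf*idf variant is meant. The sketch below assumes one common form: w(t, d) = tf(t, d) × log(N / df(t)), where tf is the raw count of term t in document d, N is the number of documents, and df(t) is the number of documents that contain t. Tokenization is naive and whitespace-based. The function name and sample documents are illustrative.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: raw term count * log(N / df), where N is the number of
    documents and df is the number of documents containing the term.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # df counts each term at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # term frequency within this document
        weights.append({term: count * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return weights


docs = [
    "what is html xml",
    "what is xml",
    "xml elements must be properly nested",
]
weights = tf_idf(docs)
print(weights[0])
```

"xml" appears in every document, so its idf, and therefore its weight, is 0. "html" appears in only one document, so it receives the largest weight in that document.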

Information Retrieval

There has been a fair amount of previous work on adding structure to IR queries.

Examples:

XIRQL – Fuhr and Großjohann

//book/chapter[heading $cw$ "InfoVis"]

XXL – Theobald and Weikum

Select Z From Index Where zoos.~animal.~cougar as Z
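The sketch below roughly illustrates what the XIRQL example asks for: chapters of a book whose heading contains the word "InfoVis". It uses Python's ElementTree. Matching is purely boolean, whole-word and case-insensitive, so it only approximates the `$cw$` predicate and performs no ranking. The function name and the `<library>` test document are hypothetical.

```python
import xml.etree.ElementTree as ET


def chapters_with_heading_word(root, word):
    """Boolean approximation of //book/chapter[heading $cw$ "word"]:
    return <chapter> children of any <book> whose first <heading> child
    contains `word` as a whole token (case-insensitive)."""
    word = word.lower()
    matches = []
    for book in root.iter("book"):               # //book (iter includes root)
        for chapter in book.findall("chapter"):  # /chapter (direct children)
            heading = chapter.find("heading")
            text = heading.text if heading is not None else None
            if word in (text or "").lower().split():
                matches.append(chapter)
    return matches


LIBRARY = """<library>
  <book>
    <chapter><heading>Intro to InfoVis</heading></chapter>
    <chapter><heading>XML Syntax</heading></chapter>
  </book>
</library>"""

root = ET.fromstring(LIBRARY)
hits = chapters_with_heading_word(root, "InfoVis")
print([c.find("heading").text for c in hits])
```

The query only works if the user already knows that chapters sit directly under `book` and carry a `heading` child.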

But what if we are unsure of the structure? What if the structure varies from document to document?

Information Retrieval

My goal is to provide an interface for exploring an XML collection when only limited information about its structure is available:

Meta-schema information – element index

Visual clustering – multidimensional scaling

Visual queries – element selection
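One plausible reading of "meta-schema information – element index" is a table that maps element names to the documents that use them. Such a table can be built without knowing any schema in advance. The sketch below assumes that reading. `build_element_index` and the two sample documents are hypothetical. The documents deliberately differ in structure (`chapter` vs. `section`).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_element_index(docs):
    """Map each element name to the sorted ids of the documents that use it.

    `docs` maps a document id to its XML text. The documents need not share a
    schema; the index simply records which tags occur where.
    """
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        for elem in ET.fromstring(xml_text).iter():
            index[elem.tag].add(doc_id)
    # Sort tags and ids so the index is stable for display.
    return {tag: sorted(ids) for tag, ids in sorted(index.items())}


docs = {
    "a.xml": "<book><title>My First XML</title>"
             "<chapter><para>What is XML</para></chapter></book>",
    "b.xml": "<book><title>XML Syntax</title>"
             "<section><para>Nesting rules</para></section></book>",
}
index = build_element_index(docs)
for tag, ids in index.items():
    print(tag, ids)

# Selecting an element in the interface could then narrow the collection.
print(index["chapter"])
```

Because the index is derived from the data rather than from a schema, variation such as `chapter` vs. `section` appears directly in the list the user selects from.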

Related Work

Visual Information Seeking

HomeFinder / Periodic Table – Ahlberg and Shneiderman

Related Work

Galaxies – Wise et al.

Visual Web Retrieval: Lighthouse – Leuski

Related Work

ZUI (zoomable user interface) – Pad, Jazz, and Piccolo (Ben Bederson)

SpaceTree – Jesse Grosjean et al.

TreeMaps ?? – Ben Shneiderman

Multidimensional Scaling

Document similarity

Dimensionality reduction: from a full-dimensional distance measure to a 2-dimensional distance measure

Problems – Speed?
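The slide does not say which MDS variant is used. As an assumption, the sketch below implements classical (Torgerson) MDS in pure Python. It squares the distances, double-centres them, and takes the top two eigenvectors as 2-D coordinates. The eigenvectors are found here by power iteration with deflation. In the project, the input would be a document-to-document distance matrix, for example 1 minus the cosine similarity of tf*idf vectors (also an assumption). The usage example instead uses a rectangle whose true layout is known.

```python
import math
import random


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Classical (Torgerson) MDS: embed n items in `dims` dimensions so that
    Euclidean distances approximate the n x n matrix `dist`.

    Eigenvectors are found by power iteration with deflation, which assumes
    the leading eigenvalues of the centred matrix are positive (true for
    Euclidean input).
    """
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    mean = [sum(row) / n for row in d2]  # row means (= column means)
    grand = sum(mean) / n
    # Double centring: B = -1/2 * J D^2 J
    b = [[-0.5 * (d2[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):  # power iteration: v <- Bv / |Bv|
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


# Usage: four corners of a 2 x 1 rectangle. MDS should recover the shape
# up to rotation, reflection and translation.
pts = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
xy = classical_mds(dist)
print(xy)
```

Each power-iteration step costs O(n²). For n documents, the embedding therefore costs O(dims · iters · n²), on top of the O(n²) cost of building the distance matrix. This quadratic growth is one concrete reason speed is a concern, and it is why approximate variants are often preferred for large collections.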

Test Environment

eXist – open-source native XML database (Wolfgang M. Meier, http://exist-db.org/)

I am working on a front end to the database that provides:

A selectable element index

Interactive results that dynamically cluster and zoom
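A front end like this must turn a selection in the element index into a database query. The following is a minimal sketch under two assumptions. First, eXist's REST interface is reachable at http://localhost:8080/exist/rest/db, its usual default; adjust this for your installation. Second, the query is an XPath expression sent through the `_query` and `_howmany` GET parameters. The helper `element_query_url` is hypothetical. It only builds the URL and does not contact the server.

```python
import re
from urllib.parse import urlencode

# Assumption: a local eXist instance with the REST interface at its usual
# default location; change host, port or collection path as needed.
EXIST_REST = "http://localhost:8080/exist/rest/db"


def element_query_url(element, word, howmany=10, base=EXIST_REST):
    """Hypothetical helper: build a GET URL that asks eXist's REST interface
    (via its _query and _howmany parameters) for every `element` whose text
    contains `word`."""
    # Simplified name check, not the full XML Name production.
    if not re.fullmatch(r"[A-Za-z_][\w.-]*", element):
        raise ValueError(f"unsupported element name: {element!r}")
    if '"' in word:
        raise ValueError("word must not contain double quotes")
    xpath = f'//{element}[contains(., "{word}")]'
    return f"{base}?{urlencode({'_query': xpath, '_howmany': howmany})}"


# Usage: the user clicks <para> in the element index and types "XML".
url = element_query_url("para", "XML")
print(url)
```

Fetching this URL from a running database (for example with `urllib.request.urlopen`) would return the matching elements. That step is omitted so the sketch runs without a server.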

Thus Far

Lots of learning! XML databases, multidimensional scaling, XML queries, XML information retrieval, zoomable interfaces, treemaps

Added a basic GUI to eXist

Added a service that offers the element index as part of the API