Haystack
Dennis Quan
Oxygen Workshop, January 2002
Introduction
• Personalized information store
• Semistructured data with arbitrary metadata
• Unified ontology
• Standards-based components and infrastructure
• Compatible with existing systems
• Example user interface
• Integration with mail and groupware concepts
• Collaboration possibilities
What is an Ontology?
• “The branch of metaphysics that deals with the nature of being.” (American Heritage Dictionary)
• Describes relationships between different objects in a system
• Like schemata or class hierarchies
Resource Description Framework (RDF)
• Standard defined by W3C in 1999 (http://www.w3.org/RDF/)
• Models statements of the form:
<subject> <predicate> <object>
• Can be expressed as a labeled, directed graph
• For example, statements “Bob likes Alice” and “Bob likes Jane”:
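[Figure: directed graph in which node Bob has two edges labeled “likes”, one to Alice and one to Jane]
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
  <!-- ex: is a hypothetical example namespace; RDF/XML property elements must be namespace-qualified -->
  <rdf:Description rdf:about="Bob">
    <ex:likes rdf:resource="Alice" />
    <ex:likes rdf:resource="Jane" />
  </rdf:Description>
</rdf:RDF>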
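A minimal sketch of how the RDF/XML form maps back to subject/predicate/object triples, using Python's standard xml.etree.ElementTree. It assumes a hypothetical http://example.org/terms# namespace for likes (RDF/XML requires property elements to be namespace-qualified) and handles only the simple rdf:Description/rdf:resource form used in this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The Bob/Alice/Jane example with straight quotes and a hypothetical ex:
# namespace, since RDF/XML property elements must be namespace-qualified.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
  <rdf:Description rdf:about="Bob">
    <ex:likes rdf:resource="Alice" />
    <ex:likes rdf:resource="Jane" />
  </rdf:Description>
</rdf:RDF>"""


def triples_from_rdfxml(text):
    """Extract (subject, predicate, object) triples from simple RDF/XML.

    Only handles rdf:Description elements whose property elements carry
    rdf:resource; literals, nested descriptions and base URIs are ignored.
    """
    root = ET.fromstring(text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree reports tags as "{namespace}local"; the predicate
            # URI is the namespace followed by the local name.
            namespace, local = prop.tag[1:].split("}")
            obj = prop.get(f"{{{RDF_NS}}}resource")
            triples.append((subject, namespace + local, obj))
    return triples


for triple in triples_from_rdfxml(DOC):
    print(triple)
```

Each rdf:Description becomes one subject, and each child element contributes one labeled edge of the graph.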
RDF Store
• RDF Store used by Haystack to store all information
• Runs on top of a standard SQL database
• Provides querying facility
• Example: who likes Jane?
(?x likes Jane); return ?x
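One way such a store could sit on SQL, sketched in Python with the standard sqlite3 module standing in for the database. The single triples table, the TripleStore class and its query method are illustrative assumptions, not Haystack's schema or API; the pattern syntax follows the (?x likes Jane) example.

```python
import sqlite3


class TripleStore:
    """Toy RDF store on top of SQL. sqlite3 stands in for the database, and
    the one-table schema is an assumption, not Haystack's actual layout."""

    COLUMNS = ("subject", "predicate", "object")

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")

    def add(self, s, p, o):
        self.db.execute("INSERT INTO triples VALUES (?, ?, ?)", (s, p, o))

    def query(self, s, p, o):
        """Match one (s, p, o) pattern; terms starting with '?' are variables.

        Returns one dict per matching row, mapping each variable to a value.
        """
        where, params, variables = [], [], {}
        for column, term in zip(self.COLUMNS, (s, p, o)):
            if term.startswith("?"):
                if term in variables:
                    # Repeated variable: both positions must hold the same value.
                    where.append(f"{column} = {variables[term]}")
                else:
                    variables[term] = column
            else:
                # Values go through placeholders; column names come only from
                # the fixed COLUMNS tuple, so interpolating them is safe.
                where.append(f"{column} = ?")
                params.append(term)
        select = ", ".join(variables.values()) or "1"
        sql = f"SELECT {select} FROM triples"
        if where:
            sql += " WHERE " + " AND ".join(where)
        rows = self.db.execute(sql, params).fetchall()
        return [dict(zip(variables, row)) for row in rows]


store = TripleStore()
store.add("Bob", "likes", "Alice")
store.add("Bob", "likes", "Jane")
store.add("Carol", "likes", "Jane")

# (?x likes Jane); return ?x
print(sorted(b["?x"] for b in store.query("?x", "likes", "Jane")))
```

Each single-pattern query becomes one SELECT whose WHERE clause fixes the constant positions.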
Belief
• With such a multitude of information, how much of it is believable?
• Annotate who said what
• The belief network itself can also be described in RDF
• Example: John says that Bob likes Jane, and Bob believes John
• Belief Server: the Haystack component that evaluates the belief network and “filters” the store for information the user believes
[Figure: the statement “Bob likes Jane” linked by assertedBy to John, with a believes edge from Bob to John]
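A sketch of the filtering step in Python, under simplifying assumptions: each statement records a single asserter (the assertedBy link), belief in a person is treated as transitive, and the user believes a statement exactly when a believed source asserted it. The data structures and function names are hypothetical; the Belief Server's actual rules are not specified here.

```python
from collections import deque

# Statements annotated with who asserted them (hypothetical data).
statements = [
    (("Bob", "likes", "Jane"), "John"),
    (("Bob", "likes", "Alice"), "Mallory"),
]
believes = {("Bob", "John")}  # (believer, trusted source) pairs


def trusted_sources(user, believes):
    """The user plus everyone reachable over 'believes' edges.
    Treating belief as transitive is an assumption of this sketch."""
    trusted, frontier = {user}, deque([user])
    while frontier:
        current = frontier.popleft()
        for believer, source in believes:
            if believer == current and source not in trusted:
                trusted.add(source)
                frontier.append(source)
    return trusted


def believed_view(user, statements, believes):
    """Filter the store down to statements asserted by trusted sources."""
    trusted = trusted_sources(user, believes)
    return [triple for triple, asserter in statements if asserter in trusted]


print(believed_view("Bob", statements, believes))
```

From Bob's point of view only John's assertion survives; a user who believes nobody else sees only their own assertions.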
Collections
• Basic means of aggregation
• Difference from “folders”: membership rather than containment, so an object can belong to several collections at once
• Categorization and subcategories
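The membership model can be sketched as a mapping from collection URIs to sets of member URIs. In this hypothetical Python sketch an object may sit in several collections, removing it from one leaves the others untouched, and a subcategory is simply a collection that is a member of another collection.

```python
from collections import defaultdict


class CollectionIndex:
    """Collections hold references (URIs), not the objects themselves.
    Class name and URIs are illustrative, not Haystack's API."""

    def __init__(self):
        self.members = defaultdict(set)  # collection URI -> member URIs

    def add(self, collection, obj):
        self.members[collection].add(obj)

    def remove(self, collection, obj):
        # Only the membership link is dropped; the object itself survives.
        self.members[collection].discard(obj)

    def collections_of(self, obj):
        return {c for c, objs in self.members.items() if obj in objs}


index = CollectionIndex()
index.add("urn:coll:projects", "urn:coll:project-x")  # subcategory = member collection
index.add("urn:coll:project-x", "urn:doc:report")
index.add("urn:coll:to-read", "urn:doc:report")
print(sorted(index.collections_of("urn:doc:report")))
index.remove("urn:coll:to-read", "urn:doc:report")
print(sorted(index.collections_of("urn:doc:report")))
```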
Queries
• One possible way to construct a collection (a result set)
• Any metadata field can be used to construct a query
• Natural-language queries
• Multiple query sources—the Web, other people’s Haystacks, etc.
• Automatic update of query result sets
• Possibilities for machine learning (e.g., when a user removes an item from a result set, the removal tells Haystack that the object does not belong)
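A minimal Python sketch of an automatically updated result set, assuming the store notifies each live query whenever an item is added or re-indexed. The LiveQuery class is hypothetical; user removals are only recorded as negative examples here, which is where a learning component could hook in.

```python
class LiveQuery:
    """A result set kept current as items arrive (hypothetical design)."""

    def __init__(self, matches):
        self.matches = matches   # predicate: item metadata -> bool
        self.results = set()
        self.rejected = set()    # items the user said do not belong

    def on_item_added(self, item_id, meta):
        # Assumed to be called by the store for every new or updated item.
        if item_id not in self.rejected and self.matches(meta):
            self.results.add(item_id)

    def user_removed(self, item_id):
        # The removal is kept as a negative example a learner could use.
        self.results.discard(item_id)
        self.rejected.add(item_id)


q = LiveQuery(lambda meta: meta.get("sender") == "ann")
q.on_item_added("msg1", {"sender": "ann"})
q.on_item_added("msg2", {"sender": "ben"})
q.user_removed("msg1")
q.on_item_added("msg1", {"sender": "ann"})  # re-indexed, but stays out
q.on_item_added("msg3", {"sender": "ann"})
print(sorted(q.results))
```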
Services
• Callable services in Haystack
• Also, automatic agents that respond to events
• Available methods described in metadata
• Haystack service initialization script also described in metadata
• Services mainly written in Java, but can be written in any language
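To illustrate methods described in metadata, here is a hypothetical Python sketch in which a service's methods are listed as triples and calls are dispatched only to methods the metadata declares. The URIs, the hasMethod/methodName predicates and the call helper are assumptions for illustration.

```python
# Service description as triples; URIs and predicates are hypothetical.
metadata = {
    ("urn:svc:text", "hasMethod", "urn:svc:text#wordCount"),
    ("urn:svc:text#wordCount", "methodName", "wordCount"),
}

# Implementations are kept outside the metadata, keyed by method URI.
implementations = {
    "urn:svc:text#wordCount": lambda text: len(text.split()),
}


def described_methods(service):
    """Return {method name: method URI} for the methods the metadata lists."""
    uris = {o for s, p, o in metadata if s == service and p == "hasMethod"}
    return {o: s for s, p, o in metadata if s in uris and p == "methodName"}


def call(service, name, *args):
    """Invoke a method only if the service's metadata declares it."""
    methods = described_methods(service)
    if name not in methods:
        raise LookupError(f"{service} declares no method named {name!r}")
    return implementations[methods[name]](*args)


print(call("urn:svc:text", "wordCount", "who likes Jane"))
```

Because the interface lives in metadata, the same lookup works no matter which language implements the method.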
SOAP, WSDL and UDDI
• Relationship to Web Services standards:
– Simple Object Access Protocol (SOAP)
http://www.w3.org/TR/SOAP/
– Web Services Description Language (WSDL)
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnwebsrv/html/wsdl.asp
– Universal Description, Discovery and Integration (UDDI)
http://www.uddi.org/
• SOAP and HTTP PUT used as protocols for communication between services, including the RDF Store
• RDFized version of WSDL used to describe services’ interfaces
• UDDI query functionality easily modeled in RDF query
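The point that UDDI-style lookup reduces to RDF query can be sketched as a conjunctive triple-pattern match in Python. The registry triples, the implements/endpoint predicates and the localhost endpoints are hypothetical stand-ins for an RDFized WSDL description; the matcher joins patterns that share variables.

```python
def match(pattern, triple, bindings):
    """Unify one (s, p, o) pattern with a triple; '?name' terms are variables.
    Returns extended bindings, or None if the triple does not fit."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None   # variable already bound to something else
        elif term != value:
            return None
    return extended


def query(patterns, triples):
    """Conjunctive query: all patterns must match with consistent bindings."""
    results = [{}]
    for pattern in patterns:
        next_results = []
        for bindings in results:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_results.append(extended)
        results = next_results
    return results


# Hypothetical RDFized service descriptions.
registry = [
    ("urn:svc:spell", "implements", "urn:iface:SpellChecker"),
    ("urn:svc:spell", "endpoint", "http://localhost:8080/spell"),
    ("urn:svc:mail", "implements", "urn:iface:MailFilter"),
    ("urn:svc:mail", "endpoint", "http://localhost:8080/mail"),
]

# "Find a service that implements SpellChecker and return its endpoint."
found = query([("?s", "implements", "urn:iface:SpellChecker"),
               ("?s", "endpoint", "?url")], registry)
print([b["?url"] for b in found])
```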
Inference Layer
• The semantics defined in RDF often permit deduction
• Example: Fido is a dog and dogs are mammals ⇒ Fido is a mammal
• Deduced knowledge is useful and should be stored
• Inference Layer recognizes patterns and triggers agents/services to perform deduction
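As a stand-in for the pattern-triggered agents, this Python sketch runs a simple forward-chaining loop to a fixpoint. The rules are modeled on the RDF Schema meaning of rdf:type and rdfs:subClassOf, abbreviated here as type and subClassOf; the extra Mammal/Animal fact is added only to show chaining. Deduced triples are returned so they can be stored alongside the originals.

```python
def infer(triples):
    """Forward-chain to a fixpoint with two RDFS-style rules:
        (x type C), (C subClassOf D)        => (x type D)
        (A subClassOf B), (B subClassOf C)  => (A subClassOf C)
    Returns the original triples plus everything deduced."""
    known = set(triples)
    while True:
        types = [(s, o) for s, p, o in known if p == "type"]
        subs = [(s, o) for s, p, o in known if p == "subClassOf"]
        new = {(x, "type", d) for x, c in types for c2, d in subs if c == c2}
        new |= {(a, "subClassOf", c) for a, b in subs for b2, c in subs if b == b2}
        new -= known
        if not new:
            return known
        known |= new


facts = {
    ("Fido", "type", "Dog"),
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),  # illustrative extra fact
}
deduced = infer(facts) - facts
print(sorted(deduced))
```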
Views
• May be several different ways of looking at an object
• Example: an appointment book can be viewed as a sortable list of appointments or as a calendar
• Views are a distinct type of object used to model these different ways of looking at objects
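A hypothetical Python sketch of two views over the same appointment data: a list view sortable by column and a calendar view grouped by day, both registered under the object's type. The appointment tuples and the VIEWS registry are illustrative assumptions.

```python
from datetime import datetime
from itertools import groupby

# Hypothetical appointment book: (start, title) pairs.
appointments = [
    (datetime(2002, 1, 15, 12, 0), "Lunch"),
    (datetime(2002, 1, 14, 9, 30), "Staff meeting"),
    (datetime(2002, 1, 15, 10, 0), "Review"),
]


def list_view(appts, column="start"):
    """Sortable list view: one row per appointment, ordered by a column."""
    index = {"start": 0, "title": 1}[column]
    return sorted(appts, key=lambda a: a[index])


def calendar_view(appts):
    """Calendar view: titles grouped under their day, in time order."""
    ordered = sorted(appts)
    return {day: [title for _, title in group]
            for day, group in groupby(ordered, key=lambda a: a[0].date())}


# Views available for each object type (hypothetical registry).
VIEWS = {"AppointmentBook": {"list": list_view, "calendar": calendar_view}}

for name, view in VIEWS["AppointmentBook"].items():
    print(name, view(appointments))
```

The underlying data never changes; each view is just a different function over it.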
User Interface Ontology
• UI components (e.g., JavaBeans, ActiveX controls) are rich sources of metadata
• Form descriptions can also be described with metadata
• Possible to construct a directed graph that models a user interface
• Similar in concept to XUL
• Permits dynamic deduction of the user interface, similar to XSLT but semantic rather than syntactic
• Part: a Haystack UI component
• ViewPart: a kind of part specially designed to display a specific kind of View
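The semantic deduction described above can be sketched as choosing a ViewPart from metadata: prefer a part registered for the exact view class, otherwise walk up subClassOf links to a part for a more general view. All class, part and predicate names in this Python sketch are hypothetical.

```python
# UI metadata as triples; every name here is hypothetical.
ui_metadata = {
    ("CalendarPart", "type", "ViewPart"),
    ("CalendarPart", "displaysView", "CalendarView"),
    ("PropertyTablePart", "type", "ViewPart"),
    ("PropertyTablePart", "displaysView", "View"),
    ("CalendarView", "subClassOf", "View"),
    ("AgendaView", "subClassOf", "View"),
}


def part_for(view_class, metadata):
    """Deduce a ViewPart for a kind of View from metadata alone.

    Prefers a part registered for the exact view class, then walks up
    subClassOf links so a generic part can display more specific views.
    """
    seen = set()
    current = view_class
    while current is not None and current not in seen:
        seen.add(current)
        candidates = sorted(
            s for s, p, o in metadata
            if p == "displaysView" and o == current
            and (s, "type", "ViewPart") in metadata)
        if candidates:
            return candidates[0]
        # Follow one superclass link (single inheritance assumed for brevity).
        current = next(
            (o for s, p, o in metadata if s == current and p == "subClassOf"),
            None)
    return None


print(part_for("CalendarView", ui_metadata))  # exact match
print(part_for("AgendaView", ui_metadata))    # falls back to the View part
```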
SWT
• Cross-platform Java widget toolkit
• Part of Eclipse project (http://www.eclipse.org/)
• Uses each operating system’s native widgets, avoiding performance problems
• Used for the Part framework
• Integrates with Mozilla web browser
• Also possible to use ActiveX controls and GTK widgets
Ozone
• Haystack experimental user interface
• Modeled after a web browser
• Uses parts to describe user interface
Browse/Query Paradigm
• Browsing: going through nested folders/categories to locate sought item(s)
• Query: giving an explicit set of conditions to locate sought item(s)
• Ozone adopts hybrid Browse/Query paradigm
• Traditional subcategories still present in Collection view
• Also, parameterized categories similar to queries
• Previously issued queries persist as subcategories
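A Python sketch of the hybrid paradigm, using hypothetical item metadata: a parameterized category computes one subcategory per value of a property, and issued queries are saved under a label so they can later be browsed like subcategories. The function names and the saved_queries store are assumptions.

```python
from collections import defaultdict

# Hypothetical items and their metadata.
items = {
    "msg1": {"sender": "ann", "subject": "Haystack demo"},
    "msg2": {"sender": "ben", "subject": "Lunch"},
    "msg3": {"sender": "ann", "subject": "Slides"},
}


def parameterized_category(items, prop):
    """One subcategory per distinct value of prop, computed from metadata.

    Each subcategory holds exactly the answer to the query (prop = value)."""
    groups = defaultdict(list)
    for item_id, meta in items.items():
        groups[meta.get(prop, "(none)")].append(item_id)
    return {value: sorted(ids) for value, ids in groups.items()}


saved_queries = {}  # label -> predicate; persists issued queries


def browse_saved(label):
    # Re-running the stored predicate keeps the subcategory current.
    return sorted(i for i, meta in items.items() if saved_queries[label](meta))


def issue_query(label, predicate):
    saved_queries[label] = predicate
    return browse_saved(label)


print(parameterized_category(items, "sender"))
print(issue_query("about Haystack", lambda m: "Haystack" in m["subject"]))
```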
Mail
• E-mail is a good source of metadata-rich documents
• Messages, e-mail addresses, people and groups can be modeled in RDF
• Haystack agents can be used to filter e-mail to make it more manageable
• Many e-mail management techniques applicable to documents in general and vice versa
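A sketch of modeling a message in RDF and running a simple filtering agent over it, using Python's standard email package to parse the headers. The predicate names, the mailto: URIs for people, the example.org addresses and the list-routing rule are all hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses

RAW = """From: Ann Example <ann@example.org>
To: Ben Example <ben@example.org>, list@example.org
Subject: Haystack demo

Slides attached.
"""


def message_triples(msg_uri, raw):
    """Model a message, its addresses and their owners' names as triples."""
    msg = message_from_string(raw)
    triples = [(msg_uri, "subject", msg["Subject"])]
    for header, predicate in (("From", "from"), ("To", "to")):
        for name, addr in getaddresses(msg.get_all(header, [])):
            person = "mailto:" + addr
            triples.append((msg_uri, predicate, person))
            if name:
                triples.append((person, "name", name))
    return triples


def list_agent(triples, list_uri="mailto:list@example.org"):
    """Filtering agent: file messages addressed to a list into a collection."""
    return [(s, "memberOf", "urn:coll:list-mail")
            for s, p, o in triples if p == "to" and o == list_uri]


triples = message_triples("urn:msg:1", RAW)
print(list_agent(triples))
```

Since the agent only inspects triples, the same rule would work on any document carrying a "to" property, not just e-mail.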
Storage Model
• Objects in Haystack named by Uniform Resource Identifiers (URIs)
• URLs are a subclass of URIs
• Documents and web pages can be named by URLs
• HTTP/FTP/WebDAV servers can then be used to store documents
• Storing terabytes of “data” in RDF is inefficient when existing storage solutions already work well
Collaboration
• Allow Haystack-Haystack and Haystack-Semantic Web information exchange
• Filtration of imported data
• The “Who’s the expert?” problem
• Privacy concerns
• Different parties organize information in different ways
• Can be used to model mailing lists, newsgroups, and groupware
Ontological Conversion
• Unlikely that everyone will agree on the same schemata
• Ontological conversion converts from one schema to another
• Can be implemented as Haystack agents that respond to metadata with “foreign” schemata
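A conversion agent can be sketched as a table of rules keyed by foreign predicates: some rules rename a predicate, others restructure the value. The theirs: prefix, the rules and the split_name helper in this Python sketch are hypothetical; real schema mappings are often less mechanical.

```python
def split_name(s, o):
    """Structural rule: one foreign fullName becomes two local triples."""
    first, _, last = o.partition(" ")
    return [(s, "firstName", first), (s, "lastName", last)]


# Rules keyed by foreign predicate: a string renames, a function restructures.
RULES = {
    "theirs:author": "creator",
    "theirs:fullName": split_name,
}


def convert(triples, rules):
    """Rewrite triples that use foreign predicates into the local schema."""
    out = []
    for s, p, o in triples:
        rule = rules.get(p)
        if rule is None:
            out.append((s, p, o))       # not foreign (or unknown): keep as-is
        elif callable(rule):
            out.extend(rule(s, o))      # restructure the value
        else:
            out.append((s, rule, o))    # rename the predicate
    return out


foreign = [
    ("urn:doc:1", "theirs:author", "urn:person:ann"),
    ("urn:person:ann", "theirs:fullName", "Ann Example"),
    ("urn:doc:1", "title", "Notes"),
]
for triple in convert(foreign, RULES):
    print(triple)
```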
Implementation
• Written for Java 2 platform (JDK 1.3.1)
• SWT (Eclipse) used for user interface components
• Mozilla web browser
• HSQL open source SQL database written in Java
• Lucene (Apache Jakarta project) search engine written in Java
• Tomcat (Apache Jakarta project) web server written in Java
• Parts written in Jython, a Java-based Python interpreter