DiGIR: Distributed Generic Information Retrieval
Stan Blum, Dave Vieglais, P.J. Schwartz

Project Goals
- To define a protocol for retrieving structured data from multiple, heterogeneous databases
- To build a reference implementation of that protocol

Design Goals
- To use open protocols and standards, such as HTTP, XML, and UDDI, to leverage existing and emerging technologies
- To decouple the protocol, software, and semantics
- To automate the establishment of a new data provider as much as possible

High-level Architecture
- Protocol
- Provider
- Portal
- Registry

Protocol
- Defines request and response message formats for communication between Provider and Portal
- Assumes Providers conform to a known federation schema
- Remains flexible so that different federation schemas can be plugged in

Provider
- Makes structured data available to portals
- Communicates via protocol-compliant messaging only
- Complies with a known federation schema
- Supplies meta-data to describe data classification and availability

Portal
- The entry point for a “user”
- Can make requests of any number (N) of providers
- Communicates via protocol-compliant messaging only
- Queries registry for available providers
- Can determine, based on provider meta-data, whether a provider should be queried

Project Information
- The DiGIR project is a collaborative effort
- DiGIR is currently established as an open source project on SourceForge (http://sourceforge.net).
- Further documentation is available on the SourceForge site.
- Please join us in collaborating!

Protocol Details
- Specified in an XML Schema (.xsd)
- Intended to work in conjunction with federation schemas, also expressed as XML Schemas
- Actual request and response documents are instance documents conforming to both the protocol schema and a federation schema

<request xmlns="http://www.namespaceTBD.org/digir"
         xmlns:darwin="http://www.namespaceTBD.org/darwin"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.namespaceTBD.org/digir digir.xsd
                             http://www.namespaceTBD.org/darwin darwin.xsd">
  <header>
    <requestType>search</requestType>
  </header>
  <search>
    <dbName>myDiggableBipesDB</dbName>
    <filter>
      <and>
        <in>
          <list xsi:type="darwin:list">
            <darwin:Month>11</darwin:Month>
            <darwin:Month>12</darwin:Month>
          </list>
        </in>
        <equals>
          <darwin:Genus>Bipes</darwin:Genus>
        </equals>
      </and>
    </filter>
    <records start="0" count="50" />
  </search>
</request>

Request Explanation
- Composed of elements from the protocol namespace (default) and the schema namespace
- <header> contains information about the payload
- <search> contains dbName, filter, and record specification (will also specify result format)
- <filter> is effectively an XML representation of a SQL WHERE clause
- This search request is for the first 50 specimen records that are of genus Bipes and were found in the months of November or December.

Filter Building
- LOPs (logical operators): <and>, <or>, <andNot>, <orNot>; can be nested
- COPs (comparison operators): <equals>, <lessThan>, <lessThanOrEquals>, <notEquals>, <greaterThan>, <greaterThanOrEquals>, <like>, <in> (multi-value)
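
Since a filter is described as an XML form of a SQL WHERE clause, the following Python sketch walks a filter tree and emits a parameterized clause. It is only an illustration: the SQL spellings chosen for each operator, the reading of <andNot> and <orNot> as "AND NOT" and "OR NOT" between siblings, and the helper names (`to_where`, `filter_to_sql`) are assumptions, not part of the DiGIR protocol.

```python
import xml.etree.ElementTree as ET

# Operator names come from the slides; the SQL spellings are assumptions.
LOPS = {"and": "AND", "or": "OR", "andNot": "AND NOT", "orNot": "OR NOT"}
COPS = {
    "equals": "=", "notEquals": "<>", "lessThan": "<", "lessThanOrEquals": "<=",
    "greaterThan": ">", "greaterThanOrEquals": ">=", "like": "LIKE",
}

def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]

def to_where(node, params):
    """Return a SQL fragment for one filter node, appending values to params."""
    name = local(node.tag)
    if name in LOPS:
        # Logical operators nest: translate each child, then join them.
        parts = [to_where(child, params) for child in node]
        return "(" + f" {LOPS[name]} ".join(parts) + ")"
    if name in COPS:
        concept = node[0]                  # e.g. <darwin:Genus>Bipes</darwin:Genus>
        params.append(concept.text)
        return f"{local(concept.tag)} {COPS[name]} ?"
    if name == "in":
        values = list(node[0])             # children of the <list> element
        params.extend(v.text for v in values)
        marks = ", ".join("?" for _ in values)
        return f"{local(values[0].tag)} IN ({marks})"
    raise ValueError(f"unsupported filter element: {name}")

def filter_to_sql(xml_text):
    """Translate a <filter> document into (where_clause, parameters)."""
    root = ET.fromstring(xml_text)
    params = []
    return to_where(root[0], params), params

FILTER = """
<filter xmlns="http://www.namespaceTBD.org/digir"
        xmlns:darwin="http://www.namespaceTBD.org/darwin">
  <and>
    <in><list><darwin:Month>11</darwin:Month><darwin:Month>12</darwin:Month></list></in>
    <equals><darwin:Genus>Bipes</darwin:Genus></equals>
  </and>
</filter>
"""

sql, params = filter_to_sql(FILTER)
print(sql)      # (Month IN (?, ?) AND Genus = ?)
print(params)   # ['11', '12', 'Bipes']
```

Values are kept out of the SQL text and returned as parameters, so a provider could hand them to its database driver rather than concatenating strings.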

What “binds” the schemas?
- The protocol schema defines various abstract types and elements:
  <xsd:element name="searchCondition" abstract="true" />
  <xsd:element name="alphaSearchCondition" abstract="true" substitutionGroup="searchCondition" />
  <xsd:complexType name="listType" abstract="true" />
  <xsd:complexType name="numericListType" abstract="true" />
- A federation schema must define searchable concepts, or groups of them, as substitutable for these abstract elements or as extensions of the abstract types:
  <xsd:element name="Species" type="xsd:string" substitutionGroup="digir:alphaSearchCondition" />

<xsd:complexType name="list">
  <xsd:complexContent>
    <xsd:extension base="digir:listType">
      <xsd:sequence>
        <xsd:choice>
          <xsd:element ref="ScientificName" maxOccurs="unbounded" />
          <xsd:element ref="Kingdom" maxOccurs="unbounded" />
          <xsd:element ref="Phylum" maxOccurs="unbounded" />
          <xsd:element ref="Class" maxOccurs="unbounded" />
          <xsd:element ref="Order" maxOccurs="unbounded" />
          <xsd:element ref="Family" maxOccurs="unbounded" />
          <xsd:element ref="Genus" maxOccurs="unbounded" />
          <xsd:element ref="Species" maxOccurs="unbounded" />
          <…>
        </xsd:choice>
      </xsd:sequence>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>

Why “bind” like this?
- To provide data-typing (string, numeric, etc.) for various concepts within operators at an abstract level (e.g. LIKE is only valid for string data; IN allows multiple values, but in a controlled fashion)
- To allow federation schemas to simply classify data as types without having to redefine or extend operators
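
In the design above this typing is enforced by XML Schema validation through substitution groups. Purely to illustrate the rule being enforced, the Python sketch below performs the same kind of check procedurally. The concept table, the operator table, and the function name `check_condition` are hypothetical; only the restriction of LIKE to string data comes from the slides.

```python
# Hypothetical classification a federation schema might provide: each concept
# is declared as either an "alpha" (string) or a "numeric" search condition.
CONCEPT_KIND = {"Genus": "alpha", "Species": "alpha", "Month": "numeric"}

# Which kinds each operator accepts. Only the LIKE restriction is stated in
# the slides; allowing the other operators on both kinds is an assumption.
OPERATOR_KINDS = {
    "like": {"alpha"},
    "equals": {"alpha", "numeric"},
    "lessThan": {"alpha", "numeric"},
    "in": {"alpha", "numeric"},
}

def check_condition(operator, concept):
    """Return None if the pairing is allowed, otherwise an error message."""
    kind = CONCEPT_KIND.get(concept)
    if kind is None:
        return f"unknown concept: {concept}"
    if operator not in OPERATOR_KINDS:
        return f"unknown operator: {operator}"
    if kind not in OPERATOR_KINDS[operator]:
        return f"<{operator}> is not valid for {kind} concept {concept}"
    return None

print(check_condition("like", "Genus"))   # None
print(check_condition("like", "Month"))   # <like> is not valid for numeric concept Month
```

The federation schema only has to say which kind each concept is; the operator rules live in one place and do not need to be restated per concept.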

Request Issues
- Do we need another abstract element such as dateSearchCondition?
- What information will be useful in the header?
- How should we specify the format of the results?
- What standard formats should be offered (e.g. brief, full)?
- Will tblName be part of the meta-data required of providers?
- Which concepts of Darwin Core 2 are searchable?

Response Prototype
<response xmlns="http://www.namespaceTBD.org/digir"
          xmlns:darwin="http://www.namespaceTBD.org/darwin"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.namespaceTBD.org/digir digir.xsd
                              http://www.namespaceTBD.org/darwin darwin.xsd">
  <header>
    <!-- contents TBD -->
  </header>
  <content>
    <record>
    </record>
  </content>
  <diagnostics>
  </diagnostics>
</response>

Response Issues
- How do we format and validate the response content?
- What elements are needed for the <header>, if any?
- Do we always have diagnostics, or only if there is an error?
- Should a finite set of diagnostics be created and maintained in its own XML Schema?
- Will there ever be a diagnostic that is specific to a federation schema?

Provider Details
- Implemented as a web application that answers questions
- Interface is not specific to a particular information domain
- No state information is recorded
  - Each request is treated as unique and uninfluenced by previous requests
- Must always generate a valid response
- Consists of four key components
  - Request handler
  - Filter handler
  - Result set cache
  - Response generator
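
One way to read "stateless" and "must always generate a valid response" together is sketched below in Python: a single function takes the request text, keeps nothing between calls, and turns any failure into a diagnostic rather than a broken document. The function name `handle_request`, the `run_search` callback, and the <diagnostic> child element are assumptions made for illustration; the slides leave the header and diagnostics contents open.

```python
import xml.etree.ElementTree as ET

def handle_request(xml_text, run_search):
    """Process one request with no stored state; always return response XML.

    run_search is a caller-supplied (hypothetical) function that takes the
    parsed request element and returns a list of record dicts.
    """
    response = ET.Element("response")
    ET.SubElement(response, "header")
    content = ET.SubElement(response, "content")
    diagnostics = ET.SubElement(response, "diagnostics")
    try:
        request = ET.fromstring(xml_text)      # request handler: parse the document
        records = run_search(request)          # filter handler and result set cache
        for rec in records:                    # response generator: serialize records
            record = ET.SubElement(content, "record")
            for name, value in rec.items():
                ET.SubElement(record, name).text = str(value)
    except Exception as exc:
        # Any failure becomes a diagnostic instead of a malformed response.
        content.clear()
        ET.SubElement(diagnostics, "diagnostic").text = f"{type(exc).__name__}: {exc}"
    return ET.tostring(response, encoding="unicode")

# A well-formed request yields records; a truncated one yields a diagnostic.
print(handle_request("<request/>", lambda req: [{"Genus": "Bipes"}]))
print(handle_request("<request", lambda req: []))
```

Schema validation of the incoming document is omitted here because the standard library does not provide it; a real provider would validate before searching.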

Request Handler
- Receives XML document
- Validates document
- Generates internal structures for further processing

Filter Handler
- Internal structural representation of filter (query) structure
- Responsible for generating a native query string for querying the database
- Communicates with UDDI to obtain standard database definition
- Custom configured to work with specific database implementation
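
The "custom configured" part can be pictured as a data map from federation concept names to local column names. The Python sketch below assumes such a map and uses an in-memory SQLite database as the stand-in for a provider's database. The table name, column names, and the function `native_query` are hypothetical, the map is hard-coded rather than obtained from UDDI, and conditions are simply ANDed to keep the example short.

```python
import sqlite3

# Hypothetical data map: federation concept name -> local column name.
DATA_MAP = {"Genus": "gen_name", "Month": "coll_month"}
TABLE = "specimens"   # hypothetical local table
ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">=", "LIKE"}

def native_query(conditions, start, count):
    """Build a parameterized SQLite query from (concept, operator, value) triples."""
    clauses, params = [], []
    for concept, op, value in conditions:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        # An unmapped concept raises KeyError rather than reaching the database.
        clauses.append(f"{DATA_MAP[concept]} {op} ?")
        params.append(value)
    where = " AND ".join(clauses) or "1=1"
    sql = f"SELECT * FROM {TABLE} WHERE {where} LIMIT ? OFFSET ?"
    return sql, params + [count, start]

# Usage against a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimens (gen_name TEXT, coll_month INTEGER)")
conn.executemany("INSERT INTO specimens VALUES (?, ?)",
                 [("Bipes", 11), ("Bipes", 3), ("Anolis", 12)])
sql, params = native_query([("Genus", "=", "Bipes"), ("Month", ">=", 11)], 0, 50)
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)   # [('Bipes', 11)]
```

Only column names from the map and operators from a fixed set are interpolated into the SQL text; every value travels as a bound parameter.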

Result Set Cache
- Contains the results of applying a query
- Responsible for generating the response records in the requested format
- Somewhat directly integrated with the response generator
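
A minimal Python sketch of the paging role follows, assuming the cache lives only for the duration of one request (consistent with the provider keeping no state) and that `start` is a zero-based index, as the start="0" in the sample request suggests. The class and method names are illustrative.

```python
class ResultSetCache:
    """Holds the rows produced by one query and serves slices of them."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        return len(self._rows)

    def page(self, start, count):
        """Return up to `count` rows beginning at zero-based index `start`."""
        if start < 0 or count < 0:
            raise ValueError("start and count must be non-negative")
        return self._rows[start:start + count]

cache = ResultSetCache(range(120))
print(len(cache.page(0, 50)))     # 50
print(len(cache.page(100, 50)))   # 20, because only 120 rows exist
```

A request for a window that runs past the end returns the rows that exist instead of failing, which fits the requirement that a provider always produce a valid response.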

Response Generator
- Generates the response XML document
- Serializes the response header information
- Serializes diagnostic information
- Serializes the requested subset of records

Provider Configuration
[Diagram. Labels: Portal, Profile, Schema; two Data Provider Systems, each labeled Data, DiGIR Provider, Data Map, Schema]

Portal Details
- Divided into two distinct components: a presentation layer and PortalServices
- The presentation layer supports the UI and translates requests (HTTP requests from forms or links) into protocol-compliant XML requests
- The presentation layer also handles all display issues involving the responses, such as format, sorting, collating, etc.
- The presentation layer is envisioned to be an application server/web server implementation
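
As a rough illustration of the translation step, the Python sketch below turns a form-style query string into a search request shaped like the sample request shown earlier. The mapping rules are assumptions: `start` and `count` are treated as paging fields, every other field becomes an <equals> test on a Darwin concept, and all tests are joined with <and>. The function name `form_to_request` is hypothetical, and the namespaces are the placeholder URIs from the slides.

```python
from urllib.parse import parse_qs
import xml.etree.ElementTree as ET

DIGIR = "http://www.namespaceTBD.org/digir"     # placeholder namespaces from the slides
DARWIN = "http://www.namespaceTBD.org/darwin"
ET.register_namespace("", DIGIR)
ET.register_namespace("darwin", DARWIN)

def form_to_request(query_string, db_name):
    """Turn e.g. 'Genus=Bipes&count=50' into a search request document."""
    fields = {k: v[0] for k, v in parse_qs(query_string).items()}
    start = fields.pop("start", "0")
    count = fields.pop("count", "50")

    request = ET.Element(f"{{{DIGIR}}}request")
    header = ET.SubElement(request, f"{{{DIGIR}}}header")
    ET.SubElement(header, f"{{{DIGIR}}}requestType").text = "search"
    search = ET.SubElement(request, f"{{{DIGIR}}}search")
    ET.SubElement(search, f"{{{DIGIR}}}dbName").text = db_name
    flt = ET.SubElement(search, f"{{{DIGIR}}}filter")
    # Every remaining form field becomes an <equals> test, all joined by <and>.
    conj = ET.SubElement(flt, f"{{{DIGIR}}}and")
    for concept, value in fields.items():
        equals = ET.SubElement(conj, f"{{{DIGIR}}}equals")
        ET.SubElement(equals, f"{{{DARWIN}}}{concept}").text = value
    ET.SubElement(search, f"{{{DIGIR}}}records", start=start, count=count)
    return ET.tostring(request, encoding="unicode")

print(form_to_request("Genus=Bipes&count=10", "myDiggableBipesDB"))
```

A real presentation layer would also need to reject field names that are not concepts in the federation schema; that check is left out here.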

Portal Details (continued)
- PortalServices handles all external network activity (UDDI calls, provider calls, etc.)
- PortalServices limits provider calls to those necessary based on provider meta-data
- PortalServices threads provider calls for increased performance (i.e. response time)
- PortalServices is envisioned to be a webapp and supporting classes running within an application server, such as Tomcat
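
The two behaviours described here, limiting calls by meta-data and threading the calls that remain, might look like the Python sketch below. The slides envision a Java webapp, so this is only an illustration of the idea. The provider record fields ("url", "schemas"), the function names, and the `send` callback are assumptions; what meta-data providers actually publish is still listed as an open issue.

```python
from concurrent.futures import ThreadPoolExecutor

def select_providers(providers, wanted_schema):
    """Keep only providers whose meta-data advertises the wanted schema."""
    return [p for p in providers if wanted_schema in p["schemas"]]

def query_all(providers, request_xml, send, max_workers=8):
    """Send the same request to each provider concurrently.

    send(url, request_xml) is supplied by the caller (for example an HTTP
    POST). Results are returned as a dict keyed by provider URL; a failed
    call stores the exception so the other providers are unaffected.
    """
    results = {}
    if not providers:
        return results
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {p["url"]: pool.submit(send, p["url"], request_xml)
                   for p in providers}
        for url, future in futures.items():
            try:
                results[url] = future.result(timeout=30)
            except Exception as exc:
                results[url] = exc
    return results

providers = [
    {"url": "http://a.example/digir", "schemas": ["darwin"]},
    {"url": "http://b.example/digir", "schemas": ["other"]},
]
chosen = select_providers(providers, "darwin")
print(query_all(chosen, "<request/>", lambda url, xml: f"ok:{url}"))
```

With threaded calls the total wait is roughly that of the slowest provider rather than the sum of all of them, which is the response-time benefit the slide refers to.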

PortalServices
- RegistryAccess
- ProviderCache
- PortalConfig
- PortalServlet
- PortalRequestHandler
- ProviderFilterer
- Marshallers

Portal Issues
- What information will be stored in UDDI about a provider?
- What information will be known for communicating with a Provider (e.g. IP address, port, etc.)?
- What meta-data will be provided and what are the rules for using such data for provider filtering?
- What requirements are there for logging and monitoring?