cobase: scalable and extensible cooperative information system
1
CoBase: Scalable and Extensible Cooperative Information System
Wesley W. Chu
Computer Science Department
University of California, Los Angeles
http://www.cobase.cs.ucla.edu
2
Conventional Query Answering
Need to know the detailed database schema
Cannot get approximate answers
Cannot answer conceptual queries
Cooperative Query Answering
Derive approximate answers
Answer conceptual queries
3
Find a seaport with railway facility in Los Angeles
CoBase Servers
Heterogeneous Information Sources
CoBase provides: Relaxation, Approximation, Association, Explanation
Find a nearby friendly airport that can land F-15
Domain Knowledge
Find hospitals with facility similar to St. John’s near LAX
Cooperative Queries
4
Generalization and Specialization
[Figure: generalization moves from a Specific Query to a Conceptual Query and on to a More Conceptual Query; specialization moves in the opposite direction.]
5
Type Abstraction Hierarchy (TAH)
Chemical-Suit Size TAH (a non-numerical TAH)
All_Sizes
  Small_Size
    Very_Small
    Small_to_Medium
  Large_Size
    Large_to_Extra_Large
    Very_Large
Leaf values: XXXS, XXS, S, M, L, XL, XXL
Provide multi-level knowledge representations
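A minimal sketch of how a TAH like the chemical-suit size hierarchy could be stored and used for generalization and specialization. The child-to-parent map, the function names, and the assignment of individual sizes to the lowest-level concepts are assumptions made for illustration; the slide does not show which leaf belongs to which concept.

```python
# Hypothetical child -> parent encoding of the chemical-suit size TAH.
# The leaf-to-concept assignment below is assumed, not taken from the slide.
PARENT = {
    "Small_Size": "All_Sizes",
    "Large_Size": "All_Sizes",
    "Very_Small": "Small_Size",
    "Small_to_Medium": "Small_Size",
    "Large_to_Extra_Large": "Large_Size",
    "Very_Large": "Large_Size",
    "XXXS": "Very_Small",
    "XXS": "Very_Small",
    "S": "Small_to_Medium",
    "M": "Small_to_Medium",
    "L": "Large_to_Extra_Large",
    "XL": "Large_to_Extra_Large",
    "XXL": "Very_Large",
}


def generalize(node):
    """Move one level up the hierarchy; returns None at the root."""
    return PARENT.get(node)


def specialize(node):
    """Return every leaf value covered by a concept node."""
    children = [child for child, parent in PARENT.items() if parent == node]
    if not children:
        return [node]  # the node is itself a leaf
    leaves = []
    for child in children:
        leaves.extend(specialize(child))
    return leaves


def relax(value, levels=1):
    """Generalize `levels` steps, then specialize back down to leaf values."""
    node = value
    for _ in range(levels):
        parent = generalize(node)
        if parent is None:
            break
        node = parent
    return specialize(node)
```

Relaxing the value M by one level returns the sizes under the same concept; relaxing by more levels widens the set until every size is included.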
6
Type Abstraction Hierarchy (TAH)
(Location Example)
CA
N. CA, C. CA, S. CA
San Jose, Palo Alto, Sacramento, Davis, San Diego, Long Beach, LA, SF
7
Relaxation Agent
Use a knowledge-based approach (generalization and specialization via Type Abstraction Hierarchy) to relax the following for matching:
query conditions
constraints
8
Query Relaxation
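[Flowchart: the Query is run against the Database. If there are Answers (Yes), Display them. If not (No), Relax Attribute using the TAHs, apply Query Modification, and run the modified query again.]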
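The relaxation loop on this slide can be sketched as follows. This is an illustration under stated assumptions, not CoBase code: `relax_query`, `run_query`, `relax_step`, and the `RANGE_TAH` entries are hypothetical names and values, and the airport data is borrowed from the associative query answering slide.

```python
# Hypothetical data: (airport name, runway length in feet).
AIRPORTS = [("Jerba", 10171), ("Monastir", 9700), ("Tunis", 10500)]

# Hypothetical numerical TAH: each range maps to the wider range above it.
RANGE_TAH = {
    (8000, 9000): (8000, 10000),
    (8000, 10000): (5000, 12000),
}


def run_query(length_range):
    """Return the airports whose runway length falls inside the range."""
    low, high = length_range
    return [name for name, length in AIRPORTS if low <= length <= high]


def relax_step(length_range):
    """Generalize the condition one TAH level; None when no wider range exists."""
    return RANGE_TAH.get(length_range)


def relax_query(run, condition, relax, max_rounds=5):
    """Run the query; while it has no answers, relax the condition and retry."""
    rounds = 0
    while True:
        answers = run(condition)
        if answers or rounds >= max_rounds:
            return answers, condition, rounds
        relaxed = relax(condition)
        if relaxed is None:
            return [], condition, rounds
        condition = relaxed
        rounds += 1


answers, used_condition, rounds = relax_query(run_query, (8000, 9000), relax_step)
print(answers, used_condition, rounds)
```

The exact query finds nothing, so the condition is widened once and the modified query is answered from the wider range.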
9
10
Visualization of Relaxation Process
Query: Find seaports in the given region.
given region
relaxed region
11
12
Relaxation Control Primitives
not-relaxable (e.g., runway-length)
relaxation-order (e.g., runway length, location)
preference-list
unacceptable-list
answer-size
relaxation-level
13
Relaxation Primitives
^ (approximate): ^ 9 am
between
near-to (context-sensitive): Airport near-to LAX; Restaurant near-to UCLA
similar-to: Airport similar-to LAX based-on (traffic, runway)
within
14
Similar-to
Find all airports in Tunisia similar to the Bizerte airport based on runway length and (more importantly) runway width.
select aport_name, runway_length, runway_width
from runways, countries
where aport_name similar-to 'Bizerte'
  based-on ((runway_length 1.0) (runway_width 2.0))
  and country_state_name = 'Tunisia'
  and countries.glc_cd = runways.glc_cd
15
Similar-to Result
APORT_NM   LENGTH  WIDTH  RANK
Bizerte    8000    148    0.00
El Borma   7200    144    0.09
Monastir   9700    137    0.20
Jerba      10171   148    0.24
Djedeida   6000    122    0.27
The similar-to module ranks the returned answers according to mean-squared error.
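A rough sketch of ranking by weighted mean-squared error, using the runway data and weights from the query above. The normalization (dividing each difference by the attribute's range in the candidate set) and the function name `similar_to` are assumptions, so the scores and the order of close candidates will not necessarily match the RANK column on the slide.

```python
# (airport, runway_length, runway_width) rows taken from the slide.
RUNWAYS = [
    ("Bizerte", 8000, 148),
    ("El Borma", 7200, 144),
    ("Monastir", 9700, 137),
    ("Jerba", 10171, 148),
    ("Djedeida", 6000, 122),
]


def similar_to(rows, target_name, weights):
    """Rank rows by weighted mean-squared error against the target row.

    Assumption: each attribute difference is normalized by the attribute's
    range over the candidate rows before squaring.
    """
    target = next(row for row in rows if row[0] == target_name)
    spans = []
    for i in range(1, len(weights) + 1):
        column = [row[i] for row in rows]
        spans.append((max(column) - min(column)) or 1)
    total_weight = sum(weights)
    ranked = []
    for row in rows:
        error = sum(
            w * ((row[i + 1] - target[i + 1]) / spans[i]) ** 2
            for i, w in enumerate(weights)
        ) / total_weight
        ranked.append((row[0], error))
    return sorted(ranked, key=lambda pair: pair[1])


# Weights follow the based-on clause: runway_length 1.0, runway_width 2.0.
ranking = similar_to(RUNWAYS, "Bizerte", [1.0, 2.0])
for name, error in ranking:
    print(f"{name:10s} {error:.3f}")
```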
16
Unacceptable List Operator
Constraint: Avoid Northern Tunisia!
Type Abstraction Hierarchy: Tunisia → NE Tunisia, NW Tunisia, Central Tunisia, SW Tunisia; locations include Bizerte, Gafsa, El Borma, ...
Trimmed TAH (produced by the CoBase Relaxation Manager): Tunisia → Central Tunisia, SW Tunisia; locations include Gafsa, El Borma
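The trimming step can be illustrated with a short sketch. The dictionary layout and the `trim` function are hypothetical, and placing Gafsa under Central Tunisia is an assumption; the slide only shows that Gafsa and El Borma survive the trimming.

```python
# Hypothetical parent -> children encoding of the location TAH.
# Leaf placement is assumed where the slide does not show it.
TAH = {
    "Tunisia": ["NE Tunisia", "NW Tunisia", "Central Tunisia", "SW Tunisia"],
    "NE Tunisia": ["Bizerte"],
    "NW Tunisia": [],
    "Central Tunisia": ["Gafsa"],
    "SW Tunisia": ["El Borma"],
}


def trim(tah, root, unacceptable):
    """Copy the TAH below `root`, dropping every unacceptable subtree."""
    trimmed = {}

    def visit(node):
        kept = [child for child in tah.get(node, []) if child not in unacceptable]
        trimmed[node] = kept
        for child in kept:
            if child in tah:  # only internal nodes have entries of their own
                visit(child)

    visit(root)
    return trimmed


# "Avoid Northern Tunisia" expressed as an unacceptable list.
trimmed_tah = trim(TAH, "Tunisia", {"NE Tunisia", "NW Tunisia"})
print(trimmed_tah)
```

Relaxation that later walks the trimmed hierarchy can no longer generalize into the excluded regions.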
17
TAH Generation for Numerical Attribute Values
Relaxation Error: the difference between the exact value and the returned approximate value. The expected error is weighted by the probability of occurrence of each value.
DISC (Distribution Sensitive Clustering) is based on the attribute values and frequency distribution of the data
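One way to read the relaxation error definition is as the expected absolute difference between a requested value and a returned value, with both drawn according to their frequency in a cluster. The sketch below computes that quantity and picks the single cut that minimizes the weighted error of the two resulting clusters. It illustrates the idea only; it is not the DISC algorithm itself, and both function names are hypothetical.

```python
from collections import Counter


def relaxation_error(values):
    """Expected |x - y| when x and y are drawn by frequency from the cluster."""
    if not values:
        return 0.0
    n = len(values)
    probs = {v: count / n for v, count in Counter(values).items()}
    return sum(probs[x] * probs[y] * abs(x - y) for x in probs for y in probs)


def best_binary_cut(values):
    """Try every cut between distinct values; return (cut, weighted error).

    Values below the cut go to the left cluster, the rest to the right one.
    Each cluster's error is weighted by the share of values it holds.
    """
    distinct = sorted(set(values))
    n = len(values)
    best = None
    for cut in distinct[1:]:
        left = [v for v in values if v < cut]
        right = [v for v in values if v >= cut]
        error = (len(left) / n) * relaxation_error(left) \
            + (len(right) / n) * relaxation_error(right)
        if best is None or error < best[1]:
            best = (cut, error)
    return best


print(relaxation_error([0, 10]))
print(best_binary_cut([1, 2, 3, 100, 101]))
```

Applying the cut recursively to each cluster would produce a multi-level hierarchy over the numerical values.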
18
TAH Generation for Non-numerical Attribute Values
Pattern Based Knowledge Induction (PBKI)
Rule-based approach
Clusters attribute values into a TAH based on other attributes in the relation (i.e., inter-attribute relationships)
Provides an attribute correlation value (measures how well the rules apply to the database)
19
Type Abstraction Hierarchy (TAH)
Runway Length TAH: All → Short (0 ... 700), Medium (700 ... 1K), Long (1K ... 5K)
Location Name TAH: Tunisia → NE Tunisia (Bizerte, Tunis, Djedeida), Central Tunisia, SW Tunisia (El Borma), ...
Provide multi-level knowledge representations
20
Associative Query Answering
Provide relevant information not explicitly asked for by the user
User Query: List all airports with runway length between 8500 and approximately 10000 feet

Query Answers
Airport Name   Runway Length (feet)
Jerba          10171
Monastir       9700
Tunis          10500

Associated Attributes and Answers (User Type = Pilot)
Weather   Runway Quality
Sunny     Good
Rain      Good
Foggy     Damaged

Associated Attributes and Answers (User Type = Planner)
Military or Civilian Flag   Refrigerated Storage Capacity (Tons)
C
C                           0.00
C                           1000.00
21
CoBase and GLAD Integration
22
CoBase Functionality
Provide approximate matching: Find HETs with capacity of approximately 5 tons
Provide conceptual query answering: Find “Earth Moving” Equipment
Provide content-sensitive spatial queries: Find storage sites near selected location (Integration with MATT map server)
Provide relaxation control: Relaxation order; Not-relaxable; At-least (answer set, quantity on hand)
23
Cooperative Operations Added to GLAD
Implicit Query Relaxation
Explicit Query Relaxation: Approximate operator; Similar-to/based-on; Spatial relaxation
Relaxation Control: Relaxation-order; Not-relaxable; At-least (answer-set size, quantity on hand)
24
CoBase Features Added to GLAD
Enhance GLAD queries with cooperative operators (similar-to, relaxation-order, etc.)
Display the query relaxation process: modified query conditions (value, spatial); type abstraction hierarchies
Rank returned answers with similarity measures, e.g., spatial relaxation ranks answers according to their distance from the selected location
25
CoBase and GLAD TIE
[Architecture diagram. GLAD components: Report Collection, Report, Query Constructor, Filter, Editor, Object Cache, Display Generator, Query Collection. CoBase components: CoBase Query Editor, CoBase Relaxation Manager, Knowledge Base, Data Cache, CoBase Data Source Manager, Databases. Items passed between the two: NSNs, Spatial Area Selection.]
26
GLAD Query
Find NSNs of aircraft with passenger capacity > 10, combat type = 'I', capacity weight <= 2 tons and price < 700,000.
select nsn, price, pax_capacity_qty, capacity_wt_ston
from nsn_description
where (upper(class) = '7'
  and upper(cbs_category_nomen) = 'AIRCRAFT'
  and price < 700000
  and pax_capacity_qty > 10
  and upper(combat_type) = 'I'
  and capacity_wt_ston <= 2)
27
CoGLAD Query with Relaxation Control Operators
Find NSNs of aircraft with passenger capacity > 10, combat type = 'I', capacity weight <= 2 tons and price < 700,000. Attribute passenger capacity is not relaxable. Relax price first and then capacity weight.
select nsn, price, pax_capacity_qty, capacity_wt_ston
from nsn_description
where (upper(class) = '7'
  and upper(cbs_category_nomen) = 'AIRCRAFT'
  and price < 700000
  and pax_capacity_qty > 10
  and upper(combat_type) = 'I'
  and capacity_wt_ston <= 2)
not-relaxable pax_capacity_qty
relaxation-order price capacity_wt_ston
28
CoGLAD Query with Similar-to Operator
Find aircraft similar to NSN = '0000IB0000961' based on the attributes price, passenger capacity and air mileage. Passenger capacity has a weight of 8; price and air mileage each have a weight of 1.
select nsn
from nsn_description
where upper(nsn) similar-to '0000IB0000961'
  based-on ((price 1.0) (pax_capacity_qty 8.0) (air_mileage 1.0))
at-least 4
* '0000IB0000961' is an answer from the previous query
29
CoGLAD Query with Approximate Operator
Find DLA stock reports with NSN like '%8340%' (FSC for tents and tarpaulins) and on-hand quantity of approximately 150.
select nsn, ric
from dla_stock_report
where nsn like '%8340%'
  and on_hand_quantity = ~150
30
Adding Constraints to a Query
GLAD query
select nsn, ric
from dla_stock_report
where nsn like '%8340%'
  and nomenclature like '%TARP%'
Query with added constraints
select nsn, ric
from dla_stock_report
where nsn like '%8340%'
  and nomenclature like '%TARP%'
  and on_hand_quantity = ~150
  and size_in_square_feet = 350
31
Example of Spatial Relaxation
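[Flowchart: the user supplies NSNs, a selected area on the map, and a constraint on quantity on hand. Query Processing runs the query. If the answers satisfy the constraints (Yes), return the answers. If not (No), the CoBase Relaxation Manager relaxes the selected area based on the context-sensitive TAHs and the query is processed again.]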
32
Spatial Relaxation with Relaxation Control
relaxation-order: size, (latitude, longitude)
not-relaxable: price
at-least:
  value: size of the tarpaulin
  quantity on hand: relax until enough quantity on hand (specified by the user) is obtained
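The at-least control on quantity on hand can be sketched as a loop that keeps widening the selected area until the matching sites hold enough stock. This is a simplification: the sketch grows a circle by a fixed factor, whereas the slides describe relaxing the area through context-sensitive TAHs. The site data, `relax_area`, and its parameters are all hypothetical.

```python
import math

# Hypothetical storage sites: (name, x, y, quantity on hand).
SITES = [
    ("A", 0.0, 1.0, 50),
    ("B", 0.0, 3.0, 60),
    ("C", 0.0, 10.0, 500),
]


def relax_area(sites, center, radius, needed_qty, step=1.5, max_radius=1000.0):
    """Widen the search radius until the sites inside hold `needed_qty`.

    Returns the matching sites ranked by distance from the selected
    location, together with the radius that was finally used.
    """
    while True:
        hits = [s for s in sites if math.dist(center, (s[1], s[2])) <= radius]
        enough = sum(s[3] for s in hits) >= needed_qty
        if enough or radius >= max_radius:
            hits.sort(key=lambda s: math.dist(center, (s[1], s[2])))
            return hits, radius
        radius = min(radius * step, max_radius)


hits, final_radius = relax_area(SITES, (0.0, 0.0), 2.0, needed_qty=100)
print([s[0] for s in hits], final_radius)
```

The first pass finds only site A, which does not hold enough stock, so the area is widened once and site B is added.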
33
Scalable and Extensible CoBase Architecture
34
Mediator Inter-Communications via KQML
[Diagram: Mediator A and Mediator B each wrap a module (Module A, Module B; module objects and APIs) and communicate with each other via KQML. Each mediator uses the CoBase Ontology and the CoBase Content Language (data, actions).]
35
36
Query Answers Without CoBase
Query: find chemical suits
37
38
39
40
41
42
43
Electronic Warfare
Identify and locate sources of radiated electromagnetic energy
Determine emitter type based on the operating parameters of observed signals:
  Radio Frequency (RF)
  Pulse Repetition Frequency (PRF)
  Pulse Duration (PD)
  Scan Period (SP)
  other operating parameters
Determine platform sites near the line of bearing of an emitter
This research is a joint effort between CoBase and Lockheed Martin Communication Systems (Russ Frew, et al.), Camden, NJ
44
Performance Improvement by Using CoBase in EW
              Conventional DB        CoBase
              Case 1    Case 2       Case 1    Case 2
identified    90.00%    30.00%       100.00%   85.90%
id/ranking    100.00%   36.00%       100.00%   98.80%
relaxation    0.00%     0.00%        95.90%    99.80%

Conventional DB: parameter ranges from emitter specifications
CoBase:
  DB: peak parameters (RF, PRF) and parameter ranges (PD, SP)
  KB: TAHs based on RF and PRF peak parameters; TAHs based on PD and SP parameter ranges
Case 1: emitter signals without noise
Case 2: add noise - PD & SP (10%), PRF (5%), RF (2.5%)
Sample Size: 1000 signals
Emitter Types: 75
45
Current CoBase Users and Applications
ARPI members: ISI, Unisys
  Enhance Query Capabilities in Transportation Domain (ARPI TARGET): query relaxation, association, and explanation
UCLA KMeD Project, Medical School
  Improve Search in Medical Images (X-rays, MRs): approximate matching of image features and contents; explanation of approximate matching quality
Hughes Research Lab
  Integrate Schema in Heterogeneous Databases: approximate matching of attributes and views
Lockheed/Martin Marietta
  Emitter and Platform Identification: approximate matching of observed emitter signals; relaxation of regions to identify emitter platforms
BBN
  Enhance DOD Logistic Anchor Desk (GLAD): query relaxation and spatial relaxation
46
47
XML Query Relaxation
51
XML Overview
XML (eXtensible Markup Language) is a format for specifying structured documents and data. XML is extensible since it allows users to define their own schema (unlike HTML which is a pre-defined markup language).
52
XML (cont.)
XML is a hierarchical data model.
An XML document consists of two parts:
1. Schema
2. Data
The schema describes the structure of the data.
Example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Edited with XML Spy v4.2 -->
<!DOCTYPE note [
  <!ELEMENT note (to, from, heading, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
In this example, the DOCTYPE declaration is the schema and the note element is the data.
53
XML Query Languages
XML can be represented as an ordered tree with:
  Nodes representing elements and attributes
  Edges representing inclusion relationships
An XML query can similarly be represented as a tree with edges of two types:
  “/” for parent-child relationships
  “//” for ancestor-descendant relationships
54
XML Query Language: Example
The following XML
<a>
  <d>
    <b/>
  </d>
  <c/>
</a>
yields the following tree:
1,a
2,d 4,c
3,b
A possible query is:
$1=a
$2=b $3=c
55
Query Relaxation
XML Query Relaxation can be categorized into two main types:
1. Value Relaxation: values are relaxed to widen the range of values that a condition accepts
2. Structure Relaxation: the structure of the query tree is relaxed to allow for more answers
57
Structure Relaxation
In structure relaxation, nodes and/or edges of the query tree can be relaxed to allow for more answers.
There are three types of structural relaxation:
1. Edge Relaxation
2. Node Relaxation
3. Order Relaxation
58
Edge Relaxation
A parent-child edge can be relaxed to an ancestor-descendant edge.
For example:
1,a
  2,b
    3,d
  4,b
    5,d
      6,c
  7,b
    8,c
  9,d
    10,b
      11,c
  12,d
    13,b
      14,d
        15,c
Original query “a/b/c” matches: 1,7,8
Relaxed queries:
“a//b/c” matches: 1,7,8 & 1,10,11
“a/b//c” matches: 1,7,8 & 1,4,6
“a//b//c” matches: 1,7,8; 1,10,11; 1,4,6; 1,13,15
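The matches listed above can be checked with a small path matcher. The tree encoding below is the shape implied by the node numbering and the listed answers (an assumption, since the slide's tree drawing is flattened in this transcript), and `match` is a hypothetical helper that treats the path as anchored at the root rather than as part of any CoXML API.

```python
import re

# node id -> (tag, parent id); tree shape inferred from the slide's answers.
NODES = {
    1: ("a", None),
    2: ("b", 1), 3: ("d", 2),
    4: ("b", 1), 5: ("d", 4), 6: ("c", 5),
    7: ("b", 1), 8: ("c", 7),
    9: ("d", 1), 10: ("b", 9), 11: ("c", 10),
    12: ("d", 1), 13: ("b", 12), 14: ("d", 13), 15: ("c", 14),
}


def is_ancestor(ancestor, node):
    """True if `ancestor` lies on the path from `node` up to the root."""
    parent = NODES[node][1]
    while parent is not None:
        if parent == ancestor:
            return True
        parent = NODES[parent][1]
    return False


def match(path):
    """Return all node-id tuples matching a path such as 'a/b//c'."""
    tokens = re.split(r"(//|/)", path)  # e.g. ['a', '/', 'b', '//', 'c']
    tags, axes = tokens[0::2], tokens[1::2]
    results = [(n,) for n, (tag, parent) in sorted(NODES.items())
               if tag == tags[0] and parent is None]
    for axis, tag in zip(axes, tags[1:]):
        extended = []
        for partial in results:
            last = partial[-1]
            for n, (node_tag, parent) in sorted(NODES.items()):
                if node_tag != tag:
                    continue
                if axis == "/" and parent == last:
                    extended.append(partial + (n,))
                elif axis == "//" and is_ancestor(last, n):
                    extended.append(partial + (n,))
        results = extended
    return results


for query in ["a/b/c", "a//b/c", "a/b//c", "a//b//c"]:
    print(query, match(query))
```

Each relaxed query returns a superset of the original query's answers, which is the property edge relaxation is meant to preserve.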
59
Node Relaxation
Nodes can be relaxed in several ways:
A node can be relabeled with a similar tag name based on the domain knowledge. For example: article/sec → article/section
A node can be replaced with a “don’t care” such that it will match any non-null answer. For example: a/b/c → a/_/c
A node can be removed while ensuring the “superset” property. For example: a/b/c → a/b
60
Order Relaxation
The order in an XML query can be relaxed to allow any ordering of search conditions.
For example:
Original query: $1=a with $2=b < $3=c (b must precede c)
Relaxed query: $1=a with $2=b and $3=c (any order)
Two documents:
D1:
<a>
  <d>
    <b/>
  </d>
  <c/>
</a>
D2:
<a>
  <c/>
  <d>
    <b/>
  </d>
</a>
Original query matches D1 only
Relaxed query matches D1 and D2
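A short check of the order relaxation example using Python's standard `xml.etree.ElementTree`. The documents are reconstructed from the slide, the `matches` helper is hypothetical, and the query edges from a to b and c are assumed to be ancestor-descendant edges, since b sits below d in both documents.

```python
import xml.etree.ElementTree as ET

# The two documents from the slide, written on one line each.
D1 = "<a><d><b/></d><c/></a>"
D2 = "<a><c/><d><b/></d></a>"


def matches(doc, ordered):
    """Match the query: root a containing b and c as descendants.

    With ordered=True, some b must come before some c in document order.
    With ordered=False (the relaxed query), any order is accepted.
    """
    root = ET.fromstring(doc)
    if root.tag != "a":
        return False
    # Element.iter() walks the tree in document order, root first.
    tags = [element.tag for element in root.iter()]
    b_positions = [i for i, tag in enumerate(tags) if tag == "b"]
    c_positions = [i for i, tag in enumerate(tags) if tag == "c"]
    if not b_positions or not c_positions:
        return False
    if not ordered:
        return True
    return any(b < c for b in b_positions for c in c_positions)


print("original:", matches(D1, True), matches(D2, True))
print("relaxed: ", matches(D1, False), matches(D2, False))
```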
66
Conclusions
Provide user- and context-sensitive query relaxations (structured, semi-structured and unstructured data)
Provide additional information (associative query answering) based on past cases
CoSQL (Cooperative SQL): similar-to, near-to, approximate; relaxation control operators
CoXML (Cooperative XML): value relaxation; structure relaxation (edge, node, order)
67
References
[1] W. W. Chu, H. Yang, K. Chiang, M. Minock, G. Chow, and C. Larson, "CoBase: A Scalable and Extensible Cooperative Information System", Journal of Intelligent Information Systems, 6, 1996
[2] Shaorong Liu and Wesley W. Chu, "Cooperative XML (CoXML) Query Answering at INEX 2003", INEX Workshop 2003
[3] Dongwon Lee, "Query Relaxation for XML Model", Ph.D. Dissertation, University of California, Los Angeles, June 2002
[4] Dongwon Lee, Murali Mani, and Wesley W. Chu, "Effective Schema Conversions between XML and Relational Models", European Conf. on Artificial Intelligence (ECAI), Knowledge Transformation Workshop (ECAI-OT), Lyon, France, July 2002 (Invited)
http://www.cobase.cs.ucla.edu