7/29/2019 Data Integration Approaches 2
Approaches to the Integration of
Distributed and Heterogeneous
Data Resources
Ahmet Sayar
Indiana University
Computer Science Department
-
Motivation
Integrating data from multiple data sources
Distributed querying of, and transactions on, data.
Definition and adoption of data, metadata and their storage models.
Accessing the data seamlessly.
Transparency, support for heterogeneity, extensibility and scalability.
-
Outline
Data Integration Approaches
  Application Specific Solutions
  Application-Integration Framework
    ASIS (Application Specific Information System)
  Database Federation
    Ogsa-DAI (Ogsa-Data Access and Integration)
    Compare ASIS with Ogsa-DAI
  Digital Libraries
    SRB (Storage Resource Broker)
    Sompel's Digital Library Approach
    Compare ASIS with SRB and Sompel's DL
-
Application-Integration Framework
Can also be called a component-based framework
  Such as CORBA, or Filters with common interfaces
Does not necessarily address data integration issues
Based on a common data model (such as CML and GML)
  With adaptors, if a source changes the adaptor may have to change, but the application may never see it.
  Adding a new source is easy: at most a new adaptor needs to be written.
  The adaptor may already exist online. No detailed system knowledge is needed.
Ex. ASIS - OGC GIS Application Integration Framework
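The adaptor idea on this slide can be sketched as follows. This is a minimal illustration and not ASIS code: the class names, the CSV source and the dictionary-based "common model" are all assumptions made for the example.

```python
import csv
import io
from abc import ABC, abstractmethod


class SourceAdaptor(ABC):
    """Hypothetical adaptor: turns one source format into the common model."""

    @abstractmethod
    def to_common(self, raw: str) -> list[dict]:
        ...


class CsvFaultAdaptor(SourceAdaptor):
    """Assumed source: CSV text with name, lat and lon columns."""

    def to_common(self, raw: str) -> list[dict]:
        rows = csv.DictReader(io.StringIO(raw))
        return [
            {"type": "Fault", "name": r["name"],
             "position": (float(r["lat"]), float(r["lon"]))}
            for r in rows
        ]


def application(features: list[dict]) -> list[str]:
    # The application only sees the common model, never the source format.
    return [f"{f['type']}:{f['name']}" for f in features]


raw_csv = "name,lat,lon\nSan Andreas,35.1,-119.6\n"
labels = application(CsvFaultAdaptor().to_common(raw_csv))
print(labels)
```

If the CSV layout changes, only CsvFaultAdaptor has to change; application() stays as it is.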
-
ASIS (1)
Enables inter-service communication through well-defined service interfaces, message formats and capabilities metadata.
Data model is ASL (Application Specific Language)
Metadata model is the capability document
  Data and metadata have a common predefined schema
Components are Filter Services
  Web Services with common service interfaces defined in WSDL
  Information/data services enabling distributed access, querying and transformation through their predictable input/output interfaces.
  Chainable, locatable, and capable of updating their metadata manually or dynamically
-
ASIS (2)
Data and data storage model
  Any data can be integrated into the system after being transformed to ASL.
  Heterogeneity is handled at the end-Filters with adaptors.
ASL is a community-accepted application specific language
  GML (Geography Markup Language) in GIS applications
  CML (Chemical Markup Language) in Chemistry applications
Filters' common service interfaces
  getCapabilities, getData, getFeatureInfo
Requests to the Filters' interfaces
  getCapabilitiesReq, getDataReq, getFeatureInfoReq
Expected return types are defined in the Filter's capability metadata
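One way to picture the three common interfaces is as an abstract class that every Filter implements. The sketch below is hypothetical: the Python method names, the FaultFilter class and the GML-like strings are assumptions for illustration, whereas the real interfaces are Web Service operations described in WSDL.

```python
from abc import ABC, abstractmethod


class Filter(ABC):
    """Hypothetical rendering of the three common Filter interfaces."""

    @abstractmethod
    def get_capabilities(self) -> dict: ...

    @abstractmethod
    def get_data(self, layer: str) -> str: ...

    @abstractmethod
    def get_feature_info(self, layer: str, feature_id: str) -> dict: ...


class FaultFilter(Filter):
    """Assumed end-Filter serving one layer as a GML-like string."""

    _features = {"f1": {"name": "San Andreas"}}

    def get_capabilities(self) -> dict:
        # Capability metadata advertises layers and expected return types.
        return {"layers": ["Fault"], "returns": {"get_data": "text/xml"}}

    def get_data(self, layer: str) -> str:
        if layer not in self.get_capabilities()["layers"]:
            raise KeyError(f"layer not offered: {layer}")
        items = "".join(f"<feature id='{k}'/>" for k in self._features)
        return f"<layer name='{layer}'>{items}</layer>"

    def get_feature_info(self, layer: str, feature_id: str) -> dict:
        return dict(self._features[feature_id])


f = FaultFilter()
caps = f.get_capabilities()
data = f.get_data("Fault")
info = f.get_feature_info("Fault", "f1")
print(caps, data, info)
```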
-
ASIS (3)
Metadata and metadata storage model:
  Data integration is done through the Filters' capability metadata
  Metadata is stored in each Filter's local file system as a flat file.
Capability:
  Inspired by the OGC WMS capability specification.
  Resembles the Dublin Core format.
  A capability-like structure is also used in Gannon's approach (XPOLA) for Grid services security issues.
  Describes dynamic Web/Grid resources. Updated manually or dynamically.
  Consists of descriptor, service and provider metadata
Inter-service communication is achieved without a third party.
Enables chains of Filters.
-
ASIS (4) Data Access and Filter Chaining

Filter Name   Initial Data Provided     Data Provided After Chaining
F1            None                      Earth, Fault and State Boundary
F2            Earth (raster)            Earth and Fault
F3            State Boundary (vector)   State Boundary
F4            Fault (vector)            Fault

[Figure: chain of Filters F1, F2, F3 and F4 exchanging the Earth, Fault and State Boundary layers]

Each Filter is capable of acting as both a server and a client
Capability integration is done through the getCapabilities service interface
Requests for common service interfaces are created in accordance with a predefined XML schema
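The capability aggregation in the table can be reproduced with a small sketch. The chain topology used here (F1 calls F2 and F3, F2 calls F4) is inferred from the table and is an assumption, as are the class and method names.

```python
class Filter:
    """Hypothetical Filter that is both a server and a client of other Filters."""

    def __init__(self, name, own_layers=(), upstream=()):
        self.name = name
        self.own_layers = list(own_layers)
        self.upstream = list(upstream)

    def get_capabilities(self):
        # Own layers first, then whatever the chained Filters advertise.
        layers = list(self.own_layers)
        for other in self.upstream:
            for layer in other.get_capabilities():
                if layer not in layers:
                    layers.append(layer)
        return layers


# Topology assumed from the table: F1 -> (F2, F3), F2 -> F4.
f4 = Filter("F4", ["Fault"])
f3 = Filter("F3", ["State Boundary"])
f2 = Filter("F2", ["Earth"], upstream=[f4])
f1 = Filter("F1", upstream=[f2, f3])

print(f1.get_capabilities())
```

F1 provides no data of its own, yet after chaining it advertises all three layers, matching the "after chaining" column.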
-
Outline
Data Integration Approaches
  Application Specific Solutions
  Application-Integration Framework
    ASIS
  Database Federation
    Ogsa-DAI
    Compare ASIS with Ogsa-DAI
  Digital Libraries
    SRB
    Sompel's DL
    Compare ASIS with SRB and Sompel's DL
-
Database Federation
Middleware consisting of a database management system
Uniform access to a number of heterogeneous data sources
Provides a query language used to combine, contrast, analyze and manipulate the data
Data integration is done through database integration.
Combines data from multiple sources in a single SQL statement (query re-creation).
Ex. Ogsa-DAI (Open Grid Services Architecture Data Access and Integration)
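The "single SQL statement over multiple sources" idea can be illustrated with SQLite's ATTACH DATABASE. This is only a stand-in for a real federation middleware: two in-memory SQLite databases play the role of two sources, and the table and column names are made up for the example.

```python
import sqlite3

# Two in-memory SQLite databases stand in for two federated sources.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS seismic")

conn.execute("CREATE TABLE main.faults (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE seismic.events (fault_id INTEGER, magnitude REAL)")
conn.executemany("INSERT INTO main.faults VALUES (?, ?)",
                 [(1, "San Andreas"), (2, "Hayward")])
conn.executemany("INSERT INTO seismic.events VALUES (?, ?)",
                 [(1, 6.0), (1, 4.5), (2, 5.1)])

# One SQL statement combines data from both sources.
rows = conn.execute(
    """
    SELECT f.name, MAX(e.magnitude)
    FROM main.faults AS f
    JOIN seismic.events AS e ON e.fault_id = f.id
    GROUP BY f.name
    ORDER BY f.name
    """
).fetchall()
print(rows)
conn.close()
```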
-
Ogsa-DAI (1)
Provides a common Java API for accessing and integrating data resources, such as relational and XML databases and files, in a Grid environment
Specifically designed for the OGSA architecture
  SQL queries on relational resources and XPath statements on XML collections
Provides data pipelining (similar to Filter chaining) via an XML document called the perform document.
Allows developers to easily add or extend functionality within Ogsa-DAI (activity document).
-
Ogsa-DAI (2)
Data and storage model:
  Any data stored in XML or relational databases, or in files
  No common data model
Data is provided through GDS (Grid Data Services)
Uses Ogsa-DQP (Distributed Query Processor) to coordinate access to multiple data services
The enactment engine is the core of Ogsa-DAI. It orchestrates the running of the perform document
Information in the perform document includes:
  The list of activities with their XML schemas and implementation classes.
  The list of role mappers and their details
  The information about the data resource
-
Ogsa-DAI (3)
Metadata storage model:
  Metadata is kept in the Metadata Catalog Service (MCS)
  MCS enables attribute-based querying
  Metadata describes the datasets; the data itself can be anything (binary, text, ...)
Data integration is done through an XML-based activity file mixing activities (in SQL queries) and metadata
Simple data access scenario:
  A client first contacts a DAISGR to locate the GDSFs.
  It accesses suitable GDSFs directly to find out more about their properties and the data resources they represent.
  It asks a GDSF to instantiate a GDS.
  It accesses the resource by sending the GDS a GDS-Perform document.
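The four steps of the access scenario can be walked through with toy objects. Nothing below is the Ogsa-DAI Java API: the Registry, GridDataServiceFactory and GridDataService classes are hypothetical stand-ins for the DAISGR, GDSF and GDS, and the perform document is modelled as a plain list of Python callables rather than XML.

```python
class GridDataService:
    """Stands in for a GDS bound to one data resource."""

    def __init__(self, rows):
        self._rows = rows

    def perform(self, activities):
        # Run the activities in order, piping each output into the next.
        data = self._rows
        for activity in activities:
            data = activity(data)
        return data


class GridDataServiceFactory:
    """Stands in for a GDSF that describes a resource and creates GDSs."""

    def __init__(self, kind, rows):
        self.properties = {"kind": kind}
        self._rows = rows

    def create_service(self):
        return GridDataService(self._rows)


class Registry:
    """Stands in for the DAISGR registry of factories."""

    def __init__(self):
        self._factories = []

    def register(self, factory):
        self._factories.append(factory)

    def find(self, kind):
        return [f for f in self._factories if f.properties["kind"] == kind]


registry = Registry()
registry.register(GridDataServiceFactory("relational", [3, 1, 2]))

factory = registry.find("relational")[0]                    # steps 1 and 2
service = factory.create_service()                          # step 3
result = service.perform([sorted, lambda rows: rows[:2]])   # step 4
print(result)
```

The list passed to perform() also hints at the pipelining role of the perform document: each activity consumes the output of the previous one.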
-
Ogsa-DAI (4)
Metadata model:
No common schema for metadata (such as a capability document)
Defines metadata for the datasets
  No schema in XML
  Stored in database tables as attributes
Defines metadata for the database system to enable querying and defining activities
  Schema in XML (mcsActivity.xsd schema file)
  Kept as an XML file in the file system (mcsActivity.xml)
-
ASIS vs. Ogsa-DAI
Ogsa-DAI does not define metadata and data in XML schema. Metadata is mixed with the database schema. ASIS has predefined data and metadata models.
Ogsa-DAI accepts any data, and relies on a predefined database schema to enable querying and accessing data.
ASIS's data integration is on demand and based on capability federation. Ogsa-DAI's data integration is instead coded in XML-structured perform and activity documents.
Ogsa-DAI has a central metadata approach (MCS); ASIS has a distributed one.
Both systems are based on Web Services.
Ogsa-DAI uses GridFTP, and ASIS uses NaradaBrokering, to address performance issues in data transfers.
-
Outline
Data Integration Approaches
  Application Specific Solutions
  Application-Integration Framework
    ASIS
  Database Federation
    Ogsa-DAI
    Compare ASIS with Ogsa-DAI
  Digital Libraries
    SRB
    Sompel's DL
    Compare ASIS with SRB and Sompel's DL
-
Digital Libraries
Main focus is the publishing and discovery of digital objects.
Digital objects: a file, URL, SQL command string, or any string of bits.
Collects data from multiple different data sources.
Differs a little from the other data integration approaches:
  Data curation services such as publishing data to, and removing data from, the data sources.
Ex. SRB (Storage Resource Broker) and Sompel's Digital Library Approach
-
SRB (2)
Data and storage model:
  Uniform storage interface
    Resource-specific drivers map each storage system to the interface
  Storage resources are registered within SRB as physical resources
  Logical resources (LSR) enable replication.
    LSR = one or more physical resources
    The client API refers to LSRs. Collections are created by LSR.
Metadata storage model (MCAT):
  Serves both core metadata and domain-dependent metadata
    Core metadata follows a standardized schema like Dublin Core
  Stores metadata about data, collections, users, resources and methods
    Attribute-based access and querying; updating of the metadata catalog
  Implemented as a relational database (Oracle, DB2 or Sybase)
  Abstraction and replica information for data
  Global user name space and authentication
  Authorization through ACLs and tickets
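The relationship between logical and physical resources, and the attribute-based catalog, can be sketched in a few lines. This is a toy model rather than the SRB API: LogicalResource and Catalog are hypothetical names, physical resources are plain dictionaries, and the catalog is an in-memory dictionary instead of a relational database.

```python
class LogicalResource:
    """Hypothetical logical resource backed by one or more physical ones."""

    def __init__(self, name, physical):
        self.name = name
        self.physical = {p: {} for p in physical}  # resource -> {path: bytes}

    def put(self, path, data):
        # Writing through the logical resource replicates to every member.
        for store in self.physical.values():
            store[path] = data

    def get(self, path):
        # Any replica will do; return the first one that holds the object.
        for store in self.physical.values():
            if path in store:
                return store[path]
        raise FileNotFoundError(path)


class Catalog:
    """Toy stand-in for MCAT: attribute-based metadata about data objects."""

    def __init__(self):
        self._meta = {}

    def annotate(self, path, **attrs):
        self._meta.setdefault(path, {}).update(attrs)

    def query(self, **attrs):
        return sorted(p for p, m in self._meta.items()
                      if all(m.get(k) == v for k, v in attrs.items()))


lsr = LogicalResource("lsr1", ["disk-a", "tape-b"])
lsr.put("/home/demo/myfile", b"payload")
mcat = Catalog()
mcat.annotate("/home/demo/myfile", author="bob")

del lsr.physical["disk-a"]["/home/demo/myfile"]   # lose one replica
print(lsr.get("/home/demo/myfile"), mcat.query(author="bob"))
```

Because the client addresses the logical resource, losing the copy on one physical resource does not change how the object is read.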
-
SRB (3)
Metadata and metadata exchange model:
  MAPS (Metadata Attribute Presentation Structure)
  Independent of the internal representation of the attributes inside the catalog.
  Provides a uniform interface specification that can be used between user applications and the MCAT catalog, and vice versa.
  Structures which form MAPS:
    MAPS_Query_Struct, MAPS_Result_Struct, MAPS_Update_Struct and MAPS_Definition_Struct
  Mapping from MAPS to other models and exchange formats. A Dublin Core mapping is under implementation.
-
SRB (4)
Simple data access scenario:
  The SRB server spawns an SRB agent to authenticate the user/application by comparing its credentials with information stored in MCAT.
  The agent finds the location of the data in MCAT.
  The agent checks the user's request against the permissions stored in MCAT.
  The SRB agent returns the result of the request to the user.
  The SRB agent communicates with the user through a port specific to this client session.
SRB server chaining scenario (integrated SRBs):
  The first 3 steps are the same as in the simple data access case.
  The SRB agent contacts a remote SRB agent via the remote SRB server.
  The second SRB agent returns a pointer to the data item to the first SRB agent, which passes it on to the user.
  The SRB client interacts with the data item directly.
  This is the federated SRB scheme: one SRB server acts as a client to another.
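Both scenarios follow the same three checks before any data is touched, which a short sketch makes explicit. The SrbServer class below is hypothetical and greatly simplified: three dictionaries play the role of the MCAT tables, passwords stand in for real authentication, and a returned string stands in for the pointer to the data item.

```python
class SrbServer:
    """Toy server; users, locations and acl play the role of MCAT tables."""

    def __init__(self, users, locations, acl, remote=None):
        self.users, self.locations, self.acl = users, locations, acl
        self.remote = remote  # another SrbServer, for the federated case

    def locate(self, path):
        return self.locations[path]

    def request(self, user, password, path):
        if self.users.get(user) != password:      # step 1: authenticate
            raise PermissionError("authentication failed")
        location = self.locations.get(path)       # step 2: find the location
        if user not in self.acl.get(path, ()):    # step 3: check permissions
            raise PermissionError("access denied")
        if location is None and self.remote is not None:
            # Chaining: this server acts as a client of the remote one.
            return self.remote.locate(path)
        return location


remote = SrbServer({}, {"/z/b.dat": "remote-host:/vault/b.dat"}, {})
local = SrbServer({"bob": "pw"},
                  {"/z/a.dat": "local-host:/vault/a.dat"},
                  {"/z/a.dat": {"bob"}, "/z/b.dat": {"bob"}},
                  remote=remote)

local_hit = local.request("bob", "pw", "/z/a.dat")
remote_hit = local.request("bob", "pw", "/z/b.dat")
print(local_hit, remote_hit)
```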
-
ASIS vs. SRB
SRB doesn't define metadata in an XML structure (as ASIS does)
SRB uses any data, but ASIS uses ASL
SRB keeps the metadata in a catalog service (MCAT). ASIS uses XML-structured capability metadata
SRB has a central metadata handling approach; ASIS has a distributed metadata handling approach
ASIS's data integration is based on metadata federation; SRB's data integration is based on SRB server federation.
Instead of Filters, SRB uses SRB servers and agents for accessing data resources.
-
Sompel's DL (1)
Scholarly communication as a network-based workflow
  Instead of Filters and ASL in ASIS, Sompel defines repositories and digital objects, respectively.
  A repository is a networked system that provides services pertaining to a collection of Digital Objects
Repositories have common service interfaces:
  Obtain, Harvest and Put.
Two classes of participants:
  Data providers (DP) and service providers (SP)
  SPs collect metadata from DPs (via the 3 service interfaces), then normalize and cluster it to deal with duplicates.
  DPs offer some type of search mechanism for their own repositories.
-
Sompel's DL (2)
Data and storage model:
  Data is the abstraction of the Digital Objects
  Digital Objects = digital data + key metadata.
  Serialization of Digital Objects = Surrogates
  Surrogates
    Carry information for the value chains and service information used at repository service interfaces.
    Are in XML/RDF format
    Are composed of dataStream and/or Entity tag elements.
    A chained object is identified by keymetadataID or providerInfo.
  Different storage types: book repositories, teaching object repositories, dataset repositories, etc.
  Repositories are active nodes. Repositories enable the use and re-use of materials in many contexts.
-
Sompel's DL (3)
Metadata model:
  Surrogates are essentially metadata records for objects
    Based on the Dublin Core format with domain-specific extensions.
    Dublin Core has 15 standard elements to describe resources.
    For more details see http://dublincore.org
Chaining for integrating data:
  The application/user doesn't need to use a workflow engine or script to create or run the chain (as in ASIS).
  The chain (which they call a value chain) is hidden in the surrogates.
  Surrogates are updated through the common interfaces (put, obtain and harvest) of the resources.
  The chain is defined in the Entity element of the surrogate document, with the Lineage sub-element.
Sample chaining scenario:
  A paper might have references to some papers, and these papers might in turn reference other papers.
  The value chain does not stop.
  Papers gain different (value-added) metadata through the value chain
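The paper-references-paper scenario amounts to following lineage links from one surrogate to the next. The sketch below assumes a simplified surrogate, a dictionary with a "lineage" list, rather than the XML/RDF document; the identifiers are invented for the example.

```python
from collections import deque

# Hypothetical surrogates: each records the objects it builds on (its lineage).
surrogates = {
    "paper-A": {"title": "A", "lineage": ["paper-B", "paper-C"]},
    "paper-B": {"title": "B", "lineage": ["paper-C"]},
    "paper-C": {"title": "C", "lineage": []},
}


def value_chain(start, records):
    """Return every object reachable from start by following lineage links."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        current = queue.popleft()
        order.append(current)
        for parent in records[current]["lineage"]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return order


chain = value_chain("paper-A", surrogates)
print(chain)
```

No workflow engine is involved: the chain is recovered purely from what the surrogates themselves record.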
-
ASIS vs. Sompel's Approach
Instead of Filters and ASL in ASIS, Sompel defines repositories and digital objects, respectively
  DPs correspond to end-Filters, and SPs correspond to Filters in ASIS
ASIS does not have publishing or putting service interfaces
  Obtain corresponds to getData in ASIS
  Harvest corresponds to getCapabilities in ASIS
Both have distributed metadata approaches for data integration
  ASIS: direct communication between Filters using the getCapabilities interface
  Sompel's DL: direct communication between repositories and services using the Harvest interface
Sompel's DL uses Dublin Core to represent resources; ASIS uses its own schema.
ASIS uses ASL to represent data; Sompel's approach doesn't have a common data model.
-
Summary
Application-Integration Framework (ASIS)
  Easy to add new sources
    Using online Filters that provide the required adaptors
  Peer-to-peer chain of Filters
    No central metadata catalog server
    Distributed capability exchange and aggregation
  SOA
    Re-usable components (Filters) for different applications in a predefined domain
Implications of Filter services
  Scalable and fault-tolerant
  Load-balancing and caching
  Dynamically updating capability metadata
-
THANKS !
-
APPENDIX
-
Capability in Grid Services Security
XPOLA
  The infrastructure is built on a peer-to-peer chain-of-trust model. No central admins
  WS-Security compliant
  Extensible; PKI and SAML based
  Dynamic and reusable (manually or automatically generated)
  Composed of two sections:
    Policy document (SAML, lifetime info, binding info, etc.)
    Provider's signature
Existing grid security solutions for fine-grained authorization did not address general Web/Grid services in compliance with Web Services security specs.
Because they rely on central admins, other approaches don't address dynamic services
-
Sample Capabilities File (greatly simplified), GIS Domain
CGL_Mapping
CGL_Mapping WMS
image/GIF
image/PNG
-
Dublin Core
Addresses the challenge of resource description and discovery
A language for making a particular class of statements about resources
There are 2 namespaces
  Dublin Core element set (dc) and Dublin Core qualifiers (dcq, ex. dcq:iso8601)
Some of the Dublin Core metadata element set
  Title (dc:title), subject, description, creator, publisher, type, format, source, language, rights
Using DC in RDF: specifications for DC in RDF (work in progress)
  Resource has (verb) property (dc:creator) X (dc:Ahmet)
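A "resource has property X" statement can be written down as a small XML record using the Dublin Core element set namespace. The sketch below uses Python's standard ElementTree; the enclosing record element and the format value are assumptions for illustration, and this is plain XML rather than a full RDF serialization.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set namespace
ET.register_namespace("dc", DC)

record = ET.Element("record")             # wrapper element name is assumed
for element, value in [("title", "Data Integration Approaches"),
                       ("creator", "Ahmet Sayar"),
                       ("format", "application/pdf")]:
    ET.SubElement(record, f"{{{DC}}}{element}").text = value

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)

# Reading a property back: the resource "has creator" this value.
creator = record.find(f"{{{DC}}}creator").text
```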
-
Sample Dublin Core
http://www.ils.unc.edu/mrc/jcdl2006/slides/kunze.pdf
-
Open Archives Initiative
OAI
-
OAI
Deals with the e-print server world
  Services needed to be developed that permit searching across papers housed at multiple repositories
  Repositories also needed capabilities to automatically identify and copy papers that had been deposited in them.
Definition of an interface that permits e-print servers to expose the metadata for the papers they hold.
Service providers with similar metadata standards need to harvest this metadata
Service providers act as a federation of repositories by indexing documents, so that multiple collections can be searched as though they form a single collection
-
OAI-PMH
For the variety of communities engaged in publishing content on the Web
Any networked server can employ the protocol to enable service providers to collect its metadata
HTTP-based request-response transactions
Service Providers
  Harvest metadata from Data Providers using the OAI protocol and use the returned metadata as a basis for building value-added services.
Data Providers (repositories)
  Adopt the OAI technical framework as a means of exposing metadata about their content.
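Since OAI-PMH is an HTTP request-response protocol, a harvest starts with nothing more than a URL. The sketch below builds a ListRecords request for unqualified Dublin Core (metadataPrefix oai_dc); the base URL is a hypothetical data provider, and no network call is made.

```python
from urllib.parse import urlencode

BASE_URL = "http://repository.example.org/oai"   # hypothetical data provider


def harvest_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL; nothing is fetched here."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Selective harvesting: only records changed since this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


url = harvest_url(BASE_URL, from_date="2006-01-01")
print(url)
```

A service provider would fetch this URL, parse the returned XML, and repeat for each data provider it federates.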
-
Comments on OAI
OAI-PMH is ultimately only as useful as the metadata it transports.
The tendency of implementers to apply almost exclusively the lowest common denominator of unqualified Dublin Core makes it difficult to implement more advanced search interface features.
Content providers should prefer a more expressive metadata schema such as MARC or qualified DC, and find ways to augment human-generated descriptive metadata.
-
Sompel's Digital Library Approach
-
Sompel's Approach
Hierarchy steps
http://msc.mellon.org/Meetings/Interop/lagoze_data_model.pdf
Sompel's DL
-
Sompel's DL
Data Model
msc.mellon.org/Meetings/Interop/lagoze_data_model.pdf
-
Ogsa-DAI Figure
http://www.globus.org/grid_software/data/dai.php
-
Perform Document
http://www.ogsadai.org.uk/documentation/ogsadai-wsi-2.2/doc/interaction/Perform.html
-
MCAT vs. MCS
MCAT can be used only with SRB
MCS can be used only in the OGSA architecture
MCAT stores both physical and logical addresses
MCS stores logical metadata attributes and handles that can be resolved by a data location or data access service.
Both can be extended to serve application-specific metadata, but they don't have a generalized way of doing that.
-
SRB
-
SRB
CLIENT
-
CLIENT
Example interaction with SRB using Scommands:
Sinit
Start interaction with SRB
Spwd
Display current position within SRB repository
Smeta -iI UDSMD0=author I UDSMD1=bob myfile
Add metadata describing the author of the file
Smeta -iI UDSMD0=author I UDSMD1=arthur
Search for files with author metadata set as arthur
Sget myFile
Copy myFile from SRB to local storage
SreplicateS anotherResource myFile
Create a replica of myFile on anotherResource
Srm myFile
Remove myFile (and all replicas) from SRB
Sexit
End interaction with SRB