data integration approaches 2

Upload: sheoran3

Post on 14-Apr-2018

  • 7/29/2019 Data Integration Approaches 2

    1/50

    1

    Approaches to the Integration of Distributed and Heterogeneous Data Resources

    Ahmet Sayar

    Indiana University

    Computer Science Department

    Motivation

    Integrating data from multiple data sources

    Distributed query and transactions of data.

    Definitions and adoptions of data, metadata and their storages.

    Accessing the data seamlessly.

    Transparency, support for heterogeneity, extensibility and scalability.

    Outline

    Data Integration Approaches

    Application Specific Solutions

    Application-Integration Framework

    ASIS (Application Specific Information System)

    Database Federation

    Ogsa-DAI (Ogsa-Data Access and Integration)

    Compare ASIS with Ogsa-DAI

    Digital Libraries

    SRB (Storage Resource Broker)

    Sompel's Digital Library Approach

    Compare ASIS with SRB and Sompel's DL

    Application-Integration Framework

    It can also be called a component-based framework, such as CORBA or Filters with common interfaces.

    It does not necessarily address data integration issues.

    Based on a common data model (such as CML and GML).

    With adaptors, if the source changes, the adaptor may have to change, but the application may never see it.

    Adding a new source is easy: at most a new adaptor needs to be written.

    The adaptor may already exist online. No detailed system knowledge is needed.

    Ex. ASIS - OGC GIS Application Integration Framework
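
    The adaptor idea above can be sketched in a few lines of Python. This is a minimal illustration, not ASIS code: the Feature record stands in for a common data model such as GML or CML, and the two adaptor classes and their input formats are hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Feature:
    """Hypothetical common data model record (stand-in for GML/CML)."""
    name: str
    kind: str


class Adaptor(Protocol):
    """Anything that can present its source in the common model."""
    def read(self) -> list[Feature]: ...


class CsvFaultAdaptor:
    """Adapts comma-separated 'name,kind' lines to the common model."""
    def __init__(self, text: str) -> None:
        self.text = text

    def read(self) -> list[Feature]:
        rows = (line.split(",") for line in self.text.splitlines() if line.strip())
        return [Feature(name.strip(), kind.strip()) for name, kind in rows]


class DictStateAdaptor:
    """Adapts a list of dictionaries to the common model."""
    def __init__(self, records: list[dict]) -> None:
        self.records = records

    def read(self) -> list[Feature]:
        return [Feature(r["title"], r["type"]) for r in self.records]


def integrate(adaptors: list[Adaptor]) -> list[Feature]:
    """The application only ever sees Feature objects, never the raw sources."""
    return [feature for adaptor in adaptors for feature in adaptor.read()]


features = integrate([
    CsvFaultAdaptor("San Andreas, fault\nHayward, fault"),
    DictStateAdaptor([{"title": "California", "type": "state boundary"}]),
])
```

    If a source changes its format, only its adaptor changes; integrate and its callers stay the same.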

    ASIS (1)

    Enables inter-service communication through well-defined service interfaces, message formats and capabilities metadata.

    Data model is ASL (Application Specific Language).

    Metadata model is the capability document.

    Data and metadata have a common predefined schema.

    Components are Filter Services:

    Web Services, with common service interfaces defined in WSDL.

    Information/data services enabling distributed access, querying and transformation through their predictable input/output interfaces.

    Chainable, located, and capable of updating their metadata manually or dynamically.

    ASIS (2)

    Data and data storage model:

    Any data can be integrated into the system after transforming it to ASL.

    Heterogeneity is handled at the end-Filters with adaptors.

    ASL is a community-accepted application specific language:

    GML (Geography Markup Language) in GIS applications

    CML (Chemical Markup Language) in Chemistry applications

    Filters' common service interfaces: getCapabilities, getData, getFeatureInfo.

    Requests to the Filters' interfaces: getCapabilitiesReq, getDataReq, getFeatureInfoReq.

    Expected return types are defined in the Filter's capability metadata.
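
    As a rough illustration of the three common service interfaces, the sketch below models a Filter as an abstract Python class. Only the three operation names come from the slides; the method signatures, the FaultFilter class and the return formats are assumptions made for this example.

```python
from abc import ABC, abstractmethod


class Filter(ABC):
    """Hypothetical Python rendering of the common Filter interfaces."""

    @abstractmethod
    def get_capabilities(self) -> dict:
        """Return capability metadata describing what this Filter serves."""

    @abstractmethod
    def get_data(self, layer: str) -> str:
        """Return the requested layer, serialized in the common data model."""

    @abstractmethod
    def get_feature_info(self, layer: str, feature_id: str) -> dict:
        """Return attributes of a single feature."""


class FaultFilter(Filter):
    """Toy end-Filter serving one layer from an in-memory dictionary."""

    def __init__(self) -> None:
        self._faults = {"f1": {"name": "San Andreas"}}

    def get_capabilities(self) -> dict:
        return {"service": "FaultFilter", "layers": ["Fault"], "formats": ["text/xml"]}

    def get_data(self, layer: str) -> str:
        if layer != "Fault":
            raise KeyError(layer)
        items = "".join(
            f'<fault id="{fid}">{attrs["name"]}</fault>'
            for fid, attrs in self._faults.items()
        )
        return f"<faults>{items}</faults>"

    def get_feature_info(self, layer: str, feature_id: str) -> dict:
        if layer != "Fault":
            raise KeyError(layer)
        return self._faults[feature_id]


fault_filter = FaultFilter()
capabilities = fault_filter.get_capabilities()
```

    A client would first read the capabilities to learn which layers and formats are offered, then issue get_data or get_feature_info accordingly.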

    ASIS (3)

    Metadata and Metadata storage model:

    Data integration is done through the Filters' capability metadata.

    Metadata is stored in the Filter's local file system as a flat file.

    Capability:

    Inspired by the OGC WMS capability specification.

    Looks like the Dublin Core format.

    A capability-like structure is also used in Gannon's approach (XPOLA) for Grid services security issues.

    Describes dynamic Web/Grid resources. Updated manually or dynamically.

    Consists of descriptor, service and provider metadata.

    Inter-service communication is achieved without a third party.

    Enables chains of Filters.

    ASIS (4): Data Access and Filter Chaining

    Filter Name | Initial Data Provided   | Data Provided After Chaining
    F1          | None                    | Earth, Fault and State Boundary
    F2          | Earth (raster)          | Earth and Fault
    F3          | State Boundary (vector) | State Boundary
    F4          | Fault (vector)          | Fault

    [Figure: chain of Filters F1, F2, F3 and F4, with links labeled Fault, State Boundary and Earth]

    Each Filter is capable of acting as both a server and a client.

    Capability integration is done through the getCapabilities service interface.

    Requests for common service interfaces are created in accordance with a predefined XML schema.
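
    The effect of chaining on the data each Filter can provide can be sketched as follows. This is an illustrative model only: the chain topology (F1 chained to F2 and F3, F2 chained to F4) is inferred from the table on this slide, and the class and method names are hypothetical.

```python
class Filter:
    """A Filter that serves its own layers plus those of the Filters it chains to."""

    def __init__(self, name, own_layers=(), upstream=()):
        self.name = name
        self.own_layers = list(own_layers)
        self.upstream = list(upstream)

    def get_capabilities(self):
        """Aggregate own layers with the capabilities of upstream Filters."""
        layers = list(self.own_layers)
        for other in self.upstream:
            for layer in other.get_capabilities():
                if layer not in layers:
                    layers.append(layer)
        return layers

    def get_data(self, layer):
        """Serve a layer locally, or act as a client of an upstream Filter."""
        if layer in self.own_layers:
            return f"{layer} from {self.name}"
        for other in self.upstream:
            if layer in other.get_capabilities():
                return other.get_data(layer)
        raise KeyError(layer)


# Assumed topology: F1 -> (F2, F3) and F2 -> F4.
f4 = Filter("F4", ["Fault"])
f3 = Filter("F3", ["State Boundary"])
f2 = Filter("F2", ["Earth"], [f4])
f1 = Filter("F1", [], [f2, f3])
```

    Here F1 holds no data of its own, yet after chaining it advertises and serves all three layers, acting as a server to its clients and as a client to F2 and F3.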

    Outline

    Data Integration Approaches

    Application Specific Solutions

    Application-Integration Framework

    ASIS

    Database Federation

    Ogsa-DAI

    Compare ASIS with Ogsa-DAI

    Digital Libraries

    SRB

    Sompel's DL

    Compare ASIS with SRB and Sompel's DL

    Database Federation

    Middleware consisting of a database management system.

    Uniform access to a number of heterogeneous data sources.

    Provides a query language used to combine, contrast, analyze and manipulate the data.

    Data integration is done through database integration.

    Combines data from multiple sources in a single SQL statement (query recreation).

    Ex. Ogsa-DAI (Open Grid Services Architecture Data Access and Integration)
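
    The idea of combining several sources in one SQL statement can be tried out with Python's built-in sqlite3 module. This is only a small stand-in for a real federation middleware: both "sources" are in-memory SQLite databases attached to one connection, and the table names and rows are made up for the example.

```python
import sqlite3

# One connection plays the role of the federation layer.
conn = sqlite3.connect(":memory:")
# A second, independent in-memory database stands in for another source.
conn.execute("ATTACH DATABASE ':memory:' AS seismic")

conn.execute("CREATE TABLE main.states (code TEXT, name TEXT)")
conn.execute("CREATE TABLE seismic.faults (name TEXT, state_code TEXT)")
conn.executemany(
    "INSERT INTO main.states VALUES (?, ?)",
    [("CA", "California"), ("NV", "Nevada")],
)
conn.executemany(
    "INSERT INTO seismic.faults VALUES (?, ?)",
    [("San Andreas", "CA"), ("Hayward", "CA")],
)

# A single SQL statement that spans both databases.
rows = conn.execute(
    "SELECT s.name, COUNT(f.name) "
    "FROM main.states AS s "
    "LEFT JOIN seismic.faults AS f ON f.state_code = s.code "
    "GROUP BY s.name ORDER BY s.name"
).fetchall()
conn.close()
```

    A real federation would additionally have to translate the query for each heterogeneous back end, which this sketch does not attempt.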

    Ogsa-DAI (1)

    Provides a common Java API for accessing and integrating data resources, such as relational and XML databases and files, in a Grid environment.

    Specifically designed for the OGSA architecture.

    SQL queries on relational resources and XPath statements on XML collections.

    Provides data pipelining (similar to Filter chaining) via an XML document called the perform document.

    Allows developers to easily add or extend functionality within Ogsa-DAI (activity document).

    Ogsa-DAI (2)

    Data and storage model:

    Any data stored in XML or relational databases, or in files.

    No common data model.

    Data is provided through GDS (Grid Data Services).

    Uses Ogsa-DQP (Distributed Query Processor) to coordinate access to multiple data services.

    The enactment engine is the core of Ogsa-DAI. It orchestrates the running of the perform document.

    Information in the perform document includes:

    The list of activities and their XML schemas and implementation classes.

    The list of role mappers and details.

    The information about the data resource.
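
    The relationship between a perform document and the enactment engine can be pictured with a toy pipeline. Nothing below is Ogsa-DAI API: the activity names, the list-of-dictionaries "perform document" and the sample rows are hypothetical, chosen only to show activities being run in order with each output feeding the next.

```python
def sql_query(_previous, params):
    """Stand-in for a query activity: filter rows of an in-memory table."""
    return [row for row in params["table"] if row["magnitude"] >= params["min_magnitude"]]


def project(rows, params):
    """Stand-in for a transformation activity: keep a single column."""
    return [row[params["column"]] for row in rows]


# Registry mapping activity names to implementations (hypothetical names).
ACTIVITIES = {"sqlQuery": sql_query, "project": project}


def enact(perform_document):
    """Run the listed activities in order, piping each output to the next."""
    data = None
    for step in perform_document:
        activity = ACTIVITIES[step["activity"]]
        data = activity(data, step["params"])
    return data


# Illustrative sample rows, not real measurements.
events = [
    {"name": "event-a", "magnitude": 6.7},
    {"name": "event-b", "magnitude": 6.0},
    {"name": "event-c", "magnitude": 6.9},
]
perform_document = [
    {"activity": "sqlQuery", "params": {"table": events, "min_magnitude": 6.5}},
    {"activity": "project", "params": {"column": "name"}},
]
result = enact(perform_document)
```

    Adding functionality then amounts to registering another activity, which loosely mirrors the extensibility claim on the previous slide.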

    Ogsa-DAI (3)

    Metadata storage model:

    Metadata is kept in the Metadata Catalog Service (MCS).

    MCS enables attribute-based querying.

    Metadata is for the datasets; the data can be anything (binary, text, ...).

    Data integration is done through an XML-based activity file mixing activities (in SQL queries) and metadata.

    Simple data access scenario:

    A client first contacts a DAISGR to locate the GDSFs.

    It accesses suitable GDSFs directly to find out more about their properties and the data resources they represent.

    It asks a GDSF to instantiate a GDS.

    It accesses the resource by sending the GDS a GDS-Perform document.

    Ogsa-DAI (4)

    Metadata model:

    No common schema for metadata (like the capability in ASIS).

    Defines metadata for the datasets:

    No schema in XML.

    Stored in database tables as attributes.

    Defines metadata for the database system to enable querying and defining activities:

    Schema in XML (mcsActivity.xsd schema file).

    Kept as an XML file in the file system (mcsActivity.xml).

    ASIS vs. Ogsa-DAI

    Ogsa-DAI does not define metadata and data in an XML schema. Metadata is mixed with the database schema. ASIS has predefined data and metadata models.

    Ogsa-DAI uses any data, and has a predefined database schema to enable querying and accessing data.

    ASIS's data integration is on demand and based on capability federation. Ogsa-DAI's data integration, in contrast, is coded in XML-structured perform and activity documents.

    Ogsa-DAI has a central metadata approach (MCS); ASIS has a distributed one.

    Both systems are based on Web Services.

    Ogsa-DAI uses GridFTP, and ASIS uses NaradaBrokering, to address performance issues in data transfers.

    Outline

    Data Integration Approaches

    Application Specific Solutions

    Application-Integration Framework

    ASIS

    Database Federation

    Ogsa-DAI

    Compare ASIS with Ogsa-DAI

    Digital Libraries

    SRB

    Sompel's DL

    Compare ASIS with SRB and Sompel's DL

    Digital Libraries

    Main focus is publishing and discovering digital objects.

    Digital objects: a file, a URL, an SQL command string, or any string of bits.

    Collects data from multiple different data sources.

    It is a little different from the other data integration approaches.

    Data curation services, such as publishing data to and removing data from the data sources.

    Ex. SRB (Storage Resource Broker) and Sompel's Digital Library Approach

    SRB (2)

    Data and storage model:

    Uniform storage interface.

    Resource-specific drivers map from the defined storage to the interface.

    Storage resources are registered within SRB as physical resources.

    Logical resources (LSR) enable replication.

    LSR = one or more physical resources.

    The client API refers to LSRs. Collections are created by LSR.

    Metadata storage model (MCAT):

    Serves both core metadata and domain-dependent metadata.

    Core metadata is a standardized schema, like Dublin Core.

    Stores metadata about data, collections, users, resources and methods.

    Attribute-based access and querying; updating of the metadata catalog.

    Implemented as a relational database (Oracle, DB2 or Sybase).

    Abstraction and replica information for data.

    Global user name space and authentication.

    Authorization through ACLs and tickets.
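
    The way a logical resource hides replication across physical resources can be sketched as below. This is not SRB code: the LogicalResource class is hypothetical, and plain dictionaries stand in for physical storage systems and their drivers.

```python
class LogicalResource:
    """A logical resource backed by one or more physical resources."""

    def __init__(self, name, physical_resources):
        self.name = name
        # Each physical resource is modelled as a plain dict (an assumption).
        self.physical_resources = physical_resources

    def put(self, key, data):
        """Write the object to every physical resource, creating replicas."""
        for store in self.physical_resources:
            store[key] = data

    def get(self, key):
        """Read from the first physical resource that still holds a replica."""
        for store in self.physical_resources:
            if key in store:
                return store[key]
        raise KeyError(key)


disk, tape = {}, {}
lsr = LogicalResource("lsr1", [disk, tape])
lsr.put("myFile", b"payload")

# Simulate the loss of one replica; the client still only talks to the LSR.
del disk["myFile"]
recovered = lsr.get("myFile")
```

    The client addresses only the logical resource, so which physical copy answers the request stays invisible to it.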

    SRB (3)

    Metadata and metadata exchange model:

    MAPS (Metadata Attribute Presentation Structure)

    Independent of the internal representation of the attributes inside the catalog.

    Provides a uniform interface specification that can be used between user applications and the MCAT catalog, and vice versa.

    Structures which form the MAPS: MAPS_Query_Struct, MAPS_Result_Struct, MAPS_Update_Struct and MAPS_Definition_Struct.

    Mapping from MAPS to other models and exchange formats. Dublin Core format is under implementation.

    SRB (4)

    Simple data access scenario:

    The SRB server spawns an SRB agent to authenticate the user/application by comparing it with information stored in MCAT.

    Find the location in MCAT.

    Check the user request against permissions stored in MCAT.

    The SRB agent contacts the user with the result of the request.

    The SRB agent communicates with the user through a port specific to this client session.

    SRB server chaining scenario (integrated SRBs):

    First 3 steps from the simple data access case.

    The SRB agent contacts a remote SRB agent via the remote SRB server.

    The second SRB agent returns the pointer to the data item to the first SRB agent, which passes it on to the user.

    The SRB client interacts with the data item directly.

    The federated SRB scheme: an SRB server acts as a client to another.
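
    The first three steps of the simple data access scenario (authenticate, locate, check permissions) can be walked through with a toy catalog. The dictionary layout, user names, paths and function name are all hypothetical; a real MCAT is a relational database and the real agent protocol is not modelled here.

```python
# Toy stand-in for the MCAT catalog (structure and contents are assumptions).
MCAT = {
    "users": {"bob": "secret"},
    "locations": {"myFile": "unix-disk-1:/vault/myFile"},
    "acl": {"myFile": {"bob": {"read"}}},
}


def handle_request(user, password, item, action, mcat=MCAT):
    """Mimic an agent: authenticate, find the location, check permissions."""
    # Step 1: authenticate the user against the catalog.
    if user not in mcat["users"] or mcat["users"][user] != password:
        return "authentication failed"
    # Step 2: find the location of the data item in the catalog.
    location = mcat["locations"].get(item)
    if location is None:
        return "not found"
    # Step 3: check the request against the stored permissions.
    allowed = mcat["acl"].get(item, {}).get(user, set())
    if action not in allowed:
        return "permission denied"
    # Step 4: report the result (here, the pointer to the data item).
    return location


ok = handle_request("bob", "secret", "myFile", "read")
denied = handle_request("bob", "secret", "myFile", "write")
```

    In the chaining scenario the same three checks would run first, after which the request is forwarded to a remote server instead of being answered locally.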

    ASIS vs. SRB

    SRB doesn't define metadata in an XML structure (as ASIS does).

    SRB uses any data, but ASIS uses ASL.

    SRB keeps the metadata in catalogue services (MCAT). ASIS uses XML-structured capability metadata.

    SRB has a central metadata handling approach; ASIS has a distributed one.

    ASIS's data integration is based on metadata federation; SRB's data integration is based on SRB server federation.

    Instead of Filters, SRB uses SRB servers and agents for accessing data resources.

    Sompel's DL (1)

    Scholarly communication as a network-based workflow.

    Instead of Filters and ASL in ASIS, Sompel defines repositories and digital objects, respectively.

    A repository is a networked system that provides services pertaining to a collection of Digital Objects.

    Repositories have common service interfaces: Obtain, Harvest and Put.

    Two classes of participants: Data Providers (DP) and Service Providers (SP).

    SPs collect metadata from DPs (via the 3 service interfaces), then normalize and cluster it to deal with duplicates.

    DPs offer some type of search mechanism for their own repositories.

    Sompel's DL (2)

    Data and storage model:

    Data is the abstraction of the Digital Objects.

    Digital Objects = digital data + key metadata.

    Serialization of Digital Objects = Surrogates.

    Surrogates:

    Information for the value chains and service information used at repository service interfaces.

    In the XML/RDF format.

    Composed of dataStream and/or Entity tag elements.

    A chained object is defined by keymetadataID or providerInfo.

    Different storage types: book repositories, teaching object repositories, dataset repositories, etc.

    Repositories are active nodes. Repositories enable the use and re-use of materials in many contexts.

    Sompel's DL (3)

    Metadata model:

    Surrogates are essentially metadata records for objects.

    Based on the Dublin Core format with domain-specific extensions.

    Dublin Core has 15 standard elements to define resources.

    For more details see http://doublincore.org

    Chaining for integrating data:

    The application/user doesn't need to use a workflow engine or script to create or run the chain (as in ASIS).

    The chain (which they call a value chain) is hidden in the surrogates.

    Surrogates are updated through the common interfaces (put, obtain and harvest) of the resources.

    The chain is defined in the Entity element in the surrogate document, with the Lineage sub-element.

    Sample chaining scenario:

    A paper might have references to some papers, and these papers might in turn reference other papers.

    The value chain does not stop.

    Papers have different metadata (value added) through the value chain.
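
    The sample chaining scenario can be made concrete with a small traversal over surrogate-like records. The dictionary layout and the "lineage" key are assumptions made for this sketch; real surrogates are XML/RDF documents whose exact structure is not given in the slides.

```python
# Hypothetical surrogates: each record lists the objects it derives from.
surrogates = {
    "paper:A": {"title": "A", "lineage": ["paper:B", "paper:C"]},
    "paper:B": {"title": "B", "lineage": ["paper:C"]},
    "paper:C": {"title": "C", "lineage": []},
}


def value_chain(start, repository):
    """Follow lineage links depth-first, visiting each object once."""
    seen, order, stack = set(), [], [start]
    while stack:
        identifier = stack.pop()
        if identifier in seen:
            continue  # already visited; also guards against citation cycles
        seen.add(identifier)
        order.append(identifier)
        # Push in reverse so that lineage entries are visited in listed order.
        stack.extend(reversed(repository[identifier]["lineage"]))
    return order


chain = value_chain("paper:A", surrogates)
```

    No workflow engine is involved: the chain is recovered purely from the links stored in the records themselves.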

    ASIS vs. Sompel's Approach

    Instead of Filters and ASL in ASIS, Sompel defines repositories and digital objects, respectively.

    DPs correspond to End-Filters, and SPs correspond to Filters in ASIS.

    ASIS does not have publishing or putting service interfaces.

    Obtain corresponds to getData in ASIS.

    Harvest corresponds to getCapabilities in ASIS.

    Both have distributed metadata approaches for data integration:

    ASIS: direct communication between Filters by using the GetCapabilities interface.

    Sompel's DL: direct communication between repositories and services by using the Harvest interface.

    Sompel's DL uses Dublin Core for the representation of the resources; ASIS uses its own schema.

    ASIS uses ASL for the representation of the data; Sompel's approach doesn't have a common data model.

    Summary

    Application-Integration Framework (ASIS):

    Easy to add new sources.

    Using online Filters providing required adaptors.

    Peer-to-peer chain of Filters.

    No central metadata catalog server: distributed capability exchange and aggregation.

    SOA.

    Re-usable components (Filters) for different applications in a predefined domain.

    Implications of Filter services:

    Scalable and fault-tolerant.

    Load-balancing and caching.

    Dynamically updating capability metadata.

    THANKS !

    APPENDIX

    Capability in Grid Services Security

    XPOLA:

    The infrastructure is built on a peer-to-peer chain-of-trust model. No central admins.

    WS-Security compliant.

    Extensible; PKI and SAML based.

    Dynamic and reusable (manually or automatically generated).

    Composed of two sectors:

    Policy document (SAML, lifetime info, binding info, etc.)

    Provider's signature

    Existing grid security solutions for fine-grained authorization were not addressing general Web/Grid services in compliance with Web Services security specs.

    With central admins, other approaches don't address dynamic services.

    Sample Capabilities File (too simplified), GIS Domain

    CGL_Mapping

    CGL_Mapping WMS

    image/GIF

    image/PNG

    Dublin Core

    Challenge of resource description and discovery.

    Language for making a particular class of statements about resources.

    There are 2 namespaces: the Dublin Core element set (dc) and Dublin Core qualifiers (dcq, ex. dcq:iso8601).

    Some of the Dublin Core metadata element set:

    Title (dc:title), subject, description, creator, publisher, type, format, source, language, rights.

    Using DC in RDF: specifications for DC in RDF (work in progress).

    Resource has (verb) property (dc:creator) X (dc:Ahmet).
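
    A small record using the dc element set can be built with Python's standard xml.etree.ElementTree module. The namespace URI is the published one for the Dublin Core element set; the wrapping element name "record" and the choice of which elements to fill are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set, version 1.1.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

# "record" is a hypothetical wrapper element, not part of Dublin Core.
record = ET.Element("record")
for element, value in [
    ("title", "Approaches to the Integration of Distributed and Heterogeneous Data Resources"),
    ("creator", "Ahmet Sayar"),
]:
    # ElementTree expects qualified names in the form {namespace}localname.
    ET.SubElement(record, f"{{{DC}}}{element}").text = value

xml_text = ET.tostring(record, encoding="unicode")

# Reading the statement back: the resource has property dc:creator with value X.
parsed = ET.fromstring(xml_text)
creator = parsed.find(f"{{{DC}}}creator").text
```

    Each child element corresponds to one "resource has property X" statement of the kind shown on the slide.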

    Sample Dublin Core

    http://www.ils.unc.edu/mrc/jcdl2006/slides/kunze.pdf

    Open Archives Initiative

    OAI

    OAI

    Deals with the e-print server world.

    Need to develop services that permit searching across papers housed at multiple repositories.

    Repositories also needed capabilities to automatically identify and copy papers that had been deposited in them.

    Definition of an interface to permit e-print servers to expose the metadata for the papers that they hold.

    Service providers with similar metadata standards need to harvest this metadata.

    Service providers act as a federation of repositories, by indexing documents, so that multiple collections can be searched as though they form a single collection.

    OAI-PMH

    For the variety of communities engaged in publishing content on the Web.

    Any networked server can employ the protocol to enable service providers to collect its metadata.

    HTTP-based request-response transaction.

    Service Providers:

    Harvest metadata from Data Providers using the OAI protocol and use the returned metadata as a basis for building value-added services.

    Data Providers (repositories):

    Adopt the OAI technical framework as a means of exposing metadata about their content.
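
    Because OAI-PMH requests are plain HTTP requests with the operation passed as a verb parameter, a harvesting request can be assembled with the standard library alone. The helper name and the base URL below are placeholders; ListRecords, metadataPrefix and oai_dc are defined by the protocol. The sketch only builds the URL and does not contact any server.

```python
from urllib.parse import urlencode


def oai_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a verb and its arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Placeholder repository address; substitute a real data provider's base URL.
BASE_URL = "http://repository.example.org/oai"

# Ask the data provider for all records in unqualified Dublin Core.
harvest_url = oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
```

    A service provider would fetch this URL, parse the returned XML, and repeat the request with the resumption token if the response is paged.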

    Comments on OAI

    OAI-PMH is ultimately only as useful as the metadata it transports.

    The tendency of implementers to almost exclusively apply the lowest common denominator of unqualified Dublin Core makes it difficult to implement more advanced search interface features.

    Content providers should prefer more expressive metadata schemas like MARC or qualified DC and find ways to augment human-generated descriptive metadata.

    Sompel's Digital Library Approach

    Sompel's Approach

    Hierarchy steps

    http://msc.mellon.org/Meetings/Interop/lagoze_data_model.pdf

    Sompel's DL

    Sompel's DL

    Data Model

    msc.mellon.org/Meetings/Interop/lagoze_data_model.pdf

    Ogsa-DAI Figure

    http://www.globus.org/grid_software/data/dai.php

    Perform Document

    http://www.ogsadai.org.uk/documentation/ogsadai-wsi-2.2/doc/interaction/Perform.html

    MCAT vs. MCS

    MCAT can be used only with SRB.

    MCS can be used only in the OGSA architecture.

    MCAT stores both physical and logical addresses.

    MCS stores logical metadata attributes and handles that can be resolved by a data location or data access service.

    Both can be extended to serve application-specific metadata, but they don't have a generalized way of doing that.

    SRB

    SRB

    CLIENT

    CLIENT

    Example interaction with SRB using Scommands:

    Sinit

    Start interaction with SRB

    Spwd

    Display current position within SRB repository

    Smeta -iI UDSMD0=author I UDSMD1=bob myfile

    Add metadata describing the author of the file

    Smeta -iI UDSMD0=author I UDSMD1=arthur

    Search for files with author metadata set as arthur

    Sget myFile

    Copy myFile from SRB to local storage

    SreplicateS anotherResource myFile

    Create a replica of myFile on anotherResource

    Srm myFile

    Remove myFile (and all replicas) from SRB

    Sexit

    End interaction with SRB