Data Services: Addressing the challenges of transformation to a knowledge-driven enterprise
Sri Gopalan, Booz Allen Hamilton
Agenda
Challenges of transitioning to a knowledge-driven enterprise
Facets of an effective Data Services solution
An approach to realizing Data Services
The Way Ahead
Questions and Comments
Challenges of transitioning to a knowledge-driven enterprise
The current production rate of digital information exceeds the ability to process it
Technology research firm IDC determined that the world generated 161 billion gigabytes of digital information last year
Data is contained in a multitude of unstructured (images, video, free text) and structured (RDBMS, XML, etc.) formats
Greater policy requirements arise both from regulatory concerns (e.g. Sarbanes-Oxley, HIPAA) and from enterprise interests (e.g. security constraints)
Organizations are struggling to get a handle on what information they have, how to search for it, and how to protect it
[Figure: Volume of Data Creation vs. Time]
Within many enterprises, there is no consistent way to discover, access, or share data
[Figure labels: Dept. A, Dept. B, Dept. C, Dept. D; DB, XML, Proprietary; JDBC, HTTP, Email, ERP; Portals, Web Application, Stand-alone Apps]
Without a priori knowledge of where systems are, how to access them, and how to query them, users find it difficult to get all the information that they need
Providing Business Context to Search
The key element to search is to provide search results relevant to the given business context
While a consumer might make a request in his/her business context, the data providers may interpret that request in their own divergent business context
[Figure: Org. A and Org. B, each with its own Apps, Data Format, Data, and Service Interface. Request: “I need a tank…” Responses: “I have scuba tanks”, “I have gas tanks”]
Facets of an effective Data Services solution
“Web 2.0” technologies provide enhanced collaboration and spark community-building activities
HousingMaps = Google Maps + Craigslist.com
JobMaps = Google Maps + Indeed Job Search
Mashups are a great example of re-purposing data, but they are still point-to-point and require a lot of redundant developer effort to create each one
Lessons Learned from Social Software techniques
Leverage industry strengths: Use technologies and standards that are well supported by commercial and open-source tools in order to facilitate greater adoption
Greatest common factor approach: Develop solutions that meet the requirements of the widest base of users, including those that may be technologically limited or resource constrained
Evolve with the community: Develop solutions that are flexible and adaptable enough to change over time and incorporate community feedback and contributions
Keep it Simple: While Data Services solutions may perform very complicated processes in the back end, keep the front-end interfaces as simple and easy to work with as possible
The importance of Metadata
The main purpose of metadata, or data about data, is to speed up and enrich searching for resources
“What data services have information on recent financial filings?”
“Which data services are associated with HR data within an enterprise taxonomy?”
Types of Metadata
[Figure: the three types are arranged along an axis labeled Expressiveness]
Metadata Type: Syntactic
Description: Describes the physical, syntactic markup of individual data elements (formatting, field markers)
Examples: Datatype, Field Length, Field Name, Tag Names, Flat File Markers
Metadata Type: Structural
Description: Describes the logical grouping of individual data elements (i.e. entity-attribute groupings)
Examples: Logical schema definitions (PersonRecord: PersonName, PersonSSN, PersonDOB)
Metadata Type: Semantic
Description: Describes the codified meaning of data elements and their relationships, including any rules or constraints on those relationships
Examples: Person was-born on PersonDOB, and was-born once and only once
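As a rough illustration, the three metadata types can be written out for the PersonRecord example. The sketch below is in Python; the dictionary layout, the datatypes and field lengths, and the helper `violations` are assumptions made for illustration and do not come from any standard.

```python
# Hypothetical illustration of the three metadata types for a PersonRecord.

# Syntactic metadata: physical markup of individual data elements.
# The datatypes and lengths are assumed values.
SYNTACTIC = {
    "PersonName": {"datatype": "string", "length": 64},
    "PersonSSN": {"datatype": "string", "length": 11},
    "PersonDOB": {"datatype": "date", "length": 10},
}

# Structural metadata: logical grouping of data elements into an entity.
STRUCTURAL = {"PersonRecord": ["PersonName", "PersonSSN", "PersonDOB"]}

# Semantic metadata: meaning plus constraints on relationships, encoded here
# as (subject, predicate, object, min occurrences, max occurrences).
SEMANTIC = [("Person", "was-born", "PersonDOB", 1, 1)]


def violations(facts, rules):
    """Return (person, predicate, count) for each cardinality rule broken.

    `facts` is a list of (person, predicate, value) triples.
    """
    broken = []
    people = sorted({person for person, _pred, _value in facts})
    for _subject, predicate, _obj, lo, hi in rules:
        for person in people:
            n = sum(1 for p, pred, _v in facts
                    if p == person and pred == predicate)
            if not lo <= n <= hi:
                broken.append((person, predicate, n))
    return broken


facts_ok = [("alice", "was-born", "1970-01-01")]
facts_bad = facts_ok + [("alice", "was-born", "1971-02-02")]
print(violations(facts_ok, SEMANTIC))   # []
print(violations(facts_bad, SEMANTIC))  # [('alice', 'was-born', 2)]
```

Only the semantic layer can reject the second set of facts: syntactic and structural metadata say how a date of birth is stored and where it belongs, not that a person has exactly one.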
The Need for Data Discovery
Data Discovery provides service consumer agents with a common facility to distribute a search for relevant information across data assets within the enterprise, including those that are known a priori and those that are unexpected
Data Discovery exposes the essential metadata of a data resource (e.g. id, title, summary), not the data resource itself
Potential usage scenarios:
A consumer can “subscribe” to a Data Discovery service to automatically receive streams of information about topics he/she is interested in from a variety of data providers he/she may or may not know about
Data providers, both small and large, can more directly advertise their information to interested service consumer agents that they may or may not know about
An analyst may request more metadata about a data resource before accessing it
Example Data Discovery Scenario
[Figure: Search Aggregator, Service Discovery (UDDI), and Search Services #1 (DB), #2 (XML), #3 (Video), #4 (Images), with arrows numbered 1 to 5]
1. Consumer makes discovery request
2. Search Aggregator queries Service Discovery for relevant Search Services
3. Search Aggregator distributes request to relevant Search Services
4. Search Aggregator aggregates search results
5. Search Aggregator returns all search results
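The five steps can be sketched in Python as follows. This is a minimal in-memory sketch, not an implementation of UDDI or of any particular search protocol: the classes `ResourceSummary`, `SearchService`, `ServiceDiscovery`, and `SearchAggregator`, the topic-based lookup, and the sample catalog entries are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceSummary:
    """Essential metadata only (id, title, summary), not the resource itself."""
    id: str
    title: str
    summary: str


class SearchService:
    """Hypothetical Search Service fronting one data asset."""

    def __init__(self, name, topics, catalog):
        self.name = name
        self.topics = set(topics)
        self.catalog = list(catalog)

    def search(self, keyword):
        kw = keyword.lower()
        return [r for r in self.catalog
                if kw in r.title.lower() or kw in r.summary.lower()]


class ServiceDiscovery:
    """Stand-in for a registry such as UDDI: maps a topic to services."""

    def __init__(self, services):
        self.services = list(services)

    def find(self, topic):
        return [s for s in self.services if topic in s.topics]


class SearchAggregator:
    def __init__(self, registry):
        self.registry = registry

    def discover(self, topic, keyword):
        # Step 2: ask Service Discovery for relevant Search Services.
        services = self.registry.find(topic)
        results = []
        for service in services:
            # Step 3: distribute the request; step 4: aggregate the results.
            results.extend(service.search(keyword))
        # Step 5: return all search results to the consumer.
        return results


db = SearchService("SearchService #1", {"finance"}, [
    ResourceSummary("db-1", "Q3 financial filings", "Quarterly filings table"),
])
xml = SearchService("SearchService #2", {"finance", "hr"}, [
    ResourceSummary("xml-7", "HR roster", "Staff list"),
    ResourceSummary("xml-9", "Annual filings", "Financial filings in XML"),
])
video = SearchService("SearchService #3", {"training"}, [
    ResourceSummary("vid-2", "Filings tutorial", "How to file"),
])

aggregator = SearchAggregator(ServiceDiscovery([db, xml, video]))
hits = aggregator.discover("finance", "filings")  # Step 1: consumer request
print([h.id for h in hits])  # ['db-1', 'xml-9']
```

The consumer only talks to the aggregator and never needs to know which Search Services exist, which is the point of the scenario.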
The need for Data Access and Delivery
Once a data resource of interest has been identified via Data Discovery, a service consumer might want to “access” or “deliver” that data resource for further processing
Data Access and Delivery capabilities provide service consumer agents with a common facility to synchronously fetch a data resource or asynchronously route it to a pre-determined endpoint
Potential usage scenarios:
A user at his/her workstation can directly “access” a data resource for detailed inspection
A field technician on the job site can use his/her mobile device to “deliver” a data resource to his/her computer at work to analyze later
Data providers can lower the cost of integration by supporting a common data retrieval interface that is well understood throughout the local enterprise and industry
Example Data Access & Delivery Scenario
[Figure: Retrieve Service #1 (DB), Messaging Infrastructure, and Callback Interface, with arrows numbered 1, 2a, 2b, 3a, 3b]
1. Consumer makes data access request
2a. Retrieve Service returns requested information
2b. Retrieve Service forwards requested information to the Messaging Infrastructure
3a. Messaging Infrastructure routes requested information to service consumer
3b. Messaging Infrastructure routes requested information to service consumer receiver agent implementing a Callback Interface
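The difference between synchronous access (path 2a) and asynchronous delivery (paths 2b and 3) can be sketched in Python. The classes `RetrieveService` and `MessagingInfrastructure`, the endpoint name `work-pc`, and the in-process queue standing in for real messaging middleware are all assumptions made for illustration.

```python
import queue


class MessagingInfrastructure:
    """In-process stand-in for messaging middleware."""

    def __init__(self):
        self._pending = queue.Queue()
        self._callbacks = {}

    def register(self, endpoint, callback):
        # `callback` plays the role of the Callback Interface.
        self._callbacks[endpoint] = callback

    def enqueue(self, endpoint, content):
        self._pending.put((endpoint, content))

    def route_pending(self):
        # Step 3: route each queued message to its registered receiver.
        delivered = 0
        while not self._pending.empty():
            endpoint, content = self._pending.get()
            self._callbacks[endpoint](content)
            delivered += 1
        return delivered


class RetrieveService:
    """Hypothetical Retrieve Service in front of a single data store."""

    def __init__(self, store, messaging):
        self.store = store          # maps resource id -> content
        self.messaging = messaging

    def access(self, resource_id):
        # Steps 1 and 2a: synchronous fetch, content goes back to the caller.
        return self.store[resource_id]

    def deliver(self, resource_id, endpoint):
        # Steps 1 and 2b: hand the content to messaging for later routing.
        self.messaging.enqueue(endpoint, self.store[resource_id])


messaging = MessagingInfrastructure()
service = RetrieveService({"doc-1": "site survey"}, messaging)

inbox = []
messaging.register("work-pc", inbox.append)

print(service.access("doc-1"))       # site survey
service.deliver("doc-1", "work-pc")
print(inbox)                         # [] (not routed yet)
messaging.route_pending()
print(inbox)                         # ['site survey']
```

With `deliver`, the caller (for example a mobile device) never receives the content itself; it only names the endpoint that should receive it later.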
Major issues facing distributed information sharing
Must support a number of interaction models: request-response, subscribe-push, probe and match, authenticated and/or single use of data, etc.
Must support a variety of metadata and content formats: Atom, Dublin Core, images, video, PDF, Open Document, etc.
Different types of data lend themselves to being queried by different mechanisms: XML can be natively searched with XQuery; images cannot be natively searched with XQuery
Must be designed for controlled evolution: the addition of new features should not alienate current users through constant upgrades or revisions; discourage specification “lip service” by avoiding unbounded fields
An approach to realizing Data Services
Data Service Objectives
Address the need to enable enterprise-wide data discovery and aggregation across any number of service implementations while offering end users relevant information
Enable horizontal discovery, access, and consumption of data of relevance, regardless of physical location, data type, and/or technical implementation
Support a variety of messaging patterns, security and policy requirements, and data needs
Profile-Based Approach to achieving Data Services
Data Services specifications should focus on capturing the high-level process and use-case requirements (e.g. the need to search against metadata and content), rather than the low-level realizations of those features (e.g. XQuery vs. keyword search)
Abstract Data Services interface focused on defining a high-level construct to capture intended behaviors that will be implemented by pluggable profiles
Inspired by token profiles within WS-Security
Loosely coupled specification that enables service providers to add new capabilities without having to change the WSDL
Enables service providers to implement only those profiles that satisfy their specific requirements
What are the profiles we need to consider?
Context – What is the business context of the data service operation (search, retrieve)?
Ex. A set of taxonomy key-value pairs to search against a UDDI registry
Metadata – What are the metadata formats that I would like to interact with?
Ex. Dublin Core Metadata Element Set, Atom 1.0, RSS
Content – What are the content types that I would like to interact with?
Ex. PDF, Open Document, Open XML, JPEG, MPEG2
Query – Given the type of metadata and/or content, how would I like to query for information?
Ex. Keyword search, XQuery request, SPARQL requests
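One way to picture the four profile categories is as slots on a request, checked against what a provider has chosen to implement. The Python sketch below assumes this shape: `DataServiceRequest`, `ProviderCapabilities`, and the profile identifier strings such as `"DublinCore"` are hypothetical names, not taken from any specification.

```python
from dataclasses import dataclass, field


@dataclass
class DataServiceRequest:
    """Hypothetical request naming one profile per category."""
    context: dict            # e.g. taxonomy key-value pairs
    metadata_profile: str    # e.g. "DublinCore", "Atom1.0"
    content_profile: str     # e.g. "PDF", "JPEG"
    query_profile: str       # e.g. "Keyword", "XQuery", "SPARQL"
    query: str = ""


@dataclass
class ProviderCapabilities:
    """Profiles a provider has chosen to implement."""
    metadata: set = field(default_factory=set)
    content: set = field(default_factory=set)
    query: set = field(default_factory=set)

    def unsupported(self, request):
        """List the (category, profile) pairs this provider lacks."""
        wanted = [
            ("metadata", request.metadata_profile, self.metadata),
            ("content", request.content_profile, self.content),
            ("query", request.query_profile, self.query),
        ]
        return [(kind, name) for kind, name, have in wanted
                if name not in have]


provider = ProviderCapabilities(
    metadata={"DublinCore", "Atom1.0"},
    content={"PDF"},
    query={"Keyword"},
)
request = DataServiceRequest(
    context={"department": "HR"},
    metadata_profile="DublinCore",
    content_profile="PDF",
    query_profile="SPARQL",
    query="SELECT ?s WHERE { ?s ?p ?o }",
)
print(provider.unsupported(request))  # [('query', 'SPARQL')]
```

A provider that implements only keyword search can still take part; it simply reports which profiles it cannot honor instead of failing on an unknown request shape.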
Data Services Request
The combination of different “profiles” can have measurable impact
While “CriminalMetadata”, “MugShotContent”, “CriminalQL” and “ImageMatch” do not exist today, if they are introduced in the future it should not significantly alter the way we process requests for information
Metadata Profile: CriminalMetadata
Query Profile: CriminalQL
Find Where sex = “male” and race = “white” and height >= “5-09” and height <= “5-10”
Content Profile: MugShotContent
Query Profile: ImageMatch
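The claim that new profiles should not alter request processing can be sketched with a handler registry in Python. Everything here is hypothetical: the registry `QUERY_PROFILES`, the `query_profile` decorator, the dictionary form of the query, and the record fields. "CriminalQL" is the made-up profile named on this slide, not a real query language.

```python
QUERY_PROFILES = {}


def query_profile(name):
    """Decorator that registers a handler under a profile name."""
    def register(handler):
        QUERY_PROFILES[name] = handler
        return handler
    return register


def process(profile_name, query, records):
    """Generic request processing; it does not change as profiles are added."""
    try:
        handler = QUERY_PROFILES[profile_name]
    except KeyError:
        raise ValueError(f"unsupported query profile: {profile_name}") from None
    return handler(query, records)


@query_profile("Keyword")
def keyword_search(query, records):
    kw = query.lower()
    return [r for r in records
            if any(kw in str(v).lower() for v in r.values())]


def to_inches(height):
    """Convert a 'feet-inches' string such as '5-09' to inches."""
    feet, inches = height.split("-")
    return int(feet) * 12 + int(inches)


# Registered later, without touching process().
@query_profile("CriminalQL")
def criminal_ql(query, records):
    lo = to_inches(query["min_height"])
    hi = to_inches(query["max_height"])
    return [r for r in records
            if r["sex"] == query["sex"]
            and r["race"] == query["race"]
            and lo <= to_inches(r["height"]) <= hi]


records = [
    {"id": "r1", "sex": "male", "race": "white", "height": "5-09"},
    {"id": "r2", "sex": "male", "race": "white", "height": "6-01"},
    {"id": "r3", "sex": "female", "race": "white", "height": "5-10"},
]
query = {"sex": "male", "race": "white",
         "min_height": "5-09", "max_height": "5-10"}
print([r["id"] for r in process("CriminalQL", query, records)])  # ['r1']
```

An image-matching profile would be added the same way, as one more registered handler, leaving `process` untouched.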
Encouraging collaboration with REST and/or SOAP
SOAP is a protocol specification that defines a uniform way of passing XML-encoded data that abstracts the physical transport layer.
Representational State Transfer (REST) is a set of architectural principles; the term is often used loosely to describe any simple interface that uses XML over HTTP without an additional messaging layer such as SOAP
SOAP and REST are two different approaches that serve different needs
In many areas the provided functionality overlaps and causes a bit of contention
The two approaches, if used properly, can be complementary and will help to meet the overall data services needs
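For a sense of what the SOAP messaging layer adds, the Python sketch below wraps a search request in a SOAP 1.1 envelope using only the standard library. The envelope namespace is the SOAP 1.1 one; the `urn:example:data-services` namespace and the `Search` and `Keyword` elements are hypothetical and do not belong to any published Data Services schema.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
DS_NS = "urn:example:data-services"                    # hypothetical namespace


def build_search_envelope(keyword):
    """Return a SOAP 1.1 envelope carrying a hypothetical Search request."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    search = ET.SubElement(body, f"{{{DS_NS}}}Search")
    ET.SubElement(search, f"{{{DS_NS}}}Keyword").text = keyword
    return ET.tostring(envelope, encoding="unicode")


message = build_search_envelope("filings")
print(message)
```

In the loosely REST-style alternative described above, the same request would be a plain HTTP exchange with no envelope; the envelope is what lets standards such as WS-Security and WS-Addressing attach headers to the message.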
RESTful feeds may be appropriate for disparate content subscriptions
Source: RSS--Promising Technology for Building Customer Relationships (http://www.mediathink.com/rss/rss_marketers2.asp)
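A feed consumer can be very small. The Python sketch below parses an Atom 1.0 feed with the standard library and extracts the same essential metadata (id, title, summary) that Data Discovery exposes. The sample feed, its URNs, and its dates are made up for illustration; a real consumer would first fetch the feed over HTTP.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 namespace

# Made-up sample feed; in practice this text would come from an HTTP GET.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Financial filings</title>
  <id>urn:example:feed:filings</id>
  <updated>2007-01-01T00:00:00Z</updated>
  <entry>
    <id>urn:example:filing:1</id>
    <title>Q3 filing</title>
    <updated>2007-01-01T00:00:00Z</updated>
    <summary>Quarterly report summary</summary>
  </entry>
  <entry>
    <id>urn:example:filing:2</id>
    <title>Annual filing</title>
    <updated>2007-01-02T00:00:00Z</updated>
  </entry>
</feed>"""


def entries(feed_xml):
    """Return id, title, and summary for each entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "id": entry.findtext(ATOM + "id"),
            "title": entry.findtext(ATOM + "title"),
            # summary is optional in Atom, so fall back to an empty string.
            "summary": entry.findtext(ATOM + "summary", default=""),
        }
        for entry in root.findall(ATOM + "entry")
    ]


for item in entries(SAMPLE_FEED):
    print(item["id"], "|", item["title"], "|", item["summary"])
```

Because every provider's feed has the same shape, one parser serves many unrelated subscriptions.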
SOAP-based messages are better suited for complex requests and messaging patterns
[Figure labels: Retrieve Service #1 (DB), Retrieve Service #2 (XML), Retrieve Service #3 (Video), Retrieve Service #4 (Images), Subscribe Service, Callback Interface; arrows: Subscribe, Notify, Scheduled Pull]
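The subscribe, scheduled pull, and notify pattern in the diagram can be sketched in Python. This is an in-memory sketch under stated assumptions: the class `SubscribeService`, the `(id, topic, title)` item tuples, and the topic-matching rule are hypothetical, and the messaging standards that would carry Subscribe and Notify in a SOAP deployment are not modeled.

```python
class SubscribeService:
    """Hypothetical Subscribe Service that polls Retrieve Services."""

    def __init__(self, sources):
        # Maps a source name to a callable returning (id, topic, title) items.
        self.sources = sources
        self.subscriptions = []   # (topic, callback) pairs
        self.seen = set()         # (source name, item id) already handled

    def subscribe(self, topic, callback):
        # `callback` plays the role of the consumer's Callback Interface.
        self.subscriptions.append((topic, callback))

    def scheduled_pull(self):
        """Run one polling cycle; a deployment would call this on a timer."""
        notified = 0
        for name, pull in self.sources.items():
            for item_id, topic, title in pull():
                key = (name, item_id)
                if key in self.seen:
                    continue
                self.seen.add(key)
                for wanted, callback in self.subscriptions:
                    if wanted == topic:
                        callback(title)   # Notify
                        notified += 1
        return notified


feed = [("a1", "finance", "Q3 filing posted")]
service = SubscribeService({"RetrieveService #1": lambda: list(feed)})

received = []
service.subscribe("finance", received.append)

print(service.scheduled_pull())  # 1
print(service.scheduled_pull())  # 0 (nothing new since the last pull)
feed.append(("a2", "hr", "New roster"))
print(service.scheduled_pull())  # 0 (no subscriber for "hr")
print(received)                  # ['Q3 filing posted']
```

The consumer subscribes once and is notified as content appears, while the polling of each Retrieve Service stays hidden behind the Subscribe Service.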
Supporting standards that may help to advance Data Services initiatives
There is no existing set of standards that fully supports the functionality of a complete Data Services solution
Need: Standard(s)
Service Registry: UDDI v3, ebXML Registry
Security/Policy Concerns: WS-Security, SAML 2.0, XACML, WS-Policy
Notifications and Eventing: WS-Notification, WS-Eventing, WS-EventNotification
Asynchronous Behavior: WS-Addressing
Reliable Messaging: WS-ReliableMessaging
Query Languages: XQuery 1.0, XPath, SPARQL
Metadata Formats: Dublin Core, Atom 1.0
Search Functionality: Z39.50
The Way Ahead
OASIS Data Services Framework Technical Committee (OASIS DSF TC)
Goals and objectives for the TC include:
Collect, analyze and document the requirements for data management and sharing in a networked environment where data services lie under different domains of ownership and stewardship
Aid architects in understanding the conceptual patterns of interaction pertaining to data oriented operations
Create an abstract specification normatively describing a framework of operations to manage and retrieve data in a services environment, across ownership and stewardship boundaries.
Describe service patterns and interactions between a provider, consumer, and other resources and entities
OASIS Data Services Framework Technical Committee (OASIS DSF TC)
Out of Scope Items:
Define a mapping of the functions and elements described in the specifications to any programming language, to any particular messaging middleware, or to specific network transports.
Define new key query algorithms, metadata specifications, or content specifications.
Define concepts or renderings for functions that are of wider applicability including but not limited to: addressing, query frameworks, routing, reliable message exchange
Summary
The need for a distributed discovery, aggregation, and access mechanism is becoming more and more important
Any Data Services solution must account for a growing number of metadata specifications, content formats, and query mechanisms
WS-Security demonstrates that a profile-based solution can meet the diverse needs of a community
OASIS Data Services Framework TC will identify and fill the gaps to achieve a complete Data Services solution
Questions and Comments