TRANSCRIPT
© 2012 IBM Corporation
Streams – DataStage Integration
InfoSphere Streams Version 3.0
Mike Koranda, Release Architect
Agenda
What is InfoSphere Information Server and DataStage?
Integration use cases
Architecture of the integration solution
Tooling
Information Integration Vision

Transform Enterprise Business Processes & Applications with Trusted Information
Deliver Trusted Information for Data Warehousing and Business Analytics
Build and Manage a Single View
Integrate & Govern Big Data
Make Enterprise Applications more Efficient
Consolidate and Retire Applications
Secure Enterprise Data & Ensure Compliance

Address information integration in the context of a broad and changing environment
Simplify & accelerate: design once and leverage anywhere
IBM Comprehensive Vision

Traditional approach: structured, analytical, logical
– Traditional sources (structured, repeatable, linear): transaction data, ERP data, mainframe data, OLTP system data, internal app data, feeding the Data Warehouse

New approach: creative, holistic thought, intuition
– New sources (unstructured, exploratory, iterative): web logs, social data, text & images, sensor data, RFID, feeding Hadoop and Streams

[Diagram: Information Integration & Governance spanning both the traditional sources/Data Warehouse side and the new sources/Hadoop/Streams side]
IBM InfoSphere DataStage
Industry-leading data integration for the enterprise: simple to design, powerful to deploy
Rich capabilities spanning six critical dimensions:

Developer Productivity: rich user interface features that simplify the design process and metadata management requirements
Transformation Components: extensive set of pre-built objects that act on data to satisfy both simple and complex data integration tasks
Connectivity Objects: native access to common industry databases and applications, exploiting key features of each
Runtime Scalability & Flexibility: performant engine providing unlimited scalability across all objects and tasks, in both batch and real time
Operational Management: simple management of the operational environment, lending analytics for understanding and investigation
Enterprise-Class Administration: intuitive and robust features for installation, maintenance, and configuration
Runtime Integration High Level View
[Diagram: a DataStage job containing a Streams Connector communicates over TCP/IP with a Streams job containing a DSSource or DSSink operator]

DSSource and DSSink are composite operators that wrap the existing TCPSource/TCPSink operators.
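The TCP/IP link above can be sketched as a simple framed exchange between the connector side and the operator side. The 4-byte length-prefixed framing below is purely an illustrative assumption; the real TCPSource/TCPSink binary format is not reproduced here.

```python
# A minimal sketch of the TCP/IP link between the DataStage Streams
# Connector and a DSSource/DSSink operator. The length-prefixed framing
# is an illustrative assumption, not the actual wire format.
import socket
import struct
import threading

def send_record(sock, payload):
    # Frame each record as a big-endian 4-byte length followed by the bytes.
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_record(sock):
    header = b""
    while len(header) < 4:
        header += sock.recv(4 - len(header))
    (length,) = struct.unpack(">I", header)
    data = b""
    while len(data) < length:
        data += sock.recv(length - len(data))
    return data

# Loopback demonstration standing in for the connector/operator pair.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

def operator_side():
    conn, _ = server.accept()
    send_record(conn, b"This is single byte chars")
    conn.close()

t = threading.Thread(target=operator_side)
t.start()
connector = socket.socket()
connector.connect(("127.0.0.1", port))
received = recv_record(connector).decode()
t.join()
connector.close()
server.close()
print(received)
```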
Streams Application (SPL)
use com.ibm.streams.etl.datastage.adapters::*;

composite SendStrings {
    type
        RecordSchema = rstring a, ustring b;
    graph
        stream<RecordSchema> Data = Beacon() {
            param
                iterations : 100u;
                initDelay  : 1.0;
            output
                Data : a = "This is single byte chars"r,
                       b = "This is unicode"u;
        }
        () as Sink = DSSink(Data) {
            param
                name : "SendStrings";
        }
    config
        applicationScope : "MyDataStage";
}
• When the job starts, the DSSink/DSSource stage registers its name with the SWS nameserver
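The registration step above can be sketched with a toy in-memory registry: the stage registers its connection name, qualified by application scope, so the DataStage connector can later resolve it to a host and port. The real SWS nameserver is an HTTPS/REST service; the host and port values below are illustrative assumptions.

```python
# Toy in-memory stand-in for the SWS nameserver described above.
# Keys are (application scope, connection name); values are endpoints.
registry = {}

def register(scope, name, host, port):
    registry[(scope, name)] = (host, port)

def lookup(scope, name):
    return registry.get((scope, name))

# The DSSink from the SPL example would register under its applicationScope.
register("MyDataStage", "SendStrings", "streams-host.example.com", 31415)
endpoint = lookup("MyDataStage", "SendStrings")
print(endpoint)
```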
DataStage Job
User adds a Streams Connector and configures properties and columns
DataStage Streams Runtime Connector

Uses nameserver lookup to establish the connection ("name" + "application scope") via HTTPS/REST
Uses the TCPSource/TCPSink binary format
Performs initial handshaking to verify the metadata
Supports runtime column propagation
Retries the connection (both initial and in-process)
Supports all Streams types
– Collection types (list, set, map) are represented as a single XML column
– Nested tuples are flattened
Schema reconciliation options (unmatched columns, RCP, etc.)
Wave-to-punctuation mapping on input and output
Null value mapping
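The collection-to-XML mapping noted above can be sketched as follows: a Streams collection attribute (here a Python list standing in for an SPL list) is serialized into a single XML column value. The element names "list" and "item" are illustrative assumptions, not the connector's actual XML layout.

```python
# Sketch of representing a collection-typed attribute as one XML column.
# Element names are assumptions made for illustration only.
import xml.etree.ElementTree as ET

def collection_to_xml_column(values):
    root = ET.Element("list")
    for v in values:
        ET.SubElement(root, "item").text = str(v)
    return ET.tostring(root, encoding="unicode")

xml_col = collection_to_xml_column([1, 2, 3])
print(xml_col)
```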
Tooling Scenarios
User creates both the DataStage job and the Streams application from scratch
– Create the DataStage job in IBM InfoSphere DataStage and QualityStage Designer
– Create the Streams application in Streams Studio
User wishes to add Streams analysis to existing DataStage jobs
– From Streams Studio, create a Streams application from DataStage metadata
User wishes to add DataStage processing to an existing Streams application
– From Streams Studio, create an Endpoint Definition File and import it into DataStage
Streams to DataStage Import
1. On the Streams side, the user runs the ‘generate-ds-endpoint-defs’ command to generate an ‘Endpoint Definition File’ (EDF) from one or more ADL files
2. The user transfers the file to the DataStage domain or client machine
3. The user runs the new Streams importer in IMAM to import the EDF into the StreamsEndPoint model
4. The job designer selects endpoint metadata from the stage; the connection name and columns are populated accordingly
[Diagram: ADL files are fed to the Streams command line or Studio menu, which produces the EDF; the EDF is transferred via FTP and imported through IMAM into Xmeta]
DataStage to Streams Import
1. On the Streams side, the user runs the ‘generate-ds-spl-code’ command to generate a template application from a DataStage job definition
2. The command uses a Java API that uses REST to query DataStage jobs in the repository
3. The tool provides commands to identify jobs that use the Streams Connector and to extract the connection name and column information
4. The generated template includes a DSSink or DSSource stage with tuples defined according to the DataStage link definition
[Diagram: the Streams command line or Studio menu uses a Java API over HTTP/REST to query Xmeta and generates SPL template code]
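Steps 2 and 3 above can be sketched as follows: given job definitions (here a JSON payload standing in for what a REST query against the repository might return), find the jobs that use the Streams Connector and extract the connection name and column information. The JSON shape is an illustrative assumption, not the actual Information Server REST payload.

```python
# Sketch of filtering repository job definitions for Streams Connector
# usage. The payload structure and field names are assumptions made for
# illustration; the real REST response differs.
import json

SAMPLE_JOBS = json.loads("""
[
  {"job": "LoadWarehouse",
   "stages": [{"type": "OracleConnector"}]},
  {"job": "ScoreEvents",
   "stages": [{"type": "StreamsConnector",
               "connectionName": "SendStrings",
               "columns": ["a", "b"]}]}
]
""")

def streams_connector_jobs(jobs):
    found = []
    for job in jobs:
        for stage in job["stages"]:
            if stage["type"] == "StreamsConnector":
                found.append((job["job"], stage["connectionName"], stage["columns"]))
    return found

matches = streams_connector_jobs(SAMPLE_JOBS)
print(matches)
```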