Consistent Regions in Specialized Toolkits for IBM InfoSphere Streams V4.0
Post on 07-Aug-2015
© 2015 IBM Corporation
Consistent Region in Specialized
Toolkits
IBM InfoSphere Streams 4.0
Samantha Chan
Team Lead, Streams Toolkits Team
For questions about this presentation contact: chanskw@ca.ibm.com
Important Disclaimer
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.
IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR
• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Agenda
Requirements for operators to participate in a consistent region
Compile Errors and Warnings
Specialized Toolkits Support
Three Kinds of Operators in a Consistent Region
Start Operator
– Allows the user to start a consistent region at this operator
– Operator can persist state into a checkpoint
• Internal state: state variables, windows, fields that can change over time
• External state: external models, file systems, DBs, etc.
– Operator can restore state upon reset
– Source operator that can replay tuples upon reset
Middle Operator
– Processing operator that can participate in a consistent region
– Operator can persist state into a checkpoint
• Internal state: state variables, windows, fields that can change over time
• External state: external models, file systems, DBs, etc.
– Operator can restore state upon reset
End Operator
– Represents the end of a consistent region
– Can be a sink operator with no output port
– Can be annotated as Autonomous
– Operator can persist state into a checkpoint
• Internal state: state variables, windows, fields that can change over time
• External state: external models, file systems, DBs, etc.
– Operator can restore state upon reset
– Writing duplicated tuples to external systems has no detrimental or unexpected effect
Compile Errors and Warnings
For each operator
– We determine whether the operator can participate in a consistent region
• A compile warning or error is issued if the operator cannot be part of a consistent region or cannot be the start of a region
CDISP9163W WARNING: The following operator is not supported in a
consistent region:
com.ibm.streams.timeseries.modeling::AutoForecaster2. The operator
does not checkpoint or reset its internal state. If an application
failure occurs, the operator might produce unexpected results even
if it is part of a consistent region.
Adding Consistent Region Support in Operator
For operators that can participate in a consistent region, we did the following:
– Become a StateHandler – the operator is called by the runtime to drain -> checkpoint -> reset
– Drain – called before checkpoint. Empties all internal buffers and submits any pending tuples
– Checkpoint – persists the operator's internal state
– Reset – upon application failure, resets the operator's internal state to the checkpointed state
– Reset to Initial – called if an application failure is detected before the first checkpoint can be taken
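The callback sequence above can be sketched with a toy, self-contained class. This is not the Streams API: `CountingOperator`, its method names, and the plain dict used as a checkpoint store are hypothetical stand-ins chosen only to show the order drain, checkpoint, reset.

```python
import copy


class CountingOperator:
    """Toy operator (hypothetical, not the Streams API): counts tuples, buffers output."""

    def __init__(self):
        self.count = 0        # internal state that changes over time
        self.pending = []     # internal buffer of tuples not yet submitted
        self.submitted = []   # stands in for the downstream output port

    def process(self, tup):
        self.count += 1
        self.pending.append((self.count, tup))

    def drain(self):
        # Called before checkpoint: empty the buffer and submit pending tuples.
        self.submitted.extend(self.pending)
        self.pending.clear()

    def checkpoint(self, store, seq_id):
        # Persist internal state under the checkpoint sequence id.
        store[seq_id] = copy.deepcopy({"count": self.count})

    def reset(self, store, seq_id):
        # After a failure: discard in-flight work and restore checkpointed state.
        self.pending.clear()
        self.count = store[seq_id]["count"]

    def reset_to_initial(self):
        # Failure before the first checkpoint: return to the starting state.
        self.pending.clear()
        self.count = 0


store = {}                    # assumed in-memory checkpoint store
op = CountingOperator()
op.process("a")
op.process("b")
op.drain()
op.checkpoint(store, seq_id=1)
op.process("c")               # work done after the checkpoint
op.reset(store, seq_id=1)     # simulated failure: roll back to checkpoint 1
print(op.count)               # prints 2
```

The tuple `"c"` is discarded by the reset; in a real consistent region the start operator would replay it.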
Toolkits Support of Consistent Region
Consistent Region Support is added to the following toolkits:
– Cep
– Data Explorer
– DB
– RProject
– Rules
– Text
– HBase
– HDFS
– Messaging
Consistent Region is not supported by the following toolkits:
– Geospatial – plan to enable consistent region in a future release
– Timeseries – plan to enable consistent region in a future release
– Financial
– Mining
– Inet
Consistent Region Changes for Specialized Toolkits
For details on consistent region behavior for an operator, refer to its
SPLDoc
R Toolkit
Rscript operator
– Spawns a new process for an R session
– Executes the R script
– Parses output from the R session and submits it as tuples
Rscript can participate in a consistent region, but cannot be a start operator.
The operator itself has no state, but the R environment that executes the R scripts has state that can change during the lifetime of the operator.
Checkpoint – saves the R environment to a file in the data directory
Reset – calls R to restore the R environment from the file
Files are deleted when checkpoints are retired.
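The file-per-checkpoint pattern described for Rscript can be illustrated as follows. This is only a sketch under stated assumptions: a Python dict stands in for the R environment, `pickle` stands in for R's own save and restore mechanism, and `EnvCheckpointer` and the `env_<id>.ckpt` naming are hypothetical.

```python
import os
import pickle
import tempfile


class EnvCheckpointer:
    """Hypothetical sketch: one file per checkpoint, deleted on retirement."""

    def __init__(self, data_dir):
        self.data_dir = data_dir

    def _path(self, seq_id):
        return os.path.join(self.data_dir, f"env_{seq_id}.ckpt")

    def checkpoint(self, env, seq_id):
        # Save the whole environment to a file in the data directory.
        with open(self._path(seq_id), "wb") as f:
            pickle.dump(env, f)

    def reset(self, seq_id):
        # Restore the environment from the file written at that checkpoint.
        with open(self._path(seq_id), "rb") as f:
            return pickle.load(f)

    def retire(self, seq_id):
        # A retired checkpoint is no longer needed, so its file is removed.
        path = self._path(seq_id)
        if os.path.exists(path):
            os.remove(path)


data_dir = tempfile.mkdtemp()          # stand-in for the data directory
ckpt = EnvCheckpointer(data_dir)
env = {"model_coef": [0.5, 1.5], "rows_seen": 10}
ckpt.checkpoint(env, seq_id=1)
env["rows_seen"] = 25                  # state changes after the checkpoint
env = ckpt.reset(seq_id=1)             # failure: restore the saved environment
ckpt.checkpoint(env, seq_id=2)
ckpt.retire(seq_id=1)                  # older checkpoint retired, file deleted
print(env["rows_seen"])                # prints 10
print(sorted(os.listdir(data_dir)))    # prints ['env_2.ckpt']
```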
Messaging Toolkit
Supported in a consistent region:
– JMSSink
– KafkaProducer
– MQTTSink
– XMSSink
These operators can participate in a consistent region, but cannot be the start of a consistent region.
MQTTSink
– The control input port is not supported in a consistent region
– Messages with QoS=1 or QoS=2 will be delivered to an MQTT provider at least once
– Messages with QoS=0 can still be lost, as messages can be lost in transit
Not allowed in a consistent region:
– JMSSource
– KafkaConsumer
– MQTTSource
– XMSSource
To enable a consistent region, use the ReplayableStart operator.
HDFS Toolkit
Toolkit is supported in a consistent region.
HDFS2DirectoryScan
– Scans a directory in HDFS and submits filenames as output
Consistent region behavior
– Can be the start operator of a consistent region if there is no input port
– Drain – does nothing; the operator has no internal buffer
– Checkpoint – saves the last submitted filename and its modification timestamp
– Reset – restores the last submitted filename and modification timestamp
– When processing resumes:
• Finds all files on the file system
• Submits only filenames that have not been submitted since the checkpoint
• This algorithm allows the operator to support exactly-once processing
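One way to read the resume algorithm above is sketched below. It assumes files are submitted in (modification timestamp, filename) order, so the checkpointed pair marks a position in that ordering; the slides do not spell out the ordering, and `DirectoryScanner` with its in-memory listing is a hypothetical stand-in for the real operator and HDFS.

```python
class DirectoryScanner:
    """Hypothetical sketch of a replayable directory scan."""

    def __init__(self):
        self.last = None  # (mtime, name) of the last submitted file

    def scan(self, listing):
        """listing maps filename -> modification timestamp; returns names to submit."""
        out = []
        ordered = sorted(listing.items(), key=lambda kv: (kv[1], kv[0]))
        for name, mtime in ordered:
            # Skip anything at or before the last submitted position.
            if self.last is None or (mtime, name) > self.last:
                out.append(name)
                self.last = (mtime, name)
        return out

    def checkpoint(self):
        # Save the last submitted filename and its modification timestamp.
        return self.last

    def reset(self, saved):
        # Restore the saved position; later scans resume from it.
        self.last = saved


scanner = DirectoryScanner()
listing = {"a.txt": 100, "b.txt": 200}
first = scanner.scan(listing)        # ['a.txt', 'b.txt']
saved = scanner.checkpoint()
listing["c.txt"] = 300
lost = scanner.scan(listing)         # ['c.txt'] submitted, then a failure occurs
scanner.reset(saved)
replayed = scanner.scan(listing)     # finds all files, resubmits only c.txt
print(replayed)                      # prints ['c.txt']
```

Files covered by the checkpoint (`a.txt`, `b.txt`) are never submitted again, while `c.txt`, whose downstream processing was rolled back, is replayed.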
HDFS2FileSource
– Reads file content from HDFS and submits it as output
Consistent region behavior:
• Can be the start operator if it does not have an input port.
• Supports both operator-driven and periodic policies.
• If operator-driven, a checkpoint is established after the file is fully read.
• Drain – the operator flushes its internal buffer.
• Checkpoint – the operator saves the current filename and file cursor location.
• Reset – the operator resets the cursor location and starts reading again when processing is resumed.
• This allows the operator to support exactly-once processing, as content submitted before the checkpoint will not be sent again.
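The cursor-based replay described above can be illustrated with a local file. This is a minimal sketch, not the operator's implementation: `FileSource` is hypothetical, a local temporary file stands in for HDFS, and the byte offset returned by `tell()` stands in for the file cursor location.

```python
import os
import tempfile


class FileSource:
    """Hypothetical sketch of a source that checkpoints filename and cursor."""

    def __init__(self, path):
        self.path = path
        self.offset = 0   # byte offset of the next unread line

    def read_lines(self, max_lines):
        out = []
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            for _ in range(max_lines):
                line = f.readline()
                if not line:
                    break
                out.append(line.decode().rstrip("\n"))
            self.offset = f.tell()
        return out

    def checkpoint(self):
        # Save the current filename and cursor location.
        return {"path": self.path, "offset": self.offset}

    def reset(self, saved):
        # Move the cursor back; reading resumes from the checkpointed position.
        self.path = saved["path"]
        self.offset = saved["offset"]


path = os.path.join(tempfile.mkdtemp(), "input.txt")
with open(path, "wb") as f:
    f.write(b"t1\nt2\nt3\nt4\n")

src = FileSource(path)
first = src.read_lines(2)        # ['t1', 't2'] submitted before the checkpoint
saved = src.checkpoint()
lost = src.read_lines(1)         # ['t3'] submitted, then a failure occurs
src.reset(saved)
replayed = src.read_lines(10)    # ['t3', 't4']
print(first + replayed)          # prints ['t1', 't2', 't3', 't4']
```

Content read before the checkpoint is not sent again after the reset, which is the property the slide relies on for exactly-once processing.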
HDFS2FileSink
– Writes data to files in HDFS
Consistent region behavior:
– HDFS must be configured with APPEND enabled
– Cannot support exactly-once processing because HDFS does not support random writes
– Drain – flushes any internal buffer and writes content to the file system. The operator also forces a flush of content from the HDFS client.
– Checkpoint – saves the current filename, file size, tuple count, file number, etc. to the checkpoint
– Reset – the operator closes the current file and resets its counters and properties. It regenerates the filename and opens the file in APPEND mode.
– When processing resumes, content is appended to the end of the file that was reset to.
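Why append-only writes give at-least-once rather than exactly-once output can be seen in a small simulation. It is a sketch under assumptions: `AppendSink` is hypothetical, a local file opened in append mode stands in for HDFS, and the sink never truncates or rewrites the file, mirroring the constraint described above.

```python
import os
import tempfile


class AppendSink:
    """Hypothetical sketch of a sink that can only append to its file."""

    def __init__(self, path):
        self.path = path
        self.buffer = []       # internal buffer of unwritten lines
        self.tuple_count = 0

    def write(self, line):
        self.buffer.append(line)
        self.tuple_count += 1

    def flush(self):
        # Append buffered lines to the end of the file.
        with open(self.path, "a") as f:
            for line in self.buffer:
                f.write(line + "\n")
        self.buffer.clear()

    def drain(self):
        self.flush()

    def checkpoint(self):
        return {"path": self.path, "tuple_count": self.tuple_count}

    def reset(self, saved):
        # Counters are restored, but bytes already in the file stay there.
        self.buffer.clear()
        self.path = saved["path"]
        self.tuple_count = saved["tuple_count"]


path = os.path.join(tempfile.mkdtemp(), "out.txt")
sink = AppendSink(path)
sink.write("t1")
sink.write("t2")
sink.drain()
saved = sink.checkpoint()
sink.write("t3")
sink.flush()                   # t3 reaches the file before the failure
sink.reset(saved)              # failure: roll back to the checkpoint
for tup in ["t3", "t4"]:       # upstream replays tuples after the checkpoint
    sink.write(tup)
sink.drain()
with open(path) as f:
    lines = f.read().splitlines()
print(lines)                   # prints ['t1', 't2', 't3', 't3', 't4']
```

The duplicated `t3` is the visible cost of not being able to rewind the file to its checkpointed size.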
Questions?