Telecommunications Event Data Analytics for IBM InfoSphere Streams V4.0
© 2015 IBM Corporation
Telecommunications Event Data Analytics
IBM InfoSphere Streams Version 4.0
Mark-Oliver Heger, Paul Zollna
IBM Research & Development
For questions about this presentation contact
Mark-Oliver Heger [email protected]
Paul Zollna [email protected]
Important Disclaimer
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.
IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR
• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Agenda
Toolkit Overview
Project setup wizard installation
Sample application demonstration
Mediation use case demonstration (ASN.1 to CSV)
High-Level Overview
The Telecommunications Event Data Analytics toolkit provides
a set of generic operators that are used in telecommunications applications
an application framework that enables you to set up new file-to-file applications
Connect to various input sources and downstream applications
Speed up custom implementations and reduce development & test efforts
[Diagram: toolkit components]
Utility Functions and Operators
DB Loader *
File / Directory Operators
Application Framework
Lookup & Enrichment Data
Input Data Files
Output Data Files
NFS, GPFS, HDFS
DB Loader
NFS, GPFS
Files with Call Detail Records in mounted directories
Priority Handling Queue
Data Source Adapters
Target System Adapters
Parser: ASN.1, Structure, CSV
File Writer
(S)FTP Operators *
Metadata checkpoints
Operator GUI
Setup Wizard
Cheat Sheets
* Release via GitHub
High-Level Overview (2)
Applications are based on code templates
The applications support
Customization
Configurable parallel processing
Graceful application shutdown
Reliable file processing
Revenue Assurance and Business Intelligence applications
Location based services
Campaign management
User experience, user behavior & statistics
Network and services usage
Fraud detection
Use cases
Toolkit structure
Operators and functions
Parser operators: ASN1Parse, CSVParse, StructureParse
Utility operators & functions: BloomFilter, ExceptionCatcher, ScheduledBeacon, createDirectory(), rename()
CSVParse
parses an input line with comma-separated values and assigns the fields to output tuple attributes
StructureParse
parses a binary data stream that contains fixed-length binary data structures, extracts the specified data fields, and sends the fields as tuples to downstream operators
ASN1Parse
parses a binary data stream that contains ASN.1-encoded data, extracts parts of the data, and sends the data as tuples to downstream operators
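To make the idea of parsing fixed-length binary structures concrete, here is a minimal Python sketch. It is not the StructureParse operator's implementation; the record layout, the field names, and the `parse_records` helper are all invented for illustration.

```python
import struct
from typing import Iterator, NamedTuple

# Hypothetical fixed-length record layout (not taken from the toolkit):
# 4-byte unsigned record id, 8-byte ASCII caller id, 2-byte unsigned duration,
# big-endian, no padding.
RECORD_FORMAT = ">I8sH"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


class CallRecord(NamedTuple):
    record_id: int
    caller: str
    duration_s: int


def parse_records(data: bytes) -> Iterator[CallRecord]:
    """Yield one tuple per complete fixed-length record in the byte stream."""
    # Ignore a trailing partial record, if any.
    usable = len(data) - len(data) % RECORD_SIZE
    for record_id, caller, duration in struct.iter_unpack(RECORD_FORMAT, data[:usable]):
        # Strip NUL and space padding from the fixed-width text field.
        yield CallRecord(record_id, caller.decode("ascii").rstrip("\x00 "), duration)
```

Each yielded `CallRecord` plays the role of a tuple sent to a downstream operator.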
Toolkit structure (2)
ExceptionCatcher
catches exceptions from fused downstream operators and reports these exceptions
ScheduledBeacon
utility source that generates tuples at the configured time
BloomFilter
detects duplicate tuples in a memory efficient way
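The memory-efficient duplicate detection mentioned for BloomFilter can be illustrated with a generic Bloom filter. This is a sketch of the general technique only, not the toolkit operator; the class name, the default sizes, and the SHA-256 based double hashing are assumptions of this example.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 5) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        # Derive the bit positions from two 64-bit slices of one SHA-256
        # digest (double hashing).
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def seen_before(self, key: bytes) -> bool:
        """Return True if key was probably seen already, and record it either way."""
        present = True
        for pos in self._positions(key):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte] & mask:
                present = False
                self.bits[byte] |= mask
        return present
```

The filter stores only bits, not the keys themselves, which is why memory use stays fixed regardless of how many tuples pass through; the trade-off is a small false-positive rate that depends on the chosen sizes.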
Toolkit structure (3)
Generic Parsers
File Utility Functions
19 Sample Applications
Toolkit structure (4)
Application framework
Operators and functions
Multi-level de-duplication
Multi-stage Lookups
Demo Application
ETL & Campaign Management
Data Integrity
Multi Threading
Setup wizard
Configurable & customizable applications
Monitoring GUI
Ingest, transform, enrich data records for downstream applications
Application Framework
ITE application - File processing - ingest filenames
[Diagram: ITE file-processing pipeline]
Input filesystem: input, input/archive, input/failed
Output filesystem: output, output/rejected, output/load, output/statistics
Operators: Directory Scan, Filetype Validator, Filename Dedup, Chain Split, Chain Control, File Reader, Record Validator, Transform, Lookup/Enrich, File Writer, Reject Writer, Statistic Writer, Chain Finalizer
Lookup Data (Shared Memory)
Parallel file processing per channel
• Scans files in one or more directories
• Duplicate filenames are moved to “duplicates” directory
• Distributes “file-info” tuples to the file processing channels
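The ingest steps listed above (scan, filename de-duplication, distribution to channels) can be sketched in Python as follows. This is an illustration only, not the framework's code: the `ingest` function, the in-memory `seen` set, and the round-robin channel assignment are assumptions of this sketch.

```python
import shutil
from pathlib import Path
from typing import Dict, List, Set


def ingest(input_dirs: List[Path], duplicates_dir: Path, seen: Set[str],
           num_channels: int) -> Dict[int, List[Path]]:
    """Scan directories, set aside duplicate filenames, assign the rest to channels."""
    duplicates_dir.mkdir(parents=True, exist_ok=True)
    channels: Dict[int, List[Path]] = {c: [] for c in range(num_channels)}
    next_channel = 0
    for directory in input_dirs:
        for path in sorted(p for p in directory.iterdir() if p.is_file()):
            if path.name in seen:
                # Filename was already ingested: move the file out of the way.
                shutil.move(str(path), str(duplicates_dir / path.name))
                continue
            seen.add(path.name)
            # Round-robin distribution is an assumption of this sketch.
            channels[next_channel].append(path)
            next_channel = (next_channel + 1) % num_channels
    return channels
```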
ITE application - File processing – data streaming
• FileReader parses the file and generates data tuples for enrichment and transformation
• ChainControl ensures that one file is processed after another
ITE application - File processing – closing
• Depending on the file processing result, the input file is moved to the archive or failed directory
• Statistic tuple is generated when processing is completed
• File statistics are written to file
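The closing steps can be sketched in Python like this. It is a hedged illustration rather than the Chain Finalizer's actual logic: the `process_and_finalize` function, the statistics fields, and the JSON-lines statistics file are assumptions made for the example.

```python
import json
import shutil
import time
from pathlib import Path
from typing import Callable


def process_and_finalize(path: Path, process: Callable[[Path], int],
                         archive_dir: Path, failed_dir: Path,
                         statistics_file: Path) -> dict:
    """Run process(path), move the input file, and append one statistics line."""
    started = time.time()
    try:
        records = process(path)
        status, target = "ok", archive_dir
    except Exception as exc:  # any processing error marks the file as failed
        records, status, target = 0, f"failed: {exc}", failed_dir
    target.mkdir(parents=True, exist_ok=True)
    shutil.move(str(path), str(target / path.name))
    # Hypothetical statistics record; the real framework defines its own fields.
    stats = {"file": path.name, "status": status, "records": records,
             "seconds": round(time.time() - started, 3)}
    statistics_file.parent.mkdir(parents=True, exist_ok=True)
    with statistics_file.open("a", encoding="utf-8") as out:
        out.write(json.dumps(stats) + "\n")
    return stats
```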
[Diagram: shared memory on multiple hosts (Host A, Host B, ... Host n)]
Streams Job: common::LookupManagerMain, demoapp::ITEMain, sample::ITEMain
Shared Memory; Streams PE; Streams PE (SHM READ); Streams PE (SHM WRITE)
Telecommunications applications require very high throughput (millions of records/second) and reference-data lookup functionality
The application framework provides shared memory functionality for common, cross-server lookup tables that can be used by multiple jobs, efficiently and cost-effectively.
Scalability – share lookup data across hosts
The user can configure one of three storage types:
tableFile - one input file can result in many output files, prepared to be loaded into a database by the DBLoader application (available on GitHub)
recordFile - one input file results in one output file (simple mediation use case)
custom - the user can plug in customized sink logic
File Writer configurations
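The difference between the three storage types can be pictured as a choice of sink function. The Python sketch below is only an analogy under stated assumptions: the function names, the `(table, line)` record shape, and the output filename scheme are hypothetical and do not reflect the framework's configuration keys or file naming.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, str]  # (hypothetical table name, CSV line)
Sink = Callable[[str, Iterable[Record]], Dict[str, List[str]]]


def table_file_sink(input_name: str, records: Iterable[Record]) -> Dict[str, List[str]]:
    """tableFile-like: one input file can yield one output file per table."""
    outputs: Dict[str, List[str]] = defaultdict(list)
    for table, line in records:
        outputs[f"{table}_{input_name}.csv"].append(line)
    return dict(outputs)


def record_file_sink(input_name: str, records: Iterable[Record]) -> Dict[str, List[str]]:
    """recordFile-like: exactly one output file per input file."""
    return {f"{input_name}.csv": [line for _, line in records]}


SINKS: Dict[str, Sink] = {"tableFile": table_file_sink, "recordFile": record_file_sink}


def select_sink(storage_type: str, custom: Optional[Sink] = None) -> Sink:
    """Pick the sink for the configured storage type; 'custom' needs a user-supplied sink."""
    if storage_type == "custom":
        if custom is None:
            raise ValueError("storage type 'custom' requires a sink")
        return custom
    return SINKS[storage_type]
```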
Proven, configurable, ready-to-use framework for high-performance file processing applications
Facilitates implementation of customer-specific use cases processing telco network data
Value-added operators and functions on top of Streams standard, e.g. Lookup Manager using shared memory
Setup Wizard (Eclipse plugin and script)
Application Framework - Summary
Setup
Create projects using Streams Studio Wizard or using a command line tool
Configure
• Configure the application using configuration files to enable or disable features and to define lookup stores for data enrichment
Customize
Add custom SPL code to template operators
Add operators to template composites
Implement your business logic
Workflow to develop applications
Providing the operator with a display of health status, metrics and statistics at a glance
Demo Java application
Data access via REST
Can be extended in projects
Monitoring GUI
Live demo - begin
Preparation steps for the toolkit
Create applications based on the framework: LookupManager, ITE
Show customizing of Lookup Manager application
Show file processing of sample files
Show monitoring of applications
No grouping used
- Reads and parses input files
- Business logic: enrich, transform tuples
- Writes output files
Scans one or more directories for input files
Each chain processes one file after another. The more chains are configured, the more files can be processed in parallel.
Example use case:
Files are converted from ASN.1 format to CSV format
No logic across file boundaries required
ITE application - variant A
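The chain behaviour in variant A (sequential within a chain, parallel across chains) can be mimicked with a thread pool. This Python sketch is an analogy, not how Streams schedules its processing elements; `run_chains` and its arguments are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_chains(files_per_chain: Dict[int, List[str]],
               process: Callable[[str], str]) -> Dict[int, List[str]]:
    """Each chain handles its files one after another; chains run in parallel."""
    def run_one_chain(files: List[str]) -> List[str]:
        # Strictly sequential inside a single chain.
        return [process(name) for name in files]

    with ThreadPoolExecutor(max_workers=max(1, len(files_per_chain))) as pool:
        futures = {chain: pool.submit(run_one_chain, files)
                   for chain, files in files_per_chain.items()}
        return {chain: future.result() for chain, future in futures.items()}
```

Because no logic crosses file boundaries in this variant, adding chains increases parallelism without any coordination between them.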
Every single data entity (tuple) determines the group
Example use case:
Aggregate on transformed tuples across files per group (Campaign Management)
Each group represents a range of MSISDN numbers
ITE application - variant B
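Tuple-based grouping by MSISDN range, as in variant B, might look like the following Python sketch. The range boundaries, the `group_of` and `aggregate_by_group` names, and the summed value are illustrative assumptions, not part of the framework.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical range boundaries. Group 0 is everything below BOUNDS[0];
# group i (i >= 1) starts at BOUNDS[i - 1].
BOUNDS: List[int] = [491510000000, 491520000000, 491600000000]


def group_of(msisdn: str) -> int:
    """Map an MSISDN to the index of the range it falls into."""
    return bisect_right(BOUNDS, int(msisdn))


def aggregate_by_group(tuples: Iterable[Tuple[str, int]]) -> Dict[int, int]:
    """Sum a per-tuple value (for example a duration) per MSISDN group, across files."""
    totals: Dict[int, int] = defaultdict(int)
    for msisdn, value in tuples:
        totals[group_of(msisdn)] += value
    return dict(totals)
```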
The filename determines the group
Example use case:
Aggregate on transformed tuples across files per group (Campaign Management)
Each group represents a network element ID. Identifiers are part of the filename.
ITE application - variant C
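For filename-based grouping, as in variant C, the group can be read from the filename before any record is parsed. The naming convention in this Python sketch is invented for illustration; real deployments define their own pattern.

```python
import re
from typing import Optional

# Hypothetical naming convention: "<networkElementId>_<timestamp>.<ext>",
# for example "MSC07_20150301120000.asn1".
FILENAME_PATTERN = re.compile(r"^(?P<element>[A-Za-z0-9]+)_(?P<timestamp>\d{14})\.\w+$")


def group_from_filename(filename: str) -> Optional[str]:
    """Return the network element ID encoded in the filename, or None if it does not match."""
    match = FILENAME_PATTERN.match(filename)
    return match.group("element") if match else None
```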
LookupMgrCustomizing.xml sample
<Application ApplicationNamespace="ite.workshop">
  <CommandMappings>
    <CommandMapping LookupCommand="init">
      <SegmentName>DimMaster1</SegmentName>
      <SegmentName>DimMaster2</SegmentName>
    </CommandMapping>
    <CommandMapping LookupCommand="update">
      <SegmentName>DimMaster1</SegmentName>
      <SegmentName>DimMaster2</SegmentName>
    </CommandMapping>
  </CommandMappings>
…
Application name defined by namespace
Supported command type
Repository segments
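To show how the command-to-segment mapping in this sample could be read programmatically, here is a small Python sketch using the standard library XML parser. The slide's XML is truncated, so the closing `</Application>` tag is added here only so that the fragment parses; `command_segments` is a hypothetical helper, not a toolkit function.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Fragment from the slide; the closing </Application> tag is an addition
# made only so that this illustration is well-formed.
SAMPLE = """
<Application ApplicationNamespace="ite.workshop">
  <CommandMappings>
    <CommandMapping LookupCommand="init">
      <SegmentName>DimMaster1</SegmentName>
      <SegmentName>DimMaster2</SegmentName>
    </CommandMapping>
    <CommandMapping LookupCommand="update">
      <SegmentName>DimMaster1</SegmentName>
      <SegmentName>DimMaster2</SegmentName>
    </CommandMapping>
  </CommandMappings>
</Application>
"""


def command_segments(xml_text: str) -> Dict[str, List[str]]:
    """Map each LookupCommand to the repository segments listed under it."""
    root = ET.fromstring(xml_text.strip())
    return {
        mapping.get("LookupCommand"): [s.text for s in mapping.findall("SegmentName")]
        for mapping in root.iter("CommandMapping")
    }
```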
Mediation use case demonstration (ASN.1 to CSV)
Customize the File Reader of the ITE application
ITE
ASN.1 input files CSV output files
ITE application
[Diagram: operator graph of the ITE application]
Labels: Statistics; Control; IngestFiles; Context; ChainDirScan; FileType Validator; ApplCtrl Scheduler; LogWriter; Dedup; Filename Dedup; ChainProcessorReader; ChainSink; ChainControl; ChainProcessorTransformer; PreFile Reader; RejectFileWriter; File Writer; Validator; Business Logic / Transform / Enrich; Tuple Group Split; Taps; Post Transformer Tap; PostContext Processor Tap; Chain Finalizer (Files Mover); Chain Split; File GroupSplit; Context Custom; FileReader; Converter; ContextRestore Writer; PostContext Processor; Checkpoint Control
Legend: Custom optional; Custom; Common; Common or Custom; Variant C; Variant B
Detail: ChainProcessorReader: ChainControl, PreFile Reader, Validator, FileReader, FileReaderASN1
Live demo - begin
Create ITE application based on the framework
Configure ITE application project
Prepare the schema for the ASN.1 parser
Customize the File Reader
Build and launch application
Review output files