Streaming Real-Time Data with Vibe Data Stream

A Practical Guide to Improving the Big Data Ingestion Process. Presented by Alan Lundberg and Amrish Thakkar, July 22, 2014

Upload: informaticamarketplace

Post on 05-Dec-2014

Category: Data & Analytics


DESCRIPTION

Streaming real-time data from a wide variety of machine data sources and entities can be complex and unwieldy. Using an agent-based approach, Informatica has developed a technique and an open access product that make this process more user-friendly and efficient, even across multiple environments such as Hadoop, Cassandra, Storm, Amazon Kinesis, and Complex Event Processing.

TRANSCRIPT

Page 1: Streaming real time data with Vibe Data Stream

A Practical Guide to Improving the Big Data Ingestion Process

Presented by Alan Lundberg and Amrish Thakkar, July 22, 2014

Page 2: Streaming real time data with Vibe Data Stream

Safe Harbor

The information being provided today is for informational purposes only. The development, release and timing of any Informatica product or functionality described today remain at the sole discretion of Informatica and should not be relied upon in making a purchasing decision. Statements made today are based on currently available information, which is subject to change. Such statements should not be relied upon as a representation, warranty or commitment to deliver specific products or functionality in the future.

Page 3: Streaming real time data with Vibe Data Stream

Informatica Marketplace: A Data Integration Ecosystem

Partners
• Software, Services Vendors
• Strengthen Partnership
• Generate Awareness

Consumers
• Discover Solutions
• Evaluate Products
• Request Ideas

Developers
• Administrators
• Architects
• Data Analysts
• Contribute, Collaborate

Informatica
• Enable Customers
• Engage & Interact
• Identify Whitespace

Page 4: Streaming real time data with Vibe Data Stream

Informatica Marketplace: 1300+ Apps, Add-ons and Services to jump-start your productivity

Data Integration

Mappings, Utilities, Connectors, Code Testing and Deployment, Monitoring, Job Scheduling

Data Quality

Rules & Reference Data, HealthCheck, Accelerators, Services

Cloud

Connectors, Templates, Data Loaders, Plugins, Process Automation, Services

Page 5: Streaming real time data with Vibe Data Stream

The Internet of Things is opening a world of opportunities

Cardiac Monitors

Truck Tracking, Load

Water Meter, Electricity Meter

Fridge Supply Levels, Washing Machine Check

Gas Level, Car Indicators

Bus Delays, Engine Checks

Shop Inventory Levels

Dock Loads, Container Checks

Electronic Flight Bags, Luggage Tracking

Crop Health Indicators

Info Traffic, Video Surveillance

Page 6: Streaming real time data with Vibe Data Stream

GPS Localization

NFC

Chemical Sensors

3D Cameras

Microbolometers

Barometers

Accelerometers

Gyroscopes

Glucometers

Magnetometers

Data / Sensor Diversity…

Page 7: Streaming real time data with Vibe Data Stream

Architectural Implications

Yesterday
• Batch processing
• Data structured, homogeneous
• Centralized, database-centric client-server systems
• Events treated as 2nd class citizens

Today
• Real time
• High volume and variety
• Distributed systems
• Prioritize modeling events as enterprise objects / assets

Page 8: Streaming real time data with Vibe Data Stream

Transactions, OLTP, OLAP

Social Media, Web Logs

Machine Device, Scientific

Documents and Emails

Vibe Data Stream

Event Processing Engine

How to make sense of it all…

Page 9: Streaming real time data with Vibe Data Stream

Use Cases – Solving the Difficult Problems

Detect Patterns
• 3 events within 5 milliseconds
• A then B then C occurs
• Geospatial processing

Exception Monitoring
• Deviations from norm (Monitoring, Fraud, Error)
• Trending up/down to exceed a threshold
• SLA monitoring

Process Monitoring
• Are process workflows operating properly?
• Are manual processes completed on time?
• Detect Missing Work and Queued Work
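
The two pattern examples on this slide ("3 events within 5 milliseconds" and "A then B then C occurs") can be sketched in a few lines of Python. This is only an illustrative sketch, not how RulePoint or Vibe Data Stream implement detection; the `Event` tuple shape and both function names are assumptions made for the example, and events are assumed to arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Iterable, Sequence, Tuple

# Hypothetical event shape: (timestamp in seconds, event type).
Event = Tuple[float, str]


def burst_detected(events: Iterable[Event], count: int = 3,
                   window_s: float = 0.005) -> bool:
    """Return True if `count` events fall within any span of `window_s` seconds."""
    recent: Deque[float] = deque()
    for ts, _ in events:  # assumes events are ordered by timestamp
        recent.append(ts)
        # Drop timestamps that fall outside the window ending at `ts`.
        while ts - recent[0] > window_s:
            recent.popleft()
        if len(recent) >= count:
            return True
    return False


def sequence_detected(events: Iterable[Event],
                      pattern: Sequence[str] = ("A", "B", "C")) -> bool:
    """Return True if the types in `pattern` occur in order (other events may intervene)."""
    position = 0
    for _, kind in events:
        if kind == pattern[position]:
            position += 1
            if position == len(pattern):
                return True
    return False


fast = [(0.000, "A"), (0.002, "B"), (0.004, "C")]
slow = [(0.00, "A"), (0.01, "C"), (0.02, "B")]
print(burst_detected(fast), sequence_detected(fast))
print(burst_detected(slow), sequence_detected(slow))
```

The sliding window keeps only the timestamps that can still contribute to a match, so memory stays bounded by the number of events per window rather than by the length of the stream.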

Page 10: Streaming real time data with Vibe Data Stream

Architectural Approach for Streaming Analytics

[Diagram: components of the streaming analytics architecture]

Operational Data (Field Devices, Applications, Clickstream, IoT, logs, etc.)

Location Context (e.g. GIS)

Event Based Applications

Various Source Applications / Technologies

Data Warehouse

Hadoop / NoSQL

Analytics

Data Integration: PowerCenter

CDC / Data Access: PowerExchange (PWX) CDC

Streaming Collection: Vibe Data Stream

Streaming Analytics: RulePoint CEP

Real Time Stream Transport / Delivery: Ultra Messaging

Stream Transformation: B2B Data Transformation

Page 11: Streaming real time data with Vibe Data Stream

Streaming Collection: Vibe Data Stream (VDS)

• Distribute collection across one or thousands of endpoints

• High performance/efficient streaming data collection over LAN/WAN

• Available ecosystem of lightweight agents (sources & targets)

• Continuous ingestion of real-time generated data (sensors; logs; etc.) to multiple targets (batch/stream processing)

• Perform filtering, transformation, etc. “close to the source”

• Provide varying qualities of service (streaming, guaranteed, etc.)

• Allow for dynamic configuration

• Highly available and scalable
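
To make "filtering, transformation close to the source" and delivery "to multiple targets" concrete, here is a minimal Python sketch of an agent loop. It does not use the Vibe Data Stream API; `run_agent`, the `Target` callable type, and the sample log lines are all hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, List

Record = str
# Hypothetical target type: anything that accepts a single record.
Target = Callable[[Record], None]


def run_agent(source: Iterable[Record],
              keep: Callable[[Record], bool],
              transform: Callable[[Record], Record],
              targets: List[Target]) -> int:
    """Filter and transform records at the edge, then fan out to every target."""
    delivered = 0
    for record in source:
        if not keep(record):
            continue  # dropped at the source, so it never crosses the network
        out = transform(record)
        for send in targets:
            send(out)  # the same record goes to each target
        delivered += 1
    return delivered


# Usage: one list stands in for a batch store, another for a stream processor.
batch_store: List[Record] = []
stream_store: List[Record] = []
lines = ["INFO boot ok", "ERROR disk full", "DEBUG tick", "ERROR fan failed"]
count = run_agent(
    source=lines,
    keep=lambda line: line.startswith("ERROR"),
    transform=str.lower,
    targets=[batch_store.append, stream_store.append],
)
print(count, batch_store)
```

Filtering before the fan-out is the point of the design: records that no target needs are discarded before any LAN/WAN transfer takes place.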

Page 12: Streaming real time data with Vibe Data Stream

Low latency messaging is the foundation

• The core of Informatica’s Vibe Data Stream is based on the Ultra Messaging platform

• Stream transport is the core of any streaming analytics solution

• Required for key streaming analytics capabilities, including:

• Stream collection

• Stream distribution

• Load distribution and sharing

• Remote connectivity and routing

• Ultra Messaging has been proven in hundreds of low-latency, guaranteed delivery, and fault-tolerant deployments
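
The slide lists "load distribution and sharing" without saying how it is done. One common technique, shown here purely as an assumption and not as Ultra Messaging's actual mechanism, is hash partitioning: each message key is mapped to a consumer with a stable hash so that the same key always lands on the same consumer. The function and consumer names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def assign_consumer(key: str, consumers: List[str]) -> str:
    """Pick a consumer for `key` using a stable CRC32 hash."""
    index = zlib.crc32(key.encode("utf-8")) % len(consumers)
    return consumers[index]


def distribute(keys: Iterable[str], consumers: List[str]) -> Dict[str, List[str]]:
    """Group keys by the consumer each one is assigned to."""
    shares: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        shares[assign_consumer(key, consumers)].append(key)
    return dict(shares)


consumers = ["consumer-1", "consumer-2", "consumer-3"]
sources = [f"logserver{i}" for i in range(1, 9)]
for consumer, assigned in sorted(distribute(sources, consumers).items()):
    print(consumer, assigned)
```

`zlib.crc32` is used instead of Python's built-in `hash` because the built-in string hash is randomized per process, which would break the "same key, same consumer" property across restarts.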

Page 13: Streaming real time data with Vibe Data Stream

00:00:46: %LINK-3-UPDOWN: Interface Port-channel1, changed state to up

00:00:47: %LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to up

00:00:47: %LINK-3-UPDOWN: Interface GigabitEthernet0/2, changed state to up

00:00:48: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan1, changed state to down

00:00:48: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/1, changed state to down 2

*Mar 1 18:46:11: %SYS-5-CONFIG_I: Configured from console by vty2 (10.34.195.36)

18:47:02: %SYS-5-CONFIG_I: Configured from console by vty2 (10.34.195.36)

*Mar 1 18:48:50.483 UTC: %SYS-5-CONFIG_I: Configured from console by vty2 (10.34.195.36)

10s, 100s, 1,000s?

Market Data

Web Log Data

Device Data

Sensor Data

Location data

Call Records

Social data
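
Log lines like the ones on this slide have to be parsed before they can be filtered or routed. The sketch below splits each line into timestamp, facility, severity, mnemonic, and message. The regular expression is written only against the sample lines shown here and is an assumption about their layout, not a complete syslog parser; `parse_line` is a hypothetical helper name.

```python
import re
from typing import Dict, Optional

# Layout assumed from the sample lines: "<timestamp>: %FACILITY-SEVERITY-MNEMONIC: <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>.*?): "
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+): "
    r"(?P<message>.*)$"
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


sample = "00:00:48: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan1, changed state to down"
fields = parse_line(sample)
if fields is not None:
    print(fields["facility"], fields["severity"], fields["message"])
```

Returning `None` for non-matching lines lets the caller decide whether to drop, count, or forward malformed input instead of failing the whole stream.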

Page 14: Streaming real time data with Vibe Data Stream

[Diagram: VDS Nodes, a VDS Server, ZooKeeper, and log sources logserver1 through logserver8]

Page 15: Streaming real time data with Vibe Data Stream

Ecosystem of Sources and Targets

Sources, Transformations, Targets

PowerCenter

B2B Data TX

RulePoint

… and evolving

Page 16: Streaming real time data with Vibe Data Stream

Vibe Data Stream vs Flume

                                  VDS                              Flume
Architecture                      Broker-less                      Non-messaging
Configuration                     Automatic                        Manual
Failover                          Automatic                        Automatic
Functionality                     Event Aggregation / Messaging    Log Aggregation
Recommended QoS                   Guaranteed                       Guaranteed
Primary Application               Trades/CDRs/logs/etc.            logs
Monitoring                        Yes                              No
Enterprise Product integration    Informatica product line         No

Page 17: Streaming real time data with Vibe Data Stream

Vibe Data Stream performance vs Flume-ng

Vibe Data Stream: 200 MB/sec

Flume: 20.67 MB/sec

Roughly 10x performance (200 / 20.67 ≈ 9.7)

Test Setup
Event Size: 300 bytes
Source Type: Syslog
Number of Sources: 16
Target Type: HDFS
Hadoop Cluster: 9-node
VDS/Flume Nodes: 1
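
The figures on this slide can be checked with a short calculation. The throughput values and the 300-byte event size come from the slide; treating 1 MB as 1,000,000 bytes is an assumption, since the slide does not say which definition was used. On these numbers the ratio works out to just under 10.

```python
VDS_MB_PER_SEC = 200.0      # from the slide
FLUME_MB_PER_SEC = 20.67    # from the slide
EVENT_SIZE_BYTES = 300      # from the test setup
BYTES_PER_MB = 1_000_000    # assumption: decimal megabytes


def speedup(fast_mb_per_sec: float, slow_mb_per_sec: float) -> float:
    """Ratio of the two throughput figures."""
    return fast_mb_per_sec / slow_mb_per_sec


def events_per_second(mb_per_sec: float,
                      event_size_bytes: int = EVENT_SIZE_BYTES,
                      bytes_per_mb: int = BYTES_PER_MB) -> float:
    """Convert a byte throughput into an event rate for fixed-size events."""
    return mb_per_sec * bytes_per_mb / event_size_bytes


ratio = speedup(VDS_MB_PER_SEC, FLUME_MB_PER_SEC)
print(f"speedup: {ratio:.2f}x")
print(f"VDS:   {events_per_second(VDS_MB_PER_SEC):,.0f} events/sec")
print(f"Flume: {events_per_second(FLUME_MB_PER_SEC):,.0f} events/sec")
```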

Page 18: Streaming real time data with Vibe Data Stream

• Demo

Page 19: Streaming real time data with Vibe Data Stream

Download Vibe Data Stream Free Today!

• Vibe Data Stream Open Access Download: http://www.marketplace.informatica.com/vds

Page 20: Streaming real time data with Vibe Data Stream

Thank You!

Don’t build it. Find it.

marketplace.informatica.com