The Big Data Imperative: Smarter Choices for an Analytics Infrastructure
Jim Williams
Senior Software Engineer
IBM Software Group
Where is the Big Data Coming From?
Text Documents, Blogs, Web Logs, Email, Social Media
Weather Data, Stock Trades, Point of Sale Data, Call Data Records
Mfg. Equipment, Utility Meters, Medical Equip., Oil Rigs
Video Cameras, Audio Devices
Data at rest
Data is stored on disk
Huge volumes of unstructured data
No pre-defined schemas
Too large for traditional tools to process in a timely manner
Data in motion
Data is typically not stored
Tremendous velocity
Multiple data sources
Huge volumes of unstructured data
Ultra low latency required
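The contrast between data at rest and data in motion can be sketched in a few lines of Python. This is an illustration only: the function names, the sample meter readings, and the running-average logic are assumptions made for the example, not part of Hadoop or InfoSphere Streams.

```python
from typing import Iterable, Iterator, List


def batch_average(stored_readings: List[float]) -> float:
    """Data at rest: the full data set is already stored, so scan all of it."""
    return sum(stored_readings) / len(stored_readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Data in motion: update a running result per record, keeping no history."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count  # a result is available after every record


if __name__ == "__main__":
    meter_readings = [10.0, 20.0, 30.0]  # hypothetical utility meter values
    print(batch_average(meter_readings))
    print(list(streaming_average(iter(meter_readings))))
```

The batch function needs the whole list before it can answer; the streaming function keeps only a count and a total, which is the property that makes low-latency processing of unstored data possible.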
Operational Data Store
Traditional data sources (ERP, CRM, databases, etc.)
Source data (Web, sensors, logs, media, etc.)
Applications
Big Data Platform: Gain Value From Unstructured Data Sources And Structured Enterprise Data
Unstructured data
Structured data
New Programming Models and Low Cost Hardware For Handling Unstructured Data
Apache Hadoop and InfoSphere Streams
Proven frameworks to process large amounts of data
Hadoop for data at rest, Streams for data in motion
Enable applications to transparently work with large clusters of nodes in parallel
InfoSphere Streams
Clusters of low-cost PowerLinux servers that are ideal for Hadoop and Streams
Hadoop Cluster
Streams Cluster
Apache Hadoop
[Diagram labels: Input, MapReduce Java Program, Hadoop Cluster, Processing, Storage, Result]
Enables applications to transparently work with thousands of nodes and petabytes of data in a highly parallel, cost-effective manner
A Hadoop node is a processor and disks
Nodes are combined into clusters
Input is parceled out to nodes at load time
MapReduce jobs are sent to nodes at job run time
Hadoop Distributed File System (HDFS)
[Diagram: data blocks and their replicas (R) spread across Node 1, Node 2, Node 3, … Node n]
A distributed file system that spans all the nodes in a Hadoop cluster
Files are split automatically at load time into blocks and spread among Data Nodes
Elastically scalable
Assumes nodes will fail; achieves reliability by replicating data across multiple nodes
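The HDFS behavior described here (split a file into blocks at load time, spread the blocks over nodes, replicate each block so a node failure loses nothing) can be sketched as a small Python simulation. This is not the HDFS implementation: the tiny block size, the replication factor of 3, the round-robin placement, and all function and node names are assumptions for illustration. Real HDFS uses far larger blocks and its own placement policy.

```python
from typing import Dict, List

BLOCK_SIZE = 4    # bytes per block; tiny on purpose, for illustration only
REPLICATION = 3   # copies kept of each block (assumed value)


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [nodes[(block_id + r) % len(nodes)]
                               for r in range(replication)]
    return placement


def readable_blocks(placement: Dict[int, List[str]], failed: str) -> List[int]:
    """Blocks that still have at least one replica after `failed` goes down."""
    return [block for block, holders in placement.items()
            if any(node != failed for node in holders)]


if __name__ == "__main__":
    blocks = split_into_blocks(b"hello hadoop!")
    nodes = ["node1", "node2", "node3", "node4"]  # hypothetical node names
    placement = place_replicas(len(blocks), nodes)
    print(placement)
    print(readable_blocks(placement, failed="node2"))
```

Because every block lives on three different nodes in this sketch, losing any single node still leaves every block readable, which is the reliability argument the slide makes.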
Hadoop Framework Processes
Master node processes
  JobTracker for MapReduce
  NameNode for HDFS
Slave node processes
  TaskTrackers for MapReduce
  DataNode for HDFS
[Diagram: one Master Node running the JobTracker and NameNode; Slave Nodes each running a TaskTracker and a DataNode with local data]
Map and Reduce are steps in the framework that a programmer implements
Hadoop framework orchestrates Map and Reduce steps
MapReduce jobs are sent out to each node to run
MapReduce Programming Model
MapReduce jobs run in parallel across nodes
The steps process key/value pairs in some way
How the steps manipulate the pairs defines the solution
[Diagram, "View Inside One MapReduce Job": Input, HDFS, Map Step, Framework Processing, Reduce Step, HDFS; key/value pairs labeled K2 V2, K3 V3, K4 V4]
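The flow of key/value pairs through a Map step, framework processing, and a Reduce step can be sketched as a single-process word count in Python. This simulates the programming model only; it does not use the Hadoop API, and the function names (`map_step`, `shuffle`, `reduce_step`, `run_job`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_step(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit one (word, 1) pair per word in an input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Framework processing: group all values that share a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_step(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three stages the way the framework would orchestrate them."""
    mapped = (pair for line in lines for pair in map_step(line))
    grouped = shuffle(mapped)
    return dict(reduce_step(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data", "big clusters"]))
```

The programmer writes only `map_step` and `reduce_step`; changing what those two functions do to the pairs changes the problem being solved, while the grouping in the middle stays the same.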
BigInsights – Makes It Easy
Web-based management console
Security enhancements: LDAP authentication
Administrator enhancements: installation and configuration, data import/export tools, monitoring tools
Developer enhancements: Eclipse tools, job management tools
Integration enhancements: database/warehouse integration
Business user enhancements: spreadsheet-style tool for users without Java skills
InfoSphere BigInsights Console
Demo: Using BigInsights To Determine Sentiment About A Cabinet Appointment
The Smith nomination as Secretary of the Exterior is bad
Smith is a lousy choice for Exterior
Smith is a brilliant choice for Exterior Secretary
Smith would be a great SOE

Approve
Brilliant choice for Secretary of the Exterior
Love the selection of Smith for Exterior
The Smith appointment makes sense
Smith would be an excellent Secretary of the Exterior

Disapprove
Horrible appointment for Exterior Secretary
Lousy choice for Exterior Secretary
Don’t like Smith for Secretary of Exterior
Smith would be the worst choice for Secretary of the Exterior
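The idea behind the demo, sorting raw comments into Approve and Disapprove, can be illustrated with a deliberately crude keyword classifier in Python. This is not how BigInsights performs text analytics; the cue phrases below are hypothetical and were chosen only to cover the sample comments shown above.

```python
from collections import Counter
from typing import Iterable

# Hypothetical cue phrases chosen to cover the sample comments only.
APPROVE_CUES = ("brilliant", "great", "excellent", "love", "makes sense")
DISAPPROVE_CUES = ("bad", "lousy", "horrible", "worst", "don't like")


def classify(comment: str) -> str:
    """Label one comment by counting the cue phrases it contains."""
    text = comment.lower().replace("\u2019", "'")  # normalize curly apostrophes
    approve = sum(cue in text for cue in APPROVE_CUES)
    disapprove = sum(cue in text for cue in DISAPPROVE_CUES)
    if approve > disapprove:
        return "Approve"
    if disapprove > approve:
        return "Disapprove"
    return "Unknown"


def tally(comments: Iterable[str]) -> Counter:
    """Count how many comments fall under each label."""
    return Counter(classify(comment) for comment in comments)


if __name__ == "__main__":
    comments = [
        "The Smith nomination as Secretary of the Exterior is bad",
        "Smith is a lousy choice for Exterior",
        "Smith is a brilliant choice for Exterior Secretary",
        "Smith would be a great SOE",
        "Don\u2019t like Smith for Secretary of Exterior",
    ]
    print(tally(comments))
```

Substring matching like this breaks down quickly on real text (negation, sarcasm, misspellings), which is why a production system relies on proper text analytics rather than a fixed word list.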
What You Just Saw In The Demo
Large volumes of raw, unstructured data
Valuable insights into citizen sentiment
InfoSphere BigInsights
Key Benefits
Up to 17% lower power/cooling costs than x86 rack servers
Industry-standard (Red Hat & SUSE) Linux-only servers, optimized for POWER architecture
Competitively priced compared to x86 Linux
BigInsights on PowerLinux runs 71% faster than Cloudera on x86
New PowerLinux Servers Ideal For Big Data
More Info: http://www.ibm.com/systems/power/software/linux/powerlinux/bigdata.html
PowerLinux 7R2
•Linux-only POWER7
•Two-socket, 2U rack
•8 cores/socket
•Up to 20 7R2s per rack
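The PowerLinux 7R2 figures listed here imply a maximum core count per rack. The short Python calculation below works it out, assuming both sockets are populated with 8-core processors and the rack holds the full 20 servers; the variable names are illustrative.

```python
SOCKETS_PER_SERVER = 2   # two-socket server
CORES_PER_SOCKET = 8     # 8 cores per socket
SERVERS_PER_RACK = 20    # up to 20 servers per rack

cores_per_server = SOCKETS_PER_SERVER * CORES_PER_SOCKET
max_cores_per_rack = cores_per_server * SERVERS_PER_RACK

print(cores_per_server, max_cores_per_rack)  # prints "16 320"
```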
For Additional Information
Visit the Agile Summit Solution Center for demonstrations of these capabilities.
Ask an IBM Ambassador for additional information (case study, white paper, solution brief, etc.) related to the content shared during this session.
For a follow-up discussion, complete the IBM Response Card on the table in front of you.
Thank You!
© 2012 IBM Corporation
Trademarks and notes
© Copyright IBM Corporation 2012
IBM Corporation
Software Group
3565 Harbor Boulevard
Costa Mesa, CA 92626-1420
U.S.A.
Produced in the United States of America
May 2012
All Rights Reserved
IBM, the IBM logo, ibm.com, and smarter planet are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at “Copyright and trademark information” at ibm.com/legal/copytrade.shtml
References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates. Client success stories are available at ibm.com/software/success/cssdb.nsf
The information contained in this documentation is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this documentation, it is provided “as is” without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this documentation or any other documentation. Nothing contained in this documentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM (or its suppliers or licensors), or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
IBM customers are responsible for ensuring their own compliance with legal requirements. It is the customer’s sole responsibility to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws.