Integrating Big Data Technologies in Enterprise IT (NASSCOM)
Integrating Big Data Technologies in Your IT Portfolio
Vineet Tyagi, VP Technology, Head of Innovation Labs
blogs.impetus.com
Impetus Technologies Inc.
Impetus Proprietary
Outline
Big Data
Big Data Technologies and the Ecosystem
Transforming Your Enterprise Data Warehouse to a Big Data Warehouse
Cost Considerations
Cloud Considerations
Operational Support
People Aspect of Big Data
Use Case Selection: What and Where
Q&A
Big Data
2.5 quintillion bytes of data are produced every day
$6 trillion: big data cost (IDC/EMC)
$650 billion per year: cost of wasted productivity due to information overload
1 zettabyte: estimated Internet traffic by 2015
1,800 exabytes: size of the digital universe in 2011
90% of the data in the world today is less than two years old
18 months: estimated time for the digital universe to double in size
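An 18-month doubling time implies exponential growth, size(t) = S0 * 2^(t / 1.5) with t in years. The minimal Python sketch below only works through that arithmetic, starting from the slide's 1,800 EB for 2011; it assumes the doubling time stays constant, which is an illustrative assumption rather than a forecast from the source, and the function name is hypothetical.

```python
def projected_size_eb(initial_eb: float, years_elapsed: float, doubling_years: float = 1.5) -> float:
    """Projected size in exabytes, assuming a constant doubling time.

    18 months = 1.5 years is the doubling time quoted on the slide; the
    constant-rate assumption and this function name are illustrative only.
    """
    return initial_eb * 2 ** (years_elapsed / doubling_years)


if __name__ == "__main__":
    # Starting point from the slide: 1,800 EB in 2011.
    for year in (2011, 2012, 2013, 2014, 2015):
        print(year, f"{projected_size_eb(1800, year - 2011):,.0f} EB")
```

Every 1.5 years the projected size doubles, so three years after 2011 the sketch gives four times the starting size.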
Big Data
Big data is not only the original content stored or consumed, but also the information surrounding its consumption.
An airline jet collects 10 terabytes of sensor data for every 30 minutes of flying time.
The New York Stock Exchange collects 1 terabyte of structured trading data per day.
Big Data Is Not the Created Content, nor Is It Even Its Consumption — It Is the Analysis of All the Data Surrounding or Swirling Around It
Age of Data
From the Age of Software to the Age of Data
Data Rich and Information Poor
Big Data - Technologies
There will be more content, more devices, more applications, and more on-demand access.
Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis
Finding answers where there are yet to be questions
Big Data - Technologies
BIG SQL: VoltDB, Hadapt, Clustrix, Xeround
NoSQL: Membase, Cassandra, MongoDB, Hypertable, CouchDB
Programmable Hardware: innovation of mixware architectures; NVIDIA GPU, CUDA library, GPGPU, OpenCL standards, SIMD-architecture-based parallel programming
Grid Frameworks: Alchemi, ProActive, JPPF, GridGain, Microsoft HPC Server; distributed computation
Hadoop Ecosystem: Hadoop, MapReduce, HBase, Hive, Sqoop, Flume, etc.; Pig, Pig Latin; MapR, Platform Computing, Pervasive DataRush
Machine Learning: Apache Mahout, R, Weka, Orange
Voldemort, Oozie, Datameer
Storage Optimization: Rainstor, Teradata
EDW to Big Data Warehouse - Drivers
Business drivers: better insight; faster turnaround; accuracy and timeliness
IT drivers: reduce storage cost; reduce data movement; faster time-to-market; standardized toolset; ease of management and operation; security and governance
EDW to Big Data Warehouse
Traditional enterprise data models for application, database, and storage resources have grown over the years.
The cost and complexity of these models have increased along the way to meet the needs of larger data sets.
New models are based on a scaled-out, shared-nothing architecture.
One size no longer fits all.
Big Data Components
Hadoop: provides storage capability through a distributed, shared-nothing file system, and analysis capability through MapReduce.
NoSQL: provides the capability to capture, read, and update, in real time, the large influx of unstructured data and data without schemas; examples include clickstreams, social media, log files, event data, mobility trends, and sensor and machine data.
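To make the MapReduce analysis model concrete, here is a minimal single-process Python sketch of the classic word count: a map phase emits (key, value) pairs, a shuffle groups them by key, and a reduce phase combines each group. It only simulates the data flow; it is not Hadoop's Java API, and on a real cluster the framework runs mappers in parallel over file blocks stored in HDFS and performs the shuffle itself. All function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all intermediate values by key, as the framework
    # does between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reducer: combine all values collected for one key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))
```

For example, `word_count(["big data", "Big insight"])` returns `{'big': 2, 'data': 1, 'insight': 1}`. Because each mapper sees only its own records and each reducer sees only one key's values, both phases can be spread across a shared-nothing cluster.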
EDW to Big Data Warehouse
Cost Considerations of Big Data Warehouse
Initial Entry Costs: Cost of Experimentation
Cost of Integration and Moving Data: Cost of ETL
Query and Analytics Capability
Manageability
On-Going Maintenance: Monitoring and Tuning
Changing Capacity: Additional Hardware
Cost of Compliance
Lowering TCO of Big Data
Initial entry costs (cost of experimentation): follow best-practice patterns; learn the skills or hire them.
Cost of integration and moving data (cost of ETL): remove costly licensed tools and switch to MapReduce for ETL or ELT.
Manageability (provisioning and management tools): you will have more than a single vendor, so look for multi-vendor management toolsets such as Ankush from Impetus.
On-going maintenance (monitoring and tuning): automate, automate, automate.
Changing capacity (additional hardware): have you considered the GPU?
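The suggestion to switch to MapReduce for ETL can be sketched as follows: the mapper does the extract and transform work (parsing, validation, dropping bad rows) and the reducer aggregates before the load step. This is a minimal single-process Python sketch; the `store_id,amount` record layout and all function names are hypothetical, and a production job would run on Hadoop (for example through its Java API or Hadoop Streaming) rather than in one process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def etl_mapper(raw: str) -> Iterator[Tuple[str, float]]:
    # Extract + transform: parse a hypothetical "store_id,amount" record,
    # skipping malformed rows instead of failing the whole job.
    parts = raw.strip().split(",")
    if len(parts) != 2:
        return
    store_id, amount = parts[0].strip(), parts[1].strip()
    try:
        value = float(amount)
    except ValueError:
        return
    if store_id:
        yield store_id, value


def etl_reducer(pairs: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    # Aggregate per key; this summary is what would be loaded into the warehouse.
    totals: Dict[str, float] = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_etl(lines: Iterable[str]) -> Dict[str, float]:
    return etl_reducer(pair for line in lines for pair in etl_mapper(line))
```

For ELT instead of ETL, the same mapper could run after the raw data has already been landed in the cluster, so cleansing happens where the data lives rather than in a separate licensed tool.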
Lowering TCO of Big Data
Cost of storage: compress data (Rainstor-type solutions).
Do more with less: faster MapReduce (MapR-type solutions); Acunu-type solutions for NoSQL.
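Storage savings from compression depend heavily on how repetitive the data is, and machine-generated logs tend to be very repetitive. The sketch below measures a compression ratio using Python's standard `zlib` module as a generic stand-in; it does not reproduce Rainstor's or any other vendor's technique, and the sample log format and function name are made up for illustration.

```python
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original_size / compressed_size for zlib at the given level."""
    compressed = zlib.compress(data, level)
    # Round-trip check: compression must be lossless.
    if zlib.decompress(compressed) != data:
        raise ValueError("zlib round trip failed")
    return len(data) / len(compressed)


if __name__ == "__main__":
    # Hypothetical, highly repetitive log lines, typical of machine-generated data.
    logs = b"".join(
        b"2012-06-01T10:%02d:00 INFO sensor=42 status=OK\n" % (i % 60)
        for i in range(10_000)
    )
    print(f"{len(logs)} bytes, ratio {compression_ratio(logs):.1f}x")
```

Higher `level` values trade CPU time for a smaller output, which is the same storage-versus-compute trade-off a cluster operator weighs when choosing a codec.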
Cloud Considerations in Big Data Warehouse
More “virtual” servers than “physical” servers were shipped in 2011.
By 2015, 20% of all information running through servers will be doing so on virtualized systems.
The challenges for cloud adoption include:
Data preparation for conversion to cloud
Integrated cloud/non-cloud management
Service-level agreements and termination strategies
Security, backup, archiving, and disaster-control strategies
Intercountry data transfer and compliance
Operational Support Considerations in Big Data Warehouse
Big data solution architectures: technology churn is high; Hadoop is the only constant as a paradigm.
Impedance mismatch: is your IT organization geared up to transition big data technologies into the enterprise?
Unsolved challenges:
Rapid, automatic, or rule-based single-click provisioning of big data clusters
Measuring the boost that clusters/grids provide to your business data-processing capabilities
The freedom to change your choice of cluster software at any time if it is not sufficiently meeting your needs
Managing the big data solution from a single cluster-management umbrella
Multi-Vendor, Multi-Technology Cluster Management
People Aspect: Big Data Warehouse
New skill sets needed: data scientists; developers who can think in parallel
New definitions for older roles: Big Data Administrator?
Invest in Training
Use Case Selection: Big Data Scenarios
Big Input-Small Output
Small Input-Big Output
Big Input-Big Output
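The slide names the three scenarios without defining them. One plausible reading, shown as a minimal Python sketch (the example workloads and function names are assumptions for illustration), is: aggregation shrinks a big input to a small summary, generation or simulation expands a small input into many records, and record-by-record transformation keeps the output roughly as large as the input.

```python
from itertools import product
from typing import Iterable, Iterator, List, Tuple


def big_in_small_out(readings: Iterable[float]) -> Tuple[int, float]:
    # Many records reduced to a tiny summary (count and maximum).
    # An empty input yields (0, -inf).
    count, peak = 0, float("-inf")
    for r in readings:
        count += 1
        peak = max(peak, r)
    return count, peak


def small_in_big_out(params: List[int], runs: int) -> Iterator[Tuple[int, int, int]]:
    # A small input expanded into many records: every pair of parameters
    # times every simulation run, i.e. len(params) ** 2 * runs outputs.
    for a, b in product(params, repeat=2):
        for run in range(runs):
            yield a, b, run


def big_in_big_out(records: Iterable[str]) -> Iterator[str]:
    # Every input record produces one output record (e.g. cleansing a full dataset).
    for rec in records:
        yield rec.strip().lower()
```

Under this reading, the first pattern is the easiest first use case because only a small result has to leave the cluster, while the other two also require planning for large output volumes.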
Stage 1 Use Case: Introduce Big Data
Stage 2 Use Case: Get Bolder
Stage 3 Use Case: Sophisticated
Impetus
We offer innovative product engineering and technology R&D services and products.
Eighteen years of experience; numerous award-winning products and success stories
Innovation-based differentiated services; 1,300+ engineers; development centers in India; pioneers in big data consulting services
Since 2008, 10+ active in-production use cases at large Fortune 100 companies
Products and tools to help ease big data adoption: Ankush, Jumbune, iLaDaP
Impetus Open Source Contributions
Kundera (http://code.google.com/p/kundera/): an annotation-based Java library for NoSQL databases such as Cassandra, HBase, and MongoDB.
Hadoop (http://hadoop-toolkit.googlecode.com): a Hadoop performance-monitoring tool with a built-in suggestion engine for quickly finding performance bottlenecks; a visual representation helps suggest remediation.
Korus (http://code.google.com/p/korus/): a parallel and distributed programming framework that improves the performance and scalability of Java applications.
Thank You