TRANSCRIPT
![Page 1: Big Data Analytics The Network is the Bottleneck › us › Images › 11_Jack_Norris.pdf · 2014-12-18 · 1/31/2012 ©MapR Technologies - Confidential 8 MapR Performance Advantages](https://reader036.vdocument.in/reader036/viewer/2022070814/5f0dd5017e708231d43c4e99/html5/thumbnails/1.jpg)
1/31/2012 ©MapR Technologies - Confidential 1
Big Data Analytics The Network is the Bottleneck
Data Volume Growing 44x
2020: 35.2 Zettabytes
2010: 1.2 Zettabytes
Data is Growing Faster than Moore’s Law
Business Analytics Requires a New Approach
Source: IDC Digital Universe Study, sponsored by EMC, May 2010; IDC Digital Universe Study 2011
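The slide compares data growth with Moore’s Law. A growth factor between two endpoints can be converted into a compound annual growth rate (CAGR) and a doubling time, which is the form in which Moore’s Law is usually quoted (roughly a doubling every two years). A minimal sketch, assuming a 10-year span between the 2010 and 2020 values shown. Note that these two endpoints give 35.2 / 1.2 ≈ 29x; a 44x factor would correspond to a 0.8 ZB starting point (35.2 / 0.8 = 44).

```python
import math


def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate implied by growing from start to end over `years`."""
    return (end / start) ** (1.0 / years) - 1.0


def doubling_time(rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2.0) / math.log1p(rate)


# Endpoints from the slide, in zettabytes; the 10-year span is an assumption.
start_zb, end_zb, span_years = 1.2, 35.2, 10

factor = end_zb / start_zb
rate = cagr(start_zb, end_zb, span_years)
print(f"growth factor: {factor:.1f}x")
print(f"CAGR: {rate:.1%}")
print(f"doubling time: {doubling_time(rate):.2f} years")
```

Comparing the printed doubling time against a two-year doubling period is one concrete way to test the "faster than Moore’s Law" claim for any pair of endpoints.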
The Next Generation Distribution
• Complete Distribution for Apache Hadoop
• Integrated, tested, hardened
• Supported
• 100% Hadoop, HBase, HDFS API compatible
• Unique advanced features
• No changes required to Hadoop applications
• Runs on commodity hardware
Innovations of Next Generation Distribution
• High Availability Architecture
• Snapshots
• Mirroring
• NFS Access
• Graphical Management
• Speed jobs by more than 2X
• Save on hardware costs
Importance of File-based Access
Ways to access data in the Hadoop cluster:
• File browsers: access directly (“Drag & Drop”)
• Applications: random read, random write, log directly
• Standard Linux commands & tools: grep, sed, sort, tar
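With file-based access, data in the cluster can be handled with ordinary POSIX file I/O instead of a Hadoop client library, including in-place random writes, which HDFS itself does not support. A minimal sketch, assuming the cluster is NFS-mounted at a local path given by a hypothetical `CLUSTER_ROOT` environment variable (it falls back to a temporary directory so it runs anywhere); the file name and helper functions are illustrative and not part of any MapR API.

```python
import os
import re
import tempfile
from pathlib import Path

# Hypothetical NFS mount point for the cluster; falls back to a temp dir.
CLUSTER_ROOT = Path(os.environ.get("CLUSTER_ROOT") or tempfile.mkdtemp())


def log_directly(path: Path, message: str) -> None:
    """Append a log line with plain file I/O (no Hadoop client library)."""
    with path.open("a", encoding="utf-8") as f:
        f.write(message + "\n")


def random_write(path: Path, offset: int, data: bytes) -> None:
    """Overwrite bytes in place at an arbitrary offset."""
    with path.open("r+b") as f:
        f.seek(offset)
        f.write(data)


def random_read(path: Path, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at an arbitrary offset."""
    with path.open("rb") as f:
        f.seek(offset)
        return f.read(length)


def grep(path: Path, pattern: str) -> list[str]:
    """Return the lines matching `pattern`, like `grep -E pattern path`."""
    regex = re.compile(pattern)
    with path.open(encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if regex.search(line)]


log = CLUSTER_ROOT / "app.log"
log_directly(log, "INFO job started")
log_directly(log, "ERROR disk full")
random_write(log, 0, b"WARN")  # "INFO" -> "WARN": same length, rewritten in place
print(random_read(log, 0, 4))
print(grep(log, r"^(WARN|ERROR)"))
```

Nothing in the code is cluster-specific; once the mount is in place, the same calls, and tools such as `grep`, `sed`, `sort`, and `tar`, work unchanged.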
High Availability and Data Protection
MapR Distribution stack:
• Ecosystem: Hive, Pig, Oozie, Sqoop, Plume, HBase, Mahout, Cascading, Nagios integration, Ganglia integration, Flume, and more
• MapReduce
• MapR’s Lockless Storage Services™
• Distributed NameNode HA™
• JobTracker HA™

• High availability
• Stateful failover
• Unlimited number of files

[Diagram: active files and snapshots sharing data blocks A, B, C, D, D']
Snapshots:
• Recover from app or user errors
• Zero performance loss on write
• Easy recovery with drag and drop
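The snapshot diagram suggests that snapshots share data blocks with the active files: taking a snapshot records only block references, and a later change to block D is written to a new block D' while the snapshot keeps pointing at D. Because no data is copied when the snapshot is taken or when the block is rewritten, writes do no extra work. A minimal redirect-on-write sketch under that assumption; the class and method names are hypothetical and this is not MapR’s implementation.

```python
from dataclasses import dataclass, field


@dataclass
class BlockStore:
    """Blocks addressed by id; existing blocks are never overwritten in place."""
    blocks: dict[int, bytes] = field(default_factory=dict)
    next_id: int = 0

    def put(self, data: bytes) -> int:
        block_id = self.next_id
        self.blocks[block_id] = data
        self.next_id += 1
        return block_id


class Volume:
    """Active file = list of block ids; a snapshot = a frozen copy of that list."""

    def __init__(self, store: BlockStore, chunks: list[bytes]):
        self.store = store
        self.active = [store.put(c) for c in chunks]
        self.snapshots: dict[str, tuple[int, ...]] = {}

    def snapshot(self, name: str) -> None:
        # Copies block references only; no block data is copied.
        self.snapshots[name] = tuple(self.active)

    def write_block(self, index: int, data: bytes) -> None:
        # Redirect-on-write: new data goes to a fresh block (D -> D'), and the
        # old block stays reachable from any snapshot that references it.
        self.active[index] = self.store.put(data)

    def read(self, block_ids) -> bytes:
        return b"".join(self.store.blocks[b] for b in block_ids)

    def restore(self, name: str) -> None:
        # Recovery points the active file back at the snapshot's blocks.
        self.active = list(self.snapshots[name])


store = BlockStore()
vol = Volume(store, [b"A", b"B", b"C", b"D"])
vol.snapshot("before-edit")
vol.write_block(3, b"D'")                      # an app or user changes block D
print(vol.read(vol.active))                    # active view
print(vol.read(vol.snapshots["before-edit"]))  # snapshot view, still has D
vol.restore("before-edit")                     # recover from the error
print(vol.read(vol.active))
```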
File Create Benchmark
Testing completed on a 10-node cluster: 2x quad-core, 24 GB DRAM, 12 x 1 TB SATA drives @ 7200 rpm
[Chart: total files (millions) created, MapR Distribution (out of box) vs. standard distributions (out of box and tuned)]
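A file-create benchmark measures how many files a file system can create and how quickly. The slide does not describe its test harness; the sketch below is a hypothetical single-client version that creates empty files in one directory and reports how many exist afterwards and the create rate. Pointing `target` at a mounted cluster path (an assumption) would measure that file system instead of local disk, at far smaller scale than the millions of files charted.

```python
import os
import tempfile
import time
from pathlib import Path


def create_files(target: Path, count: int) -> tuple[int, float]:
    """Create `count` empty files under `target`.

    Returns (files now present in target, creates per second).
    """
    target.mkdir(parents=True, exist_ok=True)
    start = time.perf_counter()
    for i in range(count):
        # O_CREAT | O_EXCL makes every call a genuine create, never a reopen.
        fd = os.open(target / f"f{i:08d}", os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        os.close(fd)
    elapsed = time.perf_counter() - start
    present = sum(1 for _ in target.iterdir())
    rate = count / elapsed if elapsed > 0 else float("inf")
    return present, rate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        n, rate = create_files(Path(tmp) / "bench", 10_000)
        print(f"created {n} files at {rate:,.0f} files/sec")
```

Running several such clients in parallel against the same mount is the usual way to see whether file-metadata handling, rather than the disks, limits the total.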
MapR Performance Advantages
Test configuration: 10-node cluster, 2x quad-core, 24 GB DRAM, 12 x 1 TB SATA drives @ 7200 rpm, quad NICs
[Chart: Terasort (lower is better): elapsed time in minutes for 3.5 TB, MapR vs. Other]
[Chart: YCSB on HBase (higher is better): record inserts per second (000s) with WAL off and WAL on, MapR vs. Other]
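Terasort sorts fixed-size records (100 bytes each, with a 10-byte key) into a single total order. Hadoop’s TeraSort samples the input to choose split points so that each reducer receives a contiguous, roughly equal key range; the shuffle that moves every record to its range’s reducer is where cluster network bandwidth comes into play. A minimal single-process sketch of that sample, partition, and sort pattern; record payloads are scaled down and all function names are hypothetical.

```python
import bisect
import random

KEY_LEN = 10  # Terasort keys are 10 bytes; the payload here is shortened.


def make_records(n: int, seed: int = 0) -> list[bytes]:
    rng = random.Random(seed)
    return [bytes(rng.randrange(256) for _ in range(KEY_LEN)) + b"|payload"
            for _ in range(n)]


def split_points(records, reducers: int, sample_size: int, seed: int = 1) -> list[bytes]:
    """Pick reducers - 1 boundary keys from a sorted random sample of keys."""
    rng = random.Random(seed)
    sample = sorted(r[:KEY_LEN] for r in rng.sample(records, sample_size))
    step = len(sample) / reducers
    return [sample[int(step * i)] for i in range(1, reducers)]


def partition(records, bounds):
    """Map side of the shuffle: route each record to the reducer owning its key range."""
    parts = [[] for _ in range(len(bounds) + 1)]
    for r in records:
        parts[bisect.bisect_right(bounds, r[:KEY_LEN])].append(r)
    return parts


def terasort(records, reducers=4, sample_size=100):
    bounds = split_points(records, reducers, sample_size)
    parts = partition(records, bounds)
    # Reduce side: each partition is sorted independently; concatenating the
    # outputs in partition order yields one globally sorted result.
    sorted_parts = [sorted(p, key=lambda r: r[:KEY_LEN]) for p in parts]
    return [r for p in sorted_parts for r in p], [len(p) for p in parts]


records = make_records(10_000)
output, sizes = terasort(records)
print("partition sizes:", sizes)
```

Because every record crosses the partition step, total data moved in the shuffle equals the input size, which is why a 3.5 TB sort exercises the network as much as the disks.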