BEST PRACTICES GUIDE
Nimble Storage for Hadoop 2.x on Oracle Linux and Red Hat Enterprise Linux 6
Document Revision
Table 1.
Date Revision Description
9/5/2014 1.0 Initial Draft
11/17/2014 1.1 Updated iSCSI & Multipath
THIS TECHNICAL TIP IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN
TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS,
WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND.
Nimble Storage: All rights reserved. Reproduction of this material in any manner whatsoever
without the express written permission of Nimble is strictly prohibited.
Table of Contents
Introduction
Audience
Scope
Nimble Storage Features
Nimble Benefits for Hadoop
Nimble Recommended Settings for Hadoop Nodes
Creating Nimble Volumes for Hadoop HDFS
Nimble Reference Architecture
Hadoop 2.x Recommended Settings for Nimble Storage
Introduction
The purpose of this technical white paper is to walk through, step by step, setting up and tuning the Linux operating
system for Hadoop running on Nimble Storage.
Audience
This guide is intended for Hadoop solution architects, storage engineers, system administrators, and IT managers who
analyze, design, and maintain a robust Hadoop environment on Nimble Storage. It is assumed that the reader has a
working knowledge of iSCSI SAN network design and basic Nimble Storage operations. Knowledge of Oracle Linux and
Red Hat Enterprise Linux is also required.
Scope
Most traditional Hadoop implementations today use local JBOD for storage, mainly because Hadoop started out
leveraging cheap commodity servers. Today, large enterprise Hadoop implementations can consist of hundreds or
thousands of nodes, each with one or more local disk drives. For high availability, Hadoop relies on its replication
feature to tolerate node failures.
During the design phase of a new Hadoop implementation, architects and storage administrators often work together
to determine the best server and storage configuration. They have to weigh the number of compute nodes against
storage requirements to deliver high performance, high availability, and sufficient capacity.
This white paper explains the Nimble technology and how it can lower the TCO of your Hadoop environment while still
achieving the required performance. It also covers best practices for configuring the Linux operating system for
Hadoop on Nimble Storage.
Nimble Storage Features
Cache Accelerated Sequential Layout (CASL™)
Nimble Storage arrays are the industry’s first flash-optimized storage designed from the ground up to maximize
efficiency. CASL accelerates applications by using flash as a read cache coupled with a write-optimized data
layout. It offers high performance and capacity savings, integrated data protection, and easy lifecycle
management.
Flash-Based Dynamic Cache
Accelerate access to application data by caching a copy of active “hot” data and metadata in flash for reads.
Customers benefit from high read throughput and low latency.
Write-Optimized Data Layout
Data written by a host is first aggregated or coalesced, then written sequentially as a full stripe with checksum
and RAID parity information to a pool of disks. CASL’s sweeping process also consolidates freed-up disk space
for future writes. Customers benefit from fast sub-millisecond writes and very efficient disk utilization.
Inline Universal Compression
Compress all data inline before storing using an efficient variable-block compression algorithm. Store 30 to 75
percent more data with no added latency. Customers gain much more usable disk capacity with zero
performance impact.
Instantaneous Point-in-Time Snapshots
Take point-in-time copies, which do not require data to be copied on future changes (redirect-on-write). Fast
restores without copying data. Customers benefit from a single, simple storage solution for primary and
secondary data, frequent and instant backups, fast restores and significant capacity savings.
Efficient Integrated Replication
Maintain a copy of data on a secondary system by only replicating compressed changed data on a set schedule.
Reduce bandwidth costs for WAN replication and deploy a disaster recovery solution that is affordable and easy
to manage.
Zero-Copy Clones
Instantly create fully functioning copies or clones of volumes. Customers get great space efficiency and
performance on cloned volumes, making them ideal for test, development, and staging environments.
Nimble Benefits for Hadoop
With today’s data types, such as sensor data, web logs, social data, and other exhaust data, considered too
expensive to store and analyze in a traditional RDBMS, Hadoop has become a means to do just that in a cost-
effective manner. Hadoop utilizes off-the-shelf commodity servers and local JBOD to store that data and
perform analytics on it for businesses.
These data types are often unstructured in nature, and the amount can be vast. Storing them on local JBOD
requires adding additional servers. These JBODs lack the intelligence to compress the data for space savings,
so as the data grows, more nodes are needed. When capacity is the only requirement, adding nodes to a Hadoop
cluster also adds compute power, which can increase operating expenses such as power and cooling.
Nimble Storage features for Hadoop:
• Compression
• Caching
• Data Protection such as Replication and Snapshot
• Price/Performance
• Higher Density to lower TCO
• Sequential I/O throughput (MB/s)
• Random I/O performance (IOPS)
Having Nimble Storage as the storage device in a Hadoop cluster provides space savings, performance, and data
protection. When running MapReduce jobs, not all I/O is sequential in nature; there is quite a bit of
randomness, which can be beneficial on a Nimble array since random reads are served from flash. Aside from
the performance gain, the Nimble inline compression feature provides space savings of anywhere from 1.5x to 2x
depending on the data type. This allows more storage without adding compute nodes, hence reducing power and
cooling costs.
Another benefit for Hadoop is data protection. Any node in a Hadoop cluster, whether NameNode or DataNode, can
fail. Leveraging the Nimble snapshot feature to take backups of the NameNode is critical in a Hadoop
deployment. The NameNode is the centerpiece of an HDFS file system: it keeps the directory tree of all files
in the file system and tracks where across the cluster each file’s data is kept. If a snapshot is available,
recovering a NameNode in a cluster can be done in a matter of seconds. In addition to snapshots, Nimble also
offers replication, which can be used to replicate data to another data center for disaster recovery. Mother
Nature can be unpredictable, so planning for DR is critical.
Nimble Recommended Settings for Hadoop Nodes
Nimble Array
• Nimble OS should be at least 2.1.4 on either CS500 or CS700 series array
Hadoop Cluster Nodes
• Nimble Storage highly recommends that all Hadoop cluster nodes have a CPU speed of 2.9GHz or higher.
Linux Operating System
• iSCSI Timeout and Performance Settings
Understanding the meaning of these iSCSI timeouts allows administrators to set them appropriately. These
parameters in the /etc/iscsi/iscsid.conf file should be set as follows:

node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 10
node.session.nr_sessions = 4
node.session.cmds_max = 2048
node.session.queue_depth = 1024

= = = NOP-Out Interval/Timeout = = =

node.conn[0].timeo.noop_out_timeout = [ value ]
The iSCSI layer sends a NOP-Out request to each target. If a NOP-Out request times out (default: 10 seconds),
the iSCSI layer responds by failing any running commands and instructing the SCSI layer to requeue those
commands when possible. If dm-multipath is being used, the SCSI layer will fail those running commands and
defer them to the multipath layer, which then retries them on another path. If dm-multipath is not being used,
those commands are retried five times before failing altogether.

node.conn[0].timeo.noop_out_interval = [ value ]
Once set, the iSCSI layer will send a NOP-Out request to each target every [ interval value ] seconds.

= = = SCSI Error Handler = = =

If the SCSI Error Handler is running, commands running on a path will not be failed immediately when a NOP-Out
request times out on that path. Instead, they will be failed after replacement_timeout seconds.

node.session.timeo.replacement_timeout = [ value ]
Important: Controls how long the iSCSI layer should wait for a timed-out path/session to reestablish itself
before failing any commands on it. The recommended setting of 120 seconds allows ample time for controller
failover. The default is 120 seconds.

Note: If set to 120 seconds, IO will be queued for 2 minutes before it can resume. The “1 queue_if_no_path”
option in /etc/multipath.conf sets iSCSI timers to immediately defer commands to the multipath layer. This
setting prevents IO errors from propagating to the application; because of this, replacement_timeout can be set
to 60-120 seconds.

Note: Nimble Storage strongly recommends using dm-multipath for all volumes.
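The edits above can also be scripted. The sketch below is illustrative only (the set_opt helper is not part of any Nimble or open-iscsi tooling) and operates on a scratch copy so it can be dry-run; on a real node, CONF would point at /etc/iscsi/iscsid.conf, and the iscsi service would be restarted afterwards.

```shell
#!/bin/sh
# Sketch: apply the recommended iSCSI settings as "key = value" rewrites.
# Works on a scratch copy here; on a node, set CONF=/etc/iscsi/iscsid.conf
# and restart the iscsi service afterwards. The helper is illustrative.
CONF=$(mktemp)
printf 'node.session.timeo.replacement_timeout = 30\n' > "$CONF"

set_opt() {  # replace the line starting with $1 if present, else append
    awk -v k="$1" -v v="$2" '
        index($0, k) == 1 { print k " = " v; seen = 1; next }
        { print }
        END { if (!seen) print k " = " v }' "$CONF" > "$CONF.tmp" && mv "$CONF.tmp" "$CONF"
}

set_opt 'node.session.timeo.replacement_timeout' 120
set_opt 'node.conn[0].timeo.noop_out_interval' 5
set_opt 'node.conn[0].timeo.noop_out_timeout' 10
set_opt 'node.session.nr_sessions' 4
set_opt 'node.session.cmds_max' 2048
set_opt 'node.session.queue_depth' 1024
cat "$CONF"
```

Using a literal prefix match (awk index) rather than a regex avoids problems with the `[0]` brackets in the connection parameter names.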
• Multipath Configurations
The multipath parameters in the /etc/multipath.conf file should be set as follows in order to sustain a failover.
Nimble recommends the use of aliases for mapped LUNs.
defaults {
    user_friendly_names yes
    find_multipaths yes
}
devices {
    device {
        vendor "Nimble"
        product "Server"
        path_grouping_policy group_by_serial
        path_selector "round-robin 0"
        features "1 queue_if_no_path"
        path_checker tur
        rr_min_io_rq 10
        rr_weight priorities
        failback immediate
    }
}
multipaths {
    multipath {
        wwid 20694551e4841f4386c9ce900dcc2bd34
        alias hdfs-vol1
    }
}
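Each multipaths stanza needs the volume's WWID. One way to generate the stanzas is to parse `multipath -ll` output; the sketch below uses embedded sample output (the second WWID is an illustrative value, not a real Nimble volume) so it can be dry-run.

```shell
#!/bin/sh
# Sketch: derive multipaths { } stanzas from `multipath -ll` output.
# SAMPLE stands in for real output (the WWIDs are illustrative); on a node:
#   SAMPLE=$(multipath -ll | grep Nimble)
SAMPLE='mpatha (20694551e4841f4386c9ce900dcc2bd34) dm-2 Nimble,Server
mpathb (250a1b2c3d4e5f60718293a4b5c6d7e8f) dm-3 Nimble,Server'

# The WWID sits between parentheses; number the aliases in order seen.
n=0
printf '%s\n' "$SAMPLE" | awk -F'[()]' '{ print $2 }' | while read -r wwid; do
    n=$((n + 1))
    printf 'multipath {\n    wwid %s\n    alias hdfs-vol%d\n}\n' "$wwid" "$n"
done > stanzas.txt
cat stanzas.txt
```

The generated stanzas can then be pasted inside the multipaths { } section of /etc/multipath.conf.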
• Disk IO Scheduler
The IO scheduler needs to be set to “noop”.
To set the IO scheduler for all LUNs online, run the command below. Note: multipath must be set up before
running this command. Newly added LUNs and rebooted servers will not pick up this setting automatically; run
the same command again after adding LUNs or rebooting.
[root@mktg04 ~]# multipath -ll | grep sd | awk -F":" '{print $4}' | awk '{print $2}' | while read LUN; do echo
noop > /sys/block/${LUN}/queue/scheduler ; done
To set this parameter automatically at boot, append elevator=noop to the kernel line in the /etc/grub.conf file.
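For illustration, a kernel line with the parameter appended might look like the following (the kernel version and root device are placeholders; the actual line will differ per system):

```
kernel /vmlinuz-2.6.32-431.el6.x86_64 ro root=/dev/mapper/vg_root-lv_root LANG=en_US.UTF-8 elevator=noop
```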
• CPU Scaling Governor
The CPU scaling governor needs to be set to “performance”.
To set the CPU scaling governor, run the below command.
[root@mktg04 ~]# for a in $(ls -ld /sys/devices/system/cpu/cpu[0-9]* | awk '{print $NF}') ; do echo
performance > $a/cpufreq/scaling_governor ; done
Note: The setting above is not persistent across reboots; hence the command needs to be executed when the
server comes back online. To avoid running the command after a reboot, place the command in the
/etc/rc.local file.
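A sketch of what the /etc/rc.local addition could look like follows. The write guard is an addition of this sketch (not in the original command), which makes the fragment a harmless no-op on systems without cpufreq support and safe to dry-run:

```shell
# Illustrative /etc/rc.local fragment: re-apply the "performance" governor
# at boot. The -w guard skips CPUs without a writable cpufreq interface,
# so this is a no-op on systems (e.g. VMs) lacking cpufreq support.
applied=0
for g in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -w "$g" ] || continue
    echo performance > "$g" && applied=$((applied + 1))
done
echo "governors set: $applied"
```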
• iSCSI Data Network
Nimble recommends using 10GbE iSCSI for all Hadoop data traffic:
• Two separate subnets
• Two 10GbE iSCSI NICs
• Jumbo frames (MTU 9000) on the iSCSI networks
Example of the MTU setting for eth1 (/etc/sysconfig/network-scripts/ifcfg-eth1):

DEVICE=eth1
HWADDR=00:25:B5:00:00:BE
TYPE=Ethernet
UUID=31bf296f-5d6a-4caf-8858-88887e883edc
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=static
IPADDR=172.18.127.134
NETMASK=255.255.255.0
MTU=9000

To change the MTU on an already running interface:

[root@bigdata1 ~]# ifconfig eth1 mtu 9000
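Jumbo frames only help if every hop honors MTU 9000. A common check is a non-fragmenting ping sized to the MTU; the sketch below only prints the command to run, and the target IP is hypothetical (substitute the array's data IP):

```shell
# Sketch: verify end-to-end jumbo frames with a do-not-fragment ping.
# ICMP payload = MTU - 20 (IP header) - 8 (ICMP header) = 8972 bytes.
# With -M do the packet is dropped instead of fragmented, so replies
# prove the whole path carries MTU 9000. The target IP is hypothetical.
MTU=9000
PAYLOAD=$((MTU - 20 - 8))
echo "ping -M do -c 3 -s $PAYLOAD 172.18.127.1"   # run the printed command on the node
```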
• /etc/sysctl.conf
net.core.wmem_max = 16780000
net.core.rmem_max = 16780000
net.ipv4.tcp_rmem = 10240 87380 16780000
net.ipv4.tcp_wmem = 10240 87380 16780000
Run the sysctl -p command after editing the /etc/sysctl.conf file.
• max_sectors_kb
Change max_sectors_kb on all volumes to 1024 (the default is 512).
To change max_sectors_kb to 1024 for a single volume:

[root@bigdata1 ~]# echo 1024 > /sys/block/sd?/queue/max_sectors_kb

To change all volumes:

multipath -ll | grep sd | awk -F":" '{print $4}' | awk '{print $2}' | while read LUN
do
echo 1024 > /sys/block/${LUN}/queue/max_sectors_kb
done
Note: To make this change persistent after a reboot, add the commands to the /etc/rc.local file.
• VM dirty writeback and expire
Change the VM dirty writeback and expire intervals to 100 (the defaults are 500 and 3000, respectively).
To change the VM dirty writeback and expire settings:

[root@bigdata1 ~]# echo 100 > /proc/sys/vm/dirty_writeback_centisecs
[root@bigdata1 ~]# echo 100 > /proc/sys/vm/dirty_expire_centisecs
Note: To make this change persistent after a reboot, add the commands to the /etc/rc.local file.
Creating Nimble Volumes for Hadoop HDFS
Table 2: Nimble volume configuration for HDFS.

File Type: HDFS volumes
Number of Volumes: 4 (system with 8 cores); 8 (system with 16 cores or more)
# of Mountpoints: One per volume
OS File System: EXT4
Nimble Storage Caching Policy: Yes
Nimble Block Size Setting: 32KB
Example of 8 HDFS volumes
[hduser@bigdata1 ~]$ df -h
/dev/mapper/hdfs1 493G 30G 438G 7% /hdfs1
/dev/mapper/hdfs2 493G 30G 438G 7% /hdfs2
/dev/mapper/hdfs3 493G 29G 439G 7% /hdfs3
/dev/mapper/hdfs4 493G 29G 439G 7% /hdfs4
/dev/mapper/hdfs5 493G 30G 439G 7% /hdfs5
/dev/mapper/hdfs6 493G 30G 438G 7% /hdfs6
/dev/mapper/hdfs7 493G 30G 438G 7% /hdfs7
/dev/mapper/hdfs8 493G 30G 439G 7% /hdfs8
EXT4 File System
When creating an EXT4 file system on a logical volume, the stride and stripe-width options must be used.
For example:
stride=2,stripe-width=16 (for Nimble performance policy 8KB block size with 8 volumes)
stride=4,stripe-width=32 (for Nimble performance policy 16KB block size with 8 volumes)
stride=8,stripe-width=64 (for Nimble performance policy 32KB block size with 8 volumes)
Note: The stripe-width value depends on the number of volumes and the stride size. A calculator can be found
at http://busybox.net/~aldot/mkfs_stride.html
For example, a single Nimble volume with the 8KB block size performance policy works out to stride=2,stripe-width=2.
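The arithmetic behind these values can be sketched as a small shell helper (the stride_opts function name is illustrative): stride is the Nimble block size divided by the 4KB EXT4 block size, and stripe-width is stride times the number of HDFS volumes.

```shell
# Sketch of the stride/stripe-width arithmetic (helper name is illustrative).
#   stride       = Nimble block size (KB) / EXT4 block size (4KB)
#   stripe-width = stride * number of HDFS volumes
stride_opts() {  # usage: stride_opts <nimble_block_kb> <num_volumes>
    stride=$(( $1 / 4 ))
    echo "stride=$stride,stripe-width=$(( stride * $2 ))"
}
stride_opts 32 8   # 32KB policy, 8 volumes -> stride=8,stripe-width=64
stride_opts 8 1    # 8KB policy, 1 volume  -> stride=2,stripe-width=2
```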
Creating Nimble Performance Policy
On the Nimble Management GUI, click on “Manage/Performance Policies” and click on the “New Performance
Policy” button. Enter the appropriate settings then click “OK”.
Examples of EXT4 Setup with 8 Volumes:
Create the EXT4 file system
[root@mktg04 ~]# for a in {1..8} ; do mkfs.ext4 /dev/mapper/hdfs$a -b 4096 -E stride=8,stripe-width=64; done
Mount options in the /etc/fstab file
/dev/mapper/hdfs1 /hdfs1 ext4 _netdev,noatime,nodiratime,discard,barrier=0 0 0
/dev/mapper/hdfs2 /hdfs2 ext4 _netdev,noatime,nodiratime,discard,barrier=0 0 0
/dev/mapper/hdfs3 /hdfs3 ext4 _netdev,noatime,nodiratime,discard,barrier=0 0 0
/dev/mapper/hdfs4 /hdfs4 ext4 _netdev,noatime,nodiratime,discard,barrier=0 0 0
/dev/mapper/hdfs5 /hdfs5 ext4 _netdev,noatime,nodiratime,discard,barrier=0 0 0
/dev/mapper/hdfs6 /hdfs6 ext4 _netdev,noatime,nodiratime,discard,barrier=0 0 0
/dev/mapper/hdfs7 /hdfs7 ext4 _netdev,noatime,nodiratime,discard,barrier=0 0 0
/dev/mapper/hdfs8 /hdfs8 ext4 _netdev,noatime,nodiratime,discard,barrier=0 0 0
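Before mounting, the mount points themselves must exist. The sketch below is side-effect free (ROOT is a scratch directory so it can be dry-run); on a real node, the directories would be created directly under / and followed by mount -a as root:

```shell
# Sketch: create the eight HDFS mount points, then mount everything in fstab.
# ROOT is a scratch directory so the sketch has no side effects; on a real
# node, create /hdfs1../hdfs8 directly and run as root.
ROOT=$(mktemp -d)
for a in 1 2 3 4 5 6 7 8; do
    mkdir -p "$ROOT/hdfs$a"
done
ls "$ROOT"
# On the node, follow with:  mount -a
```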
Nimble Reference Architecture
Note: The Hadoop nodes need only a single local disk for the Linux operating system.
Hadoop 2.x Recommended Settings for Nimble Storage
core-site.xml

Parameter                 Value
file.stream-buffer-size   32768
io.file.buffer.size       32768
hdfs-site.xml

Parameter        Value
dfs.blocksize    512MB
dfs.replication  2
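For illustration, the two HDFS values above would land in hdfs-site.xml as follows (512MB expressed in bytes; the property names are the standard Hadoop 2.x ones):

```xml
<!-- Illustrative hdfs-site.xml fragment matching the table above;
     536870912 bytes = 512MB. -->
<property>
  <name>dfs.blocksize</name>
  <value>536870912</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```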
yarn-site.xml

Parameter                             Value
yarn.scheduler.minimum-allocation-mb  Minimum RAM per container (system memory dependent)
yarn.scheduler.maximum-allocation-mb  25% higher than mapreduce.reduce.memory.mb
yarn.nodemanager.resource.memory-mb   Maximum memory YARN can use on the node (system memory dependent)
mapred-site.xml

Parameter                   Value
mapreduce.map.memory.mb     Twice yarn.scheduler.minimum-allocation-mb
mapreduce.map.java.opts     75% of mapreduce.map.memory.mb
mapreduce.reduce.memory.mb  4 times yarn.scheduler.minimum-allocation-mb
mapreduce.reduce.java.opts  75% of mapreduce.reduce.memory.mb
Nimble Storage, Inc.
211 River Oaks Parkway, San Jose, CA 95134
Tel: 877-364-6253 | www.nimblestorage.com | [email protected]
© 2014 Nimble Storage, Inc. Nimble Storage, InfoSight, SmartStack, NimbleConnect, and CASL are trademarks or registered trademarks of Nimble Storage, Inc. All other trademarks are the property of their respective owners. BPG-Hadoop-1114