Building Hadoop with Chef
DESCRIPTION
Slides from my presentation at #ChefConf 2013: Big Data meets Configuration Management. Edmunds.com's first foray into Hadoop is a tale of challenges, discovery, and ultimately triumph. This is the story of how Edmunds.com leveraged Chef - and its community - to build a fully automated Hadoop cluster in the face of looming project deadlines.
TRANSCRIPT
Building & Managing Hadoop with Chef
John Martin, Sr. Director, Production Engineering
Introduction
• Me, Me, Me
  • 10+ years in .com & JEE space
• Project Crew
  • Paul MacDougall
  • Greg Rokita
  • KC Braunschweig (former)
  • Ryan Holmes (former)
• Edmunds.com
  • Founded in 1966
  • Gopher site in 1994
  • HTTP site in 1995
Edmunds.com Environment
• Nearing 3,000 hosts
• Heavily virtualized (Xen, CloudStack, AWS)
• Tomcat with some WebLogic
• Coherence, Solr, Mongo
• Publishing built on ActiveMQ
• Newly launched DWH built around Hadoop + Netezza
Why Chef?
• Explosive infrastructure growth
• Quick to bootstrap
• Easy integration with our tooling
• knife
• The Chef Community
What’s Hadoop?
• Open framework for data-intensive distributed applications
• Reigning King of “Big Data”
• Many services
  • HDFS
  • MapReduce
  • HBase
  • ZooKeeper
• Designed to run on commodity hardware
Edmunds Hadoop Environment
• Multiple clusters
• Roughly 200 TB in total
• 40+ nodes in production
• Maintained by Ops + Dev
• Dell R410
  • Six-core 2.40 GHz
  • 24 GB RAM
  • 4x 1 TB 7200 RPM drives
How We Got Here
• First cluster was a Frankenstein
  • Part BMC
  • Part manual effort
  • Part Puppet
• Staff changes & knowledge loss
• Time for a clean slate!
Building Hadoop with Chef
• True Dev + Ops effort
• Production built in 3 weeks
• Built with community cookbooks
• All services now administered with knife
• New nodes now cluster-ready within minutes
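To make "cluster-ready within minutes" concrete, here is a minimal sketch of how a new node might be handed to Chef with knife bootstrap. The slides do not show the actual command, so the role name (hadoop_worker), the use of a Chef environment per cluster, and the hostname in the usage note are all assumptions for illustration. Only the standard --environment, --run-list, and --sudo options are used; SSH user and key options vary by Chef version and are left out.

```shell
#!/bin/sh
# Hypothetical sketch: the role name and the one-environment-per-cluster
# layout are assumptions, not taken from the slides.

# KNIFE can be overridden (e.g. KNIFE=echo) to preview the command.
KNIFE="${KNIFE:-knife}"

# bootstrap_worker HOST CLUSTER
# Installs Chef on HOST and converges it with a worker role in the
# Chef environment named CLUSTER.
bootstrap_worker() {
    host="$1"; cluster="$2"
    if [ -z "$host" ] || [ -z "$cluster" ]; then
        echo "usage: bootstrap_worker HOST CLUSTER" >&2
        return 2
    fi
    "$KNIFE" bootstrap "$host" \
        --environment "$cluster" \
        --run-list 'role[hadoop_worker]' \
        --sudo
}
```

Calling bootstrap_worker node41.example.com prod would then run a single knife bootstrap against that (made-up) host; with KNIFE=echo it only prints the command it would have run.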
What We Gained
• First highly-visible Chef success story at Edmunds
• Cemented Chef as our CM solution
• Engaged us with the community
• Completely automated Hadoop infrastructure
• New suite of administrative scripts
  • knife-[start|stop]-all.sh $cluster
  • knife-[start|stop]-hbase.sh $cluster
  • knife-[start|stop]-mapred.sh $cluster
  • knife-[start|stop]-oozie.sh $cluster
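The slides name the knife-[start|stop]-*.sh scripts but do not show their contents. As a rough sketch of what such a wrapper could look like, the functions below use knife ssh to run a service command on every node matching a search query. The role names, the service names (modelled on CDH-style packages), and the assumption that $cluster maps to a Chef environment are hypothetical, not details from the talk.

```shell
#!/bin/sh
# Hypothetical sketch of a knife-[start|stop]-mapred.sh style wrapper.
# Role names, service names, and the cluster-to-environment mapping are
# assumptions, not taken from the slides.

# KNIFE can be overridden (e.g. KNIFE=echo) to preview commands.
KNIFE="${KNIFE:-knife}"

# cluster_service ACTION CLUSTER ROLE SERVICE
# Runs "service SERVICE ACTION" over SSH on every node that carries ROLE
# in the Chef environment named CLUSTER.
cluster_service() {
    action="$1"; cluster="$2"; role="$3"; service="$4"
    case "$action" in
        start|stop) ;;
        *) echo "usage: cluster_service start|stop CLUSTER ROLE SERVICE" >&2
           return 2 ;;
    esac
    "$KNIFE" ssh "role:${role} AND chef_environment:${cluster}" \
        "sudo service ${service} ${action}"
}

# start_mapred CLUSTER
# Starts the JobTracker first, then the TaskTrackers.
start_mapred() {
    cluster_service start "$1" hadoop_jobtracker hadoop-0.20-mapreduce-jobtracker &&
    cluster_service start "$1" hadoop_tasktracker hadoop-0.20-mapreduce-tasktracker
}
```

Keeping the knife call in one helper means each per-service script only has to list its roles and the order in which to start or stop them. Setting KNIFE=echo before calling start_mapred prints the two knife ssh invocations instead of running them.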
Where Next?
• New cluster currently being built!
• Integration with Cloudera Manager
• Cluster replication
• Continue evangelism of Chef’s awesomeness
• Extend more of the toolchain around Chef
• See you around at the LA Chef UG!