about this training - s3-ap-southeast · pdf filewith writing real scripts on a hadoop system...

7
ABOUT THIS TRAINING: The world of Hadoop and “Big Data" can be intimidating - hundreds of different technologies with cryptic names form the Hadoop ecosystem. This comprehensive training has been designed by industry experts considering current industry job requirements to provide in-depth learning on big data and Hadoop Modules. This industry oriented program is a combination of the training courses in Hadoop developer, Hadoop administrator, Hadoop testing, and analytics. This Hadoop training will also prepare you for the “Big Data Certification of Cloudera- CCP and CCA”. With this course, you'll not only understand what those systems are and how they fit together - but you'll go hands-on and learn how to use them to solve real business problems! - So it's not just theory… It's filled with hands-on activities and exercises backed with plenty discussions of use cases of different clients across domains. We believe this will help you to build your confidence on the technology and its usage. You'll find a range of activities in this course for people at every level. If you're a project manager who just wants to learn the buzzwords, there are web UI's for many of the activities in the course that require no programming knowledge. If you're comfortable with command lines,

Upload: duongtu

Post on 05-Feb-2018

212 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ABOUT THIS TRAINING - s3-ap-southeast · PDF filewith writing real scripts on a Hadoop system using java, Scala ... might want to work at uses Hadoop in some way, including Amazon,

ABOUTTHISTRAINING:

TheworldofHadoopand“BigData"canbeintimidating-hundredsofdifferenttechnologieswithcrypticnamesformtheHadoopecosystem.

Thiscomprehensivetraininghasbeendesignedbyindustryexpertsconsideringcurrentindustryjobrequirementstoprovidein-depthlearningonbigdataandHadoopModules.

ThisindustryorientedprogramisacombinationofthetrainingcoursesinHadoopdeveloper,Hadoopadministrator,Hadooptesting,andanalytics.

ThisHadooptrainingwillalsoprepareyouforthe“BigDataCertificationofCloudera-CCPandCCA”.

Withthiscourse,you'llnotonlyunderstandwhatthosesystemsareandhowtheyfittogether-butyou'llgohands-onandlearnhowtousethemtosolverealbusinessproblems!

-Soit'snotjusttheory…

It'sfilledwithhands-onactivitiesandexercisesbackedwithplentydiscussionsofusecasesofdifferentclientsacrossdomains.Webelievethiswillhelpyoutobuildyourconfidenceonthetechnologyanditsusage.

You'llfindarangeofactivitiesinthiscourseforpeopleateverylevel.Ifyou'reaprojectmanagerwhojustwantstolearnthebuzzwords,therearewebUI'sformanyoftheactivitiesinthecoursethatrequirenoprogrammingknowledge.Ifyou'recomfortablewithcommandlines,

Page 2: ABOUT THIS TRAINING - s3-ap-southeast · PDF filewith writing real scripts on a Hadoop system using java, Scala ... might want to work at uses Hadoop in some way, including Amazon,

we'llshowyouhowtoworkwiththemtoo.Andifyou'reaprogrammer,we'llchallengeyouwithwritingrealscriptsonaHadoopsystemusingjava,Scala,PigLatin,andPython.

YOURTAKEAWAYFROMTRAINING:

You'llwalkawayfromthiscoursewithareal,deepunderstandingofHadoopanditsassociateddistributedsystems,andyoucanapplyHadooptoreal-worldproblems.

Plusavaluablecompletioncertificateiswaitingforyouattheend!

WHATYOUWILLLEARNINTHISBIGDATAHADOOPONLINETRAININGCOURSE?

1. DetailedunderstandingofBigDataanalytics2. MasterfundamentalsofHadoop2.8andYARNfordesigningdistributedsystemsthatmanages

"bigdata"usingHadoopandrelatedtechnologiesforstoringandanalyzingdataatscale.3. Understand the architecture of HDFS and MapReduce for parallel storage and parallel

processing.4. Understand the configuration choices you shouldmake for stability, reliability andoptimized

taskschedulingonyourdistributedsystem.5. AnalyzerelationaldatausingHiveandMySQL(ConnectingHadooptoOtherDBs)6. Analyzenon-relationaldatausingHBase7. UsePigandSparktocreatescriptstoprocessdataonaHadoopclusterinmorecomplexways.8. UnderstandhowHadoopclustersaremanagedbyYARN,ZookeeperandHue.9. KnowhowtoscheduleyourHadoopjobsusingOozie.10. Collectdatafromavarietyofsources toyourHadoopclusterusingSqoopandFlume11. MasterHadoopadministrationactivitieslikeclustermanaging,monitoring,administrationand

troubleshooting12. LearntestingHadoopapplicationsusingMRUnitandotherautomationtools.13. KnowbasicsofSpark,SparkRDD,Graphx,MLlibandwritingSparkapplications14. Practicereal-lifeprojectsusingHadoopandApacheSpark15. Discussiononindustryusecasesofdifferentclientsacrossdomains16. BeequippedtoclearBigDataHadoopCertification.(CCPandCCA)

Page 3: ABOUT THIS TRAINING - s3-ap-southeast · PDF filewith writing real scripts on a Hadoop system using java, Scala ... might want to work at uses Hadoop in some way, including Amazon,

RECOMMENDEDSKILLSPRIORTOTAKINGTHISCOURSE

There is no pre-requisite to take this Big data training and tomaster Hadoop. But basics ofUNIX,SQLandjava/Pythonwouldbegood.AtGraduIT,weprovidecomplimentarySelf-PacedunixandJavacoursewithourBigDataHadooptrainingtobrush-uptherequiredskillssothatyouaregoodonyourHadooplearningpath.

WHOSHOULDTAKETHISBIGDATAHADOOPTRAININGCOURSE?

1. ProgrammingDevelopersandSystemAdministrators2. Experiencedworkingprofessionals,Projectmanagers3. Big Data Hadoop Developers eager to learn other verticals like Testing, Analytics,

Administration4. BusinessIntelligence,DatawarehousingandAnalyticsProfessionals5. Graduates,undergraduateseagertolearnthelatestBigDatatechnologycantakethisBigData

HadoopCertificationonlinetraining

BITONHADOOP:

HadoopenablestoBUILDANINSIGHT-DRIVENBUSINESS.

Tobespecific,Hadoopisanopen-sourcesoftwareframeworkforstoringdataandrunningapplicationsonclustersofcommodityhardware.Itprovidesmassivestorageforanykindofdata,enormousprocessingpowerandtheabilitytohandlevirtuallylimitlessconcurrenttasksorjobs.

AlmosteverylargecompanyyoumightwanttoworkatusesHadoopinsomeway,includingAmazon,Ebay,Facebook,Google,LinkedIn,IBM,Spotify,Twitter,andYahoo!Andit'snotjusttechnologycompaniesthatneedHadoop;eventheNewYorkTimesusesHadoopforprocessingimages.

Page 4: ABOUT THIS TRAINING - s3-ap-southeast · PDF filewith writing real scripts on a Hadoop system using java, Scala ... might want to work at uses Hadoop in some way, including Amazon,

CURRICULUM:

Module1:UnderstandingBigDataandHadoop

• IntroductiontoBigDataAndHadoop• DiscussiononBigDataanditsSourcesandChallengesrelatedtoit• UnderstandingattributesofBigDataanddifferentdatavarieties• DiscussingUsesCaseson“OpportunityforBusiness”inBigData• DiscussionondifferentsolutionsforproblemsrelatedtoBigData• ComparisonofHadoopvstraditionalsystemsSolutions• UnderstandingCompleteSolutionarchitecturefromDataAcquisitiontoDataAnalysis

forBusiness• DiscussionontechnologiesgettingusedforEndtoEndsolutionforBigDataAnalysis

drivenbusiness• SessiononPYTHON,UNIXandJAVA(Self-pacedlearningvideos)

Module2:UnderstandingHadoopfromThousandfeetView

• BitonhistoryofHadoopanditsevolution• Understandingparallelprocessingandparallelstoragearchitecture• RelatingMAPREDUCEandHDFSArchitecturetoabove• OverviewofalltechnologystacksrelatedtoHadoopEcosystem.• DiscussiononDifferentDistributionsofHadoop,DifferentvendorsofHadoop• Installationandset-upofHadoopCluster(Clouderapreferably)

Module3:UnderstandingParallelStoragesolutionwithHDFS

• UnderstandingHDFSArchitectureindetail• UnderstandingHadoopMaster-SlaveArchitecture• UnderstandingNameNode,DataNode,SecondaryNameNode• DiscussiononBlocksandDataReplication• LearningcommonHDFScommandsandpracticingthem• DiscussiononTypicalProductionClusteranditsconfigurationsforstability,reliability

andOptimization• UnderstandingAnatomyoffileRead,WriteoperationsinHDFS• DiscussiononHadoop2.xClusterArchitecture-FederationandHighAvailability

Page 5: ABOUT THIS TRAINING - s3-ap-southeast · PDF filewith writing real scripts on a Hadoop system using java, Scala ... might want to work at uses Hadoop in some way, including Amazon,

Module4:UnderstandingParallelProcesssolutionwithMapReduce

• UnderstandingHadoop2.xMapReduceArchitecture(YARNArchitecture)• UnderstandingHadoop2.xMapReduceComponents• DiscussiononYARNMRApplicationExecutionFlow• DiscussiononAnatomyofMapReduceProgram–WorkFlow• DiscussiononbasicMapReduceAPIConcepts• WritingMapReduceDriver,Mappers,andReducersusingJAVA• DemoonMapReduceprogramexecution• UnderstandingInputSplitsanditsrelationshipwithHDFSBlocks• DiscussonAdvancedConcepts:

o DistributedCacheo CombinerandPartitionero Counterso MapsideandReducesideJoinso UseofCompressiontechniques(Snappy,LZOandZip)o AdvancedDatatypesinMapReduce(WritableandWritableComparable)

• Hands-onexercisesonMapReduceProgramExecution• DemoonTelecomDataset

Module 5: Learn to analyze relational data using Hive

• UnderstandingofHiveFramework&Components• DiscussiononrelationshipbetweenHiveandMapReduce(WhentochooseWhat)• Understandingandhandson

o HiveDataTypeso HiveDDL–Create/Show/DropDataBaseandTableso HiveDML–LoadFiles&InsertData o HiveSQL-Select,Filter,Join,GroupBy

• UnderstandingofInternalandExternalTables• UnderstandingofProgrammingstructureinUDF,PartitionsandBuckets• DiscussiononLimitationsofHive.• DiscussiononHIVEonSPARK.

Module6:LearnDataFlowETLScriptingLanguage“Pig”

• Introduction to Apache Pig• DiscussiononrelationshipbetweenHive,MapReduceandPig• Grunt• ShellandUtilitycomponents • Different data types in Pig

Page 6: ABOUT THIS TRAINING - s3-ap-southeast · PDF filewith writing real scripts on a Hadoop system using java, Scala ... might want to work at uses Hadoop in some way, including Amazon,

• ProgrammingStructureinPig • Modes Of Execution in Pig • Experiencing Pig Script by writing Evaluation, Filter, Load and Store functions • UDFs in Pig • Understand Integration of HBASE with Pig (After Completion of HBASE Module)

Module7:Learn to analyze non-relational data using HBase

• IntroductiontoNoSQLDatabases• UnderstandingHBasenon-relational distributedDatabaseanditsarchitecture• When/WhytouseHBase• UnderstandingHBaseClientAPI• KnowhowtoloadDataintoHBase• KnowhowtoqueryDatafromHBase• DiscussiononrelationshipbetweenHive,MapReduce,PigandHBase• Overviewofothernon-relationallikeCassandra,andMongoDBanditsadvantagesover

RDBMSandCareerpathinNOSQLDBs.

Module8:LearnhowtoCollectdatafromavarietyofsourcestoyourHadoopclusterusingSqoop,Flume

• UnderstandWhyandwhatisSQOOP• UnderstandSQOOPArchitecture• LearnhowtoImportingDataUsingSQOOPtoHDFS/HIVE/HBaseFromRDBMS• UnderstandWhyandwhatisFlume• UnderstandFlumeArchitecture• Learnhowtoefficientlycollect,aggregate,andmovelargeamountsofstreamingdataintothe

HadoopDistributedFileSystem(HDFS)

Module9:LearnhowtoscheduleHadoopjobsusingOozie

• IntroductiontoWorkFlowManagementandOozie• LearnOozieComponentsAndWorkflow• OzzieCommands• ExperienceschedulingjobsusingOozie

Module10:DiscussiononEmergingTechnologiesinBigDataLandscape

• Discussionon:o EmergingTrendsInBigDataLandscapeo IntroductiontoSPARK–InMemoryParallelProcessingFrameworkforReal

Time/NearRealTime(NRT)operations

Page 7: ABOUT THIS TRAINING - s3-ap-southeast · PDF filewith writing real scripts on a Hadoop system using java, Scala ... might want to work at uses Hadoop in some way, including Amazon,

• BestpracticesforHadoopDeveloper• DiscussiononInterviewpreparation