july 2012 hug: overview of oozie qualification process

Overview of Oozie QE Qualification Process Michelle Chiang 07/18/2012

Upload: yahoo-developer-network

Post on 26-Jan-2015


Category:

Technology



DESCRIPTION

The talk covers the Oozie QE practice and process at Yahoo!, the types of tests that QE performs before release, and the roadmap.

TRANSCRIPT

  • 1. Overview of Oozie QE Qualification Process, Michelle Chiang, 07/18/2012

  • 2. Agenda
      • What is Oozie
      • Qualification stages
      • Challenges
      • Future tasks
      • Q&A
  • 3. What is Oozie?
      • Scalable, secure workflow scheduling system for Hadoop.
      • Three levels of jobs
          • Workflow job: support actions such as MR, Pig, Java, Distcp
          • Coordinator job: scheduling
          • Bundle job: monitor status of coordinator jobs
  • 4. Job Submission to Hadoop
      • [Diagram labels: Oozie Client (1. CLI, 2. Java Client API, 3. WS API); Oozie Server; Hadoop Cluster (Job Tracker, Launcher Mapper, Actual M/R Job)]
  • 5. QE Qualification Process
      • Develop test plan in design cycle
      • Design and implement test cases
      • Execute tests
      • Prepare release notes & certification
      • Support production deployment and customer FAQs
  • 6. Develop Test Plan
      • Prepare test plan for new features defined in PRD, or
      • Prepare test plan for the selected new features checked into the Apache source
      • Define test strategy
      • Test plan is reviewed by QE and Dev
  • 7. Test plan example: test plan for shell action
      • Columns: Case ID; Execution; Expected results; Comment
      • Ticket #: Shell action
          • 1. Read env var, compare action data
          • 2. Read config env var
          • 3. hadoop fs -ls; hadoop fs -cp
          • Pass/Fail, bug#/JIRA#
      • Test_sh*
          • Bash shell: 1, 2, 3
          • Perl script: 1, 2
          • Python script: 1, 2
          • Java: 1, 2
          • C++: 1, 2
  • 8. Design and implement test cases
      • [Diagram labels: Design test case; Prepare test data; Build; Verify/Bug; Automate; Demo]
  • 9. Unit tests
      • 784 unit tests
      • Code coverage: 72%
      • Checked in with code by developers
      • Executed by CI build as a Jenkins job
  • 10. Functional tests
      • Functional tests (including regression tests) as of 3.2.0:
          • Use real systems (hadoop, oozie), not minicluster or minioozie
          • 1129 shell-based tests
          • 146 Java OozieClient API tests (in testNG)
          • Runtime: 36 hours, on 2 servers/clusters
          • Manual setup time: 20min
  • 11. Shell-based tests
      • Assumptions:
          • Secure hadoop cluster is up
          • Oozie server is configured and up
      • 2 types of tests
          • Individualized feature tests: customized validation; self-contained
          • 1 script drives many tests: good for repetitive testing, e.g., schema tests
  • 12. Example: run.sh
      • Prepare: generate job property file based on given conf and template
      • Upload: delete existing upload data, and upload application/data to hdfs
      • Submit: submit oozie jobs
      • Verify: check jobs finish successfully
  • 13. Test validation (1)
      • Add validation into the workflow.xml
          • Apply decision node to check wf:actionData, fs:exists, other EL functions
          • Apply Java action to verify capture-output
  • 14. Test validation (2)
      • Add validation into run.sh
          • Apply oozie client commands to check job status, log, configurations, definition, dryrun
          • Apply shell commands to parse results
          • Download output data, parse and compare
  • 15. Integration
      • Integration tests: 15 tests, within hadoop eco system
          • Including Hadoop, Pig, Hcatalog, Distcp
          • Runtime: ~5 hours (oozie tests only)
          • Manual setup time: 30min
          • Plus, test package preparation & test run: 3 hr
      • Examples
          • Oozie and MapReduce
          • Oozie and Pig
          • Oozie and Hcatalog
  • 16. Stress tests (1)
      • Performance/stress/longevity tests: 10 tests
      • Runtime: 12 hours for performance/stress tests; 7 days for longevity testing
      • Manual setup & analysis time: ~10min per test
  • 17. Stress tests (2)
      • Performance metrics:
          • Job submission rate
          • Status update
          • No failed jobs
          • Number of jobs submitted vs. completed
      • Longevity tests: 300 wf jobs/min for 7 days ~= 3M jobs
  • 18. Memory tests
      • Memory/stress tests: 3 tests
      • Runtime: ~10 hours
      • Manual setup & analysis: 30min per test
      • Examples:
          • Purge big amount of wf/coord/bundle jobs
          • Query a coord job with 100k actions
          • Query a coord job with 8k actions by N threads
  • 19. Upgrade/installation tests
      • Upgrade tests: 14 tests
      • Runtime: 4 hours (manual setup: 2hr)
      • Steps:
          • Submit wf/coord/bundle jobs
          • Shut down oozie server
          • Upgrade database schema, oozie version, oozie config
          • Restart oozie server
  • 20. Release notes and certification
      • Release notes
          • New features
          • Package version and new settings
          • New db schema
      • Certification
          • Number of tests being executed and pass rate
          • Known issues
  • 21. Production and customer support
      • Document FAQs, e.g., usage of new features
      • Support production deployment issues
      • Meet customers' SLA requirements
  • 22. Experiences learned (1)
      • Add time-out to the test script
          • If the test fails to reach expected status
      • Carefully timed the verification step to catch transient states
          • Job status transition, e.g., from PREP to RUNNING to PAUSED
  • 23. Experiences learned (2)
      • Increase hadoop capacity
          • Modify hadoop queue capacity property
          • Modify user limit
      • Increase database active connections
  • 24. Experiences learned (3)
      • Accumulate large number of jobs for testing
          • Increase materialization window
          • Reduce materialization look up interval
          • Coordinator jobs frequency, duration
      • Also, check database memory usage
  • 25. Experiences learned (4)
      • Check oozie job log, tomcat server log, hadoop jobtracker log for debugging
      • Dev adds debugging statements
  • 26. Challenges - production issues
      • Reproduce and debug issues in QE environment
      • Set up QE environment as close to production as possible
      • Recent story: using CNAME for oozie URL
  • 27. Challenges - backward compatibility
      • Oozie always guarantees backward compatibility
          • Web-service API
          • Job definitions
          • Client API
      • Verify old jobs continue to run in new release
  • 28. Challenges - multiple versions
      • Compatibility of multiple versions of other components
          • Hadoop API
          • Pig
          • Hcatalog
  • 29. Work in Progress (1)
      • Increase test coverage
          • Java based, testNG framework
          • Server-side oozie white box testing
          • Improved web service API testing
  • 30. Work in Progress (2)
      • Hadoop 2.x integration testing, including HDFS federation
      • Memory monitoring framework
      • Performance benchmark framework
      • Of course, new oozie releases
  • 31. Open sourcing
      • Short term: Shell based tests
          • Review file/data structure
          • Add readme, copyright, etc
          • Work in progress
      • Long term: Java based tests
          • oozie-core, oozie-client, oozie-ws
  • 32. Y! Oozie QE team
      • QE Architect: Jane Q. Chen [email protected]
      • Engineer: Marcy [email protected]
      • Engineer: Michelle [email protected]
  • 33. Acknowledgement
      • All oozie developers in the community!
      • http://incubator.apache.org/oozie/
      • [email protected]
  • 34. Thank you! Q&A
      • [email protected]
  • 35. Back up slides
  • 36. An Oozie Workflow
      • [Diagram labels: start; FS job (mkdir); fork; MapReduce Streaming job; Pig job; join; Decision (Case1, Case2); MapReduce job; Java Action; FS job (chmod); end; transitions labeled OK]
  • 37. Oozie Wordcount Workflow Example
      • Non-Oozie (single map-reduce job), from Gateway: [yourid@gwgd2211 ~]$ hadoop jar hadoop-examples.jar wordcount -Dmapred.job.queue.name=queue_name inputDir outputDir
      • Oozie: Workflow.xml [Diagram labels: Start; MapReduce wordcount; OK; End; ERROR; Kill]
  • 38. Example: shell-action workflow.xml
      • Shell action values: ${SCRIPT}; -classpath; ./${SCRIPTFILE}:$CLASSPATH; script; ${SCRIPTFILE}#${SCRIPTFILE}
      • Decision "wf:actionData matches?" (false / true): ${wf:actionData(shell-sh)[PATH1] == Reset}
      • end; kill
  • 39. Integration tests
      • Compatible with other components
      • No system failures, e.g., NN, JT, Hcat_server
      • Run standalone utility to narrow down issues
          • For example, pig, distcp
      • Check Oozie's launcher log on Jobtracker
  • 40. Production environment
      • Total number of nodes: 42K+
      • Total number of clusters: 25+
      • 1 oozie server per cluster
      • Total number of processed jobs: 750K/month
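Slides 12 and 22 describe a run.sh whose verify step checks that jobs finish, with a time-out in case a job never reaches the expected status. The sketch below illustrates that idea in Python rather than shell so it can run without a cluster. It is not the team's script: `wait_for_status`, `get_status`, and `fake_job` are hypothetical names introduced here, and the fake status sequence stands in for whatever a real test would query from the Oozie server.

```python
import time
from typing import Callable, Iterable, List


def wait_for_status(
    get_status: Callable[[], str],
    expected: Iterable[str],
    timeout_s: float,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll get_status() until it returns an expected status or time runs out.

    Returns the statuses observed, in order, with consecutive repeats
    collapsed. Raises TimeoutError if no expected status is seen in time.
    """
    wanted = set(expected)
    seen: List[str] = []
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if not seen or seen[-1] != status:
            seen.append(status)  # record each transition once
        if status in wanted:
            return seen
        if clock() >= deadline:
            raise TimeoutError(
                f"gave up after {timeout_s}s; statuses seen: {seen}"
            )
        sleep(interval_s)


def fake_job(statuses):
    """Hypothetical stand-in for a real status query: replays a fixed
    sequence, then keeps returning the last status."""
    it = iter(statuses)
    last = [None]

    def get_status():
        last[0] = next(it, last[0])
        return last[0]

    return get_status


if __name__ == "__main__":
    job = fake_job(["PREP", "RUNNING", "RUNNING", "SUCCEEDED"])
    print(wait_for_status(job, {"SUCCEEDED"}, timeout_s=5, interval_s=0))
```

A short polling interval matters when the test needs to observe a transient state such as PREP before the job moves on; the clock and sleep functions are injectable so the time-out path can be exercised without waiting.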