Apache Hadoop YARN - Hortonworks Meetup Presentation
TRANSCRIPT
Apache Hadoop YARN
A Cursory Look At The Architecture
© Hortonworks Inc. 2012. Confidential and Proprietary.
Global Scheduler (ResourceManager)
• Pure resource arbitration
• Multiple resource dimensions
  – <priority, data-locality, memory, cpu, …>
• In-built support for data-locality
  – Node, Rack, etc.
  – Unique to YARN
Scheduler Concepts
• Input from AM(s) is a dynamic list of ResourceRequests
  – <resource-name, resource-capability>
  – Resource name: (hostname / rackname / any)
  – Resource capability: (memory, cpu, …)
  – Essentially an inverted <name, capability> request map from AM to RM
  – No notion of tasks!
• Output - Container
  – Resource(s) granted on a specific machine
  – Verifiable grant
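The inverted <name, capability> request map can be sketched as a small data structure. This is a minimal illustrative model only; the class and method names below are invented, and the real types (ResourceRequest, Resource) live in org.apache.hadoop.yarn.api:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the inverted <resource-name, capability> request map the AM
// sends to the RM. All names here are invented for illustration.
class RequestMap {
    // Capability holds memory and a container count only; real YARN also
    // tracks cpu and other dimensions.
    record Capability(int memoryMb, int numContainers) {}

    // Keyed by resource-name: a hostname, a rackname, or "*" for "any node".
    private final Map<String, Capability> asks = new LinkedHashMap<>();

    void ask(String resourceName, int memoryMb, int numContainers) {
        asks.put(resourceName, new Capability(memoryMb, numContainers));
    }

    int containersAskedAt(String resourceName) {
        Capability c = asks.get(resourceName);
        return c == null ? 0 : c.numContainers();
    }
}
```

Note there is no task object anywhere in the map: the RM only sees names and capabilities, which is what "No notion of tasks!" means.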
Scheduling Walkthrough
MapReduce job with 2 maps and 1 reduce
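Before the allocation slides, it helps to spell out the requests such an AM might send. The host and rack names (h1010/r11, h2121/r22) are the ones used on the following slides; the memory sizes, the `Ask` type, and the exact split placement are illustrative assumptions:

```java
import java.util.List;

// Illustrative request list for the 2-map / 1-reduce job above. Each entry
// mirrors one <resource-name, capability> ResourceRequest; names and sizes
// are assumptions, not taken from a real job.
class Walkthrough {
    record Ask(String resourceName, int memoryMb, int numContainers) {}

    static List<Ask> mapRequests() {
        // A map task prefers its input split's host, falls back to the rack,
        // then to "*" (any node).
        return List.of(
            new Ask("h1010", 1024, 1),  // assume map 1's split lives on h1010
            new Ask("h2121", 1024, 1),  // assume map 2's split lives on h2121
            new Ask("r11", 1024, 1),    // rack-level fallback for map 1
            new Ask("r22", 1024, 1),    // rack-level fallback for map 2
            new Ask("*", 1024, 2));     // any-node fallback for both maps
    }

    static List<Ask> reduceRequests() {
        // Reduces read shuffled output from every map, so no locality ask.
        return List.of(new Ask("*", 1024, 1));
    }
}
```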
Scheduling Walkthrough
Container allocation on r22/h2121:
Scheduling Walkthrough
Container allocation on r11/h1010:
Writing Custom Applications
• Grand total of 3 protocols
  – ClientRMProtocol
    – Application launching program
    – submitApplication
  – AMRMProtocol
    – Protocol between AM & RM for resource allocation
    – registerApplication / allocate / finishApplication
  – ContainerManagerProtocol
    – Protocol between AM & NM for container start/stop
    – startContainer / stopContainer
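The shapes of the three protocols can be paraphrased as plain Java interfaces. This is a sketch only: the real interfaces in org.apache.hadoop.yarn.api exchange request/response wrapper objects, not the bare strings used here, and the toy NodeManager is invented to demonstrate the start/stop contract:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Paraphrased protocol shapes; method names follow the slide, signatures are
// simplified assumptions.
interface ClientRMProtocol {
    void submitApplication(String applicationSpec);   // client -> RM
}

interface AMRMProtocol {
    void registerApplication(String amAddress);       // AM -> RM
    List<String> allocate(List<String> asks, List<String> releases);
    void finishApplication(String diagnostics);       // AM -> RM
}

interface ContainerManagerProtocol {
    void startContainer(String containerId);          // AM -> NM
    void stopContainer(String containerId);           // AM -> NM
}

// Minimal fake NodeManager so the start/stop contract can be exercised.
class ToyNodeManager implements ContainerManagerProtocol {
    private final Set<String> running = new HashSet<>();
    public void startContainer(String containerId) { running.add(containerId); }
    public void stopContainer(String containerId)  { running.remove(containerId); }
    boolean isRunning(String containerId)          { return running.contains(containerId); }
}
```

A custom AM only ever talks these three languages: one call to submit itself, a heartbeat loop against the RM, and start/stop calls against each NM that hosts one of its containers.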
API improvements
• Overload of the ‘*’ entry
• Release / reject containers
• Ask for specific nodes/racks (only)
• Don’t give me containers on these racks/nodes
• Single client thread allowed to request containers
• Overloaded allocate call
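Several of these improvements hang off the heartbeat-style allocate call, which carries both new asks and containers to release in a single round trip. A toy model of that double-duty call (all names invented, not YARN's API, and grants are handed out immediately rather than asynchronously as a real RM would):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy scheduler illustrating the overloaded allocate call: one invocation
// both releases unwanted containers and asks for new ones.
class ToyScheduler {
    private final Set<String> liveContainers = new HashSet<>();
    private int nextId = 0;

    // Returns the ids of newly granted containers. A real RM answers asks
    // over later heartbeats; this toy grants them on the spot.
    List<String> allocate(int newAsks, List<String> releases) {
        releases.forEach(liveContainers::remove);
        List<String> granted = new ArrayList<>();
        for (int i = 0; i < newAsks; i++) {
            String id = "container_" + (nextId++);
            liveContainers.add(id);
            granted.add(id);
        }
        return granted;
    }

    int liveCount() { return liveContainers.size(); }
}
```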
Recent advancements
• Tools for debugging AMs
  – Unmanaged AM
• Generic AM
  – Utility libraries for writing AMs
  – YARN-103, YARN-29
• YARN project split and how multiple versions of MapReduce can coexist
Roadmap
• MapReduce container reuse
• RM restart capability
• Multi-resource scheduling
• Generic application history server
Questions?
Thank You!