October 2014 HUG : Oozie HA

Oozie High Availability Hadoop User Group meetup 10/15/14 Robert Kanter (Cloudera) Ryota Egashira (Yahoo!)

Upload: yahoo-developer-network

Post on 16-Apr-2017


TRANSCRIPT

Page 1: October 2014 HUG : Oozie HA

Oozie High Availability
Hadoop User Group meetup 10/15/14

Robert Kanter (Cloudera)
Ryota Egashira (Yahoo!)

Page 2: October 2014 HUG : Oozie HA

Agenda
● What is High Availability?
● Architectural Overview
● Security
● Authentication
● HCatalog Integration in HA
● Other Challenges in HA
● Future Work

Page 3: October 2014 HUG : Oozie HA

What is High Availability?
● A system without unplanned downtime when partial failures occur
  o Typically achieved by having redundancies and removing single points of failure

● Our Goals
  o Don’t change the API or usage patterns
  o User doesn’t even have to know it’s HA

Page 4: October 2014 HUG : Oozie HA

Architectural Overview: Database
● Oozie stores most of its state in a database
  o (submitted jobs, workflow definitions, etc.)

● Instead of a failover model, we want to run many Oozie servers against the same database
  o Active-Active HA
  o Also provides horizontal scalability

● Zookeeper for coordination

Page 5: October 2014 HUG : Oozie HA
Page 6: October 2014 HUG : Oozie HA

Architectural Overview: Access
● Users and client programs need a single address to connect to
  o Web UI, REST/Java API, JobTracker/ResourceManager callbacks, etc.

● Load balancer, Virtual IP, or DNS round-robin can be used to provide a single entry point to the Oozie servers
  o Technically, this entry point also needs to be HA
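
As an illustration of the single-entry-point idea, here is a minimal Python sketch of round-robin selection that skips servers marked as down. It is only a toy model: the class `RoundRobinEntryPoint`, its methods, and the server addresses are hypothetical and do not correspond to any Oozie or load balancer API.

```python
class RoundRobinEntryPoint:
    """Hypothetical single entry point that rotates over Oozie servers."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server, skipping any marked down."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no Oozie server available")


# Example addresses are made up for the sketch.
entry = RoundRobinEntryPoint(["oozie-1:11000", "oozie-2:11000"])
first, second = entry.pick(), entry.pick()
```

Clients only ever see the entry point, so a server can drop out of the rotation without the user noticing, which matches the goal that the user does not have to know it is HA.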

Page 7: October 2014 HUG : Oozie HA
Page 8: October 2014 HUG : Oozie HA

Architectural Overview: Log Streaming
● Oozie’s log files are not in the database
  o Each Oozie server only has access to its own logs

● Jobs are not assigned to a specific Oozie server

● What if the user asks Oozie Server 1 for logs for a job processed by Oozie Server 2?
  o Oozie Server 1 can ask Oozie Server 2 for its logs

● Caveat: If an Oozie Server goes down, any logs from it will be unavailable
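
The log streaming behavior above, including the caveat, can be sketched roughly as follows. This is a toy in-memory model, not Oozie code: `OozieServer`, `stream_logs`, and the `up` flag are hypothetical names, and a real implementation would fetch peer logs over the network and merge lines by timestamp.

```python
class OozieServer:
    """Toy model of a server that only holds its own log lines."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.local_logs = []  # list of (job_id, line) tuples
        self.peers = []       # other OozieServer instances

    def log(self, job_id, line):
        self.local_logs.append((job_id, line))

    def local_lines(self, job_id):
        return [line for jid, line in self.local_logs if jid == job_id]

    def stream_logs(self, job_id):
        """Collect this server's lines plus those of every reachable peer.

        Returns (lines, unreachable_peer_names). Lines are simply
        concatenated here, local lines first.
        """
        lines = self.local_lines(job_id)
        unreachable = []
        for peer in self.peers:
            if peer.up:
                lines.extend(peer.local_lines(job_id))
            else:
                unreachable.append(peer.name)  # the caveat: these logs are missing
        return lines, unreachable
```

Whichever server receives the request can answer it, but the result is only complete while every server that touched the job is still reachable.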

Page 9: October 2014 HUG : Oozie HA
Page 10: October 2014 HUG : Oozie HA

Security
● Existing security features continue to work

● Authentication tokens
  o Signed cookies for authenticating users to the Oozie server
  o Each Oozie server uses its own randomly generated secret
    Problem: A server won’t accept cookies signed by other Oozie servers

● Hadoop-auth in Hadoop 2.6.0 will add support for pluggable secret providers
  o Includes a Zookeeper-backed implementation that synchronizes a rolling, randomly generated secret across multiple servers
    No locking required!
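
To make the cookie problem concrete, the following Python sketch signs a value with HMAC-SHA256 and shows why per-server secrets fail while a shared rolling secret works. Everything here is an assumption made for illustration: the cookie layout, the function names `sign` and `verify`, and the class `SharedRollingSecret` (including its policy of accepting the current and the previous secret) are hypothetical and are not the Hadoop-auth implementation.

```python
import hashlib
import hmac
import secrets


def sign(value: str, secret: bytes) -> str:
    """Return 'value&s=<hex HMAC-SHA256>' as a toy signed cookie."""
    mac = hmac.new(secret, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}&s={mac}"


def verify(cookie: str, accepted_secrets) -> bool:
    """Accept the cookie if any currently valid secret produced its signature."""
    value, sep, mac = cookie.rpartition("&s=")
    if not sep:
        return False
    for secret in accepted_secrets:
        expected = hmac.new(secret, value.encode(), hashlib.sha256).hexdigest()
        if hmac.compare_digest(mac.encode(), expected.encode()):
            return True
    return False


class SharedRollingSecret:
    """Stand-in for a secret store that all servers read (e.g. kept in Zookeeper)."""

    def __init__(self):
        self.current = secrets.token_bytes(32)
        self.previous = None

    def roll(self):
        # Keep the old secret around so recently issued cookies stay valid.
        self.previous, self.current = self.current, secrets.token_bytes(32)

    def accepted(self):
        return [s for s in (self.current, self.previous) if s is not None]
```

With independent secrets, `verify(sign("u=alice", secret_1), [secret_2])` is false, which is the problem on the slide. When both servers read from the same `SharedRollingSecret`, either one can verify a cookie issued by the other.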

Page 11: October 2014 HUG : Oozie HA
Page 12: October 2014 HUG : Oozie HA
Page 13: October 2014 HUG : Oozie HA

Authentication

[Diagram. Components: Load Balancer, Oozie Server 1, Oozie Server 2, Oracle DB, Zookeeper (Apache Curator), Hadoop Cluster. Labeled connections:
● User submit request
● Load-balanced request redirection
● Inter-server communication for log streaming, sharelib, etc.
● Zookeeper for lock and management]

Page 14: October 2014 HUG : Oozie HA

Authentication

[Diagram. Components: Load Balancer, Oozie Server 1, Oozie Server 2, Oracle DB, Zookeeper (Apache Curator), Hadoop Cluster. Labeled connections and their security:
● User submit request. Security: https + kerberos / cookie-based auth
● Load-balanced request redirection. Security: https + kerberos / cookie-based auth
● Inter-server communication for log streaming, sharelib, etc. Security: https + kerberos
● (unlabeled connection) Security: kerberos
● Zookeeper for lock and management. Security: Kerberos]

Page 15: October 2014 HUG : Oozie HA

HCatalog Integration in HA

• HCatalog: metadata management service for HDFS datasets
  – Oozie receives notifications from HCatalog through JMS (e.g., ActiveMQ)
  – Start a job immediately after data becomes ready

[Diagram. Components: Oozie 1, Oozie 2, JMS (e.g., ActiveMQ), HCatalog, Job. Labeled steps:
 1. Query/Poll Partition
 2. Register Topic
 3. Push notification <New Partition>
 4. Notify New Partition]

Page 16: October 2014 HUG : Oozie HA

HCatalog Integration in HA

• To support HA
  – Keep consistency in the in-memory data structure
    • Stores the list of jobs waiting for a data partition
  – Create and clean up topic listeners for JMS

[Diagram. Components: Oozie 1, Oozie 2, JMS (e.g., ActiveMQ), HCatalog, Job. Labeled steps:
 1. Query/Poll Partition
 2. Register Topic
 3. Push notification <New Partition>
 4. Notify New Partition]
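
The in-memory structure mentioned on this slide could look roughly like the sketch below: a map from partitions to waiting jobs, together with the set of topics that currently have a listener. It models a single server only and does not show how Oozie keeps copies consistent across servers. The class `PartitionWaitRegistry`, its methods, and the topic and partition strings are all hypothetical.

```python
from collections import defaultdict


class PartitionWaitRegistry:
    """Toy in-memory map: (topic, partition) -> jobs waiting for it."""

    def __init__(self):
        self.waiting = defaultdict(set)  # (topic, partition) -> job ids
        self.listeners = set()           # topics with an active listener

    def wait_for(self, job_id, topic, partition):
        # Corresponds to "2. Register Topic": listen once per topic.
        self.listeners.add(topic)
        self.waiting[(topic, partition)].add(job_id)

    def on_new_partition(self, topic, partition):
        """Handle "4. Notify New Partition": return jobs that can now start."""
        ready = self.waiting.pop((topic, partition), set())
        # Clean up the listener once nothing waits on this topic any more.
        if not any(t == topic for t, _ in self.waiting):
            self.listeners.discard(topic)
        return sorted(ready)
```

In an HA setup, each server holding such a structure has to see the same registrations and removals, which is the consistency requirement the slide refers to.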

Page 17: October 2014 HUG : Oozie HA

Other Challenges in HA

• SLA support
  – Oozie has an in-memory data structure to track SLA status for each job (start/duration/end met/miss and notifications)
  – Add a check of SLA status against the database
  – Use a ZK lock to synchronize updates on the same job from multiple servers

• Distributed Locks
  – Reentrant distributed lock using Apache Curator + Zookeeper
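
The semantics of a reentrant per-job lock can be sketched in a few lines of Python. This is an in-process stand-in meant only to show the behavior (the same owner may re-acquire, other owners are refused until the hold count reaches zero). It is not distributed, and `ReentrantLockTable` and its methods are hypothetical names rather than Curator or Oozie APIs.

```python
import threading


class ReentrantLockTable:
    """Toy stand-in for a lock service: one reentrant lock per job id."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the table itself
        self._held = {}                 # job_id -> [owner, hold_count]

    def try_acquire(self, job_id, owner):
        with self._guard:
            entry = self._held.get(job_id)
            if entry is None:
                self._held[job_id] = [owner, 1]
                return True
            if entry[0] == owner:       # reentrant: same owner may re-acquire
                entry[1] += 1
                return True
            return False                # held by a different owner

    def release(self, job_id, owner):
        with self._guard:
            entry = self._held.get(job_id)
            if entry is None or entry[0] != owner:
                raise RuntimeError("lock not held by this owner")
            entry[1] -= 1
            if entry[1] == 0:           # fully released
                del self._held[job_id]
```

For the SLA case above, each server would take the lock for a job before updating that job's SLA status, so two servers never update the same job at once.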

Page 18: October 2014 HUG : Oozie HA

Other Challenges in HA

● Distributed Job ID
  o Maintain a distributed sequence number for Job ID using Apache Curator + Zookeeper

● Zookeeper Failure Handling
  o Oozie servers automatically shut down when Zookeeper is down
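
The reason for a shared sequence number is that two servers must never hand out the same Job ID. The sketch below models the shared counter with a lock-protected integer inside one process. In the setup described on the slide the counter lives in Zookeeper instead. `SharedSequence`, `make_job_id`, and the ID layout are hypothetical and are not Oozie's actual ID format.

```python
import threading


class SharedSequence:
    """Stand-in for a counter shared by all servers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def next(self):
        with self._lock:  # increment and read as one atomic step
            self._value += 1
            return self._value


def make_job_id(seq, suffix):
    """Hypothetical ID layout: zero-padded sequence number plus a suffix."""
    return f"{seq.next():07d}-{suffix}"


def submit_many(seq, count, out):
    """Simulate one server generating `count` job IDs."""
    for _ in range(count):
        out.append(make_job_id(seq, "W"))
```

If each server kept its own local counter instead, both would start at 1 and produce colliding IDs, which is why the sequence has to be coordinated.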

Page 19: October 2014 HUG : Oozie HA

Future work

• Learn from experience for stability
  – At Y!, HA has been running on non-prod grids for >1 month, with prod deployment in Q4

• Faster job fail-over
  – Currently waits for a thread (Recovery Service) to pick up non-progressing jobs every few minutes
  – An Oozie server should immediately notice when another server is down and fail over its jobs (e.g., using a ZK watcher)

• Improve log streaming
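
The watcher-based fail-over idea from the "Faster job fail-over" item can be sketched as a membership registry that calls back as soon as a server disappears, instead of waiting for a periodic recovery pass. This is a toy model of a proposal, not existing Oozie behavior: `Membership`, `make_failover_handler`, and the notion of a job's current owner are all assumptions made for the sketch.

```python
class Membership:
    """Toy stand-in for server registrations with watchers."""

    def __init__(self):
        self.live = set()
        self.watchers = []

    def join(self, server):
        self.live.add(server)

    def watch(self, callback):
        self.watchers.append(callback)

    def leave(self, server):
        """Called when a server's registration disappears (e.g. it crashed)."""
        if server in self.live:
            self.live.remove(server)
            for callback in self.watchers:
                callback(server)  # notify immediately, no polling delay


def make_failover_handler(job_owner, membership):
    """Return a callback that hands a dead server's jobs to the survivors."""

    def on_server_down(dead):
        survivors = sorted(membership.live)
        if not survivors:
            return  # nobody left to take over
        orphaned = sorted(j for j, o in job_owner.items() if o == dead)
        for i, job in enumerate(orphaned):
            job_owner[job] = survivors[i % len(survivors)]

    return on_server_down
```

Compared with a thread that scans for non-progressing jobs every few minutes, the callback runs as soon as the failure is observed, so the delay is bounded by failure detection rather than by the scan interval.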