Distributed Queue System using Gearman

Taehun Cho, CTO @ IMcompany, http://iamcompany.net/

Uploaded by eric-cho on 17 May 2015

DESCRIPTION

We use Gearman to manage our queue system. This talk covers why you should use a queue in many situations, in web-based interfaces as well as in server-side applications.

TRANSCRIPT

Page 1: Distributed Queue System using Gearman

Distributed Queue System using Gearman

Taehun Cho, CTO @ IMcompany

http://iamcompany.net/

What is a Queue?

(Image © www.mathworks.com)

Multiple Queues

With multiple tellers, customers must form multiple queues to be served

Some Situations on the Web

You want to send an email, a push notification, or messages, import a big file, etc.

The problem is that...

DO NOT LET YOUR USERS WAIT

Run Asynchronously

Running a Task in a Single Thread?

The loading may make the main GUI unresponsive

Run in the Background!

The user can move on to the next action, while the task is done in the backend in a distributed manner
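A minimal sketch of the "run in the background" idea, using only Python's standard library rather than Gearman: the request handler just enqueues the job and replies at once, and a worker thread does the slow part. The in-process `queue.Queue`, `handle_request`, and `send_email` are hypothetical stand-ins for the job server, a web handler, and a real task.

```python
import queue
import threading
import time

jobs = queue.Queue()  # in-process stand-in for the job server's queue
sent = []             # records finished work so the demo can inspect it


def send_email(address):
    """Hypothetical slow task, e.g. talking to an SMTP server."""
    time.sleep(0.2)
    sent.append(address)


def worker():
    """Background worker: handles jobs until it receives the None sentinel."""
    while True:
        address = jobs.get()
        try:
            if address is None:
                return
            send_email(address)
        finally:
            jobs.task_done()


def handle_request(address):
    """Web handler: enqueue the job and answer the user right away."""
    jobs.put(address)
    return "202 Accepted"


t = threading.Thread(target=worker, daemon=True)
t.start()

start = time.perf_counter()
reply = handle_request("user@example.com")
elapsed = time.perf_counter() - start
print(reply, f"(handler returned after {elapsed:.3f}s)")

jobs.join()     # demo only: wait until the background work is finished
jobs.put(None)  # tell the worker to stop
t.join()
print(sent)     # ['user@example.com']
```

With Gearman, the queue would live in gearmand and the worker would be a separate process, possibly on another machine, but the shape is the same: the user-facing path only submits.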

Job Queue Systems

● Celery (http://www.celeryproject.org/)
● RabbitMQ (http://www.rabbitmq.com/)
● Zend Server Job Queue (http://www.zend.com/)
● ZeroMQ (http://www.zeromq.org/)
● Beanstalkd
● Peafowl, Starling
● Apache ActiveMQ

and many others...

Introducing Gearman

LiveJournal

Introducing Gearman

At LiveJournal, many photos were uploaded every day, which led to a heavy image-processing load; this was the motivation for building such a queue system.
● Yahoo!: 120+ servers, 12M jobs/day
● Digg: 45+ servers, 400K jobs/day
● LiveJournal, SixApart, DealNews, Xing.com, and many others
(from Expert PHP and MySQL, Andrew et al., Wrox, 2010)
● Grooveshark, GoDaddy.com, IMcompany

Features of Gearman

● Open source
● Simple & fast (rewritten in C)
● Supports a variety of languages: e.g., build a worker in Python and a client in PHP
● Flexible
● Load balancing
● Failover
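The load-balancing and failover features can be illustrated with a toy client that knows several job servers, rotates between them, and skips any that cannot be reached. The round-robin policy, the `FailoverClient` class, and the server names are assumptions for illustration; this is not the API of any Gearman client library.

```python
import itertools


class JobServerDown(Exception):
    """Raised when a (simulated) job server cannot be reached."""


def make_server(name, up=True):
    """Return a callable that simulates submitting a job to one job server."""
    def submit(job):
        if not up:
            raise JobServerDown(name)
        return f"{name} accepted {job}"
    return submit


class FailoverClient:
    """Rotate over several job servers, skipping any that are down."""

    def __init__(self, servers):
        self.servers = servers
        self._order = itertools.cycle(range(len(servers)))

    def submit(self, job):
        for _ in range(len(self.servers)):
            server = self.servers[next(self._order)]
            try:
                return server(job)
            except JobServerDown:
                continue  # fail over to the next server in the rotation
        raise RuntimeError("all job servers are down")


client = FailoverClient([
    make_server("gearmand-a"),
    make_server("gearmand-b", up=False),
    make_server("gearmand-c"),
])
results = [client.submit(f"job-{i}") for i in range(4)]
print("\n".join(results))
```

Jobs alternate between the two healthy servers, and the dead one costs only a skipped attempt rather than a failed submission.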

Example Architectures

(from http://gearman.org/#what_is_gearman)

Architecture

Client, Worker, and the Gearman Job Server

● Client: connects and submits a job
● Job server: acks the job and finds all sleeping workers
● Job server: sends a 'noop' command to wake them up
● Workers: wake up and ask the server for jobs
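The wake-up handshake above can be modeled in a few lines. The packet names (CAN_DO, SUBMIT_JOB, JOB_CREATED, NOOP, GRAB_JOB, JOB_ASSIGN, NO_JOB, PRE_SLEEP, WORK_COMPLETE) come from the Gearman protocol, but everything else is a simplified, single-threaded toy with hypothetical class names: the real protocol is binary over TCP, and the server also forwards results back to the client, which is left out here.

```python
from collections import deque


class JobServer:
    """Toy, single-threaded model of the gearmand wake-up handshake."""

    def __init__(self):
        self.queue = deque()   # (handle, function, payload)
        self.sleeping = []     # workers that announced PRE_SLEEP
        self.log = []
        self._handles = 0

    def submit_job(self, client, function, payload):
        self.log.append(f"{client} -> server: SUBMIT_JOB {function}")
        self._handles += 1
        handle = f"H:{self._handles}"
        self.queue.append((handle, function, payload))
        self.log.append(f"server -> {client}: JOB_CREATED {handle}")
        # Wake every sleeping worker that can do this function.
        woken = [w for w in self.sleeping if function in w.functions]
        self.sleeping = [w for w in self.sleeping if w not in woken]
        for w in woken:
            self.log.append(f"server -> {w.name}: NOOP")
        for w in woken:
            w.ask_for_work(self)
        return handle

    def grab_job(self, worker):
        self.log.append(f"{worker.name} -> server: GRAB_JOB")
        job = next((j for j in self.queue if j[1] in worker.functions), None)
        if job is None:
            self.log.append(f"server -> {worker.name}: NO_JOB")
            return None
        self.queue.remove(job)
        self.log.append(f"server -> {worker.name}: JOB_ASSIGN {job[0]}")
        return job

    def pre_sleep(self, worker):
        self.log.append(f"{worker.name} -> server: PRE_SLEEP")
        self.sleeping.append(worker)


class Worker:
    def __init__(self, name, functions):
        self.name = name
        self.functions = functions  # function name -> callable

    def connect(self, server):
        for fn in self.functions:
            server.log.append(f"{self.name} -> server: CAN_DO {fn}")
        self.ask_for_work(server)

    def ask_for_work(self, server):
        # Drain every job we can do, then tell the server we are going to sleep.
        job = server.grab_job(self)
        while job is not None:
            handle, function, payload = job
            result = self.functions[function](payload)
            server.log.append(f"{self.name} -> server: WORK_COMPLETE {handle} {result}")
            job = server.grab_job(self)
        server.pre_sleep(self)


server = JobServer()
reverse = {"reverse": lambda s: s[::-1]}
w1, w2 = Worker("worker1", reverse), Worker("worker2", reverse)
w1.connect(server)
w2.connect(server)
server.log.clear()  # keep only what one submission triggers
server.submit_job("client", "reverse", "gearman")
print("\n".join(server.log))
```

Both workers are woken, but only the one that grabs first gets the job; the other receives NO_JOB and goes back to sleep.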

Installation

● Compile
    tar xzvf gearmand-X.Y.tar.gz
    cd gearmand-X.Y
    ./configure
    make
    make install

● Start the server
    $ gearmand -d

For the PHP API:
● Install the PECL extension
    sudo pecl install gearman

● Add the following to php.ini
    extension="gearman.so"

Use Cases

- Crawling a website
- Image manipulation
- Push notifications
- Sending email/messages
- File verification/compression
- Fetching RSS feeds
- Indexing for a search engine

Samples - Worker

Samples - Client
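As an illustration of the worker/client split, the sketch below shows the two kinds of job a Gearman client can submit: a foreground job, where the client waits for the result, and a background job, where it only gets a job handle back. `ToyGearman` and its methods are hypothetical (the names loosely echo the PHP client's `doNormal()`/`doBackground()`, but this is not that API), and a thread pool plays the role of the workers.

```python
import concurrent.futures
import itertools


class ToyGearman:
    """Hypothetical in-process stand-in for a job server plus a worker pool."""

    def __init__(self, workers=2):
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=workers)
        self._functions = {}                # name -> callable (worker side)
        self._handles = itertools.count(1)
        self._background = {}               # handle -> Future

    def add_function(self, name, fn):
        """Worker side: register a function under a name."""
        self._functions[name] = fn

    def do(self, name, payload):
        """Client side, foreground: block until the result comes back."""
        return self._pool.submit(self._functions[name], payload).result()

    def do_background(self, name, payload):
        """Client side, background: get a job handle back immediately."""
        handle = f"H:{next(self._handles)}"
        self._background[handle] = self._pool.submit(self._functions[name], payload)
        return handle

    def job_status(self, handle):
        """Poll a background job instead of waiting for it."""
        return "done" if self._background[handle].done() else "running"

    def shutdown(self):
        self._pool.shutdown(wait=True)


gm = ToyGearman()
gm.add_function("word_count", lambda text: len(text.split()))
gm.add_function("make_thumbnail", lambda path: f"{path} -> thumbnail")  # hypothetical

count = gm.do("word_count", "do not let your users wait")
handle = gm.do_background("make_thumbnail", "photo.jpg")
print(count, handle)           # 6 H:1
gm.shutdown()                  # demo only: wait for the background job
print(gm.job_status(handle))   # done
```

Background jobs suit the "do not let your users wait" cases (email, thumbnails); foreground jobs suit work whose result the caller needs right away.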

Samples - Monitoring

A good tool for monitoring Gearman is available at https://github.com/yugene/Gearman-Monitor
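gearmand also exposes plain-text administrative commands on its regular port. `status` returns one line per registered function with tab-separated columns (function name, number of jobs in the queue, number of running jobs, number of capable workers), terminated by a line containing a single `.`. A minimal sketch of fetching and parsing that output, assuming a server on the default port 4730; `fetch_status` and `parse_status` are hypothetical helper names, and the parser is demonstrated on a made-up sample rather than a live server.

```python
import socket


def fetch_status(host="127.0.0.1", port=4730, timeout=2.0):
    """Send gearmand's text-protocol 'status' command and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"status\n")
        reply = b""
        while not (reply == b".\n" or reply.endswith(b"\n.\n")):
            chunk = sock.recv(4096)
            if not chunk:
                break
            reply += chunk
    return reply.decode()


def parse_status(raw):
    """Map each function to (queued_jobs, running_jobs, capable_workers)."""
    stats = {}
    for line in raw.splitlines():
        if line == ".":
            break
        name, queued, running, workers = line.split("\t")
        stats[name] = (int(queued), int(running), int(workers))
    return stats


# Made-up sample in the 'status' format; use fetch_status() against a real server.
sample = "send_email\t12\t2\t2\ncrawl_school\t0\t0\t3\n.\n"
for name, (queued, running, workers) in parse_status(sample).items():
    print(f"{name}: queued={queued} running={running} workers={workers}")
```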

Result

Worker #1 Worker #2

An incomplete job is re-queued to the available workers for fault tolerance
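The re-queue behavior can be sketched as follows: jobs are handed to workers in turn, and when a worker dies before finishing, its job goes back to the front of the queue for a surviving worker. This is an in-process model with made-up workers (`worker1` crashes on its second job), not how gearmand is implemented; in Gearman the server notices the worker's dropped connection.

```python
from collections import deque


class WorkerCrashed(Exception):
    """Simulates a worker dying (segfault, OOM kill, ...) mid-job."""


def make_worker(name, crash_after=None):
    """Hypothetical worker that crashes once it has finished `crash_after` jobs."""
    finished = 0

    def work(job):
        nonlocal finished
        if crash_after is not None and finished >= crash_after:
            raise WorkerCrashed(name)
        finished += 1
        return f"{job} by {name}"

    return work


def run(jobs, workers):
    """Hand jobs to live workers in turn; re-queue the job of a crashed worker."""
    pending = deque(jobs)
    done = {}
    alive = list(workers)
    while pending and alive:
        for worker in list(alive):
            if not pending:
                break
            job = pending.popleft()
            try:
                done[job] = worker(job)
            except WorkerCrashed:
                pending.appendleft(job)  # incomplete job goes back on the queue
                alive.remove(worker)
    return done, list(pending)


done, left = run(["job-1", "job-2", "job-3", "job-4"],
                 [make_worker("worker1", crash_after=1), make_worker("worker2")])
for result in done.values():
    print(result)
```

Every job still completes: `job-3`, which `worker1` was holding when it crashed, is finished by `worker2`.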

Motivation

● At the beginning, we ran 3 computers to crawl each school's information (articles, school schedules).

● They did one job at a time, so it took too long to finish all of them, and sometimes a machine did the same job as another.

● That was the motivation to build a job queue system that could run jobs in parallel. And we found Gearman!
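Sharing one queue, so that several machines pull jobs from it in parallel and no two of them crawl the same school, can be sketched with threads standing in for the three machines. `crawl` and the school names are hypothetical, and de-duplicating the input with `dict.fromkeys` is an assumption of this sketch, not something the slides describe.

```python
import queue
import threading


def crawl(school):
    """Hypothetical crawler for one school's articles and schedules."""
    return f"crawled {school}"


def run_crawlers(schools, machines=3):
    jobs = queue.Queue()
    for school in dict.fromkeys(schools):  # enqueue each school once, keep order
        jobs.put(school)

    results = {}
    lock = threading.Lock()

    def machine(name):
        # Pull jobs until the shared queue is empty; Queue.get hands each
        # job to exactly one thread, so no two machines repeat the same work.
        while True:
            try:
                school = jobs.get_nowait()
            except queue.Empty:
                return
            result = crawl(school)
            with lock:
                results[school] = (name, result)

    threads = [threading.Thread(target=machine, args=(f"machine-{i}",))
               for i in range(1, machines + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_crawlers(["school-A", "school-B", "school-C", "school-A", "school-D"])
for school, (machine, result) in sorted(results.items()):
    print(school, machine, result)
```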

Gearman in IMcompany

But there were some challenges!
● How many workers should run on one server? (How can we spread the load efficiently?)
● How can we handle unexpected termination of workers?
● What if the server's resources are exhausted by the jobs running on its workers? (Then the server would not respond to other requests/connections for the web, SVN, or MySQL.)
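There is no single right answer to the first question. One hedged starting point is to size the worker pool from the host's resources while reserving headroom for the web server, SVN, and MySQL. The heuristic below (cores reserved, a fixed memory cost per worker) and all of its parameter values are assumptions for illustration, not numbers from the talk.

```python
import os


def worker_count(cpu_cores=None, reserved_cores=1,
                 mem_budget_mb=2048, mem_per_worker_mb=128):
    """Hypothetical sizing heuristic for the workers on one host.

    - Leave `reserved_cores` for the web server, SVN, MySQL, etc.
    - Keep the workers' combined memory within `mem_budget_mb`.
    Always returns at least 1.
    """
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    by_cpu = max(1, cores - reserved_cores)
    by_memory = max(1, mem_budget_mb // mem_per_worker_mb)
    return min(by_cpu, by_memory)


print(worker_count(cpu_cores=8))                     # 7: CPU is the limit
print(worker_count(cpu_cores=8, mem_budget_mb=512))  # 4: memory is the limit
print(worker_count())                                # depends on this machine
```

For I/O-bound jobs such as crawling, where workers mostly wait on the network, running more workers than cores may be reasonable; the CPU term would then be relaxed.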

Exceptional Case #1

Reported bugs when using PHP

Bug #63041: "Failed to set exception option" on connect when any gearman server is down
https://bugs.php.net/bug.php?id=63041

Bug #63648: Gearman worker stops with segfault after 1-2 hour of working
https://bugs.php.net/bug.php?id=63648

Supervisord for sanity

"PHP was not built for long-running requests"
"Memory leaks sometimes occur"

Supervisord helps in both cases!
- Automatically restarts processes based on custom configurations

* Installation guide - http://www.masnun.com/2011/11/02/gearman-php-and-supervisor-processing-background-jobs-with-sanity.html
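Supervisord itself is configured rather than programmed, but its core idea, restarting a worker process whenever it dies, can be sketched in Python with `subprocess`. `supervise`, its retry limit, and the flaky demo worker (which crashes on its first two runs) are hypothetical; real supervisord adds logging, process groups, and configurable restart policies.

```python
import os
import subprocess
import sys
import tempfile
import time


def supervise(cmd, max_restarts=3, backoff_s=1.0):
    """Run `cmd`, restarting it each time it exits with a non-zero status.

    A crash by signal (e.g. a segfault) also yields a non-zero return code.
    Returns the number of restarts once the process exits cleanly;
    raises RuntimeError if it is still failing after `max_restarts` restarts.
    """
    restarts = 0
    while True:
        code = subprocess.run(cmd).returncode
        if code == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(f"worker kept failing (last exit status {code})")
        restarts += 1
        time.sleep(backoff_s)  # avoid a tight crash loop


# Demo worker: crashes on its first two runs, succeeds on the third.
# It counts its runs in a file so the count survives restarts.
FLAKY_WORKER = """
import pathlib, sys
counter = pathlib.Path(sys.argv[1])
runs = int(counter.read_text()) + 1 if counter.exists() else 1
counter.write_text(str(runs))
sys.exit(1 if runs <= 2 else 0)
"""

with tempfile.TemporaryDirectory() as tmp:
    counter_file = os.path.join(tmp, "runs")
    restarts = supervise([sys.executable, "-c", FLAKY_WORKER, counter_file],
                         backoff_s=0)
print(restarts)  # 2
```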

Exceptional Case #2

PHP sometimes slows down after hundreds of executions, kill it off if you know this will happen. - Mike Willbanks, "Gearman: A Job Server made for Scale"
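Following that advice, a worker can simply retire after a fixed number of jobs and let a process manager such as supervisord start a fresh one, which discards any leaked memory or accumulated slowdown. The sketch below models only the bookkeeping in-process; the limit of 100 jobs and the function names are assumptions.

```python
from collections import deque


def run_worker(jobs, handle, max_jobs=100):
    """One worker lifetime: handle at most max_jobs jobs, then exit cleanly."""
    handled = 0
    while jobs and handled < max_jobs:
        handle(jobs.popleft())
        handled += 1
    return handled


def run_with_recycling(jobs, handle, max_jobs=100):
    """Start a fresh worker each time the previous one retires.

    In production the fresh worker would be a new PHP process started by
    supervisord; here it is just another call, to show the bookkeeping.
    """
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")
    lifetimes = []
    while jobs:
        lifetimes.append(run_worker(jobs, handle, max_jobs))
    return lifetimes


processed = []
lifetimes = run_with_recycling(deque(range(250)), processed.append, max_jobs=100)
print(lifetimes)  # [100, 100, 50]
```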

Server Seems Fine for Now

What We Learned

● Gearman's queue is unstable, so persistent queueing was badly needed in our system

● Integrating MySQL with Gearman failed in both versions 1.0.2 and 0.34

● We tried SQLite, but performance was very poor

Do NOT Reserve Too Many Jobs in a Queue
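One way to avoid piling up too many jobs is to cap the queue and push back on producers when it is full. A minimal in-process sketch with Python's bounded `queue.Queue`; the cap of 3 is arbitrary, and what to do on rejection (retry later, drop, or tell the user the system is busy) is left to the caller.

```python
import queue

MAX_QUEUED = 3  # hypothetical cap; tune to what the workers can drain
jobs = queue.Queue(maxsize=MAX_QUEUED)


def try_enqueue(job):
    """Accept the job only if there is room; never block the caller."""
    try:
        jobs.put_nowait(job)
        return True
    except queue.Full:
        return False  # caller decides: retry later, drop, or report "busy"


accepted = [try_enqueue(f"job-{i}") for i in range(5)]
print(accepted)  # [True, True, True, False, False]
```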

Also We've Tried...

● Submitting jobs via HTTP requests sometimes doesn't work and may eventually freeze the server

● It doesn't support additional features for the HTTP connection, such as authentication

● It is not customizable

Gearman Seems Too Young at the Moment

Limitations

● The queue makes no guarantees: use MySQL, memcached, Redis, PostgreSQL, etc.

● There are few administration tools
● Jobs don't expire
● If a job is dropped, the client is never notified

(from http://inside.godaddy.com/cloud-processing-with-gearman/)
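Because the queue makes no guarantees and dropped jobs are never reported, one common pattern is to record each job in a durable store before submitting it, mark it done from the worker, and periodically look for jobs that have been pending too long. The sketch uses SQLite only because it ships with Python (a MySQL or PostgreSQL table would play the same role); the table layout, the timeout, and the function names are hypothetical.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS jobs ("
               "id TEXT PRIMARY KEY, payload TEXT, state TEXT, submitted REAL)")
    return db


def record_submit(db, job_id, payload, now=None):
    """Write the job down before handing it to the job server."""
    with db:  # commits the transaction
        db.execute("INSERT INTO jobs VALUES (?, ?, 'queued', ?)",
                   (job_id, payload, time.time() if now is None else now))


def record_done(db, job_id):
    """Called by the worker once the job has really finished."""
    with db:
        db.execute("UPDATE jobs SET state = 'done' WHERE id = ?", (job_id,))


def stale_jobs(db, timeout_s, now=None):
    """Jobs still not done after `timeout_s` seconds: resubmit or report them."""
    now = time.time() if now is None else now
    rows = db.execute("SELECT id FROM jobs WHERE state != 'done' AND submitted < ? "
                      "ORDER BY id", (now - timeout_s,))
    return [row[0] for row in rows]


db = open_store()
record_submit(db, "email-1", "a@example.com", now=0)
record_submit(db, "email-2", "b@example.com", now=0)    # imagine this one is dropped
record_submit(db, "email-3", "c@example.com", now=590)  # still recent
record_done(db, "email-1")
stale = stale_jobs(db, timeout_s=300, now=600)
print(stale)  # ['email-2']
```

This gives the application its own notion of job expiry and lets it detect dropped jobs, two gaps the limitations above point out.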

Join Community!

http://gearman.org/
http://groups.google.com/group/gearman/

We're hiring!

● Work in Daejeon, Korea
● Flexible, Small Company
● Excellent Benefits
● We Need Senior Hackers

Find more information at http://iamcompany.net/

Thank you! Any questions?