Distributed Queue System Using Gearman
DESCRIPTION
We use Gearman to manage our queue system. This talk covers why a queue is useful in many situations, for web-based interfaces as well as server-side applications.
TRANSCRIPT
Distributed Queue System using Gearman
Taehun Cho
CTO @ IMcompany
http://iamcompany.net/
What is a Queue?
copyright@ www.mathworks.com
Multiple Queue
With multiple tellers, customers form multiple queues to be served
Some Situations on the Web
You want to send an email, a push notification, or messages, import a big file, etc.
Problem is that...
DO NOT LET YOUR USERS WAIT
Run Asynchronously
Running Task in a Single Thread?
Slow loading in a single thread may block the main GUI
Run in Background!
The user can move on to the next action while the task is done in the backend in a distributed manner
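The idea of handing work to the background can be sketched with nothing but the Python standard library. This is not Gearman: a thread and an in-process queue stand in for the worker and the job server, and `send_email` is a hypothetical slow task invented for the illustration.

```python
import queue
import threading
import time


def send_email(recipient):
    """Hypothetical slow task; the sleep stands in for real network I/O."""
    time.sleep(0.05)
    return f"sent to {recipient}"


def worker_loop(jobs, results):
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        func, arg = job
        results.append(func(arg))


jobs = queue.Queue()
results = []
worker = threading.Thread(target=worker_loop, args=(jobs, results), daemon=True)
worker.start()

# The "request handler" only enqueues the job and returns immediately,
# so the user is free to do the next action.
jobs.put((send_email, "user@example.com"))

jobs.put(None)   # tell the worker to stop
worker.join()    # wait here only so the demo can inspect the result
print(results)
```

A real deployment replaces the in-process queue with a job server so that the workers can live on other machines.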
Job Queue Systems
● Celery (http://www.celeryproject.org/)
● RabbitMQ (http://www.rabbitmq.com/)
● Zend Server Job Queue (http://www.zend.com/)
● ZeroMQ (http://www.zeromq.org/)
● Beanstalkd
● Peafowl, Starling
● Apache ActiveMQ
and many others...
Introducing Gearman
LiveJournal
At LiveJournal, many photos were uploaded every day, which led to a heavy image-processing load. That was the motivation for building such a queue system.
● Yahoo!: 120+ servers, 12M jobs/day
● Digg: 45+ servers, 400K jobs/day
● LiveJournal, SixApart, DealNews, Xing.com, and many others
- Expert PHP and MySQL - Andrew et al, (2010, Wrox)
● Grooveshark, GoDaddy.com, IMcompany
Features of Gearman
● Open Source
● Simple & Fast (rewritten in C)
● Supports a variety of languages: build a Worker in Python, a Client in PHP
● Flexible
● Load Balance
● Failover
Example of Architectures
(from http://gearman.org/#what_is_gearman)
Architecture
Client, Gearman Job Server, Worker
● Client: connects and submits a job
● Job server: acks the job and finds all sleeping workers
● Job server: sends a 'noop' command to wake them up
● Worker: wakes up and asks the server for jobs
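The messages in this flow are small binary packets. The sketch below builds three of them, assuming the framing described in the Gearman protocol document: a 4-byte magic code, a 4-byte big-endian packet type, a 4-byte big-endian body size, then NUL-separated arguments. The type numbers are given as recalled from that document and should be checked against it. The function name `resize_image` and the payload are made up for the illustration, and nothing is sent over the network.

```python
import struct

# Packet type numbers, as recalled from the Gearman protocol document.
PRE_SLEEP, NOOP, SUBMIT_JOB, GRAB_JOB = 4, 6, 7, 9

REQ = b"\0REQ"   # magic code for packets sent to the job server
RES = b"\0RES"   # magic code for packets sent by the job server


def encode_packet(magic, packet_type, args=()):
    """Build one packet: magic, type, body size, NUL-separated arguments."""
    body = b"\0".join(args)
    return magic + struct.pack(">II", packet_type, len(body)) + body


def decode_header(packet):
    """Return (magic, type, body size) from the 12-byte header."""
    packet_type, size = struct.unpack(">II", packet[4:12])
    return packet[:4], packet_type, size


# Client -> server: function name, unique id (empty here), workload.
submit = encode_packet(REQ, SUBMIT_JOB, (b"resize_image", b"", b"photo-123.jpg"))
# Server -> sleeping worker: wake up.
wake = encode_packet(RES, NOOP)
# Worker -> server: ask for a job.
grab = encode_packet(REQ, GRAB_JOB)

print(decode_header(submit), decode_header(wake), decode_header(grab))
```

Client and worker libraries hide this framing, so application code normally never touches it; the sketch only shows how little data the wake-up handshake needs.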
Installation
● Compile
$ tar xzvf gearmand-X.Y.tar.gz
$ cd gearmand-X.Y
$ ./configure
$ make
$ make install
● Start Server
$ gearmand -d
(for PHP APIs)
● Pecl Extension
$ sudo pecl install gearman
● Add below to php.ini
extension="gearman.so"
Use Cases
- Crawling a website
- Image Manipulation
- Push Notification
- Sending Email/Messages
- File verification/compressing
- Fetching RSS Feeds
- Indexing on Search Engine
Samples - Worker
Samples - Client
Samples - Monitoring
A good tool for monitoring Gearman is available at https://github.com/yugene/Gearman-Monitor
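For a quick check without a dashboard, gearmand also answers a plain-text `status` command on its job port. The sketch below assumes the usual reply format (one tab-separated line per function with the queued, running, and available-worker counts, terminated by a line holding a single dot) and the default port 4730. The function names in `sample` are invented, and `fetch_status` is not called here because it needs a running server.

```python
import socket


def parse_status(text):
    """Turn the text reply of the 'status' command into a dict."""
    rows = {}
    for line in text.splitlines():
        if line == ".":          # end-of-reply marker
            break
        if not line:
            continue
        name, total, running, workers = line.split("\t")
        rows[name] = {
            "queued": int(total),
            "running": int(running),
            "workers": int(workers),
        }
    return rows


def fetch_status(host="127.0.0.1", port=4730, timeout=2.0):
    """Ask a running gearmand for its status (host and port are assumptions)."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"status\n")
        data = b""
        while not data.endswith(b".\n"):
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
    return parse_status(data.decode())


# Parsing a canned reply, so the example runs without a server.
sample = "crawl_school\t12\t3\t4\nsend_push\t0\t0\t2\n.\n"
status = parse_status(sample)
print(status)
```

A queue that keeps growing while the worker count stays at zero is the first thing worth alerting on.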
Result
Worker #1 Worker #2
An incomplete job is re-queued to an available worker for fault tolerance
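The re-queue behaviour can be pictured with a toy in-memory model. `ToyJobServer` is a hypothetical class written for this illustration, not Gearman's implementation: it only tracks which worker holds which job and puts the job back when that worker disappears.

```python
from collections import deque


class ToyJobServer:
    """In-memory stand-in for a job server (illustration only)."""

    def __init__(self):
        self.pending = deque()
        self.in_flight = {}   # worker name -> job it currently holds

    def submit(self, job):
        self.pending.append(job)

    def grab(self, worker):
        """Hand the next pending job to a worker, or None if idle."""
        if not self.pending:
            return None
        job = self.pending.popleft()
        self.in_flight[worker] = job
        return job

    def complete(self, worker):
        self.in_flight.pop(worker, None)

    def disconnect(self, worker):
        """A worker vanished: put its unfinished job back at the front."""
        job = self.in_flight.pop(worker, None)
        if job is not None:
            self.pending.appendleft(job)


server = ToyJobServer()
server.submit("crawl school A")
server.submit("crawl school B")

first = server.grab("worker-1")     # worker-1 takes the first job
server.disconnect("worker-1")       # ...and dies before finishing it
retried = server.grab("worker-2")   # worker-2 receives the same job
server.complete("worker-2")
print(first, "|", retried)
```

Because a job may run more than once under this scheme, worker functions are easier to operate when they are safe to repeat.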
Motivation
● In the beginning, we ran 3 computers to crawl each school's information (articles and school schedules).
● Jobs ran one at a time, so finishing all of them took too long, and machines sometimes did the same job as the others.
● That was our motivation to build a job queue system that could run jobs in parallel. And we found Gearman!
Gearman in IMcompany
But there were some challenges!
● How many workers should be up on a server? (How do we balance the load efficiently?)
● How can we handle unexpected termination of workers?
● What if the server's resources are exhausted by the jobs given to the workers? (Then the server would not respond to other requests/connections related to WEB, SVN, MySQL)
Exceptional Case #1
Reported bugs when using PHP
Bug #63041 "Failed to set exception option" on connect when any gearman server is down
https://bugs.php.net/bug.php?id=63041
Bug #63648 Gearman worker stops with segfault after 1-2 hour of working
https://bugs.php.net/bug.php?id=63648
Supervisord for sanity
"PHP was not built for long-running requests"
"Memory leaks sometimes occur"
Supervisord helps you in the above cases!
- Automatically restarts the processes based on custom configurations
* Installation guide - http://www.masnun.com/2011/11/02/gearman-php-and-supervisor-processing-background-jobs-with-sanity.html
Exceptional Case #2
PHP sometimes slows down after hundreds of executions, kill it off if you know this will happen. - Mike Willbanks, "Gearman: A Job Server made for Scale"
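One way to act on this advice is to give each worker process a job budget and let the process supervisor start a fresh one afterwards. The sketch below shows the pattern in Python rather than PHP. `run_worker`, `fetch_job`, and the budget of 100 are hypothetical names and values chosen for the illustration, and the outer loop only imitates a supervisor restarting the process.

```python
def run_worker(fetch_job, handle_job, max_jobs=100):
    """Process at most max_jobs jobs, then return so the process can exit."""
    handled = 0
    while handled < max_jobs:
        job = fetch_job()
        if job is None:   # demo only: a real worker would block and wait
            break
        handle_job(job)
        handled += 1
    return handled


backlog = list(range(250))   # pretend job payloads
done = []


def fetch_job():
    """Stand-in for asking the job server for work."""
    return backlog.pop(0) if backlog else None


lifetimes = []
while backlog:
    # Each call stands in for one process lifetime started by the supervisor.
    lifetimes.append(run_worker(fetch_job, done.append, max_jobs=100))

print(lifetimes)
```

Exiting between jobs, never in the middle of one, means a restart does not lose work, and any leaked memory is returned to the system at a predictable interval.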
Server Seems Fine for Now
What We Learned
● Gearman's queue proved unstable, so persistent queueing was badly needed in our system
● Integrating MySQL with Gearman failed in both versions 1.0.2 and 0.34
● Tried SQLite, but performance was very poor
Do NOT Reserve Too Many Jobs in a Queue
Also We've Tried...
● Submitting jobs over HTTP requests sometimes does not work and may eventually freeze the server
● It also does not support additional HTTP features such as authentication
● And it is not customizable
Gearman Seems Too Young at This Moment
Limitations
● Queue makes no guarantees - use MySQL, memcached, Redis, PostgreSQL, etc.
● There are few administration tools
● Jobs don't expire
● If a job is dropped, the client is never notified
- from "http://inside.godaddy.com/cloud-processing-with-gearman/"
Join Community!
http://gearman.org/
http://groups.google.com/group/gearman/
We're hiring!
● Work in Daejeon, Korea
● Flexible, Small Company
● Excellent Benefits
● We Need Senior Hackers
Find more information at http://iamcompany.net/
Thank you!
Any questions?