Distributed Queue System Using Gearman

Post on 17-May-2015

Category:

Technology

DESCRIPTION

We use Gearman to manage our job queue system. This talk covers why a queue is useful in many situations, in web-based interfaces as well as server-side applications.

TRANSCRIPT

Distributed Queue System Using Gearman

Taehun Cho, CTO @ IMcompany

http://iamcompany.net/

What is a Queue?

copyright@ www.mathworks.com

Multiple Queue

With multiple tellers, customers form multiple queues to be served

Some Situations on the Web

You want to send an email, a push notification, or messages, import a big file, etc.

+

Problem is that...

DO NOT LET YOUR USERS WAIT

Run Asynchronously

Running Task in a Single Thread?

A long-running load may block the main GUI

Run in Background!

The user can move on to the next action while the task is done in the backend in a distributed manner
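The idea can be sketched in a few lines of Python, using only the standard library and a single process. This is not Gearman: the `send_email` task, the `handle_request` function, and the in-memory queue are all hypothetical stand-ins, meant only to show a request handler returning before the slow work is done.

```python
import queue
import threading


def send_email(address):
    # Hypothetical slow task; a real one would talk to a mail server.
    return f"sent to {address}"


jobs = queue.Queue()
results = []


def worker():
    # Pull jobs off the queue until a None sentinel arrives.
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        func, arg = job
        results.append(func(arg))
        jobs.task_done()


def handle_request(address):
    # Enqueue the work and return at once, so the user is not kept waiting.
    jobs.put((send_email, address))
    return "accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()
response = handle_request("user@example.com")
jobs.put(None)   # tell the worker to stop
jobs.join()      # wait here only so the sketch can inspect the result
```

A job server such as Gearman plays the role of `jobs` here, except that the queue lives in its own process and the workers can run on other machines.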

Job Queue Systems

● Celery (http://www.celeryproject.org/)
● RabbitMQ (http://www.rabbitmq.com/)
● Zend Server Job Queue (http://www.zend.com/)
● ZeroMQ (http://www.zeromq.org/)
● Beanstalkd
● Peafowl, Starling
● Apache ActiveMQ

and many others..

Introducing Gearman

LiveJournal

At LiveJournal, many photos were uploaded every day, which led to a heavy image-processing load. This was the motivation to build such a queue system.

● Yahoo!: 120+ servers, 12M jobs/day
● Digg: 45+ servers, 400K jobs/day
● LiveJournal, SixApart, DealNews, Xing.com, and many others
  (from "Expert PHP and MySQL", Andrew et al., 2010, Wrox)
● Grooveshark, GoDaddy.com, IMcompany

Features of Gearman

● Open Source
● Simple & Fast (rewritten in C)
● Supports a variety of languages: e.g. build a Worker in Python and a Client in PHP
● Flexible
● Load Balancing
● Failover

Example of Architectures

(from http://gearman.org/#what_is_gearman)

Architecture

Client, Gearman Job Server, Worker

● Client: connects and submits a job
● Job server: acks the job and finds all sleeping workers
● Job server: sends a 'noop' command to wake them up
● Worker: wakes up and asks the server for jobs
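The exchange above can be modeled as a toy, in-memory simulation. This is a sketch, not the real gearmand or its wire format: `ToyJobServer` and `ToyWorker` are hypothetical classes, there is no networking, and the `H:n` handle format is made up. The strings written to `log` are named after Gearman protocol packet types (JOB_CREATED, NOOP, GRAB_JOB's replies JOB_ASSIGN and NO_JOB) purely as labels.

```python
from collections import deque


class ToyJobServer:
    """In-memory stand-in for a job server; not the real gearmand."""

    def __init__(self):
        self.queue = deque()   # jobs waiting for a worker
        self.sleeping = []     # workers that announced they are idle
        self.log = []          # packet names, in the order they were "sent"
        self.next_id = 1

    def pre_sleep(self, worker):
        # A worker with nothing to do tells the server it is going to sleep.
        self.sleeping.append(worker)

    def submit_job(self, function, payload):
        # Ack the job with a handle (the "H:n" format here is made up).
        handle = f"H:{self.next_id}"
        self.next_id += 1
        self.queue.append((handle, function, payload))
        self.log.append("JOB_CREATED")
        # Wake every sleeping worker with a NOOP.
        sleepers, self.sleeping = self.sleeping, []
        for worker in sleepers:
            self.log.append("NOOP")
            worker.wake(self)
        return handle

    def grab_job(self):
        # Answer a worker's request for work.
        if self.queue:
            self.log.append("JOB_ASSIGN")
            return self.queue.popleft()
        self.log.append("NO_JOB")
        return None


class ToyWorker:
    def __init__(self, functions):
        self.functions = functions   # function name -> callable
        self.done = []

    def wake(self, server):
        # Once awake, ask for one job; this toy does not keep grabbing.
        job = server.grab_job()
        if job is not None:
            handle, function, payload = job
            self.done.append((handle, self.functions[function](payload)))
        server.pre_sleep(self)       # go back to sleep either way


server = ToyJobServer()
first = ToyWorker({"reverse": lambda text: text[::-1]})
second = ToyWorker({"reverse": lambda text: text[::-1]})
server.pre_sleep(first)
server.pre_sleep(second)
handle = server.submit_job("reverse", "hello")
```

Both workers are woken, but only the first one to ask receives the job; the second is told there is no job and goes back to sleep.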

Installation

● Compile
    tar xzvf gearmand-X.Y.tar.gz
    cd gearmand-X.Y
    ./configure
    make
    make install

● Start Server
    $ gearmand -d

(for PHP APIs)
● Pecl Extension
    sudo pecl install gearman

● Add below to php.ini
    extension="gearman.so"

Use Cases

- Crawling a website
- Image Manipulation
- Push Notification
- Sending Email/Messages
- File verification/compressing
- Fetching RSS Feeds
- Indexing on Search Engine

Samples - Worker

Samples - Client

Samples - Monitoring

A good tool for monitoring Gearman is available at https://github.com/yugene/Gearman-Monitor

Result

Worker #1 Worker #2

An incomplete job is re-queued to an available worker for fault tolerance
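A minimal sketch of that re-queue behavior, again as a single-process toy rather than Gearman itself. The `dispatch` function, the two worker functions, the `WorkerCrashed` exception, and the `max_attempts` cap are all hypothetical names introduced for illustration.

```python
from collections import deque


class WorkerCrashed(Exception):
    """Stands in for a worker process dying in the middle of a job."""


def crashing_worker(job):
    raise WorkerCrashed("worker #1 went away")


def healthy_worker(job):
    return job.upper()


def dispatch(jobs, workers, max_attempts=5):
    pending = deque(jobs)
    attempts = {}
    completed = []
    turn = 0
    while pending:
        job = pending.popleft()
        attempts[job] = attempts.get(job, 0) + 1
        if attempts[job] > max_attempts:
            # Assumed safety cap so the toy cannot loop forever.
            raise RuntimeError(f"giving up on {job!r}")
        name, worker = workers[turn % len(workers)]
        turn += 1
        try:
            completed.append((name, worker(job)))
        except WorkerCrashed:
            # The job was not finished: put it back for the next worker.
            pending.appendleft(job)
    return completed


completed = dispatch(
    ["resize a.png", "resize b.png"],
    [("worker #1", crashing_worker), ("worker #2", healthy_worker)],
)
```

Every job handed to the crashing worker ends up completed by the healthy one, so no job is lost as long as some worker stays up.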

Motivation

● In the beginning, we ran 3 computers to crawl each school's information (articles and school schedules).

● Each machine ran one job at a time, it took too long to finish them all, and machines sometimes did the same job as the others.

● That motivated us to build a job queue system that could run jobs in parallel. And we found Gearman!

Gearman in IMcompany

But there were some challenges!

● How many workers should be up per server? (How do we balance the load efficiently?)
● How can we handle unexpected termination of workers?
● What if the server's resources are exhausted by the jobs given to the workers? (Then the server would not respond to other requests/connections related to WEB, SVN, MySQL.)

Exceptional Case #1

Reported bugs when using PHP

Bug #63041 "Failed to set exception option" on connect when any gearman server is down
https://bugs.php.net/bug.php?id=63041

Bug #63648 Gearman worker stops with segfault after 1-2 hours of working
https://bugs.php.net/bug.php?id=63648

Supervisord for sanity

"PHP was not built for long-running requests"
"Sometimes memory leaks occur"

Supervisord helps in the above cases!
- Automatically restarts processes based on custom configuration

* Installation guide - http://www.masnun.com/2011/11/02/gearman-php-and-supervisor-processing-background-jobs-with-sanity.html

Exceptional Case #2

PHP sometimes slows down after hundreds of executions, kill it off if you know this will happen. - Mike Willbanks, "Gearman: A Job Server made for Scale"
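One way to act on that advice is to have each worker exit after a fixed number of jobs and rely on a supervisor to start a fresh one. The sketch below is written in Python rather than PHP, and `run_worker`, `fetch_job`, `handle_job`, and the limit of 100 are assumptions for illustration; the `while True` loop merely stands in for a process supervisor such as Supervisord.

```python
def run_worker(fetch_job, handle_job, max_jobs=100):
    """Handle at most max_jobs jobs, then return so the process can exit."""
    handled = 0
    while handled < max_jobs:
        job = fetch_job()
        if job is None:      # nothing left in this toy queue
            break
        handle_job(job)
        handled += 1
    return handled


# Stand-in for a supervisor: start a fresh worker whenever one exits.
pending = iter(range(250))
processed = []
runs = []
while True:
    handled = run_worker(lambda: next(pending, None), processed.append, max_jobs=100)
    runs.append(handled)
    if handled < 100:        # the worker ran out of jobs, so stop restarting
        break
```

With 250 queued jobs this takes three worker lifetimes. A real worker would block waiting for new jobs instead of exiting on an empty queue.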

Server Seems Fine for Now

What We Learned

● Gearman's queue list is unstable, so our system badly needed persistent queueing

● Integrating MySQL with Gearman failed in both 1.0.2 and 0.34

● Tried SQLite, but performance was very poor

Do NOT Reserve Too Many Jobs in a Queue

Also We've Tried...

● Submitting jobs to the queue over HTTP requests sometimes does not work and may eventually freeze the server

● It doesn't support additional functions for the HTTP connection, such as authentication

● It is not customizable

Gearman Seems Too Young at This Moment

Limitations

● Queue makes no guarantees - use MySQL, memcached, Redis, PostgreSQL, etc.
● There are few administration tools
● Jobs don't expire
● If a job is dropped, the client is never notified

- from "http://inside.godaddy.com/cloud-processing-with-gearman/"

Join Community!

http://gearman.org/
http://groups.google.com/group/gearman/

We're hiring!

● Work in Daejeon, Korea
● Flexible, Small Company
● Excellent Benefits
● We Need Senior Hackers

Find more information at http://iamcompany.net/

Thank you!
Any questions?
