Gearman provides a generic application framework to farm out work to other machines or processes that are
better suited to do the work. It allows you to do work in parallel, to load balance processing, and to
call functions between languages. It can be used in a variety of applications, from high-availability web
sites to the transport of database replication events. In other words, it is the nervous system for how
distributed processing communicates. A few strong points about Gearman:
• Open Source - It's free! (in both meanings of the word) Gearman has an active open source community
that is easy to get involved with if you need help or want to contribute.
• Multi-language - There are interfaces for a number of languages, and this list is growing. You also
have the option to write heterogeneous applications with clients submitting work in one language and
workers performing that work in another.
• Flexible - You are not tied to any specific design pattern. You can quickly put together distributed
applications using any model you choose, one of those options being Map/Reduce.
• Fast - Gearman has a simple protocol and interface with a new optimized server in C to minimize your
application overhead.
• Embeddable - Since Gearman is fast and lightweight, it is great for applications of all sizes. It is
also easy to introduce into existing applications with minimal overhead.
• No single point of failure - Gearman can not only help scale systems, but can do it in a fault tolerant
way.
ThreadModel
The -t option to gearmand allows you to specify multiple I/O threads; this is enabled by default. There
are currently three types of threads in the job server:
• Listening and management thread - only one
• I/O thread - can have many
• Processing thread - only one
When no -t option is given or -t 0 is given, all three thread types run within a single thread.
When -t 1 is given, there is a thread for listening/management and a thread for I/O and processing. When
-t 2 is given, there is a thread for each type of thread above. For all -t option values above 2, more
I/O threads are created.
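The mapping from -t values to threads can be sketched as a small function. This is only an illustration of the rules above, not code from gearmand: the name thread_layout is hypothetical, and the sketch assumes that each step above -t 2 adds exactly one I/O thread, which the text does not state precisely.

```python
def thread_layout(t):
    """Return one set of roles per OS thread for a given -t value.

    Hypothetical helper that mirrors the description in the text; it is
    not part of gearmand.
    """
    if t < 0:
        raise ValueError("-t must be zero or greater")
    if t == 0:
        # All three thread types share a single thread.
        return [{"listen", "io", "processing"}]
    if t == 1:
        # One listening/management thread, one combined I/O + processing thread.
        return [{"listen"}, {"io", "processing"}]
    # t >= 2: dedicated listening and processing threads, plus I/O threads.
    # ASSUMPTION: -t 2 means one I/O thread and each higher value adds one.
    io_threads = t - 1
    return [{"listen"}, {"processing"}] + [{"io"} for _ in range(io_threads)]


if __name__ == "__main__":
    for value in (0, 1, 2, 4):
        print(value, thread_layout(value))
```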
The listening and management thread is mainly responsible for accepting new connections and assigning
those connections to an I/O thread (if there are many). It also coordinates startup and shutdown within
the server. This thread will have an instance of libevent for managing socket events and signals on an
internal pipe. This pipe is used to wake up the thread or to coordinate shutdown.
The I/O thread is responsible for doing the read and write system calls on the sockets and initial packet
parsing. Once the packet has been parsed, it is put into an asynchronous queue for the processing thread
(each thread has its own queue so there is very little contention). Like the listening thread, each I/O
thread has its own instance of libevent for managing socket events and signals on an internal pipe.
The processing thread should have no system calls within it (except for the occasional brk() for more
memory), and manages the various lists and hash tables used for tracking unique keys, job handles,
functions, and job queues. All packets that need to be sent back to connections are put into an
asynchronous queue for the I/O thread. The I/O thread will pick these up and send them back over the
connected socket. All packets flow through the processing thread since it contains the information needed
to process the packets. This is due to the complex nature of the various lists and hash tables. If
multiple threads were modifying them the locking overhead would most likely cause worse performance than
having it in a single thread (and would also complicate the code). In the future more work may be pushed
to the I/O threads, and the processing thread can retain minimal functionality to manage those tables and
lists. So far this has not been a significant bottleneck: a 16-core Intel machine is able to process
upwards of 50k jobs per second.
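The hand-off between an I/O thread and the processing thread can be sketched with two queues. This is a toy model of the design described above, not gearmand's implementation: the "function, NUL byte, workload" packet layout, the JOB_CREATED label, the H:example:N handles, and the name run_pipeline are all assumptions made for illustration, and Python's queue.Queue stands in for the server's asynchronous queues.

```python
import queue
import threading


def run_pipeline(packets):
    """Push raw packets through one I/O thread and one processing thread."""
    to_processing = queue.Queue()  # I/O thread -> processing thread
    to_io = queue.Queue()          # processing thread -> I/O thread
    stop = object()                # sentinel marking the end of the stream
    sent = []                      # stands in for bytes written to sockets

    def processing_thread():
        # Sole owner of the job table, so the table itself needs no locking.
        jobs = {}
        while True:
            item = to_processing.get()
            if item is stop:
                to_io.put(stop)
                return
            function, workload = item
            handle = "H:example:%d" % (len(jobs) + 1)
            jobs[handle] = (function, workload)
            to_io.put(("JOB_CREATED", handle))

    def io_thread():
        # Stands in for socket reads plus initial packet parsing.
        for raw in packets:
            function, _, workload = raw.partition(b"\0")
            to_processing.put((function.decode("ascii"), workload))
        to_processing.put(stop)
        # Stands in for writing queued responses back to the sockets.
        while True:
            item = to_io.get()
            if item is stop:
                return
            sent.append(item)

    threads = [threading.Thread(target=processing_thread),
               threading.Thread(target=io_thread)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sent


if __name__ == "__main__":
    print(run_pipeline([b"reverse\0abc", b"reverse\0def"]))
```

Because only the processing thread touches the job table, the queues are the only shared state, which is the property the design relies on to avoid locking overhead.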
For thread safety to work when UUIDs are generated, you must be running the uuidd daemon.
PersistentQueues
Inside the Gearman job server, all job queues are stored in memory. This means if a server restarts or
crashes with pending jobs, those jobs are lost and will never be run by a worker. Persistent queues were added
to allow background jobs to be stored in an external durable queue so they may live between server
restarts and crashes. The persistent queue is only enabled for background jobs because foreground jobs
have an attached client. If a job server goes away, the client can detect this and restart the foreground
job somewhere else (or report an error back to the original caller). Background jobs, on the other hand,
have no attached client and are simply expected to be run when submitted.
The persistent queue works by calling a module callback function right before putting a new job in the
internal queue for pending jobs to be run. This allows the module to store the job about to be run in
some persistent way so that it can later be replayed during a restart. Once it is stored through the
module, the job is put onto the active runnable queue, waking up available workers if needed. Once the
job has been successfully completed by a worker, another module callback function is called to notify the
module that the job is done and can be removed. If a job server crashes or is restarted between these two
calls for a job, that job is reloaded during the next job server start. When the job server starts up,
it will call a replay callback function in the module to provide a list of all jobs that were not
complete. This is used to populate the internal memory queue of jobs to be run. Once this replay is
complete, the job server finishes its initialization and the jobs are now runnable once workers connect
(the queue should be in the same state as when it crashed). These jobs are removed from the persistent
queue when completed as normal. NOTE: Deleting jobs from the persistent queue storage will not remove
them from the in-memory queue while the server is running.
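The three callbacks described above (store before queuing, remove on completion, replay on startup) can be sketched as follows. This is a simplified model under stated assumptions, not the gearmand module API: the class and method names (DictQueueModule, JobServer, add, done, replay) are hypothetical, a plain dict stands in for durable storage, and worker assignment is left out.

```python
class DictQueueModule:
    """Toy persistent queue module; a dict stands in for durable storage."""

    def __init__(self, storage):
        self.storage = storage

    def add(self, unique, function, workload):
        # Called right before the job enters the in-memory pending queue.
        self.storage[unique] = (function, workload)

    def done(self, unique):
        # Called after a worker has completed the job successfully.
        self.storage.pop(unique, None)

    def replay(self):
        # Called at startup to list every job that never completed.
        return [(u, f, w) for u, (f, w) in self.storage.items()]


class JobServer:
    """Toy job server that keeps pending background jobs in memory."""

    def __init__(self, module):
        self.module = module
        # Startup: repopulate the in-memory queue from the module.
        self.pending = list(module.replay())

    def submit_background(self, unique, function, workload):
        self.module.add(unique, function, workload)        # store first
        self.pending.append((unique, function, workload))  # then make runnable

    def complete(self, unique):
        self.pending = [job for job in self.pending if job[0] != unique]
        self.module.done(unique)


if __name__ == "__main__":
    storage = {}
    server = JobServer(DictQueueModule(storage))
    server.submit_background("a", "reverse", b"first")
    server.submit_background("b", "reverse", b"second")
    server.complete("a")
    # Simulate a crash: a new server instance replays from the same storage.
    restarted = JobServer(DictQueueModule(storage))
    print(restarted.pending)
```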
The queues are implemented using a modular interface so it is easy to add new data stores for the
persistent queue.
A persistent queue module is enabled by passing the -q or --queue-type option to gearmand. Run gearmand
--help to see which queue modules are supported on your system. If you are missing options for one you
would like to use, you will need to install any dependencies and then recompile the gearmand package.
ExtendedProtocols
The protocol plugin interface allows you to take over the packet send and receive functions, allowing you
to pack the buffers as required by the protocol. The core read and write functions can (and should) be
used by the protocol plugin.
HTTP
This protocol plugin allows you to map HTTP requests to Gearman jobs. It only provides client job
submission currently, but it may be extended to support other request types in the future. The plugin can
handle both GET and POST data, the latter being used to send a workload to the job server. The URL being
requested is translated into the function being called.
For example, the request:
POST /reverse HTTP/1.1
Content-Length: 12

Hello world!
is translated into a job submission request for the function “reverse” and workload “Hello world!”. The
server responds with:
HTTP/1.0 200 OK
X-Gearman-Job-Handle: H:lap:4
Content-Length: 12
Server: Gearman/0.8

!dlrow olleH
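The translation from request to job can be sketched as a small parser. This is an illustration of the mapping described above, not the plugin's code: the function name http_to_job is hypothetical, and the sketch assumes a complete, well-formed request with CRLF line endings is already in memory.

```python
def http_to_job(raw):
    """Map a raw HTTP request to a (function, workload) pair.

    Hypothetical helper; assumes the whole request is available and
    uses CRLF line endings with a blank line before the body.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    _method, path, _version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # The URL path names the function; the POST body is the workload.
    length = int(headers.get("content-length", "0"))
    return path.lstrip("/"), body[:length]


if __name__ == "__main__":
    request = (b"POST /reverse HTTP/1.1\r\n"
               b"Content-Length: 12\r\n"
               b"\r\n"
               b"Hello world!")
    print(http_to_job(request))
```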
The following headers can be passed to change the behavior of the job:
* X-Gearman-Unique: <unique key>
* X-Gearman-Background: true
* X-Gearman-Priority: <high|low>
For example, to run a low priority background job, the following request can be sent:
POST /reverse HTTP/1.1
Content-Length: 12
X-Gearman-Background: true
X-Gearman-Priority: low

Hello world!
The response for this request will not have any data associated with it since it was a background job:
HTTP/1.0 200 OK
X-Gearman-Job-Handle: H:lap:6
Content-Length: 0
Server: Gearman/0.8
The HTTP protocol should be considered experimental.
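A client-side request carrying the optional headers listed earlier can be assembled as shown here. This is a minimal sketch that mirrors the example requests in this section: the function name build_job_request and its parameters are hypothetical, and, like the examples, it omits the Host header that a general-purpose HTTP/1.1 server would require.

```python
def build_job_request(function, workload, unique=None, background=False,
                      priority=None):
    """Build the raw bytes of a job submission request.

    Hypothetical helper; only the header names come from the text above.
    """
    if priority not in (None, "high", "low"):
        raise ValueError("priority must be 'high', 'low', or None")
    lines = [
        "POST /%s HTTP/1.1" % function,
        "Content-Length: %d" % len(workload),
    ]
    if unique is not None:
        lines.append("X-Gearman-Unique: %s" % unique)
    if background:
        lines.append("X-Gearman-Background: true")
    if priority is not None:
        lines.append("X-Gearman-Priority: %s" % priority)
    # A blank line separates the headers from the workload.
    return "\r\n".join(lines).encode("ascii") + b"\r\n\r\n" + workload


if __name__ == "__main__":
    request = build_job_request("reverse", b"Hello world!",
                                background=True, priority="low")
    print(request.decode("ascii"))
```

The resulting bytes could be written to a socket connected to a job server that has the HTTP plugin enabled; how the plugin is enabled and which port it listens on depend on the gearmand build and are not covered here.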