Just to be clear: increasing the value of threadpool_workers in galaxy.ini or the number of plugin workers in job_conf.xml will not make your Galaxy server much more responsive. The key to scaling Galaxy is the ability to run multiple Galaxy server processes that work cooperatively on the same database.
A simple configuration:
- 1 "job handler" process - responsible for starting and monitoring jobs, submitting jobs to a cluster (if configured), and for setting metadata (externally or internally).
- 1 "web server" process - responsible for serving web pages to users.
An advanced configuration:
- Multiple "job handler" processes.
- Multiple "web server" processes, proxied through a web server capable of load balancing (e.g. nginx or Apache).
There are a few different ways you can run multiple web server processes:
Standalone Paste-based processes:
- Simplest setup, especially if only using a single web server process
- No additional dependencies
- Proxy not required if only using a single web server process
- Not as resilient to failure
- Load balancing typically round-robin regardless of individual process load
- No dynamic scaling
uWSGI:
- Higher performance server than Paste
- Better scalability and fault tolerance
- Easier process management and Galaxy server restartability
- Requires uWSGI
Using uWSGI for production servers is recommended by the Galaxy team.
Standalone Paste-based processes
In galaxy.ini, define one or more [server:...] sections:
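As a minimal sketch of what such sections might look like, assuming two web processes named web0 and web1 bound to local ports 8080 and 8081 (the section names, ports, and threadpool_workers values here are illustrative assumptions, not required values):

```ini
# Illustrative sketch: names, ports, and worker counts are assumptions.
# Each [server:...] section describes one Paste-based web process.
[server:web0]
use = egg:Paste#http
port = 8080
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 7

[server:web1]
use = egg:Paste#http
port = 8081
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 7
```

Each process needs its own port, and the name after `server:` is what gets passed as the server name when that process is started.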
Two are shown; create as many as suit your usage and hardware. On our eight-core server, we run six web server processes. You may find that you need only one, which is a slightly simpler configuration.
uWSGI
In galaxy.ini, define a [uwsgi] section:
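A sketch of such a section might look like the following. The process and thread counts, the port numbers, and the log path are assumptions chosen for illustration and should be adapted to your site:

```ini
# Illustrative sketch: all values below are assumptions, not requirements.
[uwsgi]
# Number of uWSGI worker processes serving Galaxy
processes = 8
# Threads per worker process
threads = 4
# Address of the uWSGI stats server
stats = 127.0.0.1:9191
# Socket the proxy server connects to using the uwsgi protocol
socket = 127.0.0.1:4001
# Make Galaxy's lib directory importable (relative to the Galaxy root)
pythonpath = lib
# Hypothetical log file location
logto = /path/to/galaxy/uwsgi.log
# Run a master process that manages the workers
master = True
```

Whatever address is chosen for `socket` is the one the proxy server must later be pointed at.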
Port numbers for stats and socket can be adjusted as desired. Moreover, in the [app:main] section, you must set:
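The settings in question are most likely the following, assuming static content is served by the proxy rather than by Galaxy and that jobs are tracked through the database. This is a sketch to be checked against the sample galaxy.ini shipped with your release:

```ini
# Assumed settings; verify against galaxy.ini.sample for your Galaxy release.
[app:main]
# Let the proxy server, not Galaxy, serve static content
static_enabled = False
# Coordinate jobs between web and handler processes via the database
track_jobs_in_database = True
```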
You will also need to have uWSGI installed. There are a variety of ways to do this. It can be installed system-wide from your system's package manager (on Debian and Ubuntu systems, the uwsgi and uwsgi-plugin-python packages provide the necessary components), or with the easy_install or pip commands (which will install it to the system's Python site-packages directory). Alternatively, if you are already running Galaxy from a Python virtualenv, you can run pip install uwsgi with that virtualenv's copy of pip to install it into that virtualenv as your unprivileged Galaxy user.
Also make sure that PasteDeploy is installed; any of the installation methods above will work.
The web processes can then be started under uWSGI using:
The --daemonize option can be used to start in the background. uWSGI has an astounding number of options; see its documentation for help.
Once started, a proxy server (typically Apache or nginx) must be configured to proxy requests to uWSGI (using uWSGI's native protocol). Configuration details for these can be found below.
In galaxy.ini, define one or more additional [server:...] sections for the job handlers:
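A minimal sketch, assuming two handler processes named handler0 and handler1 (the ports and threadpool_workers values are illustrative assumptions):

```ini
# Illustrative sketch: ports and worker counts are assumptions.
# The section names must match the handler ids used in job_conf.xml.
[server:handler0]
use = egg:Paste#http
port = 8090
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 5

[server:handler1]
use = egg:Paste#http
port = 8091
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 5
```

The handler ports must not collide with the ports of any web processes defined in the same file.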
Using web processes as handlers is possible, but it is not recommended since handler operations can impact web UI performance.
Remaining configuration options
If you do not have a job_conf.xml file, you will need to create one. There are samples for a basic configuration and an advanced configuration provided in the distribution. Please note that creating job_conf.xml overrides any legacy job running settings in galaxy.ini. See Admin/Config/Jobs for more detail on job configuration.
In job_conf.xml, create <handler> tags with id attributes that match the handler server names you defined in galaxy.ini. For example, using the configuration above, the <handlers> section of job_conf.xml would look like:
Any tool not set to an explicit job destination will then be serviced by one of the handlers with the handlers tag. It is possible to dedicate handlers to specific destinations or tools. For details on how to do this, please see the job configuration documentation.
Starting and Stopping
Since you need to run multiple processes, the typical run.sh method for starting and stopping Galaxy won't work. The currently recommended way to manage these processes is with Supervisord. You can use a supervisord config file like the following or be inspired by this example. Be sure to restart supervisord, or run supervisorctl reread && supervisorctl update, whenever you make configuration changes.
[program:galaxy_uwsgi]
command = /usr/bin/uwsgi --plugin python --ini-paste /path/to/galaxy/config/galaxy.ini
directory = /path/to/galaxy
umask = 022
autostart = true
autorestart = true
startsecs = 10
user = gxprod
environment = PATH=/path/to/galaxy/venv:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin,PYTHON_EGG_CACHE=/path/to/galaxy/.python-eggs,PYTHONPATH=/path/to/galaxy/eggs/PasteDeploy-1.5.0-py2.7.egg
numprocs = 1
stopsignal = INT
This configuration defines a "program" named "galaxy_uwsgi", which represents our Galaxy uWSGI frontend. We've set a command, a directory, and a umask, all of which should be familiar. Additionally, we've specified that the process should autostart when supervisord starts and autorestart if it ever crashes. We specify startsecs to say "the process must stay up for this long before we consider it OK". If the process crashes sooner than that (e.g. because of bad changes you've made to your local installation), supervisord will try to restart it a few more times before giving up and marking it as failed. This is one of the many ways supervisord is friendlier for managing these sorts of tasks.
Next, we set up our job handlers:
[program:handler]
command = /path/to/galaxy/venv/bin/python ./scripts/paster.py serve config/galaxy.ini --server-name=handler%(process_num)s --pid-file=/path/to/galaxy/handler%(process_num)s.pid --log-file=/path/to/galaxy/handler%(process_num)s.log
directory = /path/to/galaxy
process_name = handler%(process_num)s
numprocs = 2
umask = 022
autostart = true
autorestart = true
startsecs = 15
user = gxprod
environment = PYTHON_EGG_CACHE=/path/to/galaxy/.python-eggs,SGE_ROOT=/var/lib/gridengine
Nearly all of this is the same as above; however, you'll notice that we use %(process_num)s. That's a variable substitution in the command and process_name fields. We've set numprocs = 2, which says to launch two handler processes. Supervisord will loop over the process numbers 0 to numprocs - 1 and launch handler0 and handler1 automatically for us, templating out the command string so that each handler receives a different name, PID file, and log file.
Lastly, we collect the two tasks above into a single group:
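A sketch of such a group section, assuming the group is named galaxy (the group name is an illustrative choice; the program names are the ones defined in the two sections above):

```ini
; Illustrative sketch: the group name "galaxy" is an assumption.
; "programs" lists the [program:...] names to collect into the group.
[group:galaxy]
programs = handler,galaxy_uwsgi
```

Once programs belong to a group, supervisor addresses their processes as group:process, for example galaxy:handler0.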
This will let us manage these tasks more globally with the supervisorctl command line tool:
This command shows us the status of our jobs, and we can easily restart all of the processes at once by naming the group. Familiar commands like start and stop are also available.
If using only one web process, you can proxy as per the normal instructions for a production server (see Admin/Config/Performance/ProductionServer). Otherwise, you'll need to set up load balancing.
If you have specified a separate job runner and you want to use the "Manage jobs" interface as an administrator, you also have to define a proxy for the job runner as shown below.
Be sure to consult the Apache proxy documentation for additional features such as proxying static content and accelerated downloads.
Standalone Paste-based processes
To balance on Apache, you'll need to enable mod_proxy_balancer in addition to mod_proxy, which is available in Apache 2.2 (but not older versions such as 1.3 or 2.0). Add the following to your Apache configuration to set up balancing for the two example web servers defined above:
And replace the following line from the regular proxy configuration:
RewriteRule ^(.*) http://localhost:8080$1 [P]
With:
RewriteRule ^(.*) balancer://galaxy$1 [P]
uWSGI
mod_uwsgi is available in Apache 2.4 and later, which means you must be on Ubuntu 14.04 or later. There are ways to do this on older systems, but they are outside the scope of this documentation. You'll need to enable mod_uwsgi, and then add the following to your Apache configuration:
Be sure to consult the nginx proxy documentation for additional features such as proxying static content and accelerated downloads.
Standalone Paste-based processes
To proxy with nginx, you'll simply need to add all of the web applications to the upstream section, which already exists. The relevant parts of the configuration would look like this:
uWSGI
uWSGI support is built into nginx, so no extra modules or recompiling should be required. To proxy to Galaxy, use the following configuration:
uwsgi_read_timeout can be adjusted as appropriate for your site. This is the amount of time connections will block while nginx waits for a response from uWSGI, and it is useful for holding client (browser) connections while uWSGI is restarting Galaxy subprocesses.
Notes on legacy configurations
Previously it was necessary to create two separate Galaxy config files to use multiple processes. This is no longer necessary, and if you have multiple config files in your existing installation, it is suggested that you merge them into a single file.
Galaxy previously used a single "job manager" process to assign jobs to handlers. This is no longer necessary as handlers are selected by the web processes at the time of job creation.
The track_jobs_in_database option in galaxy.ini can still be set but should be unnecessary. If there is more than one [server:...] section in the file, database job tracking will be enabled automatically.