Diff for "Events/GCC2014/TrainingDay/AdminWalkthrough"

Differences between revisions 41 and 42
Revision 41 as of 2014-06-30 11:11:36
Size: 38113
Editor: JohnChilton
Comment: Added note about how to find the terminal.
Revision 42 as of 2014-07-20 20:57:47
Size: 38358
Editor: DaveClements
Comment:

Tutorial: Galaxy Installation and Administration

GCC2014 Training Day

Nate Coraor and John Chilton


Twitter: #usegalaxy

This follow-along page assumes you have already installed and started a Training Day VM - please do this before arriving.

This tutorial assumes you are using the Firefox browser bundled with the VM. Galaxy is compatible with all modern versions of Firefox, Chrome, Safari, and Internet Explorer. However, for the purposes of this tutorial, it will be easier for us to help people who run into problems if everyone is using Firefox.

You will most likely want to copy and paste between this wiki page and the VM. To allow this, after launching the VM, click the Devices menu, then Shared Clipboard → Bidirectional. The key combination Shift+Ctrl+V in the VM will paste the contents of the clipboard.

If your host system has enough memory, we suggest increasing the memory allocated to the VM to 2 GB.

Setting up a Local Galaxy Tutorial (Part I)

Clone (download) Galaxy

The Galaxy distribution is found at https://bitbucket.org/galaxy/galaxy-dist/, but for the purposes of this tutorial, we'll use a local copy already on the VM to save time.

    galaxy@gcc2014:~$ hg clone /home/galaxy/galaxy-central galaxy-dist
    requesting all changes
    adding changesets
    adding manifests
    adding file changes
    added 8 changesets with 3443 changes to 3443 files
    updating to branch default
    3443 files updated, 0 files merged, 0 files removed, 0 files unresolved
    galaxy@gcc2014:~$

(One can launch a terminal in the VM by clicking the little mouse icon in the upper left corner and then clicking the "Terminal Emulator" icon in the pop-up menu.)

Update to the stable branch

    galaxy@gcc2014:~$ cd galaxy-dist
    galaxy@gcc2014:~/galaxy-dist$ hg update stable
    264 files updated, 0 files merged, 144 files removed, 0 files unresolved
    galaxy@gcc2014:~/galaxy-dist$

Start Galaxy

    galaxy@gcc2014:~/galaxy-dist$ cp -r ../galaxy-central/eggs eggs  # cache dependencies to speed up deploy (optional)
    galaxy@gcc2014:~/galaxy-dist$ sh run.sh
    Initializing datatypes_conf.xml from datatypes_conf.xml.sample
      ...
    Some eggs are out of date, attempting to fetch...
    Fetched http://eggs.galaxyproject.org/amqp/amqp-1.4.3-py2.7.egg
      ...
    Fetch successful.
    python path is: /home/ubuntu/galaxy-dist/eggs/mercurial-2...
      ...
    Starting server in PID 3091.
    serving on http://127.0.0.1:8080

Installing tools from the Tool Shed

Topics for this section

  • What is a Galaxy tool?
    • Basic: An XML description of the tool's interface, allowing Galaxy to render a UI (tool form), as well as the command line to execute.
    • Complex: XML + wrapper scripts and interfaces to external dependencies
  • Exploring preinstalled tools and filesystem layout
    • galaxy-dist/tool_conf.xml

    • galaxy-dist/tools/

    • Example simple tool: galaxy-dist/tools/filters/sorter.xml (tool xml), galaxy-dist/tools/filters/sorter.py (tool code)

    • Example tool with unmet dependencies: galaxy-dist/tools/sr_mapping/mosaik.xml

  • The Tool Shed
    • Installing tools from the Tool Shed
    • Exploring the filesystem layout of tools installed from the Tool Shed

Galaxy configuration for Tool Shed installs

Stop Galaxy by hitting CTRL-c:

    serving on http://127.0.0.1:8080
    ^Cgalaxy.jobs.handler INFO 2014-04-23 13:27:59,203 sending stop signal to worker thread
    galaxy.jobs.handler INFO 2014-04-23 13:27:59,204 job handler queue stopped
    galaxy.jobs.runners INFO 2014-04-23 13:27:59,204 LWRRunner: Sending stop signal to monitor thread
    galaxy.jobs.runners INFO 2014-04-23 13:27:59,204 LWRRunner: Sending stop signal to 3 worker threads
    galaxy.jobs.runners INFO 2014-04-23 13:27:59,204 LocalRunner: Sending stop signal to 5 worker threads
    galaxy.jobs.handler INFO 2014-04-23 13:27:59,205 sending stop signal to worker thread
    galaxy.jobs.handler INFO 2014-04-23 13:27:59,205 job handler stop queue stopped
    galaxy@gcc2014:~/galaxy-dist$

There may be some additional interruption exceptions reported - these are not important.

Edit the primary Galaxy configuration file, universe_wsgi.ini. If you are not familiar with vi, consider using nano instead.

    galaxy@gcc2014:~/galaxy-dist$ vi universe_wsgi.ini

We need to set two options. The file is large and it's easiest to search for these (/<pattern> in vi, CTRL-w <pattern> in nano):

    [app:main]
    admin_users = nate@bx.psu.edu
    tool_dependency_dir = /home/galaxy/tool_deps
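
Once the file has been saved, it can be useful to confirm that both options really ended up in the [app:main] section rather than staying commented out. The following is a minimal Python 3 sketch using only the standard library; the helper name missing_options and the SAMPLE text are illustrative assumptions, not part of Galaxy. Interpolation is disabled because Galaxy's sample configuration contains literal % expressions.

```python
import configparser

# Options this part of the tutorial asks you to set.
REQUIRED = ("admin_users", "tool_dependency_dir")


def missing_options(ini_text, section="app:main", required=REQUIRED):
    """Return the required options that are absent or empty in `section`."""
    # interpolation=None: treat '%' literally instead of expanding %(name)s.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section(section):
        return list(required)
    return [opt for opt in required
            if not parser.get(section, opt, fallback="").strip()]


# Illustrative stand-in for the edited universe_wsgi.ini.
SAMPLE = """\
[app:main]
admin_users = nate@bx.psu.edu
tool_dependency_dir = /home/galaxy/tool_deps
"""

if __name__ == "__main__":
    print(missing_options(SAMPLE))
```

To check the real file, pass the contents of universe_wsgi.ini instead of SAMPLE; an empty list means both options are set.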

Then save and quit (CTRL-x y ENTER in nano). Start Galaxy again:

    galaxy@gcc2014:~/galaxy-dist$ sh run.sh
    ...
    Starting server in PID 3298.
    serving on 127.0.0.1:8080 view at http://127.0.0.1:8080

Install a tool from the Tool Shed

  • Register an account that matches the address you set in admin_users

  • Follow the tutorial on installing tools from the Tool Shed. In brief:

    • Click Admin from the masthead

    • Click Search and browse tool sheds from the left panel

    • Click the popup icon for Galaxy main tool shed and select Search for valid tools

    • Search for Tool names that contain the name bwa

    • Click the popup icon for bwa_wrappers owned by devteam and select Install to Galaxy

    • Select the NGS: Mapping section and click Install to Galaxy

  • Tools (XML, wrapper scripts) are installed in /home/galaxy/shed_tools

  • Tool dependencies (binaries) are installed in /home/galaxy/tool_deps

Managing local data

Topics for this section

  • What is local data?
  • Manual data inclusion
    • Building indexes on the command line
    • .loc files

    • tool_data_table_conf.xml

    • Example of the above with bwa and S. cerevisiae (sacCer2) transcript

  • Galaxy Data Managers

Adding local data with a Galaxy Data Manager

Install data managers for fetching the reference genome and building BWA indexes:

  • Click Admin from the masthead

  • Click Search and browse tool sheds from the left panel

  • Click the popup icon for Galaxy main tool shed and select Search for valid tools

  • Search for Tool ids that contain the name manager

  • Check data_manager_fetch_genome_all_fasta and data_manager_bwa_index_builder, then click Install to Galaxy (at the bottom of the page)

  • Click Install

Fetch reference genome (fasta):

  • Click Manage local data (beta) from the left panel

  • Click Reference Genome - fetching

  • In the DBKEY to assign to data: field, type sacCer2

  • In the UCSC's DBKEY for source FASTA: field, type sacCer2

  • Click Execute

Build BWA indexes:

  • Click Manage local data (beta) from the left panel

  • Click BWA index - builder

  • Click Execute

Upon completion:

  • galaxy-dist/tool-data/sacCer2/seq/sacCer2.fa has been created from chromosome fastas fetched from UCSC

  • galaxy-dist/tool-data/sacCer2/bwa_index/sacCer2.fa.* indexes have been generated with bwa index

  • galaxy-dist/tool-data/toolshed.g2.bx.psu.edu/repos/devteam/data_manager_fetch_genome_all_fasta/2ebc856bce29/all_fasta.loc contains a reference to the newly created sacCer2.fa

  • galaxy-dist/tool-data/toolshed.g2.bx.psu.edu/repos/devteam/data_manager_bwa_index_builder/367878cb3698/bwa_index.loc contains a reference to the newly built bwa indexes

  • galaxy-dist/shed_tool_data_table_conf.xml includes entries for the all_fasta and bwa_indexes data tables
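
To make the role of the .loc files more concrete, here is a small Python 3 sketch that reads entries from one. It assumes the common layout for all_fasta.loc (tab-separated columns for a unique id, a dbkey, a display name, and a file path, with # marking comment lines); the function name parse_loc, the column labels, and the sample entry (including its display name and absolute path) are illustrative assumptions rather than output copied from the VM.

```python
def parse_loc(text, columns=("value", "dbkey", "name", "path")):
    """Parse tab-separated .loc content into a list of dicts.

    Blank lines and lines starting with '#' are skipped.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        entries.append(dict(zip(columns, fields)))
    return entries


# Hypothetical content resembling what the data manager might write.
SAMPLE_LOC = (
    "# unique_id\tdbkey\tdisplay_name\tfile_path\n"
    "sacCer2\tsacCer2\tS. cerevisiae (sacCer2)\t"
    "/home/galaxy/galaxy-dist/tool-data/sacCer2/seq/sacCer2.fa\n"
)

if __name__ == "__main__":
    for entry in parse_loc(SAMPLE_LOC):
        print(entry["dbkey"], "->", entry["path"])
```

Tools such as bwa look up reference data by dbkey through the data tables, which is why the entry and the data table definition must both exist.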

Setting up a Local Galaxy Tutorial (Part II)

Documentation for the features used in this section can be found at usegalaxy.org/production (forwards to Admin/Config/Performance/ProductionServer)

Install and configure PostgreSQL

galaxy@gcc2014:~/galaxy-dist$ sudo apt-get install postgresql
[sudo] password for galaxy: 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
postgresql is already the newest version.
...
galaxy@gcc2014:~/galaxy-dist$ sudo -u postgres createuser gxprod
galaxy@gcc2014:~/galaxy-dist$ sudo -u postgres createdb -O gxprod gxprod

Create a new user for Galaxy

    galaxy@gcc2014:~/galaxy-dist$ sudo useradd -m -s /bin/bash gxprod
    galaxy@gcc2014:~/galaxy-dist$

Create a Python virtual environment

    galaxy@gcc2014:~/galaxy-dist$ sudo apt-get install python-virtualenv
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    The following extra packages will be installed:
      ...
    Do you want to continue? [Y/n]
      ...
    Get:1 http://us.archive.ubuntu.com/ubuntu/ trusty/main libasan0 amd64 4.8.2-19ubuntu1 [63.0 kB]
      ...
    Processing triggers for libc-bin (2.19-0ubuntu6) ...
    galaxy@gcc2014:~/galaxy-dist$ sudo -iu gxprod
    gxprod@gcc2014:~$ virtualenv venv
    New python executable in venv/bin/python
    Installing setuptools, pip...done.
    gxprod@gcc2014:~$ source ./venv/bin/activate
    (venv)gxprod@gcc2014:~$

Clone (download) Galaxy

    gxprod@gcc2014:~$ hg clone ~galaxy/galaxy-central galaxy-dist
    not trusting file /home/galaxy/galaxy-central/.hg/hgrc from untrusted user galaxy, group galaxy
    requesting all changes
    adding changesets
    adding manifests
    adding file changes
    added 13944 changesets with 47333 changes to 8458 files
    updating to branch default
    3658 files updated, 0 files merged, 0 files removed, 0 files unresolved
    gxprod@gcc2014:~$ cd galaxy-dist
    gxprod@gcc2014:~/galaxy-dist$ hg update stable
    264 files updated, 0 files merged, 144 files removed, 0 files unresolved
    gxprod@gcc2014:~/galaxy-dist$ cp -r ~galaxy/galaxy-central/eggs .
    gxprod@gcc2014:~/galaxy-dist$

Configure Galaxy

    gxprod@gcc2014:~/galaxy-dist$ cp universe_wsgi.ini.sample universe_wsgi.ini
    gxprod@gcc2014:~/galaxy-dist$ vi universe_wsgi.ini

Add the following sections for uWSGI's configuration and the job handler processes:

    [uwsgi]
    socket = 127.0.0.1:4001
    stats = 127.0.0.1:9191
    processes = 2
    threads = 4
    master = True
    logto = /home/gxprod/uwsgi.log
    pythonpath = lib

    [server:handler0]
    use = egg:Paste#http
    port = 9010
    use_threadpool = True
    threadpool_workers = 10

    [server:handler1]
    use = egg:Paste#http
    port = 9011
    use_threadpool = True
    threadpool_workers = 10

Set the following settings in the [app:main] section at the bottom of the file:

    database_connection = postgresql:///gxprod?host=/var/run/postgresql
    database_engine_option_server_side_cursors = True
    database_engine_option_strategy = threadlocal
    tool_dependency_dir = /home/gxprod/tool_deps
    static_enabled = False
    nginx_x_accel_redirect_base = /_x_accel_redirect
    nginx_upload_store = /home/gxprod/uploads
    nginx_upload_path = /_upload
    log_events = False
    log_actions = False
    debug = False
    use_interactive = False
    id_secret = <random text>
    admin_users = nate@bx.psu.edu
    library_import_dir = /home/gxprod/library
    allow_library_path_paste = True
Explanations of these options:

  • database_connection = postgresql:///gxprod?host=/var/run/postgresql - Use a PostgreSQL database via a local UNIX domain socket (the socket is in /var/run/postgresql). documentation

  • database_engine_option_server_side_cursors = True - Keep large SQL query results on the PostgreSQL server, rather than transferring the entire result set to the Galaxy processes.

  • database_engine_option_strategy = threadlocal - Only use one database connection per thread

  • tool_dependency_dir = /home/gxprod/tool_deps - The directory that will house tool dependencies

  • static_enabled = False - Static content will be served by the proxy server

  • nginx_x_accel_redirect_base = /_x_accel_redirect - Delegate dataset downloads to nginx

  • nginx_upload_store = /home/gxprod/uploads - Delegate uploads to nginx, set temporary directory

  • nginx_upload_path = /_upload - Special path configured in nginx where uploads will be POSTed

  • log_events = False - Don't log events in the database (faster)

  • log_actions = False - Don't log actions in the database (faster)

  • debug = False - Disables debugging middleware that loads server responses into memory (can crash the server when handling large files)

  • use_interactive = False - Disables live client browser debugging (insecure).

  • id_secret = <random text> - Ensures that the encoded IDs used by Galaxy (especially session IDs) are unique. One simple way to generate a value for this is with a shell command like $ date | md5sum

  • admin_users = nate@bx.psu.edu - Make nate@bx.psu.edu an administrator. Galaxy's Admin UI is only accessible if you define administrators here!

  • library_import_dir = /home/gxprod/library - Administrators can directly import datasets from this directory on the server to Data Libraries. This includes an option that allows an effective "symlink" to the data, rather than copying it into Galaxy's file_path directory. documentation

  • allow_library_path_paste = True - Administrators can import datasets from anywhere on the server's filesystem(s) by entering their paths into a text box
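
The date | md5sum approach for id_secret is quick, but its output is guessable by anyone who knows roughly when the command was run. As a hedged alternative, the sketch below uses Python 3's standard secrets module (Python 3.6 or later, so not the Python 2.7 that Galaxy itself runs on in this VM) to produce a random hexadecimal string; the function name generate_id_secret and the 32-byte default are illustrative choices, not a Galaxy requirement.

```python
import secrets


def generate_id_secret(nbytes=32):
    """Return a random hex string (2 * nbytes characters) to paste as id_secret."""
    # token_hex draws from the operating system's secure random source.
    return secrets.token_hex(nbytes)


if __name__ == "__main__":
    print("id_secret = " + generate_id_secret())
```

Whatever method is used, the value should be set once and then kept stable, since the encoded IDs Galaxy hands out depend on it.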

Honorable mentions for features we won't use today but that are common in big setups:

  • ftp_upload_dir and ftp_upload_site - Allow users to upload data to the server using FTP

  • use_remote_user and remote_user_maildomain - Use your institution's existing authentication system to log in to Galaxy. Apache documentation or nginx documentation

  • allow_user_impersonation - Users configured as administrators (with admin_users) can "become" other users to view Galaxy exactly as the impersonated user does. Useful for providing support.

  • user_library_import_dir - Non-administrators can directly import datasets from this directory on the server to Data Libraries for which they have been given write permission. documentation

  • object_store_config_file - Configure Galaxy's "object storage" layer to store data in multiple filesystems, Amazon S3, iRODS, etc.

  • error_email_to (with smtp_server) - Allow users to send bug reports directly to you

  • user_activation_on and related options - Require new users to verify their email address

  • allow_user_dataset_purge = True - Allow users to forcibly remove their datasets from disk (note that the data is only actually removed if all versions of a shared dataset are purged by all users who are sharing the dataset). By default, Galaxy does not remove data, as this is done at a later time by the dataset cleanup scripts.

  • enable_quotas = True - Enable Galaxy's quota system. Quotas are configured by administrators in the Galaxy Admin UI

Create a job system configuration:

    gxprod@gcc2014:~/galaxy-dist$ vi job_conf.xml

In the editor, paste:

    <?xml version="1.0"?>
    <job_conf>
        <plugins workers="2">
            <plugin id="gridengine" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner">
                <param id="drmaa_library_path">/usr/lib/gridengine-drmaa/lib/libdrmaa.so</param>
            </plugin>
            <!--
            <plugin id="torque" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner">
                <param id="drmaa_library_path">/usr/lib/pbs-drmaa/lib/libdrmaa.so</param>
            </plugin>
            <plugin id="slurm" type="runner" load="galaxy.jobs.runners.slurm:SlurmJobRunner">
                <param id="drmaa_library_path">/usr/lib/slurm-drmaa/lib/libdrmaa.so</param>
            </plugin>
            -->
        </plugins>
        <handlers default="handlers">
            <handler id="handler0" tags="handlers"/>
            <handler id="handler1" tags="handlers"/>
        </handlers>
        <destinations default="cluster">
            <destination id="cluster" runner="gridengine"/>
            <!--
            <destination id="cluster" runner="torque"/>
            <destination id="cluster" runner="slurm"/>
            -->
        </destinations>
        <limits>
            <limit type="registered_user_concurrent_jobs">2</limit>
            <limit type="unregistered_user_concurrent_jobs">1</limit>
            <limit type="job_walltime">24:00:00</limit>
        </limits>
    </job_conf>

This VM comes preconfigured with Torque, Grid Engine, and SLURM. Galaxy can submit jobs to any of these, as well as to LSF, Condor, and others, but the above configuration targets only the VM's default, Grid Engine.
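
When switching between the commented-out runners, it is easy to leave the default destination pointing at a plugin that is no longer active. The Python 3 sketch below, using only the standard library, reports which destination is the default and which runner plugin it resolves to. The function name default_destination and the shortened SAMPLE_JOB_CONF are illustrative assumptions; this is a consistency check on the XML, not a description of how Galaxy itself loads the file.

```python
import xml.etree.ElementTree as ET


def default_destination(job_conf_xml):
    """Return (destination id, runner id, runner load string) for the default.

    XML comments are dropped by the parser, so commented-out plugins and
    destinations are ignored. Raises StopIteration if a reference dangles.
    """
    root = ET.fromstring(job_conf_xml)
    destinations = root.find("destinations")
    default_id = destinations.get("default")
    dest = next(d for d in destinations.findall("destination")
                if d.get("id") == default_id)
    runner_id = dest.get("runner")
    plugin = next(p for p in root.find("plugins").findall("plugin")
                  if p.get("id") == runner_id)
    return default_id, runner_id, plugin.get("load")


# Shortened, illustrative version of the configuration above.
SAMPLE_JOB_CONF = """\
<?xml version="1.0"?>
<job_conf>
    <plugins workers="2">
        <plugin id="gridengine" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner"/>
    </plugins>
    <destinations default="cluster">
        <destination id="cluster" runner="gridengine"/>
        <!--
        <destination id="cluster" runner="torque"/>
        -->
    </destinations>
</job_conf>
"""

if __name__ == "__main__":
    print(default_destination(SAMPLE_JOB_CONF))
```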

Start a Galaxy server to complete first run setup

    gxprod@gcc2014:~/galaxy-dist$ sh run.sh
    Initializing datatypes_conf.xml from datatypes_conf.xml.sample
      ...
    serving on http://127.0.0.1:8080
    ^C
      ...
    gxprod@gcc2014:~/galaxy-dist$ logout
    galaxy@gcc2014:~$

Install Ansible and nginx

We would like to add a third-party module that is not available in Ubuntu's nginx packages (nginx_upload_module), which means recompiling nginx. However, we have already created an Ansible playbook for compiling and packaging nginx. Ansible is an automation, configuration management, and application deployment tool that is rapidly growing in popularity (it is similar to older projects such as Puppet and Chef).

Ansible

Create a new virtualenv and install Ansible:

    galaxy@gcc2014:~$ virtualenv ansible
    New python executable in ansible/bin/python
    Installing setuptools, pip...done.
    galaxy@gcc2014:~$ . ansible/bin/activate
    (ansible)galaxy@gcc2014:~$ pip install ansible
    Downloading/unpacking ansible
      Downloading ansible-1.6.5.tar.gz (651kB): 651kB downloaded
      Running setup.py (path:/home/galaxy/ansible/build/ansible/setup.py) egg_info for package ansible
      ...
    Successfully installed ansible paramiko jinja2 PyYAML pycrypto ecdsa markupsafe
    Cleaning up...
    (ansible)galaxy@gcc2014:~$

Fetch the nginx build playbook:

    (ansible)galaxy@gcc2014:~$ wget https://bitbucket.org/natefoo/galaxy-ansible/raw/e21adb25a3d585ab476e83f599028d73bf8d408c/build/nginx.yml
    --2014-06-28 12:07:26--  https://bitbucket.org/natefoo/galaxy-ansible/raw/e21adb25a3d585ab476e83f599028d73bf8d408c/build/nginx.yml
    Resolving bitbucket.org (bitbucket.org)... 131.103.20.168, 131.103.20.167
    Connecting to bitbucket.org (bitbucket.org)|131.103.20.168|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 6521 (6.4K) [text/plain]
    Saving to: ‘nginx.yml’

    100%[=========================================>] 6,521       --.-K/s   in 0s

    2014-06-28 12:07:26 (974 MB/s) - ‘nginx.yml’ saved [6521/6521]

    (ansible)galaxy@gcc2014:~$

Build nginx

Run the playbook to generate an nginx package:

    (ansible)galaxy@gcc2014:~$ ansible-playbook -i localhost, nginx.yml --ask-sudo-pass --extra-vars work_dir=/home/galaxy/nginx-build
    sudo password:

    PLAY [localhost] **************************************************************

    GATHERING FACTS ***************************************************************
    ok: [localhost]

      ...

    TASK: [Create deb] ************************************************************
    changed: [localhost]

    PLAY RECAP ********************************************************************
    localhost                  : ok=12   changed=3    unreachable=0    failed=0

    (ansible)galaxy@gcc2014:~$ deactivate
    galaxy@gcc2014:~$

Install nginx

Uninstall conflicting nginx packages:

    galaxy@gcc2014:~$ sudo apt-get remove nginx nginx-core nginx-common
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    The following packages were automatically installed and are no longer required:
      libcurses-perl libslurmdb-perl libslurmdb26
    Use 'apt-get autoremove' to remove them.
    The following packages will be REMOVED:
      nginx nginx-common nginx-core
    0 upgraded, 0 newly installed, 3 to remove and 12 not upgraded.
    After this operation, 1,295 kB disk space will be freed.
    Do you want to continue? [Y/n]
    (Reading database ... 244985 files and directories currently installed.)
    Removing nginx (1.4.6-1ubuntu3) ...
    Removing nginx-core (1.4.6-1ubuntu3) ...
    Removing nginx-common (1.4.6-1ubuntu3) ...
    Processing triggers for man-db (2.6.7.1-1) ...
    galaxy@gcc2014:~$

Install the new nginx package:

    galaxy@gcc2014:~$ sudo dpkg -i /home/galaxy/nginx-build/nginx-galaxy_1.4.7-gxydev+trusty_amd64.deb
    Selecting previously unselected package nginx-galaxy.
    (Reading database ... 244964 files and directories currently installed.)
    Preparing to unpack .../nginx-galaxy_1.4.7-gxydev+trusty_amd64.deb ...
    Unpacking nginx-galaxy (1.4.7-gxydev+trusty) ...
    Setting up nginx-galaxy (1.4.7-gxydev+trusty) ...
    galaxy@gcc2014:~$ sudo mkdir /var/opt/nginx
    galaxy@gcc2014:~$

Install uWSGI and supervisord

    galaxy@gcc2014:~/galaxy-dist$ sudo apt-get install uwsgi uwsgi-plugin-python supervisor
      ...
    After this operation, 4,497 kB of additional disk space will be used.
    Do you want to continue? [Y/n]
      ...
    galaxy@gcc2014:~/galaxy-dist$ sudo update-rc.d -f uwsgi remove
     Removing any system startup links for /etc/init.d/uwsgi ...
       /etc/rc0.d/K20uwsgi
       /etc/rc1.d/K20uwsgi
       /etc/rc2.d/S20uwsgi
       /etc/rc3.d/S20uwsgi
       /etc/rc4.d/S20uwsgi
       /etc/rc5.d/S20uwsgi
       /etc/rc6.d/K20uwsgi
    galaxy@gcc2014:~/galaxy-dist$

Configure nginx

    galaxy@gcc2014:/opt/nginx/conf$ sudo vi /opt/nginx/conf/nginx.conf

Replace the entire contents of the file with:

    user  gxprod;
    worker_processes  1;
    daemon off;


    events {
        worker_connections  1024;
    }


    http {
        include       mime.types;
        default_type  application/octet-stream;

        sendfile        on;

        keepalive_timeout  65;

        gzip  on;
        gzip_vary on;
        gzip_proxied any;
        gzip_comp_level 6;
        gzip_buffers 16 8k;
        gzip_http_version 1.1;
        gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;

        client_max_body_size 50g;
        uwsgi_read_timeout 300;

        server {
            listen 80 default_server;
            server_name  localhost;

            # pass to uWSGI by default
            location / {
                uwsgi_pass 127.0.0.1:4001;
                include uwsgi_params;
            }

            # serve static content
            location /static {
                alias /home/gxprod/galaxy-dist/static;
                gzip on;
                gzip_types text/plain text/xml text/javascript text/css application/x-javascript;
                expires 24h;
            }
            location /static/style {
                alias /home/gxprod/galaxy-dist/static/style/blue;
                gzip on;
                gzip_types text/plain text/xml text/javascript text/css application/x-javascript;
                expires 24h;
            }
            location /static/scripts {
                alias /home/gxprod/galaxy-dist/static/scripts/packed;
                gzip on;
                gzip_types text/plain text/javascript application/x-javascript;
                expires 24h;
            }
            location ~ ^/plugins/visualizations/(?<vis_name>.+?)/static/(?<static_file>.*?)$ {
                alias /home/gxprod/galaxy-dist/config/plugins/visualizations/$vis_name/static/$static_file;
            }

            # delegated downloads
            location /_x_accel_redirect {
                internal;
                alias /;
            }

            # delegated uploads
            location /_upload {
                upload_store /home/gxprod/uploads;
                upload_store_access user:rw;
                upload_pass_form_field "";
                upload_set_form_field "__${upload_field_name}__is_composite" "true";
                upload_set_form_field "__${upload_field_name}__keys" "name path";
                upload_set_form_field "${upload_field_name}_name" "$upload_file_name";
                upload_set_form_field "${upload_field_name}_path" "$upload_tmp_path";
                upload_pass_args on;
                upload_pass /_upload_done;
            }
            location /_upload_done {
                set $dst /api/tools;
                if ($args ~ nginx_redir=([^&]+)) {
                    set $dst $1;
                }
                rewrite "" $dst;
            }
        }
    }

Configure supervisord

    galaxy@gcc2014:/opt/nginx/conf$ cd /etc/supervisor/conf.d
    galaxy@gcc2014:/etc/supervisor/conf.d$ sudo vi galaxy.conf

In the editor, paste:

    [program:nginx]
    command         = /opt/nginx/sbin/nginx
    directory       = /
    umask           = 022
    autostart       = true
    autorestart     = unexpected
    startsecs       = 5
    exitcodes       = 0
    user            = root

    [program:galaxy_uwsgi]
    command         = /usr/bin/uwsgi --plugin python --ini-paste /home/gxprod/galaxy-dist/universe_wsgi.ini
    directory       = /home/gxprod/galaxy-dist
    umask           = 022
    autostart       = true
    autorestart     = true
    startsecs       = 10
    user            = gxprod
    environment     = PATH=/home/gxprod/venv:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin,PYTHON_EGG_CACHE=/home/gxprod/.python-eggs,PYTHONPATH=/home/gxprod/galaxy-dist/eggs/PasteDeploy-1.5.0-py2.7.egg
    numprocs        = 1
    stopsignal      = INT

    [program:handler]
    command         = /home/gxprod/venv/bin/python ./scripts/paster.py serve universe_wsgi.ini --server-name=handler%(process_num)s --pid-file=/home/gxprod/handler%(process_num)s.pid --log-file=/home/gxprod/handler%(process_num)s.log
    directory       = /home/gxprod/galaxy-dist
    process_name    = handler%(process_num)s
    numprocs        = 2
    umask           = 022
    autostart       = true
    autorestart     = true
    startsecs       = 15
    user            = gxprod
    environment     = PYTHON_EGG_CACHE=/home/gxprod/.python-eggs,SGE_ROOT=/var/lib/gridengine

    [group:galaxy]
    programs = handler
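
The handler program relies on supervisord expanding %(process_num)s once per process. With numprocs = 2 this yields handler0 and handler1, which need to line up with the [server:handler0] and [server:handler1] sections in universe_wsgi.ini and with the handler ids in job_conf.xml. The Python 3 sketch below imitates that expansion with ordinary %-style string formatting; the function name expand_process_names is an illustrative assumption, and only the process_num variable is handled.

```python
def expand_process_names(process_name, numprocs, numprocs_start=0):
    """Expand a supervisord-style process_name template for each process.

    Only %(process_num)s is supported here; supervisord itself offers
    further variables that this sketch does not model.
    """
    return [process_name % {"process_num": n}
            for n in range(numprocs_start, numprocs_start + numprocs)]


if __name__ == "__main__":
    for name in expand_process_names("handler%(process_num)s", 2):
        # Each name should have a matching [server:<name>] section
        # and a <handler id="<name>"> entry.
        print(name)
```

If numprocs is raised later, a corresponding [server:handlerN] section and handler entry would need to be added for each new process.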

Start Galaxy and nginx

    galaxy@gcc2014:/etc/supervisor/conf.d$ sudo supervisorctl update
    galaxy_uwsgi: added process group
    handler: added process group
    nginx: added process group
    galaxy@gcc2014:/etc/supervisor/conf.d$ sudo supervisorctl status
    galaxy:handler0                  STARTING
    galaxy:handler1                  STARTING
    galaxy_uwsgi                     STARTING
    nginx                            STARTING

You can now visit your Galaxy server at http://localhost/.

Load external data in a data library

First, place some data in the import directory:

    galaxy@gcc2014:~$ sudo -iu gxprod
    gxprod@gcc2014:~$ mkdir -p library/run1
    gxprod@gcc2014:~$ cp galaxy-dist/test-data/*.fastqsanger library/run1

Next, import the data in the Galaxy UI:

  • Register an account that matches the address you set in admin_users

  • Click Admin from the masthead

  • Click Manage data libraries from the left panel

  • Click Create new data library

  • Enter a Name and Description (and optionally a Synopsis) and click Create

  • Click Add datasets:

  • On the upload form:
    • From the Upload option: menu, select Upload directory of files

    • In the File Format: field, enter fastqsanger

    • From Server Directory, select run1

    • From the Copy data into Galaxy? menu, select Link to files without copying into Galaxy

    • Click Upload to library

  • Click the Shared Data menu in the masthead and select Data Libraries

  • Click on the library you created
  • Select some datasets, then click Go to import them to your history

Galaxy will use the data in its original location and does not make additional copies for each user who imports the data. Access controls can be set on libraries and their contents to limit them to certain users or groups.

Extra Activities (Time and Interest Permitting)

Using Slurm or Torque/PBS

The job configuration in job_conf.xml we used above included commented-out configurations for using Torque and Slurm. To use one of these other resource managers:

    galaxy@gcc2014:~$ sudo -u gxprod vi ~gxprod/galaxy-dist/job_conf.xml
      ... modify XML comments ...
    galaxy@gcc2014:~$ sudo supervisorctl restart galaxy:*

Shipping Jobs to Remote Resources with Pulsar

Pulsar (formerly called the LWR) allows Galaxy jobs to be staged to remote clusters without shared disk.

We only have one VM for this workshop - so we are going to simulate this by shipping gxprod jobs in the Galaxy instance we just configured back to a Pulsar server running as the galaxy user.

Installing Pulsar with Ansible

    galaxy@gcc2014:~$ . ansible/bin/activate
    (ansible)galaxy@gcc2014:~$ hg clone https://bitbucket.org/natefoo/galaxy-ansible
    (ansible)galaxy@gcc2014:~$ cd galaxy-ansible
    (ansible)galaxy@gcc2014:~/galaxy-ansible$ ansible-playbook -i localhost, local.yml -e "pulsar_server_dir=/home/galaxy/pulsar pulsar_configure_galaxy=False galaxy_server_dir=/home/galaxy/galaxy-dist pulsar_private_token=puls0rt0ken" --tags pulsar
    (ansible)galaxy@gcc2014:~/galaxy-ansible$ deactivate
    galaxy@gcc2014:~/galaxy-ansible$ cd ~/pulsar
    galaxy@gcc2014:~/pulsar$ ./run.sh -m paster --daemon
    Entering daemon mode
    galaxy@gcc2014:~/pulsar$

Next, we will configure Galaxy to submit some jobs to this Pulsar server. Pulsar isn't available in Galaxy's stable branch yet, however, so we are going to check out the default branch. (Older versions of Galaxy can target an LWR server for similar functionality, but we would encourage everyone to use Pulsar going forward.)

    galaxy@gcc2014:~$ sudo -iu gxprod
    gxprod@gcc2014:~$ cd galaxy-dist/
    gxprod@gcc2014:~/galaxy-dist$ hg checkout default
    350 files updated, 0 files merged, 58 files removed, 0 files unresolved
    gxprod@gcc2014:~/galaxy-dist$ vi job_conf.xml

Copy in the following contents, which route one tool's jobs to the Pulsar server set up above.

    <?xml version="1.0"?>
    <job_conf>
        <plugins workers="2">
            <plugin id="gridengine" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner"/>
            <plugin id="pulsar_rest" type="runner" load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner" />
        </plugins>
        <handlers default="handlers">
            <handler id="handler0" tags="handlers"/>
            <handler id="handler1" tags="handlers"/>
        </handlers>
        <destinations default="gridengine">
            <destination id="gridengine" runner="gridengine"/>
            <destination id="local_pulsar" runner="pulsar_rest">
                <param id="url">http://localhost:8913/</param>
                <param id="private_token">puls0rt0ken</param>
            </destination>
        </destinations>
        <limits>
            <limit type="registered_user_concurrent_jobs">2</limit>
            <limit type="unregistered_user_concurrent_jobs">1</limit>
            <limit type="job_walltime">24:00:00</limit>
        </limits>
        <tools>
            <tool id="cat1" destination="local_pulsar" />
        </tools>
    </job_conf>
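
To make the routing in this file concrete, here is a minimal Python sketch that reads a trimmed copy of the configuration and reports which destination a tool id resolves to. It is not Galaxy's own code: the helper name resolve_destination and the tool id some_other_tool are hypothetical and exist only for illustration. The sketch assumes that an explicit tool entry takes precedence and that every other tool falls back to the default attribute on the destinations element.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the job_conf.xml above (plugins, handlers and limits omitted).
JOB_CONF = """<?xml version="1.0"?>
<job_conf>
    <destinations default="gridengine">
        <destination id="gridengine" runner="gridengine"/>
        <destination id="local_pulsar" runner="pulsar_rest">
            <param id="url">http://localhost:8913/</param>
            <param id="private_token">puls0rt0ken</param>
        </destination>
    </destinations>
    <tools>
        <tool id="cat1" destination="local_pulsar" />
    </tools>
</job_conf>
"""


def resolve_destination(job_conf_xml, tool_id):
    """Return the destination id for a tool (hypothetical helper, not a Galaxy API)."""
    root = ET.fromstring(job_conf_xml)
    # An explicit <tool> entry with a destination attribute wins.
    for tool in root.findall("./tools/tool"):
        if tool.get("id") == tool_id and tool.get("destination"):
            return tool.get("destination")
    # Every other tool falls back to the default on <destinations>.
    return root.find("destinations").get("default")


print(resolve_destination(JOB_CONF, "cat1"))             # local_pulsar
print(resolve_destination(JOB_CONF, "some_other_tool"))  # gridengine
```

With this configuration only cat1 jobs go to Pulsar; everything else continues to run through the gridengine destination.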

Now, as the galaxy user, restart the job handlers:

    gxprod@gcc2014:~/galaxy-dist$ exit
    galaxy@gcc2014:~$ sudo supervisorctl restart galaxy:*
    [sudo] password for galaxy: 
    handler0: stopped
    handler1: stopped
    handler0: started
    handler1: started

Open Galaxy, upload a file, and use the "Concatenate Datasets" tool on it to test Pulsar.
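
If the test job does not run, it can help to first check that something is accepting connections on the URL given in job_conf.xml. The following is a minimal sketch of such a check in Python. It only tests that the TCP port is open; it does not talk to Pulsar or validate the private token. The function name is_listening is hypothetical.

```python
import socket
from urllib.parse import urlparse


def is_listening(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    port = parsed.port or 80  # assume port 80 when the URL does not name one
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or the host could not be resolved.
        return False


if __name__ == "__main__":
    # URL taken from the local_pulsar destination in job_conf.xml.
    print(is_listening("http://localhost:8913/"))
```

A result of False suggests the Pulsar daemon is not running or is bound to a different port than the one Galaxy is configured to use.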

Additional topic candidates

  • Dynamic job resource assignment

Transcripts

Adding local data by hand

    galaxy@gcc2014:~$ mkdir -p local_data/sacCer2/seq/work
    galaxy@gcc2014:~$ cd local_data/sacCer2/seq/work
    galaxy@gcc2014:~/local_data/sacCer2/seq/work$ wget http://hgdownload.cse.ucsc.edu/goldenPath/sacCer2/bigZips/chromFa.tar.gz
    --2014-06-29 14:09:25--  http://hgdownload.cse.ucsc.edu/goldenPath/sacCer2/bigZips/chromFa.tar.gz
    Resolving hgdownload.cse.ucsc.edu (hgdownload.cse.ucsc.edu)... 128.114.119.163
    Connecting to hgdownload.cse.ucsc.edu (hgdownload.cse.ucsc.edu)|128.114.119.163|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 3823174 (3.6M) [application/x-gzip]
    Saving to: ‘chromFa.tar.gz’

    100%[==================================================>] 3,823,174   1.01MB/s   in 4.3s   

    2014-06-29 14:09:29 (859 KB/s) - ‘chromFa.tar.gz’ saved [3823174/3823174]

    galaxy@gcc2014:~/local_data/sacCer2/seq/work$ tar zxvf chromFa.tar.gz
    2micron.fa
    chrI.fa
    chrII.fa
    chrIII.fa
    chrIV.fa
    chrIX.fa
    chrM.fa
    chrV.fa
    chrVI.fa
    chrVII.fa
    chrVIII.fa
    chrX.fa
    chrXI.fa
    chrXII.fa
    chrXIII.fa
    chrXIV.fa
    chrXV.fa
    chrXVI.fa
    galaxy@gcc2014:~/local_data/sacCer2/seq/work$ cat *.fa >../sacCer2.fa
    galaxy@gcc2014:~/local_data/sacCer2/seq/work$ cd ../..
    galaxy@gcc2014:~/local_data/sacCer2$ mkdir bwa_index
    galaxy@gcc2014:~/local_data/sacCer2$ cd bwa_index
    galaxy@gcc2014:~/local_data/sacCer2/bwa_index$ ln -s ../seq/sacCer2.fa
    galaxy@gcc2014:~/local_data/sacCer2/bwa_index$ ~/tool_deps/bwa/0.5.9/devteam/package_bwa_0_5_9/ec2595e4d313/bin/bwa index -a bwtsw sacCer2.fa
    [bwa_index] Pack FASTA... 0.18 sec
    [bwa_index] Reverse the packed sequence... 0.06 sec
    [bwa_index] Construct BWT for the packed sequence...
    [BWTIncConstructFromPacked] 10 iterations done. 2968163 characters processed.
    [BWTIncConstructFromPacked] 20 iterations done. 5189523 characters processed.
    [BWTIncConstructFromPacked] 30 iterations done. 6938035 characters processed.
    [BWTIncConstructFromPacked] 40 iterations done. 8313523 characters processed.
    [BWTIncConstructFromPacked] 50 iterations done. 9394739 characters processed.
    [BWTIncConstructFromPacked] 60 iterations done. 10243827 characters processed.
    [BWTIncConstructFromPacked] 70 iterations done. 10909811 characters processed.
    [BWTIncConstructFromPacked] 80 iterations done. 11431331 characters processed.
    [BWTIncConstructFromPacked] 90 iterations done. 11838867 characters processed.
    [BWTIncConstructFromPacked] 100 iterations done. 12156547 characters processed.
    [bwt_gen] Finished constructing BWT in 101 iterations.
    [bwa_index] 3.30 seconds elapse.
    [bwa_index] Construct BWT for the reverse packed sequence...
    [BWTIncConstructFromPacked] 10 iterations done. 2968163 characters processed.
    [BWTIncConstructFromPacked] 20 iterations done. 5189523 characters processed.
    [BWTIncConstructFromPacked] 30 iterations done. 6938035 characters processed.
    [BWTIncConstructFromPacked] 40 iterations done. 8313523 characters processed.
    [BWTIncConstructFromPacked] 50 iterations done. 9394739 characters processed.
    [BWTIncConstructFromPacked] 60 iterations done. 10243827 characters processed.
    [BWTIncConstructFromPacked] 70 iterations done. 10909811 characters processed.
    [BWTIncConstructFromPacked] 80 iterations done. 11431331 characters processed.
    [BWTIncConstructFromPacked] 90 iterations done. 11838867 characters processed.
    [BWTIncConstructFromPacked] 100 iterations done. 12156547 characters processed.
    [bwt_gen] Finished constructing BWT in 101 iterations.
    [bwa_index] 3.41 seconds elapse.
    [bwa_index] Update BWT... 0.08 sec
    [bwa_index] Update reverse BWT... 0.07 sec
    [bwa_index] Construct SA from BWT and Occ... 1.03 sec
    [bwa_index] Construct SA from reverse BWT and Occ... 1.02 sec
    galaxy@gcc2014:~/local_data/sacCer2/bwa_index$ cd ~/galaxy-dist/tool-data
    galaxy@gcc2014:~/galaxy-dist/tool-data$ vi bwa_index.loc
      ...
      sacCer2byhand   sacCer2 S. cerevisiae June 2008 (index by hand) /home/galaxy/local_data/sacCer2/bwa_index/sacCer2.fa
      ...
    galaxy@gcc2014:~/galaxy-dist/tool-data$ cd ..
    galaxy@gcc2014:~/galaxy-dist$ vi tool_data_table_conf.xml
      ...
      <table name="bwa_indexes" comment_char="#">
          <columns>value, dbkey, name, path</columns>
          <file path="tool-data/bwa_index.loc" />
      </table>
      ...
    galaxy@gcc2014:~/galaxy-dist$ sh run.sh
      ...
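
The fields of the bwa_index.loc entry in this transcript must be separated by tab characters, in the column order declared in tool_data_table_conf.xml (value, dbkey, name, path). The name field itself contains spaces, so splitting on whitespace would not work. The following minimal Python sketch builds and re-parses that line to show the layout. It is not Galaxy's parser, and the function names format_loc_line and parse_loc_line are hypothetical.

```python
# Column order taken from the <columns> element in tool_data_table_conf.xml.
COLUMNS = ["value", "dbkey", "name", "path"]


def format_loc_line(entry):
    """Join the column values with tabs, in the declared column order."""
    return "\t".join(entry[column] for column in COLUMNS)


def parse_loc_line(line, comment_char="#"):
    """Return a dict keyed by column name, or None for blank and comment lines."""
    stripped = line.rstrip("\n")
    if not stripped.strip() or stripped.startswith(comment_char):
        return None
    fields = stripped.split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(
            "expected %d tab-separated fields, got %d" % (len(COLUMNS), len(fields))
        )
    return dict(zip(COLUMNS, fields))


entry = {
    "value": "sacCer2byhand",
    "dbkey": "sacCer2",
    "name": "S. cerevisiae June 2008 (index by hand)",
    "path": "/home/galaxy/local_data/sacCer2/bwa_index/sacCer2.fa",
}

line = format_loc_line(entry)
print(repr(line))            # the tabs show up as \t in the repr
print(parse_loc_line(line))  # round-trips back to the same four columns
```

If an editor silently converts tabs to spaces, a line like this collapses into a single field, which is a common reason for a new index not appearing in the tool form.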