Revision 8 as of 2012-05-07 18:44:46

Customizing your CloudMan deployment

It is possible to use the cloud infrastructure management functionality offered by CloudMan while customizing the default deployment it offers. In the context of Galaxy, this means that you can run a custom version of Galaxy and use your own set of tools and reference genomes. Note that customizing an instance may require the use of command-line tools. To modify your cloud deployment, perform the following general steps:

  1. Start a CloudMan cluster instance
  2. ssh to the EC2 instance and perform desired customizations
  3. Use CloudMan Admin interface to persist changes to the file system

Detailed steps

Step 1:

Follow directions on this page to start a cloud cluster instance. Wait for the cluster to finish initializing and for all of its services to be running.

Step 2:

From the command prompt, connect to the newly created cluster using the following command, filling in the appropriate details:

   ssh -o StrictHostKeyChecking=no -i <path to your private key> ubuntu@<instance public DNS>

Next, make the desired changes to the system. The changes supported at this level of instance customization are modifications to the file systems managed by CloudMan. The available file systems are listed on the CloudMan Admin console under the Persist changes to file system entry and are mounted on the underlying system under /mnt (e.g., /mnt/galaxyTools and /mnt/galaxyIndices). Modifying the contents of these file systems allows you to customize your instance of Galaxy, install or modify tools, and modify the reference genomes used by Galaxy tools. As you make changes, respect the ownership of the directories; currently, all of them are owned by the galaxy user. If you plan to modify the Galaxy application itself, first stop the Galaxy process from the CloudMan Admin console.
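Before persisting anything, it can help to check that nothing you added ended up owned by the wrong user. The sketch below uses only standard find; list_not_owned_by is a hypothetical helper name, and running it from a root shell is an assumption so that find can read every directory.

```shell
#!/bin/sh
# list_not_owned_by DIR USER  (hypothetical helper)
# Prints every file or directory under DIR whose owner is not USER.
list_not_owned_by() {
    find "$1" ! -user "$2" -print
}

# Example on the instance, from a root shell (e.g. after "sudo -s"):
#   list_not_owned_by /mnt/galaxyTools galaxy
#   list_not_owned_by /mnt/galaxyIndices galaxy
# Give any reported path back to the galaxy user, for example:
#   chown -R galaxy /mnt/galaxyTools/<path you added>
```

An empty result means every path under the file system is already owned by the expected user.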

Step 3:

After you have completed all of the desired modifications, exit the ssh session so that your login cannot interfere with file system unmounts and remounts. On the CloudMan Admin console, under Persist changes to file system, click the name of the file system you wish to preserve; CloudMan will perform the steps required to persist your changes. Depending on how much you changed on the given file system, this process may take a while, because Amazon creates a snapshot of the underlying EBS volume and that can take a long time. Once the process completes, you can go back to using the cluster as you normally would; all of the changes will be preserved after you terminate the cluster.

If you modified the Galaxy tool configuration file tool_conf.xml or the universe_wsgi.ini file, you should also update the copies of those files stored in the cluster's bucket so that the changes take effect the next time the cluster is instantiated. When uploading them, append the .cloud extension to their names (i.e., tool_conf.xml.cloud and universe_wsgi.ini.cloud). You can find the cluster's bucket name in CloudMan's console; it will be something like cm-ca8d43f924f3ba1x5b63dabcdcd524ec.
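One way to do the upload from the command line is sketched below. It assumes the AWS command-line interface (aws) is installed and configured with credentials for the account that owns the cluster. upload_cloud_configs and run are hypothetical helper names, and the file locations in the example assume Galaxy lives in /mnt/galaxyTools/galaxy-central, as in the manual approach further down this page.

```shell
#!/bin/sh
# Assumption: the aws CLI is installed and configured.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# upload_cloud_configs BUCKET FILE...  (hypothetical helper)
# Copies each FILE to s3://BUCKET/<file name>.cloud
upload_cloud_configs() {
    bucket=$1
    shift
    for f in "$@"; do
        run aws s3 cp "$f" "s3://$bucket/$(basename "$f").cloud"
    done
}

# Example (use the bucket name shown in the CloudMan console):
#   upload_cloud_configs cm-ca8d43f924f3ba1x5b63dabcdcd524ec \
#       /mnt/galaxyTools/galaxy-central/tool_conf.xml \
#       /mnt/galaxyTools/galaxy-central/universe_wsgi.ini
```

Running it once with DRY_RUN=1 shows exactly which S3 keys would be written before anything is uploaded.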


An Alternative Approach: Manual

CloudMan automates several steps that are required to persist changes to the underlying file systems. However, if you prefer (or if something goes wrong), these steps can also be performed manually. For the manual approach, perform Steps 1 and 2 from above and then continue with the following Steps 3, 4, and 5:

 3. Create new snapshots of the modified EBS volume(s)
 4. Repair the cluster, then terminate it
 5. Update cluster configuration in S3 to point to the new snapshots

Step 3:

After you have completed all of the desired modifications, stop the Galaxy process with the Stop button for the Galaxy service on the CloudMan Admin console. Then, as the 'ubuntu' user, remove the file system where Galaxy is installed from the NFS exports and unmount it:

   ubuntu@<ip>$ sudo vi /etc/exports
   # Comment out the line referencing /mnt/galaxyTools
   ubuntu@<ip>$ sudo /etc/init.d/nfs-kernel-server restart
   ubuntu@<ip>$ cd    # get out of /mnt/galaxyTools, or umount may fail
   ubuntu@<ip>$ sudo umount /mnt/galaxyTools
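If you prefer not to edit /etc/exports interactively, the same change can be made with sed. This is a sketch assuming GNU sed (as shipped with Ubuntu) and an export line that begins with the mount point followed by whitespace; comment_out_export is a hypothetical helper name.

```shell
#!/bin/sh
# comment_out_export EXPORTS_FILE MOUNT_POINT  (hypothetical helper)
# Prefixes '#' to the line that exports MOUNT_POINT; other lines are untouched.
# Assumes GNU sed for in-place editing (-i).
comment_out_export() {
    sed -i "s|^$2[[:space:]]|#&|" "$1"
}

# On the instance, from a root shell (e.g. after "sudo -s"):
#   comment_out_export /etc/exports /mnt/galaxyTools
#   /etc/init.d/nfs-kernel-server restart
```

Because the edited line then starts with #, running the helper a second time leaves the file unchanged.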

Next, from the AWS console, detach the EBS volume where Galaxy is installed. To find out which EBS volume to detach, look in the CloudMan log at /mnt/cm/paster.log for a line similar to this one: Successfully mounted file system '/mnt/galaxyTools' from '/dev/sdg2'. Once the volume is detached, create a snapshot of it from the AWS console. Note the new snapshot's ID and size; both are needed in Step 5.
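The log lookup can be scripted. This sketch assumes the log line has exactly the form quoted above; device_for_mount is a hypothetical helper that prints the device from the most recent matching line.

```shell
#!/bin/sh
# device_for_mount LOG MOUNT_POINT  (hypothetical helper)
# Finds lines of the form
#   Successfully mounted file system 'MOUNT_POINT' from 'DEVICE'
# and prints DEVICE from the last one.
device_for_mount() {
    sed -n "s|.*Successfully mounted file system '$2' from '\([^']*\)'.*|\1|p" "$1" \
        | tail -n 1
}

# On the instance:
#   device_for_mount /mnt/cm/paster.log /mnt/galaxyTools
```

Match the printed device against the attachment information shown for each volume in the AWS console; the device name EC2 reports may not be spelled exactly like the one the instance's kernel uses.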

If you modified the available genomes in /mnt/galaxyIndices/genomes/, you will also need to unmount that file system (sudo umount /mnt/galaxyIndices), detach the corresponding EBS volume, and create a snapshot of it.
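The detach and snapshot steps can also be done with the AWS command-line interface instead of the console. This is a sketch assuming the aws CLI is installed and configured; snapshot_volumes and run are hypothetical helper names, and you supply the volume IDs you identified above (the galaxyTools volume and, if you modified genomes, the galaxyIndices volume). Unmount the file systems on the instance first, as described above.

```shell
#!/bin/sh
# Assumption: the aws CLI is installed and configured.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# snapshot_volumes VOLUME_ID...  (hypothetical helper)
# For each volume: detach it, wait until it is available, then snapshot it.
# create-snapshot prints the new snapshot ID and its size, both needed in Step 5.
snapshot_volumes() {
    for vol in "$@"; do
        run aws ec2 detach-volume --volume-id "$vol"
        run aws ec2 wait volume-available --volume-ids "$vol"
        run aws ec2 create-snapshot --volume-id "$vol" \
            --description "CloudMan customization of $vol" \
            --query '[SnapshotId,VolumeSize]' --output text
    done
}

# Example with placeholder IDs:
#   snapshot_volumes vol-xxxxxxxx vol-yyyyyyyy
```

The snapshot itself may take a while to complete after create-snapshot returns; its progress is visible in the AWS console.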

Step 4:

To ensure a clean cluster shutdown, you first need to restore everything that was taken down during the update. From the AWS console, reattach the original EBS volume to the instance using the same device ID (e.g., /dev/sdg2 in the above example), and then mount the device on the corresponding file system's mount point:

   ubuntu@<ip>$ sudo mount /dev/sdg2 /mnt/galaxyTools

Lastly, start Galaxy as the 'galaxy' system user:

   ubuntu@<ip>$ sudo su galaxy
   galaxy@<ip>$ cd /mnt/galaxyTools/galaxy-central/
   galaxy@<ip>$ sh run.sh --daemon
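The reattach, mount, and restart can be sketched as a single helper. This assumes you run it on the master instance and that the aws CLI is installed and configured there; restore_galaxy_tools and run are hypothetical names, and the volume ID, instance ID, and device are the values from Step 3.

```shell
#!/bin/sh
# Assumption: run on the master instance with the aws CLI configured.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# restore_galaxy_tools VOLUME_ID INSTANCE_ID DEVICE  (hypothetical helper)
restore_galaxy_tools() {
    # Reattach the original volume under the same device name as before.
    run aws ec2 attach-volume --volume-id "$1" --instance-id "$2" --device "$3"
    # This waits for the volume state only; if mount then reports a missing
    # device, wait a few seconds and retry the mount by hand.
    run aws ec2 wait volume-in-use --volume-ids "$1"
    # Mount it back where CloudMan expects it.
    run sudo mount "$3" /mnt/galaxyTools
    # Start Galaxy as the galaxy user, like the interactive commands above.
    run sudo -u galaxy sh -c 'cd /mnt/galaxyTools/galaxy-central && sh run.sh --daemon'
}

# Example with placeholder IDs:
#   restore_galaxy_tools vol-xxxxxxxx i-xxxxxxxx /dev/sdg2
```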

Once the CloudMan web interface shows that Galaxy is running again, log out of your ssh session (so that the unmount of /mnt/galaxyTools and other shutdown actions proceed smoothly). From the cluster log on the CloudMan Admin page, or with the "Show current user data" button, note down the name of the cluster's bucket; it will be something like cm-ca8d43f924f3ba1x5b63dabcdcd524ec. Click "Terminate cluster" and select "Terminate Master Instance", but not "Delete this cluster" (that would delete all of this work).

Step 5:

Once the cluster has terminated (including the master instance), click the S3 tab in the AWS console. Find the cluster's bucket (noted in Step 4) and download the file persistent_data.yaml. In this file, under the - filesystem: galaxyTools entry, replace the value of snap_id with the snapshot ID you noted in Step 3. If you also changed the size of the file system in Step 3 (and thus the size of the snapshot), reflect that change in this file as well. If you changed any of the genomes, and therefore created a new snapshot of the galaxyIndices file system, update that snapshot ID too. Save the file and replace the one in your cluster's bucket with the new version. The universe_wsgi.ini.cloud and tool_conf.xml.cloud files in the bucket should already contain your customizations, but it is worth checking.
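The persistent_data.yaml edit can be scripted with awk. This sketch assumes each file system appears as a list item that starts with a "- filesystem: NAME" line, with its other keys indented beneath it; set_fs_key is a hypothetical helper name, and the key name size in the example is an assumption, so check your own file before relying on it. The helper writes the edited file to standard output so you can review it before uploading.

```shell
#!/bin/sh
# set_fs_key FILE FILESYSTEM KEY VALUE  (hypothetical helper)
# Assumed layout:
#   - filesystem: galaxyTools
#     snap_id: snap-...
# Replaces KEY's value inside the FILESYSTEM list item; prints the result.
set_fs_key() {
    awk -v name="$2" -v key="$3" -v value="$4" '
        # A new top-level key ends any file system entry.
        /^[^ \t-]/ { in_block = 0 }
        # A new list item starts an entry; track whether it is the one we want.
        /^[ \t]*- / { in_block = ($1 == "-" && $2 == "filesystem:" && $3 == name) }
        # Rewrite the key line, keeping its indentation.
        in_block && $1 == key ":" { sub(key ":.*", key ": " value) }
        { print }
    ' "$1"
}

# Example (assuming the aws CLI; placeholders for bucket and snapshot ID):
#   aws s3 cp s3://<bucket>/persistent_data.yaml .
#   set_fs_key persistent_data.yaml galaxyTools snap_id snap-xxxxxxxx > new.yaml
#   aws s3 cp new.yaml s3://<bucket>/persistent_data.yaml
```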

That's it. Instantiate the cluster as you normally would (i.e., using the same user data) and your customizations will be preserved for the given cluster. You may still need to install system software that is independent of Galaxy, e.g., additional Perl modules or Linux packages that are not installed by default in the base Ubuntu Linux image.