Customizing your Galaxy on the Cloud cluster
You can use the cloud infrastructure management functionality offered by CloudMan while customizing the default deployment it provides. In the context of Galaxy, this means you can run a custom version of Galaxy and use your own set of tools and reference genomes. Note that customizing an instance may require use of command line tools. To modify your cloud deployment, perform these general steps:
- Start a CloudMan cluster instance
- ssh to the EC2 instance and perform desired customizations
- Use CloudMan Admin interface to persist changes to the file system
Follow directions on this page to start a cloud cluster instance. Wait for the cluster to complete initialization and for all of the services to be running.
From the command prompt, connect to the newly created cluster using the following command, filling in the appropriate details:
ssh -o StrictHostKeyChecking=no -i <path to your private key> ubuntu@<instance public DNS>
Next, perform the desired changes to the system. The changes supported at this level of instance customization include modifications to the file systems managed by CloudMan. The available file systems are listed on the CloudMan Admin console under the entry Persist changes to file system and are mounted on the underlying system under /mnt (e.g., /mnt/galaxy and /mnt/galaxyIndices). Modifying the contents of these file systems allows you to customize your instance of Galaxy, install or modify tools, and modify the reference genomes used by Galaxy tools. As you make changes, respect the ownership of the directories; currently all of them are owned by the galaxy user. Note that if you plan on modifying the Galaxy application, stop the Galaxy process first from the CloudMan Admin console.
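Before persisting changes, it can help to confirm that nothing under a CloudMan-managed file system ended up owned by the wrong user. The following is only a minimal sketch of such a check, not part of CloudMan; the function name paths_not_owned_by is made up for this illustration.

```python
import os


def paths_not_owned_by(root, uid):
    """Return paths under root (root included) whose owner is not uid."""
    mismatched = []
    # lstat is used so that symbolic links are checked themselves, not their targets.
    if os.lstat(root).st_uid != uid:
        mismatched.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return mismatched


# Possible usage on the instance (assumes a 'galaxy' system user exists there):
#   import pwd
#   print(paths_not_owned_by("/mnt/galaxy", pwd.getpwnam("galaxy").pw_uid))
```

An empty list means every file and directory under the given root belongs to the expected user.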
After you have completed all of the desired modifications, quit the ssh session so that your login cannot interfere with file system unmounts and remounts. On the CloudMan Admin console, under Persist changes to file system, click the name of the file system you wish to preserve and CloudMan will perform the required steps to persist any changes. Note that, depending on how much you changed on the given file system, this process may take a while (Amazon makes a snapshot of the EBS volume, and that can take a long time). Once the process completes, you can go back to using the cluster as you normally would; all of the changes will be preserved when you terminate and relaunch the cluster.
An Alternative Approach: Manual & Deprecated
CloudMan automates several steps that are required to persist changes to the underlying file systems. However, if you would like (or something goes wrong), these steps can also be performed manually. For the manual steps, perform Steps 1 and 2 from above and then continue with new Steps 3, 4, 5 as follows:
3. Create new snapshots of the modified EBS volume(s)
4. Repair the cluster, then terminate it
5. Update cluster configuration in S3 to point to the new snapshots
After you have completed all of the desired modifications, stop the Galaxy process with the Stop button for the Galaxy service in the CloudMan Admin console. As the 'ubuntu' user, remove the file system where Galaxy is installed from NFS and unmount it:
Next, from the AWS console, detach the EBS volume where Galaxy is installed. You can discover which EBS volume to detach by looking at the CloudMan log located in /mnt/cm/paster.log and searching for a line similar to this one: Successfully mounted file system '/mnt/galaxy' from '/dev/sdg2'. Once the volume is detached, from the AWS console, create a snapshot of the volume. Note the newly created snapshot ID and the snapshot size, which will be needed in Step 5.
Note that if you modified available genomes in /mnt/galaxyIndices/genomes/, you will need to unmount that file system as well (umount /mnt/galaxyIndices/), detach the corresponding EBS volume, and create a snapshot of it as well.
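The log search described above can be scripted. The sketch below assumes the log line has exactly the wording quoted in the example ("Successfully mounted file system '...' from '...'"); other CloudMan versions may phrase it differently. The function name mounted_devices is made up for this illustration.

```python
import re

# Pattern based on the example log line quoted in the text; the wording is an
# assumption and may need adjusting for a different CloudMan version.
MOUNT_LINE = re.compile(r"Successfully mounted file system '([^']+)' from '([^']+)'")


def mounted_devices(log_text):
    """Return a dict mapping each mount point to the device it was mounted from."""
    devices = {}
    for line in log_text.splitlines():
        match = MOUNT_LINE.search(line)
        if match:
            mount_point, device = match.groups()
            # A later log line for the same mount point replaces an earlier one.
            devices[mount_point] = device
    return devices


# Possible usage on the instance:
#   with open("/mnt/cm/paster.log") as log_file:
#       print(mounted_devices(log_file.read()))
```

The device reported for a mount point still has to be matched to its EBS volume in the AWS console.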
To ensure a clean cluster shutdown, everything that was taken apart during the update must be put back. First, from the AWS console, reattach the original EBS volume to the instance using the same device ID (e.g., /dev/sdg2 in the above example), then mount the device as the corresponding file system:
ubuntu@<ip>$ sudo mount /dev/sdg2 /mnt/galaxy
Lastly, as the 'galaxy' system user, start Galaxy:
Back on the CloudMan web interface, once Galaxy is running again, log out of your ssh session (so that the unmount of /mnt/galaxy and other shutdown actions proceed smoothly). From the cluster log on the CloudMan Admin page, or the "Show current user data" button, note down the name of the cluster's bucket. It will be something like cm-ca8d43f924f3ba1x5b63dabcdcd524ec. Click "Terminate cluster" and include "Terminate Master Instance", but not "Delete this cluster" (or all of this work would be lost).
Once the cluster has terminated (including the master instance), in the AWS console, click on the S3 tab. Find the desired cluster's bucket (as noted in Step 4) and download the file persistent_data.yaml. Edit this file: under the entry - filesystem: galaxyTools, replace the snap_id value with the snapshot ID you got in Step 3. If you changed the size of the file system in Step 3 (and thus the size of the snapshot), reflect that change in this file as well. Then upload the edited file back to the cluster's bucket, replacing the original.
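The snap_id edit can also be done with a small script once the file has been downloaded. This is only a sketch under assumptions: it treats the file as plain text, assumes each file system is a list entry whose first line is "- filesystem: <name>", and assumes the snap_id key sits on its own line within that entry. The function name set_snapshot_id and the snapshot IDs in the usage comment are made up; check the result by eye before uploading it.

```python
def set_snapshot_id(yaml_text, filesystem, new_snap_id):
    """Replace snap_id inside the list entry that starts with '- filesystem: <name>'."""
    out = []
    in_entry = False
    replaced = False
    for line in yaml_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            # A new list entry begins; remember whether it is the one we want.
            in_entry = stripped == "- filesystem: " + filesystem
        elif line[:1] not in (" ", "\t", ""):
            # An unindented key ends the current list entry.
            in_entry = False
        elif in_entry and stripped.startswith("snap_id:"):
            indent = line[: len(line) - len(line.lstrip())]
            line = indent + "snap_id: " + new_snap_id
            replaced = True
        out.append(line)
    if not replaced:
        raise ValueError("no snap_id found for file system " + filesystem)
    return "\n".join(out) + "\n"


# Possible usage (the snapshot ID is a placeholder):
#   with open("persistent_data.yaml") as f:
#       updated = set_snapshot_id(f.read(), "galaxyTools", "snap-0123abcd")
#   with open("persistent_data.yaml", "w") as f:
#       f.write(updated)
```

The function raises ValueError rather than returning an unchanged file, so a layout that differs from the assumed one is noticed immediately.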
That is it. Instantiate the cluster as you normally would (i.e., using the same user data) and your customizations will be preserved for the given cluster. You may still need to install system software that is independent of Galaxy, e.g., additional Perl modules or Linux packages not installed by default in the base Ubuntu Linux.
Using custom CloudMan application
When you launch a new CloudMan cluster, the most recent released version of the application is used. If you relaunch a previously existing cluster, the same version of the CloudMan application is reused as the last time the cluster was launched (even if a newer version of the application has been released); this is done to keep your cluster reproducible.
When launching a cluster, the CloudMan application is automatically retrieved as a tarball from a central location (currently, the cloudman bucket on S3 for new clusters or your cluster's bucket for previously existing clusters). To use a custom version of CloudMan, you need to provide a custom version of the application source code; depending on the intent, this can be done in one of the following ways:
- For an existing cluster: note down the cluster bucket from CloudMan Admin page and replace files cm.tar.gz and cm_boot.py in the bucket with your custom version (see more below on this)
- For (any number of) new clusters: create an S3 bucket and place cm.tar.gz and cm_boot.py files in that bucket. When launching a new cluster, provide the name of the bucket in the "Default bucket" form field on the launcher application.
File cm_boot.py is available in the CloudMan source code repository. If changes are required to this file, take a look at the module cm/boot and make changes there. Once done, generate the cm_boot.py script by invoking python make_boot_script.py from the CloudMan root directory, and then upload the file to the appropriate bucket.
File cm.tar.gz is a tarball of the entire CloudMan repository. If you are editing an older version of the application, first download the cm.tar.gz file from your cluster's bucket, extract the archive, make the desired edits, recreate the tar.gz file, and upload it to your cluster's bucket. If you are editing the development version from the source control repository, create the tar.gz file with all the files from within the application's root directory.
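Recreating the archive can be done with the tar command or, as sketched here, with Python's standard tarfile module. The sketch assumes that the archive members should sit at the top level of the archive (that is, without an enclosing directory), which is what creating the archive from within the root directory produces. The function name make_cloudman_tarball is made up for this illustration.

```python
import os
import tarfile


def make_cloudman_tarball(source_root, archive_path):
    """Pack everything under source_root into a gzip-compressed tar archive.

    Members are stored relative to source_root. archive_path should point
    outside source_root so the archive does not try to include itself.
    """
    with tarfile.open(archive_path, "w:gz") as archive:
        for name in sorted(os.listdir(source_root)):
            # Directories are added recursively; hidden entries are included too.
            archive.add(os.path.join(source_root, name), arcname=name)
    return archive_path


# Possible usage (the paths are placeholders):
#   make_cloudman_tarball("/path/to/cloudman", "/tmp/cm.tar.gz")
```

Unlike a shell wildcard such as *, os.listdir also returns hidden entries (for example, version control metadata), so remove or skip those first if they should not be shipped.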