
Diff for "CloudMan/Building"

Differences between revisions 13 and 14
Revision 13 as of 2015-07-22 16:45:44
Size: 7532
Editor: EnisAfgan
Comment: Simplify the image building instructions to focus on using Packer
Revision 14 as of 2015-07-22 17:16:25
Size: 8276
Editor: EnisAfgan
Comment: Update galaxyFS building docs
Line 27 (the revision 13 text is shown first, followed by the revision 14 text that replaces it):
The Galaxy File System (''galaxyFS'') contains the Galaxy application, the PostgreSQL database, installed Galaxy tools, and the accompanying configurations. The aim here is to create a snapshot of the ''galaxyFS'' that can be replicated when instances of the system are launched while permitting the changes to the file system (e.g., user-uploaded data, analysis results). This replication is realized by !CloudMan either by creating a volume from a volume snapshot or by downloading a tarball of the file system. The process of building ''galaxyFS'' snapshot is the following:
 1. Launch a new instance of the machine image created in Step 1 using Cloud Launch (either the one available at https://launch.usegalaxy.org/ or one you have installed yourself from https://github.com/galaxyproject/cloudlaunch). When !CloudMan comes up, choose ''Test cluster'' as the cluster type;
 2. From !CloudMan's Admin page, add a new volume-based file system called ''galaxy'' of desired size (default 10GB);
 3. Follow the instructions from the [[https://github.com/galaxyproject/galaxy-cloudman-playbook|CloudMan playbook]] to build the ''galaxyFS''; note that this playbook has an option to automatically install Galaxy tools and genome reference data;
 4. Create a snapshot of the file system: stop any services that might be running (''e.g.,'' Galaxy, PostgreSQL) and create a volume snapshot from the CloudMan's Admin page.
The Galaxy File System (''galaxyFS'') contains the Galaxy application, the PostgreSQL database, installed Galaxy tools, and the accompanying configurations. The aim here is to create a snapshot of the ''galaxyFS'' that can be replicated when instances of the system are launched while permitting the changes to the file system (e.g., user-uploaded data, analysis results). This replication is realized by !CloudMan at runtime. To build the ''galaxyFS'', we need to do the following:
 1. Launch a new instance of the machine image created in Step 1 using Cloud Launch (this requires that you have installed your own instance of [[https://github.com/galaxyproject/cloudlaunch|Cloud Launch]]). When !CloudMan comes up, choose the ''Cluster only'' with ''transient storage'' option (under ''Additional startup options'');
 2. Follow the instructions from the [[https://github.com/galaxyproject/galaxy-cloudman-playbook|CloudMan playbook]] to build the ''galaxyFS''; note that this playbook has an option to automatically install Galaxy tools and create an archive of the file system.

When building the ''galaxyFS'', there are a few things to keep in mind. The technical details are documented in the playbook, so we'll only highlight a few high-level concepts here. The ''galaxyFS''-building process can be thought of as a 3-stage process. First, the core components get installed and configured (e.g., Galaxy, the database, etc.). Second, Galaxy tools need to be installed. For this, the Galaxy application needs to be started, which can be done from the !CloudMan Admin page (note that Postgres needs to be started first, then ProFTPd, and only then Galaxy; see the playbook README for more details). Once Galaxy is running, you can install the tools from the Tool Shed by hand or use a role within the playbook to do this automatically. Third, once the tools are installed, a file system archive is created. Again, the playbook README has more details, but the core idea is that we create a tarball of the entire file system and upload it to an object store so it can be retrieved by launched instances of the overall system.

CloudMan

Building Galaxy CloudMan components

Hey - this page is under construction! Feel free to contribute and help in getting it wrapped up or bug someone on the CloudMan Team.

Launching a default version of CloudMan and Galaxy on the Cloud is a pretty straightforward process. The underlying system, however, is more complex and consists of a number of components. This page describes the steps required to build your own version of the components and deploy the system. You may want to do this if you are deploying the system on your own Cloud. If you would just like to have a custom version of the existing system on AWS, perhaps this can help?

Overview

The process of building your own instance of the system is time-consuming (although we are continuing to simplify and streamline it) and requires some technical skills and an understanding of the process. Before starting this endeavor, it is highly recommended that you read the following papers:

  1. "Galaxy CloudMan: delivering cloud compute clusters" - which gives you an overview of the ideology behind what's being done here;

  2. "Building and Provisioning Bioinformatics Environments on Public and Private Clouds" - which gives many of the technical details of the overall build process; and

  3. (optional) "A reference model for deploying applications in virtualized environments" - which gives you the technical background on why things are being done the way they are being done.

The process of building the system consists of a number of steps. Each step creates one component, and together the components compose the complete system. A functional Galaxy on the Cloud system requires all of the components to be in place; CloudMan alone requires the machine image (step 1) and an S3 bucket (or a Swift container) (step 4). Ideally, instances are launched via a Cloud Launch application. A public instance of Cloud Launch is available at https://launch.usegalaxy.org/ and you will likely want to install your own version (instructions are available here). The overall process of building the components has been automated via an Ansible playbook: galaxy-cloudman-playbook.

Step 1: the Machine Image

The machine image, often called the AMI (for Amazon Machine Image, although other cloud middleware solutions use the same term), represents the base operating system required to run the system; it contains the required system-level applications and libraries as well as hooks for starting CloudMan and the rest of the system. Once built, we'll use the machine image to launch instances for building the rest of the components as well as to launch instances of the complete system once it's all done.

To build your machine image, download the Ansible playbook and follow the instructions there on how to build the machine image. As things currently stand, the image can be built using a single command: packer build [--only amazon-ebs|openstack] image.json. Before running that command, though, make sure you have followed the initial setup/configuration instructions to provide your cloud access credentials.
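As a minimal sketch of how the command quoted above is assembled, the following Python helper builds the argument list for either builder. The function name and the `KNOWN_BUILDERS` tuple are illustrative assumptions, not part of the playbook; only the command form itself comes from the text above. The demo prints the command rather than running it.

```python
import shlex

# Builder names taken from the command quoted above.
KNOWN_BUILDERS = ("amazon-ebs", "openstack")


def packer_build_command(template="image.json", only=None):
    """Return the argv list for the packer build command quoted above.

    `only` restricts the build to a single builder; None builds all of them.
    """
    cmd = ["packer", "build"]
    if only is not None:
        if only not in KNOWN_BUILDERS:
            raise ValueError(f"unknown builder: {only!r}")
        cmd += ["--only", only]
    cmd.append(template)
    return cmd


if __name__ == "__main__":
    # Print the command instead of executing it, so nothing is built by accident.
    print(shlex.join(packer_build_command(only="openstack")))
```

To actually run the build, the returned list could be passed to `subprocess.run(cmd, check=True)` once the cloud credentials are configured as the playbook describes.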

Step 2: the galaxyFS

The Galaxy File System (galaxyFS) contains the Galaxy application, the PostgreSQL database, installed Galaxy tools, and the accompanying configurations. The aim here is to create a snapshot of the galaxyFS that can be replicated when instances of the system are launched while permitting the changes to the file system (e.g., user-uploaded data, analysis results). This replication is realized by CloudMan at runtime. To build the galaxyFS, we need to do the following:

  1. Launch a new instance of the machine image created in Step 1 using Cloud Launch (this requires that you have installed your own instance of Cloud Launch). When CloudMan comes up, choose the Cluster only with transient storage option (under Additional startup options);

  2. Follow the instructions from the CloudMan playbook to build the galaxyFS; note that this playbook has an option to automatically install Galaxy tools and create an archive of the file system.

When building the galaxyFS, there are a few things to keep in mind. The technical details are documented in the playbook, so we'll only highlight a few high-level concepts here. The galaxyFS-building process can be thought of as a 3-stage process. First, the core components get installed and configured (e.g., Galaxy, the database, etc.). Second, Galaxy tools need to be installed. For this, the Galaxy application needs to be started, which can be done from the CloudMan Admin page (note that Postgres needs to be started first, then ProFTPd, and only then Galaxy; see the playbook README for more details). Once Galaxy is running, you can install the tools from the Tool Shed by hand or use a role within the playbook to do this automatically. Third, once the tools are installed, a file system archive is created. Again, the playbook README has more details, but the core idea is that we create a tarball of the entire file system and upload it to an object store so it can be retrieved by launched instances of the overall system.
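The third stage can be illustrated with a short Python sketch that packs a directory tree into a gzip-compressed tarball. This is not the playbook's own implementation (the README remains the authority); the function names and the archive name galaxyFS.tar.gz are hypothetical, and the upload to the object store is left out because it depends on the cloud in use.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_file_system(root, archive_path):
    """Pack everything under `root` into a gzip-compressed tarball.

    `archive_path` should live outside `root`, otherwise the archive
    would try to include itself.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname="." stores member paths relative to root, so the
        # archive can be unpacked under any mount point.
        tar.add(Path(root), arcname=".")
    return Path(archive_path)


def list_archive(archive_path):
    """Return the sorted member names stored in the tarball."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())


if __name__ == "__main__":
    # Demo on a throwaway directory instead of a real galaxyFS mount.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as out:
        (Path(src) / "placeholder.txt").write_text("demo")
        archive = archive_file_system(src, Path(out) / "galaxyFS.tar.gz")
        print(list_archive(archive))
```

Stop any running services (e.g., Galaxy, PostgreSQL) before archiving a real file system so the database files are captured in a consistent state.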

Step 3: the galaxyIndicesFS

Required step but no docs exist here yet...

Step 4: Hooking it all up

Once you have the above components built, it is necessary to connect them. This is done by creating a publicly accessible S3 bucket/Swift container (e.g., my_cloudman) and placing in it a YAML specification file along with a copy of CloudMan's source packaged as a .tar.gz:

  1. Create a bucket/container, whether on S3 or Swift
  2. Place a copy of CloudMan's source code in the bucket. Call it cm.tar.gz. This can be your own modified version of CloudMan's source downloaded/forked from the source code repository or the default one, available from http://s3.amazonaws.com/cloudman/cm.tar.gz

  3. Place a copy of CloudMan's boot script in the bucket. Call it cm_boot.py. This can be your own modified version of the script downloaded from the source code repository or the default one, available from http://s3.amazonaws.com/cloudman/cm_boot.py

  4. Create a file called snaps.yaml and also place it in the bucket. See the main snaps.yaml for an example of what the structure of this file must look like. Make sure to provide the snapshot IDs for the snapshots you built in the previous steps. Note that if you would like others to be able to use your version of the system, it is necessary to make the provided snapshots public.

  5. Finally, provide access to a launcher application by installing and configuring the Cloud Launch application from https://github.com/galaxyproject/cloudlaunch
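The bucket contents described in the steps above can be checked with a small Python sketch. It only compares a list of object names against the three file names the steps require; how that listing is obtained (S3 or Swift client) is left out, and the function and constant names are illustrative assumptions.

```python
# File names the steps above say must be present in the bucket/container.
REQUIRED_FILES = ("cm.tar.gz", "cm_boot.py", "snaps.yaml")

# Default copies mentioned in the steps; snaps.yaml has to be written by hand.
DEFAULT_SOURCES = {
    "cm.tar.gz": "http://s3.amazonaws.com/cloudman/cm.tar.gz",
    "cm_boot.py": "http://s3.amazonaws.com/cloudman/cm_boot.py",
}


def missing_files(present_keys):
    """Return the required file names that are absent from the bucket listing."""
    present = set(present_keys)
    return [name for name in REQUIRED_FILES if name not in present]


if __name__ == "__main__":
    # Example listing for a bucket that so far holds only the source archive.
    for name in missing_files(["cm.tar.gz"]):
        hint = DEFAULT_SOURCES.get(name, "create it yourself")
        print(f"missing {name}: {hint}")
```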

Additional resources

Over time, the community has developed a few more documents and resources that can help with setting up the system and these are aggregated here: