This is an old and currently outdated guide only kept for historical reasons. Do not use the instructions on this page. Use the new getting started guide available here: https://wiki.galaxyproject.org/CloudMan/GettingStarted

Getting Started with Galaxy CloudMan

This page provides step-by-step instructions on how to start your own instance of Galaxy on Amazon Web Services (AWS) Elastic Compute Cloud (EC2). More general information and instructions about Galaxy CloudMan can be found here.

Screencast

Here's a screencast that walks through the process of setting up your own Galaxy cloud cluster. Note that the screencast skips one step: the detailed setup of the inbound TCP rules. See "Inbound Rules" below.

Step 1: One Time Amazon Setup

Step 1 Screenshots

1.2. Set region
1.3. Click on key pairs
1.5. Create security group
1.7. Add inbound rules

  1. Because AWS services implement a pay-as-you-go access model for compute resources, every user of the service must register with Amazon. You will need a credit card to register. (You can apply for an AWS Education Grant after you register.)

  2. Once your account has been approved by Amazon (note that this may take up to one business day), log into the EC2 AWS Management Console and set your AWS Region to US East (Virginia). This is the only region Galaxy CloudMan is fully supported in at this time (see screenshot 1.2).

  3. Click Network & Security → Key Pairs or My Resources → n Key Pairs (see screenshot 1.3; if it does not look like this, try using the Chrome browser) and then click Create Key Pair. Enter a memorable name for the key pair, e.g., GalaxyCloud, and click Create.

  4. Save your private key! The previous step creates the key pair and downloads the private key to your machine with the name MemorableName.pem. Save this file and protect it like you would your password. The key pair can be used to access started instances from the command line.

  5. Create a Security Group by clicking Network & Security → Security Groups → Create Security Group (see screenshot 1.5). Specify a name (e.g., GalaxyGroup) and provide a brief description. VPC should be No VPC. Click the Yes, Create button. The new group now appears in the list and details about the group are listed at the bottom of the page.

  6. Add Inbound Rules to the new group by clicking the Inbound tab. For each new rule you will need to select the protocol (the rule type) from the Create a new rule: pulldown, fill in the fields for that rule, and then click Add Rule. Define these rules:

    1. Protocol: HTTP
      Source: 0.0.0.0/0

      Use 0.0.0.0/0 unless you want to restrict access based on the source IP.
    2. Protocol: SSH
      Source: 0.0.0.0/0

    3. Protocol: Custom TCP rule
      Port range: 42284
      Source: 0.0.0.0/0

      This rule opens a port on the remote instance allowing access to the cloud controller web interface.
    4. Protocol: Custom TCP rule
      Port range: 20-21
      Source: 0.0.0.0/0

      This rule opens ports required for FTP file transfer.
    5. Protocol: Custom TCP rule
      Port range: 30000-30100
      Source: 0.0.0.0/0

      This rule opens ports required for passive FTP file transfer.
    6. Protocol: All TCP
      Source: name of group, e.g. GalaxyGroup

      The Source will automatically change to the security group ID. This action will enable multiple instances in the same security group to communicate with each other on Amazon EC2's internal network.
    7. Click Apply Rule Changes. The window should look like the screenshot 1.7.

All of these inbound rules are necessary for proper functioning of CloudMan and Galaxy.
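
The inbound rules above can also be written down as data, which makes them easier to review or reuse. The following is a minimal sketch, not part of CloudMan: the helper name `build_inbound_rules` and the group ID placeholder are assumptions made for illustration, and ports 80 and 22 are the standard ports for HTTP and SSH. The dictionary keys follow the `IpPermissions` shape used by the boto3 EC2 client; the code itself only builds the list and does not contact AWS.

```python
# Hypothetical helper: expresses the inbound rules from Step 1 as data.
# Nothing in this block talks to AWS; it only builds a list of dicts.

ANYWHERE = "0.0.0.0/0"

# (label, first port, last port) for each rule open to any source address.
PUBLIC_TCP_RULES = [
    ("HTTP", 80, 80),                          # standard HTTP port
    ("SSH", 22, 22),                           # standard SSH port
    ("CloudMan web interface", 42284, 42284),
    ("FTP", 20, 21),
    ("Passive FTP", 30000, 30100),
]


def build_inbound_rules(group_id):
    """Return the six inbound rules as a list of permission dicts."""
    rules = [
        {
            "IpProtocol": "tcp",
            "FromPort": first,
            "ToPort": last,
            "IpRanges": [{"CidrIp": ANYWHERE, "Description": label}],
        }
        for label, first, last in PUBLIC_TCP_RULES
    ]
    # "All TCP" with the security group itself as the source, so that
    # instances in the same group can talk to each other.
    rules.append(
        {
            "IpProtocol": "tcp",
            "FromPort": 0,
            "ToPort": 65535,
            "UserIdGroupPairs": [{"GroupId": group_id}],
        }
    )
    return rules


if __name__ == "__main__":
    # "sg-example" is a placeholder, not a real security group ID.
    for rule in build_inbound_rules("sg-example"):
        print(rule["FromPort"], rule["ToPort"])
```

If boto3 is installed and credentials are configured, a list like this could in principle be passed as the `IpPermissions` argument of the EC2 client's `authorize_security_group_ingress` call; that step is left out here because it depends on your account setup.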

Step 2: Starting a Master Instance

Step 2 Screenshots

2.3. Find & Select AMI
2.4. Instance details
2.5. Set User Data
2.5.3. Get to credentials
2.5.4. Credentials
2.9. Review & Launch!
2.10. AWS instance is up
2.11. Galaxy CloudMan is ready to configure

This step is required every time a new cloud instance of Galaxy is desired.

CloudMan works in a master-worker fashion: the master is used to control all of the needed services as well as worker instances. Worker instances are needed to run analysis jobs submitted through Galaxy that runs on the master instance (for a more detailed description of running Galaxy in cluster environments, see the cluster performance page). So, in order to start a Galaxy CloudMan cluster, we need to start a master instance.

  1. Go to the AWS Management Console for EC2 and click Launch Instance.

  2. Select the Classic Wizard in the popup window.

  3. In the Request Instances Wizard, click on the Community AMIs tab and search for galaxy. Choose the current AMI (see below).

    Current AMI:

    • AMI: ami-b45e59de
    • Name: Galaxy-CloudMan-1457720469 (active dates: 2016-03-24 -> present)

    Note that the current AMI represents the environment required to run CloudMan (in the format of a machine image) and the machine image release date does not represent the most recent update or version of either CloudMan or Galaxy. Versions of those tools can be seen (and automatically updated, with the Update button in the CloudMan Admin page) once an instance has been instantiated (we are also looking into a more explicit form of making this information available).

  4. Set the number of instances, the instance type, and the availability zone (see screenshot 2.4).

    1. Set Number of Instances to 1. This is the head node of the cluster.

    2. To determine which Instance Type to select, consult these resources.

    3. It does not matter which Availability Zone you choose the first time, but once selected, you must select this same zone every time you instantiate the given cluster!

  5. Supply user data. The next popup asks for Kernel ID, RAM Disk ID, and User Data. User data specifies the desired name of the cloud cluster and provides Galaxy CloudMan with user account information. User data must use the following format:

    • cluster_name: <DESIRED CLUSTER NAME>
      password: <DESIRED Galaxy CloudMan WEB UI PASSWORD>
      access_key: <YOUR AWS ACCESS KEY>
      secret_key: <YOUR AWS SECRET KEY>

      See screenshot 2.5. What each field means and where it comes from:

    • cluster_name: What the cluster will be called. Because nothing is stopping a given user from simultaneously starting multiple clusters on the cloud, cluster name is needed by CloudMan (and the user) to identify a given cluster. Pick a meaningful name.

    • password: This password will be required to access the CloudMan web interface. Without a password, anyone could start/stop instances on your behalf if they know your instance's URL. Pick something secure that you can also remember.

    • access_key: CloudMan needs user account information because it will need to create persistent data storage volumes as well as start a user-specified number of additional cloud instances. These keys are available on your Security Credentials page. One way to reach that page is to go to the AWS home page and click on My Account / Console → Security Credentials (see screenshot 2.5.3). Scroll down and copy and paste the Access Key ID (see screenshot 2.5.4).

    • secret_key: The secret key is also obtained on the credentials page, by clicking Show under Secret Access Key (see screenshot 2.5.4).

  6. The next popup allows you to Set Metadata Tags for this instance. This is valuable if you are going to have many instances. At least set the Name tag for this instance, as that will appear in the instance list of the AWS EC2 Management Console.

  7. Choose the key pair you created during the initial setup.

  8. Select the security group you created above and continue.

  9. Lastly, check your entries one more time, and then Launch the instance and wait (about 5 minutes on average) for the instance and CloudMan to boot (see screenshot 2.9). You should see a final popup that says "Your instances are now launching."

    • Note: The launch is a multistep process. First the AWS instance has to launch, then Galaxy CloudMan. We'll need to verify that both launched.

  10. Check the status of the instance in AWS. Go to the AWS management console, and click Instances, then select the instance you just launched. You need to wait until the instance state is Running, and Status checks says "2/2 checks passed" (see screenshot 2.10). If you are impatient, you can click the Refresh button in AWS.

  11. Check that Galaxy CloudMan is ready to be configured. In the previous step, a details panel for your instance appears at the bottom of the screen when you select that instance (see screenshot 2.10). Copy the URL that appears at the top of the instance details panel into a web browser and hit enter. You should see a "Welcome to Galaxy on the cloud" page that also tells you "There is no Galaxy running on this host, or ..." and ends with a link "please use the cloud console" (see screenshot 2.11). If you see this page, then Galaxy CloudMan is ready to configure.
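
Two parts of the steps above are easy to get wrong by hand: the user data text from item 5 and the readiness condition from item 10. The sketch below illustrates both under stated assumptions. The function names `render_user_data` and `instance_ready` are hypothetical and not part of CloudMan or AWS, the key values in the example are placeholders, and no quoting of special characters is attempted.

```python
# Field order of the user data format described in item 5.
USER_DATA_FIELDS = ("cluster_name", "password", "access_key", "secret_key")


def render_user_data(cluster_name, password, access_key, secret_key):
    """Return the four "key: value" lines to paste into the User Data box."""
    values = (cluster_name, password, access_key, secret_key)
    for field, value in zip(USER_DATA_FIELDS, values):
        # Each value has to fit on its own line and must not be empty.
        if not value or "\n" in value:
            raise ValueError(f"{field} must be a non-empty single line")
    return "\n".join(
        f"{field}: {value}" for field, value in zip(USER_DATA_FIELDS, values)
    )


def instance_ready(state, checks_passed, checks_total=2):
    """Mirror item 10: state is Running and all status checks have passed."""
    return state.lower() == "running" and checks_passed == checks_total


if __name__ == "__main__":
    # Placeholder values only; substitute your own cluster name and keys.
    print(render_user_data("my-cluster", "a-web-ui-password",
                           "YOUR-ACCESS-KEY", "YOUR-SECRET-KEY"))
    print(instance_ready("Running", 2))
```

The readiness check only covers the AWS side. As noted above, Galaxy CloudMan starts after the instance does, so the "Welcome to Galaxy on the cloud" page from item 11 still needs to be confirmed in a browser.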

Step 3: Galaxy CloudMan Web Interface

Step 3 Screenshots

3.1. Galaxy CloudMan login
3.2. Set initial disk size
3.3. Master up and running
3.5. Configure autoscaling
3.6. Worker nodes up
3.7. Congratulations

The Galaxy CloudMan web interface acts as a control panel to the Galaxy CloudMan cluster and provides an overview of all the services needed to run Galaxy in the cloud. The main actions supported on the CloudMan interface are starting a cluster, stopping the cluster, and scaling the number of worker instances.

  1. Once the master instance boots and the CloudMan application starts, it is accessible through a web browser at the IP address of the master instance, under the "/cloud" subdirectory, for example: http://ec2-184-73-10-5.compute-1.amazonaws.com/cloud. You can also get there by clicking the "cloud console" link on the instance's main URL. Log in to the instance by entering the password you specified in User Data when starting the master instance. You can leave the User field empty.

  2. The first time you log in to this instance's Galaxy CloudMan interface, an "Initial Cluster Configuration" popup will appear, asking you how much disk space you want to allocate for your data (see screenshot 3.2). See the Capacity Planning page for guidance on what to set this at. This number can be increased, but not decreased, later. Click Start Cluster.

    • In order for CloudMan to start Galaxy and thus allow access to it, the given cluster needs to create a data storage volume where all user-uploaded and analyzed data will be stored. This data is stored on an external EBS volume that is attached to the master instance each time the given cluster is instantiated and a master instance is running. So, the first time a cluster is created/instantiated, you have to decide on the size of this user data volume. The size should be based on the expected usage of the given Galaxy instance (see also Capacity Planning). Note that the current limit for the size of this user data volume is 1000GB.

  3. It will take a few minutes for the master node to come up. Several indicators on the screen will change (see screenshot 3.3):

    • The Access Galaxy button will go from grayed out to active.

    • Disk status will show a disk with a green plus on it.

    • Service status for both Applications and Data will be green (instead of yellow).

  4. Once the Access Galaxy button is no longer grayed out, you can add nodes to the cluster by either clicking the Add Nodes button, or turning Autoscaling on. For this example, let's enable autoscaling. Go to autoscaling configuration by clicking the "on" link in the autoscaling box at the right.

  5. Set Minimum and Maximum number of nodes to maintain, and the type of nodes. See Capacity Planning for guidance. See screenshot 3.5.

    • Autoscaling automates the elasticity offered by cloud computing. Once turned on, autoscaling takes over control of the size of your cluster. Autoscaling specifies the cluster size limits: the cluster will not automatically shrink to fewer than the minimum number of worker nodes you specify, and it will never grow larger than the maximum number of worker nodes you specify. While respecting the set limits, if there are more jobs than the cluster can comfortably process at a given time, autoscaling will automatically add compute nodes; if there are cluster nodes sitting idle at the end of an hour, autoscaling will terminate those nodes, reducing the size of the cluster and your cost. Autoscaling limits can be adjusted later, and autoscaling can also be turned off.
  6. Galaxy CloudMan will bring up the minimum number of worker nodes as specified in the previous step. The Galaxy CloudMan Console is updated as this happens. The Worker Status will go from

    Idle: 0 Available: 0 Requested: n
    to
    Idle: n Available: n Requested: n
    See screenshot 3.6.
  7. Once the worker nodes are up, click the Access Galaxy button. This opens up a new window with Galaxy on the Cloud (see screenshot 3.7). Congratulations. You are now running an elastic and fully loaded and populated version of Galaxy on the cloud.
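
The autoscaling behavior described in item 5 can be illustrated with a small sketch. This is not CloudMan's actual algorithm: the function name `autoscale_target`, the use of waiting jobs as the signal for an overloaded cluster, and the choice to request one extra node per waiting job are all assumptions made for illustration. Only the clamping to the minimum and maximum, and the removal of idle nodes, come from the description above.

```python
def autoscale_target(current, minimum, maximum, queued_jobs, idle_nodes):
    """Return the worker count after one hypothetical autoscaling pass.

    current     -- number of worker nodes right now
    minimum     -- configured lower limit on worker nodes
    maximum     -- configured upper limit on worker nodes
    queued_jobs -- jobs waiting for a free worker (assumed overload signal)
    idle_nodes  -- workers that were idle at the end of their hour
    """
    if minimum < 0 or maximum < minimum:
        raise ValueError("need 0 <= minimum <= maximum")
    if queued_jobs > 0:
        # Assumption: ask for one additional node per waiting job.
        wanted = current + queued_jobs
    else:
        # No backlog, so idle nodes can be terminated to save cost.
        wanted = current - idle_nodes
    # The configured limits always win over the wanted size.
    return max(minimum, min(maximum, wanted))


if __name__ == "__main__":
    print(autoscale_target(0, 2, 5, 0, 0))   # starts at the minimum
    print(autoscale_target(2, 2, 5, 10, 0))  # backlog, capped at the maximum
    print(autoscale_target(4, 2, 5, 0, 3))   # idle nodes, floor at the minimum
```

The first call corresponds to item 6: with autoscaling just turned on and no workers yet, the target is the configured minimum.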

From here on, use Galaxy as you normally would. Note that this represents a new and clean instance of Galaxy so you will have to register your user name and upload any desired data. You will have to do this for each and every cloud cluster you create but only once per cluster.

The main CloudMan web interface allows you to get a quick glimpse of your cloud cluster's status. The interface provides a color-coded overview of the status of each worker instance. In addition to the general status of instances, inside each box representing an individual instance you can observe the machine load for the given instance. Based on the load of worker instances, you can decide when and how to scale your cluster. Scaling the cluster is easily done by clicking 'Add/Remove instances' and providing the desired number of instances. In the provided screenshot, there were four ready but fully loaded instances associated with the given cluster, so we added an additional eight instances that were, at the time the screenshot was taken, in pending state.

Once the requested worker nodes become ready and available, the Galaxy workload is automatically distributed across all of them. Note that clicking on any worker instance box provides additional details about the given instance, including the status of individual services.

Finally, once the need for the given cluster subsides, simply power it off by clicking the 'Terminate Cluster' button. Note that clicking that button will terminate all worker instances and all underlying services. You will also want to check "Automatically terminate the master instance?" when shutting down. If you don't, you will need to visit the AWS console again and terminate the master instance manually. Also, as a sanity check, make sure all other instances have been terminated. Note that once the button is clicked and the shutdown process is complete, the given cluster can only be instantiated again by terminating the current master instance and starting another one.

Next time the same cluster is needed, follow these steps and start a new instance of the cluster. Note that all data uploaded to the cloud and analyzed through Galaxy will be preserved even though the cluster is not running. In order to access this data, an instance of the cluster must be running.

Step 4: Use Galaxy as you normally would

You can access the Galaxy application by clicking the 'Access Galaxy' button, or by pasting the instance URL into a browser's address bar, and then use Galaxy as you normally would. At any point following the initial cluster setup, you can scale the size of the compute cluster on the Galaxy CloudMan Console by clicking 'Add instances' or 'Remove instances' and specifying the desired number of instances to add or remove.

Step 5: Shutting Down

Note: This section is still being modified.

In Amazon Web Services and Galaxy CloudMan, there are two distinct ways to shut down your instance.

Terminate:

Stop and delete the instance. If you are done and you don't want to pay to store your instance, then use this option. Your instance and all its data will just go away. The next time you want a Galaxy instance on the cloud you will have to configure it again (starting at "Step 2").

Galaxy CloudMan can be terminated using either the CloudMan console or the AWS management console.

Stop:

This stops the instance but leaves a copy of it in storage at Amazon. The next time you want it, all you have to do is start it up. However, you will have to pay for storage as long as the copy exists.

Note that if you restart a stopped instance it will come up with a different URL. You can request an "Elastic IP" address from AWS and assign it to your restarted AWS EC2 instance to maintain a constant IP for your instance, to make access simple and continuous for users.