Galaxy CloudMan CapacityPlanning for Amazon Web Services

http://aws.amazon.com/

This page offers advice on how much cloud infrastructure you will need to run your Galaxy instance on Amazon Web Services (AWS). See the general capacity planning page for advice that applies across different cloud infrastructures.

Amazon Web Services

CloudMan was initially developed for the Amazon Web Services (AWS) cloud platform. Before we cover AWS, we'll need to introduce some terminology:

Terminology

EC2

  • Amazon's Elastic Compute Cloud (EC2) provides the compute part of their cloud. The number of CPUs and the amount of memory an instance has are determined by that instance's EC2 instance type.

EBS

  • Amazon's Elastic Block Store (EBS) provides virtual disk drives for EC2 instances.

S3

  • Amazon's Simple Storage Service (S3) is "storage for the internet." It provides a web services interface to net-accessible storage. It is not used at runtime by Galaxy cloud instances, but can be used to create archives of EBS virtual disks.

How Much EC2?

Which EC2 instance type(s) should you use for your Galaxy?

EC2 Recommendations

Scenario                 Head                                          Worker
1: Light usage           Standard Large or Extra Large                 Standard Large or Extra Large
2: Occasional heavy      High-Memory Double or Quadruple Extra Large   High-Memory Extra Large
3: Continuous variable   High-Memory Double or Quadruple Extra Large   High-Memory Extra Large
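
The recommendations above can be captured as a small lookup, which may be handy when scripting cluster setup. This is only a sketch: the dictionary, the `recommend` function, and the `"head"`/`"worker"` keys are illustrative names and not part of CloudMan. The instance type names are the ones listed in the recommendations.

```python
from typing import List

# Illustrative lookup of the EC2 recommendations; hypothetical names,
# not a CloudMan API.
EC2_RECOMMENDATIONS = {
    1: {  # Light usage
        "head": ["Standard Large", "Standard Extra Large"],
        "worker": ["Standard Large", "Standard Extra Large"],
    },
    2: {  # Occasional heavy
        "head": ["High-Memory Double Extra Large",
                 "High-Memory Quadruple Extra Large"],
        "worker": ["High-Memory Extra Large"],
    },
    3: {  # Continuous variable
        "head": ["High-Memory Double Extra Large",
                 "High-Memory Quadruple Extra Large"],
        "worker": ["High-Memory Extra Large"],
    },
}


def recommend(scenario: int, role: str) -> List[str]:
    """Return the recommended instance types for a scenario (1-3) and a role."""
    if scenario not in EC2_RECOMMENDATIONS:
        raise ValueError(f"unknown scenario: {scenario!r}")
    if role not in ("head", "worker"):
        raise ValueError(f"role must be 'head' or 'worker', got {role!r}")
    # Return a copy so callers cannot modify the lookup by accident.
    return list(EC2_RECOMMENDATIONS[scenario][role])


print(recommend(2, "worker"))
```

Keeping the table as plain data means it can be updated in one place if the recommendations change.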

EC2 Instance Type Comments

                                    Recommended for Usage Scenarios
                                    1          2          3
Instance Type                       H    W     H    W     H    W     Comments
Micro, Small, Medium                N    N     N    N     N    N     Galaxy may come up on these instances, but it can't run any analysis.
Large, Extra Large                  Y    Y     N    N     N    N     Recommended for Scenario 1 (light usage), head and worker nodes.
High-Memory Extra Large                        N    Y     N    Y     Recommended for Scenarios 2 & 3 (heavy or variable usage), worker nodes.
High-Memory Double Extra Large                 Y          Y          Recommended head node for heavy or variable usage (Scenarios 2 & 3).
High-Memory Quadruple Extra Large              Y          Y          The Galaxy Team uses this head node in workshops that run TopHat. It can support ~30 concurrent TopHat jobs without significant slowdown, whereas the Double Extra Large option gets bogged down.
Compute: Cluster Any, GPU Any       X    X     X    X     X    X     These are not supported by CloudMan.

H = head node, W = worker node.

Key:

  • N: Not recommended
  • Y: Recommended
  • X: Can't go there

How Much EBS?

Galaxy CloudMan comes with two standard volumes:

  1. Tools Volume (10GB): Contains the tools used by the instance.

  2. Indices Volume (700GB): Contains reference data for a number of species.

In addition, you will need a data volume to contain the data used by and produced in your analysis. You don't control the size of the tools and indices volumes, but you specify the size of the data volume at setup time. The size of your data volume is determined by the size of your datasets. Unfortunately, we don't have any hard and fast guidelines or multipliers for how much you will need, given the size of your datasets.

For Scenario 1 (light usage), it is fine to specify a large data volume (up to the 1 terabyte maximum). For Scenarios 2 and 3, however, where the storage may or will exist for a long time, allocating too much storage can incur significant cost: AWS charges by the hour for allocated storage, not for the storage actually used.
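
To make the cost of over-allocation concrete, the sketch below estimates the charge for a data volume that stays allocated for a given number of hours. It is illustrative only: the function name is made up, the per-GB hourly price is a placeholder rather than a real AWS rate, and the size cap assumes that the 1 terabyte maximum means 1000 GB. Check current AWS pricing and limits before relying on any numbers.

```python
# Assumption: the "1 terabyte max" is treated as 1000 GB; adjust if needed.
MAX_DATA_VOLUME_GB = 1000


def data_volume_cost(allocated_gb, hours, price_per_gb_hour):
    """Estimate the charge for a data volume that stays allocated for `hours`.

    The charge depends on the allocated size, not on how much of it is used.
    `price_per_gb_hour` is a placeholder; supply the current rate yourself.
    """
    if not 0 < allocated_gb <= MAX_DATA_VOLUME_GB:
        raise ValueError(
            f"allocated_gb must be between 1 and {MAX_DATA_VOLUME_GB}"
        )
    if hours < 0 or price_per_gb_hour < 0:
        raise ValueError("hours and price_per_gb_hour must be non-negative")
    return allocated_gb * hours * price_per_gb_hour


if __name__ == "__main__":
    HYPOTHETICAL_PRICE = 0.0001  # made-up rate per GB-hour, for illustration
    for size_gb in (200, 1000):
        cost = data_volume_cost(size_gb, hours=24 * 30,
                                price_per_gb_hour=HYPOTHETICAL_PRICE)
        print(f"{size_gb} GB allocated for 30 days: {cost:.2f} (hypothetical)")
```

Because the charge scales linearly with the allocated size, a 1000 GB volume costs five times as much as a 200 GB volume over the same period, regardless of how much data either one holds.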