Revision 5 as of 2014-03-27 14:23:34


Writing Functional Tests

If you are thinking of contributing your tool to the Galaxy Tool Shed, it should have functional tests that cover most aspects and usages of the tool. Everybody benefits from good automatic testing: the tool author ensures the quality of the tool, admins can easily separate good tools from bad ones, and users get tools that are reliable and error-protected. The example below explains how to write a test for a tool.

Tests can be specified in the tool config file using the <tests> and <test> tags (for more information, see the description of the test configuration tags). For example, the cluster tool specifies the following tests:

  <tests>
    <test>
      <param name="input1" value="5.bed" />
      <param name="distance" value="1" />
      <param name="minregions" value="2" />
      <param name="returntype" value="1" />
      <output name="output" file="gops-cluster-1.bed" />
    </test>
    <test>
      <param name="input1" value="gops_cluster_bigint.bed" />
      <param name="distance" value="1" />
      <param name="minregions" value="2" />
      <param name="returntype" value="1" />
      <output name="output" file="gops-cluster-1.bed" />
    </test>
    <test>
      <param name="input1" value="5.bed" />
      <param name="distance" value="1" />
      <param name="minregions" value="2" />
      <param name="returntype" value="2" />
      <output name="output" file="gops-cluster-2.bed" />
    </test>
    <test>
      <param name="input1" value="5.bed" />
      <param name="distance" value="1" />
      <param name="minregions" value="2" />
      <param name="returntype" value="3" />
      <output name="output" file="gops-cluster-3.bed" />
    </test>
  </tests>

To explain what this means let's first take a look at the inputs and outputs of the cluster tool. It takes four inputs (input1, distance, minregions, and returntype) and produces a single output:

  <inputs>
    <param format="interval" name="input1" type="data">
      <label>Cluster intervals of</label>
    </param>
    <param name="distance" size="5" type="integer" value="1" help="(bp)">
      <label>max distance between intervals</label>
    </param>
    <param name="minregions" size="5" type="integer" value="2">
      <label>min number of intervals per cluster</label>
    </param>
    <param name="returntype" type="select" label="Return type">
      <option value="1">Merge clusters into single intervals</option>
      <option value="2">Find cluster intervals; preserve comments and order</option>
      <option value="3">Find cluster intervals; output grouped by clusters</option>
      <option value="4">Find the smallest interval in each cluster</option>
      <option value="5">Find the largest interval in each cluster</option>
    </param>
  </inputs>
  <outputs>
    <data format="input" name="output" metadata_source="input1" />
  </outputs>

Now let's take a look at the first test:

    <test>
      <param name="input1" value="5.bed" />
      <param name="distance" value="1" />
      <param name="minregions" value="2" />
      <param name="returntype" value="1" />
      <output name="output" file="gops-cluster-1.bed" />
    </test>

All this does is specify the parameters that the test framework will use to run the test. For most input types, the value should be what a user would enter when running the tool through the web interface; input and output files are the exception. The input (5.bed) and output (gops-cluster-1.bed) files reside within the ~/test-data directory. Once the test has executed, the framework simply compares the generated output with the example file (gops-cluster-1.bed in this case). If there are no differences, the test is declared a success.

To run the Galaxy functional tests see Running Tests.

Advanced Test Settings

Output File Comparison Methods


The default comparison method (diff) compares the files line by line to check whether the result of the test run of the tool matches the expected output specified in the <output> tag. A lines_diff attribute can be provided to allow the declared number of lines to differ between outputs. A 'change' in a line counts as 2 line differences: one line removed and one line added.

      <output name="output" file="variable_output_file.bed" lines_diff="10"/>     
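
The counting rule above can be illustrated with a small Python sketch. It is not the framework's own code: the helper names count_line_differences and outputs_match are hypothetical, and the standard difflib module stands in for whatever diff the framework runs internally.

```python
import difflib


def count_line_differences(expected_text, actual_text):
    """Count removed plus added lines between two texts."""
    diff = difflib.ndiff(expected_text.splitlines(), actual_text.splitlines())
    # ndiff prefixes removed lines with "- " and added lines with "+ ";
    # "  " marks unchanged lines and "? " marks intraline hints.
    return sum(1 for line in diff if line[:2] in ("- ", "+ "))


def outputs_match(expected_text, actual_text, lines_diff=0):
    """Hypothetical check: pass when the difference count is within lines_diff."""
    return count_line_differences(expected_text, actual_text) <= lines_diff


expected = "chr1\t10\t20\nchr1\t30\t40\nchr2\t5\t15\n"
actual = "chr1\t10\t20\nchr1\t30\t45\nchr2\t5\t15\n"

# One changed line is reported as one removal plus one addition.
print(count_line_differences(expected, actual))  # 2
print(outputs_match(expected, actual, lines_diff=1))  # False
print(outputs_match(expected, actual, lines_diff=2))  # True
```

Under this reading, lines_diff="10" tolerates up to five changed lines, or ten lines that are purely added or removed.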


re_match is used to compare, line-by-line, the output from a tool test run to a file containing regular expression patterns. The helper script scripts/tools/ can be used to turn an 'ordinary' output file into a regular expression escaped format. One can then edit the escaped file and replace content with the necessary regular expressions to match the variable output. lines_diff can also be optionally declared when using this matching style; in this case, files are matched line-by-line, so a 'change' in one line is equivalent to a lines_diff count of 1.

      <output name="output" file="variable_output_file.bed" compare="re_match" lines_diff="1"/>     
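
As a rough illustration of the line-by-line matching described above, here is a Python sketch. The helpers escape_lines and re_match_lines are hypothetical stand-ins for the helper script and the framework check. The sketch assumes each pattern is applied with re.match (anchored at the start of the line only) and that both files must have the same number of lines.

```python
import re


def escape_lines(text):
    """Turn ordinary output into one escaped regular expression per line."""
    return [re.escape(line) for line in text.splitlines()]


def re_match_lines(pattern_lines, actual_text, lines_diff=0):
    """Match each output line against the pattern on the same line number."""
    actual_lines = actual_text.splitlines()
    # Assumption: a differing line count is treated as a failure.
    if len(pattern_lines) != len(actual_lines):
        return False
    mismatches = sum(
        1
        for pattern, line in zip(pattern_lines, actual_lines)
        if re.match(pattern, line) is None
    )
    # Each non-matching line counts once, unlike the diff method.
    return mismatches <= lines_diff


sample = "run started 2014-03-27\nchr1\t10\t20\n"
patterns = escape_lines(sample)
# Hand-edit the escaped line whose content varies between runs.
patterns[0] = r"run started \d{4}-\d{2}-\d{2}"

new_output = "run started 2015-01-02\nchr1\t10\t20\n"
changed_output = "run started 2015-01-02\nchr1\t10\t99\n"

print(re_match_lines(patterns, new_output))  # True
print(re_match_lines(patterns, changed_output))  # False
print(re_match_lines(patterns, changed_output, lines_diff=1))  # True
```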


re_match_multiline is used to compare the output from a tool test run to a file containing a multiline regular expression pattern. The helper script scripts/tools/ can be used to turn an 'ordinary' output file into a regular expression escaped format (when -m/--multiline option is used). One can then edit the escaped file and replace content with the necessary regular expressions to match the variable output. lines_diff is not applicable when doing multiline regular expression matching.

      <output name="output" file="variable_output_file.bed" compare="re_match_multiline" />     
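
The multiline variant treats the whole reference file as one pattern. A minimal Python sketch of that idea follows; the function name re_match_multiline_text is hypothetical, and applying the pattern with re.match and the re.MULTILINE flag is an assumption about how the framework behaves.

```python
import re


def re_match_multiline_text(pattern_text, actual_text):
    """Apply a single regular expression to the entire output at once."""
    return re.match(pattern_text, actual_text, flags=re.MULTILINE) is not None


# Escape the fixed parts, then write the variable part by hand.
header = re.escape("# clusters\n")
footer = re.escape("# done\n")
# One or more interval lines; the number of lines may vary between runs.
pattern = header + r"(?:chr\d+\t\d+\t\d+\n)+" + footer

two_intervals = "# clusters\nchr1\t10\t20\nchr2\t5\t15\n# done\n"
bad_interval = "# clusters\nchrX\t10\t20\n# done\n"

print(re_match_multiline_text(pattern, two_intervals))  # True
print(re_match_multiline_text(pattern, bad_interval))  # False
```

Because one pattern can span a varying number of lines, a per-line lines_diff count has no meaning here.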

When doing regular expression matching, this link may be of interest:


sim_size is used to compare the file size of output from a tool test run to a test file. The delta attribute is used to specify the maximum size difference, in bytes, allowed; default delta is 100.

      <output name="output" file="variable_output_file.bed" compare="sim_size" delta="976245" />     
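
The size check amounts to simple arithmetic, sketched below in Python. The function name sizes_similar is hypothetical, and treating a difference exactly equal to delta as a pass is an assumption.

```python
def sizes_similar(expected_bytes, actual_bytes, delta=100):
    """Pass when the two sizes differ by at most delta bytes."""
    return abs(len(expected_bytes) - len(actual_bytes)) <= delta


reference = b"x" * 1000

print(sizes_similar(reference, b"x" * 1080))  # True: 80 bytes apart
print(sizes_similar(reference, b"x" * 1101))  # False: 101 bytes apart
print(sizes_similar(reference, b"x" * 1101, delta=976245))  # True
```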


contains can be used to check if the test file from your test-data folder is part of the output from a tool test run.

        <output name="out_bam" file="empty_file.dat" compare="contains" />
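
Read as a plain substring test, the check can be sketched in Python as follows. The function name output_contains is hypothetical, and the substring interpretation is an assumption about the framework.

```python
def output_contains(reference_bytes, actual_bytes):
    """Pass when the reference content appears somewhere in the output."""
    return reference_bytes in actual_bytes


output = b"@HD\tVN:1.0\n@SQ\tSN:chr1\tLN:1000\n"

print(output_contains(b"@SQ\tSN:chr1", output))  # True
print(output_contains(b"@SQ\tSN:chr2", output))  # False
# An empty reference is contained in every output.
print(output_contains(b"", output))  # True
```

Under this interpretation, a reference file that really is empty, as the name empty_file.dat suggests, matches any output, so such a test only confirms that the tool ran and produced the output dataset.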

Checking extra_files_path contents

Several tools, including those that use Composite Datatypes such as rGenetics, create additional files which are stored in a directory associated with the main history item. If you have a tool that creates these extra files, it is a good idea to write tests which also verify their correctness. This can be done on a per extra file basis or by comparing an entire directory; all of the previously mentioned comparison methods are applicable.

The two examples below are from tools/peak_calling/macs_wrapper.xml.

File-by-file comparison

Here two outputs are being tested; the first file has no extra files, but the second file has five extra files (in addition to the primary file) which are being tested.

      <output name="output_bed_file" file="peakcalling_macs/macs_test_1_out.bed" />
      <output name="output_html_file" file="peakcalling_macs/macs_test_1_out.html" compare="re_match" >
        <extra_files type="file" name="Galaxy_Test_Run_model.pdf" value="peakcalling_macs/test2/Galaxy_Test_Run_model.pdf" compare="re_match"/>
        <extra_files type="file" name="Galaxy_Test_Run_model.r" value="peakcalling_macs/test2/Galaxy_Test_Run_model.r" compare="re_match"/>
        <extra_files type="file" name="Galaxy_Test_Run_model.r.log" value="peakcalling_macs/test2/Galaxy_Test_Run_model.r.log"/>
        <extra_files type="file" name="Galaxy_Test_Run_negative_peaks.xls" value="peakcalling_macs/test2/Galaxy_Test_Run_negative_peaks.xls" compare="re_match"/>
        <extra_files type="file" name="Galaxy_Test_Run_peaks.xls" value="peakcalling_macs/test2/Galaxy_Test_Run_peaks.xls" compare="re_match"/>
      </output>

Directory comparison

Here four outputs are being tested; the first three files have no extra files, but the last file has 5 extra files (in addition to the primary file) which are being tested by the directory method. Each file in the specified directory of output_html_file will be tested against the files of the same name in the history item's extra files path.

      <output name="output_bed_file" file="peakcalling_macs/macs_test_1_out.bed" />
      <output name="output_xls_to_interval_peaks_file" file="peakcalling_macs/macs_test_2_peaks_out.interval" lines_diff="4" />
      <output name="output_xls_to_interval_negative_peaks_file" file="peakcalling_macs/macs_test_2_neg_peaks_out.interval" />
      <output name="output_html_file" file="peakcalling_macs/macs_test_1_out.html" compare="re_match" >
        <extra_files type="directory" value="peakcalling_macs/test2/" compare="re_match"/>
      </output>
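
The directory method can be pictured with the following Python sketch. It is an illustration only: compare_directories and write_file are hypothetical helpers, temporary directories stand in for the test-data directory and the history item's extra files path, and a plain equality check stands in for the chosen comparison method.

```python
import os
import tempfile


def compare_directories(expected_dir, extra_files_dir, compare):
    """Return names of expected files that are missing or fail `compare`."""
    failures = []
    for name in sorted(os.listdir(expected_dir)):
        expected_path = os.path.join(expected_dir, name)
        actual_path = os.path.join(extra_files_dir, name)
        if not os.path.isfile(expected_path):
            continue  # this sketch does not descend into subdirectories
        if not os.path.isfile(actual_path):
            failures.append(name)
            continue
        with open(expected_path, "rb") as exp, open(actual_path, "rb") as act:
            if not compare(exp.read(), act.read()):
                failures.append(name)
    return failures


def write_file(directory, name, data):
    """Create a small file for the demonstration."""
    with open(os.path.join(directory, name), "wb") as handle:
        handle.write(data)


with tempfile.TemporaryDirectory() as expected_dir, \
        tempfile.TemporaryDirectory() as extra_dir:
    write_file(expected_dir, "model.r", b"plot(x)\n")
    write_file(expected_dir, "peaks.xls", b"chr1\t10\t20\n")
    write_file(extra_dir, "model.r", b"plot(x)\n")
    write_file(extra_dir, "peaks.xls", b"chr1\t10\t25\n")
    # Files that exist only in the extra files path are not checked.
    write_file(extra_dir, "unlisted.log", b"not compared\n")
    failures = compare_directories(expected_dir, extra_dir, lambda e, a: e == a)

print(failures)  # ['peaks.xls']
```

Only files present in the reference directory drive the comparison, which matches the description above: each reference file is tested against the file of the same name in the extra files path.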

Beware of twill bug

See the following e-mail for explanation of a workaround that deals with "dashed" options:

Hello Assaf,

This is a known bug in twill 0.9.  The work-around is to use the label rather than the value in your functional test.  So, in your example, the test should be changed to the following.  Let me know if this does not work.

One of the tests looks like this:
  <!-- ASCII to NUMERIC -->
  <param name="input" value="fastq_qual_conv1.fastq" />
  <param name="QUAL_FORMAT" value="Numeric quality scores" />
  <output name="output" file="fastq_qual_conv1.out" />

Greg Von Kuster
Galaxy Development Team

Assaf Gordon wrote:

I wrote a functional test for my tool, and encountered a strange behavior.

One of the tool's parameters looks like this:
<param name="QUAL_FORMAT" type="select" label="output format">
     <option value="-a">ASCII (letters) quality scores</option>
     <option value="-n">Numeric quality scores</option>

One of the tests looks like this:
   <!-- ASCII to NUMERIC -->
   <param name="input" value="fastq_qual_conv1.fastq" />
   <param name="QUAL_FORMAT" value="-n" />
   <output name="output" file="fastq_qual_conv1.out" />

When I run the functional tests for this tool, I get the following exception:
Traceback (most recent call last):
File "galaxy_devel_tools/test/functional/", line 114, in test_tool
File "galaxy_devel_tools/test/functional/", line 44, in do_it
    self.run_tool(, repeat_name=repeat_name, **page_inputs )
File "galaxy_devel_tools/test/base/", line 520, in run_tool
    self.submit_form( **kwd )
  File "galaxy_devel_tools/test/base/", line 495, in submit_form
    raise AssertionError( errmsg )
AssertionError: Attempting to set field 'QUAL_FORMAT' to value '['-n']' in form 'tool_form' threw exception: cannot find value/label "n" in list control
control: <SelectControl(QUAL_FORMAT=[-a, -n])>

If I understand the exception correctly, it means that somewhere the minus character ("-n") gets dropped, and therefore the value 'n' cannot be found in the list (which contains "-n" and "-a").

Is this an actual bug or am I doing something wrong?


Saving generated functional test output files

A small change to the test framework was introduced in April 2011 allowing test outputs generated by Twill during functional tests to be saved, making it easier to update test expected outputs after changes to a tool.

If there is a variable called 'GALAXY_TEST_SAVE' in the environment when tests are being run, each output file Twill generates that is compared with a reference file will be written to the directory it names, assuming write permissions and so on. For example:

setenv GALAXY_TEST_SAVE /tmp/galtest
sh -id myTool

will test the individual tool with id 'myTool' and write the tested output files to /tmp/galtest. Running a full set of functional tests will of course result in a full set of test outputs being saved. To stop test outputs from being saved, reset GALAXY_TEST_SAVE to null.
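
The behaviour described in this section can be sketched in Python as follows. This is not the framework's code: save_test_output is a hypothetical helper, and creating the target directory when it is missing is an assumption made to keep the example self-contained.

```python
import os
import shutil
import tempfile


def save_test_output(output_path, environ=None):
    """Copy an output file into $GALAXY_TEST_SAVE when that variable is set."""
    if environ is None:
        environ = os.environ
    save_dir = environ.get("GALAXY_TEST_SAVE")
    if not save_dir:
        return None  # unset or empty: nothing is saved
    os.makedirs(save_dir, exist_ok=True)  # assumption: create if missing
    target = os.path.join(save_dir, os.path.basename(output_path))
    shutil.copy(output_path, target)
    return target


with tempfile.TemporaryDirectory() as work_dir:
    output_path = os.path.join(work_dir, "output.bed")
    with open(output_path, "w") as handle:
        handle.write("chr1\t10\t20\n")
    save_dir = os.path.join(work_dir, "galtest")
    # A plain dict stands in for the process environment here.
    saved = save_test_output(output_path, {"GALAXY_TEST_SAVE": save_dir})
    skipped = save_test_output(output_path, {"GALAXY_TEST_SAVE": ""})
    saved_exists = os.path.isfile(os.path.join(save_dir, "output.bed"))

print(saved_exists, skipped)  # True None
```

An empty value is treated the same as a missing variable, which is why resetting the variable to null is enough to switch saving off.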