GCC2013 Training Day

Advanced Tool and Data Source Configuration

Presenters: Ross Lazarus and Dan Blankenberg

Scheduled duration: 15:00-17:00

15:00-15:05 Introduction to advanced tool and data source configuration

As an administrator of your own local Galaxy, you can extend Galaxy by writing new tools. There are a few things you need to do to make them work. In the simplest possible case, you need to prepare some XML and make your Galaxy read it in at startup when it reads and parses tool_conf.xml. Each tool must include a unique tool id, a visible name and a command line template. A tool can also include multiple tool form parameters (with labels, validation and help), outputs, tests and dependency/version requirements. Galaxy uses these to set up the tool list and each selected tool's user interface form.

In summary, the essence of the entire 2 hour session is that to create your own tool in Galaxy, you need to:

  • ensure the executable is available to the execution host (your own VM/login in the workshop)
  • write some valid XML in a text file to describe your new tool and put it somewhere under tools/
  • edit tool_conf.xml to tell Galaxy where that XML file can be found - leave out the path .../tools/
  • restart Galaxy to make the new tool available
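The required pieces named above (a unique tool id, a visible name and a command line template) can be checked before you restart Galaxy. The following is a minimal sketch in Python and is not part of Galaxy; the function name missing_tool_essentials and the sample tool text are made up for illustration.

```python
import xml.etree.ElementTree as ET


def missing_tool_essentials(tool_xml_text):
    """Return a list of required pieces missing from a tool XML string."""
    root = ET.fromstring(tool_xml_text)
    missing = []
    if root.tag != "tool":
        missing.append("<tool> root element")
    # id and name are attributes of the <tool> element
    for attribute in ("id", "name"):
        if not root.get(attribute):
            missing.append("%s attribute" % attribute)
    # the command line template lives in a <command> child element
    command = root.find("command")
    if command is None or not (command.text or "").strip():
        missing.append("<command> template")
    return missing


# Hypothetical minimal tool used only for this demonstration
example = """<tool id="hello" name="Hello">
<command interpreter="python">hello.py -o "${output1}"</command>
</tool>"""

print(missing_tool_essentials(example))  # prints []
```

This only confirms the XML parses and the three essentials are present; Galaxy itself may still reject a tool for other reasons.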

Intro

  • Introduce presenters and circulating tutors
  • Scope of the session - start with simplest possible tool.
  • Add complexity in the form of one useful tool feature at a time.
  • Offer a series of examples covering a wide range of common tool requirements.
  • We'll work as far as we can get.
  • NOT explaining how Galaxy actually works.
  • Moving at a fair clip through the essential steps for a new tool to become available to users on your own local Galaxy.
  • Command line skills really will be needed.

15:05-15:25 Hello world in Galaxian

The first exercise consists of creating (or copying, your choice) a text file containing valid XML describing a simple and, admittedly, not very useful tool which calls a python script to do some work. It does, however, demonstrate the bare bones of the power of the Galaxy tool interface: a few lines of XML and a small python script get you a familiar, simple user interface and a single new history item, a text file containing a string. Note that this is a trivial variation on the hello world tool used in the introductory session - instead of a fugly command line, we're also introducing a python script that does the actual work.

Steps:

  1. Make a new directory [galaxy root]/tools/hello_advanced and put hello_advanced.xml there containing

<tool id="hello_advanced" name="Hello Advanced" version="0.01">
<description>World</description>
<command interpreter="python">
hello_advanced.py -o "${output1}" -s "hello advanced world"
</command>
<outputs>
    <data format="tabular" name="output1" label="hello_advanced_world"/>
</outputs>
<help>
**What it does**
Says hello advanced world by running a python script and passing appropriate parameters
</help>
</tool>

Make a new python script to match the name on the command line above (hello_advanced.py) containing

#!/usr/bin/env python
# python script to echo a command line parameter to an output file also passed on the command line
# your name here
# your favourite OSI approved licence here
import sys
import optparse

def advanced():
    """
    Trivial example
    """
    usage = "%s -o outfilename -s stringtowrite1 -s stringtowrite2 ..." % sys.argv[0]
    parser = optparse.OptionParser(usage=usage)
    # "append" collects every -s value into a single list
    parser.add_option("-s", "--stringtowrite",
                      action="append", type="string", dest="mystring", help="Strings to write")
    parser.add_option("-o", "--outputfile",
                      action="store", type="string", dest="outputfile", help="output text file")
    (opts, args) = parser.parse_args()
    # opts.mystring is None (not an empty list) when no -s option was given
    assert opts.mystring, "No strings to write found on command line"
    assert opts.outputfile, "No output file name found on command line"
    # the with block closes the file even if a write fails
    with open(opts.outputfile, 'w') as outf:
        outf.write('\n'.join(opts.mystring))
        outf.write('\n')

if __name__ == "__main__":
    advanced()

Test this script on the command line - e.g. something like

python tools/hello_advanced/hello_advanced.py -s "hello" -s "advanced" -s "world" -o /tmp/test.txt
cat /tmp/test.txt

Fix any syntax errors and make sure this runs and that the expected output is generated correctly because if it doesn't run from the command line, it certainly won't run when you try calling it from Galaxy!

This XML and the python script it calls are all you need for a real new tool, including some help to display to the user. In this example, the executable we use is a python script which echoes its input (the string) to a new history output file. The single command line parameter "${output1}" is replaced with the path chosen by the Galaxy job execution engine, and the script parses the resulting command line.

The ${...} syntax is recommended, and all user supplied parameters should be quoted in case a value contains slashes or spaces, which might otherwise cause the tool to fail mysteriously.
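To see why quoting matters, the sketch below uses Python's standard shlex module to split a command line roughly the way a POSIX shell would. It is only an illustration: the output path is made up, and the shell Galaxy actually uses on your system may differ in details.

```python
import shlex

# The same command line, without and with quotes around the parameters
unquoted = 'hello_advanced.py -o /tmp/out.txt -s hello advanced world'
quoted = 'hello_advanced.py -o "/tmp/out.txt" -s "hello advanced world"'

unquoted_args = shlex.split(unquoted)
quoted_args = shlex.split(quoted)

# Without quotes, the string is broken into three separate arguments
print(unquoted_args)
# With quotes, the whole string arrives as one value for -s
print(quoted_args)
```

In the unquoted case only "hello" is attached to -s; "advanced" and "world" become stray positional arguments that the script silently ignores.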

  1. If not already done, adjust universe_wsgi.ini by adding an admin_user email you will register with when you first log in - use commas ONLY - no spaces - to separate admin email addresses. Adjust tool_conf.xml adding a new tool path that must exactly match the directory/filename you chose for your tool.
       <tool file="hello_advanced/hello_advanced.xml"/>
    
  2. Restart

    Stop Galaxy if it's running

     sh run.sh --stop-daemon

    Restart Galaxy

     sh run.sh --daemon
  3. Check paster.log for errors (search for “hello” to find where your tool loaded – or barfed). If it fails to load, look for the syntax error, repair it, rinse, repeat... until it loads.
  4. When it loads correctly, test your new tool. In your VM web browser, visit http://localhost:8080 . Register your admin email address if you haven't already done so and log in. Run your new tool. It will write “hello advanced world” to a new file in your history. If/when it works, find the actual commands Galaxy executed to run your tool in paster.log. If it fails, look in paster.log for hints about what went wrong. Repair and reload via the admin interface (no need to restart the Galaxy server) until it works.

  5. Raise arms in victory \o/
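If the tool refuses to load, a mismatch between the file attribute in tool_conf.xml and the real location under tools/ is a common cause. The sketch below is a hypothetical helper (not part of Galaxy) that lists file attributes which do not resolve to a file; it builds a throwaway directory for the demonstration, and the misspelt second entry is deliberate.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def unresolved_tool_files(tool_conf_path, tools_dir):
    """Return file attributes in tool_conf.xml that do not exist under tools_dir."""
    tree = ET.parse(tool_conf_path)
    unresolved = []
    for tool in tree.getroot().iter("tool"):
        relative_path = tool.get("file")
        if relative_path and not os.path.isfile(os.path.join(tools_dir, relative_path)):
            unresolved.append(relative_path)
    return unresolved


# Demonstration in a temporary directory so nothing in a real Galaxy is touched
demo_root = tempfile.mkdtemp()
tools_dir = os.path.join(demo_root, "tools")
os.makedirs(os.path.join(tools_dir, "hello_advanced"))
with open(os.path.join(tools_dir, "hello_advanced", "hello_advanced.xml"), "w") as handle:
    handle.write("<tool/>")

conf_path = os.path.join(demo_root, "tool_conf.xml")
with open(conf_path, "w") as handle:
    handle.write('<toolbox><section id="demo" name="Demo">'
                 '<tool file="hello_advanced/hello_advanced.xml"/>'
                 '<tool file="hello_advanced/hello_advancd.xml"/>'
                 '</section></toolbox>')

print(unresolved_tool_files(conf_path, tools_dir))
```

Against a real installation you would pass the paths of your own tool_conf.xml and tools/ directory instead of the temporary ones.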

Bonus points if you finish early

  1. Look at what's been written to paster.log during correct execution.
  2. Make it do something more interesting.

15:25-15:45 Hello world test

Working automated functional tests are a great way to assure yourself that your tool works correctly for at least the test cases you provide and they are required for IUC approval of tool shed tools. Every time Galaxy is updated, running the functional tests will assure you that changes to the core Galaxy code have not broken something in your tools. Without automated tests, you would need to test by hand every time you update.

Tests could fill a workshop on their own, but we can add a simple one for the hello advanced example with a few extra lines of code. We also need to provide the expected output from the test in the test-data subdirectory so the test framework can compare what is produced when the test is run against what is expected.

Steps:

  1. Make a new text file in your Galaxy test-data/ directory under the name hello_world_advanced_testout.txt - we will provide that name to the test tag and the test harness will find it there. It should contain exactly the output of a successful run of the hello advanced tool, which is the single line
    hello advanced world
  2. Save a copy of hello_advanced.xml as hello_advanced1.xml
  3. Adjust hello_advanced.xml so it includes the test section shown below
       <tool id="hello_advanced" name="Hello Advanced" version="0.02">
       <description>World</description>
       <command interpreter="python">
       hello_advanced.py -o "${output1}" -s "hello advanced world"
       </command>
       <outputs>
       <data format="tabular" name="output1" label="hello_advanced"/>
       </outputs>
       <tests>
       <test>
        <output name='output1' file='hello_world_advanced_testout.txt' />
       </test>
       </tests>
       <help>

       **What it does**
       Says hello advanced world by running a python script and passing appropriate parameters with a functional test
       </help>
       </tool>
    
  4. Add the same line you added to tool_conf.xml to tool_conf.xml.sample - this is used by the functional test harness to find any tools to be tested. The test will not work unless it is also in that tool_conf.xml.sample file.

  5. Reload the hello_advanced tool and run it again to make sure there are no syntax errors in the test section - the test won't pass unless the tool itself runs in Galaxy.
  6. Run a functional test on the command line and use the -id parameter to pass the tool id hello_advanced
    sh run_functional_tests.sh -id hello_advanced

If the test fails you will see some tracebacks which will indicate what you need to fix. The default output file run_functional_tests.html contains the test results, including failure details for any test that did not pass.
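Conceptually, the test compares what the tool produced with the expected file from test-data. The sketch below illustrates that idea with Python's standard difflib module; it is not the code the Galaxy test harness actually runs, and the function name output_differences is made up.

```python
import difflib


def output_differences(expected_text, produced_text):
    """Return unified-diff lines between expected and produced output (empty if identical)."""
    return list(difflib.unified_diff(
        expected_text.splitlines(),
        produced_text.splitlines(),
        fromfile="expected", tofile="produced", lineterm=""))


expected = "hello advanced world\n"

# Identical output: nothing to report
print(output_differences(expected, "hello advanced world\n"))  # prints []

# A word-order slip in the expected file or the tool shows up as a diff
for line in output_differences(expected, "hello world advanced\n"):
    print(line)
```

This is why the contents of the expected file must match the tool's output exactly, down to the word order.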

15:40-15:55 Hello repeating input

Add a repeating group input parameter as shown. These are handy when you need an unknown number of parameters from the user since they allow the user to simply add more until they are done. Save. Reload the tool via the admin interface and test it out. Repeat until it's working right. Experiment and play with the new repeating parameter. Note how the repeats are passed to the python script, where the optparse "append" option adds them to a list of strings which are then written as newline delimited rows. 

<tool id="hello_advanced" name="Hello Advanced" version="0.03">
<description>World</description>
<command interpreter="python">
hello_advanced.py -o "${output1}"
#for x in $writeme
-s "$x.astring"
#end for
</command>
<inputs>
<repeat name="writeme" title="Strings to be written">
<param name="astring" type="text" label="An interesting string to write" help="keep adding these if you want"/>
</repeat>
</inputs>
<outputs>
<data format="tabular" name="output1" label="hello_advanced_repeats"/>
</outputs>
<help>
**What it does**
Says hello advanced world by running a python script and passing appropriate parameters
Any number of strings can be input by the user through the use of a repeat tag
</help>
</tool>
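The way repeated -s options end up in a single list can be tried in isolation. This short sketch uses the same optparse options as hello_advanced.py; the argument values are made up and stand in for three entries a user might add to the repeat group.

```python
import optparse

parser = optparse.OptionParser()
# action="append" adds each -s value to the list stored in opts.mystring
parser.add_option("-s", "--stringtowrite", action="append", type="string", dest="mystring")
parser.add_option("-o", "--outputfile", action="store", type="string", dest="outputfile")

# Roughly what the command line looks like after the #for loop has run three times
argv = ["-o", "/tmp/out.txt", "-s", "first", "-s", "second", "-s", "third"]
opts, args = parser.parse_args(argv)

print(opts.mystring)             # the collected list of strings
print("\n".join(opts.mystring))  # what the script writes, one string per row
```

Swapping "\n" for another separator in the join call is all the bonus exercise below needs.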

Bonus points

  • Experiment with tabs as separators ('\t') or commas or whatever instead of '\n'.

15:55 – 16:10 Hello_conditional

Conditional tags allow control flow in a tool form, such as the "advanced options" control in the BWA/BWA2 tool forms. Add a very simple one as follows so the user can either enter a single string without the repeat tag, or turn on the repeat tag and add as many strings as they feel like. Save hello_advanced.xml as hello_advanced2.xml as a backup and replace hello_advanced.xml with something like

<tool id="hello_advanced" name="Hello Advanced" version="0.04">
<description>World</description>
<command interpreter="python">
hello_advanced.py -o "${output1}"
#if $allowMulti.onlyOne == "yes"
 #for x in $allowMulti.writeme
  -s "$x.strings"
 #end for
#else
  -s "$allowMulti.astring"
#end if
</command>
<inputs>
   <conditional name="allowMulti">
      <param name="onlyOne" type="select" label="Allow multiple strings?">
        <option value="yes" selected="True">Use the repeat tag</option>
        <option value="no">No repeat tag</option>
      </param>
      <when value="yes" >
        <repeat name="writeme" title="Strings">
           <param name="strings" type="text" label="An interesting string to write" help="keep adding these if you want"/>
        </repeat>
      </when>
      <when value="no">
           <param name="astring" type="text" label="An interesting string to write" help="You only get one of these!"/>
      </when>
    </conditional>
</inputs>
<outputs>
<data format="tabular" name="output1" label="hello_advanced_repeats"/>
</outputs>
<help>
**What it does**
Says hello advanced world by running a python script and passing appropriate parameters
Optionally, any number of strings can be input by the user through the use of a repeat tag. Or not.
</help>
</tool>

Note the use of #if and other Cheetah directives to control the command line depending on how the user has set the conditional tag and whether there are repeats to add to the command line. This additional logic is a necessary complication and studying working examples like the BWA wrapper is helpful if you get stuck.
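If the template logic is hard to follow, it may help to read it as ordinary Python. The function below is a hand-written analogue of the #if/#for block in the tool above, not something Galaxy or Cheetah generates; the name build_command and its parameters are invented for this illustration.

```python
def build_command(output_path, use_repeat, repeat_strings=(), single_string=""):
    """Assemble the argument list the way the #if/#for template would."""
    command = ["hello_advanced.py", "-o", output_path]
    if use_repeat == "yes":
        # one -s per entry in the repeat group; none at all if the group is empty
        for text in repeat_strings:
            command.extend(["-s", text])
    else:
        # exactly one -s taken from the single text parameter
        command.extend(["-s", single_string])
    return command


print(build_command("/tmp/out.txt", "yes", repeat_strings=["one", "two"]))
print(build_command("/tmp/out.txt", "no", single_string="only me"))
```

Note that with the repeat turned on and no entries added, no -s option is produced at all, which is the case the script's assertion guards against.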

Reload the hello_advanced tool from the admin interface and use the redo button to recreate the form - test it with the repeat tag turned off and a single string, then with the repeat.

Check the output of paster.log to see how Galaxy is setting up the command line for the call to the python script for you.

Notice how the repeat group starts out empty. See if you can change it so there is always at least one string parameter showing on the form when the repeat group is turned on. http://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax#A.3Crepeat.3E_tag_set has the change you need.

16:10 – 16:30 Hello Tool Data Tables

Many Galaxy tools are able to make use of built-in reference data, e.g. genome indexes for the bwa aligner, that a user can choose from e.g. a select list. Ordinarily, this select list would need to be hard-coded into the tool's xml config, but by relying on Tool Data Tables, we can have the options of the select list populated with content from an external file.

Currently Tool Data Tables use tab-delimited files (the framework is generic and other formatted files can be defined); each field in the table is separated by a tab character.

A bare minimum for tool data tables is to include at least a value (required) and a display name (defaults to value when not specified) that will be used to populate the tool form and determine the value to pass on the command-line. The exact number and content of the columns to use with a tool data table will vary for the specific purpose, but a good practice would be to include a unique ID (value), name, dbkey (when needed), and command-line value. For example, the bwa_index.loc file has the form:

<unique_build_id>       <dbkey> <display_name>  <file_path>

with a tool data table defined in tool_data_table_conf.xml:

<tables>
    <!-- Locations of indexes in the BWA mapper format -->
    <table name="bwa_indexes" comment_char="#">
        <columns>value, dbkey, name, path</columns>
        <file path="tool-data/bwa_index.loc" />
    </table>
</tables>

Here the value is the unique id and is the value stored in the database (e.g. for rerun). The path column contains the path to the indexes, which will be the value passed to the command-line; this allows the underlying paths to the indexes to change over time, as needed, but to remain usable in workflows or via rerun.
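The relationship between the stored value and the path can be sketched in a few lines of Python. This is a simplified illustration of reading a tab-delimited .loc file into named columns, not Galaxy's own loader; the function name load_data_table and the two index entries (including their paths) are made up.

```python
def load_data_table(loc_text, columns, comment_char="#"):
    """Turn tab-delimited .loc content into a list of {column: field} dictionaries."""
    entries = []
    for line in loc_text.splitlines():
        # skip blank lines and comment lines
        if not line.strip() or line.startswith(comment_char):
            continue
        fields = line.split("\t")
        entries.append(dict(zip(columns, fields)))
    return entries


# Hypothetical bwa_index.loc content; real files point at real index paths
loc_text = (
    "#<unique_build_id>\t<dbkey>\t<display_name>\t<file_path>\n"
    "hg19\thg19\tHuman (hg19)\t/data/bwa/hg19/hg19.fa\n"
    "mm9\tmm9\tMouse (mm9)\t/data/bwa/mm9/mm9.fa\n"
)
columns = ["value", "dbkey", "name", "path"]
table = load_data_table(loc_text, columns)

# The stored value identifies the entry; the path is looked up when the job runs
selected = [entry for entry in table if entry["value"] == "mm9"][0]
print(selected["path"])  # prints /data/bwa/mm9/mm9.fa
```

Because only the value is stored, editing the path column later changes where reruns look without invalidating saved workflows.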

Inside of the bwa tool xml file, we then define the select list parameter as:

<param name="indices" type="select" label="Select a reference genome">
  <options from_data_table="bwa_indexes">
    <validator type="no_options" message="No indexes are available" />
  </options>
</param>

and can pass the "path" value of the selected data table entry as:

"${indices.fields.path}"

Create a new location file tool-data/hello_world.loc, and add several entries, e.g. of the form:

#<greeting_id>  <greeting_text> <path_to_image_file_of_greeting>        <world_where_greeting_is_valid>
greeting_hello  Hello   /path/to/file.png       Earth

Be sure to check that the white space between fields is <TABS> and not spaces (double check, some editors automatically replace tabs with spaces).
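A quick way to catch spaces masquerading as tabs is to count the tab-separated fields on each line. The helper below is a hypothetical sketch (the name find_malformed_lines is made up); it reports the line number and field count of every non-comment line that does not have the expected number of columns.

```python
def find_malformed_lines(loc_text, expected_columns, comment_char="#"):
    """Return (line_number, field_count) for lines with the wrong number of tab-separated fields."""
    problems = []
    for line_number, line in enumerate(loc_text.splitlines(), start=1):
        if not line.strip() or line.startswith(comment_char):
            continue
        field_count = len(line.split("\t"))
        if field_count != expected_columns:
            problems.append((line_number, field_count))
    return problems


good = "greeting_hello\tHello\t/path/to/file.png\tEarth"          # real tabs
bad = "greeting_hello    Hello    /path/to/file.png    Earth"      # spaces only

print(find_malformed_lines(good + "\n" + bad, expected_columns=4))  # prints [(2, 1)]
```

A line separated only by spaces is seen as a single field, which is why the second line is reported with a count of 1.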

Edit your tool_data_table_conf.xml file and define the structure of the data table:

<table name="hello_world" comment_char="#">
    <columns>value, name, image_path, valid_world</columns>
    <file path="tool-data/hello_world.loc" />
</table>

Define the new parameter as

<param name="builtin_greeting" type="select" label="Select a greeting">
  <options from_data_table="hello_world">
    <validator type="no_options" message="No greetings are available" />
  </options>
</param>

You can then access the various fields in the command-line by using e.g.

-s "${builtin_greeting.fields.name}"

or

-s "${builtin_greeting.fields.valid_world}"

Feel free to play around with different numbers of entries passing different values via the command-line.

16:30 – 16:45 Hello Macros

Macros allow the reuse of commonly used chunks of code (e.g. parameter definitions and command-line Cheetah code). This is particularly useful for tool suites that may have multiple individual tools, but which share a collection of commonly defined parameters.

Extensively documented: http://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax#Reusing_Repeated_Configuration_Elements
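The basic idea of macro expansion can be sketched in Python. This is a deliberately simplified illustration, not Galaxy's implementation: it assumes macros are defined as macro elements with a name attribute inside a macros root, it ignores any content nested inside an expand tag, and it does not expand macros that themselves contain expand tags. The macro and tool shown are made up.

```python
import copy
import xml.etree.ElementTree as ET


def expand_macros(tool_root, macros_root):
    """Replace each <expand macro="NAME"/> with copies of the children of <macro name="NAME">."""
    macros = dict((macro.get("name"), macro) for macro in macros_root.iter("macro"))
    for parent in list(tool_root.iter()):
        # walk children backwards so earlier indexes stay valid while we edit
        for index, child in reversed(list(enumerate(parent))):
            if child.tag == "expand":
                definition = macros[child.get("macro")]
                parent.remove(child)
                for offset, element in enumerate(definition):
                    parent.insert(index + offset, copy.deepcopy(element))
    return tool_root


# Hypothetical macro file and tool, for illustration only
macros_xml = """<macros>
  <macro name="greeting_param">
    <param name="astring" type="text" label="An interesting string to write"/>
  </macro>
</macros>"""

tool_xml = """<tool id="demo" name="Demo">
  <inputs>
    <expand macro="greeting_param"/>
  </inputs>
</tool>"""

tool = expand_macros(ET.fromstring(tool_xml), ET.fromstring(macros_xml))
print([param.get("name") for param in tool.iter("param")])  # prints ['astring']
```

After expansion the tool behaves as if the param had been typed directly into its inputs section, which is what lets several tools share one definition.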

An example is the GATK. See tools/gatk/unified_genotyper.xml which makes use of the Macro file tools/gatk/gatk_macros.xml.

unified_genotyper.xml:

<tool id="gatk_unified_genotyper" name="Unified Genotyper" version="0.0.6">
  <description>SNP and indel caller</description>
  <requirements>
      <requirement type="package" version="1.4">gatk</requirement>
      <requirement type="package">samtools</requirement>
  </requirements>
  <macros>
    <import>gatk_macros.xml</import>
  </macros>
  <command interpreter="python">gatk_wrapper.py
   --max_jvm_heap_fraction "1"
   --stdout "${output_log}"
   #for $i, $input_bam in enumerate( $reference_source.input_bams ):
       -d "-I" "${input_bam.input_bam}" "${input_bam.input_bam.ext}" "gatk_input_${i}"
       #if str( $input_bam.input_bam.metadata.bam_index ) != "None":
           -d "" "${input_bam.input_bam.metadata.bam_index}" "bam_index" "gatk_input_${i}" ##hardcode galaxy ext type as bam_index
       #end if
   #end for
   -p 'java
    -jar "${GALAXY_DATA_INDEX_DIR}/shared/jars/gatk/GenomeAnalysisTK.jar"
    -T "UnifiedGenotyper"
    --num_threads 4 ##hard coded, for now
    --out "${output_vcf}"
    --metrics_file "${output_metrics}"
    -et "NO_ET" ##ET no phone home
    ##-log "${output_log}" ##don't use this to log to file, instead directly capture stdout
    #if $reference_source.reference_source_selector != "history":
        -R "${reference_source.ref_file.fields.path}"
    #end if
    --genotype_likelihoods_model "${genotype_likelihoods_model}"
    --standard_min_confidence_threshold_for_calling "${standard_min_confidence_threshold_for_calling}"
    --standard_min_confidence_threshold_for_emitting "${standard_min_confidence_threshold_for_emitting}"
   '
    #set $rod_binding_names = dict()
    #for $rod_binding in $rod_bind:
        #if str( $rod_binding.rod_bind_type.rod_bind_type_selector ) == 'custom':
            #set $rod_bind_name = $rod_binding.rod_bind_type.custom_rod_name
        #else
            #set $rod_bind_name = $rod_binding.rod_bind_type.rod_bind_type_selector
        #end if
        #set $rod_binding_names[$rod_bind_name] = $rod_binding_names.get( $rod_bind_name, -1 ) + 1
        -d "--dbsnp:${rod_bind_name},%(file_type)s" "${rod_binding.rod_bind_type.input_rod}" "${rod_binding.rod_bind_type.input_rod.ext}" "input_${rod_bind_name}_${rod_binding_names[$rod_bind_name]}"
    #end for

    #include source=$standard_gatk_options#
    ##start analysis specific options
    #if $analysis_param_type.analysis_param_type_selector == "advanced":
        -p '
        --p_nonref_model "${analysis_param_type.p_nonref_model}"
        --heterozygosity "${analysis_param_type.heterozygosity}"
        --pcr_error_rate "${analysis_param_type.pcr_error_rate}"
        --genotyping_mode "${analysis_param_type.genotyping_mode_type.genotyping_mode}"
        #if str( $analysis_param_type.genotyping_mode_type.genotyping_mode ) == 'GENOTYPE_GIVEN_ALLELES':
            --alleles "${analysis_param_type.genotyping_mode_type.input_alleles_rod}"
        #end if
        --output_mode "${analysis_param_type.output_mode}"
        ${analysis_param_type.compute_SLOD}
        --min_base_quality_score "${analysis_param_type.min_base_quality_score}"
        --max_deletion_fraction "${analysis_param_type.max_deletion_fraction}"
        --max_alternate_alleles "${analysis_param_type.max_alternate_alleles}"
        --min_indel_count_for_genotyping "${analysis_param_type.min_indel_count_for_genotyping}"
        --indel_heterozygosity "${analysis_param_type.indel_heterozygosity}"
        --indelGapContinuationPenalty "${analysis_param_type.indelGapContinuationPenalty}"
        --indelGapOpenPenalty "${analysis_param_type.indelGapOpenPenalty}"
        --indelHaplotypeSize "${analysis_param_type.indelHaplotypeSize}"
        ${analysis_param_type.doContextDependentGapPenalties}
        #if str( $analysis_param_type.annotation ) != "None":
            #for $annotation in str( $analysis_param_type.annotation.fields.gatk_value ).split( ','):
                --annotation "${annotation}"
            #end for
        #end if
        #for $additional_annotation in $analysis_param_type.additional_annotations:
            --annotation "${additional_annotation.additional_annotation_name}"
        #end for
        #if str( $analysis_param_type.group ) != "None":
            #for $group in str( $analysis_param_type.group ).split( ','):
                --group "${group}"
            #end for
        #end if
        #if str( $analysis_param_type.exclude_annotations ) != "None":
            #for $annotation in str( $analysis_param_type.exclude_annotations.fields.gatk_value ).split( ','):
                --excludeAnnotation "${annotation}"
            #end for
        #end if
        ${analysis_param_type.multiallelic}
        '
##        #if str( $analysis_param_type.snpEff_rod_bind_type.snpEff_rod_bind_type_selector ) == 'set_snpEff':
##            -p '--annotation "SnpEff"'
##            -d "--snpEffFile:${analysis_param_type.snpEff_rod_bind_type.snpEff_rod_name},%(file_type)s" "${analysis_param_type.snpEff_rod_bind_type.snpEff_input_rod}" "${analysis_param_type.snpEff_rod_bind_type.snpEff_input_rod.ext}" "input_snpEff_${analysis_param_type.snpEff_rod_bind_type.snpEff_rod_name}"
##        #else:
##            -p '--excludeAnnotation "SnpEff"'
##        #end if
    #end if
  </command>
  <inputs>
    <conditional name="reference_source">
      <expand macro="reference_source_selector_param" />
      <when value="cached">
        <repeat name="input_bams" title="BAM file" min="1" help="-I,--input_file &amp;lt;input_file&amp;gt;">
            <param name="input_bam" type="data" format="bam" label="BAM file">
              <validator type="unspecified_build" />
              <validator type="dataset_metadata_in_data_table" table_name="gatk_picard_indexes" metadata_name="dbkey" metadata_column="dbkey" message="Sequences are not currently available for the specified build." /> <!-- fixme!!! this needs to be a select -->
            </param>
        </repeat>
        <param name="ref_file" type="select" label="Using reference genome" help="-R,--reference_sequence &amp;lt;reference_sequence&amp;gt;">
          <options from_data_table="gatk_picard_indexes">
            <!-- <filter type="data_meta" key="dbkey" ref="input_bam" column="dbkey"/> does not yet work in a repeat...-->
          </options>
          <validator type="no_options" message="A built-in reference genome is not available for the build associated with the selected input file"/>
        </param>
      </when>
      <when value="history"> <!-- FIX ME!!!! -->
        <repeat name="input_bams" title="BAM file" min="1" help="-I,--input_file &amp;lt;input_file&amp;gt;">
            <param name="input_bam" type="data" format="bam" label="BAM file" >
            </param>
        </repeat>
        <param name="ref_file" type="data" format="fasta" label="Using reference file" help="-R,--reference_sequence &amp;lt;reference_sequence&amp;gt;" />
      </when>
    </conditional>

    <repeat name="rod_bind" title="Binding for reference-ordered data" help="-D,--dbsnp &amp;lt;dbsnp&amp;gt;">
        <conditional name="rod_bind_type">
          <param name="rod_bind_type_selector" type="select" label="Binding Type">
            <option value="dbsnp" selected="True">dbSNP</option>
            <option value="snps">SNPs</option>
            <option value="indels">INDELs</option>
            <option value="custom">Custom</option>
          </param>
          <when value="dbsnp">
              <param name="input_rod" type="data" format="vcf" label="ROD file" />
          </when>
          <when value="snps">
              <param name="input_rod" type="data" format="vcf" label="ROD file" />
          </when>
          <when value="indels">
              <param name="input_rod" type="data" format="vcf" label="ROD file" />
          </when>
          <when value="custom">
              <param name="custom_rod_name" type="text" value="Unknown" label="ROD Name"/>
              <param name="input_rod" type="data" format="vcf" label="ROD file" />
          </when>
        </conditional>
    </repeat>

    <param name="genotype_likelihoods_model" type="select" label="Genotype likelihoods calculation model to employ" help="-glm,--genotype_likelihoods_model &amp;lt;genotype_likelihoods_model&amp;gt;">
      <option value="BOTH" selected="True">BOTH</option>
      <option value="SNP">SNP</option>
      <option value="INDEL">INDEL</option>
    </param>

    <param name="standard_min_confidence_threshold_for_calling" type="float" value="30.0" label="The minimum phred-scaled confidence threshold at which variants not at 'trigger' track sites should be called" help="-stand_call_conf,--standard_min_confidence_threshold_for_calling &amp;lt;standard_min_confidence_threshold_for_calling&amp;gt;" />
    <param name="standard_min_confidence_threshold_for_emitting" type="float" value="30.0" label="The minimum phred-scaled confidence threshold at which variants not at 'trigger' track sites should be emitted (and filtered if less than the calling threshold)" help="-stand_emit_conf,--standard_min_confidence_threshold_for_emitting &amp;lt;standard_min_confidence_threshold_for_emitting&amp;gt;" />


    <expand macro="gatk_param_type_conditional" />

    <expand macro="analysis_type_conditional">
        <param name="p_nonref_model" type="select" label="Non-reference probability calculation model to employ" help="-pnrm,--p_nonref_model &amp;lt;p_nonref_model&amp;gt;">
          <option value="EXACT" selected="True">EXACT</option>
          <option value="GRID_SEARCH">GRID_SEARCH</option>
        </param>
        <param name="heterozygosity" type="float" value="1e-3" label="Heterozygosity value used to compute prior likelihoods for any locus" help="-hets,--heterozygosity &amp;lt;heterozygosity&amp;gt;" />
        <param name="pcr_error_rate" type="float" value="1e-4" label="The PCR error rate to be used for computing fragment-based likelihoods" help="-pcr_error,--pcr_error_rate &amp;lt;pcr_error_rate&amp;gt;" />
        <conditional name="genotyping_mode_type">
          <param name="genotyping_mode" type="select" label="How to determine the alternate allele to use for genotyping" help="-gt_mode,--genotyping_mode &amp;lt;genotyping_mode&amp;gt;">
            <option value="DISCOVERY" selected="True">DISCOVERY</option>
            <option value="GENOTYPE_GIVEN_ALLELES">GENOTYPE_GIVEN_ALLELES</option>
          </param>
          <when value="DISCOVERY">
            <!-- Do nothing here -->
          </when>
          <when value="GENOTYPE_GIVEN_ALLELES">
            <param name="input_alleles_rod" type="data" format="vcf" label="Alleles ROD file" help="-alleles,--alleles &amp;lt;alleles&amp;gt;" />
          </when>
        </conditional>
        <param name="output_mode" type="select" label="Should we output confident genotypes (i.e. including ref calls) or just the variants?" help="-out_mode,--output_mode &amp;lt;output_mode&amp;gt;">
          <option value="EMIT_VARIANTS_ONLY" selected="True">EMIT_VARIANTS_ONLY</option>
          <option value="EMIT_ALL_CONFIDENT_SITES">EMIT_ALL_CONFIDENT_SITES</option>
          <option value="EMIT_ALL_SITES">EMIT_ALL_SITES</option>
        </param>
        <param name="compute_SLOD" type="boolean" truevalue="--computeSLOD" falsevalue="" label="Compute the SLOD" help="--computeSLOD" />
        <param name="min_base_quality_score" type="integer" value="17" label="Minimum base quality required to consider a base for calling" help="-mbq,--min_base_quality_score &amp;lt;min_base_quality_score&amp;gt;" />
        <param name="max_deletion_fraction" type="float" value="0.05" label="Maximum fraction of reads with deletions spanning this locus for it to be callable" help="to disable, set to &lt; 0 or &gt; 1 (-deletions,--max_deletion_fraction &amp;lt;max_deletion_fraction&amp;gt;)" />
        <param name="max_alternate_alleles" type="integer" value="5" label="Maximum number of alternate alleles to genotype" help="-maxAlleles,--max_alternate_alleles &amp;lt;max_alternate_alleles&amp;gt;" />
        <param name="min_indel_count_for_genotyping" type="integer" value="5" label="Minimum number of consensus indels required to trigger genotyping run" help="-minIndelCnt,--min_indel_count_for_genotyping &amp;lt;min_indel_count_for_genotyping&amp;gt;" />
        <param name="indel_heterozygosity" type="float" value="0.000125" label="Heterozygosity for indel calling" help="1.0/8000==0.000125 (-indelHeterozygosity,--indel_heterozygosity &amp;lt;indel_heterozygosity&amp;gt;)"/>
        <param name="indelGapContinuationPenalty" type="float" value="10.0" label="Indel gap continuation penalty" help="--indelGapContinuationPenalty" />
        <param name="indelGapOpenPenalty" type="float" value="45.0" label="Indel gap open penalty" help="--indelGapOpenPenalty" />
        <param name="indelHaplotypeSize" type="integer" value="80" label="Indel haplotype size" help="--indelHaplotypeSize" />
        <param name="doContextDependentGapPenalties" type="boolean" truevalue="--doContextDependentGapPenalties" falsevalue="" label="Vary gap penalties by context" help="--doContextDependentGapPenalties" />
        <param name="annotation" type="select" multiple="True" display="checkboxes" label="Annotation Types" help="-A,--annotation &amp;lt;annotation&amp;gt;">
          <!-- load the available annotations from an external configuration file, since additional ones can be added to local installs -->
          <options from_data_table="gatk_annotations">
            <filter type="multiple_splitter" column="tools_valid_for" separator=","/>
            <filter type="static_value" value="UnifiedGenotyper" column="tools_valid_for"/>
          </options>
        </param>
        <repeat name="additional_annotations" title="Additional annotation" help="-A,--annotation &amp;lt;annotation&amp;gt;">
          <param name="additional_annotation_name" type="text" value="" label="Annotation name" />
        </repeat>
<!--
        <conditional name="snpEff_rod_bind_type">
          <param name="snpEff_rod_bind_type_selector" type="select" label="Provide a snpEff reference-ordered data file">
            <option value="set_snpEff">Set snpEff</option>
            <option value="exclude_snpEff" selected="True">Don't set snpEff</option>
          </param>
          <when value="exclude_snpEff">
          </when>
          <when value="set_snpEff">
            <param name="snpEff_input_rod" type="data" format="vcf" label="ROD file" />
            <param name="snpEff_rod_name" type="hidden" value="snpEff" label="ROD Name"/>
          </when>
        </conditional>
-->
        <param name="group" type="select" multiple="True" display="checkboxes" label="Annotation Interfaces/Groups" help="-G,--group &amp;lt;group&amp;gt;">
            <option value="RodRequiringAnnotation">RodRequiringAnnotation</option>
            <option value="Standard">Standard</option>
            <option value="Experimental">Experimental</option>
            <option value="WorkInProgress">WorkInProgress</option>
 220             <option value="RankSumTest">RankSumTest</option>
 221             <!-- <option value="none">none</option> -->
 222         </param>
 223     <!--     <param name="family_string" type="text" value="" label="Family String"/> -->
 224         <param name="exclude_annotations" type="select" multiple="True" display="checkboxes" label="Annotations to exclude" help="-XA,--excludeAnnotation &amp;lt;excludeAnnotation&amp;gt;" >
 225           <!-- load the available annotations from an external configuration file, since additional ones can be added to local installs -->
 226           <options from_data_table="gatk_annotations">
 227             <filter type="multiple_splitter" column="tools_valid_for" separator=","/>
 228             <filter type="static_value" value="UnifiedGenotyper" column="tools_valid_for"/>
 229           </options>
 230         </param>
 231         <param name="multiallelic" type="boolean" truevalue="--multiallelic" falsevalue="" label="Allow the discovery of multiple alleles (SNPs only)" help="--multiallelic" />
 232     </expand>
 233   </inputs>
 234   <outputs>
 235     <data format="vcf" name="output_vcf" label="${tool.name} on ${on_string} (VCF)" />
 236     <data format="txt" name="output_metrics" label="${tool.name} on ${on_string} (metrics)" />
 237     <data format="txt" name="output_log" label="${tool.name} on ${on_string} (log)" />
 238   </outputs>
 239   <trackster_conf/>
 240   <tests>
 241       <test>
 242           <param name="reference_source_selector" value="history" />
 243           <param name="ref_file" value="phiX.fasta" ftype="fasta" />
 244           <param name="input_bam" value="gatk/gatk_table_recalibration/gatk_table_recalibration_out_1.bam" ftype="bam" />
 245           <param name="rod_bind_type_selector" value="dbsnp" />
 246           <param name="input_rod" value="gatk/fake_phiX_variant_locations.vcf" ftype="vcf" />
 247           <param name="standard_min_confidence_threshold_for_calling" value="0" />
 248           <param name="standard_min_confidence_threshold_for_emitting" value="4" />
 249           <param name="gatk_param_type_selector" value="basic" />
 250           <param name="analysis_param_type_selector" value="advanced" />
 251           <param name="genotype_likelihoods_model" value="BOTH" />
 252           <param name="p_nonref_model" value="EXACT" />
 253           <param name="heterozygosity" value="0.001" />
 254           <param name="pcr_error_rate" value="0.0001" />
 255           <param name="genotyping_mode" value="DISCOVERY" />
 256           <param name="output_mode" value="EMIT_ALL_CONFIDENT_SITES" />
 257           <param name="compute_SLOD" />
 258           <param name="min_base_quality_score" value="17" />
 259           <param name="max_deletion_fraction" value="-1" />
 260           <param name="min_indel_count_for_genotyping" value="2" />
 261           <param name="indel_heterozygosity" value="0.000125" />
 262           <param name="indelGapContinuationPenalty" value="10" />
 263           <param name="indelGapOpenPenalty" value="3" />
 264           <param name="indelHaplotypeSize" value="80" />
 265           <param name="doContextDependentGapPenalties" />
 266           <!-- <param name="annotation" value="" />
 267           <param name="group" value="" /> -->
 268           <output name="output_vcf" file="gatk/gatk_unified_genotyper/gatk_unified_genotyper_out_1.vcf" lines_diff="4" /> 
 269           <output name="output_metrics" file="gatk/gatk_unified_genotyper/gatk_unified_genotyper_out_1.metrics" /> 
 270           <output name="output_log" file="gatk/gatk_unified_genotyper/gatk_unified_genotyper_out_1.log.contains" compare="contains" />
 271       </test>
 272   </tests>
 273   <help>
 274 **What it does**
 275 
 276 A variant caller which unifies the approaches of several disparate callers.  Works for single-sample and multi-sample data.  The user can choose from several different incorporated calculation models.
 277 
 278 For more information on the GATK Unified Genotyper, see this `tool specific page &lt;http://www.broadinstitute.org/gsa/wiki/index.php/Unified_genotyper&gt;`_.
 279 
 280 To learn about best practices for variant detection using GATK, see this `overview &lt;http://www.broadinstitute.org/gsa/wiki/index.php/Best_Practice_Variant_Detection_with_the_GATK_v3&gt;`_.
 281 
 282 If you encounter errors, please view the `GATK FAQ &lt;http://www.broadinstitute.org/gsa/wiki/index.php/Frequently_Asked_Questions&gt;`_.
 283 
 284 ------
 285 
 286 **Inputs**
 287 
 288 GenomeAnalysisTK: UnifiedGenotyper accepts an aligned BAM input file.
 289 
 290 
 291 **Outputs**
 292 
 293 The output is in VCF format.
 294 
 295 
 296 Go `here &lt;http://www.broadinstitute.org/gsa/wiki/index.php/Input_files_for_the_GATK&gt;`_ for details on GATK file formats.
 297 
 298 -------
 299 
 300 **Settings**::
 301 
 302  genotype_likelihoods_model                        Genotype likelihoods calculation model to employ -- BOTH is the default option, while INDEL is also available for calling indels and SNP is available for calling SNPs only (SNP|INDEL|BOTH)
 303  p_nonref_model                                    Non-reference probability calculation model to employ -- EXACT is the default option, while GRID_SEARCH is also available. (EXACT|GRID_SEARCH)
 304  heterozygosity                                    Heterozygosity value used to compute prior likelihoods for any locus
 305  pcr_error_rate                                    The PCR error rate to be used for computing fragment-based likelihoods
 306  genotyping_mode                                   How to determine the alternate alleles to use for genotyping (DISCOVERY|GENOTYPE_GIVEN_ALLELES)
 307  output_mode                                       Should we output confident genotypes (i.e. including ref calls) or just the variants? (EMIT_VARIANTS_ONLY|EMIT_ALL_CONFIDENT_SITES|EMIT_ALL_SITES)
 308  standard_min_confidence_threshold_for_calling     The minimum phred-scaled confidence threshold at which variants not at 'trigger' track sites should be called
 309  standard_min_confidence_threshold_for_emitting    The minimum phred-scaled confidence threshold at which variants not at 'trigger' track sites should be emitted (and filtered if less than the calling threshold)
 310  computeSLOD                                       If provided, we will calculate the SLOD
 311  min_base_quality_score                            Minimum base quality required to consider a base for calling
 312  max_deletion_fraction                             Maximum fraction of reads with deletions spanning this locus for it to be callable [to disable, set to &lt; 0 or &gt; 1; default:0.05]
 313  min_indel_count_for_genotyping                    Minimum number of consensus indels required to trigger genotyping run
 314  indel_heterozygosity                              Heterozygosity for indel calling
 315  indelGapContinuationPenalty                       Indel gap continuation penalty
 316  indelGapOpenPenalty                               Indel gap open penalty
 317  indelHaplotypeSize                                Indel haplotype size
 318  doContextDependentGapPenalties                    Vary gap penalties by context
 319  indel_recal_file                                  Filename for the input covariates table recalibration .csv file - EXPERIMENTAL, DO NOT USE
 320  indelDebug                                        Output indel debug info
 321  out                                               File to which variants should be written
 322  annotation                                        One or more specific annotations to apply to variant calls
 323  group                                             One or more classes/groups of annotations to apply to variant calls
 324 
 325 @CITATION_SECTION@
 326   </help>
 327 </tool>
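
The `annotation` and `exclude_annotations` selects in the listing above load their options from the `gatk_annotations` data table and pass them through two filters, `multiple_splitter` and `static_value`. The following is a rough Python sketch of what that filter chain appears to do. It is not Galaxy's own code: the table rows are made up for illustration, and the function signatures (including the `keep` argument) are assumptions chosen to mirror the XML attributes.

```python
# Hypothetical rows standing in for the gatk_annotations data table.
ROWS = [
    {"value": "AlleleBalance", "tools_valid_for": "UnifiedGenotyper,VariantAnnotator"},
    {"value": "SnpEff", "tools_valid_for": "VariantAnnotator"},
    {"value": "DepthOfCoverage", "tools_valid_for": "UnifiedGenotyper"},
]


def multiple_splitter(rows, column, separator=","):
    """Emit one row per separated value found in `column`."""
    result = []
    for row in rows:
        for value in row[column].split(separator):
            new_row = dict(row)
            new_row[column] = value
            result.append(new_row)
    return result


def static_value(rows, column, value, keep=True):
    """Keep (or drop, if keep is False) the rows whose `column` equals `value`."""
    return [row for row in rows if (row[column] == value) == keep]


# Same order as the <filter> elements in the tool XML.
options = multiple_splitter(ROWS, column="tools_valid_for", separator=",")
options = static_value(options, column="tools_valid_for", value="UnifiedGenotyper")

for row in options:
    print(row["value"])
```

With these invented rows, only the annotations marked as valid for UnifiedGenotyper remain as checkbox options, which is why the same table can serve several GATK tools.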

gatk_macros.xml:

   1 <macros>
   2   <template name="standard_gatk_options">      
   3     ##start standard gatk options
   4     #if $gatk_param_type.gatk_param_type_selector == "advanced":
   5         #for $pedigree in $gatk_param_type.pedigree:
   6             -p '--pedigree "${pedigree.pedigree_file}"'
   7         #end for
   8         #for $pedigree_string in $gatk_param_type.pedigree_string_repeat:
   9             -p '--pedigreeString "${pedigree_string.pedigree_string}"'
  10         #end for
  11         -p '--pedigreeValidationType "${gatk_param_type.pedigree_validation_type}"'
  12         #for $read_filter in $gatk_param_type.read_filter:
  13             -p '--read_filter "${read_filter.read_filter_type.read_filter_type_selector}"
  14             ###raise Exception( str( dir( $read_filter ) ) )
  15             #for $name, $param in $read_filter.read_filter_type.iteritems():
  16                 #if $name not in [ "__current_case__", "read_filter_type_selector" ]:
  17                     #if hasattr( $param.input, 'truevalue' ):
  18                         ${param}
  19                     #else:
  20                         --${name} "${param}"
  21                     #end if
  22                 #end if
  23             #end for
  24             '
  25         #end for
  26         #for $interval_count, $input_intervals in enumerate( $gatk_param_type.input_interval_repeat ):
  27             -d "--intervals" "${input_intervals.input_intervals}" "${input_intervals.input_intervals.ext}" "input_intervals_${interval_count}"
  28         #end for
  29         
  30         #for $interval_count, $input_intervals in enumerate( $gatk_param_type.input_exclude_interval_repeat ):
  31             -d "--excludeIntervals" "${input_intervals.input_exclude_intervals}" "${input_intervals.input_exclude_intervals.ext}" "input_exlude_intervals_${interval_count}"
  32         #end for
  33 
  34         -p '--interval_set_rule "${gatk_param_type.interval_set_rule}"'
  35         
  36         -p '--downsampling_type "${gatk_param_type.downsampling_type.downsampling_type_selector}"'
  37         #if str( $gatk_param_type.downsampling_type.downsampling_type_selector ) != "NONE":
  38             -p '--${gatk_param_type.downsampling_type.downsample_to_type.downsample_to_type_selector} "${gatk_param_type.downsampling_type.downsample_to_type.downsample_to_value}"'
  39         #end if
  40         -p '
  41         --baq "${gatk_param_type.baq}"
  42         --baqGapOpenPenalty "${gatk_param_type.baq_gap_open_penalty}"
  43         ${gatk_param_type.use_original_qualities}
  44         --defaultBaseQualities "${gatk_param_type.default_base_qualities}"
  45         --validation_strictness "${gatk_param_type.validation_strictness}"
  46         --interval_merging "${gatk_param_type.interval_merging}"
  47         ${gatk_param_type.disable_experimental_low_memory_sharding}
  48         ${gatk_param_type.non_deterministic_random_seed}
  49         '
  50         #for $rg_black_list_count, $rg_black_list in enumerate( $gatk_param_type.read_group_black_list_repeat ):
  51             #if $rg_black_list.read_group_black_list_type.read_group_black_list_type_selector == "file":
  52                 -d "--read_group_black_list" "${rg_black_list.read_group_black_list_type.read_group_black_list}" "txt" "input_read_group_black_list_${rg_black_list_count}"
  53             #else
  54                 -p '--read_group_black_list "${rg_black_list.read_group_black_list_type.read_group_black_list}"'
  55             #end if
  56         #end for
  57     #end if
  58     
  59     #if str( $reference_source.reference_source_selector ) == "history":
  60         -d "-R" "${reference_source.ref_file}" "${reference_source.ref_file.ext}" "gatk_input"
  61     #end if
  62     ##end standard gatk options
  63   </template>
  64   <xml name="gatk_param_type_conditional">
  65     <conditional name="gatk_param_type">
  66       <param name="gatk_param_type_selector" type="select" label="Basic or Advanced GATK options">
  67         <option value="basic" selected="True">Basic</option>
  68         <option value="advanced">Advanced</option>
  69       </param>
  70       <when value="basic">
  71         <!-- Do nothing here -->
  72       </when>
  73       <when value="advanced">
  74         <repeat name="pedigree" title="Pedigree file" help="-ped,--pedigree &amp;lt;pedigree&amp;gt;">
  75             <param name="pedigree_file" type="data" format="txt" label="Pedigree files for samples"/>
  76         </repeat>
  77         <repeat name="pedigree_string_repeat" title="Pedigree string" help="-pedString,--pedigreeString &amp;lt;pedigreeString&amp;gt;">
  78             <param name="pedigree_string" type="text" value="" label="Pedigree string for samples"/>
  79         </repeat>
  80         <param name="pedigree_validation_type" type="select" label="How strict should we be in validating the pedigree information" help="-pedValidationType,--pedigreeValidationType &amp;lt;pedigreeValidationType&amp;gt;">
  81           <option value="STRICT" selected="True">STRICT</option>
  82           <option value="SILENT">SILENT</option>
  83         </param>
  84         <repeat name="read_filter" title="Read Filter" help="-rf,--read_filter &amp;lt;read_filter&amp;gt;">
  85             <conditional name="read_filter_type">
  86               <param name="read_filter_type_selector" type="select" label="Read Filter Type">
  87                 <option value="BadCigar">BadCigar</option>
  88                 <option value="BadMate">BadMate</option>
  89                 <option value="DuplicateRead">DuplicateRead</option>
  90                 <option value="FailsVendorQualityCheck">FailsVendorQualityCheck</option>
  91                 <option value="MalformedRead">MalformedRead</option>
  92                 <option value="MappingQuality">MappingQuality</option>
  93                 <option value="MappingQualityUnavailable">MappingQualityUnavailable</option>
  94                 <option value="MappingQualityZero">MappingQualityZero</option>
  95                 <option value="MateSameStrand">MateSameStrand</option>
  96                 <option value="MaxInsertSize">MaxInsertSize</option>
  97                 <option value="MaxReadLength" selected="True">MaxReadLength</option>
  98                 <option value="MissingReadGroup">MissingReadGroup</option>
  99                 <option value="NoOriginalQualityScores">NoOriginalQualityScores</option>
 100                 <option value="NotPrimaryAlignment">NotPrimaryAlignment</option>
 101                 <option value="Platform454">Platform454</option>
 102                 <option value="Platform">Platform</option>
 103                 <option value="PlatformUnit">PlatformUnit</option>
 104                 <option value="ReadGroupBlackList">ReadGroupBlackList</option>
 105                 <option value="ReadName">ReadName</option>
 106                 <option value="ReadStrand">ReadStrand</option>
 107                 <option value="ReassignMappingQuality">ReassignMappingQuality</option>
 108                 <option value="Sample">Sample</option>
 109                 <option value="SingleReadGroup">SingleReadGroup</option>
 110                 <option value="UnmappedRead">UnmappedRead</option>
 111               </param>
 112               <when value="BadCigar">
 113                   <!-- no extra options -->
 114               </when>
 115               <when value="BadMate">
 116                   <!-- no extra options -->
 117               </when>
 118               <when value="DuplicateRead">
 119                   <!-- no extra options -->
 120               </when>
 121               <when value="FailsVendorQualityCheck">
 122                   <!-- no extra options -->
 123               </when>
 124               <when value="MalformedRead">
 125                   <!-- no extra options -->
 126               </when>
 127               <when value="MappingQuality">
 128                   <param name="min_mapping_quality_score" type="integer" value="10" label="Minimum read mapping quality required to consider a read for calling"/>
 129               </when>
 130               <when value="MappingQualityUnavailable">
 131                   <!-- no extra options -->
 132               </when>
 133               <when value="MappingQualityZero">
 134                   <!-- no extra options -->
 135               </when>
 136               <when value="MateSameStrand">
 137                   <!-- no extra options -->
 138               </when>
 139               <when value="MaxInsertSize">
 140                   <param name="maxInsertSize" type="integer" value="1000000" label="Discard reads with insert size greater than the specified value"/>
 141               </when>
 142               <when value="MaxReadLength">
 143                   <param name="maxReadLength" type="integer" value="76" label="Max Read Length"/>
 144               </when>
 145               <when value="MissingReadGroup">
 146                   <!-- no extra options -->
 147               </when>
 148               <when value="NoOriginalQualityScores">
 149                   <!-- no extra options -->
 150               </when>
 151               <when value="NotPrimaryAlignment">
 152                   <!-- no extra options -->
 153               </when>
 154               <when value="Platform454">
 155                   <!-- no extra options -->
 156               </when>
 157               <when value="Platform">
 158                   <param name="PLFilterName" type="text" value="" label="Discard reads with RG:PL attribute containing this string"/>
 159               </when>
 160               <when value="PlatformUnit">
 161                   <!-- no extra options -->
 162               </when>
 163               <when value="ReadGroupBlackList">
 164                   <!-- no extra options -->
 165               </when>
 166               <when value="ReadName">
 167                   <param name="readName" type="text" value="" label="Filter out all reads except those with this read name"/>
 168               </when>
 169               <when value="ReadStrand">
 170                   <param name="filterPositive" type="boolean" truevalue="--filterPositive" falsevalue="" label="Discard reads on the forward strand"/>
 171               </when>
 172               <when value="ReassignMappingQuality">
 173                   <param name="default_mapping_quality" type="integer" value="60" label="Default read mapping quality to assign to all reads"/>
 174               </when>
 175               <when value="Sample">
 176                   <param name="sample_to_keep" type="text" value="" label="The name of the sample(s) to keep, filtering out all others"/>
 177               </when>
 178               <when value="SingleReadGroup">
 179                   <param name="read_group_to_keep" type="text" value="" label="The name of the read group to keep, filtering out all others"/>
 180               </when>
 181               <when value="UnmappedRead">
 182                   <!-- no extra options -->
 183               </when>
 184             </conditional>
 185         </repeat>
 186         <repeat name="input_interval_repeat" title="Operate on Genomic intervals" help="-L,--intervals &amp;lt;intervals&amp;gt;">
 187           <param name="input_intervals" type="data" format="bed,gatk_interval,picard_interval_list,vcf" label="Genomic intervals" />
 188         </repeat>
 189         <repeat name="input_exclude_interval_repeat" title="Exclude Genomic intervals" help="-XL,--excludeIntervals &amp;lt;excludeIntervals&amp;gt;">
 190           <param name="input_exclude_intervals" type="data" format="bed,gatk_interval,picard_interval_list,vcf" label="Genomic intervals" />
 191         </repeat>
 192         
 193         <param name="interval_set_rule" type="select" label="Interval set rule" help="-isr,--interval_set_rule &amp;lt;interval_set_rule&amp;gt;">
 194           <option value="UNION" selected="True">UNION</option>
 195           <option value="INTERSECTION">INTERSECTION</option>
 196         </param>
 197         
 198         <conditional name="downsampling_type">
 199           <param name="downsampling_type_selector" type="select" label="Type of reads downsampling to employ at a given locus" help="-dt,--downsampling_type &amp;lt;downsampling_type&amp;gt;">
 200             <option value="NONE" selected="True">NONE</option>
 201             <option value="ALL_READS">ALL_READS</option>
 202             <option value="BY_SAMPLE">BY_SAMPLE</option>
 203           </param>
 204           <when value="NONE">
 205               <!-- no more options here -->
 206           </when>
 207           <when value="ALL_READS">
 208               <conditional name="downsample_to_type">
 209                   <param name="downsample_to_type_selector" type="select" label="Downsample method">
 210                       <option value="downsample_to_fraction" selected="True">Downsample by Fraction</option>
 211                       <option value="downsample_to_coverage">Downsample by Coverage</option>
 212                   </param>
 213                   <when value="downsample_to_fraction">
 214                       <param name="downsample_to_value" type="float" label="Fraction [0.0-1.0] of reads to downsample to" value="1" min="0" max="1" help="-dfrac,--downsample_to_fraction &amp;lt;downsample_to_fraction&amp;gt;"/>
 215                   </when>
 216                   <when value="downsample_to_coverage">
 217                       <param name="downsample_to_value" type="integer" label="Coverage to downsample to at any given locus" value="0" help="-dcov,--downsample_to_coverage &amp;lt;downsample_to_coverage&amp;gt;"/>
 218                   </when>
 219               </conditional>
 220           </when>
 221           <when value="BY_SAMPLE">
 222               <conditional name="downsample_to_type">
 223                   <param name="downsample_to_type_selector" type="select" label="Downsample method">
 224                       <option value="downsample_to_fraction" selected="True">Downsample by Fraction</option>
 225                       <option value="downsample_to_coverage">Downsample by Coverage</option>
 226                   </param>
 227                   <when value="downsample_to_fraction">
 228                       <param name="downsample_to_value" type="float" label="Fraction [0.0-1.0] of reads to downsample to" value="1" min="0" max="1" help="-dfrac,--downsample_to_fraction &amp;lt;downsample_to_fraction&amp;gt;"/>
 229                   </when>
 230                   <when value="downsample_to_coverage">
 231                       <param name="downsample_to_value" type="integer" label="Coverage to downsample to at any given locus" value="0" help="-dcov,--downsample_to_coverage &amp;lt;downsample_to_coverage&amp;gt;"/>
 232                   </when>
 233               </conditional>
 234           </when>
 235         </conditional>
 236         <param name="baq" type="select" label="Type of BAQ calculation to apply in the engine" help="-baq,--baq &amp;lt;baq&amp;gt;">
 237           <option value="OFF" selected="True">OFF</option>
 238           <option value="CALCULATE_AS_NECESSARY">CALCULATE_AS_NECESSARY</option>
 239           <option value="RECALCULATE">RECALCULATE</option>
 240         </param>
 241         <param name="baq_gap_open_penalty" type="float" label="BAQ gap open penalty (Phred Scaled)" value="40" help="Default value is 40. 30 is perhaps better for whole genome call sets. -baqGOP,--baqGapOpenPenalty &amp;lt;baqGapOpenPenalty&amp;gt;" />
 242         <param name="use_original_qualities" type="boolean" truevalue="--useOriginalQualities" falsevalue="" label="Use the original base quality scores from the OQ tag" help="-OQ,--useOriginalQualities" />
 243         <param name="default_base_qualities" type="integer" label="Value to be used for all base quality scores, when some are missing" value="-1" help="-DBQ,--defaultBaseQualities &amp;lt;defaultBaseQualities&amp;gt;"/>
 244         <param name="validation_strictness" type="select" label="How strict should we be with validation" help="-S,--validation_strictness &amp;lt;validation_strictness&amp;gt;">
 245           <option value="STRICT" selected="True">STRICT</option>
 246           <option value="LENIENT">LENIENT</option>
 247           <option value="SILENT">SILENT</option>
 248           <!-- <option value="DEFAULT_STRINGENCY">DEFAULT_STRINGENCY</option> listed in docs, but not valid value...-->
 249         </param>
 250         <param name="interval_merging" type="select" label="Interval merging rule" help="-im,--interval_merging &amp;lt;interval_merging&amp;gt;">
 251           <option value="ALL" selected="True">ALL</option>
 252           <option value="OVERLAPPING_ONLY">OVERLAPPING_ONLY</option>
 253         </param>
 254         
 255         <repeat name="read_group_black_list_repeat" title="Read group black list" help="-rgbl,--read_group_black_list &amp;lt;read_group_black_list&amp;gt;">
 256           <conditional name="read_group_black_list_type">
 257             <param name="read_group_black_list_type_selector" type="select" label="Type of read group black list">
 258               <option value="file" selected="True">Filters in file</option>
 259               <option value="text">Specify filters as a string</option>
 260             </param>
 261             <when value="file">
 262               <param name="read_group_black_list" type="data" format="txt" label="Read group black list file" />
 263             </when>
 264             <when value="text">
 265               <param name="read_group_black_list" type="text" value="tag:string" label="Read group black list tag:string" />
 266             </when>
 267           </conditional>
 268         </repeat>
 269         
 270         <param name="disable_experimental_low_memory_sharding" type="boolean" truevalue="--disable_experimental_low_memory_sharding" falsevalue="" label="Disable experimental low-memory sharding functionality." checked="False" help="--disable_experimental_low_memory_sharding"/>
 271         <param name="non_deterministic_random_seed" type="boolean" truevalue="--nonDeterministicRandomSeed" falsevalue="" label="Makes the GATK behave non deterministically, that is, the random numbers generated will be different in every run" checked="False"  help="-ndrs,--nonDeterministicRandomSeed"/>
 272         
 273       </when>
 274     </conditional>    
 275   </xml>
 276   <xml name="analysis_type_conditional">
 277     <conditional name="analysis_param_type">
 278       <param name="analysis_param_type_selector" type="select" label="Basic or Advanced Analysis options">
 279         <option value="basic" selected="True">Basic</option>
 280         <option value="advanced">Advanced</option>
 281       </param>
 282       <when value="basic">
 283         <!-- Do nothing here -->
 284       </when>
 285       <when value="advanced">
 286         <yield />
 287       </when>
 288     </conditional>
 289   </xml>
 290   <xml name="reference_source_selector_param">
 291     <param name="reference_source_selector" type="select" label="Choose the source for the reference list">
 292       <option value="cached">Locally cached</option>
 293       <option value="history">History</option>
 294     </param>
 295   </xml>
 296   <token name="@CITATION_SECTION@">------
 297 
 298 **Citation**
 299 
 300 For the underlying tool, please cite `DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, del Angel G, Rivas MA, Hanna M, McKenna A, Fennell TJ, Kernytsky AM, Sivachenko AY, Cibulskis K, Gabriel SB, Altshuler D, Daly MJ. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011 May;43(5):491-8. &lt;http://www.ncbi.nlm.nih.gov/pubmed/21478889&gt;`_
 301 
 302 If you use this tool in Galaxy, please cite Blankenberg D, et al. *In preparation.*
 303 
 304   </token>
 305 </macros>
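
To make the mechanics of the listing above more concrete, here is a toy Python sketch of how `<expand macro="...">`, `<yield />` and `<token>` could be resolved. It is an illustration only and not Galaxy's implementation: the tool and macro snippets are cut-down, hypothetical versions of the GATK files, `<template>` macros are ignored, and tokens are replaced only in element text.

```python
import copy
import xml.etree.ElementTree as ET

# Cut-down, hypothetical stand-ins for gatk_macros.xml and a tool file.
MACROS = """
<macros>
  <xml name="analysis_type_conditional">
    <conditional name="analysis_param_type">
      <when value="basic"/>
      <when value="advanced"><yield/></when>
    </conditional>
  </xml>
  <token name="@CITATION_SECTION@">**Citation** (text elided)</token>
</macros>
"""

TOOL = """
<tool id="demo" name="Demo">
  <inputs>
    <expand macro="analysis_type_conditional">
      <param name="heterozygosity" type="float" value="0.001"/>
    </expand>
  </inputs>
  <help>**What it does** @CITATION_SECTION@</help>
</tool>
"""


def fill_yield(element, yielded):
    """Replace every <yield/> below `element` with copies of the yielded nodes."""
    children = []
    for child in list(element):
        if child.tag == "yield":
            children.extend(copy.deepcopy(node) for node in yielded)
        else:
            fill_yield(child, yielded)
            children.append(child)
    element[:] = children


def expand_children(parent, xml_macros):
    """Replace each <expand> child with the body of the macro it names."""
    children = []
    for child in list(parent):
        if child.tag == "expand":
            body = copy.deepcopy(xml_macros[child.get("macro")])
            fill_yield(body, list(child))
            children.extend(list(body))
        else:
            children.append(child)
    parent[:] = children
    for child in parent:
        expand_children(child, xml_macros)


def expand_macros(tool_xml, macros_xml):
    macros = ET.fromstring(macros_xml)
    xml_macros = {m.get("name"): m for m in macros.findall("xml")}
    tokens = {t.get("name"): t.text or "" for t in macros.findall("token")}
    root = ET.fromstring(tool_xml)
    expand_children(root, xml_macros)
    # Tokens are plain text substitutions; here only element text is handled.
    for element in root.iter():
        for name, value in tokens.items():
            if element.text and name in element.text:
                element.text = element.text.replace(name, value)
    return root


expanded = expand_macros(TOOL, MACROS)
print(ET.tostring(expanded, encoding="unicode"))
```

In the expanded tree the `heterozygosity` parameter ends up inside the `advanced` branch of the conditional, which is the role `<yield />` plays in `analysis_type_conditional`.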

Exercises: Can you use Macros to simplify the inclusion of common tool content between the various phases of the Hello World examples?
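
One possible way to start on this exercise is to look for elements that are repeated verbatim across the tool files, since those are natural candidates for an `<xml>` macro. The sketch below assumes two made-up Hello World phases (they are not the workshop files) and compares only tag names and attributes, ignoring child elements and text.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two hypothetical phases of a Hello World tool, for illustration only.
HELLO_1 = """
<tool id="hello_1" name="Hello 1">
  <inputs>
    <param name="greeting" type="text" value="Hello" label="Greeting"/>
  </inputs>
  <outputs>
    <data format="txt" name="output"/>
  </outputs>
</tool>
"""

HELLO_2 = """
<tool id="hello_2" name="Hello 2">
  <inputs>
    <param name="greeting" type="text" value="Hello" label="Greeting"/>
    <param name="repeats" type="integer" value="1" label="Repeats"/>
  </inputs>
  <outputs>
    <data format="txt" name="output"/>
  </outputs>
</tool>
"""


def element_key(element):
    """Describe an element by its tag and sorted attributes."""
    attributes = " ".join('%s="%s"' % item for item in sorted(element.attrib.items()))
    return "<%s %s/>" % (element.tag, attributes)


def shared_elements(tool_xmls, tags=("param", "data")):
    """Return the elements (by key) that occur in every tool file."""
    counts = Counter()
    for xml_text in tool_xmls:
        root = ET.fromstring(xml_text)
        counts.update({element_key(e) for e in root.iter() if e.tag in tags})
    return sorted(key for key, n in counts.items() if n == len(tool_xmls))


shared = shared_elements([HELLO_1, HELLO_2])
for key in shared:
    print(key)
```

Anything this reports could be moved into a shared macros file and pulled into each phase with `<expand macro="..."/>`, in the same way the GATK tools share `gatk_macros.xml`.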

16:45-17:00 Open questions and free play

Some suggestions for exploration (http://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax):

  • Config files
  • Validators
  • Defining Datatypes and Metadata
    • Composite Datatypes
  • Parameter sanitizers
  • Advanced Data source tool configuration
  • Dynamic Select parameters
  • Customizing output attributes
    • Labels
    • output <actions> (e.g. see tools/filters/cutWrapper.xml)

17:00 session ends

Questions? Contact the Organizers.