ToolShed/Contributions/2016_05

Galaxy ToolShed

Tools contributed to the Galaxy Project Tool Shed in April and May 2016.

New Tools

unrestricted

  • From insilico-bob:

    • mean_center_matrix: Mean-center a matrix whose header row and first column contain labels.

      • Assumes labels are in row 1 and in column 1.
        Mean-centers all values in a row (cell value = cell value - row mean value).
        Repeats this for all rows 2 to N+1.
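
      The row-wise mean-centering described above can be sketched in Python. This is a minimal illustration, not the tool's own code: it assumes the matrix has already been read into a list of rows of strings, and the function name mean_center_rows is hypothetical.

```python
def mean_center_rows(table):
    """Mean-center every data row of a labelled matrix.

    `table` is a list of rows (lists of strings); row 0 holds the column
    labels and column 0 of each row holds the row label (assumed layout).
    """
    header, *rows = table
    centered = [list(header)]
    for row in rows:
        label, values = row[0], [float(v) for v in row[1:]]
        row_mean = sum(values) / len(values)
        # cell value = cell value - row mean value
        centered.append([label] + [v - row_mean for v in values])
    return centered


matrix = [
    ["gene", "s1", "s2", "s3"],
    ["g1", "1", "2", "3"],
    ["g2", "10", "20", "30"],
]
for line in mean_center_rows(matrix):
    print(line)
```

      Labels are carried through untouched, so the output keeps the same header row and first column as the input.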

    • ngchm: Generate clustered heatmaps with optional co-variate bars. Generates a clustered heatmap from NGCHM data, or other data matrices, with many clustering methods to choose from. Multiple category/co-variate bars may also be added to either the columns or the rows. The output is a zip file that can be displayed in Galaxy via the visualize icon at the bottom of the output dataset in the History (after the save, information "I" and rerun icons). Click the icon and the heatmap is displayed in the Galaxy middle panel.

      • The input matrix is assumed to have labels in both the first column and the first row.
        Any input co-variate bar file must have the same number of labels as the input matrix's row or column labels (whichever the co-variate bar maps to).

  • From engineson:

    • multiqc: MultiQC searches a given directory for analysis logs and compiles an HTML report. It's a general-use tool, perfect for summarising the output from numerous bioinformatics tools.

  • From alenail:

  • From stemcellcommons:

    • chipseq_workflows: hg19 workflow. ChIP-seq workflows annotated for use with Refinery Platform

    • fastqc_workflow: FastQC workflow designed for use with Refinery Platform

  • From crique:

    • phylogenetic_analysis: Phylogenetic analysis using PhyML. PhyML is a software package whose primary task is to estimate maximum likelihood phylogenies from alignments of nucleotide or amino acid sequences. It provides a wide range of options designed to facilitate standard phylogenetic analyses. (Guindon, S., & Gascuel, O. (2003). A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology, 52(5), 696-704.)

  • From rnateam:

    • rnacommender: RNAcommender is a tool for genome-wide recommendation of RNA-protein interactions. It is a recommender system capable of suggesting RNA targets to unexplored RNA binding proteins, by propagating the available interaction information, taking into account the protein domain composition and the RNA predicted secondary structure.

  • From ambarishk:

    • mytoolshed: Toolshed to make required tools shareable and installable.

  • From bebatut:

    • fasta_add_barcode: Add barcodes at the beginning of FASTA sequences.

    • format_cd_hit_output: Format CD-hit output to rename representative sequences with cluster name and/or extract distribution inside clusters given a mapping file

    • format_metaphlan2_output: Format MetaPhlAn2 output to extract abundance at different taxonomic levels

    • compute_wilcoxon_test: Compute Wilcoxon test with R

    • extract_min_max_lines: Extract the lines corresponding to the minimum and maximum values of a column
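
      A minimal Python sketch of this selection, assuming tab-separated lines and a 0-based column index (Galaxy forms usually number columns from 1). The function name is hypothetical, and keeping the first line on ties is a choice made for this sketch that may differ from the tool.

```python
def extract_min_max_lines(lines, column, sep="\t"):
    """Return (line with minimum, line with maximum) for a numeric column.

    `column` is a 0-based index. Lines whose field is missing or not
    numeric (such as a header) are skipped; on ties the first line wins.
    """
    min_line = max_line = None
    min_val = max_val = None
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        try:
            value = float(fields[column])
        except (IndexError, ValueError):
            continue  # header or malformed line
        if min_val is None or value < min_val:
            min_val, min_line = value, line
        if max_val is None or value > max_val:
            max_val, max_line = value, line
    return min_line, max_line


rows = ["name\tscore", "a\t3", "b\t-1", "c\t7"]
print(extract_min_max_lines(rows, 1))
```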

    • plot_grouped_barplot: Plot a grouped barplot graphic using R

    • convert_extract_sequence_file: Convert/extract information from a sequence file, with optional constraints

    • combine_metaphlan2_humann2: Combine MetaPhlAn2 and HUMAnN2 outputs to relate genus/species abundances and gene families/pathways abundances

    • plot_generic_x_y_plot: Plot a generic X-Y plot graphic using R

    • export2graphlan: export2graphlan is a conversion software tool for producing both annotation and tree file for GraPhlAn

    • graphlan: GraPhlAn is a software tool for producing high-quality circular representations of taxonomic and phylogenetic trees

    • humann2: HUMAnN2 is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data.

    • group_humann2_uniref_abundances_to_go: Group abundances of UniRef50 gene families obtained with HUMAnN2 to Gene Ontology (GO) slim terms with relative abundances

    • normalize_dataset: Normalize a dataset by row or column sum
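
      Row- or column-sum normalization can be sketched as follows, assuming the dataset has already been parsed into a numeric matrix (a list of rows). The function name is hypothetical, and leaving all-zero rows or columns as zeros is a choice made for this sketch, not documented behaviour of the tool.

```python
def normalize(matrix, by="row"):
    """Divide each value by the sum of its row (by="row") or column (by="column").

    Rows or columns that sum to zero are returned as zeros instead of raising.
    """
    def scale(values, total):
        return [v / total if total else 0.0 for v in values]

    if by == "row":
        return [scale(row, sum(row)) for row in matrix]
    if by == "column":
        # transpose, scale each column, then transpose back
        columns = [scale(col, sum(col)) for col in zip(*matrix)]
        return [list(row) for row in zip(*columns)]
    raise ValueError("by must be 'row' or 'column'")


data = [[1, 3], [2, 2]]
print(normalize(data, "row"))  # [[0.25, 0.75], [0.5, 0.5]]
print(normalize(data, "column"))
```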

    • plot_barplot: Plot a barplot graphic using R

    • compare_humann2_output: Compare outputs of HUMAnN2 for several samples and extract similar and specific information

    • cdhit: Cd-Hit is a very widely used program for clustering and comparing protein or nucleotide sequences

  • From bornea:

    • query_crapome: This program reads in a SAINT-formatted *Prey* file or a single-column list (no column name) of UniProt accessions (e.g. "P00533" or "EGFR_HUMAN"), queries the CRAPome database (v1.1) http://crapome.org, and returns a file specifying the prevalence of each protein in the CRAPome.

    • nsaf_scoring: This program merges the files and calculates CRAPomePCT, NSAF and NSAF score.
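
      NSAF (normalized spectral abundance factor) is conventionally a protein's spectral count divided by its length, normalized by the sum of that ratio over all proteins. The sketch below computes only that standard definition; it is an assumption that nsaf_scoring follows it exactly, and it does not reproduce CRAPomePCT or the combined NSAF score. Protein IDs and values are made up.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein.

    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j), where SpC is the spectral
    count and L the protein length in residues.
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


counts = {"protA": 10, "protB": 10}   # hypothetical spectral counts
lengths = {"protA": 500, "protB": 250}  # hypothetical lengths
print(nsaf(counts, lengths))  # protB gets twice protA's share
```

      Dividing by length first corrects for longer proteins yielding more peptides, so equal counts on a shorter protein imply higher abundance.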

  • From bgruening:

    • openmg: Open Molecule Generator: an exhaustive generation of chemical structures.

  • From takakoron:

  • From galaxyp:

    • hardklor: Hardklör identifies peptide or protein-like features in mass spectra, deconvolves overlapping ion signals, and can be used on a variety of input formats. https://proteome.gs.washington.edu/software/hardklor/

    • msconvert_nix: Convert and/or filter mass spectrometry files on Linux or Mac OS X with ProteoWizard msconvert. This build does not contain vendor proprietary libraries, so it can only be used on publicly available file formats. http://proteowizard.sourceforge.net

    • asms_tutorial_2016: ASMS 2016 tutorial workflows, including a workflow from a PeptideShaker PSM report to novel peptides.

    • morpheus: Morpheus MS search application. The Morpheus database search algorithm for high-resolution tandem mass spectra automatically includes known PTMs (post-translational modifications) when a UniProt proteome is used as the search database. https://github.com/cwenger/Morpheus/

    • proteogenomics_splice_junc_search_db: Generate novel splice junction search databases from RNA-seq. These workflows generate a protein FASTA file and a BED file for novel splice junctions identified from RNA-seq data.

    • idconvert: Convert mass spectrometry identification files (mzIdentML, protXML, pepXML) on Linux or Mac OS X with ProteoWizard idconvert. http://proteowizard.sourceforge.net

    • msconvert_win: Convert and/or filter mass spectrometry files, including vendor formats, on Windows with ProteoWizard msconvert. This version can use vendor proprietary libraries for vendor file formats; a Windows administrator needs to install ProteoWizard and the vendor DLLs. http://proteowizard.sourceforge.net

  • From mandorodriguez:

    • endsid_gene_name_append: Adds the gene name to each row identified by an Ensembl ID, using a mapping file. Takes a file with gene expression data and a file mapping Ensembl gene IDs to gene names, and adds the gene name to the row with the matching Ensembl ID.
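
      The join described above can be sketched in Python, assuming the Ensembl gene ID is the first tab-separated column of both files and that the name is appended as a new last column. The function name, IDs and gene names in the example are hypothetical placeholders.

```python
def append_gene_names(expression_lines, mapping_lines, sep="\t"):
    """Append the mapped gene name to each expression row.

    Assumes the Ensembl gene ID is the first column of both inputs; rows
    without a mapping get an empty name column.
    """
    id_to_name = {}
    for line in mapping_lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) >= 2:
            id_to_name[fields[0]] = fields[1]
    out = []
    for line in expression_lines:
        fields = line.rstrip("\n").split(sep)
        fields.append(id_to_name.get(fields[0], ""))
        out.append(sep.join(fields))
    return out


expression = ["ENSG_EXAMPLE_1\t5.2", "ENSG_EXAMPLE_2\t0.3"]
mapping = ["ENSG_EXAMPLE_1\tGENE1"]
print(append_gene_names(expression, mapping))
```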

  • From tiagoantao:

  • From aafc-mbb:

    • kurator: A wrapper tool to invoke the Kurator package.

  • From iuc:

  • From ulfschaefer:

    • phephenix: Public Health England SNP calling Pipeline

  • From iarc:

    • mutspec: Mutation spectra analysis tool suite.

  • From drosofff:

    • metavisitor_workflows: A collection of workflows using the Metavisitor tools.

  • From urgi-team:

    • workflow_teiso: TEiso RNA_seq

    • teiso: TEiso is a Python script that finds the distance between transposable elements and the TSSs of isoforms.

  • From brenninc:

    • data_manager_gene_transfer_by_path: Copies a path to a gene_transfer Data Table. Checks that the file exists, but not that it is in the correct format.

    • data_manager_tagdust_architecture: Sets up architecture files to be used by TagDust. Creates the architecture file and adds it to the tagdust_architecture Data Table.

    • sync_paired_end_reads: Synchronise paired-end reads, based on https://github.com/mmendez12/sync_paired_end_reads. sync_paired_end_reads is a Python tool to synchronise paired-end reads when reads1 or reads2 were modified. When working with paired-end sequencing data, it is common to filter out reads that do not pass basic quality controls, which leaves pairs that are no longer in sync. This tool streams reads1 and searches for the associated read2 in reads2. It also synchronises the sequence identifiers of the reads, so if software modified the identifiers in reads1, the same identifiers are used for reads2. Finally, it replaces all space characters with an arbitrary pattern ('_'). The tool was mainly developed to process the output of tagdust2 when run in single-end mode, which appends the UMIs found in the raw sequences to the sequence identifier. Source: https://github.com/mmendez12/sync_paired_end_reads
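
      The synchronisation steps above can be sketched in Python on in-memory reads. This is not the upstream code: each read is assumed to be an (identifier, sequence, quality) tuple, reads2 is indexed in a dict rather than searched in the file, and the pairing key (first whitespace-separated token without a trailing /1 or /2) is a hypothetical choice.

```python
def read_key(identifier):
    """Hypothetical pairing key: first token, minus a trailing /1 or /2."""
    token = identifier.split()[0]
    if token.endswith(("/1", "/2")):
        token = token[:-2]
    return token


def sync_pairs(reads1, reads2, space_replacement="_"):
    """Yield synchronised (read1, read2) records.

    Reads without a mate in the other input are dropped; read2 takes
    read1's identifier, with spaces replaced by `space_replacement`.
    """
    index2 = {read_key(ident): (ident, seq, qual) for ident, seq, qual in reads2}
    for ident1, seq1, qual1 in reads1:  # stream reads1
        mate = index2.get(read_key(ident1))
        if mate is None:
            continue  # the mate was filtered out upstream
        new_id = ident1.replace(" ", space_replacement)
        yield (new_id, seq1, qual1), (new_id, mate[1], mate[2])


reads1 = [("r1 umi:AC", "AAA", "III"), ("r3", "CCC", "III")]
reads2 = [("r1", "TTT", "III"), ("r2", "GGG", "III"), ("r3", "GGG", "JJJ")]
for pair in sync_pairs(reads1, reads2):
    print(pair)
```

      Indexing reads2 trades memory for simplicity; a streaming implementation over sorted files would avoid holding all of reads2 at once.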

    • samtools_flag_filter_1_2: Use samtools view to filter by flag and sort the resulting BAM file. Runs samtools view with the -f and -F options to allow filtering on the presence or absence of bits in the FLAG column, then sorts the resulting BAM file in the order required by Galaxy.
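
      The -f/-F semantics mentioned above can be illustrated on a single FLAG value. This Python sketch mirrors how samtools view interprets the two options (-f requires all given bits, -F rejects any given bit); it is not the wrapper's code, the sorting step is not shown, and the helper name passes_flag_filter is hypothetical.

```python
def passes_flag_filter(flag, require=0, exclude=0):
    """Mimic samtools view -f/-F on one FLAG value.

    -f INT keeps a read only if all bits of INT are set in FLAG;
    -F INT drops a read if any bit of INT is set in FLAG.
    """
    return (flag & require) == require and (flag & exclude) == 0


PROPER_PAIR, UNMAPPED, MATE_REVERSE = 0x2, 0x4, 0x20
# FLAG 99 = paired + proper pair + mate reverse + first in pair
print(passes_flag_filter(99, require=PROPER_PAIR, exclude=UNMAPPED))  # True
print(passes_flag_filter(99, exclude=MATE_REVERSE))                   # False
```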

    • umicount: Remove and count PCR duplicates using https://github.com/mmendez12/umicount. Removes and counts PCR duplicates from paired-end libraries prepared with unique molecular identifiers (UMIs). "umicount is a collection of Python scripts which allows to remove and count PCR duplicates from paired-end libraries prepared with unique molecular identifiers (UMIs). The main difference between existing approaches (rmdup or MarkDuplicates) is that it uses UMI and Transcription Start Site information to remove duplicates rather than the reads size. It was mainly developed for single-cell CAGE and single-cell nanoCAGE protocols where a tagmentation step is performed between two PCRs." Source: https://github.com/mmendez12/umicount
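
      A minimal sketch of the idea behind umicount: read pairs sharing the same transcription start site and UMI are treated as PCR copies of one molecule. The tuple layout (chrom, strand, tss, umi) and the function name are hypothetical, and extracting the TSS and UMI from BAM records is not shown.

```python
from collections import Counter


def collapse_umi_duplicates(fragments):
    """Collapse read pairs that share chromosome, strand, TSS and UMI.

    Returns a Counter of unique molecules (value = PCR copies seen) and
    the number of duplicate read pairs removed.
    """
    molecules = Counter(fragments)
    removed = sum(molecules.values()) - len(molecules)
    return molecules, removed


pairs = [
    ("chr1", "+", 1000, "ACGT"),
    ("chr1", "+", 1000, "ACGT"),  # PCR copy of the molecule above
    ("chr1", "+", 1000, "TTAG"),  # same TSS, different UMI: kept
    ("chr1", "-", 1000, "ACGT"),  # same UMI, other strand: kept
]
molecules, removed = collapse_umi_duplicates(pairs)
print(len(molecules), removed)  # 3 1
```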

    • subread_featurecounts1_5_0_p1: Runs the featureCounts tool from http://subread.sourceforge.net/. featureCounts is an efficient general-purpose program for assigning sequence reads to genomic features.

    • tagdust_2_31: Runs tools based on http://sourceforge.net/projects/tagdust. "TagDust allows users to specify the expected architecture of a read and converts it into a hidden Markov model. The latter can assign sequences to a particular barcode (or index) even in the presence of sequencing errors. Sequences not matching the architecture (primer dimers, contaminants etc.) are automatically discarded." Source: http://sourceforge.net/projects/tagdust. Both single-end and paired-end modes are available. Depends on: https://testtoolshed.g2.bx.psu.edu/view/brenninc/data_manager_tagdust_architecture

    • data_manager_for_directory_data: Loads a Directory and Extension Pair into the directory_data Data Table.

    • pairedbamtobed12: Converts BAM files to BED12 using https://github.com/Population-Transcriptomics/pairedBamToBed12. pairedBamToBed12 converts properly paired BAM alignments to BED12 format. Typical proper pairs are represented by a two-block BED12 entry. Additional blocks are produced when an alignment contains a long deletion (CIGAR N operation). Thickness indicates the first read of the pair. The BAM input file must be grouped/sorted by query name (not alignment position). Source: https://github.com/Population-Transcriptomics/pairedBamToBed12
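
      The two-block layout described above can be sketched for the simple case of two mates without internal gaps. This is a hypothetical helper, not pairedBamToBed12's code: spans are assumed to be 0-based, half-open (start, end) pairs, overlapping mates are merged into one block, score and itemRgb are placeholder zeros, and the extra blocks produced by CIGAR N operations are not handled.

```python
def pair_to_bed12(chrom, name, strand, mate1, mate2):
    """Build one BED12 line for a properly paired fragment.

    mate1/mate2 are (start, end) spans of the first and second read;
    thickStart/thickEnd mark the first read.
    """
    blocks = sorted([mate1, mate2])
    merged = [list(blocks[0])]
    for start, end in blocks[1:]:
        if start <= merged[-1][1]:  # mates overlap: extend the block
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    chrom_start, chrom_end = merged[0][0], merged[-1][1]
    sizes = ",".join(str(e - s) for s, e in merged)
    starts = ",".join(str(s - chrom_start) for s, _ in merged)
    fields = [chrom, chrom_start, chrom_end, name, 0, strand,
              mate1[0], mate1[1], 0, len(merged), sizes, starts]
    return "\t".join(str(f) for f in fields)


print(pair_to_bed12("chr1", "frag1", "+", (100, 150), (300, 350)))
```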

    • preconfigured_directory_reader: Reads files from a preconfigured directory on the server into a Data Collection. Loads all files from a server directory into a Data Collection and also provides a text file with the original names. Depends on the directory_data Data Table; see https://testtoolshed.g2.bx.psu.edu/view/brenninc/data_manager_for_directory_data. Only preconfigured combinations of path and extension work. Files will have their extension changed to one expected by Galaxy and can be decompressed as set in the Data Table.

    • directory_reader_limited_by_data_table: Reads files into a Data Collection from a preconfigured directory on the server. Loads all specified files from a server directory into a Data Collection and also provides a text file with the original names. Depends on the directory_data Data Table; see https://toolshed.g2.bx.psu.edu/view/brenninc/data_manager_for_directory_data. Only preconfigured combinations of path and extension work. Files will have their extension changed to one expected by Galaxy and can be decompressed as set in the Data Table. Users can limit the files by prefix (start of the name) and postfix (the last part of the name before the extension).
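
      The prefix/postfix selection can be sketched as a filter over file names. This assumes, hypothetically, that the configured extension includes its leading dot and that matching is case-sensitive; select_files is an illustrative name, not part of the tool.

```python
def select_files(names, extension, prefix="", postfix=""):
    """Keep names with `extension` whose stem starts with `prefix` and
    ends with `postfix` (the part just before the extension)."""
    selected = []
    for name in names:
        if not name.endswith(extension):
            continue
        stem = name[: -len(extension)] if extension else name
        if stem.startswith(prefix) and stem.endswith(postfix):
            selected.append(name)
    return sorted(selected)


files = ["s1_R1.fastq.gz", "s1_R2.fastq.gz", "s2_R1.fastq.gz", "notes.txt"]
print(select_files(files, ".fastq.gz", postfix="_R1"))
```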

    • data_manager_all_fasta_path: Creates a link in the all_fasta data table to a FASTA file on the server. Checks that the file exists, but not that it is a readable FASTA file.

    • package_tagdust_2_31: Installs TagDust version 2_31 from http://sourceforge.net/projects/tagdust, with a very minor correction. "TagDust allows users to specify the expected architecture of a read and converts it into a hidden Markov model. The latter can assign sequences to a particular barcode (or index) even in the presence of sequencing errors. Sequences not matching the architecture (primer dimers, contaminants etc.) are automatically discarded." Source: http://tagdust.sourceforge.net/

    • bedtools_bedtobam: BedToBam from bedtools, followed by samtools sort. Runs BedToBam and then sorts the output, as Galaxy no longer accepts unsorted BAM files. Based on https://toolshed.g2.bx.psu.edu/view/iuc/bedtools/f8b7dc21b4ee

  • From epigenome:

    • history_summary: Summarize the current history contents and rename output files. HistorySummary generates an HTML summary of the current history contents and renames the output files.

  • From nturaga:

    • minfi_tools: A suite of Galaxy tools for minfi: analyze Illumina's 450K methylation arrays. Minfi package version 1.16.0.

    • minfi_analyze_tcga: Wrapper for the minfi tool: Minfi analysis pipeline. Minfi package version 1.16.0, for analyzing Illumina human methylation 450K arrays.

    • minfi_pipeline: Wrapper for the minfi tool: Minfi pipeline. Minfi package version 1.16.0, for analyzing Illumina human methylation 450K arrays.

  • From portiahollyoak:

    • pindel2vcf: Pindel2Vcf converts Pindel output files to VCF format.

    • temp: TEMP is a software package for detecting transposable element (TE) insertions and absences from pooled high-throughput sequencing data.

    • change_fasta_header_using_tabular_file: Change FASTA headers using a tabular file. This tool takes two input files: a tabular file with the text to replace in the first column and the replacement text in the second column, and a FASTA file. Every occurrence of a value from the first column of the tabular file is replaced with the value in the second column.
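
      The replacement can be sketched in Python, assuming (as the tool name suggests) that only header lines starting with ">" are edited and that the tabular file is tab-separated. Both function names are hypothetical.

```python
def load_replacements(tabular_lines, sep="\t"):
    """Map column 1 (text to replace) to column 2 (replacement text)."""
    mapping = {}
    for line in tabular_lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def replace_in_headers(fasta_lines, replacements):
    """Replace every occurrence of each key in FASTA header lines only."""
    out = []
    for line in fasta_lines:
        if line.startswith(">"):
            for old, new in replacements.items():
                line = line.replace(old, new)
        out.append(line)  # sequence lines pass through unchanged
    return out


table = load_replacements(["seq1\tgeneA", "seq2\tgeneB"])
fasta = [">seq1 sample", "ACGTseq1", ">seq2", "TTGA"]
print(replace_in_headers(fasta, table))
```

      Because this is plain substring replacement, a key such as "seq1" would also match inside ">seq10"; a stricter variant would match whole header tokens instead.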

    • genbank_to_fasta: This tool converts a multi-GenBank file into a multi-FASTA file.

    • pindel: Pindel detects genome-wide structural variation. Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.

    • breakdancer_max: BreakDancer provides genome-wide detection of structural variation. BreakDancer (previously BreakDancerMax) provides genome-wide detection of five types of structural variants: insertions, deletions, inversions, and inter- and intra-chromosomal translocations, from next-generation short paired-end sequencing reads, using read pairs that are mapped with unexpected separation distances or orientations.

    • gff_feature_colours: Adds assigned feature colours. Takes the unique features in the 3rd column of a GFF file, lets the user assign a colour to each, and then adds the colour to the 9th column for visualisation in a genome browser.
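
      A sketch of the two steps described above, assuming tab-separated GFF lines. The tool's actual attribute name and colour format are not given here, so the "colour=" key and the "R G B" value in the example are hypothetical placeholders, as are the function names.

```python
def unique_feature_types(gff_lines):
    """Distinct column-3 feature types, in order of first appearance."""
    seen = []
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue
        if cols[2] not in seen:
            seen.append(cols[2])
    return seen


def add_feature_colours(gff_lines, colours, key="colour"):
    """Append key=<colour> to column 9 of features whose type has a colour."""
    out = []
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] not in colours:
            out.append(line.rstrip("\n"))
            continue
        attrs = cols[8].rstrip(";")
        entry = f"{key}={colours[cols[2]]}"
        cols[8] = entry if attrs in ("", ".") else f"{attrs};{entry}"
        out.append("\t".join(cols))
    return out


gff = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1",
    "chr1\tsrc\texon\t1\t50\t.\t+\t.\tParent=g1",
]
print(unique_feature_types(gff))  # ['gene', 'exon']
for line in add_feature_colours(gff, {"gene": "255 0 0"}):
    print(line)
```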

  • From dereeper:

    • snmf: Fast and efficient program for estimating individual admixture coefficients based on sparse non-negative matrix factorization and population genetics.

  • From scottx611x:

  • From marie-tremblay-metatoul:

  • From george-weingart:

    • metaphlan2_hutlab: MetaPhlAn2 (Huttenhower Lab): computational tool for profiling the composition of microbial communities. MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data with species-level resolution. From version 2.0, MetaPhlAn is also able to identify specific strains (in the not-so-frequent cases in which the sample contains a previously sequenced strain) and to track strains across samples for all species.

tool_dependency_definition

  • From brenninc:

    • package_subread_1_5_0_p1: Installs the Subread software package from https://sourceforge.net/projects/subread/files/subread-1.5.0-p1. "The Subread software package is a tool kit for processing next-gen sequencing data. It includes Subread aligner, Subjunc exon-exon junction detector and featureCounts read summarization program. Subread aligner can be used to align both gDNA-seq and RNA-seq reads. Subjunc aligner was specifically designed for the detection of exon-exon junction. For the mapping of RNA-seq reads, Subread performs local alignments and Subjunc performs global alignments. Subread and Subjunc were published in the following paper: Yang Liao, Gordon K Smyth and Wei Shi. 'The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote', Nucleic Acids Research, 2013, 41(10):e108" Source: https://sourceforge.net/projects/subread/

    • package_pairedbamtobed12: pairedBamToBed12 converts properly paired BAM alignments to BED12 format. Typical proper pairs are represented by a two-block BED12 entry. Additional blocks are produced when an alignment contains a long deletion (CIGAR N operation). Thickness indicates the first read of the pair. The BAM input file must be grouped/sorted by query name (not alignment position). Source: https://github.com/Population-Transcriptomics/pairedBamToBed12

  • From aafc-mbb:

  • From iuc:

    • package_python_2_7_fisher_0_1_4: Contains a tool dependency definition that downloads and compiles version 0.1.4 of the Python fisher package.

    • package_python_2_7_xlsxwriter_0_8_5: Contains a tool dependency definition that downloads and compiles version 0.8.5 of the Python XlsxWriter package.

    • package_python_2_7_six_1_10_0: Contains a tool dependency definition that downloads and compiles version 1.10.0 of the Python six package.

    • package_python_2_7_wget_3_2: Contains a tool dependency definition that downloads and compiles version 3.2 of the Python wget package.

  • From ulfschaefer:

  • From iarc:

    • package_r_mutspec_0_1: Contains a tool dependency definition for mutspec.

  • From wolma:

    • package_python_3_4_x_lean: A lean build of Python 3.4.x. This package receives bug fixes within the 3.4 release series. Currently, it contains the zlib and sqlite3 modules as the only stdlib modules with external dependencies (handled here by depending on package_zlib_1_2_8 and package_sqlite_3_8_3). In particular, this build does not compile Python's ssl module (which would add a dependency on OpenSSL and, in turn, on Perl). This means that the pip installation tool will NOT be available with this build. For a full build (including the ssl module), see https://toolshed.g2.bx.psu.edu/view/iuc/package_python_3_4/ issued by the IUC.

  • From vlefort:

    • package_phyml_3_1: PhyML 3.0: new algorithms, methods and utilities to estimate maximum-likelihood phylogenies

  • From dereeper: