Subread package: high-performance read alignment, quantification and mutation discovery

The Subread package comprises a suite of software programs for processing next-generation sequencing read data, including:

  • Subread: an accurate and efficient aligner for mapping both genomic DNA-seq reads and RNA-seq reads (for the purpose of expression analysis).
  • Subjunc: an RNA-seq aligner suitable for all purposes of RNA-seq analyses.
  • featureCounts: a highly efficient and accurate read summarization program.
  • exactSNP: a SNP caller that discovers SNPs by testing signals against local background noise.

These programs are also implemented in the Bioconductor R package Rsubread.

CHANGELOG AND NEWS

Release 1.5.0-p3, 27 May 2016
  • Fixed a bug associated with '--allJunctions' option in Subjunc aligner.
  • Improved the efficiency of the exactSNP program when calling SNPs from data with very high sequencing depth (>1000x).
  • Resolved an issue with concurrently opening a large number of files in featureCounts.
  • Improved processing of 'H' operations in CIGAR strings in featureCounts.
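
For readers unfamiliar with why 'H' operations need special care: in the SAM format, hard-clipped bases are absent from the stored read sequence and do not advance along the reference either. The Python sketch below illustrates that rule only; it is not featureCounts code, and the function name `cigar_lengths` is hypothetical. It assumes a well-formed CIGAR string.

```python
import re

# Operations that advance along the read sequence stored in the record.
CONSUMES_QUERY = set("MIS=X")
# Operations that advance along the reference.
CONSUMES_REFERENCE = set("MDN=X")

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_lengths(cigar):
    """Return (stored_read_length, reference_span, hard_clipped_bases).

    Hypothetical helper; assumes the CIGAR string is well formed.
    """
    query = reference = hard_clipped = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in CONSUMES_QUERY:
            query += n
        if op in CONSUMES_REFERENCE:
            reference += n
        if op == "H":
            # Hard clips consume neither the stored read nor the reference.
            hard_clipped += n
    return query, reference, hard_clipped


print(cigar_lengths("5H20M3I10M2D15M8H"))  # (48, 47, 13)
```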

Release 1.5.0-p2, 14 Apr 2016
  • featureCounts
    • Fixed a bug in processing long header lines in SAM/BAM files.
    • Deprecated the '-S <ff:fr:rf>' option.
    • The '<input file>.featureCounts' files (generated when '-R' is specified) are now saved to the same directory as the file <output_file>.
  • subread-align and subjunc
    • Fixed a bug related to reporting of reads mapping out of the chromosomal boundary.
    • Ensured that no zero-length operations (e.g. '0M') are included in reported CIGAR strings.
    • Fixed a bug in soft-clipping read bases.
    • Improved screen output.
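
To make the zero-length CIGAR fix concrete, here is a minimal Python sketch of what such a clean-up can look like: drop operations of length zero and merge neighbours that end up with the same operation code. This is an illustration under assumptions, not the aligners' actual code; `normalize_cigar` is a hypothetical name and the input is assumed to be well formed.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def normalize_cigar(cigar):
    """Drop zero-length operations and merge adjacent identical ones.

    Hypothetical helper; assumes the CIGAR string is well formed.
    """
    merged = []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if n == 0:
            continue  # e.g. '0M' carries no information
        if merged and merged[-1][1] == op:
            merged[-1][0] += n  # '50M' + '25M' becomes '75M'
        else:
            merged.append([n, op])
    return "".join(f"{n}{op}" for n, op in merged)


print(normalize_cigar("0S50M0I25M"))  # 75M
print(normalize_cigar("10M0D2I5M"))   # 10M2I5M
```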

Release 1.5.0-p1, 18 Dec 2015
  • featureCounts can process long reads (up to 250kb long). It can also process reads that contain long extra fields.
  • featureCounts can report counts for exon-exon junctions ('-J' option).
  • Improved parsing of gzipped fastq files in Subread and Subjunc aligners.
  • Bug fixes.
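
As a rough illustration of what counting exon-exon junctions involves, the Python sketch below extracts junctions from the 'N' (skipped region) operations of spliced alignments and tallies them. It is not the featureCounts implementation: the function name `junctions`, the toy alignments, and the choice to report the last base of the upstream exon and the first base of the downstream exon (1-based) are all assumptions made for this example.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def junctions(chrom, pos, cigar):
    """Yield (chrom, last_exon_base, first_base_of_next_exon), 1-based.

    Hypothetical helper; `pos` is the 1-based leftmost mapped position.
    """
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            # The skipped region spans ref .. ref + n - 1.
            yield (chrom, ref - 1, ref + n)
        if op in "MDN=X":
            ref += n  # only these operations advance along the reference


# Toy alignments: (chromosome, leftmost position, CIGAR).
alignments = [
    ("chr1", 100, "50M200N25M"),
    ("chr1", 120, "30M200N45M"),
    ("chr1", 300, "75M"),
]
counts = Counter(j for aln in alignments for j in junctions(*aln))
print(counts)  # Counter({('chr1', 149, 350): 2})
```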

Release 1.5.0, 29 Oct 2015
  • featureCounts
    • Improved speed of re-sorting paired-end reads. It now takes only about 30 seconds to re-sort 30 million read pairs.
    • A utility program 'repair' is provided to allow pre-sorting of paired-end BAM/SAM files.
  • subread-align and subjunc
    • New parameter '--type' for sequencing-type-specific mapping optimization.
    • New parameter '--sv' for detection of structural variant breakpoints.
    • New parameter '--complexIndels' for detection of complex indels.
    • Improved mapping of paired-end reads via a new formula that uses weighted votes (more weight is given to properly mapped reads).
    • Improved detection of multi-mapping reads by considering locations that receive second highest votes.
    • When gzipped fastq input is provided, reads will be directly extracted from the gzip-compressed file and no temporary files will be generated.
    • Default output format is set to BAM.
  • subread-buildindex
    • Default threshold for removing un-informative subreads from index is changed to 100 to allow more candidate mapping locations to be considered.
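
The re-sorting of paired-end reads mentioned above amounts to placing the two mates of each pair next to each other. The Python sketch below shows one simple in-memory way to do this by read name; it is only a conceptual illustration and not how featureCounts or 'repair' are implemented. The function name `pair_mates` and the toy records are hypothetical.

```python
def pair_mates(records):
    """Return records reordered so that mates are adjacent.

    `records` is an iterable of (read_name, payload) tuples in any order.
    Reads whose mate never appears are appended at the end.
    Hypothetical helper; holds unpaired reads in memory.
    """
    pending = {}  # read name -> payload of the mate seen so far
    output = []
    for name, payload in records:
        if name in pending:
            # Second mate found: emit the pair together.
            output.append((name, pending.pop(name)))
            output.append((name, payload))
        else:
            pending[name] = payload
    # Singletons keep the order in which they were first seen.
    output.extend(pending.items())
    return output


records = [
    ("r1", "r1/1"), ("r2", "r2/1"), ("r1", "r1/2"),
    ("r3", "r3/1"), ("r2", "r2/2"),
]
for name, payload in pair_mates(records):
    print(name, payload)
```

Pairs are emitted in the order in which they are completed, and the unpaired read r3 comes last.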

  • ChangeLog history

DOWNLOAD AND INSTALLATION

  • Latest version 1.5.0-p3
  • All the versions
  • Installation instructions

MAILING LISTS

  • Subread Users Group

TUTORIALS AND USERS GUIDE

  • A short tutorial on Subread
  • A short tutorial on Subjunc
  • A short tutorial on featureCounts
  • A short tutorial on exactSNP
  • A case study for analyzing RNA-seq data
  • Users Guide

PUBLICATIONS

  • Liao Y, Smyth GK and Shi W. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Research, 41(10):e108, 2013
  • Liao Y, Smyth GK and Shi W. featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features. Bioinformatics, 30(7):923-30, 2014

SCIENTIFIC PUBLICATIONS CITING OUR METHODS

  • Publications that cite Subread/Subjunc
  • Publications that cite featureCounts

RESOURCES

  • Read count tables for Pickrell dataset and Montgomery dataset, both published in Nature in 2010.
  • Read count table for TCGA data.
  • Read count data from SEQC/MAQC III study.

LINKS

  • Bioconductor R package Rsubread
  • Bioconductor R package seqc
  • WEHI Bioinformatics

CONTACT

    Dr. Wei Shi (shi at wehi dot edu dot au) or
    Dr. Yang Liao (liao at wehi dot edu dot au)