Subread package: high-performance read alignment, quantification and mutation discovery

The Subread package comprises a suite of software programs for processing next-generation sequencing read data, including:

  • Subread: an accurate and efficient aligner for mapping both genomic DNA-seq reads and RNA-seq reads (for the purpose of expression analysis).
  • Subjunc: an RNA-seq aligner suitable for all purposes of RNA-seq analyses.
  • featureCounts: a highly efficient and accurate read summarization program.
  • exactSNP: a SNP caller that discovers SNPs by testing signals against local background noise.

These programs are also implemented in the Bioconductor R package Rsubread.

CHANGELOG AND NEWS

Release 1.4.4, 20 March 2014

  • Improved featureCounts in processing GTF/GFF format annotation files.
  • Breakpoint locations are reported along with the mapping location of each fusion read in SAM/BAM files, using the tags CC (chromosome name), CP (mapping position), CG (CIGAR string) and CT (strand).
  • The strand of each exon splice site is reported by the Subjunc program using the XA tag (e.g. XA:A:+).
  • When paired-end read data are provided, exactSNP counts fragments (i.e. read pairs), instead of reads, at each chromosomal location for SNP calling.
  • A full index (no gap between neighbouring subreads) can be built for a reference genome by the subread-buildindex program to further reduce read mapping time.
  • New utility programs were added, including 'coverageCount' (calculates coverage at each chromosomal location in a highly efficient manner) and 'propmapped' (calculates the proportion of mapped reads in a SAM/BAM file).
  • Bug fixes.
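
As a rough illustration of how the fusion and strand tags listed above might be read back from SAM output, the sketch below splits the optional fields of one alignment line into a dictionary. The example record, its read name and coordinates, and the `Z` type shown for CG and CT are made up for illustration; only the tag names and the XA:A:+ form come from the notes above.

```python
def parse_optional_tags(sam_line):
    """Return {TAG: value} for the optional fields of one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # The first 11 columns of a SAM record are mandatory;
    # optional TAG:TYPE:VALUE fields follow them.
    for field in fields[11:]:
        tag, value_type, value = field.split(":", 2)
        # Integer-typed tags ("i") are converted; everything else stays a string.
        tags[tag] = int(value) if value_type == "i" else value
    return tags


# Hypothetical record: every value below is invented for this example.
example = "\t".join([
    "read1", "0", "chr1", "1000", "40", "50M50S", "*", "0", "0",
    "A" * 100, "I" * 100,
    "CC:Z:chr5", "CP:i:2500", "CG:Z:50S50M", "CT:Z:-", "XA:A:+",
])

tags = parse_optional_tags(example)
print(tags["CC"], tags["CP"], tags["CG"], tags["CT"], tags["XA"])
```

Splitting on the first two colons only keeps values that themselves contain a colon intact.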

Release 1.4.3-p1, 18 December 2013

  • Fixed a bug in featureCounts for processing long header lines in SAM/BAM files.
  • Support for gzipped FASTQ input was added.
  • Improved indel detection.
  • exactSNP can use known SNPs to improve its SNP calling performance (-a option).

Release 1.4.2, 15 November 2013

  • featureCounts outputs summary info after read assignments (giving reasons for those unassigned reads).
  • featureCounts automatically re-orders paired-end reads if reads from the same pair are not adjacent to each other (the -S option is no longer needed). It can also deal with read pairs that have only one end included in SAM/BAM files.
  • Subjunc has improved performance in mapping exon-spanning reads in which junction locations are very close to (1-2bp away from) the ends of reads.
  • Bug fixes.
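
The re-ordering of paired-end reads mentioned above can be pictured with a small sketch. This is not the featureCounts implementation, only one simple way to bring mates together: buffer each read by name until its mate arrives, and emit reads whose mate never appears as single-ended pairs. The function name `pair_up` and the toy records are hypothetical, and the sketch ignores multi-mapping reads.

```python
from collections import OrderedDict


def pair_up(records):
    """Yield (first_seen, mate) tuples from (read_name, record) items.

    Mates may arrive in any order. A read whose mate is absent is
    yielded at the end as (record, None).
    """
    pending = OrderedDict()  # read name -> record still waiting for its mate
    for name, record in records:
        if name in pending:
            # Mate found: emit the two ends together.
            yield pending.pop(name), record
        else:
            pending[name] = record
    # Whatever is left has only one end present in the input.
    for record in pending.values():
        yield record, None


# Toy input: r1 and r2 are interleaved, r3 has only one end.
records = [
    ("r1", "r1/1"), ("r2", "r2/1"), ("r1", "r1/2"),
    ("r3", "r3/1"), ("r2", "r2/2"),
]
pairs = list(pair_up(records))
print(pairs)
```

The buffer grows with the number of unmatched reads, so a real tool working on large files would need a bounded or disk-backed strategy.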

News - featureCounts publication, 14 Nov 2013

    The featureCounts paper has been published in Bioinformatics:
    featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features.

Release 1.4.1, 7 November 2013

  • Release of binary distributions for Linux and Mac OS X operating systems. Both 32-bit and 64-bit machines are supported.
  • featureCounts program automatically detects read input format (SAM or BAM) ('-b' option is no longer required for BAM input).
  • Added option '--reportFusions' to subread-align program to detect fusion events such as chimeras in gDNA-seq data. Discovered fusions will be saved to a file. Detailed mapping results for fusion reads are also saved to the SAM/BAM output. Optional fields in the SAM/BAM file are used to store secondary alignments for each fusion read, along with the primary alignment stored in the main fields. Each fusion read occupies only one row in the SAM/BAM output.
  • Added option '--allJunctions' to subjunc program to detect non-canonical exon-exon junctions (donor/acceptor sites different from GT/AG) and also fusions in RNA-seq data.
  • Bug fixes.
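
To make the canonical versus non-canonical distinction above concrete, here is a minimal sketch that inspects the two bases at each end of an intron. It is an illustration only, not Subjunc's detection logic. The function name, the 0-based end-exclusive coordinate convention and the toy sequences are assumptions made for the example.

```python
def classify_junction(ref_seq, intron_start, intron_end):
    """Classify an intron by its donor/acceptor dinucleotides.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    Returns (label, strand); strand is None for non-canonical junctions.
    """
    intron = ref_seq[intron_start:intron_end].upper()
    if len(intron) < 4:
        raise ValueError("intron too short to have donor and acceptor sites")
    motif = (intron[:2], intron[-2:])
    if motif == ("GT", "AG"):
        return "canonical", "+"
    # A GT/AG intron on the reverse strand reads as CT...AC on the forward strand.
    if motif == ("CT", "AC"):
        return "canonical", "-"
    return "non-canonical", None


# Toy reference: 4 exon bases, a 15-base intron, 4 exon bases.
ref = "AAAC" + "GTAAGTCCTTTTCAG" + "GGTT"
print(classify_junction(ref, 4, 19))
print(classify_junction("GCAAGTAG", 0, 8))
```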

Major release 1.4.0, 10 October 2013

  • Added a number of new features to the featureCounts read summarization function, including reordering of reads in BAM files so that reads from the same pair are adjacent to each other, support for chromosome aliases, and output of complete annotation data for counting results from meta-feature level summarization.
  • The Users Guide now describes in more detail how the featureCounts program summarizes reads.
  • Improved short indel detection for both Subread and Subjunc aligners. This was achieved by building a consensus indel table and by realigning the reads. Discovered indels are reported in the output in addition to the read mapping results.
  • Support for detection of long indels (up to 200bp) was added in Subread. When the specified value of '-I' option is greater than 16, Subread will automatically perform read assembly to detect long insertions and deletions.
  • Subread and Subjunc can now take FASTQ/FASTA, SAM and BAM files as input and output mapping results in both SAM and BAM formats.
  • Subjunc now directly operates on raw read data (it previously took Subread output as input), thus reducing running time by nearly half.
  • Subjunc can be instructed to output uniquely mapped reads. Hamming distance and mapping quality scores can be used to break ties when more than one best location was found.
  • More options were added to exactSNP program. Its documentation was also greatly improved.
  • A number of bug fixes.
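
The idea of breaking ties with Hamming distance, mentioned above for uniquely mapped reads, can be sketched as follows. This is a simplified illustration and not the aligner's actual procedure: it uses Hamming distance only (mapping quality is left out), and the function names, reference string and candidate positions are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pick_unique_location(read, reference, candidates):
    """Return the candidate start with the fewest mismatches, or None on a tie.

    `candidates` holds 0-based start positions that tied on some earlier
    criterion; each must leave room for the full read in `reference`.
    """
    scored = sorted(
        (hamming(read, reference[pos:pos + len(read)]), pos)
        for pos in candidates
    )
    if not scored:
        return None
    # Still ambiguous if the two best candidates have the same distance.
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None
    return scored[0][1]


reference = "ACGTACGTTTACGAACGT"
read = "ACGA"
print(pick_unique_location(read, reference, [0, 4, 10]))
print(pick_unique_location(read, reference, [0, 4]))
```

In the first call position 10 matches the read exactly, so it wins; in the second call both candidates have one mismatch and the read is reported as not uniquely mapped.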


Download and Installation

  • Latest version v1.4.4
  • All the versions
  • Installation instructions

Mailing lists

  • Subread Users Group

Tutorials and Users Guide

  • A short tutorial on Subread
  • A short tutorial on Subjunc
  • A short tutorial on featureCounts
  • A short tutorial on exactSNP
  • A case study for analyzing RNA-seq data
  • Users Guide

Publications

    Liao Y, Smyth GK and Shi W. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Research, 41(10):e108, 2013

    Liao Y, Smyth GK and Shi W. featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features. Bioinformatics, 30(7):923-30, 2014

    Scientific publications citing Subread

Resources

  • Read count tables for published datasets
  • Prebuilt indexes: GRCh37/hg19 [5.7GB], mm9 [5.4GB], mm10 [5.4GB], rn5 [5.5GB]

Links

  • Rsubread: a Bioconductor R package
  • WEHI Bioinformatics

Contact

    Wei Shi (shi at wehi dot edu dot au)

    Yang Liao (liao at wehi dot edu dot au)