Subread package: high-performance read alignment, quantification and mutation discovery

The Subread package comprises a suite of software programs for processing next-generation sequencing read data, including:

  • Subread: a general-purpose read aligner which can align both genomic DNA-seq and RNA-seq reads. It can also be used to discover genomic mutations including short indels and structural variants.
  • Subjunc: a read aligner developed for aligning RNA-seq reads and for the detection of exon-exon junctions. Gene fusion events can be detected as well.
  • featureCounts: a software program developed for counting reads to genomic features such as genes, exons, promoters and genomic bins.
  • exactSNP: a SNP caller that discovers SNPs by testing signals against local background noise.

These programs are also implemented in the Bioconductor R package Rsubread.


Release 1.5.2, 17 March 2017
  • subread-align and subjunc
    • Gene annotation can be provided to aligners to improve exon junction detection and read mapping for RNA-seq data.
    • Resolved an inconsistency between runs when more than one CPU thread is used. The inconsistency was caused by an excessive number of indels/junctions occurring at the same base position (rare).
    • Improved sanity checking for input and output data (e.g. check if disk is full).
  • featureCounts
    • Allow BAM/SAM input from STDIN (e.g. via pipes).
    • Improved sanity checking for input and output data (e.g. check if input BAM file is corrupted).
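To illustrate what a sanity check on BAM input might involve, here is a minimal Python sketch. It is not the check featureCounts performs; it only tests two properties described in the SAM/BAM specification: the decompressed stream starts with the magic bytes "BAM\1", and the file ends with the 28-byte empty BGZF block that marks a complete file. The function name `looks_like_complete_bam` is hypothetical.

```python
import gzip
import io
import zlib

# 28-byte empty BGZF block that the SAM/BAM specification places at the
# end of a complete BAM file.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 "
    "02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def looks_like_complete_bam(data: bytes) -> bool:
    """Return True if `data` starts with the BAM magic and ends with the EOF block.

    Hypothetical helper for illustration only; a real check would also
    validate the header and the individual alignment records.
    """
    if not data.endswith(BGZF_EOF):
        return False  # probably truncated, e.g. the disk filled up while writing
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as handle:
            magic = handle.read(4)
    except (OSError, EOFError, zlib.error):
        return False  # not gzip/BGZF-compressed data
    return magic == b"BAM\x01"
```

The same function could be applied to bytes read from `sys.stdin.buffer` when the input arrives through a pipe, although buffering a whole BAM file in memory is only reasonable for small inputs.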

Release 1.5.1, 25 Aug 2016
  • featureCounts
    • New parameter '--fracOverlap' that specifies the minimum fraction of overlapping read bases required for assigning a read to a feature.
    • New parameter '--tmpDir' that specifies the directory where intermediate files will be saved to.
    • The '--fraction' option can now be used to produce fractional counts for both multi-mapping reads and multi-overlapping reads (reads that overlap with more than one feature).
    • RefSeq gene annotation for hg38 is added to the package (overlapping exons from the same gene are merged into one exon).
    • Sanity check for input BAM files.
    • featureCounts output reports the original exon coordinates (as provided in the annotation) without merging them.
    • Bug fixes.
  • subread-align and subjunc
    • A new method for reporting mapping quality score (MQS) that takes into account top candidate mapping locations and also number of mismatches.
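The arithmetic behind '--fracOverlap' and '--fraction' in this release can be sketched in Python as follows. This is an illustration under stated assumptions, not the featureCounts implementation. It assumes a read is a single ungapped interval with 1-based inclusive coordinates, and that a fractional count is split as 1/(x*y), where x is the number of reported alignments for the read and y is the number of features it overlaps. The function names and the `n_alignments` parameter are hypothetical; check the Users Guide for the exact rules.

```python
from fractions import Fraction


def overlap_fraction(read_start, read_end, feat_start, feat_end):
    """Fraction of read bases that overlap a feature (1-based, inclusive)."""
    overlap = min(read_end, feat_end) - max(read_start, feat_start) + 1
    read_len = read_end - read_start + 1
    return Fraction(max(overlap, 0), read_len)


def fractional_counts(read, features, min_frac=Fraction(0), n_alignments=1):
    """Split one alignment's count across the features it overlaps.

    read:         (start, end) of the aligned read
    features:     {name: (start, end)}
    min_frac:     minimum overlapping fraction, in the spirit of '--fracOverlap'
    n_alignments: assumed number of reported alignments for this read
    """
    hits = []
    for name, (start, end) in features.items():
        frac = overlap_fraction(read[0], read[1], start, end)
        if frac > 0 and frac >= min_frac:
            hits.append(name)
    if not hits:
        return {}
    # Assumed rule: each alignment carries 1/x, shared equally by y features.
    share = Fraction(1, n_alignments * len(hits))
    return {name: share for name in hits}
```

For example, a 50-base read at 141-190 lies entirely inside a feature at 100-199 and overlaps a feature at 150-249 by 41 bases, so with no threshold each feature receives 1/2, while a threshold of 0.9 leaves only the first feature.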

Release 1.5.0-p3, 27 May 2016
  • Fixed a bug associated with '--allJunctions' option in Subjunc aligner.
  • Improved the efficiency of exactSNP program when calling SNPs from the data that have a very high sequencing depth (>1000x).
  • Resolved an issue of concurrently opening a large number of files in featureCounts.
  • Improved processing of 'H' operations in CIGAR strings in featureCounts.

Release 1.5.0-p2, 14 Apr 2016
  • featureCounts
    • Fix a bug in processing long header lines in SAM/BAM files.
    • Deprecated the '-S < ff:fr:rf >' option.
    • The '< input file >.featureCounts' files (generated when '-R' is specified) are saved to the same directory as the file < output_file >.
  • subread-align and subjunc
    • Fixed a bug related to reporting of reads mapping out of the chromosomal boundary.
    • Make sure no zero-length operations (e.g. '0M') are included in reported CIGAR strings.
    • Fixed a bug in soft-clipping read bases.
    • Improved screen output.
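As an illustration of what removing zero-length operations from a CIGAR string means, here is a small Python sketch. It is not the aligners' code. Merging adjacent operations of the same type after a zero-length operation is dropped, and returning '*' when nothing is left, are assumptions added here so that the result stays compact and valid; the function name `normalize_cigar` is hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")


def normalize_cigar(cigar: str) -> str:
    """Drop zero-length operations and merge adjacent operations of the same type."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    ops = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if length == 0:
            continue  # e.g. '0M' carries no information
        if ops and ops[-1][1] == op:
            # Assumed clean-up step: '50M' followed by '50M' becomes '100M'.
            ops[-1] = (ops[-1][0] + length, op)
        else:
            ops.append((length, op))
    # '*' is the SAM placeholder for an unavailable CIGAR string.
    return "".join(f"{n}{op}" for n, op in ops) or "*"
```

With this sketch, '50M0I50M' becomes '100M' and '0S10M2D0M5M' becomes '10M2D5M'.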

Release 1.5.0-p1, 18 Dec 2015
  • featureCounts can process long reads (up to 250kb long). It can also process reads that contain long extra fields.
  • Report counts for exon-exon junctions by featureCounts ('-J' option).
  • Improved parsing of gzipped fastq files in Subread and Subjunc aligners.
  • Bug fixes.
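To show how exon-exon junctions can be counted from spliced alignments, here is a minimal Python sketch. It is not the '-J' implementation and makes no claim about the featureCounts output format. It relies only on the SAM convention that an 'N' operation in a CIGAR string marks a skipped region of the reference; the function names and the tuple layout are hypothetical.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def junctions(chrom, pos, cigar):
    """Yield (chrom, last_base_of_exon, first_base_of_next_exon) per 'N' operation.

    `pos` is the 1-based leftmost mapped position, as in the SAM POS field.
    """
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            yield (chrom, ref - 1, ref + length)
        if op in REF_CONSUMING:
            ref += length


def count_junctions(alignments):
    """Count how many alignments support each junction.

    alignments: iterable of (chrom, pos, cigar) tuples.
    """
    counts = Counter()
    for chrom, pos, cigar in alignments:
        counts.update(junctions(chrom, pos, cigar))
    return counts
```

For a read mapped at position 100 with CIGAR '50M1000N50M', the first exon segment covers 100-149 and the second starts at 1150, so the sketch reports the junction (149, 1150).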

Release 1.5.0, 29 Oct 2015
  • featureCounts
    • Improved speed of re-sorting paired-end reads. It now takes only about 30 seconds to re-sort 30 million read pairs.
    • A utility program 'repair' is provided to allow pre-sorting of paired-end BAM/SAM files.
  • subread-align and subjunc
    • New parameter '--type' for sequencing-type-specific mapping optimization.
    • New parameter '--sv' for detection of structural variant breakpoints.
    • New parameter '--complexIndels' for detection of complex indels.
    • Improved mapping of paired-end reads via a new formula that uses weighted votes (more weight is given to properly mapped reads).
    • Improved detection of multi-mapping reads by considering locations that receive second highest votes.
    • When gzipped fastq input is provided, reads will be directly extracted from the gzip-compressed file and no temporary files will be generated.
    • Default output format is set to BAM.
  • subread-buildindex
    • Default threshold for removing un-informative subreads from index is changed to 100 to allow more candidate mapping locations to be considered.
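The idea of extracting reads directly from a gzip-compressed FASTQ file, without writing a temporary uncompressed copy, can be sketched in Python with the standard `gzip` module. This only illustrates the streaming approach and is not how the aligners are implemented. It assumes plain 4-line FASTQ records, and the function names are hypothetical.

```python
import gzip
import io


def iter_fastq(handle):
    """Yield (name, sequence, quality) from a text handle of 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:].strip(), seq, qual


def read_gzipped_fastq(source):
    """Stream records from a path or binary file object; no temporary file is written."""
    with gzip.open(source, "rt") as handle:
        yield from iter_fastq(handle)


# Demo on an in-memory gzip stream, standing in for a gzipped FASTQ file on disk.
demo = io.BytesIO(gzip.compress(b"@r1\nACGT\n+\nIIII\n"))
records = list(read_gzipped_fastq(demo))
```

Because the records are produced by a generator, only one record needs to be held in memory at a time.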

  • ChangeLog history

    Download and installation

  • Latest version 1.5.2
  • All the versions
  • Installation instructions

    Users guide and tutorials

  • Users Guide
  • A quick tutorial on Subread
  • A quick tutorial on Subjunc
  • A quick tutorial on featureCounts
  • A quick tutorial on exactSNP
  • Case study for RNA-seq data analysis

    How to get help

    Please post your questions or suggestions to the Bioconductor support site or the Subread Users Group.

    How to cite the methods

  • Liao Y, Smyth GK and Shi W. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Research, 41(10):e108, 2013
  • Liao Y, Smyth GK and Shi W. featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features. Bioinformatics, 30(7):923-30, 2014

    Google citations

  • Publications that cite Subread/Subjunc
  • Publications that cite featureCounts

    Student projects

    Projects for PhD, MS and Honours students are available. Internships are also considered.

  • Read mapping, variant detection and isoform discovery (bioinformatics projects)
  • Reconstructing the immune response: from molecules to cells to systems (computational immunology project)

    Resources

  • Read counts for TCGA data.
  • Read counts for SEQC data.
  • Read counts for Pickrell dataset and Montgomery dataset.

    Links

  • Bioconductor R package Rsubread
  • Bioconductor R package seqc
  • WEHI Bioinformatics

    Developers

    Yang Liao
    Wei Shi