Subread package: high-performance read alignment, quantification and mutation discovery

The Subread package comprises a suite of software programs for processing next-generation sequencing read data, including:

  • Subread: an accurate and efficient aligner for mapping both genomic DNA-seq reads and RNA-seq reads (for the purpose of expression analysis).
  • Subjunc: an RNA-seq aligner suitable for all purposes of RNA-seq analyses.
  • featureCounts: a highly efficient and accurate read summarization program.
  • exactSNP: a SNP caller that discovers SNPs by testing signals against local background noise.

These programs are also implemented in the Bioconductor R package Rsubread.


Release 1.5.0-p2, 14 Apr 2016
  • featureCounts
    • Fixed a bug in processing long header lines in SAM/BAM files.
    • Deprecated the '-S <ff:fr:rf>' option.
    • The '<input file>.featureCounts' files (generated when '-R' is specified) are now saved to the same directory as the file <output_file>.
  • subread-align and subjunc
    • Fixed a bug related to reporting of reads mapping out of the chromosomal boundary.
    • Made sure that no zero-length operations (e.g. '0M') are included in reported CIGAR strings.
    • Fixed a bug in soft-clipping read bases.
    • Improved screen output.
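
The CIGAR fix above can be illustrated with a minimal Python sketch. This is not the code used inside subread-align or subjunc; the function name `drop_zero_length_ops` is hypothetical, and merging neighbouring operations of the same type after a removal is an assumption made here so that the result stays compact.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def drop_zero_length_ops(cigar: str) -> str:
    """Remove zero-length operations such as '0M' from a CIGAR string.

    Hypothetical helper for illustration only. Operations of the same type
    that become adjacent after a removal are merged (an assumption).
    """
    merged = []  # list of [length, op] pairs
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if n == 0:
            continue  # skip operations that consume nothing
        if merged and merged[-1][1] == op:
            merged[-1][0] += n  # e.g. 10M 0I 5M becomes 15M
        else:
            merged.append([n, op])
    return "".join(f"{n}{op}" for n, op in merged)


print(drop_zero_length_ops("10M0I5M2S"))  # prints 15M2S
print(drop_zero_length_ops("0M50M"))      # prints 50M
```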

Release 1.5.0-p1, 18 Dec 2015
  • featureCounts can process long reads (up to 250kb long). It can also process reads that contain long extra fields.
  • Report counts for exon-exon junctions by featureCounts ('-J' option).
  • Improved parsing of gzipped fastq files in Subread and Subjunc aligners.
  • Bug fixes.

Release 1.5.0, 29 Oct 2015
  • featureCounts
    • Improved speed of re-sorting paired-end reads. It now takes only about 30 seconds to re-sort 30 million read pairs.
    • A utility program 'repair' is provided to allow pre-sorting of paired-end BAM/SAM files.
  • subread-align and subjunc
    • New parameter '--type' for sequencing-type-specific mapping optimization.
    • New parameter '--sv' for detection of structural variant breakpoints.
    • New parameter '--complexIndels' for detection of complex indels.
    • Improved mapping of paired-end reads via a new formula that uses weighted votes (more weight is given to properly mapped reads).
    • Improved detection of multi-mapping reads by considering locations that receive second highest votes.
    • When gzipped fastq input is provided, reads will be directly extracted from the gzip-compressed file and no temporary files will be generated.
    • Default output format is set to BAM.
  • subread-buildindex
    • Default threshold for removing un-informative subreads from index is changed to 100 to allow more candidate mapping locations to be considered.
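
The note above about reading gzipped fastq input directly, without temporary files, describes a streaming approach. The Python sketch below shows the general idea only; it is not how Subread implements it. The function name `read_fastq_gz` is hypothetical, and the sketch assumes simple four-line FASTQ records.

```python
import gzip
from typing import Iterator, Tuple


def read_fastq_gz(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) straight from a gzipped FASTQ file.

    Hypothetical helper: records are decompressed on the fly, so no
    uncompressed temporary copy is written. Assumes four lines per record.
    """
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of file
            seq = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line, ignored
            qual = handle.readline().rstrip("\n")
            yield header[1:], seq, qual  # drop the leading '@'
```

Because the function is a generator, only one record is held in memory at a time, whatever the size of the compressed file.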

Release 1.4.6-p5, 4 Sept 2015
  • Added '-S' option to featureCounts to specify the orientation of paired-end reads.
  • Added '--minOverlap' and '--largestOverlap' options to featureCounts.
  • Fixed a CIGAR reporting bug in Subjunc when '--allJunctions' option is turned on (in rare cases CIGAR may contain an incorrect large N section).
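
The effect of the '--minOverlap' and '--largestOverlap' options can be sketched in Python as follows. This is a simplified illustration, not featureCounts code: the names `overlap_bases` and `assign_read` are hypothetical, a read is modelled as a single interval, and leaving tied reads unassigned is an assumption.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def overlap_bases(a: Interval, b: Interval) -> int:
    """Number of bases shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def assign_read(read: Interval,
                features: Dict[str, Interval],
                min_overlap: int = 1,
                largest_overlap: bool = False) -> Optional[str]:
    """Return the feature a read is assigned to, or None if unassigned.

    Hypothetical sketch: features overlapping the read by fewer than
    min_overlap bases are ignored; if several remain, the read is
    ambiguous unless largest_overlap picks a unique winner.
    """
    hits = {name: overlap_bases(read, iv) for name, iv in features.items()}
    hits = {name: n for name, n in hits.items() if n >= min_overlap}
    if not hits:
        return None
    if len(hits) == 1:
        return next(iter(hits))
    if not largest_overlap:
        return None  # overlaps more than one feature
    best = max(hits.values())
    winners = [name for name, n in hits.items() if n == best]
    return winners[0] if len(winners) == 1 else None  # ties stay unassigned


features = {"geneA": (100, 200), "geneB": (190, 300)}
read = (150, 195)  # 46 bases in geneA, 6 bases in geneB
print(assign_read(read, features))                        # prints None
print(assign_read(read, features, largest_overlap=True))  # prints geneA
print(assign_read(read, features, min_overlap=10))        # prints geneA
```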

Release 1.4.6-p4, 25 June 2015
  • Fixed a bug in featureCounts for counting reads (not read pairs) in stranded paired-end sequencing data. Counting read pairs is not affected by this bug.

Release 1.4.6-p3, 18 May 2015
  • Added an argument "--donotsort" to featureCounts to allow users to turn off the read sorting procedure.
  • Show details of read pairs that were not properly paired in featureCounts screen output.

NEWS

    The Subread-featureCounts-limma/voom pipeline was successfully used in a large-scale RNA-seq study for defining a signature for mouse antibody-secreting plasma cells. The study was published in Nature Immunology in April 2015:

    Transcriptional profiling of mouse B cell terminal differentiation defines a signature for antibody-secreting plasma cells

Release 1.4.6-p2, 24 March 2015
  • Added a new parameter for subread-align and subjunc programs: --minDistanceBetweenVariants. This parameter specifies the minimum allowed distance between two neighboring genomic variants within the same read.
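
The constraint that '--minDistanceBetweenVariants' expresses can be written as a short check. The Python sketch below only illustrates the rule as stated above; the function name `variants_far_enough` is hypothetical, and measuring distance as the difference between variant start positions is an assumption.

```python
from typing import Sequence


def variants_far_enough(positions: Sequence[int], min_distance: int) -> bool:
    """Check that neighbouring variants in one read are far enough apart.

    Hypothetical helper: positions are variant start coordinates within
    the same read; distance is taken as the difference between the
    coordinates of consecutive variants (an assumption).
    """
    ordered = sorted(positions)
    return all(b - a >= min_distance for a, b in zip(ordered, ordered[1:]))


print(variants_far_enough([10, 30], 16))  # prints True
print(variants_far_enough([10, 20], 16))  # prints False
print(variants_far_enough([10, 11], 1))   # prints True
```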

Release 1.4.6-p1, 12 Feb 2015
  • Distance between two neighboring genomic variants is allowed to be as small as 1bp (it was 16bp).
  • Fixed a bug that sometimes caused randomness in multi-threaded runs of the Subjunc program.

NEWS

    The Subread-featureCounts-limma/voom pipeline has been found to be one of the best-performing pipelines for the analysis of RNA-seq data by the SEQC/MAQC III Consortium. The study was published in the September 2014 issue of Nature Biotechnology:

    A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium

Release 1.4.6, 15 Oct 2014
  • The default number of maximum allowed mismatches in each reported alignment is changed to 3.
  • The minimum fraction of consensus subreads out of all extracted subreads, required for detecting candidate mapping locations, is changed to 0.3 in subjunc for the mapping of exonic reads (reads falling within exons).
  • Better support for the mapping of microRNA sequencing (miRNA-seq) reads. A full index with no gaps included can now be built to allow miRNA-seq reads to be mapped at the highest possible resolution. A new section is added to the Users Guide to describe how to map miRNA-seq reads using Subread.
  • Bug fixes.
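
The 0.3 consensus threshold mentioned above is a simple ratio test. The Python sketch below shows the arithmetic only and is not taken from the subjunc source; the function and parameter names are hypothetical.

```python
def passes_consensus_threshold(consensus_subreads: int,
                               extracted_subreads: int,
                               min_fraction: float = 0.3) -> bool:
    """Check whether enough extracted subreads agree on one location.

    Hypothetical helper: a candidate location is kept when the fraction
    of subreads voting for it reaches min_fraction.
    """
    if extracted_subreads <= 0:
        return False  # nothing was extracted, so nothing can agree
    return consensus_subreads / extracted_subreads >= min_fraction


print(passes_consensus_threshold(4, 10))  # prints True  (0.4 >= 0.3)
print(passes_consensus_threshold(2, 10))  # prints False (0.2 < 0.3)
```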

  • ChangeLog history

Download and Installation
  • Latest version 1.5.0-p2
  • All the versions
  • Installation instructions

Mailing lists
  • Subread Users Group

Tutorials and Users Guide
  • A short tutorial on Subread
  • A short tutorial on Subjunc
  • A short tutorial on featureCounts
  • A short tutorial on exactSNP
  • A case study for analyzing RNA-seq data
  • Users Guide

Publications
  • Liao Y, Smyth GK and Shi W. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Research, 41(10):e108, 2013.
  • Liao Y, Smyth GK and Shi W. featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features. Bioinformatics, 30(7):923-30, 2014.

Scientific publications citing our methods
  • Publications that cite Subread/Subjunc
  • Publications that cite featureCounts

Resources
  • Read count tables for Pickrell dataset and Montgomery dataset, both published in Nature in 2010.
  • Read count table for TCGA data.
  • Read count data from SEQC/MAQC III study.

Links
  • Bioconductor R package Rsubread
  • Bioconductor R package seqc
  • WEHI Bioinformatics

Contact
    Dr. Wei Shi (shi at wehi dot edu dot au) or
    Dr. Yang Liao (liao at wehi dot edu dot au)