Subread package: high-performance read alignment, quantification and mutation discovery

The Subread package comprises a suite of software programs for processing next-gen sequencing read data including:

  • Subread: a general-purpose read aligner which can align both genomic DNA-seq and RNA-seq reads. It can also be used to discover genomic mutations including short indels and structural variants.
  • Subjunc: a read aligner developed for aligning RNA-seq reads and for the detection of exon-exon junctions. Gene fusion events can be detected as well.
  • featureCounts: a software program developed for counting reads to genomic features such as genes, exons, promoters and genomic bins.
  • exactSNP: a SNP caller that discovers SNPs by testing signals against local background noise.
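
The basic idea behind counting reads to genomic features can be illustrated with a short Python sketch. It is not featureCounts' implementation. It assumes annotation in SAF format (GeneID, Chr, Start, End, Strand), reads reduced to single 1-based inclusive intervals, unstranded counting and a minimum overlap of one base. In the spirit of featureCounts' default gene-level counting, a read overlapping several exons of the same gene is counted once, and a read overlapping more than one gene is left unassigned. The function names are hypothetical. The summary labels are named after categories in the featureCounts counting summary.

```python
from collections import Counter

def parse_saf(lines):
    """Parse SAF rows (GeneID, Chr, Start, End, Strand); skip the header."""
    features = []
    for line in lines:
        if not line.strip() or line.startswith("GeneID"):
            continue  # blank line or header row
        gene_id, chrom, start, end, strand = line.rstrip("\n").split("\t")
        features.append((gene_id, chrom, int(start), int(end), strand))
    return features

def count_reads(features, reads):
    """Gene-level counting of (chrom, start, end) reads, 1-based inclusive.

    A read is assigned only if all exons it overlaps belong to one gene.
    Strand is ignored (unstranded counting).
    """
    counts = Counter()
    summary = Counter()
    for chrom, r_start, r_end in reads:
        genes = {gid for gid, f_chrom, f_start, f_end, _ in features
                 if f_chrom == chrom and r_start <= f_end and f_start <= r_end}
        if len(genes) == 1:
            counts[genes.pop()] += 1
            summary["Assigned"] += 1
        elif genes:
            summary["Unassigned_Ambiguity"] += 1
        else:
            summary["Unassigned_NoFeatures"] += 1
    return counts, summary

# Usage: geneA has two exons, and geneB overlaps geneA's second exon.
saf = [
    "GeneID\tChr\tStart\tEnd\tStrand",
    "geneA\tchr1\t100\t200\t+",
    "geneA\tchr1\t300\t400\t+",
    "geneB\tchr1\t350\t500\t-",
]
reads = [("chr1", 150, 199),   # inside geneA exon 1
         ("chr1", 180, 320),   # touches both geneA exons, so still geneA
         ("chr1", 360, 380),   # hits geneA and geneB, so ambiguous
         ("chr1", 600, 650)]   # overlaps no feature
counts, summary = count_reads(parse_saf(saf), reads)
print(dict(counts), dict(summary))
```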

These programs are also implemented in the Bioconductor R package Rsubread.

CHANGELOG AND NEWS

Release 1.5.3, 12 July 2017
  • featureCounts
    • New parameter '-L' for counting long reads (e.g. Nanopore and PacBio reads).
    • New parameter '--byReadGroup' for counting reads in each read group in each library.
    • Output detailed assignment results for reads in three different formats: CORE, SAM and BAM ('-R' option).
    • Filters in the counting summary are listed in the order in which they are applied during counting.
  • subread-align and subjunc
    • Removed the '-u' option and added the '--multiMapping' option. By default, the aligners report uniquely mapped reads only.
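
The '--byReadGroup' option splits each feature's count by the read group of each read. In SAM/BAM files, the read group is recorded in the optional RG:Z: tag. The Python sketch below is only an illustration and is not featureCounts code. It tallies reads that have already been assigned to genes into a gene-by-read-group table. The function names and the input shape (pairs of gene ID and SAM line) are assumptions.

```python
from collections import Counter, defaultdict

def read_group_of(sam_line):
    """Return the read group ID from the RG:Z: tag of a SAM line, or None."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional fields follow the 11 mandatory columns
        if tag.startswith("RG:Z:"):
            return tag[len("RG:Z:"):]
    return None

def count_by_read_group(assignments):
    """Tally (gene_id, sam_line) pairs into {gene_id: Counter({read_group: n})}."""
    table = defaultdict(Counter)
    for gene_id, sam_line in assignments:
        table[gene_id][read_group_of(sam_line)] += 1
    return table

# Usage with two read groups (lane1, lane2) in a single library.
sam = [
    "r1\t0\tchr1\t150\t255\t50M\t*\t0\t0\t*\t*\tNH:i:1\tRG:Z:lane1",
    "r2\t0\tchr1\t160\t255\t50M\t*\t0\t0\t*\t*\tNH:i:1\tRG:Z:lane2",
    "r3\t16\tchr1\t170\t255\t50M\t*\t0\t0\t*\t*\tRG:Z:lane1",
]
table = count_by_read_group([("geneA", line) for line in sam])
print({gene: dict(groups) for gene, groups in table.items()})
```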

Release 1.5.2, 17 March 2017
  • subread-align and subjunc
    • Gene annotation can be provided to aligners to improve exon junction detection and read mapping for RNA-seq data.
    • Resolved an inconsistency between runs when more than one CPU thread is used. The inconsistency was caused by an excessive number of indels/junctions occurring at the same base position (rare).
    • Improved sanity checking for input and output data (e.g. checking whether the disk is full).
  • featureCounts
    • Allow BAM/SAM input from STDIN (e.g. via pipes).
    • Improved sanity checking for input and output data (e.g. checking whether an input BAM file is corrupted).

Release 1.5.1, 25 August 2016
  • featureCounts
    • New parameter '--fracOverlap' that specifies the minimum fraction of overlapping read bases required for assigning a read to a feature.
    • New parameter '--tmpDir' that specifies the directory where intermediate files are saved.
    • The '--fraction' option can now be used to produce fractional counts for both multi-mapping reads and multi-overlapping reads (reads that overlap with more than one feature).
    • RefSeq gene annotation for hg38 has been added to the package (overlapping exons from the same gene are merged into one exon).
    • Sanity check for input BAM files.
    • featureCounts output now reports the original exon coordinates (as provided in the annotation) without merging them.
    • Bug fixes.
  • subread-align and subjunc
    • A new method for reporting the mapping quality score (MQS) that takes into account the top candidate mapping locations and the number of mismatches.
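
The '--fracOverlap' and '--fraction' options of Release 1.5.1 can be used together. The Python sketch below illustrates how they might interact, under simplifying assumptions. It looks at one alignment of a single-end read and ignores '--minOverlap'. It assumes that multi-mapping ('-M') and multi-overlapping ('-O') counting are both enabled alongside '--fraction'. It also assumes that the number of reported alignments x is known, for example from the NH tag. A feature is kept only if its overlapping bases are at least the '--fracOverlap' fraction of the read length. Each kept feature then receives a count of 1/(x*y), where y is the number of kept features. The function name is hypothetical.

```python
def fractional_assignment(read_len, overlaps, n_alignments, frac_overlap=0.0):
    """Sketch of '--fracOverlap' filtering followed by '--fraction' weighting.

    read_len      -- number of bases in the read
    overlaps      -- {feature_id: overlapping_bases} for one alignment
    n_alignments  -- x, total alignments reported for the read (e.g. NH tag)
    frac_overlap  -- minimum fraction of read bases that must overlap a feature
    Returns {feature_id: fractional_count}.
    """
    if not 0.0 <= frac_overlap <= 1.0:
        raise ValueError("frac_overlap must be within [0, 1]")
    # Keep features that overlap the read and pass the fraction threshold.
    kept = [fid for fid, bases in overlaps.items()
            if bases > 0 and bases >= frac_overlap * read_len]
    if not kept:
        return {}
    weight = 1.0 / (n_alignments * len(kept))  # 1/(x*y)
    return {fid: weight for fid in kept}

# A 100-base read reported at 2 locations; this alignment overlaps two genes.
print(fractional_assignment(100, {"geneA": 80, "geneB": 30}, 2, frac_overlap=0.25))
print(fractional_assignment(100, {"geneA": 80, "geneB": 30}, 2, frac_overlap=0.5))
```

Raising the threshold from 0.25 to 0.5 drops geneB (30 of 100 bases), so the single remaining feature carries the whole 1/x share for this alignment.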

Release 1.5.0-p3, 27 May 2016
  • Fixed a bug associated with '--allJunctions' option in Subjunc aligner.
  • Improved the efficiency of the exactSNP program when calling SNPs from data with very high sequencing depth (>1000x).
  • Resolved an issue with concurrently opening a large number of files in featureCounts.
  • Improved processing of 'H' operations in CIGAR strings in featureCounts.
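
A CIGAR 'H' (hard clip) marks bases that were clipped from the read and are absent from SEQ. Under the SAM specification, it consumes neither the query sequence nor the reference, so it must not shift a read's mapped position. The following Python sketch is based on the SAM specification rather than the featureCounts source. It converts POS and CIGAR into the reference blocks that a counting program would intersect with features. The function name is an illustrative assumption.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def aligned_blocks(pos, cigar):
    """Return 1-based inclusive reference intervals covered by aligned bases.

    pos   -- the SAM POS field (leftmost mapped position, 1-based)
    cigar -- the SAM CIGAR string
    """
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"unsupported CIGAR: {cigar!r}")
    blocks = []
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":      # consumes query and reference: an aligned block
            blocks.append((ref, ref + n - 1))
            ref += n
        elif op in "DN":     # consumes reference only (N = skipped region, e.g. intron)
            ref += n
        # I and S consume query only; H and P consume neither, so ref is unchanged
    return blocks

print(aligned_blocks(100, "5H10M200N20M3S"))
```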

  • ChangeLog history

    Download and installation

  • Latest version 1.5.3
  • All the versions
  • Installation instructions

    Users guide and tutorials

  • Users Guide
  • A quick tutorial on Subread
  • A quick tutorial on Subjunc
  • A quick tutorial on featureCounts
  • A quick tutorial on exactSNP
  • Case study for RNA-seq data analysis

    How to get help

    Please post your questions or suggestions to the Bioconductor support site or the Subread Users Group.

    How to cite the methods

  • Liao Y, Smyth GK and Shi W. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Research, 41(10):e108, 2013
  • Liao Y, Smyth GK and Shi W. featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features. Bioinformatics, 30(7):923-30, 2014

    Google citations

  • Publications that cite Subread/Subjunc
  • Publications that cite featureCounts

    Student projects

    Projects are available for PhD, MS and Honours students. Internships are also considered.

  • Read mapping, variant detection and isoform discovery (bioinformatics projects)
  • Reconstructing the immune response: from molecules to cells to systems (computational immunology project)

    Resources

  • Read counts for TCGA data.
  • Read counts for SEQC data.
  • Read counts for Pickrell dataset and Montgomery dataset.

    Links

  • Bioconductor R package Rsubread
  • Bioconductor R package seqc
  • WEHI Bioinformatics

    Developers

    Yang Liao
    Wei Shi