Subread package: high-performance read alignment, quantification and mutation discovery

The Subread package comprises a suite of software programs for processing next-generation sequencing read data, including:

  • Subread: a general-purpose read aligner which can align both genomic DNA-seq and RNA-seq reads. It can also be used to discover genomic mutations including short indels and structural variants.
  • Subjunc: a read aligner developed for aligning RNA-seq reads and for the detection of exon-exon junctions. Gene fusion events can be detected as well.
  • featureCounts: a software program developed for counting reads to genomic features such as genes, exons, promoters and genomic bins.
  • Sublong: a long-read aligner based on the seed-and-vote strategy.
  • exactSNP: a SNP caller that discovers SNPs by testing signals against local background noise.

These programs are also implemented in the Bioconductor R package Rsubread.

CHANGELOG AND NEWS

Release 2.0.6, 10 May 2023
  • Fixed a bug in featureCounts when counting reads supporting each exon-exon junction in long-read data.

Release 2.0.5, 17 April 2023
  • Update the inbuilt gene annotation for mm39.

Release 2.0.4, 27 February 2023
  • '--fraction' parameter now works with '--largestOverlap' parameter in featureCounts.
  • Value provided to '--fracOverlap' parameter in featureCounts is now precisely applied for read filtering.
  • '-J' parameter in featureCounts now considers all the reads in the input for junction counting.

Release 2.0.3, 15 July 2021
  • Users guide updated.

Release 2.0.2, 29 March 2021
  • New parameter '--countReadPairs' is added to featureCounts to explicitly specify that read pairs will be counted. The '-p' option now only specifies that the input reads are paired-end; in previous versions it also implied that read pairs would be counted.
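
As a rough illustration of this change, the Python sketch below assembles a featureCounts command line for paired-end input. The helper name `build_featurecounts_cmd` and the file names are hypothetical. The '-p' and '--countReadPairs' flags come from the note above, and '-a' (annotation) and '-o' (output) are the usual featureCounts options. Rejecting '--countReadPairs' without '-p' is a choice made in this sketch, not documented featureCounts behaviour. The command is built but not executed.

```python
import shlex


def build_featurecounts_cmd(annotation, output, bam_files,
                            paired_end=False, count_pairs=False):
    """Assemble a featureCounts argument list (hypothetical helper, not run here)."""
    # Assumption of this sketch: counting read pairs requires paired-end input.
    if count_pairs and not paired_end:
        raise ValueError("count_pairs requires paired_end=True")
    cmd = ["featureCounts", "-a", annotation, "-o", output]
    if paired_end:
        cmd.append("-p")                # input reads are paired-end
    if count_pairs:
        cmd.append("--countReadPairs")  # count read pairs instead of reads
    cmd.extend(bam_files)
    return cmd


cmd = build_featurecounts_cmd("genes.gtf", "counts.txt", ["sample.bam"],
                              paired_end=True, count_pairs=True)
print(shlex.join(cmd))
# featureCounts -a genes.gtf -o counts.txt -p --countReadPairs sample.bam
```

Scripts written for versions before 2.0.2 that relied on '-p' alone to count read pairs would need '--countReadPairs' added to keep the same behaviour.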

Release 2.0.1, 13 May 2020
  • The '-t' option in featureCounts now accepts multiple features.

Release 2.0.0, 4 Sept 2019 -- We finally ported the Subread package to Windows!

Release 1.6.5, 18 Jul 2019
  • Reduce the amount of computer memory used by the subread-buildindex program to build an index.
  • Use the first 1000 mapped read pairs to estimate the template length and use this information to improve the mapping of paired-end reads.
  • Improve the detection and reporting of indels that are present in repetitive genomic regions.
  • Input BAM/SAM files to the featureCounts program may contain both single-end and paired-end reads.
  • flattenGTF can combine overlapping exons to form a single large exon encompassing all the overlapping exons, or chop them into non-overlapping bins.
  • Algorithm improvement for exactSNP program.
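
The two flattenGTF modes described above (merging overlapping exons versus chopping them into non-overlapping bins) can be sketched in Python as follows. This only illustrates the idea on plain intervals and is not the flattenGTF implementation. The function names and the 1-based inclusive coordinate convention are assumptions, and how the real tool treats exons that merely abut is not covered here.

```python
def merge_exons(exons):
    """Merge overlapping (start, end) intervals into single large intervals."""
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def chop_exons(exons):
    """Split intervals into non-overlapping bins at every exon boundary."""
    # Each exon start opens a bin; the base after each exon end opens the next.
    points = sorted({s for s, _ in exons} | {e + 1 for _, e in exons})
    bins = []
    for left, right in zip(points, points[1:]):
        # Keep the bin only if at least one exon covers it entirely.
        if any(s <= left and right - 1 <= e for s, e in exons):
            bins.append((left, right - 1))
    return bins


exons = [(100, 200), (150, 300), (400, 500)]
print(merge_exons(exons))  # [(100, 300), (400, 500)]
print(chop_exons(exons))   # [(100, 149), (150, 200), (201, 300), (400, 500)]
```

Merging gives one feature per covered region, while chopping keeps the exon boundaries so that each bin can be counted separately.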

Release 1.6.4, 14 Mar 2019
  • New options in featureCounts: readShiftType and readShiftSize.
  • The FASTA file(s) provided to subread-buildindex can be in gzipped format.
  • removeDup utility program supports BAM input and output.
  • Improved checking on file read and write operations.

Release 1.6.3, 9 Oct 2018
  • Fixed a bug in subread-align and subjunc that may cause incorrect reporting of mapping results when a read is mapped beyond a chromosome boundary.
  • Fixed a bug in featureCounts that may cause incorrect counting of reads when '--byReadGroup' is specified and unmapped reads are not included in the BAM input.
  • The limit on the size of the header in the SAM input is removed from featureCounts.
  • Deprecated the coverageCount utility function.

Release 1.6.2, 15 May 2018
  • featureCounts
    • New parameter '--extraAttributes': allow extra attributes to be included in the counting output.
    • Stranded/unstranded counting can be applied to each individual library ('-s' option).
    • Improve the speed of featureCounts in processing BAM files generated by some tools which produce reads that are stored in more than one BAM block.
  • subread-align and subjunc
    • New parameter '--sortReadsByCoordinates': output location-sorted reads in BAM output.
    • New parameter '--keepReadOrder': reads in the mapping output are kept in the same order as that in the input.
  • New function 'flattenGTF': flatten features included in a GTF/GFF annotation and output modified annotation to a SAF format annotation.

Release 1.6.1, 23 March 2018
  • featureCounts
    • Add two new parameters: nonOverlap and nonOverlapFeature.
  • subread-align
    • Breakpoint data generated with the '--sv' option are written into a VCF-format file.
  • subread-align, subjunc, featureCounts and exactSNP
    • Annotation file can be provided as a gzipped file.

Release 1.6.0, 14 Nov 2017
  • sublong
    • Release of Sublong: a seed-and-vote aligner for mapping long reads such as Nanopore and PacBio reads.
  • featureCounts
    • New parameter 'fracOverlapFeature' for checking fraction of overlapping bases in a feature.
    • For the counting of reads in read groups, order of read group columns in counting output is determined by the order of read group names appearing in the BAM/SAM header.
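
To make the overlap-fraction parameters more concrete, here is a minimal Python sketch of the two ratios that '--fracOverlap' (fraction of read bases overlapping a feature) and 'fracOverlapFeature' (fraction of feature bases overlapped by the read) refer to. The function names, the 1-based inclusive coordinates and the simple `>=` comparison are assumptions of this sketch. The exact rounding rules used by featureCounts are not described here, and spliced reads are ignored.

```python
def overlap_bases(read, feature):
    """Number of bases shared by two (start, end) intervals, inclusive."""
    (r_start, r_end), (f_start, f_end) = read, feature
    return max(0, min(r_end, f_end) - max(r_start, f_start) + 1)


def passes_overlap_filters(read, feature,
                           frac_overlap=0.0, frac_overlap_feature=0.0):
    """Check both overlap fractions against their minimum thresholds."""
    shared = overlap_bases(read, feature)
    if shared == 0:
        return False
    read_len = read[1] - read[0] + 1
    feature_len = feature[1] - feature[0] + 1
    return (shared / read_len >= frac_overlap
            and shared / feature_len >= frac_overlap_feature)


read, feature = (101, 200), (151, 400)
print(overlap_bases(read, feature))                                      # 50
print(passes_overlap_filters(read, feature, frac_overlap=0.5))           # True
print(passes_overlap_filters(read, feature, frac_overlap_feature=0.25))  # False
```

In this example the read shares 50 of its 100 bases with the feature (fraction 0.5), but those 50 bases cover only 0.2 of the 250-base feature, so a feature-side threshold of 0.25 rejects it.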

  • ChangeLog history

Download and installation

  • Latest version 2.0.6
  • All the versions
  • Installation instructions

Users guide and tutorials

  • Users Guide
  • A quick tutorial on Subread
  • A quick tutorial on Subjunc
  • A quick tutorial on featureCounts
  • A quick tutorial on exactSNP
  • Case study for RNA-seq data analysis

How to get help

Please post your questions or suggestions to the Bioconductor support site.

Publications

  • Liao Y, Smyth GK and Shi W. The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads. Nucleic Acids Research, 47(8):e47, 2019
  • Liao Y, Smyth GK and Shi W. featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features. Bioinformatics, 30(7):923-30, 2014
  • Liao Y, Smyth GK and Shi W. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Research, 41(10):e108, 2013

Google citations

  • Publications that cite featureCounts
  • Publications that cite Subread/Subjunc

Resources

  • Read counts for TCGA data.
  • Read counts for SEQC data.
  • Read counts for Pickrell dataset and Montgomery dataset.

Links

  • Bioconductor R package Rsubread
  • Bioconductor R package seqc
  • Shi Lab