Subread package: high-performance read alignment, quantification and mutation discovery
The Subread package comprises a suite of software programs for processing next-generation sequencing read data, including:
- Subread: an accurate and efficient aligner for mapping both genomic DNA-seq reads and RNA-seq reads (for the purpose of expression analysis).
- Subjunc: an RNA-seq aligner suitable for all purposes of RNA-seq analyses.
- featureCounts: a highly efficient and accurate read summarization program.
- exactSNP: a SNP caller that discovers SNPs by testing signals against local background noise.
These programs are also implemented in the Bioconductor R package Rsubread.
Release 1.5.0-p3, 27 May 2016
Fixed a bug associated with the '--allJunctions' option in the Subjunc aligner.
Improved the efficiency of the exactSNP program when calling SNPs from data with a very high sequencing depth (>1000x).
Resolved an issue with concurrently opening a large number of files in featureCounts.
Improved processing of 'H' operations in CIGAR strings in featureCounts.
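For readers unfamiliar with why 'H' operations need special care: in the SAM format, hard-clipped bases are absent from the stored read sequence and do not advance the position on the reference. The following is a minimal illustrative sketch in Python, not featureCounts' own code; the function name `cigar_lengths` is hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operator from the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operators that advance the reference position / the stored read sequence.
CONSUMES_REFERENCE = set("MDN=X")
CONSUMES_QUERY = set("MIS=X")


def cigar_lengths(cigar):
    """Return (reference_span, stored_read_length, hard_clipped_bases)."""
    ref_span = read_len = hard_clipped = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in CONSUMES_REFERENCE:
            ref_span += n
        if op in CONSUMES_QUERY:
            read_len += n
        if op == "H":
            # Hard clips count toward neither total; they are only tallied here.
            hard_clipped += n
    return ref_span, read_len, hard_clipped
```

With a CIGAR string such as '5H20M2I10M3D8M4H', the nine hard-clipped bases contribute to neither the reference span nor the stored read length.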
Release 1.5.0-p2, 14 Apr 2016
subread-align and subjunc
- Fixed a bug in processing long header lines in SAM/BAM files.
- Deprecated the '-S < ff:fr:rf >' option.
- The '< input file >.featureCounts' files (generated when '-R' is specified) are saved to the same directory as the file < output_file >.
- Fixed a bug related to reporting of reads mapping out of the chromosomal boundary.
- Ensured that no zero-length operations (e.g. '0M') are included in reported CIGAR strings.
- Fixed a bug in soft-clipping read bases.
- Improved screen output.
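To illustrate what removing zero-length CIGAR operations involves, here is a small Python sketch. It is an assumption-laden illustration rather than the aligners' actual implementation: the function name `drop_zero_length_ops` is hypothetical, and merging neighbouring operations of the same type is a choice made for this example.

```python
import re

# One CIGAR operation: a length followed by an operator from the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def drop_zero_length_ops(cigar):
    """Remove zero-length operations and merge neighbours that become adjacent."""
    cleaned = []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if n == 0:
            continue  # e.g. '0M' carries no information
        if cleaned and cleaned[-1][1] == op:
            # Same operator as the previous entry: fold the lengths together.
            cleaned[-1] = (cleaned[-1][0] + n, op)
        else:
            cleaned.append((n, op))
    return "".join(f"{n}{op}" for n, op in cleaned)
```

For example, '10M0I5M2D0M3M' would be reduced to '15M2D3M' by this sketch.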
Release 1.5.0-p1, 18 Dec 2015
featureCounts can now process long reads (up to 250kb long) and reads that contain long extra fields.
featureCounts can now report counts for exon-exon junctions ('-J' option).
Improved parsing of gzipped fastq files in Subread and Subjunc aligners.
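As background on junction counting: spliced alignments mark skipped reference regions with the 'N' CIGAR operation, so exon-exon junctions can be read off from the alignment position and CIGAR string. The Python sketch below shows the general idea only; it is not the logic used by the '-J' option, and the function names and the (chromosome, position, CIGAR) input tuples are assumptions made for illustration.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operator from the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def junctions_from_alignment(pos, cigar):
    """Return 1-based (last_exon_base, first_exon_base) pairs, one per 'N' gap."""
    junctions = []
    ref_pos = pos  # leftmost mapped position, as in the SAM POS field
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            junctions.append((ref_pos - 1, ref_pos + n))
        if op in "MDN=X":  # operators that advance along the reference
            ref_pos += n
    return junctions


def count_junctions(alignments):
    """Count reads supporting each junction; alignments are (chrom, pos, cigar)."""
    counts = Counter()
    for chrom, pos, cigar in alignments:
        for donor, acceptor in junctions_from_alignment(pos, cigar):
            counts[(chrom, donor, acceptor)] += 1
    return counts
```

A read mapped at position 100 with CIGAR '50M200N50M' would support a single junction between reference bases 149 and 350.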
Release 1.5.0, 29 Oct 2015
subread-align and subjunc
- Improved speed of re-sorting paired-end reads. It now takes only about 30 seconds to re-sort 30 million read pairs.
- A utility program 'repair' is provided to allow pre-sorting of paired-end BAM/SAM files.
- New parameter '--type' for sequencing-type-specific mapping optimization.
- New parameter '--sv' for detection of structural variant breakpoints.
- New parameter '--complexIndels' for detection of complex indels.
- Improved mapping of paired-end reads via a new formula that uses weighted votes (more weight is given to properly mapped reads).
- Improved detection of multi-mapping reads by considering locations that receive second highest votes.
- When gzipped fastq input is provided, reads will be directly extracted from the gzip-compressed file and no temporary files will be generated.
- Default output format is set to BAM.
- Default threshold for removing un-informative subreads from index is changed to 100 to allow more candidate mapping locations to be considered.
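The idea of extracting reads directly from a gzip-compressed FASTQ file, without writing a decompressed temporary copy, can be sketched in Python with the standard `gzip` module. This is only an illustration of the streaming approach, not the aligners' implementation; the function name `read_fastq_gz` is hypothetical, and the sketch assumes well-formed four-line FASTQ records.

```python
import gzip


def read_fastq_gz(path):
    """Yield (name, sequence, quality) tuples from a gzip-compressed FASTQ file."""
    # Text mode decompresses on the fly; nothing is written to disk.
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of file
            sequence = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line, ignored
            quality = handle.readline().rstrip("\n")
            yield header[1:], sequence, quality  # drop the leading '@'
```

Because the function is a generator, only one record is held in memory at a time, which is the property that makes temporary files unnecessary.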
Download and Installation
Latest version 1.5.0-p3
All the versions
Subread Users Group
Tutorials and Users Guide
A short tutorial on Subread
A short tutorial on Subjunc
A short tutorial on featureCounts
A short tutorial on exactSNP
A case study for analyzing RNA-seq data
Liao Y, Smyth GK and Shi W. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Research, 41(10):e108, 2013
Liao Y, Smyth GK and Shi W. featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features. Bioinformatics, 30(7):923-30, 2014
Scientific publications citing our methods
Publications that cite Subread/Subjunc
Publications that cite featureCounts
Read count tables for Pickrell dataset and Montgomery dataset, both published in Nature in 2010.
Read count table for TCGA data.
Read count data from SEQC/MAQC III study.
Bioconductor R package Rsubread
Bioconductor R package seqc
Dr. Wei Shi (shi at wehi dot edu dot au) or
Dr. Yang Liao (liao at wehi dot edu dot au)