Subread package: high-performance read alignment, quantification and mutation discovery
The Subread package comprises a suite of software programs for processing next-generation sequencing read data, including:
- Subread: a general-purpose read aligner which can align both genomic DNA-seq and RNA-seq reads. It can also be used to discover genomic mutations including short indels and structural variants.
- Subjunc: a read aligner developed for aligning RNA-seq reads and for the detection of exon-exon junctions. Gene fusion events can be detected as well.
- featureCounts: a software program developed for counting reads to genomic features such as genes, exons, promoters and genomic bins.
- exactSNP: a SNP caller that discovers SNPs by testing signals against local background noise.
These programs are also implemented in the Bioconductor R package Rsubread.
Release 1.5.1, 25 Aug 2016
subread-align and subjunc
- New parameter '--fracOverlap' that specifies the minimum fraction of overlapping read bases required for assigning a read to a feature.
- New parameter '--tmpDir' that specifies the directory where intermediate files will be saved to.
- The '--fraction' option can now be used to produce fractional counts for both multi-mapping reads and multi-overlapping reads (reads that overlap with more than one feature).
- RefSeq gene annotation for hg38 is added to the package (overlapping exons from the same gene are merged into one exon).
- Sanity check for input BAM files.
- The featureCounts output now reports the original exon coordinates (as provided in the annotation) without merging them.
- Bug fixes.
- A new method for reporting mapping quality score (MQS) that takes into account the top candidate mapping locations and the number of mismatches.
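The overlap threshold and fractional counting described in the list above can be illustrated with a small sketch. This is not the featureCounts implementation: the function names, the representation of reads and features as 1-based inclusive (start, end) intervals, and the rule that a read overlapping n features contributes 1/n to each are assumptions made for illustration only.

```python
def overlap_fraction(read, feature):
    """Fraction of the read's bases that fall inside the feature.

    Both arguments are (start, end) tuples, 1-based and inclusive
    (an assumption of this sketch; gapped reads are not handled).
    """
    r_start, r_end = read
    f_start, f_end = feature
    overlap = min(r_end, f_end) - max(r_start, f_start) + 1
    return max(overlap, 0) / (r_end - r_start + 1)


def assign_reads(reads, features, min_frac_overlap=0.0, fractional=False):
    """Count reads per feature.

    min_frac_overlap plays the role of a minimum-overlap threshold;
    fractional=True splits each read equally among the features it hits
    (hypothetical weighting chosen for this sketch).
    """
    counts = {name: 0.0 for name in features}
    for read in reads:
        hits = []
        for name, feature in features.items():
            frac = overlap_fraction(read, feature)
            if frac > 0 and frac >= min_frac_overlap:
                hits.append(name)
        if not hits:
            continue
        weight = 1.0 / len(hits) if fractional else 1.0
        for name in hits:
            counts[name] += weight
    return counts


# Two overlapping features and two 50-base reads that each touch both.
features = {"geneA": (100, 200), "geneB": (180, 300)}
reads = [(150, 199), (181, 230)]
strict = assign_reads(reads, features, min_frac_overlap=0.5)
shared = assign_reads(reads, features, fractional=True)
```

With the stricter threshold each read is assigned only to the feature that covers most of it, while the fractional mode divides every read between both features.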
Release 1.5.0-p3, 27 May 2016
Fixed a bug associated with '--allJunctions' option in Subjunc aligner.
Improved the efficiency of the exactSNP program when calling SNPs from data with very high sequencing depth (>1000x).
Resolved an issue with concurrently opening a large number of files in featureCounts.
Improved processing of 'H' operations in CIGAR strings in featureCounts.
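For readers unfamiliar with why 'H' operations need special care: in the SAM format, hard-clipped bases are absent from the stored read sequence and do not advance the reference position. The sketch below shows one way to account for this when walking a CIGAR string. It follows the operation table of the SAM specification and is not taken from the featureCounts source; the function name is hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_QUERY = set("MIS=X")      # operations that use up read bases
CONSUMES_REFERENCE = set("MDN=X")  # operations that advance on the reference


def cigar_spans(cigar):
    """Return (query_length, reference_length) implied by a CIGAR string.

    'H' (hard clip) and 'P' (padding) consume neither the stored read
    sequence nor the reference, so they are parsed but add nothing.
    """
    query_len = 0
    ref_len = 0
    pos = 0
    for match in CIGAR_OP.finditer(cigar):
        if match.start() != pos:
            raise ValueError(f"malformed CIGAR string: {cigar!r}")
        pos = match.end()
        length = int(match.group(1))
        op = match.group(2)
        if op in CONSUMES_QUERY:
            query_len += length
        if op in CONSUMES_REFERENCE:
            ref_len += length
    if pos != len(cigar) or not cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return query_len, ref_len


# Hard clips at both ends do not change either length.
spans = cigar_spans("5H45M2I3D50M10H")
```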
Release 1.5.0-p2, 14 Apr 2016
subread-align and subjunc
- Fixed a bug in processing long header lines in SAM/BAM files.
- Deprecated the '-S < ff:fr:rf >' option.
- The '< input file >.featureCounts' files (generated when '-R' is specified) are saved to the same directory as the file < output_file >.
- Fixed a bug related to reporting of reads mapping out of the chromosomal boundary.
- Ensured that no zero-length operations (e.g. '0M') are included in reported CIGAR strings.
- Fixed a bug in soft-clipping read bases.
- Improved screen output.
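As an illustration of the zero-length CIGAR operations mentioned in the list above, the following sketch drops operations such as '0M' from a CIGAR string. Merging the neighbouring operations of the same type that result from the removal is an extra step added for this sketch; it is an assumption and not a description of what the aligners do internally. The function name is hypothetical.

```python
import re


def normalize_cigar(cigar):
    """Remove zero-length operations and merge adjacent identical ones."""
    ops = []
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if n == 0:
            continue  # skip operations such as '0M' or '0I'
        if ops and ops[-1][1] == op:
            # Same operation as the previous one: extend it.
            ops[-1] = (ops[-1][0] + n, op)
        else:
            ops.append((n, op))
    return "".join(f"{n}{op}" for n, op in ops)


cleaned = normalize_cigar("0S50M0I50M")
```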
Release 1.5.0-p1, 18 Dec 2015
featureCounts can process long reads (up to 250kb long). It can also process reads that contain long extra fields.
Report counts for exon-exon junctions by featureCounts ('-J' option).
Improved parsing of gzipped fastq files in Subread and Subjunc aligners.
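To give a sense of what counting exon-exon junctions involves, the sketch below derives junctions from the 'N' (skipped region) operations of spliced alignments and tallies how many reads support each one. It is a simplified illustration under stated assumptions, not the featureCounts algorithm: the function names, the (chromosome, last exon base, first base of next exon) representation and the input tuples are all hypothetical.

```python
import re
from collections import Counter


def junctions_from_alignment(chrom, pos, cigar):
    """List the junctions implied by one alignment.

    pos is the 1-based leftmost mapped position, as in the SAM format.
    Each junction is (chrom, last base of upstream exon,
    first base of downstream exon).
    """
    ref = pos
    found = []
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op == "N":
            found.append((chrom, ref - 1, ref + n))
        if op in "MDN=X":  # operations that advance on the reference
            ref += n
    return found


def count_junctions(alignments):
    """Count supporting reads per junction; alignments are (chrom, pos, cigar)."""
    counts = Counter()
    for chrom, pos, cigar in alignments:
        counts.update(junctions_from_alignment(chrom, pos, cigar))
    return counts


alignments = [
    ("chr1", 101, "50M200N50M"),
    ("chr1", 121, "30M200N70M"),
    ("chr1", 500, "100M"),
]
junction_counts = count_junctions(alignments)
```

The first two reads start at different positions but span the same intron, so they support the same junction; the unspliced third read contributes nothing.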
Release 1.5.0, 29 Oct 2015
subread-align and subjunc
- Improved speed of re-sorting paired-end reads. It now takes only about 30 seconds to re-sort 30 million read pairs.
- A utility program 'repair' is provided to allow pre-sorting of paired-end BAM/SAM files.
- New parameter '--type' for sequencing-type-specific mapping optimization.
- New parameter '--sv' for detection of structural variant breakpoints.
- New parameter '--complexIndels' for detection of complex indels.
- Improved mapping of paired-end reads via a new formula that uses weighted votes (more weight is given to properly mapped reads).
- Improved detection of multi-mapping reads by considering locations that receive second highest votes.
- When gzipped fastq input is provided, reads will be directly extracted from the gzip-compressed file and no temporary files will be generated.
- Default output format is set to BAM.
- Default threshold for removing un-informative subreads from index is changed to 100 to allow more candidate mapping locations to be considered.
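The idea of extracting reads directly from a gzip-compressed FASTQ file, without writing a decompressed temporary file, can be sketched with Python's standard gzip module. This only illustrates the streaming approach; it says nothing about how the aligners implement it, and the function names are hypothetical. The sketch assumes four-line FASTQ records.

```python
import gzip
import io


def iter_fastq(handle):
    """Yield (name, sequence, quality) from a text handle of 4-line records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.startswith("@"):
            raise ValueError(f"unexpected FASTQ header: {header!r}")
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # separator line starting with '+'
        quality = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], sequence, quality


def iter_gzipped_fastq(source):
    """Stream records from a gzipped FASTQ path or binary file object.

    Decompression happens on the fly, so no temporary file is created.
    """
    with gzip.open(source, "rt") as handle:
        yield from iter_fastq(handle)


# In-memory example so that the sketch runs without any input file.
raw = b"@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nFFFF\n"
records = list(iter_gzipped_fastq(io.BytesIO(gzip.compress(raw))))
```

Passing a file name instead of the in-memory buffer works the same way, since gzip.open accepts either a path or an open binary file object.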
Download and installation
Latest version 1.5.1
All the versions
Users guide and tutorials
A quick tutorial on Subread
A quick tutorial on Subjunc
A quick tutorial on featureCounts
A quick tutorial on exactSNP
Case study for RNA-seq data analysis
Projects for Ph.D., MS and Honours students are available. Internships are considered as well.
Read mapping, variant detection and isoform discovery (bioinformatics projects)
Reconstructing the immune response: from molecules to cells to systems (computational immunology project)
Subread Users Group
Liao Y, Smyth GK and Shi W. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Research, 41(10):e108, 2013
Liao Y, Smyth GK and Shi W. featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features. Bioinformatics, 30(7):923-30, 2014
Scientific publications citing our methods
Publications that cite Subread/Subjunc
Publications that cite featureCounts
Read counts for TCGA data.
Read counts for SEQC data.
Read counts for Pickrell dataset and Montgomery dataset.
Bioconductor R package Rsubread
Bioconductor R package seqc
Dr. Wei Shi (shi at wehi dot edu dot au) or
Dr. Yang Liao (liao at wehi dot edu dot au)