The Subread package

Release 1.5.3, 12 July 2017

featureCounts

New parameter '-L' for counting long reads (eg. Nanopore and PacBio reads).
New parameter '--byReadGroup' for counting reads in each read group in each library.
Output detailed assignment results for reads in three different formats: CORE, SAM and BAM ('-R' option).
Filters in the counting summary are listed in the order in which they are applied during counting, from first to last.

subread-align and subjunc

Remove '-u' option and add '--multiMapping' option. By default aligners report uniquely mapped reads only.

Release 1.5.2, 17 March 2017

subread-align and subjunc

Gene annotation can be provided to aligners to improve exon junction detection and read mapping for RNA-seq data.
Resolved an inconsistency between runs when more than one CPU thread is used. The inconsistency was caused by an excessive number of indels/junctions occurring at the same base position (rare).
Improved sanity checking for input and output data (eg. check if disk is full).

featureCounts

Allow BAM/SAM input from STDIN (eg. via pipes).
Improved sanity checking for input and output data (eg. check if input BAM file is corrupted).

Release 1.5.1, 25 Aug 2016

featureCounts

New parameter '--fracOverlap' that specifies the minimum fraction of overlapping read bases required for assigning a read to a feature.
New parameter '--tmpDir' that specifies the directory where intermediate files will be saved to.
The '--fraction' option can now be used to produce fractional counts for both multi-mapping reads and multi-overlapping reads (reads that overlap with more than one feature).
RefSeq gene annotation for hg38 is added to the package (overlapping exons from the same gene are merged into one exon).
Sanity check for input BAM files.
In featureCounts output report the original exon coordinates (provided in the annotation) without merging them.
Bug fixes.
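
As an illustration of the fractional counting mentioned above for '--fraction', the following Python sketch splits each read's count evenly among the features it overlaps. The even 1/y split, the function name and the input layout are assumptions made for this example; it is not the featureCounts implementation.

```python
from collections import defaultdict
from fractions import Fraction


def fractional_counts(read_to_features):
    """Give each feature 1/y of a read that overlaps y features (assumed rule)."""
    counts = defaultdict(Fraction)
    for features in read_to_features.values():
        if not features:
            continue  # read overlaps no feature, so it contributes nothing
        share = Fraction(1, len(features))
        for feature in features:
            counts[feature] += share
    return dict(counts)


# Hypothetical assignments: read2 overlaps two features, read3 overlaps none.
example = {"read1": ["geneA"], "read2": ["geneA", "geneB"], "read3": []}
result = fractional_counts(example)
```

Exact fractions are used here so that shares always add back up to whole reads; a real counter would more likely use floating-point numbers.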

subread-align and subjunc

A new method for reporting mapping quality score (MQS) that takes into account top candidate mapping locations and also the number of mismatches.

Release 1.5.0-p3, 27 May 2016

Fixed a bug associated with '--allJunctions' option in Subjunc aligner.

Improved the efficiency of exactSNP program when calling SNPs from the data that have a very high sequencing depth (>1000x).

Resolved an issue of concurrently opening a large number of files in featureCounts.

Improved processing of 'H' operations in CIGAR strings in featureCounts.

Release 1.5.0-p2, 14 Apr 2016

featureCounts

Fix a bug in processing long header lines in SAM/BAM files.
Deprecated the '-S < ff:fr:rf >' option.
The '< input file >.featureCounts' files (generated when '-R' is specified) are saved to the same directory as the file < output_file >.

subread-align and subjunc

Fixed a bug related to reporting of reads mapping out of the chromosomal boundary.
Make sure no zero operations (eg '0M') are included in reported CIGAR strings.
Fixed a bug in soft-clipping read bases.
Improved screen output.
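
For readers who want to check alignments for zero-length CIGAR operations such as '0M', here is a minimal Python sketch. The function names are hypothetical and the check is independent of the aligners; it only relies on the CIGAR operation letters defined by the SAM format.

```python
import re

# CIGAR operation letters defined by the SAM format.
_CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) tuples."""
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in _CIGAR_OP.findall(cigar)]


def has_zero_length_op(cigar):
    """Return True if any operation in the CIGAR string has length 0."""
    return any(length == 0 for length, _ in parse_cigar(cigar))
```

The string is parsed rather than searched for '0M' directly, because a valid CIGAR such as '10M' also contains the substring '0M'.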

Release 1.5.0-p1, 18 Dec 2015

featureCounts can process long reads (up to 250kb long). It can also process reads that contain long extra fields.

Report counts for exon-exon junctions by featureCounts ('-J' option).

Improved parsing of gzipped fastq files in Subread and Subjunc aligners.

Bug fixes.

Release 1.5.0, 29 Oct 2015

featureCounts

Improved speed of re-sorting paired-end reads. It now takes only about 30 seconds to re-sort 30 million read pairs.
A utility program 'repair' is provided to allow pre-sorting of paired-end BAM/SAM files.

subread-align and subjunc

New parameter '--type' for sequencing-type-specific mapping optimization.
New parameter '--sv' for detection of structural variant breakpoints.
New parameter '--complexIndels' for detection of complex indels.
Improved mapping of paired-end reads via a new formula that uses weighted votes (more weight is given to properly mapped reads).
Improved detection of multi-mapping reads by considering locations that receive second highest votes.
When gzipped fastq input is provided, reads will be directly extracted from the gzip-compressed file and no temporary files will be generated.
Default output format is set to BAM.

subread-buildindex

Default threshold for removing un-informative subreads from index is changed to 100 to allow more candidate mapping locations to be considered.

Release 1.4.6-p5, 4 Sept 2015

Added '-S' option to featureCounts to specify the orientation of paired-end reads.

Added '--minOverlap' and '--largestOverlap' options to featureCounts.
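
The idea behind a minimum-overlap threshold and a largest-overlap rule can be sketched in Python as follows. This is a simplified illustration, assuming a read that maps as a single block and 1-based inclusive coordinates; the function names and the tie-breaking rule (first feature wins) are choices made for this example, not featureCounts behaviour.

```python
def overlap_bases(read_start, read_end, feat_start, feat_end):
    """Number of bases shared by two 1-based inclusive intervals."""
    return max(0, min(read_end, feat_end) - max(read_start, feat_start) + 1)


def pick_largest_overlap(read, features, min_overlap=1):
    """Return the feature with the most overlapping bases, or None.

    Features with fewer than min_overlap shared bases are ignored.
    Ties keep the first feature encountered (an arbitrary choice here).
    """
    best_name, best_bases = None, 0
    for name, (feat_start, feat_end) in features.items():
        bases = overlap_bases(read[0], read[1], feat_start, feat_end)
        if bases >= min_overlap and bases > best_bases:
            best_name, best_bases = name, bases
    return best_name


# Hypothetical coordinates: a 50-base read spanning two overlapping exons.
features = {"exonA": (100, 150), "exonB": (140, 300)}
read = (131, 180)
```

With these coordinates the read shares 20 bases with exonA and 41 bases with exonB, so exonB is selected unless the minimum overlap is set above 41.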

Fixed a CIGAR reporting bug in Subjunc when '--allJunctions' option is turned on (in rare cases CIGAR may contain an incorrect large N section).

Release 1.4.6-p4, 25 June 2015

Fixed a bug in featureCounts for counting reads (not read pairs) in stranded paired-end sequencing data. Counting read pairs is not affected by this bug.

Release 1.4.6-p3, 18 May 2015

Added an argument "--donotsort" to featureCounts to allow users to turn off the read sorting procedure.

Show details of read pairs that were not properly paired in featureCounts screen output.

NEWS

The Subread-featureCounts-limma/voom pipeline was successfully used in a large-scale RNA-seq study for defining a signature for mouse antibody-secreting plasma cells. The study was published in Nature Immunology in April 2015:

Transcriptional profiling of mouse B cell terminal differentiation defines a signature for antibody-secreting plasma cells

Release 1.4.6-p2, 24 March 2015

Added a new parameter for subread-align and subjunc programs: --minDistanceBetweenVariants. This parameter specifies the minimum allowed distance between two neighboring genomic variants within the same read.

Release 1.4.6-p1, 12 Feb 2015

Distance between two neighboring genomic variants is allowed to be as small as 1bp (it was 16bp).

Fix a bug that sometimes caused randomness in multi-threaded running of Subjunc program.

NEWS

The Subread-featureCounts-limma/voom pipeline has been found to be one of the best-performing pipelines for the analyses of RNA-seq data by the SEQC/MAQC III Consortium. This study was published in the 2014 September issue of Nature Biotechnology -- A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium

Release 1.4.6, 15 Oct 2014

The default number of maximum allowed mismatches in each reported alignment is changed to 3.

The minimum fraction of consensus subreads out of all extracted subreads, required for detecting candidate mapping locations, is changed to 0.3 in subjunc for the mapping of exonic reads (reads falling within exons).

Better support for the mapping of micro RNA sequencing (miRNA-seq) reads. A full index with no gaps included can now be built to allow miRNA-seq reads to be mapped in the highest possible resolution. A new section is added to the User Guides to describe how to map miRNA-seq reads using Subread.

Bug fixes.

Release 1.4.5-p1, 7 July 2014

Fixed a bug for reporting unmapped reads in SAM output.

Release 1.4.5, 12 June 2014

New options in featureCounts:
--readExtension5 <int> Reads are extended upstream by <int> bases from their 5' end.
--readExtension3 <int> Reads are extended downstream by <int> bases from their 3' end.
--read2pos <5:3> The read is reduced to its 5' most base or 3' most base. Read summarization is then performed based on the single base position which the read is reduced to.
--minReadOverlap <int> Specify the minimum number of overlapped bases required for assigning a read to a feature. 1 by default. Negative values are permitted, indicating a gap being allowed between a read and a feature.
--countSplitAlignmentsOnly If specified, only split alignments (CIGAR strings containing letter 'N') will be counted. Example split alignments include exon-spanning reads in RNA-seq data.
--ignoreDup If specified, reads marked as duplicates are not counted. Duplicate reads are identified using FLAG 0x400.
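
Several of the options above depend on fields of the SAM record. The Python sketch below shows, under stated assumptions, how a duplicate read (FLAG 0x400), a split alignment (CIGAR containing 'N') and the 5' most base of a read could be identified. The helper names are hypothetical, and the 5' position logic is derived from the SAM format (reverse-strand bit 0x10, reference-consuming operations M, D, N, = and X) rather than from the featureCounts source.

```python
import re

FLAG_REVERSE = 0x10     # read is mapped to the reverse strand
FLAG_DUPLICATE = 0x400  # read is marked as a PCR or optical duplicate


def is_duplicate(flag):
    """True if the duplicate bit is set in the SAM FLAG."""
    return bool(flag & FLAG_DUPLICATE)


def is_split_alignment(cigar):
    """True if the CIGAR string contains a skipped region ('N')."""
    return "N" in cigar


def reference_span(cigar):
    """Number of reference bases covered by the alignment."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(length) for length, op in ops if op in "MDN=X")


def five_prime_base(pos, flag, cigar):
    """1-based reference position of the read's 5' most aligned base."""
    if flag & FLAG_REVERSE:
        # On the reverse strand the 5' end is the rightmost aligned base.
        return pos + reference_span(cigar) - 1
    return pos
```

For example, a reverse-strand read starting at position 1000 with CIGAR '30M500N20M' covers 550 reference bases, so its 5' most base is at position 1549.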

A new option in subread-align/subjunc:
-M <int> Specify the maximum number of mismatched bases allowed in the alignment. 10 by default.

Other changes:

NM tags are added into read mapping output.

Range of MAPQ values is changed to [0,60).

MAPQ values for multi-mapping reads are set to 0.

NCBI RefSeq gene annotations for hg19, mm10 and mm9 are added to the package, making it easier for performing read summarization.

Bug fixes.

Release 1.4.4, 20 March 2014

Improved featureCounts in processing GTF/GFF format annotation files.

Breakpoint locations are reported along with mapping location of each fusion read in SAM/BAM files, using tags including CC(chromosome name), CP(mapping position), CG(CIGAR string) and CT(strand).
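
As a rough illustration, the optional fields of a SAM record can be read with a few lines of Python. The record below is made up for this example, and the tag types shown (Z for CC, CG and CT, i for CP) are assumptions; only the tag names come from the note above, and the general TAG:TYPE:VALUE layout comes from the SAM format.

```python
def optional_fields(sam_line):
    """Return the optional fields of a SAM record as a dict keyed by tag."""
    columns = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in columns[11:]:  # the first 11 columns are mandatory
        tag, value_type, value = field.split(":", 2)
        tags[tag] = int(value) if value_type == "i" else value
    return tags


# Hypothetical fusion read record (all values invented for illustration).
record = "\t".join([
    "read1", "0", "chr1", "1000", "40", "50M50S", "*", "0", "0",
    "A" * 100, "I" * 100,
    "CC:Z:chr5", "CP:i:20500", "CG:Z:50S50M", "CT:Z:+",
])
tags = optional_fields(record)
```

Here `tags["CC"]` and `tags["CP"]` would give the chromosome name and mapping position of the other part of the fusion read.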

Strandness of the exon-splicing site is reported using XA tag (e.g. XA:A:+) in Subjunc program.

Fragments (ie. read pairs), instead of reads, are counted for each chromosomal location for SNP calling in exactSNP, when paired-end read data are provided.

Full index can be built for a reference genome by subread-buildindex program (no gap between neighbouring subreads) to further reduce the read mapping time.

New utility programs were added, including 'coverageCount' (calculate coverage at each chromosomal location in a highly efficient manner) and 'propmapped' (calculate proportion of mapped reads in a SAM/BAM file).

Bug fixes.

Release 1.4.3-p1, 18 December 2013

Fixed a bug in featureCounts for processing long header lines in SAM/BAM files.

Support for gzipped FASTQ input was added.

Improved indel detection.

exactSNP can use known SNPs to improve its SNP calling performance (-a option).

Release 1.4.2, 15 November 2013

featureCounts outputs summary info after read assignments (giving reasons for those unassigned reads).

featureCounts automatically re-orders paired-end reads if reads from the same pair are not adjacent to each other (-S option is not needed any more). It can also deal with those read pairs that have only one end included in SAM/BAM files.

Subjunc has an improved performance in mapping the exon-spanning reads in which junction locations are very close to (1-2bp away from) the ends of reads.

Bug fixes.

News - featureCounts publication, 14 Nov 2013

The featureCounts paper has just been published in Bioinformatics:
featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features.

Release 1.4.1, 7 November 2013

Release of binary distributions for Linux and Mac OS X operating systems. Both 32-bit and 64-bit machines are supported.

featureCounts program automatically detects read input format (SAM or BAM) ('-b' option is no longer required for BAM input).

Added option '--reportFusions' to subread-align program to detect fusion events such as chimeras in gDNA-seq data. Discovered fusions will be saved to a file. Detailed mapping results for fusion reads are also saved to the SAM/BAM output. Optional fields in the SAM/BAM file are used to store secondary alignments for each fusion read, along with the primary alignment stored in the main fields. Each fusion read occupies only one row in the SAM/BAM output.

Added option '--allJunctions' to subjunc program to detect non-canonical exon-exon junctions (donor/acceptor sites different from GT/AG) and also fusions in RNA-seq data.

Bug fixes.

Major release 1.4.0, 10 October 2013

Added a number of new features to featureCounts read summarization function, including reordering of reads in BAM files to make reads from the same pair be adjacent to each other, support for chromosome aliases and output of complete annotation data for counting results from meta-feature level summarization.

The Users Guide describes in more detail how the featureCounts program summarizes reads.

Improved short indel detection for both Subread and Subjunc aligners. This was achieved by building a consensus indel table and by realigning the reads. Discovered indels are reported in the output in addition to the read mapping results.

Support for detection of long indels (up to 200bp) was added in Subread. When the specified value of '-I' option is greater than 16, Subread will automatically perform read assembly to detect long insertions and deletions.

Subread and Subjunc can now take FASTQ/FASTA, SAM and BAM files as input and output mapping results in both SAM and BAM formats.

Subjunc now directly operates on raw read data (it previously took Subread output as input), thus reducing running time by nearly half.

Subjunc can be instructed to output uniquely mapped reads. Hamming distance and mapping quality scores can be used to break ties when more than one best location was found.
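
To make the tie-breaking idea concrete, the Python sketch below picks, among several candidate locations, the one whose reference segment has the smallest Hamming distance to the read. The function names and the candidate layout are hypothetical, and the sketch ignores mapping quality scores; it shows the general technique only, not the Subjunc implementation.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def break_tie(read, candidates):
    """Return the location whose reference segment is closest to the read.

    candidates is a list of (location, reference_segment) pairs.
    If distances are equal, the first candidate is kept.
    """
    best = min(candidates, key=lambda c: hamming(read, c[1]))
    return best[0]


# Hypothetical candidates that received the same number of votes.
read = "ACGTACGT"
candidates = [
    ("chr1:100", "ACGTACGA"),  # 1 mismatch
    ("chr2:500", "ACGTACGT"),  # 0 mismatches
    ("chr3:50", "TCGTACGA"),   # 2 mismatches
]
chosen = break_tie(read, candidates)
```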

More options were added to exactSNP program. Its documentation was also greatly improved.

A number of bug fixes.

v1.3.6-p1 release, 28 Aug 2013

Fixed a bug for processing long header lines in index building.

v1.3.6 release, 9 Aug 2013

Fixed a bug for the featureCounts program.

v1.3.5-p5 release, 26 Jun 2013

Fixed a bug in reporting mapping results for PE reads for subread-align.

v1.3.5-p4 release, 18 Jun 2013

Fixed a bug in reporting uniquely mapped reads for subread-align.

Refined program output information.

v1.3.5-p3 release, 13 Jun 2013

Changes to featureCounts to let it deal with the reversed order of reads from the same pair in SAM/BAM file.

v1.3.5-p2 release, 12 Jun 2013

Fixed a bug of reporting mapping location of mate read when it contains soft-clipped bases.

Reformatted the program usage info and updated the program output info.

A '-b' option was added to subread-align to output base-space reads when mapping color-space reads.

v1.3.5-p1 release, 7 Jun 2013

Fixed the bug of incorrect SAM output introduced in v1.3.5.

Enhanced subread-buildindex to let it check the integrity of provided reference sequences.

v1.3.5 release, 6 Jun 2013

Fixed the bug for multi-threaded running of subread-align and subjunc.

Fixed an error for populating the mapping location field in the SAM file for unmapped reads.

Changed the meaning of -d and -D options. They are now used to specify the minimum and maximum fragment lengths.

Removed the limit on the number of chromosomes allowed for the index building.

v1.3.4 release, 4 Jun 2013

Modified featureCounts to let it exclude multi-mapping reads in read summarization (default). To allow multi-mapping reads to be counted, use the '-M' option.

Documentation improvement.

v1.3.3-p5 release, 2 Jun 2013

Modified the '-s' option of featureCounts to allow reverse-stranded read counting ('-s 2').
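
A minimal Python sketch of strand-aware counting for single-end reads is given below. Only '-s 2' (reverse-stranded) is described in this entry; the meanings assumed here for modes 0 (unstranded) and 1 (stranded) and the function name are assumptions for illustration, and paired-end handling is left out.

```python
def strand_compatible(read_strand, feature_strand, mode):
    """Decide whether a single-end read may be counted for a feature.

    mode 0: strand is ignored (assumed meaning)
    mode 1: read must be on the same strand as the feature (assumed meaning)
    mode 2: read must be on the opposite strand (reverse-stranded counting)
    """
    if mode == 0:
        return True
    if mode == 1:
        return read_strand == feature_strand
    if mode == 2:
        return read_strand != feature_strand
    raise ValueError(f"unknown strand mode: {mode}")
```

Under these assumptions, a read on the '-' strand overlapping a '+' strand gene is counted in modes 0 and 2 but not in mode 1.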

v1.3.3-p4 release, 29 May 2013

Added '-Q' option to featureCounts to allow it to filter out reads which have low mapping quality scores.

v1.3.3-p3 release, 23 May 2013

Fixed a bug with the '-B' argument of the featureCounts program.

featureCounts can now deal with the extra space character included in the gene_id attribute in some GTF annotation files generated by Ensembl.

v1.3.3-p2 release, 20 May 2013

featureCounts can now process any number of chromosomes included in the annotation file.

v1.3.3-p1 release, 17 May 2013

Fixed a bug in processing GTF annotation files.

Further improved the performance of featureCounts when running on one thread.

v1.3.3 release, 11 May 2013

The featureCounts program is released.

v1.3.3 release, 30 April 2013

Added an option to allow multiple best mapping locations to be reported.