featureCounts: an ultrafast and accurate read summarization program

featureCounts is a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoters, gene bodies, genomic bins and chromosomal locations. It can be used to count both RNA-seq and genomic DNA-seq reads. It is available in the SourceForge Subread package or the Bioconductor Rsubread package.

Input and output

featureCounts takes as input SAM/BAM files and an annotation file that includes the chromosomal coordinates of features. It outputs the number of reads assigned to each feature (or meta-feature). It also outputs summary statistics for the overall summarization results, including the number of successfully assigned reads and the number of reads that could not be assigned, broken down by the reason for the failure.

The annotation file should be in either GTF format or a simplified annotation format (SAF) as shown below (columns are tab-delimited):

GeneID	Chr	Start	End	Strand
497097	chr1	3204563	3207049	-
497097	chr1	3411783	3411982	-
497097	chr1	3660633	3661579	-
...
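
As an illustration of the SAF layout, the following Python sketch parses the example rows above into records. It is not part of featureCounts; the `Feature` type and the `read_saf` helper are hypothetical names introduced here, and the sketch assumes the header line shown above is present.

```python
import csv
import io
from typing import NamedTuple


class Feature(NamedTuple):
    """One SAF row (hypothetical helper type, not part of featureCounts)."""
    gene_id: str
    chrom: str
    start: int
    end: int
    strand: str


def read_saf(handle):
    """Parse tab-delimited SAF text with a header line into Feature records."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        Feature(row["GeneID"], row["Chr"], int(row["Start"]),
                int(row["End"]), row["Strand"])
        for row in reader
    ]


# The three example rows shown above, written out with explicit tabs.
saf_text = (
    "GeneID\tChr\tStart\tEnd\tStrand\n"
    "497097\tchr1\t3204563\t3207049\t-\n"
    "497097\tchr1\t3411783\t3411982\t-\n"
    "497097\tchr1\t3660633\t3661579\t-\n"
)
features = read_saf(io.StringIO(saf_text))
for feature in features:
    print(feature)
```

To read a real annotation, the same helper could be given an open file handle instead of the in-memory text.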

Features and meta-features

Each entry in the provided annotation file is taken as a feature (e.g. an exon). A meta-feature is the aggregation of a set of features (e.g. a gene). The featureCounts program uses the gene_id attribute available in the GTF format annotation (or the GeneID column in the SAF format annotation) to group features into meta-features, i.e. features belonging to the same meta-feature have the same gene identifier.

featureCounts can count reads at either feature level or at meta-feature level. When summarizing reads at meta-feature level, read counts obtained for features included in the same meta-feature will be added up to yield the read count for the corresponding meta-feature.
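
The grouping described above can be sketched in a few lines of Python. This is only an illustration, not the program's implementation: the per-feature read counts and the second gene (`geneB`) are made up, `group_into_meta_features` is a hypothetical helper, and the sum assumes each read was assigned to a single feature.

```python
from collections import defaultdict

# (gene identifier, chromosome, start, end, read count for this feature)
# The read counts and the geneB row are hypothetical illustration data.
features = [
    ("497097", "chr1", 3204563, 3207049, 12),
    ("497097", "chr1", 3411783, 3411982, 7),
    ("497097", "chr1", 3660633, 3661579, 3),
    ("geneB", "chr1", 4000000, 4000500, 5),
]


def group_into_meta_features(rows):
    """Group features by gene identifier and add up their read counts."""
    members = defaultdict(list)
    totals = defaultdict(int)
    for gene_id, chrom, start, end, count in rows:
        members[gene_id].append((chrom, start, end))
        totals[gene_id] += count
    return dict(members), dict(totals)


members, totals = group_into_meta_features(features)
print(totals)
```

Here the three features sharing the identifier 497097 form one meta-feature, and their counts are added to give its total.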

Overlap between reads and features

A read is said to overlap a feature if at least one read base is found to overlap the feature. For paired-end data, a fragment (or template) is said to overlap a feature if either of the two reads from that fragment is found to overlap the feature.
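
The overlap definition can be written as a simple interval test. The sketch below is an illustration under stated assumptions: reads and features are on the same chromosome, coordinates are 1-based and inclusive, and each read is treated as one contiguous interval (spliced alignments are ignored). The function names are hypothetical.

```python
def read_overlaps(read_start, read_end, feat_start, feat_end):
    """True if the two closed intervals share at least one base."""
    return read_start <= feat_end and feat_start <= read_end


def fragment_overlaps(reads, feat_start, feat_end):
    """True if either read of the fragment overlaps the feature."""
    return any(read_overlaps(s, e, feat_start, feat_end) for s, e in reads)


feat_start, feat_end = 3411783, 3411982

# The read's first base sits on the feature's last base: one shared base.
touching = read_overlaps(3411982, 3412081, feat_start, feat_end)

# The read starts one base past the feature: no shared base.
adjacent = read_overlaps(3411983, 3412082, feat_start, feat_end)

# Only the second read of the pair reaches the feature.
fragment = fragment_overlaps(
    [(3411600, 3411699), (3411900, 3411999)], feat_start, feat_end
)
print(touching, adjacent, fragment)
```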

By default, featureCounts does not count reads overlapping with more than one feature (or more than one meta-feature when summarizing at meta-feature level). Users can use the -O option to instruct featureCounts to count such reads (they will be assigned to all their overlapping features or meta-features).

Note that, when counting at the meta-feature level, reads that overlap multiple features of the same meta-feature are always counted exactly once for that meta-feature, provided there is no overlap with any other meta-feature. For example, an exon-spanning read will be counted only once for the corresponding gene even if it overlaps with more than one exon.
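
The assignment rules at the meta-feature level can be sketched as follows. This is a simplified model, not the program's code: the gene names and coordinates are invented, reads are single contiguous intervals on one chromosome, and the `allow_multi_overlap` parameter is a hypothetical stand-in for the -O option.

```python
def assign_read(read, features, allow_multi_overlap=False):
    """Return the set of gene identifiers that the read is counted for.

    `features` is a list of (gene_id, start, end) tuples with 1-based
    inclusive coordinates. Collecting gene identifiers in a set means a
    read that hits several exons of one gene still counts once for it.
    """
    r_start, r_end = read
    hits = {
        gene_id
        for gene_id, f_start, f_end in features
        if r_start <= f_end and f_start <= r_end
    }
    if len(hits) <= 1 or allow_multi_overlap:
        return hits
    return set()  # overlaps more than one meta-feature: not counted


# Hypothetical annotation: geneA has two exons, geneB overlaps the second.
features = [
    ("geneA", 100, 200),
    ("geneA", 300, 400),
    ("geneB", 380, 500),
]

print(assign_read((150, 320), features))        # both geneA exons only
print(assign_read((390, 395), features))        # geneA and geneB
print(assign_read((390, 395), features, True))  # same read, multi-overlap allowed
```

The first read touches two exons of geneA but no other gene, so it is counted once for geneA. The second read overlaps two genes and is left unassigned unless multi-overlap counting is enabled.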

Installation

To use the featureCounts program included in the SourceForge Subread package, see the Subread installation instructions.

featureCounts is also available in the Bioconductor R package Rsubread. You need to have R installed on your computer to run featureCounts in Rsubread. Rsubread is part of the Bioconductor project.

Example commands

Below are example commands for using featureCounts included in the SourceForge Subread package. For example commands for using featureCounts in the Rsubread package, please see the Subread/Rsubread Users Guide. Note that featureCounts automatically detects the format of input read files (SAM/BAM).

Summarize a single-end read dataset using 5 threads:

featureCounts -T 5 -t exon -g gene_id -a annotation.gtf -o counts.txt mapping_results_SE.sam

Summarize a BAM format dataset:

featureCounts -t exon -g gene_id -a annotation.gtf -o counts.txt mapping_results_SE.bam

Summarize multiple datasets at the same time:

featureCounts -t exon -g gene_id -a annotation.gtf -o counts.txt library1.bam library2.bam library3.bam

Perform strand-specific read counting (use '-s 2' if the library is reverse-stranded):

featureCounts -s 1 -t exon -g gene_id -a annotation.gtf -o counts.txt mapping_results_SE.bam

Summarize paired-end reads and count fragments (instead of reads):

featureCounts -p --countReadPairs -t exon -g gene_id -a annotation.gtf -o counts.txt mapping_results_PE.bam

Summarize multiple paired-end datasets:

featureCounts -p --countReadPairs -t exon -g gene_id -a annotation.gtf -o counts.txt library1.bam library2.bam library3.bam

Count only fragments whose length is between 50 bp and 600 bp:

featureCounts -p --countReadPairs -P -d 50 -D 600 -t exon -g gene_id -a annotation.gtf -o counts.txt mapping_results_PE.bam

Count only fragments that have both ends mapped:

featureCounts -p --countReadPairs -B -t exon -g gene_id -a annotation.gtf -o counts.txt mapping_results_PE.bam

Exclude chimeric fragments from fragment counting:

featureCounts -p --countReadPairs -C -t exon -g gene_id -a annotation.gtf -o counts.txt mapping_results_PE.bam
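
When the commands above are launched from a script, it can help to assemble the argument list programmatically. The Python sketch below only builds and prints a command; it uses just the options that appear in the examples above, and `build_command` with its parameter names is a hypothetical helper, not part of Subread.

```python
import shlex


def build_command(annotation, output, bam_files, threads=None,
                  strand=None, paired=False, min_len=None, max_len=None):
    """Assemble a featureCounts argument list from the options shown above."""
    cmd = ["featureCounts"]
    if threads is not None:
        cmd += ["-T", str(threads)]
    if strand is not None:
        cmd += ["-s", str(strand)]
    if paired:
        cmd += ["-p", "--countReadPairs"]
        # The fragment length filter is only added for paired-end data.
        if min_len is not None and max_len is not None:
            cmd += ["-P", "-d", str(min_len), "-D", str(max_len)]
    cmd += ["-t", "exon", "-g", "gene_id", "-a", annotation, "-o", output]
    cmd += list(bam_files)
    return cmd


cmd = build_command(
    "annotation.gtf", "counts.txt", ["library1.bam", "library2.bam"],
    paired=True, min_len=50, max_len=600,
)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)`, assuming featureCounts is installed and on the PATH.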

Citation

Liao Y, Smyth GK and Shi W (2014). featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics, 30(7):923-30.

Users Guide

Type featureCounts to see the usage information, or see the Users Guide for more details.

Get help

You may post your questions/suggestions at the Bioconductor support site.

Scientific publications citing Subread

See the full list on Google Scholar.

Links

Subread: A superfast and accurate read aligner.

Subjunc: Detecting exon-exon junctions and mapping RNA-seq reads.

Rsubread: A Bioconductor R package for read mapping, exon-exon junction detection and read summarization.

A case study for RNA-seq data analysis: Using the Bioconductor packages Rsubread and limma to perform a complete analysis of RNA-seq data, from read mapping to differential expression analysis. RNA-seq data generated by the MAQC/SEQC Consortium were used in this case study.

Subread package overview: A brief description of the Subread package.