featureCounts: an ultrafast and accurate read summarization program
featureCounts is a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoters, gene bodies, genomic bins and chromosomal locations. It can be used to count both RNA-seq and genomic DNA-seq reads. It is available as part of the SourceForge Subread package and the Bioconductor Rsubread package.
Input and output
featureCounts takes as input SAM/BAM files and an annotation file containing the chromosomal coordinates of features. It outputs the number of reads assigned to each feature (or meta-feature). It also outputs summary statistics for the overall summarization, including the number of successfully assigned reads and the number of reads that could not be assigned, broken down by the reason they failed.
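The summary statistics lend themselves to a quick quality check, such as the fraction of reads that were assigned. The sketch below assumes the statistics are a tab-delimited table (for example a `counts.txt.summary` file written next to `counts.txt`) with a `Status` header row, one row per status and one count column per input file. The status names and counts in the example are toy values for illustration, and the helper names are hypothetical.

```python
# A minimal sketch, assuming the statistics are a tab-delimited table:
# a "Status" header row, then one row per status, one count column per input file.


def parse_summary(text):
    """Return {sample: {status: count}} from summary-table text."""
    rows = [line.split("\t") for line in text.strip().splitlines() if line.strip()]
    samples = rows[0][1:]  # header row: Status, sample1, sample2, ...
    stats = {sample: {} for sample in samples}
    for status, *counts in rows[1:]:
        for sample, n in zip(samples, counts):
            stats[sample][status] = int(n)
    return stats


def assignment_rate(sample_stats):
    """Fraction of all reads (or fragments) whose status is "Assigned"."""
    total = sum(sample_stats.values())
    return sample_stats.get("Assigned", 0) / total if total else 0.0


# Toy numbers invented for illustration only.
example = (
    "Status\tlib1.bam\n"
    "Assigned\t800\n"
    "Unassigned_NoFeatures\t150\n"
    "Unassigned_Ambiguity\t50\n"
)
stats = parse_summary(example)
print(assignment_rate(stats["lib1.bam"]))
```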
The annotation file should be in either GTF format or a simplified annotation format (SAF) as shown below (columns are tab-delimited):
GeneID	Chr	Start	End	Strand
497097	chr1	3204563	3207049	-
497097	chr1	3411783	3411982	-
497097	chr1	3660633	3661579	-
...
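Because SAF is a plain five-column, tab-delimited table, it is straightforward to read programmatically. The sketch below parses SAF text into simple records, using the example rows above. It assumes Start and End are 1-based and inclusive, and the `Feature` and `parse_saf` names are hypothetical, not part of Subread or Rsubread.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    gene_id: str
    chrom: str
    start: int  # assumed 1-based and inclusive
    end: int    # assumed 1-based and inclusive
    strand: str


def parse_saf(text):
    """Parse tab-delimited SAF text (GeneID, Chr, Start, End, Strand) into Features."""
    features = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        gene_id, chrom, start, end, strand = line.split("\t")[:5]
        if gene_id == "GeneID":
            continue  # skip the header row
        features.append(Feature(gene_id, chrom, int(start), int(end), strand))
    return features


saf_text = (
    "GeneID\tChr\tStart\tEnd\tStrand\n"
    "497097\tchr1\t3204563\t3207049\t-\n"
    "497097\tchr1\t3411783\t3411982\t-\n"
    "497097\tchr1\t3660633\t3661579\t-\n"
)
for feature in parse_saf(saf_text):
    print(feature)
```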
Features and meta-features
Each entry in the provided annotation file is taken as a feature (e.g. an exon). A meta-feature is the aggregation of a set of features (e.g. a gene). featureCounts uses the gene_id attribute in a GTF annotation (or the GeneID column in a SAF annotation) to group features into meta-features, i.e. features belonging to the same meta-feature share the same gene identifier.
featureCounts can count reads at either the feature level or the meta-feature level. When summarizing reads at the meta-feature level, the read counts obtained for features in the same meta-feature are added up to yield the read count for that meta-feature.
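The grouping step can be sketched in a few lines: features that share a gene identifier are collected into one meta-feature, and per-feature counts can be summed per identifier. The function names and the `geneX` entry below are hypothetical and for illustration only.

```python
from collections import defaultdict


def group_into_meta_features(features):
    """Group (gene_id, chrom, start, end) features by their gene identifier."""
    meta_features = defaultdict(list)
    for gene_id, chrom, start, end in features:
        meta_features[gene_id].append((chrom, start, end))
    return dict(meta_features)


def sum_feature_counts(feature_counts):
    """Add up per-feature counts, given as (gene_id, count) pairs, per meta-feature."""
    totals = defaultdict(int)
    for gene_id, count in feature_counts:
        totals[gene_id] += count
    return dict(totals)


exons = [
    ("497097", "chr1", 3204563, 3207049),
    ("497097", "chr1", 3411783, 3411982),
    ("497097", "chr1", 3660633, 3661579),
    ("geneX", "chr1", 4000000, 4000500),  # hypothetical second gene
]
meta = group_into_meta_features(exons)
print({gene_id: len(parts) for gene_id, parts in meta.items()})
```

This naive summation would count a read twice if it overlaps two exons of the same gene. As described in the next section, featureCounts counts such a read only once for that gene, so a faithful implementation assigns reads to meta-features directly rather than summing feature-level counts.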
Overlap between reads and features
A read is said to overlap a feature if at least one read base overlaps the feature. For paired-end data, a fragment (or template) is said to overlap a feature if either of its two reads overlaps the feature.
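The overlap rule reduces to an interval test. The sketch below assumes 1-based inclusive coordinates on the same chromosome and treats each read as one contiguous interval, so it ignores spliced (gapped) alignments and strand. The function names are hypothetical.

```python
def read_overlaps(read_start, read_end, feat_start, feat_end):
    """True if at least one read base lies inside the feature.

    Coordinates are assumed 1-based and inclusive on both ends.
    """
    return read_start <= feat_end and feat_start <= read_end


def fragment_overlaps(reads, feat_start, feat_end):
    """A fragment overlaps the feature if either of its reads does."""
    return any(read_overlaps(start, end, feat_start, feat_end) for start, end in reads)


# A single shared base (position 199) is enough to count as an overlap.
print(read_overlaps(100, 199, 199, 300))
# Only the second read of this pair touches the feature.
print(fragment_overlaps([(100, 150), (250, 300)], 200, 260))
```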
By default, featureCounts does not count reads overlapping with more than one feature (or more than one meta-feature when summarizing at meta-feature level). Users can use the -O option to instruct featureCounts to count such reads (they will be assigned to all their overlapping features or meta-features).
Note that, when counting at the meta-feature level, reads that overlap multiple features of the same meta-feature are always counted exactly once for that meta-feature, provided there is no overlap with any other meta-feature. For example, an exon-spanning read will be counted only once for the corresponding gene even if it overlaps with more than one exon.
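The assignment rules above can be modelled by collecting the set of meta-features a read overlaps. A set naturally counts an exon-spanning read only once for its gene, and a read hitting more than one meta-feature is left unassigned unless multi-overlap counting is enabled. In this sketch the `allow_multi_overlap` parameter is a hypothetical stand-in for the -O option, features are assumed to lie on the read's chromosome with 1-based inclusive coordinates, and the gene names are illustrative.

```python
from collections import Counter


def assign_read(read_start, read_end, features, allow_multi_overlap=False):
    """Return the meta-features (gene IDs) a read is counted for.

    features: (gene_id, start, end) tuples on the read's chromosome.
    """
    hits = {gene_id for gene_id, start, end in features
            if read_start <= end and start <= read_end}
    if len(hits) > 1 and not allow_multi_overlap:
        return []  # ambiguous: overlaps more than one meta-feature
    return sorted(hits)  # each meta-feature is counted at most once


features = [("geneA", 100, 200), ("geneA", 300, 400), ("geneB", 350, 500)]
reads = [(150, 320), (380, 420), (120, 140)]

# (150, 320) spans both geneA exons but counts once; (380, 420) is ambiguous.
counts = Counter(g for start, end in reads for g in assign_read(start, end, features))
print(counts)
```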
Installation
To use the featureCounts program included in the SourceForge Subread package, see the installation instructions for the Subread package.
featureCounts is also available in the Bioconductor R package Rsubread. You need to have R installed on your computer to run featureCounts in Rsubread. Rsubread is part of the Bioconductor project.
Example commands
Below are example commands for the featureCounts program included in the SourceForge Subread package. For example commands for featureCounts in the Rsubread package, please see the Subread/Rsubread Users Guide. Note that featureCounts automatically detects the format of the input read files (SAM/BAM).
Summarize a single-end read dataset using 5 threads:
featureCounts -T 5 -t exon -g gene_id -a annotation.gtf -o counts.txt mapping_results_SE.sam
Summarize a BAM format dataset:
featureCounts -t exon -g gene_id -a annotation.gtf -o counts.txt mapping_results_SE.bam
Summarize multiple datasets at the same time:
featureCounts -t exon -g gene_id -a annotation.gtf -o counts.txt library1.bam library2.bam library3.bam
Perform strand-specific read counting (use '-s 2' if reversely stranded):
featureCounts -s 1 -t exon -g gene_id -a annotation.gtf -o counts.txt mapping_results_SE.bam
Summarize paired-end reads and count fragments (instead of reads):
featureCounts -p --countReadPairs -t exon -g gene_id -a annotation.gtf -o counts.txt mapping_results_PE.bam
Summarize multiple paired-end datasets:
featureCounts -p --countReadPairs -t exon -g gene_id -a annotation.gtf -o counts.txt library1.bam library2.bam library3.bam
Count only fragments with a length between 50bp and 600bp:
featureCounts -p --countReadPairs -P -d 50 -D 600 -t exon -g gene_id -a annotation.gtf -o counts.txt mapping_results_PE.bam
Count only fragments that have both ends mapped:
featureCounts -p --countReadPairs -B -t exon -g gene_id -a annotation.gtf -o counts.txt mapping_results_PE.bam
Exclude chimeric fragments from fragment counting:
featureCounts -p --countReadPairs -C -t exon -g gene_id -a annotation.gtf -o counts.txt mapping_results_PE.bam
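The effect of the -B and -P/-d/-D filters on which fragments are counted can be pictured with a small model. This is a hypothetical sketch, not featureCounts code. It assumes the length bounds are inclusive, that only fragments with both ends mapped are length-checked, and it takes the 50 and 600 thresholds from the example command. The `Fragment` type and `keep_fragment` function are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fragment:
    read1_mapped: bool
    read2_mapped: bool
    length: Optional[int] = None  # template length; None if not known


def keep_fragment(frag, require_both_ends=False, check_length=False,
                  min_len=50, max_len=600):
    """Hypothetical model of the -B and -P/-d/-D filters (not featureCounts code)."""
    both_mapped = frag.read1_mapped and frag.read2_mapped
    if require_both_ends and not both_mapped:  # mirrors -B
        return False
    if check_length and both_mapped:  # mirrors -P with -d/-D (bounds assumed inclusive)
        if frag.length is None or not (min_len <= frag.length <= max_len):
            return False
    return True


fragments = [
    Fragment(True, True, 300),  # within 50-600: kept
    Fragment(True, True, 900),  # too long when length is checked
    Fragment(True, False),      # only one end mapped
]
kept = [f for f in fragments if keep_fragment(f, require_both_ends=True, check_length=True)]
print(len(kept))
```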
Citation
Liao Y, Smyth GK and Shi W (2014). featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics, 30(7):923-30.
User's Guide
Type featureCounts to see the usage information, or see the User's Guide for more details.
Get help
You may post your questions or suggestions at the Bioconductor support site.
Scientific publications citing Subread
See the full list on Google Scholar.
Links
Subread: A superfast and accurate read aligner.
Subjunc: Detecting exon-exon junctions and mapping RNA-seq reads.
Rsubread: A Bioconductor R package for read mapping, exon-exon junction detection and read summarization.
A case study for RNA-seq data analysis: Using the Bioconductor packages Rsubread and limma to perform a complete analysis of RNA-seq data, from read mapping to differential expression analysis. RNA-seq data generated by the MAQC/SEQC Consortium were used in this case study.
Subread package overview: A brief description of the Subread package.