Updated Col-0 Genome Annotation (Araport11 Official Release)
Updated Jun 2016

We are pleased to announce the official data release of Araport11, a comprehensive reannotation of the Col-0 genome. In this update, we used 113 public RNA-seq data sets together with annotation contributions from NCBI, UniProt, and labs conducting Arabidopsis thaliana research. The structural and functional annotation steps used to generate the Araport11 protein-coding gene set, as well as the consolidation and annotation of non-coding RNAs, are described in this draft manuscript on bioRxiv.

The Araport11 protein-coding gene set contains 27,655 loci with 48,359 transcripts. The proportion of Araport11 genes with splice variants (10,696; 39%) is higher than that reported for TAIR10 (20.8%). The functional annotation of over 5,000 TAIR10 loci has been updated. In total, 738 novel protein-coding genes, 508 novel transcribed regions, and more than 3,000 non-coding genes have been added in the Araport11 release. The table below summarizes the annotation improvements in Araport11 by category.

The Araport11 data files are available for download via FTP, for querying through ThaleMine, and for viewing in the Araport JBrowse. The data can also be accessed through the Araport web services (araport11_gene_structure_by_locus), which are listed in the Araport API explorer.
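The headline counts (loci, transcripts, and loci with two or more splice variants) can in principle be tallied from a downloaded gene-structure file. The following is a minimal Python sketch, assuming the file is in GFF3 format, that protein-coding transcripts are typed "mRNA" in column 3, and that each one names its gene in a "Parent" attribute. The function names and toy records are hypothetical, and the actual file name on the FTP site is not given here. Counts from a real release file may not match the table exactly, depending on which feature types and obsolete loci the file contains.

```python
import io
from collections import defaultdict
from collections.abc import Iterable


def parse_attributes(field: str) -> dict:
    """Split a GFF3 column-9 string such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def count_protein_coding(lines: Iterable[str]) -> dict:
    """Count loci, transcripts and loci with two or more transcripts.

    Assumption: protein-coding transcripts have type 'mRNA' in column 3
    and name their gene in the Parent attribute.
    """
    transcripts_per_gene = defaultdict(set)
    for line in lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # blank line, directive or comment
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # not a well-formed mRNA feature line
        attrs = parse_attributes(cols[8])
        # GFF3 permits several comma-separated parents for one feature
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                transcripts_per_gene[parent].add(attrs.get("ID", ""))
    all_transcripts = set().union(*transcripts_per_gene.values())
    return {
        "loci": len(transcripts_per_gene),
        "transcripts": len(all_transcripts),
        "multi_isoform_loci": sum(
            1 for ids in transcripts_per_gene.values() if len(ids) >= 2
        ),
    }


# Toy GFF3 records with made-up IDs, for illustration only
toy_gff3 = (
    "##gff-version 3\n"
    "Chr1\ttoy\tgene\t100\t900\t.\t+\t.\tID=geneA\n"
    "Chr1\ttoy\tmRNA\t100\t900\t.\t+\t.\tID=geneA.1;Parent=geneA\n"
    "Chr1\ttoy\tmRNA\t150\t900\t.\t+\t.\tID=geneA.2;Parent=geneA\n"
    "Chr1\ttoy\tgene\t2000\t2500\t.\t-\t.\tID=geneB\n"
    "Chr1\ttoy\tmRNA\t2000\t2500\t.\t-\t.\tID=geneB.1;Parent=geneB\n"
)
counts = count_protein_coding(io.StringIO(toy_gff3))
print(counts)  # {'loci': 2, 'transcripts': 3, 'multi_isoform_loci': 1}
```

To run it on a real download, pass an open file handle instead of the StringIO object. Grouping transcript IDs by parent gene keeps the tally to a single pass over the file, and dividing multi_isoform_loci by loci gives the splice-variant percentage.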


Type                                                     TAIR10     Araport11
(A) Protein-coding gene
  Number of loci                                         27,416        27,655
  Number of transcripts                                  35,386        48,359
  Number of loci with two or more splice variants   5,804 (18%)  10,696 (39%)
(B) Noncoding gene
  Long intergenic noncoding RNA (lincRNA)                    36         2,444
  Natural antisense transcript (NAT)                        223         1,115
  MicroRNA (miRNA)                                          177           325
  Small nucleolar RNA (snoRNA)                               71           287
  tRNA                                                      689           689
  Small nuclear RNA (snRNA)                                  13            82
  rRNA                                                       15            15
  Other RNA                                                 394           221
(C) Genomic feature
  Small RNA                                                   -        35,846
  Novel transcribed region                                    -           508
  Upstream open reading frame                                58            84
  Obsolete loci with short coding sequence                    -           388
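As a quick arithmetic check on the noncoding figures, the short Python sketch below sums part (B) of the table, pairing each category's TAIR10 and Araport11 counts. The dictionary is only a transcription of those table rows, not data from the release files.

```python
# Noncoding gene counts per category as (TAIR10, Araport11),
# transcribed from part (B) of the table
noncoding = {
    "lincRNA": (36, 2444),
    "NAT": (223, 1115),
    "miRNA": (177, 325),
    "snoRNA": (71, 287),
    "tRNA": (689, 689),
    "snRNA": (13, 82),
    "rRNA": (15, 15),
    "Other RNA": (394, 221),
}

tair10_total = sum(tair10 for tair10, _ in noncoding.values())
araport11_total = sum(araport11 for _, araport11 in noncoding.values())
net_increase = araport11_total - tair10_total

print(tair10_total, araport11_total, net_increase)  # 1618 5178 3560
```

The net increase of 3,560 noncoding genes is in line with the statement that more than 3,000 non-coding genes were added, since a net gain can only understate the number of genes added when some categories (here, Other RNA) shrink.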