The Araport11 Official Release (06/2016) annotation dataset has been accepted by NCBI GenBank and is accessible under BioProject PRJNA10719.
We are pleased to announce the official data release of Araport11, a comprehensive reannotation of the Arabidopsis thaliana Col-0 genome. This update made use of 113 public RNA-seq data sets, along with annotation contributions from NCBI, UniProt, and labs conducting Arabidopsis thaliana research. The structural and functional annotation steps used to generate the Araport11 protein-coding gene set, as well as the consolidation and annotation of non-coding RNAs, are described in a draft manuscript on bioRxiv.
The Araport11 protein-coding gene set contains 27,655 loci with 48,359 transcripts. The proportion of protein-coding loci with two or more splice variants in Araport11 (10,696; 39%) is higher than that reported in TAIR10 (20.8%). The functional annotation of over 5,000 TAIR10 loci has been updated. In total, 738 novel protein-coding genes, 508 novel transcribed regions, and more than 3,000 non-coding genes have been added in the Araport11 release. The categories of annotation improvements made in Araport11 are summarized in the table below.
The Araport11 data files are available for FTP download, for querying in ThaleMine, and for viewing in the Araport JBrowse. The Araport11 data can also be retrieved through the Araport web service araport11_gene_structure_by_locus, which can be explored in the Araport API explorer.
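Once an annotation file has been downloaded, the protein-coding statistics quoted above (loci, transcripts, loci with two or more splice variants) can in principle be recomputed from it. The sketch below is a minimal, hypothetical GFF3 summarizer in Python. It assumes that protein-coding transcripts are typed `mRNA` in column 3 and link to their locus through a `Parent=` attribute in column 9. Check the actual feature types in the file you download before relying on the counts. The `summarize_gff3` function and the toy records are illustrative only and are not part of any Araport tool.

```python
from collections import defaultdict


def summarize_gff3(lines):
    """Return (loci, transcripts, loci with >= 2 transcripts) for mRNA features.

    Assumption (hypothetical): protein-coding transcripts are typed "mRNA"
    and name their parent locus in a Parent= attribute.
    """
    isoforms = defaultdict(set)  # locus ID -> set of transcript IDs
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, ##pragmas and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # only well-formed mRNA records are counted
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        transcript_id = attrs.get("ID")
        if transcript_id is None:
            continue  # an mRNA without an ID cannot be counted reliably
        # GFF3 allows several comma-separated parents
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                isoforms[parent].add(transcript_id)
    loci = len(isoforms)
    transcripts = sum(len(t) for t in isoforms.values())
    multi_isoform = sum(1 for t in isoforms.values() if len(t) >= 2)
    return loci, transcripts, multi_isoform


# Toy records with made-up IDs and coordinates, not real Araport11 data
TOY_GFF3 = [
    "##gff-version 3",
    "\t".join(["Chr1", "toy", "gene", "100", "900", ".", "+", ".", "ID=GENE1"]),
    "\t".join(["Chr1", "toy", "mRNA", "100", "900", ".", "+", ".", "ID=GENE1.1;Parent=GENE1"]),
    "\t".join(["Chr1", "toy", "mRNA", "150", "900", ".", "+", ".", "ID=GENE1.2;Parent=GENE1"]),
    "\t".join(["Chr1", "toy", "exon", "100", "300", ".", "+", ".", "Parent=GENE1.1"]),
    "\t".join(["Chr2", "toy", "gene", "50", "400", ".", "-", ".", "ID=GENE2"]),
    "\t".join(["Chr2", "toy", "mRNA", "50", "400", ".", "-", ".", "ID=GENE2.1;Parent=GENE2"]),
]

print(summarize_gff3(TOY_GFF3))  # (2, 3, 1)
```

To apply it to a real file, pass an open file handle instead of the toy list. Iterating the handle line by line keeps memory use low even for a genome-wide annotation.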
| Annotation category | TAIR10 | Araport11 |
|---|---|---|
| (A) Protein-coding gene | | |
| Number of loci | 27,416 | 27,655 |
| Number of transcripts | 35,386 | 48,359 |
| Number of loci with two or more splice variants | 5,804 (18%) | 10,696 (39%) |
| (B) Noncoding gene | | |
| Long intergenic noncoding RNA (lincRNA) | 36 | 2,444 |
| Natural antisense transcript (NAT) | 223 | 1,115 |
| Small nucleolar RNA (snoRNA) | 71 | 287 |
| Small nuclear RNA (snRNA) | 13 | 82 |
| (C) Genomic feature | | |
| Novel transcribed region | | 508 |
| Upstream open reading frame | 58 | 84 |
| Obsolete loci with short coding sequence | | 388 |
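Several headline figures in the text can be rechecked against the table with a few lines of arithmetic. The Python sketch below uses only the counts shown in the table. It sums the four listed non-coding classes only, so it may not match any total that includes other non-coding categories. The dictionary names are illustrative.

```python
# Counts copied from the comparison table: (TAIR10, Araport11)
NONCODING = {
    "lincRNA": (36, 2_444),
    "NAT": (223, 1_115),
    "snoRNA": (71, 287),
    "snRNA": (13, 82),
}
PROTEIN_CODING = {"loci": (27_416, 27_655), "transcripts": (35_386, 48_359)}
ARAPORT11_MULTI_ISOFORM_LOCI = 10_696


def noncoding_totals(table):
    """Return (TAIR10 total, Araport11 total) over the listed non-coding classes."""
    tair10 = sum(old for old, _ in table.values())
    araport11 = sum(new for _, new in table.values())
    return tair10, araport11


old_nc, new_nc = noncoding_totals(NONCODING)
print(old_nc, new_nc, new_nc - old_nc)  # 343 3928 3585

loci = PROTEIN_CODING["loci"][1]
transcripts = PROTEIN_CODING["transcripts"][1]
# Share of Araport11 protein-coding loci with two or more splice variants
print(f"{ARAPORT11_MULTI_ISOFORM_LOCI / loci:.1%}")  # 38.7%
# Mean number of transcripts per protein-coding locus in Araport11
print(f"{transcripts / loci:.2f} transcripts per locus")  # 1.75 transcripts per locus
```

The net increase of 3,585 across these four classes is consistent with the statement that more than 3,000 non-coding genes were added.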