How was the 38% of loci with alternatively spliced isoforms counted?





I'm trying to figure out how the figure of ~38% of loci with splice isoforms was calculated.

I downloaded the Araport11 Pre-Release 3 annotation GTF file, extracted all the unique transcript IDs (Gene.XX), and counted how many times each unique gene ID (Gene) occurs. I then filtered for genes with a count greater than 1. However, I get only 31.8% of genes with more than one transcript (i.e., alternatively spliced). Where am I going wrong?
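For reference, the counting approach described above can be sketched as follows. This is a minimal illustration, not Araport's own script: it assumes the GTF's ninth column carries standard `gene_id "..."; transcript_id "...";` attributes, and the function names are hypothetical. Transcripts are collected into a set per gene so that the many exon/CDS lines belonging to one transcript are only counted once.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Matches GTF attribute pairs of the form: key "value";  (assumed standard GTF layout)
ATTR_RE = re.compile(r'([^\s;]+)\s+"([^"]*)"')


def transcripts_per_gene(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each gene_id to the set of distinct transcript_ids recorded under it."""
    genes: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed 9-column GTF record
        attrs = dict(ATTR_RE.findall(cols[8]))
        gene_id = attrs.get("gene_id")
        tx_id = attrs.get("transcript_id")
        if gene_id and tx_id:
            genes[gene_id].add(tx_id)  # a set ignores repeated exon/CDS lines
    return genes


def multi_isoform_fraction(genes: Dict[str, Set[str]]) -> float:
    """Fraction of genes that have two or more distinct transcripts."""
    if not genes:
        return 0.0
    multi = sum(1 for txs in genes.values() if len(txs) > 1)
    return multi / len(genes)


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as fh:  # path to the downloaded GTF file
        genes = transcripts_per_gene(fh)
    print(f"{len(genes)} genes, {multi_isoform_fraction(genes):.1%} with >1 transcript")
```

Note that the denominator is every gene present in the file, so the resulting percentage depends directly on which classes of loci the GTF includes.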


The 38% of Araport11 loci with alternatively spliced (AS) variants corresponds to the "protein-coding gene" fraction

The Araport11 Pre-Release 3 dataset (dated 2015-12-02) pertains to the protein-coding gene fraction of the Arabidopsis genome annotation. Across the 5 nuclear and 2 organelle chromosomes, there are 27,667 protein-coding gene loci. Within this set, 61.3% (16,969) of loci contain a single transcript, while 38.7% (10,698) have 2 or more transcript variants.
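As a quick arithmetic check, the two quoted locus counts sum to the stated total and reproduce both percentages. The helper name below is hypothetical; only the three counts come from the reply above.

```python
from typing import Tuple


def locus_fractions(single: int, multi: int) -> Tuple[float, float]:
    """Return (single-transcript fraction, multi-transcript fraction) of all loci."""
    total = single + multi
    return single / total, multi / total


# Protein-coding locus counts quoted for Araport11 Pre-Release 3
single_loci = 16_969  # loci with exactly one transcript
multi_loci = 10_698   # loci with two or more transcript variants

s, m = locus_fractions(single_loci, multi_loci)
print(f"total loci: {single_loci + multi_loci}")
print(f"single: {s:.1%}, multi: {m:.1%}")
```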

Unfortunately, it appears that the version of the GTF file you worked with contained an extended set of gene loci (protein-coding + non-coding). Because the non-coding loci enlarged the denominator, your calculation yielded a smaller fraction (31.8%) of loci with 2 or more transcript variants.
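If you are working with a GTF that still mixes coding and non-coding loci, one way to approximate the protein-coding denominator is to keep only genes that have at least one `CDS` record. This is a heuristic of our own for illustration, not Araport's documented classification method, and the function name is hypothetical; it also assumes standard `gene_id`/`transcript_id` attributes in the ninth column.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Matches GTF attribute pairs of the form: key "value";  (assumed standard GTF layout)
ATTR_RE = re.compile(r'([^\s;]+)\s+"([^"]*)"')


def coding_multi_isoform_fraction(lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (coding genes, coding genes with >1 transcript, fraction).

    Heuristic (an assumption): a gene is treated as protein-coding if at
    least one of its GTF records has feature type "CDS".
    """
    transcripts: Dict[str, Set[str]] = defaultdict(set)
    coding: Set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed GTF record
        attrs = dict(ATTR_RE.findall(cols[8]))
        gene_id, tx_id = attrs.get("gene_id"), attrs.get("transcript_id")
        if not (gene_id and tx_id):
            continue
        transcripts[gene_id].add(tx_id)
        if cols[2] == "CDS":
            coding.add(gene_id)  # gene has coding sequence, so keep it
    n = len(coding)
    n_multi = sum(1 for g in coding if len(transcripts[g]) > 1)
    return n, n_multi, (n_multi / n if n else 0.0)
```

In a toy file with two coding genes (one with two isoforms) and two single-transcript non-coding genes, the all-genes fraction would be 1/4 while this filtered fraction is 1/2, which mirrors how the extra non-coding loci pulled the figure down.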

We have corrected this issue by updating the GTF file, which now matches the GFF3 file. Please re-download the file from the Araport11 downloads area and use it for your analysis.

We apologize for the confusion and inconvenience caused.

Thank you very much!

Vivek @ Araport