Hi Araport team,
I have started working with your annotation and ran some checks on the isoforms. In doing so, I found a list of isoforms with identical exon (CDS) structure.
Should I remove these duplicates?
For the sake of clarity, I've moved the list of identical isoforms to a text file: clausndh-identical-isoforms.txt
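For readers who want to reproduce this kind of check, here is a minimal sketch in Python. It is not the script used to build the attached list. It assumes a tab-separated GFF3 in which each CDS feature names its transcript in the Parent attribute, and the function name find_identical_cds_isoforms is hypothetical.

```python
from collections import defaultdict

def find_identical_cds_isoforms(gff3_lines):
    """Return groups of transcript IDs whose CDS segments are identical.

    Assumption: each CDS row carries a Parent attribute naming its transcript.
    """
    cds = defaultdict(list)  # transcript ID -> [(seqid, strand, start, end), ...]
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                cds[parent].append((cols[0], cols[6], int(cols[3]), int(cols[4])))

    # Transcripts with the same sorted tuple of CDS segments are duplicates.
    groups = defaultdict(list)
    for transcript_id, segments in cds.items():
        groups[tuple(sorted(segments))].append(transcript_id)
    return sorted(sorted(ids) for ids in groups.values() if len(ids) > 1)
```

Calling it on the lines of an open GFF3 file returns one list per set of duplicated isoforms. Because the seqid and strand are part of the comparison key, transcripts from different loci are never grouped together.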
Thank you for reporting this. The issue of identical isoforms has been identified and will be resolved in the upcoming Oct 2015 Araport11 release.
Thank you for your message regarding the Araport11 pre-release GFF3 file, dated 20150701.
The purpose of the pre-release was to collect community feedback. We very much appreciate your effort in evaluating the data and pointing out the duplication issues. Indeed, we had also recently noticed isoforms duplicated at the exon level in the pre-release version.
Background: The issue of duplicate transcripts arose because of a round of aggressive UTR trimming (in order to resolve overlapping neighboring gene loci) prior to making the pre-release.
The issue was resolved by adopting a more conservative approach to UTR trimming, followed by a transcript clustering step. This step used pairwise comparisons between all transcripts in a locus to identify clusters of transcripts that share the same internal splicing structure and differ only in their terminal UTRs.
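To illustrate the idea behind the clustering step, here is a minimal Python sketch. It is not the Araport pipeline code. Instead of explicit pairwise comparisons, it groups transcripts by their intron chain (the ordered intron coordinates), which gives the same grouping when "same internal splicing structure" means identical introns. The function names, transcript labels and coordinates are all hypothetical.

```python
from collections import defaultdict

def intron_chain(exons):
    """Return the introns implied by a transcript's exons as (start, end) pairs.

    exons: list of (start, end) tuples, 1-based inclusive as in GFF3.
    """
    ordered = sorted(exons)
    return tuple(
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
    )

def cluster_by_splicing(transcripts):
    """Group the transcripts of one locus that share an identical intron chain.

    transcripts: dict mapping transcript ID -> list of exon (start, end) tuples.
    Transcripts in one cluster can differ only in their outermost exon ends.
    """
    clusters = defaultdict(list)
    for transcript_id, exons in transcripts.items():
        clusters[intron_chain(exons)].append(transcript_id)
    return sorted(sorted(ids) for ids in clusters.values())

# Hypothetical locus: iso1, iso3 and iso4 differ only at their terminal ends.
locus = {
    "iso1": [(100, 200), (300, 400)],
    "iso2": [(100, 200), (300, 350), (380, 450)],
    "iso3": [(150, 200), (300, 450)],
    "iso4": [(120, 200), (300, 400)],
}
print(cluster_by_splicing(locus))
```

Each cluster with more than one member is a candidate for collapsing into a single representative transcript. Note that in this sketch all single-exon transcripts of a locus share the empty intron chain and therefore fall into one cluster, so a real pipeline would probably need an additional overlap check for them.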
Resolution: We have now corrected the duplicates and performed additional refinements (e.g. gene merges and splits) in preparation for the upcoming official release for NCBI submission.
As an example, let us take a look at the locus AT1G11300.
Here is what the transcripts in this locus looked like in the pre-release annotation set (snippet of GFF3: AT1G11300.gff3):
After applying the fixes, here is how the locus is represented in the upcoming release (snippet of GFF3: AT1G11300.fixed.gff3):
As you can see, isoforms 1, 3 and 4 have been collapsed into one, isoform 2 is retained as-is, and the missing 5' UTRs have been reinstated.
The official Araport11 release will be available in October 2015, at which point we will send you an update.
Vivek @ Araport