For the protein annotation dataset, are you going to release a "primary_transcript_only" dataset?


Predicted proteomes often come as two files: one with proteins predicted from all splicing variants, and another containing only the "primary_transcript_only" set (so only one protein is translated for each gene). Are you going to generate this dataset?

A file with one transcript sequence encoding the longest CDS at each locus will be available.

Currently, the representative model is defined as the isoform that encodes the longest CDS at each locus (please refer to this document). If that is what you mean by 'primary transcript', then yes, we will offer a file with one sequence per locus, using the gene model that codes for the longest CDS at each locus.
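If you want to derive such a set yourself from a file containing all isoforms, the rule above (keep the isoform with the longest CDS at each locus) can be sketched in a few lines of Python. This is only an illustration, not the pipeline used to build the released file. It assumes a CDS FASTA file whose identifiers follow a hypothetical `<locus>.<isoform number>` pattern such as `GENE1.2`; the function names `read_fasta`, `locus_of` and `longest_per_locus` are made up for this example, and ties are resolved by keeping the first isoform encountered, which may differ from the official choice.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the identifier.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def locus_of(model_id):
    """Return the locus part of an isoform identifier.

    Assumption: isoforms are named "<locus>.<n>" (e.g. "GENE1.2").
    Adjust this function if your identifiers follow another scheme.
    """
    return model_id.rsplit(".", 1)[0]


def longest_per_locus(records):
    """Map each locus to the (identifier, sequence) of its longest CDS."""
    best = {}
    for model_id, seq in records:
        locus = locus_of(model_id)
        # Strict ">" keeps the first isoform seen when lengths are tied.
        if locus not in best or len(seq) > len(best[locus][1]):
            best[locus] = (model_id, seq)
    return best


# Toy input: two isoforms for GENE1 and one for GENE2.
example = StringIO(
    ">GENE1.1\nATGAAATAA\n"
    ">GENE1.2\nATGAAACCCTAA\n"
    ">GENE2.1\nATGTAA\n"
)
representatives = longest_per_locus(read_fasta(example))
for locus, (model_id, seq) in representatives.items():
    print(locus, model_id, len(seq))
```

With a real file, replace the `StringIO` object with `open("your_cds.fasta")`. The selection is made on CDS length, so it should be run on CDS sequences rather than on full transcripts, whose length also includes UTRs.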