triplet error for spliced proteins #55

Open
hrrsjeong opened this issue Aug 30, 2021 · 2 comments

Comments

@hrrsjeong

Hello,
Thank you for your great work! When I ran the within-group analysis, I got the following error about incomplete codons.

WARNING: The CDS coordinates for gene gene-TTC7A in the gtf file do not yield a set of complete codons, or are absent from the file. The number of nucleotides must be a multiple of 3.

SNPGenie terminated.

I checked gene-TTC7A in the GTF file and found that the error occurs when the 8th column (phase) of the gene's first exon is non-zero. Do you take this into account when concatenating multiple exons? Thanks!

NW_005081548.1 Gnomon CDS 8983330 8983493 . + 2 transcript_id "rna-XM_014264419.2"; gene_id "gene-TTC7A"; gene_name "TTC7A";
NW_005081548.1 Gnomon CDS 8985037 8985205 . + 0 transcript_id "rna-XM_014264419.2"; gene_id "gene-TTC7A"; gene_name "TTC7A";
NW_005081548.1 Gnomon CDS 9002261 9002391 . + 2 transcript_id "rna-XM_014264419.2"; gene_id "gene-TTC7A"; gene_name "TTC7A";

@hrrsjeong
Author

To work around this issue, I adjusted the start coordinate of the first exon (for forward-strand genes) or the end coordinate of the last exon (for reverse-strand genes) according to the phase (8th column).
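
A minimal sketch of that workaround, assuming the phase is the number of bases to skip before the first complete codon (the usual GTF/GFF convention). The function name `trim_partial_codon` and the tuple layout are hypothetical and not part of SNPGenie:

```python
def trim_partial_codon(cds, strand):
    """Drop the leading partial codon of a gene.

    cds    -- list of (start, end, phase) tuples, 1-based inclusive, as in GTF
    strand -- "+" or "-"
    Hypothetical helper for illustration only; not part of SNPGenie.
    """
    cds = sorted(cds)  # order records by genomic start coordinate
    if strand == "+":
        # Forward strand: translation begins at the start of the first exon.
        start, end, phase = cds[0]
        cds[0] = (start + phase, end, 0)
    else:
        # Reverse strand: translation begins at the end of the last exon.
        start, end, phase = cds[-1]
        cds[-1] = (start, end - phase, 0)
    return cds


# The three CDS records reported above for gene-TTC7A (forward strand).
ttc7a = [
    (8983330, 8983493, 2),
    (8985037, 8985205, 0),
    (9002261, 9002391, 2),
]
trimmed = trim_partial_codon(ttc7a, "+")
total_nt = sum(end - start + 1 for start, end, _ in trimmed)
print(trimmed[0], total_nt, total_nt % 3)
```

For this gene the first record becomes 8983332..8983493 and the total length drops from 464 nt to 462 nt, which is a multiple of 3. This only handles a partial codon at the 5' end; a gene that is also truncated at the 3' end could still fail the check.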

@singing-scientist
Contributor

Greetings @hrrsjeong! Apologies for the delay, as I have been traveling.

The issue here is that, for a given gene_id, the nucleotides must 'add up' to a complete set of codons. Here this is not the case, i.e., the exons are:

8983330..8983493=164nt
8985037..8985205=169nt
9002261..9002391=131nt

These sum to 464nt, which is not a complete codon set (multiple of 3), i.e., 464/3=154.67, which leaves a remainder of 2 when the remainder should be 0.
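
The arithmetic can be reproduced with a short Python sketch, assuming GTF coordinates are 1-based and inclusive (so each length is end - start + 1). The variable names are illustrative only:

```python
# CDS coordinates for gene-TTC7A as listed above (1-based, inclusive).
exons = [
    (8983330, 8983493),
    (8985037, 8985205),
    (9002261, 9002391),
]

lengths = [end - start + 1 for start, end in exons]
total = sum(lengths)
remainder = total % 3  # 0 means the gene is a whole number of codons

print(lengths, total, remainder)  # [164, 169, 131] 464 2
```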

Thus, it is important to check the gene annotations to make sure they are correct. Is it possible this gene has a frameshift somewhere? And, if you remove this gene, do you get the same error for other gene(s)?

Unfortunately, SNPGenie will keep giving this error until the total CDS length of every gene is an exact multiple of 3.

Let me know if that helps!
Chase
