Using HH-suite to find an E. coli gene in other Proteobacteria <advice on the way I am using the method>

Hey @milot-mirdita,

Thanks for a great resource to find remote homologs!
I am interested in finding an E. coli gene in other Proteobacteria. The literature shows that this gene is conserved only in closely related strains, so I am using HH-suite to find remote homologs of this gene in other Proteobacteria samples.

I would like some advice on whether the way I am using HH-suite makes sense, and on how to tell whether the output is a false positive/negative.
1. I ran `hhblits` to collect all sequences similar to the E. coli gene of interest from the UniRef30 database:
`hhblits -cpu 4 -i ${IN_DIR}/ytfI_ecoli.fasta -d ${DB2}/UniRef30_2023_02 -oa3m ${OUT_DIR}/ytfI_ECOLI_uniclust.a3m -all`

The idea behind step 1 is to collect remote homologs of the E. coli gene of interest, because an `hmmsearch` that uses only the single E. coli gene doesn't give any results!
 
2. The resulting `.a3m` file was converted to an aligned FASTA file using the `reformat.pl` script.
3. The `hmmbuild` command was used to build a profile HMM from the MSA.
4. I ran `hmmsearch` with the profile HMM from step 3 against the Proteobacteria protein sequences.
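For concreteness, steps 2 to 4 can be sketched roughly as below. This is only an outline of what I ran, not the exact commands: the target file name `proteobacteria_proteins.faa` and the `run`/`pipeline` helpers are placeholders for illustration, and by default the script only prints the commands (set `DRY_RUN=0` to execute them, which assumes `reformat.pl`, `hmmbuild` and `hmmsearch` are on the `PATH`).

```shell
#!/bin/sh
# Sketch of steps 2-4. File names below are placeholders (assumptions).
set -eu

A3M="${A3M:-ytfI_ECOLI_uniclust.a3m}"              # hhblits output from step 1
TARGETS="${TARGETS:-proteobacteria_proteins.faa}"  # hypothetical target FASTA
DRY_RUN="${DRY_RUN:-1}"                            # 1 = print commands only

run() {
  # Print the command; execute it only when DRY_RUN=0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

pipeline() {
  base="${A3M%.a3m}"
  # Step 2: A3M -> aligned FASTA (all rows padded to the same length).
  run reformat.pl a3m fas "$A3M" "$base.fas"
  # Step 3: build a profile HMM from the alignment.
  run hmmbuild "$base.hmm" "$base.fas"
  # Step 4: search the target proteins with the profile; keep a tabular report.
  run hmmsearch --tblout "$base.tbl" "$base.hmm" "$TARGETS"
}

pipeline
```

No E-value cutoff is passed to `hmmsearch` here, so borderline hits stay in the table and can be inspected afterwards.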

Unfortunately, this does not give any "significant" hit, i.e. no hit has an E-value below 1e-3.
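The significance check I mean is roughly the following, assuming the search was run with `--tblout`, where the fifth whitespace-separated column is the full-sequence E-value. The function name `filter_hits` and the default threshold argument are just illustrative choices.

```shell
# filter_hits FILE [THRESHOLD]
# Print target name and full-sequence E-value for hits below THRESHOLD
# (default 1e-3) from an hmmsearch --tblout table.
filter_hits() {
  awk -v thr="${2:-1e-3}" '
    /^#/ { next }                          # skip header and footer comments
    ($5 + 0) < (thr + 0) { print $1, $5 }  # column 5 = full-sequence E-value
  ' "$1"
}
```

Calling `filter_hits ytfI_ECOLI_uniclust.tbl 1e-3` would list only the hits that pass the cutoff; in my case it prints nothing.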

I am also comparing the Proteobacteria sequences with the E. coli gene of interest using Foldseek's `easy-search` command. Again, I find no "significant" hit, i.e. no hit has an E-value below 1e-3.

So I am interested in understanding what could be considered a reasonable remote homolog of the gene, and if the two methods I am using make sense.

Regards,
Jigyasa

Using hh-suites to find E.coli gene in other Proteobacteria <advice on the way I am using the method> #369
