Using hh-suites to find E.coli gene in other Proteobacteria <advice on the way I am using the method> #369

Open
@Jigyasa3

Hey @milot-mirdita,

Thanks for a great resource to find remote homologs!
I am interested in finding an E.coli gene in other Proteobacteria. The literature shows that this gene is conserved in closely related strains only, so I am using HH-SUITE to find remote homologs of this gene in other Proteobacteria samples.

I would like some advice on whether the way I am using HH-suite makes sense, and on how to tell whether the output contains false positives or false negatives.

  1. I run hhblits to collect all sequences similar to the E.coli gene of interest from the UniRef30 database (the successor to Uniclust30):
    hhblits -cpu 4 -i ${IN_DIR}/ytfI_ecoli.fasta -d ${DB2}/UniRef30_2023_02 -oa3m ${OUT_DIR}/ytfI_ECOLI_uniclust.a3m -all

The idea behind step 1 is to collect remote homologs of the E.coli gene of interest, because running hmmsearch with only the single E.coli gene doesn't give any results!

  2. The resulting .a3m file was converted back to a FASTA file using the reformat.pl script.
  3. hmmbuild was used to build a profile HMM from this MSA.
  4. I ran hmmsearch with the profile HMM from step 3 against the Proteobacteria protein sequences.
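
The conversion, profile-building, and search steps above can be sketched roughly as the shell function below. This is only an illustration under assumptions, not the exact commands that were run: the function name `build_and_search`, its three arguments, and the output file names are hypothetical placeholders, and it assumes `reformat.pl` (from HH-suite), `hmmbuild`, and `hmmsearch` (from HMMER) are on the `PATH`.

```shell
# Hypothetical helper: all names and paths are placeholders, not the original commands.
# Usage: build_and_search <alignment.a3m> <target_proteins.fasta> <output_prefix>
build_and_search() {
    a3m="$1"          # A3M alignment written by hhblits -oa3m
    targets="$2"      # Proteobacteria protein sequences (FASTA)
    out_prefix="$3"   # prefix for the .fas, .hmm and .tbl outputs

    # Convert the A3M alignment to aligned FASTA.
    reformat.pl a3m fas "$a3m" "${out_prefix}.fas" &&
    # Build a profile HMM from the aligned FASTA.
    hmmbuild --informat afa "${out_prefix}.hmm" "${out_prefix}.fas" &&
    # Search the target proteins with the profile; --tblout writes one line per hit.
    hmmsearch --cpu 4 --tblout "${out_prefix}.tbl" "${out_prefix}.hmm" "$targets"
}
```

For example, `build_and_search ytfI_ECOLI_uniclust.a3m proteobacteria.faa ytfI` would write `ytfI.fas`, `ytfI.hmm`, and `ytfI.tbl`; `proteobacteria.faa` is an assumed file name.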

Unfortunately, this does not give a "significant" hit, i.e. no hit has an E-value below 1e-3.
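
One way to check hits against the 1e-3 cutoff is to filter the hmmsearch table output on the full-sequence E-value. The sketch below assumes the search was run with `--tblout`, where column 1 is the target name and column 5 is the full-sequence E-value; the function name `filter_hits` and the file name in the usage note are hypothetical.

```shell
# Hypothetical helper: print target name and E-value for hits below a cutoff.
# Usage: filter_hits <hits.tbl> [evalue_cutoff]   (cutoff defaults to 1e-3)
filter_hits() {
    # Skip comment lines starting with '#', then compare column 5 numerically.
    awk -v thr="${2:-1e-3}" \
        '!/^#/ && NF >= 5 && ($5 + 0) < (thr + 0) { print $1 "\t" $5 }' "$1"
}
```

Calling `filter_hits ytfI.tbl` lists only the hits below 1e-3, while `filter_hits ytfI.tbl 10` shows weaker candidates that may be worth inspecting by hand.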

I am also comparing the Proteobacteria sequences with the E.coli gene of interest using Foldseek's easy-search command. There, too, I find no "significant" hit, i.e. no hit with an E-value below 1e-3.

So I would like to understand what could be considered a reasonable remote homolog of this gene, and whether the two methods I am using make sense.

Regards,
Jigyasa
