You’ve sequenced your samples and identified variants. Great!
Now, here’s how you can use WormBase ParaSite’s new feature to find out the effect of your variants on 3D AlphaFold protein structures and interactions:
1. The Variant Effect Predictor (VEP) is the tool you need for this task. You can find it on the Tools page of WormBase ParaSite.
2. Click the ‘Variant Effect Predictor’ button to open the VEP web tool and enter your input data using instructions in the documentation:
3. Now, make sure that the ‘Protein domains’ option from the ‘Protein Annotation’ panel is selected. It will retrieve overlapping protein domains from
When the ‘Protein domains’ option is selected, the VEP output reports the IDs of overlapping protein domains and also shows links to a 3D protein viewer whenever a missense variant overlaps an AlphaFold 3D protein model.
4. Now click the ‘Run’ button to submit the VEP query… et voilà!
5. Click on the ‘AlphaFold Model’ button in the ‘Protein Matches’ column to launch the interactive AlphaFold protein structure visualisation.
6. You can now browse the 3D protein structure. Exons and protein domains can be displayed to provide more context. Use your mouse to rotate the model and zoom in or out. Click on the ‘Focus’ button to zoom in to the variant of interest.
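If your variants are not already in VCF, the input for step 2 can be prepared programmatically. The snippet below is a minimal sketch that writes variants in the whitespace-separated Ensembl default VEP input format (region, start, end, allele string, strand, optional identifier). The `Variant` class, the `to_vep_line` helper, the sequence region name and the coordinates are all made up for illustration; check the VEP documentation for the formats accepted by the web tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A simple variant description (field names are illustrative)."""
    seq_region: str       # chromosome or scaffold name used by the assembly
    start: int            # 1-based position of the first affected base
    ref: str              # reference allele, "-" for an insertion
    alt: str              # alternative allele, "-" for a deletion
    identifier: str = ""  # optional label echoed back in the VEP output


def to_vep_line(variant: Variant, strand: str = "+") -> str:
    """Format one variant as a line of Ensembl default VEP input."""
    if variant.ref == "-":
        # Insertions are written with start = end + 1.
        end = variant.start - 1
    else:
        end = variant.start + len(variant.ref) - 1
    fields = [
        variant.seq_region,
        str(variant.start),
        str(end),
        f"{variant.ref}/{variant.alt}",
        strand,
    ]
    if variant.identifier:
        fields.append(variant.identifier)
    return " ".join(fields)


# Hypothetical variants: region name and coordinates are placeholders.
variants = [
    Variant("SM_V10_1", 1000, "A", "G", "snv_example"),
    Variant("SM_V10_1", 2001, "-", "C", "ins_example"),
    Variant("SM_V10_1", 3000, "TTG", "-", "del_example"),
]
vep_input = "\n".join(to_vep_line(v) for v in variants)
print(vep_input)
```

The printed lines can be pasted into the VEP input box. Only missense variants that overlap an AlphaFold model will show the ‘AlphaFold Model’ button in the results.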
We are thrilled to announce the release of WormBase ParaSite 18 (WBPS18), our most comprehensive update yet, with the largest number of new genomes and updated annotations since the resource was first launched! WBPS18 hosts 240 different genomes from 181 species.
Highlights
Addition of 37 new genome assemblies for 14 existing and 12 new species, including a massive update of the trematode fluke genomes.
Assembly and/or annotation updates for 3 genomes.
Pairwise whole genome alignments are now available between all genomes in taxonomic groups.
Integration of over 2 million AlphaFold 3D protein structures for 239 genomes.
Updated RNA-Seq pipeline.
Release 17 remains accessible.
Mailing list.
New Species
Aphelenchoides besseyi is a plant-parasitic nematode that causes root-knot disease in a variety of crops.
Aphelenchoides bicaudatus is a plant-parasitic nematode that causes damage to a variety of crops, including potatoes, tomatoes, and strawberries.
Aphelenchoides fujianensis is a plant-parasitic nematode that causes damage to a variety of crops, including rice, wheat, and soybeans.
Caenorhabditis auriculariae is a free-living nematode that is used as a model organism in scientific research.
Meloidogyne chitwoodi is a plant-parasitic nematode that causes root-knot disease in a variety of crops.
New Trematodes (Flukes)
Dicrocoelium dendriticum is a trematode fluke that is a parasite of ruminants, including cattle, sheep, and goats.
Caenorhabditis briggsae: 2 new long-read sequenced assemblies with RNASeq-assisted (long and short read sequencing data) annotations were imported for QX1410 and VX34.
Schmidtea mediterranea: A new phased long-read sequenced assembly with RNASeq-assisted annotation. Both haplotypes are available: Haplotype 1, Haplotype 2. Also, low confidence gene models from the same annotation source have been added as Jbrowse tracks (Haplotype 1 Jbrowse, Haplotype 2 Jbrowse) and can be loaded from the left menu. You can use the Jbrowse search bar to search for these genes.
Angiostrongylus costaricensis: A new alternative set of gene models from da Silva EMG et al., 2022, for the Crissiumal strain has been added as Jbrowse tracks (Jbrowse) and can be loaded from the left menu. You can use the Jbrowse search bar to search for these genes.
Schistosoma curassoni: A new long-read sequenced assembly with RNASeq-assisted annotation.
Schistosoma haematobium (UoM_Shae.V3): An update to the previous SchHae_2.0 reference genome. The new reference is long-read sequenced with RNASeq-assisted gene annotation.
Schistosoma mansoni: Following the incorporation of HiC data and further PacBio analysis to resolve repeats, a near-complete chromosomal assembly (v10) is now available. Gene models were generated using RNASeq (including IsoSeq) data. More than 4500 genes have been manually curated.
Schistosoma margrebowiei: A new long-read sequenced assembly with RNASeq-assisted (including IsoSeq) annotation.
Schistosoma mattheei: A new long-read sequenced assembly with RNASeq-assisted (including IsoSeq) annotation.
Trichobilharzia regenti: A new long-read sequenced assembly with RNASeq-assisted (including IsoSeq) annotation.
Pairwise Whole Genome Alignments
In WBPS18, users can now browse and visualise whole genome alignments between genomes in the same taxonomic group. Cactus, a next-generation aligner that stores whole-genome alignments in a graph structure, was utilised to calculate multiple pairwise genome alignments between WBPS genomes. These genomes have been assigned to 5 big taxonomic groups, and pairwise alignments can be viewed between the genomes of each group.
Taxonomic Group Identifier | Genomes
Anc001 | Platyhelminthes
Anc005 | Clade I nematodes
Anc020 | Clade III nematodes
Anc028 | Clade V nematodes
Anc029 | Clade IV nematodes
These alignments are viewable through the Ensembl Genome Browser. Follow this tutorial to learn how to browse these alignments.
Fetching genome alignments is resource-intensive, so please be patient while the alignment tracks load, and use the refresh button if any errors appear.
Ensembl Genome Browser view with pairwise alignment tracks in WBPS18. The view focuses on the genomic region of the Caenorhabditis parvicauda CSP21.g2830 gene. Aligned genomic regions of other Caenorhabditis and Pristionchus genomes have been loaded (pink tracks).
The whole genome alignments are also available for download in HAL file format, through our FTP site.
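Once a HAL file has been downloaded, the HAL command-line tools can be used to inspect it. The sketch below maps the five taxonomic group identifiers listed above to their names and builds (without running) a `halStats --genomes` command, which lists the genomes stored in an alignment. It assumes the HAL tools are installed separately, and the file name `Anc028.hal` is a placeholder, not the actual name on the FTP site.

```python
import shlex
from typing import List

# Taxonomic group identifiers and names as listed for WBPS18.
TAXONOMIC_GROUPS = {
    "Anc001": "Platyhelminthes",
    "Anc005": "Clade I nematodes",
    "Anc020": "Clade III nematodes",
    "Anc028": "Clade V nematodes",
    "Anc029": "Clade IV nematodes",
}


def group_name(group_id: str) -> str:
    """Return the taxonomic group name for an identifier."""
    return TAXONOMIC_GROUPS.get(group_id, "unknown group")


def hal_genomes_command(hal_path: str) -> List[str]:
    """Build the halStats call that lists the genomes in a HAL file."""
    return ["halStats", "--genomes", hal_path]


# Placeholder path: replace with the file you downloaded from the FTP site.
hal_file = "Anc028.hal"
command = hal_genomes_command(hal_file)
print(f"{group_name('Anc028')}: {shlex.join(command)}")
```

To actually run the command, pass the list to `subprocess.run(command, check=True)` on a machine where the HAL tools are available.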
In release 18, WormBase ParaSite incorporated AlphaFold 3D protein structure models: over 2 million models are now available for 239 genomes. The AlphaFold predicted model (example) is available through the top-left menu on the transcript pages.
Follow this tutorial to learn how to browse your favourite protein models in WormBase ParaSite.
From release 16, due to the retirement of the RNASeq-er API, RNASeq data of genomes that had assembly or annotation updates were not re-aligned. For 34 updated genomes (releases 16 and 17), this affected their RNA-Seq alignment tracks available on the Ensembl Browser, Jbrowse and their gene counts on the Gene expression platform.
For this release, we utilised the novel RNA-Seq pipeline developed by the Ensembl Metazoa group to re-align the RNA-Seq data for these 34 genomes. The pipeline uses HISAT2 to perform the read alignments and htseq-count to count reads in features.
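For readers who want to reproduce a similar workflow locally, the two tools can be chained as sketched below. This is not the Ensembl Metazoa pipeline itself: the file names, thread count, strandedness setting and the intermediate `samtools sort` step are assumptions made for illustration, and the commands are only built and printed, not executed.

```python
import shlex
from typing import List


def hisat2_command(index_prefix: str, reads_1: str, reads_2: str,
                   sam_out: str, threads: int = 4) -> List[str]:
    """Align paired-end reads against a prebuilt HISAT2 index."""
    return ["hisat2", "-p", str(threads), "-x", index_prefix,
            "-1", reads_1, "-2", reads_2, "-S", sam_out]


def sort_command(sam_in: str, bam_out: str) -> List[str]:
    """Coordinate-sort the alignments so htseq-count can use '-r pos'."""
    return ["samtools", "sort", "-o", bam_out, sam_in]


def htseq_count_command(bam_in: str, annotation_gtf: str,
                        stranded: str = "no") -> List[str]:
    """Count reads per feature; counts are written to standard output."""
    return ["htseq-count", "-f", "bam", "-r", "pos", "-s", stranded,
            bam_in, annotation_gtf]


# Placeholder file names for a single hypothetical sample.
steps = [
    hisat2_command("genome_index", "sample_1.fastq.gz", "sample_2.fastq.gz",
                   "sample.sam"),
    sort_command("sample.sam", "sample.sorted.bam"),
    htseq_count_command("sample.sorted.bam", "annotation.gtf"),
]
for step in steps:
    print(shlex.join(step))
```

The strandedness option (`-s`) depends on the library preparation protocol, so it should be checked for each study before counting.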
If you would like a new gene expression study to be included, please contact us.
“How can I browse RNA-Seq data in WormBase ParaSite?”
Browse our gene expression pages (use the “Gene Expression” button in the “Navigation” box on the genome landing page of each genome) to explore the gene expression of your gene of interest across different samples and perform differential expression analysis.
Our previous Release 17 has been archived and remains accessible here.
Subscribe to our new mailing list
If you’re using WormBase ParaSite, subscribe to our mailing list to receive regular updates on new WormBase ParaSite releases (like this one), new features, career opportunities, and upcoming conferences and workshops.
Subscribing to the WormBase ParaSite mailing list is easy! Simply fill out the subscription form at https://forms.gle/gN5NqZJ3FeVDkpTA8 and stay updated with our latest news.
Problems/issues
If you encounter any problems with the latest release, including broken links or missing genes, please don’t hesitate to contact us.
From release 18 (WBPS18), users can browse and visualise whole genome alignments between genomes in the same taxonomic group.
Cactus, a next-generation aligner that stores whole-genome alignments in a graph structure, was utilised to calculate multiple pairwise genome alignments between WBPS genomes. These genomes have been assigned to 5 big taxonomic groups, and pairwise alignments can be viewed between the genomes of each group.
“I would like to view how my genome of interest aligns against some related genomes”
1. Go to the Ensembl Genome Browser of a region of interest (example). You can do this by using the Ensembl button in the “Genome Browser” column of your genome of interest in the Genome list. Alternatively, if you are already on a gene page click the Region in Detail button under “Genomic Context”.
2. You are on the Genome Browser (Ensembl) page for your region of interest. You can click here to learn more about it.
3. Click “Configure tracks” to display the track options dialog box:
4. Choose the “Pairwise whole genome alignments” option of the left navigation menu in the box.
5. Choose the genome or multiple genomes you would like to visualise pairwise genomic alignments against. Then click the tick box at the top right of the box.
6. The genomic regions of the selected genomes that align to the region of interest appear as pink tracks on the browser. One track per genome:
7. You can click on the genome initials on the left of the view to view the full name of the genome each track corresponds to.
8. You can click on the aligned segments (pink tracks) to browse the aligned genomic region of the corresponding genome:
1. Click on your species of interest (e.g. Schistosoma mansoni) and search for the abl1 gene.
2. Click on the abl1 gene in the search results page:
3. We are now on the gene tab for Schistosoma mansoni abl1. This page gives an overview of the information available at the gene level and shows the transcript table, a summary with links to external databases, and a gene diagram.
4. Protein information in WormBase ParaSite is associated with the transcripts of a gene. Therefore, we will navigate to the Transcript tab by clicking on the transcript ID (Smp_246700.1) in the transcript table.
5. The domains of the protein product of the transcript can be viewed graphically by clicking on Protein summary, or in table format by clicking on Domains and features. To access the 3D protein structure viewer, click on AlphaFold predicted model.
6. We can now view the shiny new interactive 3D AlphaFold structure for our protein of interest. The interactive molecular viewer visualises the structure, coloured by the per-residue pLDDT confidence measure.
“What functionalities does the 3D protein structure viewer offer?”
The central panel (viewer) colours the model from regions of high confidence (blue) to low confidence (orange), with the protein sequence displayed above. It is very simple to use: just drag with your mouse pointer to rotate the structure and scroll to zoom in and out! You can rapidly zoom in on a specific residue by clicking on it in the protein sequence above the model. The right-hand panel enables highlighting of one or more exons and protein features (Gene3D, PROSITE, Pfam, etc.), which are toggled by clicking on the eye icon.
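The colouring in the viewer reflects the per-residue pLDDT score, which AlphaFold model files in PDB format store in the B-factor column. The sketch below reads that column for the alpha-carbon of each residue and assigns the confidence bands used by the AlphaFold database (above 90 very high, above 70 confident, above 50 low, otherwise very low). The three residues and their scores are made up for illustration and are generated in the code so that the fixed PDB column layout is respected; the helper names are hypothetical.

```python
from typing import Dict


def make_ca_record(serial: int, res_name: str, res_seq: int, plddt: float) -> str:
    """Build a fixed-column PDB ATOM record for an alpha-carbon (demo data only)."""
    return (
        "ATOM  "               # columns 1-6: record name
        + f"{serial:>5d}"      # columns 7-11: atom serial number
        + " "                  # column 12
        + " CA "               # columns 13-16: atom name
        + " "                  # column 17: alternate location
        + f"{res_name:>3}"     # columns 18-20: residue name
        + " A"                 # columns 21-22: blank + chain identifier
        + f"{res_seq:>4d}"     # columns 23-26: residue number
        + "    "               # columns 27-30
        + f"{0.0:8.3f}" * 3    # columns 31-54: x, y, z coordinates
        + f"{1.0:6.2f}"        # columns 55-60: occupancy
        + f"{plddt:6.2f}"      # columns 61-66: B-factor (pLDDT in AlphaFold files)
    )


def plddt_per_residue(pdb_text: str) -> Dict[int, float]:
    """Map residue number to pLDDT, read from the alpha-carbon records."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def confidence_band(plddt: float) -> str:
    """Assign the confidence band used by the AlphaFold database."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


# Made-up residues and scores, for illustration only.
demo_pdb = "\n".join([
    make_ca_record(1, "MET", 1, 95.5),
    make_ca_record(2, "GLY", 2, 72.0),
    make_ca_record(3, "SER", 3, 40.25),
])
bands = {res: confidence_band(score)
         for res, score in plddt_per_residue(demo_pdb).items()}
print(bands)
```

Low-scoring regions are often flexible or disordered, so they are worth treating with caution when interpreting a model.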
Undoubtedly, AlphaFold opens new research horizons and we would like to encourage our users to go and explore this ground-breaking dataset in WormBase ParaSite by searching and testing the 3D protein models of your favourite worm.
We would love to hear the feedback of the helminth research community on the AlphaFold resource, the structure predictions, how you think WormBase ParaSite could facilitate your interaction with this unique dataset, or anything else. So please feel free to contact us ([email protected])
Have you been waiting for our next release? The wait is finally over! Despite being understaffed and underfunded, WormBase ParaSite launches its new release 17 with an exciting list of new/updated genomes and new features:
Integration of AlphaFold 3D protein structures for 8 species.
Addition of 11 new genome assemblies of which 6 are new species.
Annotation updates for 2 genomes.
Gene-phenotype associations are now available in our FTP directory.
Improvements in the way external gene synonyms are integrated and displayed.
Deployment of WebApollo instances for more species to further facilitate community curation.
Schistosoma mansoni: We are proud to present the best ever S. mansoni assembly created until the next one! S. mansoni (blood fluke) is one of the three major infectious agents responsible for the chronic debilitating disease schistosomiasis, found throughout Africa and South America. Its previous assembly (v7) was substantially upgraded following the incorporation of HiC data and further PacBio analysis to resolve repeats, leading to the new, near-complete chromosomal assembly (v9) presented in this release! More information can be found in this pre-print by Buddenborg et al 2021.
Fasciola hepatica: The sheep liver fluke, or common liver fluke, is a parasite that infects humans, cows and sheep, causing fascioliasis. The previous assembly and annotation, which were submitted in 2013, were drastically improved in this release as described by McNulty et al., 2017.
Heterodera glycines: The soybean cyst nematode (SCN), Heterodera glycines, is a plant-parasitic nematode infecting soybean roots. The previous assembly and annotation, which were submitted in 2013, were drastically improved in this release as described by Masonbrink et al., 2021.
AlphaFold 3D protein structures are now browsable from WormBase ParaSite
For the first time in WormBase ParaSite, you can browse 3D protein structures visualised in a user-friendly and fun-to-explore viewer! Users can now explore the 3D protein models of their favourite genes in 8 WBPS species:
Table 1. Number of structural predictions for complete proteomes of parasitic worms in AlphaFold DB v.2.1.2 and WormBase ParaSite 17.
Want to know more about how to visualise AlphaFold Protein Structure in WormBase ParaSite? The page containing the 3D protein structure for a protein of interest is located under the transcript summary page on the left-side “Transcript-based displays” menu under “AlphaFold predicted model”. For a step-by-step tutorial click here. You can now view the shiny new interactive 3D AlphaFold structure for our protein of interest:
“What functionalities does the 3D protein structure viewer offer?”
The central panel (viewer) colours the model from regions of high confidence (blue) to low confidence (orange), with the protein sequence displayed above. It is very simple to use: just drag with your mouse pointer to rotate the structure and scroll to zoom in and out! You can rapidly zoom in on a specific residue by clicking on it in the protein sequence above the model. The right-hand panel enables highlighting of one or more exons and protein features (Gene3D, PROSITE, Pfam, etc.), which are toggled by clicking on the eye icon.
“More species please?”
As AlphaFold plans to expand its database in 2022 to cover the proteomes of more species, as well as a much larger proportion of all catalogued proteins, we anticipate that more parasitic worms will make it into the AlphaFold database. WormBase ParaSite will then be able to enable the “AlphaFold predicted model” viewer for these species. You can monitor the list of species present in the AlphaFold database here.
Phenotypes available on the FTP
“Is there any way to export gene-phenotype associations from WormBase ParaSite?”
In our previous release 16 we were happy to announce the import of over 350,000 C. elegans and S. mansoni gene-phenotype associations from our sister site, WormBase (C. elegans example). These associations were also propagated between orthologs to all our hosted species (H. polygyrus example). In this release we also made these gene-phenotype associations available in our FTP directory.
Gene-Phenotype associations have been deposited in GAF version 2.1 files in our FTP directory. For each species you will find 2 different GAF files:
<SPECIES>.<BIOPROJECT>.WBPS17.orthology-inferred_phenotypes.gaf.gz (Example): This file contains species-specific, orthology-inferred gene-phenotype associations from C. elegans or S. mansoni (you can find which one in the 8th column). A file like this is available for every species in WormBase ParaSite.
<SPECIES>.<BIOPROJECT>.WBPS17.phenotypes.gaf.gz (Example): This file contains original gene-phenotype associations for the species of interest. For the moment we only host original gene-phenotype association data for C. elegans and S. mansoni and therefore this file is available for these 2 species.
Sometimes GAF files are hard to interpret. For this reason, in the header of these files, we have included very useful column descriptions and general information. Enjoy!
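Because GAF files are tab-separated text with comment lines starting with “!”, they are straightforward to read with a short script. The sketch below parses a gzipped GAF 2.1 file into dictionaries keyed by the standard GAF 2.1 column names; the 8th column, mentioned above, is “With (or) From”. The demo file, gene identifiers and term identifiers written by the script are invented placeholders, so consult the header of the real files for how each column is used for phenotypes.

```python
import gzip
import os
import tempfile
from typing import Dict, Iterator

# The 17 column names defined by the GAF 2.1 specification.
GAF_COLUMNS = [
    "DB", "DB Object ID", "DB Object Symbol", "Qualifier", "GO ID",
    "DB:Reference", "Evidence Code", "With (or) From", "Aspect",
    "DB Object Name", "DB Object Synonym", "DB Object Type", "Taxon",
    "Date", "Assigned By", "Annotation Extension", "Gene Product Form ID",
]


def read_gaf(path: str) -> Iterator[Dict[str, str]]:
    """Yield one dictionary per annotation line, skipping '!' header lines."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("!") or not line.strip():
                continue
            values = line.rstrip("\n").split("\t")
            # Pad short lines so every column name gets a value.
            values += [""] * (len(GAF_COLUMNS) - len(values))
            yield dict(zip(GAF_COLUMNS, values))


# Invented demo content: identifiers below are placeholders, not real data.
demo_lines = [
    "!gaf-version: 2.1",
    "\t".join(["WB", "gene_0001", "demo-1", "", "TERM:0000001", "REF:1",
               "IEA", "source_gene_A", "P", "", "", "gene", "taxon:0",
               "20220101", "WB", "", ""]),
    "\t".join(["WB", "gene_0002", "demo-2", "", "TERM:0000002", "REF:2",
               "IEA", "source_gene_B", "P", "", "", "gene", "taxon:0",
               "20220101", "WB", "", ""]),
]

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.phenotypes.gaf.gz")
    with gzip.open(demo_path, "wt", encoding="utf-8") as out:
        out.write("\n".join(demo_lines) + "\n")
    records = list(read_gaf(demo_path))

for record in records:
    print(record["DB Object ID"], record["GO ID"], record["With (or) From"])
```

To use it on a real download, call `read_gaf` with the path to the `.gaf.gz` file from the FTP directory.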
Curated gene synonyms
“Gene synonyms are really useful, but is there any way to export them?”
In our release 16 we announced the import of literature-curated gene name synonyms for Strongyloides stercoralis. These synonyms are searchable (via the top-right search box) and appear in the new “Synonyms” line of the gene page. In this release we have also made these synonyms exportable via WormBase ParaSite Biomart!
To export curated gene synonyms for S. stercoralis: first navigate to the WormBase ParaSite Biomart and submit your Query Filters. Then enable “Curated Gene Synonym ID” under “EXTERNAL DATABASE REFERENCES AND ID CONVERSION” in the Output Attributes, as shown here:
Then click Results and you will get a list of your selected genes and their curated synonyms:
We need your help on this! At the moment, we have only imported external curated gene synonyms for S. stercoralis, but we would love to import synonyms for other species. If you have (or someone you know has) curated gene synonyms for any of the genomes hosted in WormBase ParaSite, please contact us ([email protected]) and we will integrate them.
Community Annotation
“I would like to manually curate gene models on a genome I have previously submitted (or I am about to submit) to WormBase ParaSite.”
We supplement our in-house gene curation platform by hosting Web Apollo instances for an increasing number of genomes. Web Apollo is an instantaneous, collaborative genomic annotation editor available on the web. Users can ask us to deploy Web Apollo instances for their genomes, and we would be happy to provide the relevant training! Please feel free to contact us ([email protected]) to make such a request.
The latest AlphaFold database update, in January 2022, added three-dimensional (3D) structures of complete proteomes for 27 new organisms relevant to neglected tropical diseases and antimicrobial resistance, including 7 parasitic worms from WormBase ParaSite.
Determining the three-dimensional (3D) structure of a protein can provide essential insights into the mechanisms underlying its function, and predicting that structure has been a computational challenge for decades. AlphaFold is an AI system developed by DeepMind that makes state-of-the-art accurate predictions of a protein’s structure from its amino-acid sequence. The AlphaFold Protein Structure Database, created in partnership between DeepMind and the EMBL-European Bioinformatics Institute (EMBL-EBI), was launched in July 2021 and initially released ~350,000 3D structures covering the human proteome and 20 other biologically significant organisms such as C. elegans, E. coli, fruit fly, mouse, zebrafish, malaria parasite and tuberculosis bacteria.
AlphaFold’s latest release, announced on 28 January 2022, focused on organisms with a UniProt reference proteome that are relevant to neglected tropical diseases or antimicrobial resistance. The selection of 27 new species was based on priority lists compiled by the World Health Organisation and included 7 parasitic worms (Table 1). AlphaFold predicted these structures based on their UniProt reference proteomes, provided through WormBase ParaSite.
Table 1. Structural predictions for complete proteomes of parasitic worms in AlphaFold DB v.2.1.2
As the majority of helminth proteins do not currently have data from direct protein characterisation studies, the prediction of their 3D structures will provide researchers with a powerful tool in predicting their mechanisms of function and their role within the cell. Scientists can also develop in silico screening assays against drugs that work with the protein’s unique shape.
The database is expected to grow further in 2022 and cover additional proteomes, as well as a much larger proportion of all proteins in UniProt (UniRef90).
If you cannot find the AlphaFold predicted structure for the protein of interest of your favourite worm, here are some suggestions:
Multiple isoforms are not covered in AlphaFold DB, so make sure you are using the most appropriate protein from the reference proteome of your species.
Try searching by protein or gene name rather than specific UniProt accession.
Proteins with high sequence similarity will most likely have very similar 3D structure predictions. If you don’t see the sequence you are looking for, try searching for it using the EBI Protein Similarity Search tool against the sequences in the AlphaFold DB and/or using the WormBase ParaSite BLASTp tool against species which already have their proteins in the AlphaFold database. Even if the query sequence itself is not available, a structure prediction for a similar sequence may be.
Contact us! If your favourite species or protein is not available yet, keep watching for further announcements (EMBL-EBI news, EMBL-EBI Twitter, DeepMind Twitter), or let us know.
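When a search fails, it can help to check whether the identifier you are using is a UniProt accession at all, rather than a gene or transcript ID. The sketch below tests a string against the accession pattern published by UniProt and, if it matches, builds the corresponding AlphaFold DB entry page address. The helper names are hypothetical, the accessions are used only to illustrate the format, and a well-formed accession does not guarantee that a prediction exists for it.

```python
import re

# Accession format as documented by UniProt (6 or 10 characters).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def looks_like_uniprot_accession(text: str) -> bool:
    """Return True if the text has the shape of a UniProt accession."""
    return UNIPROT_ACCESSION.fullmatch(text.strip()) is not None


def alphafold_entry_url(accession: str) -> str:
    """Build the AlphaFold DB entry page address for a UniProt accession."""
    accession = accession.strip()
    if not looks_like_uniprot_accession(accession):
        raise ValueError(f"not a UniProt accession: {accession!r}")
    return f"https://alphafold.ebi.ac.uk/entry/{accession}"


# The last item is a WormBase ParaSite transcript ID, not a UniProt accession.
for candidate in ["P12345", "A0A024RBG1", "Smp_246700.1"]:
    if looks_like_uniprot_accession(candidate):
        print(candidate, "->", alphafold_entry_url(candidate))
    else:
        print(candidate, "-> not a UniProt accession; search by name instead")
```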
Undoubtedly, AlphaFold opens new research horizons and we would like to encourage our users to go and explore this ground-breaking dataset (https://alphafold.ebi.ac.uk/) by searching and testing the 3D protein models of your favourite worm. Over time, we are planning to create a deeper integration of this dataset into WormBase ParaSite, so it will be easier to search, analyse and interpret.
We would love to hear the feedback of the helminth research community on the AlphaFold resource, the structure predictions, how you think WormBase ParaSite could facilitate your interaction with this unique dataset, or anything else. So please feel free to contact us ([email protected]).