Announcing WormBase ParaSite 19

We are pleased to announce the 19th release of WormBase ParaSite (WBPS19), bringing new and updated genomes, data from RNASeq studies, and additional annotation files for download. WBPS19 hosts 274 genomes representing 208 species, including 41 new genome assemblies for 14 existing and 27 new species.

Additional Species

Bradynema listronoti (PRJNA842945), known to infect the carrot weevil, Listronotus oregonensis.
Levipalatum texanum (PRJNA655932), a free-living nematode in the family Diplogastridae. Found in the South-Eastern United States, the species lives in association with scarab beetles.
Koerneria luziae (PRJNA655932), a nematode from the family Neodiplogasteridae. K. luziae is thought to have a wide range of distribution, from Europe to East Asia.
Diplogasteroides magnus (PRJNA655932), a hermaphroditic nematode from the family Diplogastridae. Dauer juveniles of the species are present on wood cockchafer.
Bunonema rgd898 (PRJNA655932), a parthenogenic species that belongs to the Diplogastridae family.
Steinernema hermaphroditum (PRJNA982879), an entomopathogenic nematode of the genus Steinernema.
Trissonchulus sp. WLG1_4 (PRJNA953805), a free-living nematode.
Trissonchulus latispiculum (PRJNA953805), a free-living nematode.
Trileptium ribeirensis (PRJNA953805), a nematode from the family Thoracostomopsidae.
Theristus sp. LFF4_11 (PRJNA953805), a free-living nematode from the family Xyalidae.
Sabatieria punctata (PRJNA953805), a free-living marine nematode from the family Comesomatidae.
Rhynchonema sp. JSB1_4 (PRJNA953805), a free-living nematode.
Ptycholaimellus sp. GST1_10 (PRJNA953805), a free-living marine nematode from the family Chromadoridae.
Paralinhomoeus sp. GSCO2_6 (PRJNA953805), a free-living marine nematode from the family Linhomoeidae.
Mesodorylaimus sp. YZB2_4 (PRJNA953805), a free-living nematode.
Linhomoeus sp. GSCO2_2 (PRJNA953805), a free-living nematode from the family Linhomoeidae.
Epsilonema sp. ZAB3_2 (PRJNA953805), a free-living nematode.
Microlaimidae sp. YZB2_3 (PRJNA953805), a nematode belonging to the order Desmodorida.
Enoplolaimus lenunculus (PRJNA953805), a free-living nematode.
Allodiplogaster sudhausi (PRJEB48369), a free-living hermaphroditic nematode from the Diplogastridae family. 
Cylicocyclus nassatus (PRJEB63274), the most prevalent and abundant species of the Cyathostomins, a complex of 50 intestinal parasite species that infect horses and wild equids.
Paragonimus skrjabinimiyazakii (PRJNA245325), one of two species of Paragonimus thought to cause Paragonimiasis in humans.
Paragonimus kellicotti (PRJNA179523), or the North American lung fluke, is a trematode of the family Paragonimidae, responsible for causing a food-borne infection.
Paragonimus heterotremus (PRJNA284523), or the human lung fluke, is a member of the genus Paragonimus. Alongside Paragonimus westermani, the species is known to routinely infect humans, causing paragonimiasis.
Parelaphostrongylus tenuis (PRJNA729714), also known as brainworm, is a neurotropic parasite commonly found in white-tailed deer, damaging their central nervous system. 
Auanema sp. JU1783 (PRJEB51845), a trioecious free-living nematode.
Mesorhabditis spiculigera (PRJEB59059), a free-living nematode that exhibits great potential for recycling of organic matter.

Genome and Annotation Updates

Wuchereria bancrofti (PRJNA275548), a human parasite that is the major cause of lymphatic filariasis.
Strongyloides stercoralis (PRJNA930454), or human threadworm, is a wide-spread, minute gastro-intestinal parasite of humans, occurring principally in the tropics and sub-tropics. 
Heterodera schachtii (PRJNA522950), also known as the beet cyst nematode or sugarbeet nematode, is a plant pathogenic parasite which can infect more than 200 plants including the model plant Arabidopsis thaliana.
Fasciola hepatica (PRJEB58756), or sheep liver fluke, is a parasite that infects humans, cows and sheep.
Mesorhabditis belari (PRJEB61636), a dioecious free-living worm which reproduces by pseudogamy.
Schistosoma japonicum (PRJNA724792), or Asian blood fluke, is a parasite of significant public health importance in China, Taiwan, the Philippines and Southeast Asia.
Echinococcus granulosus (PRJNA754835), a member of the Cyclophyllidea, which comprise the majority of tapeworms that are of medical importance. 
Necator americanus (PRJNA1007425), or human hookworm, lives in the intestines and is particularly harmful to children, causing chronic anemia, stunting growth and impairing intellectual development.
Nippostrongylus brasiliensis (PRJNA994163), or the rodent hookworm, is a strongylid nematode parasite that commonly infects the gastrointestinal tract of rats.
Ditylenchus destructor (PRJNA800207), or potato rot nematode, is an endoparasitic, migratory plant nematode.
Meloidogyne enterolobii (PRJEB36431), a root-knot nematode, infects a variety of crops such as eggplant, bell pepper, soybean, sweet potato, tobacco, tomato, or watermelon.
Paragonimus westermani (PRJNA219632), a lung fluke found in Southeast Asia and Japan, is the most common cause of paragonimiasis.

Genome Update

Halicephalobus mephisto (PRJNA528747), a free-living worm feeding on subterranean bacteria.

Full Annotation Update

Ancylostoma ceylanicum (PRJNA231479), or mammalian hookworm, is a parasite that attaches to the intestine of animals, particularly humans and hamsters, causing anaemia in the host. 

RNASeq Studies

The RNA-Seq processing pipeline developed by Ensembl Metazoa has been used to process:

Schistosoma mansoni

Nippostrongylus brasiliensis

Steinernema hermaphroditum

Subscribe to our mailing list

If you’re using WormBase ParaSite, please subscribe to our mailing list to receive updates on new WormBase ParaSite releases like this one, new features, career opportunities, and upcoming conferences and workshops.

Subscribing to the WormBase ParaSite mailing list is easy! Simply fill out the subscription form and stay updated with our latest news.

Problems/issues

If you encounter any problems with the latest release, including broken links or missing genes, please don’t hesitate to contact us.

Tutorial: Display variants on AlphaFold-predicted 3D protein structures

You’ve sequenced your samples and identified variants. Great!

Now, here’s how you can use WormBase ParaSite’s new feature to find out the effect of your variants on 3D AlphaFold protein structures and interactions:

1. The Variant Effect Predictor (VEP) is what you’re going to need for this task. You can find it in the Tools page of WormBase ParaSite.

2. Click the ‘Variant Effect Predictor’ button to open the VEP web tool and enter your input data using instructions in the documentation:

3. Now, make sure that the ‘Protein domains’ option from the ‘Protein Annotation’ panel is selected. It will retrieve overlapping protein domains from

When the “Protein domains” option is selected, as well as reporting the IDs of overlapping protein domains, the VEP output will also show links to a 3D protein viewer if a missense variant overlaps an AlphaFold 3D protein model.

4. Now click the ‘Run’ button to submit the VEP query… et voilà!

5. Click on the ’AlphaFold Model’ button in the ‘Protein Matches’ column to launch the interactive AlphaFold protein structure visualisation.

6. You can now browse the 3D protein structure. Exons/protein domains can be displayed to provide more context. Use your mouse to rotate and zoom in/out the model. Click on the ‘Focus’ button to zoom in to the variant of interest.
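
For readers who prepare their VEP input programmatically, here is a minimal sketch that writes single-base substitutions in Ensembl's default whitespace-separated VEP input format (chromosome, start, end, alleles, strand, optional identifier). The contig name, coordinates and variant names are hypothetical placeholders, and the helper `to_vep_line` is our own illustration rather than part of any WormBase ParaSite tool.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str  # sequence name as used in the genome assembly
    pos: int    # 1-based position of the substituted base
    ref: str
    alt: str
    name: str   # free-text identifier, echoed back in the VEP output


def to_vep_line(v: Variant) -> str:
    """Format a single-base substitution as one default-format VEP input line."""
    if len(v.ref) != 1 or len(v.alt) != 1:
        raise ValueError("this sketch only handles single-base substitutions")
    # For a substitution, start and end are the same coordinate.
    return f"{v.chrom} {v.pos} {v.pos} {v.ref}/{v.alt} + {v.name}"


# Hypothetical variants on a hypothetical contig.
variants = [
    Variant("contig_1", 140532, "T", "C", "var1"),
    Variant("contig_1", 150029, "G", "A", "var2"),
]
vep_input = "\n".join(to_vep_line(v) for v in variants)
print(vep_input)
```

The resulting text can be pasted into the VEP web form; insertions and deletions use different coordinate conventions and are deliberately left out of this sketch.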

Tutorial: Visualise pairwise genome alignments between WBPS genomes

From release 18 (WBPS18), users can browse and visualise whole genome alignments between genomes in the same taxonomic group.

Cactus, a next-generation aligner that stores whole-genome alignments in a graph structure, was utilised to calculate multiple pairwise genome alignments between WBPS genomes. These genomes have been assigned to 5 big taxonomic groups, and pairwise alignments can be viewed between the genomes of each group.

“I would like to view how my genome of interest aligns against some related genomes”

  • 1. Go to the Ensembl Genome Browser of a region of interest (example). You can do this by using the Ensembl button in the “Genome Browser” column of your genome of interest in the Genome list.
    Alternatively, if you are already on a gene page click the Region in Detail button under “Genomic Context”.

  • 2. You are on the Genome Browser (Ensembl) page for your region of interest. You can click here to learn more about it.
  • 3. Click “Configure tracks” to display the track options dialog box:

  • 4. Choose the “Pairwise whole genome alignments” option of the left navigation menu in the box.

  • 5. Choose the genome or multiple genomes you would like to visualise pairwise genomic alignments against. Then click the tick box at the top right of the box.
  • 6. The genomic regions of the selected genomes that align to the region of interest appear as pink tracks on the browser. One track per genome:

  • 7. You can click on the genome initials on the left of the view to view the full name of the genome each track corresponds to.

  • 8. You can click on the aligned segments (pink tracks) to browse the aligned genomic region of the corresponding genome:

Subscribe to our mailing list!

If you’re using WormBase ParaSite, please subscribe to our mailing list to receive regular updates on WormBase ParaSite database improvements and new features, career opportunities, and upcoming conferences and workshops.

Subscribing to the WormBase ParaSite mailing list is easy! Simply fill out the subscription form at https://forms.gle/gN5NqZJ3FeVDkpTA8 and stay updated with our latest news.

AlphaFold 3D protein structures are now browsable from WormBase ParaSite

“How can I visualise an AlphaFold Protein Structure in WormBase ParaSite?”

  1. Navigate to WormBase ParaSite and click on the Genome List button:
  2. Click on your species of interest (e.g. Schistosoma mansoni) and search for the abl1 gene.

  3. Click on the abl1 gene in the search results page:

  4. We are now on the gene tab for Schistosoma mansoni abl1. This page gives an overview of the information available at the gene level and shows the transcript table, a summary with links to external databases, and a gene diagram.

  5. Protein information in WormBase ParaSite is associated with transcripts of a gene. Therefore, we will navigate to the Transcript tab by clicking on the transcript ID (Smp_246700.1) in the transcript table.
  6. The domains of the protein product of the transcript can be viewed graphically by clicking on Protein summary or in a table format by clicking on Domains and features. To access the 3D protein structure viewer click on AlphaFold predicted model.
  7. We can now view the shiny new interactive 3D AlphaFold structure for our protein of interest. The interactive molecular viewer visualizes the structure, coloured by the per-residue pLDDT confidence measure.

What functionalities does the 3D protein structure viewer offer?
The central panel (viewer) annotates the model with regions of high confidence (blue) to low confidence (orange), with its protein sequence displayed above. It is very simple to use: just drag with your mouse pointer to rotate the structure and scroll to zoom in and out! You can rapidly zoom in on a specific residue by clicking on it in the protein sequence above the model. The right-hand panel enables highlighting of one or more exons and protein features (Gene3D, PROSITE, Pfam, etc.), which are controlled by clicking on the eye icon.
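
To make the confidence colouring more concrete, here is a minimal sketch of reading per-residue pLDDT values outside the viewer. It relies on two things we are confident of: AlphaFold model files in PDB format store pLDDT in the B-factor column (columns 61-66 of ATOM records), and the AlphaFold Database groups scores into the bands "very high" (above 90), "confident" (70-90), "low" (50-70) and "very low" (below 50). The two ATOM records are made-up data generated by a helper of ours, not taken from a real model.

```python
def make_atom_line(serial, res_name, res_seq, plddt):
    """Build a made-up, fixed-width PDB ATOM record for a C-alpha atom."""
    return (
        "ATOM  {:5d}  CA  {:>3s} A{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
        .format(serial, res_name, res_seq, 0.0, 0.0, 0.0, 1.0, plddt)
    )


def plddt_per_residue(pdb_text):
    """Return {residue number: pLDDT}, read from the B-factor of each C-alpha atom."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def confidence_band(plddt):
    """Map a pLDDT score to the confidence bands used by the AlphaFold Database."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


pdb_text = "\n".join([
    make_atom_line(1, "MET", 1, 95.2),
    make_atom_line(2, "ALA", 2, 43.1),
])
for residue, score in plddt_per_residue(pdb_text).items():
    print(residue, score, confidence_band(score))
```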


Undoubtedly, AlphaFold opens new research horizons and we would like to encourage our users to go and explore this ground-breaking dataset in WormBase ParaSite by searching and testing the 3D protein models of your favourite worm.

We would love to hear the feedback of the helminth research community on the AlphaFold resource, the structure predictions, how you think WormBase ParaSite could facilitate your interaction with this unique dataset, or anything else. So please feel free to contact us ([email protected])

Announcing WormBase ParaSite Release 16

We are delighted to announce the 16th release of WormBase ParaSite. Highlights of this release include:

  • Addition of six new genome assemblies
  • Annotation updates for 12 genomes
  • Addition of phenotype data for C. elegans genes, imported from WormBase
  • Addition of gene name synonyms for a set of Strongyloides stercoralis genes
  • Introduction of an archiving service
  • Deprecation of CEGMA, and introduction of BUSCO as an annotation quality metric
  • New repeat feature libraries for all genomes, generated with RepeatModeler2.

New genomes

This release sees the addition of 6 new assemblies, of which 2 are new species:

  • Atriophallophorus winterbourni (new species) – a digenean trematode parasite native to the lakes of New Zealand (from Zajac et al., 2021).
  • Bursaphelenchus okinawaensis (new species) – a lab model for the pine wood nematode Bursaphelenchus xylophilus (from Sun et al., 2020).
  • Bursaphelenchus xylophilus – a new chromosome-scale assembly from Dayi et al., 2020.
  • Caenorhabditis remanei – a new chromosome-scale assembly from Teterina et al., 2020.
  • Clonorchis sinensis – a new chromosome-scale assembly from Young et al., 2021.
  • Oscheius tipulae – a new chromosome-scale assembly from Gonzalez de la Rosa et al., 2021.

Annotation updates

We present full annotation updates for a set of genomes:

  • Pristionchus arcanus
  • Pristionchus entomophagus
  • Pristionchus exspectatus
  • Pristionchus fissidentatus
  • Pristionchus japonicus
  • Pristionchus maxplancki
  • Pristionchus mayeri
  • Parapristionchus giblindavisi
  • Micoletzkya japonica

Three further genomes have seen manual gene model curation by the community:

  • Dirofilaria immitis
  • Haemonchus contortus (PRJEB506)
  • Strongyloides stercoralis

For these species, gene models that have been manually verified/curated can be identified with a note in the “Annotation Method” line on their gene page, for example:

For all 12 species with annotation updates, previous gene models can be visualised in JBrowse in the “WBPS15 gene models track”:

Deprecated gene models and assemblies (and associated analyses) also remain accessible on our new archive site. See this blog post for further information on how to access archived data.

Phenotype data

This release sees the import of over 350,000 C. elegans gene-phenotype associations from our sister site, WormBase. These associations have been curated from the literature over many years, from RNAi and variant data. The data is accessible on C. elegans gene pages:

As the majority of helminth genes don’t currently have data from direct phenotypic assays, phenotypes have also been propagated between orthologues. In the example below, we can see the phenotypes associated with the C. elegans orthologues of a H. polygyrus gene:

In total, we now host phenotype data from 13,350 studies (of which 13,349 C. elegans and 1 Schistosoma mansoni).
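
As a rough illustration of how phenotypes can be propagated between orthologues, the sketch below joins an orthology table with directly curated phenotypes. It is a simplified assumption of ours, not a description of the actual WormBase ParaSite pipeline, and all gene identifiers and phenotype terms are hypothetical.

```python
from collections import defaultdict

# Hypothetical direct annotations: C. elegans gene -> curated phenotypes.
direct_phenotypes = {
    "ce_gene_1": {"embryonic lethal", "slow growth"},
    "ce_gene_2": {"locomotion variant"},
}

# Hypothetical orthology table: helminth gene -> C. elegans orthologues.
orthologues = {
    "hp_gene_A": ["ce_gene_1", "ce_gene_2"],
    "hp_gene_B": ["ce_gene_3"],  # orthologue without any phenotype data
}


def propagate(orthologues, direct_phenotypes):
    """Return {gene: {phenotype: [orthologues supporting it]}}."""
    inferred = defaultdict(dict)
    for gene, partners in orthologues.items():
        for partner in partners:
            for phenotype in direct_phenotypes.get(partner, ()):
                inferred[gene].setdefault(phenotype, []).append(partner)
    return dict(inferred)


inferred = propagate(orthologues, direct_phenotypes)
for phenotype, sources in sorted(inferred["hp_gene_A"].items()):
    print(phenotype, "via", ", ".join(sources))
```

Keeping the supporting orthologue next to each inferred phenotype makes it clear that the evidence is indirect.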

Gene name synonyms

We have imported 375 gene name synonyms for Strongyloides stercoralis. These gene names have been curated from the literature by Jonathan Stoltzfus of Millersville University; we hope that storing these synonyms will improve discoverability and connectivity to the literature. The synonyms are searchable, and where present appear in the new “Synonyms” line of the gene page:

We can now see papers referring to this gene in the literature tab:

Many thanks to Jonathan Stoltzfus for providing this data!

Deprecation of CEGMA and BUSCO update

In recognition of the fact that CEGMA has not been supported for several years, we have deprecated CEGMA scores for WormBase ParaSite assemblies. Instead, we now report two BUSCO metrics: one based on the genome assembly and another on the annotation. See the BUSCO manual for further information on these different modes and on interpreting BUSCO scores.

Repeat feature update

We have run RepeatModeler2 to generate custom transposable element (TE) sequence models for all assemblies. The TE sequence files are available to download from our FTP site (repeat-families.fa.gz files):

These custom TE models have also been used for masking, and are available as tracks on the genome browsers. Many thanks to Sanger Institute apprentice Charles Nunn for generating this data.
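
Once a repeat-families.fa.gz file has been downloaded, it can be inspected with a few lines of Python. The sketch below assumes the file is a standard gzipped FASTA file with ">" header lines; the demo file it writes is made-up data so that the example runs on its own, and `fasta_lengths` is a helper name of ours.

```python
import gzip
import os
import tempfile


def fasta_lengths(path):
    """Return {sequence name: length} for a gzipped FASTA file."""
    lengths = {}
    name = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Keep only the first word of the header as the name.
                name = line[1:].split()[0]
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


# Demo on a tiny made-up file; point demo_path at a downloaded
# repeat-families.fa.gz file to inspect real TE models.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.repeat-families.fa.gz")
with gzip.open(demo_path, "wt") as out:
    out.write(">family_1 made-up\nACGTAC\nGT\n>family_2\nTTTT\n")

lengths = fasta_lengths(demo_path)
print(len(lengths), "models,", sum(lengths.values()), "bp in total")
```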

Miscellaneous

  • Updated protein domain annotation (InterProScan v. 5.51-85.0), functional annotation and cross referencing for all genomes.
  • Gene identifiers have been mapped between WormBase ParaSite versions for all Pristionchus sp. updates using Ensembl’s mapping method. Where mapped, previous gene IDs are available in GFF downloads and on gene pages.
  • Recomputed orthologues, paralogues and protein families (Ensembl Compara v. 101). Where multiple assemblies exist for the same species, we have only used one assembly for orthology calculations. This means that some assemblies now do not have orthologue and paralogue data.

Alternative gene set for Necator americanus

Logan et al. (2020) have published an alternative set of gene predictions for Necator americanus in PLoS Neglected Tropical Diseases, based on both RNA-seq and proteomics data and generated via the MAKER pipeline.

Their gene predictions can be downloaded from the WormBase ParaSite FTP site at:
ftp://ftp.ebi.ac.uk/pub/databases/wormbase/parasite/datasets/logan_2020_32453752
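
For scripted access, the directory can be listed with Python's standard ftplib module. This is a minimal sketch that assumes the server accepts anonymous logins and that the path above is still live; `list_remote_files` is a helper name of ours and is not called here, so nothing is downloaded when the snippet is run.

```python
from ftplib import FTP
from urllib.parse import urlsplit

DATASET_URL = (
    "ftp://ftp.ebi.ac.uk/pub/databases/wormbase/parasite/"
    "datasets/logan_2020_32453752"
)


def split_ftp_url(url):
    """Split an ftp:// URL into (host, remote directory)."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError(f"not an FTP URL: {url}")
    return parts.hostname, parts.path


def list_remote_files(url):
    """List the entries of the remote directory (requires network access)."""
    host, path = split_ftp_url(url)
    with FTP(host) as ftp:
        ftp.login()    # anonymous login; assumed to be permitted
        ftp.cwd(path)
        return ftp.nlst()


host, path = split_ftp_url(DATASET_URL)
print(host, path)
```

Calling `list_remote_files(DATASET_URL)` would return the file names, which can then be fetched with `FTP.retrbinary`.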

Thanks to Javier Sotillo Gallego for providing the data!

Brugia pahangi material available

The Devaney group have small numbers of adult B. pahangi and larger numbers of Mf available to others for research purposes.  If this would be useful, please contact [email protected] with approximate numbers, life stage required and whether fresh or frozen material is suitable. The B. pahangi life cycle is funded by a grant from the Wellcome Trust (208390/Z/17/Z).

Announcing release 14

We are pleased to announce the 14th release of WormBase ParaSite, bringing a new S. mediterranea assembly, and 8 other new and updated genomes.


New and updated genomes

Platyhelminths

We are happy to announce these new genomes of flatworms:

There is also an annotation update for Mesocestoides corti (PRJEB510) created with recently sequenced RNASeq data. It supports 5076 new genes and 8367 revised structures of the previous AUGUSTUS-only annotation.

If flatworm genomics is relevant to your work, be sure to also visit PlanMine, run by the authors of the Schmidtea genome. It contains many different assemblies from a number of free-living flatworms, phylogenetic data, and more.

Please note that we are deprecating the assembly SmedGD_c1.3 for Schmidtea mediterranea (PRJNA12585), corresponding to Robb et al. (2007), and intend to remove it once we are confident that no new research is being based on this assembly. Do let us know if you rely on it, or if there are good reasons for us to keep both Schmidteas around.

Nematodes

There are two new clade IV genomes, potentially relevant to agricultural research: Ditylenchus dipsaci (PRJNA498219), a plant pest, and an updated genome of an entomopathogenic nematode Steinernema carpocapsae (PRJNA202318).

There are also two genomes of free-living clade V nematodes. First is the genome of Halicephalobus mephisto (PRJNA528747), an extremophile found in deep rock fracture water in several gold mines in South Africa. We also have a genome of Mesorhabditis belari (PRJEB30104), an animal exhibiting an interesting pattern of reproduction: the eggs only mature after being activated by the males, which nevertheless do not pass on any genetic material.

Finally, we update WormBase core genomes to the WormBase version WS271.

Comparative genomics: bringing smaller trees

We are removing a total of 7 genomes from our comparative genomics analysis, in each case because a clearly better genome of the same species is available.

We are hoping that this will make our results more robust overall, and their interpretation easier.

If you still need the old results, ortholog and paralog files from the last release are available through our FTP site. Apart from the previous S. mediterranea (PRJNA12585), we do not plan to remove any other genomes from our portal.

RNASeq studies

Our collaborators, the Functional Genomics group at the European Bioinformatics Institute, continue to process all public RNASeq studies through their platform, RNASeq-er.

More studies

New data that has been produced in the last months, and more inclusive curation, helped us bring the total of studies processed on our site to 201, across 48 different species. This includes 30 studies for S. mediterranea, now aligned to the new assembly.

As of July 2019, RNASeq-er has data for a total of 639 studies. Apart from the 201 for which we have results, there are 301 unannotated C. elegans or P. pacificus studies, which we skipped to reduce the curation effort involved. The other 137 studies lack metadata or were deliberately excluded, either because they did not have enough replicates, used a non-standard protocol such as small miRNA-seq or Ribo-Seq, or because the authors asked us to suppress them.
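
The study accounting in this paragraph can be checked with a few lines of arithmetic. The numbers are taken from the text; the variable names are ours.

```python
total_in_rnaseqer = 639           # studies RNASeq-er has data for (July 2019)
processed = 201                   # studies with results on our site
skipped_unannotated = 301         # unannotated C. elegans / P. pacificus studies
excluded_or_no_metadata = 137     # missing metadata or deliberately excluded

accounted_for = processed + skipped_unannotated + excluded_or_no_metadata
share_processed = processed / total_in_rnaseqer

print(accounted_for == total_in_rnaseqer)
print(f"{share_processed:.1%} of the available studies are processed")
```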

Do contact us if you would like us to include a particular study, or if you have metadata that we are missing. It would be particularly helpful if you could let us know of any additional publications relating to our studies, as they are not always linked to archive records.

UI updates

There are no major changes to this aspect of our service since we rolled it out in the last release: you can browse through a list of studies by following a link on the species page, and access per-gene results on the gene page through the tab on the left.

We have improved how expression data is organized within the JBrowse track selector, separating studies into categories. If you usually access the gene expression studies through the “Gene expression” tab, have a look at how the studies are organized in the track selector (example link, S. mansoni) – it provides an interesting alternative way of viewing the results.

Analysis updates

Differential expression result files should now be slightly more convenient, and we hope that you will be able to open the files without trouble in any popular spreadsheet software. We are also providing complete results for each contrast – useful if you want to apply your own filtering criteria.

There is currently a slight non-uniformity in our count and TPM results, as RNASeq-er is switching between two quantification methods: within each study, aligned reads are quantified with either HTSeq (previous) or featureCounts (new).
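
For readers who want to recompute TPM from the count files, here is a minimal sketch of the standard TPM definition: counts are divided by gene length in kilobases, and the resulting rates are rescaled to sum to one million. It is not claimed to reproduce RNASeq-er's exact implementation, and the gene names, counts and lengths are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from raw read counts and gene lengths in bp."""
    # Length-normalised rate: reads per kilobase of gene.
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    scale = sum(rates.values())
    if scale == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: rate / scale * 1_000_000 for gene, rate in rates.items()}


# Hypothetical counts and lengths: gene_b has three times the reads of
# gene_a but is also three times as long, so both get the same TPM.
counts = {"gene_a": 100, "gene_b": 300}
lengths_bp = {"gene_a": 1000, "gene_b": 3000}

values = tpm(counts, lengths_bp)
print(values)
```

Because TPM is a per-sample rescaling, any difference in raw counts between the two quantification methods carries through to the TPM values.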

Helminth Bioinformatics Course in Accra, Ghana: Open for Applications

We’re pleased to announce that a new Wellcome Advanced Course in helminth bioinformatics is now open for applications. The course is aimed at Africa-based researchers at various levels. It will be a hands-on, practical introduction to bioinformatics for helminth researchers, covering:

  • The use of public databases (including WormBase ParaSite) to explore gene and protein function
  • Genome assembly
  • Variant calling
  • Differential gene expression
  • Unix/Linux command-line and some basic R

Dates and Deadlines

The course will be held at the West African Centre for Cell Biology of Infectious Pathogens (WACCBIP), University of Accra, Ghana, from 8th to 13th September 2019.

The course is free to attend for non-commercial applicants, and a number of bursaries are available to cover travel, accommodation and sustenance.

The deadline for applications is 9th May.

More details are available here: https://coursesandconferences.wellcomegenomecampus.org/our-events/helminth-bioinformatics-ghana-2019/

 

 

Announcing WormBase ParaSite 12

We are pleased to announce the 12th release of WormBase ParaSite, bringing new and updated genomes, and better handling of old identifiers and history.

New and updated genomes

The biggest updates of the release are probably two tapeworm genomes: Hymenolepis microstoma (PRJEB124), an update to a chromosome-level assembly, and a new genome Taenia multiceps (PRJNA307624).

There are also new clade IV nematode genomes: root-knot nematodes Meloidogyne graminicola (PRJNA411966) and Meloidogyne arenaria (PRJNA438575) (alternative assembly), and a bacteria-feeding Acrobeloides nanus (PRJEB26554).

The rest of the updates are:

Data and tools

We have re-run our comparative genomics pipeline, constructing gene trees and finding orthologs and homologs. We have also re-run the newest InterProScan (5.30-69.0) and our cross-references pipeline for all our genomes.

We re-imported all public RNASeq data from our collaborators, and did a round of minor improvements to the displays. We now use information from PubMed to let you find data sets corresponding to a publication of interest.

Archived gene IDs

New sequencing technologies let labs construct better genome assemblies, bringing access to chromosome level assembly data even to relatively small research communities. We are excited to see this trend. Each genome update brings new evidence and potentially unlocks research into previously forbiddingly difficult biological questions.

At the same time, insights gathered in work published using previous assemblies should stay accessible to the community, so there is a need to connect different assemblies with each other. As of this release, WormBase ParaSite will keep track of previous identifier versions at gene level and display annotation history. Authors of a genome update do not always provide a mapping to the previous version, so we developed a pipeline to match up identifiers between genome versions.

Overview of new functionality

For an overview of how this now works, consider Smp_340760, a Schistosoma mansoni gene. The gene model has been revised twice: it used to be called Smp_044010, but in the “Schisto_7.1” version of the annotation that we published in WBPS11 the authors changed the gene structure enough that they assigned it a new identifier, and in “Schisto_7.2”, which we publish now, the gene model was corrected slightly.

Searching by Smp_044010 now leads to a page explaining that the identifier was deprecated and redirecting to Smp_340760. Over there, the history is represented by a diagram:

[Figure: identifier history diagram for Smp_340760, Schistosoma mansoni (PRJEA36577)]

The site also displays previous protein sequences of transcripts, to help you carry forward any conclusions based on the previous gene model – the less the sequence has changed, the more similar results will be for e.g. BLAST matches.
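
The redirect from a deprecated identifier to its current one can be pictured as following a chain of successors. The sketch below is our own illustration, not the site's implementation; the single mapping it contains is the Smp_044010 to Smp_340760 example from the text.

```python
# Deprecated identifier -> identifier that replaced it (example from the text).
successor = {"Smp_044010": "Smp_340760"}


def resolve_current_id(gene_id, successor):
    """Follow deprecations until a current identifier is reached.

    Returns (current identifier, list of identifiers visited on the way).
    """
    history = [gene_id]
    while gene_id in successor:
        gene_id = successor[gene_id]
        if gene_id in history:
            raise ValueError(f"cyclic identifier history at {gene_id}")
        history.append(gene_id)
    return gene_id, history


current, history = resolve_current_id("Smp_044010", successor)
print(current, history)
```

An identifier that was never deprecated resolves to itself, so the same lookup works for old and current identifiers alike.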

ID mapping pipeline

We used authors’ mappings between annotation versions for updates of Schistosoma mansoni and Hymenolepis microstoma. Everywhere else we used an automated mapping pipeline, adapted from Ensembl gene build.

Pipeline description

The pipeline runs the sequence matching tool exonerate, scoring matches of exons between the two assemblies and propagating the scores to the transcript and gene level. The scores are then adjusted based on synteny: if a gene A is near gene B in the previous genome, A is mapped to A’ in the new genome, and there is a gene B’ near A’, the match of B to B’ is strengthened. Finally, best matches are iteratively taken out of the scoring, producing a list of pairs.
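
The last two steps, synteny adjustment and iterative selection of best matches, can be sketched as follows. This is a heavily simplified illustration under our own assumptions: gene-level scores are given directly (exonerate is not run), "near" means listed as a neighbour, and the bonus and minimum score values are arbitrary. It is not the actual Ensembl-derived pipeline.

```python
def synteny_adjust(scores, old_neighbours, new_neighbours, bonus=0.1):
    """Strengthen a match B->B' for every neighbouring pair A->A' that also matches."""
    adjusted = {}
    for (old, new), score in scores.items():
        support = sum(
            1
            for a in old_neighbours.get(old, ())
            for a_new in new_neighbours.get(new, ())
            if scores.get((a, a_new), 0.0) > 0.0
        )
        adjusted[(old, new)] = score * (1 + bonus * support)
    return adjusted


def greedy_pairs(scores, min_score=0.5):
    """Repeatedly take the best remaining match; each gene is used at most once."""
    used_old, used_new, pairs = set(), set(), []
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for (old, new), score in ranked:
        if score < min_score:
            break
        if old in used_old or new in used_new:
            continue
        pairs.append((old, new))
        used_old.add(old)
        used_new.add(new)
    return pairs


# Hypothetical gene-level scores: B matches B1 and B2 equally well.
scores = {("A", "A1"): 0.9, ("B", "B1"): 0.6, ("B", "B2"): 0.6, ("C", "C1"): 0.3}
old_neighbours = {"A": ["B"], "B": ["A"]}
new_neighbours = {"A1": ["B1"], "B1": ["A1"], "B2": []}

adjusted = synteny_adjust(scores, old_neighbours, new_neighbours)
pairs = greedy_pairs(adjusted)
print(pairs)
```

In this toy example the tie between B1 and B2 is broken by synteny, because only B1 sits next to the match of A, while the weak C to C1 match falls below the minimum score and is left unmapped.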

Results and benchmarking

We find the pipeline to be quite conservative even after we relaxed a few parameters around minimal match scores and similar values.  Typically only between a third and two thirds of the genes in the updated genome have a related past identifier:

Genome                              Previous genes total  Previous version  Mapped  New genes total  Fraction mapped
Ancylostoma ceylanicum PRJNA72583   11783                 WBPS11            7564    15892            0.476
Ascaris suum PRJNA62057             17974                 WBPS9             9468    15260            0.620
Fasciola hepatica PRJEB25283        16806                 WBPS10            7564    22676            0.334
Haemonchus contortus PRJEB506       19430                 WBPS10            11439   21869            0.523
Meloidogyne incognita PRJEB8714     45351                 WBPS10            11977   19212            0.623
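
The "fraction mapped" column appears to be the number of mapped genes divided by the total number of genes in the updated genome; that reading reproduces the published values, as this short check shows. The numbers are copied from the table.

```python
# (genome, mapped genes, total genes in the updated genome)
rows = [
    ("Ancylostoma ceylanicum PRJNA72583", 7564, 15892),
    ("Ascaris suum PRJNA62057", 9468, 15260),
    ("Fasciola hepatica PRJEB25283", 7564, 22676),
    ("Haemonchus contortus PRJEB506", 11439, 21869),
    ("Meloidogyne incognita PRJEB8714", 11977, 19212),
]

fractions = [f"{mapped / new_total:.3f}" for _, mapped, new_total in rows]
for (genome, _, _), fraction in zip(rows, fractions):
    print(genome, fraction)
```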

We also ran the automated pipeline on the S. mansoni WBPS10->WBPS11 update, comparing the results to a manual mapping obtained by annotators tracking individual identifiers. Our pipeline carried forward 5165 genes that the authors considered to have no or minor changes, and 1347 genes with larger changes, onto an annotation with 10172 genes. The pipeline missed 2584 genes that were present in the manual mapping but lost in the automatic one. It disagreed with the manual mapping in only 199 cases: some were genuinely wrong calls, and some were on par with the manual mapping, e.g. a mapping to a paralogous gene.

Website maintenance 4th Dec 2018

Update: The task has not been finished yet and may take a couple more hours. We expect to complete the task by 10pm UK Time. Thank you for your cooperation.

Please note that we are going to perform server maintenance on the website on Tuesday 4th Dec 2018 from 2pm to 5pm (UK Time). During this period, you will not be able to sign in or use tools on the website, including BLAST and VEP. We are sorry for any inconvenience this may cause.

 

WormBase ParaSite Release 18: Our Biggest Update Yet

We are thrilled to announce the release of WormBase ParaSite 18 (WBPS18), our most comprehensive update yet, with the largest number of new genomes and updated annotations since the resource was first launched! WBPS18 hosts 240 different genomes from 181 species.

Highlights

  • Addition of 37 new genome assemblies for 14 existing and 12 new species, including a massive update of the trematode (fluke) genomes.
  • Assembly and/or annotation updates for 3 genomes.
  • Pairwise whole genome alignments are now available between all genomes in taxonomic groups.
  • Integration of over 2 million AlphaFold 3D protein structures for 239 genomes.
  • Updated RNA-Seq pipeline.
  • Release 17 remains accessible.
  • Mailing list.

New Species

  • Aphelenchoides besseyi is a plant-parasitic nematode that causes white tip disease in rice.
  • Aphelenchoides bicaudatus is a plant-parasitic nematode that causes damage to a variety of crops, including potatoes, tomatoes, and strawberries.
  • Aphelenchoides fujianensis is a plant-parasitic nematode that causes damage to a variety of crops, including rice, wheat, and soybeans.
  • Caenorhabditis auriculariae is a free-living nematode that is used as a model organism in scientific research.
  • Meloidogyne chitwoodi is a plant-parasitic nematode that causes root-knot disease in a variety of crops.

New Trematodes (Flukes)

Genome Updates

Trematodes (Flukes)

Pairwise Whole Genome Alignments

In WBPS18, users can now browse and visualise whole genome alignments between genomes in the same taxonomic group. Cactus, a next-generation aligner that stores whole-genome alignments in a graph structure, was utilised to calculate multiple pairwise genome alignments between WBPS genomes. These genomes have been assigned to 5 big taxonomic groups, and pairwise alignments can be viewed between the genomes of each group.

Taxonomic Group Identifier    Genomes
Anc001                        Platyhelminthes
Anc005                        Clade I nematodes
Anc020                        Clade III nematodes
Anc028                        Clade V nematodes
Anc029                        Clade IV nematodes

These alignments are viewable through the Ensembl Genome Browser. Follow this tutorial to learn how to browse these alignments.

Fetching genome alignments is a resource-consuming process, so please be patient while you are trying to load the alignment tracks and hit the refresh buttons in case any errors appear.

Ensembl Genome Browser view with pairwise alignment tracks in WBPS18. The view focuses on the genomic region of the Caenorhabditis parvicauda CSP21.g2830 gene. Aligned genomic regions of other Caenorhabditis and Pristionchus genomes have been loaded (pink tracks).

The whole genome alignments are also available for download in HAL file format, through our FTP site.

Full integration of AlphaFold models

In July 2022, AlphaFold predicted structures for almost every catalogued protein known to science.

In release 18, WormBase ParaSite incorporated these models. Over 2 million 3D protein structure models from AlphaFold are now available for 239 genomes. The AlphaFold predicted model (example) is available through the top-left menu on the transcript pages.

Follow this tutorial to learn how to browse your favourite protein models in WormBase ParaSite.

AlphaFold widget in WBPS18. The protein shown here corresponds to the M3Y97_00308000 gene of the newly added Aphelenchoides bicaudatus genome.

Updated RNA-Seq data

From release 16, due to the retirement of the RNASeq-er API, RNA-Seq data for genomes that had assembly or annotation updates were not re-aligned. For 34 updated genomes (releases 16 and 17), this affected the RNA-Seq alignment tracks available in the Ensembl browser and JBrowse, as well as the gene counts on the Gene expression platform.

For this release, we used the new RNA-Seq pipeline developed by the Ensembl Metazoa group to re-align the RNA-Seq data for these 34 genomes. The pipeline uses HISAT2 to perform the read alignments and htseq-count to count reads in features.

If you would like a new gene expression study to be included, please contact us.

“How can I browse RNA-Seq data in WormBase ParaSite?”

  1. Browse our gene expression pages (use the “Gene Expression” button in the “Navigation” box on the genome landing page of each genome) to explore the expression of your gene of interest across different samples and perform differential expression analysis.
  2. Use the JBrowse genome browser to visualise RNA-Seq samples as genome browser tracks.
  3. Use the Ensembl genome browser to visualise RNA-Seq samples as genome browser tracks.

Archiving Site

Our previous Release 17 has been archived and remains accessible here.

Subscribe to our new mailing list

If you use WormBase ParaSite, subscribe to our mailing list to receive updates on new WormBase ParaSite releases like this one, new features, career opportunities, and upcoming conferences and workshops.

Subscribing to the WormBase ParaSite mailing list is easy! Simply fill out the subscription form at https://forms.gle/gN5NqZJ3FeVDkpTA8 and stay updated with our latest news.

Problems/issues

If you encounter any problems with the latest release, including broken links or missing genes, please don’t hesitate to contact us.

Announcing WormBase ParaSite release 17

Have you been waiting for our next release? The wait is finally over! Despite being understaffed and underfunded, WormBase ParaSite launches its new release 17 with an exciting list of new/updated genomes and new features:

  • Integration of AlphaFold 3D protein structures for 8 species.
  • Addition of 11 new genome assemblies of which 6 are new species.
  • Annotation updates for 2 genomes.
  • Gene-phenotype associations are now available in our FTP directory.
  • Improvements in the way external gene synonyms are integrated and displayed.
  • Deployment of WebApollo instances for more species to further facilitate community curation.

New Species

Angiostrongylus vasorum – a clinically important parasitic nematode living in the arteries and heart of several canid species, including domestic dogs (from Tayrov et al., 2021).
Cercopithifilaria johnstoni – a filarial nematode transmitted by hard ticks that infects a broad range of native Australian murid and marsupial hosts (from McCann et al., 2021).
Fasciolopsis buski – a large fluke that infects the small intestine of humans and pigs in East/Southeast Asia (from Choi et al., 2020).
Gyrodactylus bullatarudis – a monogenean parasite of the guppy fish (from Konczal et al., 2020).
Halicephalobus sp. NKZ332 – a small, parthenogenic clade IV nematode isolated from termites in Japan (from Ragsdale et al., 2019).
Heterodera schachtii – also known as the beet cyst nematode, a plant-pathogenic parasite that can infect more than 200 plants, including the model plant Arabidopsis thaliana (from Siddique et al., 2021).


Assembly Updates

Schistosoma mansoni
We are proud to present the best ever S. mansoni assembly created until the next one! S. mansoni (blood fluke) is one of the three major infectious agents responsible for the chronic debilitating disease schistosomiasis, found throughout Africa and South America. Its previous assembly (v7) was substantially upgraded following the incorporation of Hi-C data and further PacBio analysis to resolve repeats, leading to the new, near-complete chromosomal assembly (v9) presented in this release! More information can be found in this pre-print by Buddenborg et al., 2021.

Fasciola hepatica
The sheep liver fluke, or common liver fluke, is a parasite that infects humans, cows and sheep, causing fascioliasis. The previous assembly and annotation, which were submitted in 2013, were drastically improved in this release as described by McNulty et al., 2017.

Heterodera glycines
The soybean cyst nematode (SCN), Heterodera glycines, is a plant-parasitic nematode infecting soybean roots. The previous assembly and annotation, which were submitted in 2013, were drastically improved in this release as described by Masonbrink et al., 2021.


Annotation updates


AlphaFold 3D protein structures are now browsable from WormBase ParaSite

For the first time in WormBase ParaSite, you can browse 3D protein structures visualised in a user-friendly and fun-to-explore viewer! Users can now explore the 3D protein models of their favourite genes in 8 WBPS species:

Species                      Predicted structures    Links
Brugia malayi                8,743                   WormBase ParaSite example
Caenorhabditis elegans       19,694                  WormBase ParaSite example
Dracunculus medinensis       10,834                  WormBase ParaSite example
Onchocerca volvulus          12,047                  WormBase ParaSite example
Schistosoma mansoni          13,865                  WormBase ParaSite example
Strongyloides stercoralis    12,613                  WormBase ParaSite example
Trichuris trichiura          9,564                   WormBase ParaSite example
Wuchereria bancrofti         12,721                  WormBase ParaSite example
Table 1. Number of structural predictions for complete proteomes of parasitic worms in AlphaFold DB v.2.1.2 and WormBase ParaSite 17.

Want to know more about how to visualise AlphaFold Protein Structure in WormBase ParaSite?
The page containing the 3D protein structure for a protein of interest is reached from the transcript summary page: in the left-side “Transcript-based displays” menu, select “AlphaFold predicted model”. For a step-by-step tutorial click here. You can now view the shiny new interactive 3D AlphaFold structure for your protein of interest:

AlphaFold predicted model page for S. mansoni transcript Smp_170450.1 in WormBase ParaSite 17

“What functionalities does the 3D protein structure viewer offer?”

The central panel (viewer) colours the model from regions of high confidence (blue) to low confidence (orange), with its protein sequence displayed above. It is very simple to use: just click and drag with your mouse pointer to rotate the structure, and scroll to zoom in and out! You can rapidly zoom in on a specific residue by clicking on it in the protein sequence above the model. The right-hand panel enables highlighting of one or more exons and protein features (Gene3D, PROSITE, Pfam, etc.), which are toggled by clicking on the eye icon.
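The blue-to-orange colouring can be read as a per-residue confidence score. The sketch below is illustrative only: it assumes the viewer follows the pLDDT bands published by the AlphaFold database (above 90, 70 to 90, 50 to 70, below 50), and the function name `plddt_band` is hypothetical rather than part of any WormBase ParaSite or AlphaFold API.

```python
def plddt_band(score: float) -> str:
    """Map a per-residue pLDDT score (0-100) to an AlphaFold DB confidence band.

    Assumption: the band boundaries and colours are those used by the
    AlphaFold database; the WormBase ParaSite viewer may differ in detail.
    """
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"pLDDT must be between 0 and 100, got {score}")
    if score > 90:
        return "very high (dark blue)"
    if score > 70:
        return "confident (light blue)"
    if score > 50:
        return "low (yellow)"
    return "very low (orange)"


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    for value in (95.2, 81.0, 63.4, 32.9):
        print(value, "->", plddt_band(value))
```

Regions in the two lowest bands are best treated with caution when interpreting a model.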

“More species please?”

As AlphaFold plans to expand its database in 2022 to cover the proteomes of more species, as well as a much larger proportion of all catalogued proteins, we anticipate that more parasitic worms will make it into the AlphaFold database. WormBase ParaSite will then be able to enable the “AlphaFold predicted model” viewer for these species. You can monitor the list of species present in the AlphaFold database here.


Phenotypes available on the FTP

“Is there any way to export gene-phenotype associations from WormBase ParaSite?”

In our previous release 16 we were happy to announce the import of over 350,000 C. elegans and S. mansoni gene-phenotype associations from our sister site, WormBase (C. elegans example). These associations were also propagated between orthologs to all our hosted species (H. polygyrus example). In this release we also made these gene-phenotype associations available in our FTP directory.

Gene-Phenotype associations have been deposited in GAF version 2.1 files in our FTP directory. For each species you will find 2 different GAF files:

  1. <SPECIES>.<BIOPROJECT>.WBPS17.orthology-inferred_phenotypes.gaf.gz (Example): This file contains species-specific, orthology-inferred gene-phenotype associations from C. elegans or S. mansoni (you can find which one in the 8th column). A file like this is available for every species in WormBase ParaSite.
  2. <SPECIES>.<BIOPROJECT>.WBPS17.phenotypes.gaf.gz (Example): This file contains original gene-phenotype associations for the species of interest. For the moment we only host original gene-phenotype association data for C. elegans and S. mansoni and therefore this file is available for these 2 species.

Sometimes GAF files are hard to interpret. For this reason, in the header of these files, we have included very useful column descriptions and general information. Enjoy!
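As a rough illustration of working with these files, the sketch below reads a gzipped GAF file, skips the header lines (which begin with `!` in the GAF format), and tallies the values in the 8th column. The function names are hypothetical, and the code assumes only that the files are tab-separated GAF 2.1; check the header of each file for the authoritative column descriptions.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator, List


def iter_gaf_rows(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield tab-split annotation rows, skipping blank and '!' header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        yield line.split("\t")


def count_column(lines: Iterable[str], column: int = 8) -> Counter:
    """Tally the values of a 1-based column (the 8th by default)."""
    counts: Counter = Counter()
    for row in iter_gaf_rows(lines):
        if len(row) >= column:
            counts[row[column - 1]] += 1
    return counts


def count_column_in_file(path: str, column: int = 8) -> Counter:
    """Same as count_column, but reading from a gzipped GAF file on disk."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_column(handle, column)
```

For example, calling `count_column_in_file` on a downloaded `<SPECIES>.<BIOPROJECT>.WBPS17.orthology-inferred_phenotypes.gaf.gz` file (placeholders as in the list above) would summarise which source entries the inferred associations come from.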

Curated gene synonyms

“Gene synonyms are really useful, but is there any way to export them?”

In our release 16 we announced the import of literature-curated gene name synonyms for Strongyloides stercoralis. These synonyms are searchable (via the top-right search box) and appear in the new “Synonyms” line of the gene page. In this release we also made these synonyms exportable via WormBase ParaSite Biomart!

To export curated gene synonyms for S. stercoralis, first navigate to the WormBase ParaSite BioMart and submit your Query Filters. Then enable “Curated Gene Synonym ID” under “EXTERNAL DATABASE REFERENCES AND ID CONVERSION” in the Output Attributes, as shown here:

Then click Results and you will get a list of your selected genes and their curated synonyms:

We need your help on this!
At the moment, we have only imported External Curated Gene synonyms for S. stercoralis but we would love to import more synonyms for other species. For this reason we need your help! If you have (or someone you know has) curated gene synonyms for any of the genomes we are hosting in WormBase ParaSite please contact us ([email protected]) and we will be able to integrate them.

Community Annotation

“I would like to manually curate gene models on a genome I have previously submitted (or I am about to submit) to WormBase ParaSite.”

We supplement our in-house gene curation platform by hosting Web Apollo instances for an increasing number of genomes. Web Apollo is an instantaneous, collaborative genomic annotation editor available on the web.
Users can ask us to deploy Web Apollo instances for the genomes they work on. We would be happy to provide the relevant training! Please feel free to contact us ([email protected]) to make such a request.

AlphaFold added new 3D protein structures of parasitic worms

The latest AlphaFold database update, in January 2022, added three-dimensional (3D) structures of complete proteomes for 27 new organisms relevant to neglected tropical diseases and antimicrobial resistance, including 7 parasitic worms from WormBase ParaSite.

Determining the three-dimensional (3D) structure of a protein has been a computational challenge for decades, and can provide essential insights into the mechanisms underlying the protein’s function. AlphaFold is an AI system developed by DeepMind that makes state-of-the-art predictions of a protein’s structure from its amino-acid sequence. The AlphaFold database, created in partnership between DeepMind and the EMBL-European Bioinformatics Institute (EMBL-EBI), launched in July 2021 with ~350,000 3D structures covering the human proteome and those of 20 other biologically significant organisms such as C. elegans, E. coli, fruit fly, mouse, zebrafish, malaria parasite and tuberculosis bacteria.

AlphaFold’s latest release, announced on 28 January 2022, focused on organisms with a UniProt reference proteome that are relevant to neglected tropical diseases or antimicrobial resistance. The selection of 27 new species was based on priority lists compiled by the World Health Organisation and included 7 parasitic worms (Table 1). AlphaFold predicted these structures based on their UniProt reference proteomes, provided through WormBase ParaSite.

Species                      Predicted structures    Links
Brugia malayi                8,743                   WormBase ParaSite, AlphaFold
Dracunculus medinensis       10,834                  WormBase ParaSite, AlphaFold
Onchocerca volvulus          12,047                  WormBase ParaSite, AlphaFold
Schistosoma mansoni          13,865                  WormBase ParaSite, AlphaFold
Strongyloides stercoralis    12,613                  WormBase ParaSite, AlphaFold
Trichuris trichiura          9,564                   WormBase ParaSite, AlphaFold
Wuchereria bancrofti         12,721                  WormBase ParaSite, AlphaFold
Table 1. Structural predictions for complete proteomes of parasitic worms in AlphaFold DB v.2.1.2

As the majority of helminth proteins do not currently have data from direct protein characterisation studies, the prediction of their 3D structures will provide researchers with a powerful tool in predicting their mechanisms of function and their role within the cell. Scientists can also develop in silico screening assays against drugs that work with the protein’s unique shape.

Figure 1. AlphaFold predicted 3D structure of Schistosoma mansoni malate dehydrogenase. You can find it here.

The database is expected to grow further in 2022 and cover additional proteomes, as well as a much larger proportion of all proteins in UniProt (UniRef90).

If you cannot find the AlphaFold predicted structure for the protein of interest of your favourite worm, here are some suggestions:

  • Multiple isoforms are not covered in AlphaFold DB, so make sure you are using the most appropriate protein from the reference proteome of your species.
  • Try searching by protein or gene name rather than specific UniProt accession.
  • Check if your protein is in the reference proteome of one of the covered organisms or in Swiss-Prot.
  • Proteins with high sequence similarity will most likely have very similar 3D structure predictions. If you don’t see the sequence you are looking for, try searching for it using the EBI Protein Similarity Search tool against the sequences in the AlphaFold DB and/or using the WormBase ParaSite BLASTp tool against species which already have their proteins in the AlphaFold database. If the query sequence itself is not available, a structure prediction for a similar sequence may be.
  • Contact us! If your favourite species or protein is not available yet, keep watching for further announcements (EMBL-EBI news, EMBL-EBI Twitter, DeepMind Twitter), or let us know.

Undoubtedly, AlphaFold opens new research horizons, and we encourage our users to explore this ground-breaking dataset (https://alphafold.ebi.ac.uk/) by searching and testing the 3D protein models of their favourite worm. Over time, we plan to integrate this dataset more deeply into WormBase ParaSite, so that it will be easier to search, analyse and interpret.

We would love to hear the feedback of the helminth research community on the AlphaFold resource, the structure predictions, how you think WormBase ParaSite could facilitate your interaction with this unique dataset, or anything else. So please feel free to contact us ([email protected]).

About archive sites

We’re pleased to announce that we have introduced an archiving service.

From release 16 onwards, older WormBase ParaSite releases will remain available for browsing. We have introduced this service to help users in transitions between genome and annotation versions. Older, draft assemblies are increasingly being superseded by highly contiguous assemblies generated with modern sequencing technologies. We want to host this data where it is available, but recognise that direct replacement of older assembly versions can cause disruption to researchers.

In release 16, we have updated gene models for 12 assemblies and replaced one assembly, Clonorchis sinensis (PRJNA386618), with a more highly scaffolded version. All superseded data remains available on the release 15 archive site:

https://release-16.parasite.wormbase.org/index.html

Due to technical restrictions, the site has slightly reduced functionality compared with the live site. BLAST and VEP tools are not available. BioMart does remain available.

The search service is also not available. To navigate to a gene page, paste its ID directly into the URL. For example:

https://release-16.parasite.wormbase.org/Gene/Summary?g=maxplancki-mkr-S2-19.22-mRNA-1

Where “maxplancki-mkr-S2-19.22-mRNA-1” is a gene stable ID. For annotation updates, deprecated gene pages can also be accessed from the JBrowse genome browser on the live site:

In addition to the archive site, data from all previous releases remains available to download from our FTP site in perpetuity.
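Since the archive site has no search service, gene pages have to be reached by URL. The sketch below builds such a URL from a stable ID, following the example shown above. The helper name `archive_gene_url` is hypothetical, and the assumption that other archived releases follow the same `release-N` host pattern is ours; only the release 16 address appears in this post.

```python
from urllib.parse import urlencode

# Host pattern taken from the release 16 example; other release numbers
# are an assumption, not something confirmed here.
ARCHIVE_HOST_TEMPLATE = "https://release-{release}.parasite.wormbase.org"


def archive_gene_url(stable_id: str, release: int = 16) -> str:
    """Build the gene summary URL for a stable ID on an archive site."""
    if not stable_id:
        raise ValueError("stable_id must not be empty")
    host = ARCHIVE_HOST_TEMPLATE.format(release=release)
    # urlencode escapes any characters that are not safe in a query string.
    query = urlencode({"g": stable_id})
    return f"{host}/Gene/Summary?{query}"


if __name__ == "__main__":
    print(archive_gene_url("maxplancki-mkr-S2-19.22-mRNA-1"))
```

Encoding the ID with `urlencode` leaves typical stable IDs unchanged while still producing a valid URL for IDs that contain unusual characters.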