Four million people in Northern England rely on Yorkshire Water for clean water and waste treatment, including the processing of 150,000 tons of sewage sludge each year. Part of that processing is done by microorganisms: through anaerobic digestion, they reclaim biosolids and convert them into renewable energy. Professor James Chong, a Royal Society Industry Fellow and microbiologist at the University of York, studies those microorganisms to understand how to make the process more efficient and reduce the greenhouse gases that harm the environment. Working with Yorkshire Water, Chong’s group collected sixty gigabases (sixty billion base pairs) of microbial DNA sequence. To analyze the data on high-performance computing (HPC) clusters, Chong turned to his colleagues Dr. John Davey, a bioinformatician in the York Bioscience Technology Facility (BTF), and Dr. Peter Ashton, Head of the Genomics and Bioinformatics Laboratory in the BTF.
Using Oxford Nanopore sequencing, Ashton and his lab can produce “long reads,” each spanning tens or hundreds of thousands of DNA base pairs. Davey then runs software that assembles the reads by joining overlapping pieces of sequence. “We expect to find hundreds of different genomes in a digester sample, but older sequencing technology which produces very short reads hundreds of base pairs long typically produces assemblies with hundreds of thousands of pieces,” he explains. “With long reads, we typically get assemblies in thousands of pieces, making it much easier to identify the species in the digesters.” But those long reads generate huge datasets with heavy computational demands, particularly for disk space. So the team turned to Cloud Technology Solutions (CTS), a UK-based Google Cloud Premier Partner that offers cloud migration, transformation, Big Data and support services, to pilot their workflow on Google Compute Engine virtual machines (VMs). CTS’s collaboration with Google Cloud and GÉANT enables it to offer unique services to the European research and education community.