In his Nobel Prize acceptance speech in 1980, genomics pioneer Fred Sanger described science as “a voyage of discovery into unknown lands, seeking not for new territory but for new knowledge.” It’s fair to say the renowned chemist’s journey was more eventful than most.

First awarded a Nobel in 1958 for his work on identifying protein structure, Sanger repeated the trick 22 years later, picking up the prize for developing the first DNA sequencing technique, a key breakthrough in advancing the field of genomics. He remains one of the few scientists to win the Nobel Prize twice in the same category, and his technique, modestly entitled Sanger Sequencing, is still used in many labs 45 years later.
Another of Sanger’s enduring legacies is the research institute that bears his name. Based at the Wellcome Genome Campus, just a few miles from the Cambridge University labs where Sanger did much of his groundbreaking work, the Wellcome Sanger Institute undertakes genomics and genetics research projects and is one of the largest independent research institutions in Europe. Established in 1992, it was part of the Human Genome Project, the 13-year international effort to map the first human genome that concluded successfully in 2000. Twenty-five years later, scientists at the Sanger Institute can perform the same task that took more than a decade to complete in less than 12 minutes.
Such a dramatic scientific leap has come with vast amounts of accompanying data that needs to be stored, processed, and analyzed, so the institute has been building out its digital infrastructure at a rapid pace to cope with advances in sequencing.
Not only has the Sanger Institute doubled the number of GPUs available to its researchers, but it has managed to do so while cutting the energy used in its data center by a third. DCD took a trip to the Genome Campus to find out more.
Gene-ius
The increased volume of work being undertaken at the campus has exceeded the expectations of Simon Binley. As the Sanger’s data center operations manager, Binley oversees the institute’s IT infrastructure, and says the last five years have been a whirlwind.
“We predicted overall growth, but the sheer amount of sequencing that we’ve undertaken since 2019 is staggering,” Binley says. “We have to keep up with that, and it has been challenging. We’ve deployed more software, more storage hardware, and more compute hardware to meet that demand.”
That period encompassed the Covid-19 pandemic, during which the Sanger Institute was the hub for mapping variants of the coronavirus, sequencing tens of thousands of samples from patients across the UK and beyond.

While the work being undertaken at the institution now is somewhat more routine, it does require a lot of compute power and a flexible IT estate. Sanger is constantly investing in new sequencing machines from a variety of vendors, including Illumina, PacBio, and Ultima Genomics, and these all have their own specific software requirements, Binley says.
“There’s been a synergy, as the sequencing technology has advanced at almost an identical rate to the IT technology,” he says. “That’s been good from our perspective, because if sequencing had outpaced IT, we would have been in trouble.
“When it comes to sequencing, we’re technology-agnostic, and many of the groundbreaking machines that are coming through are using newer software, so we have to incorporate these new elements and create a pipeline to support that scientific activity. We’ve invested heavily to keep up with demand.”
With the campus’s sequencing machines producing up to 4TB of data a day, Binley and his team are kept busy. They have a 4.5MW data center on-site, with more than 400 racks across four data halls, utilising both air and liquid cooling systems. With artificial intelligence playing an increasingly important role in life science, the campus has joined the rush to secure the latest hardware, and has increased the computational power available to its researchers by 50 percent since the pandemic.
Its shiny new toys include an Nvidia DGX system, which the institute is using alongside the vendor’s Parabricks software for genomics research. An Nvidia blog last year detailed how Sanger scientists deployed the DGX system to carry out a specific type of analysis using the institute’s proprietary CaVEMen workflow, which is designed to detect mutations in cancer genomes. By using DGX, the researchers were apparently able to reduce runtime by 1.6x, costs by 24x, and energy consumption by up to 42x, compared to running the process on the standard set-up of 128 dual-socket CPU servers.
Efficiency gains
Such efficiency gains are vital in a research environment, where funds are often limited. As well as contributing to the data center’s overall sustainability, keeping costs low allows more money to be funnelled to the science that is at the core of the Sanger Institute’s mission.
Binley and his team have been working with EfficiencyIT, a specialist in data center design, to ensure the institute’s data centers work as effectively as possible. The companies claim that installing Schneider Electric’s EcoStruxure IT Advisor on-premises data center infrastructure management (DCIM) software, along with more than 300 custom-designed APC rack power distribution unit (PDU) systems, has reduced the data center’s energy usage by 33 percent.
A host of sensors were installed across the data center to gather environmental and mechanical information, which was then analyzed using machine learning-based data analytics systems to help remove stranded capacity and manage operating expenses.
Nick Ewing, managing director of EfficiencyIT, says: “Simon needed more information about the data center, so we designed a custom PDU with Schneider that we could fit without any downtime. We then have physical sensors on the cooling units and racks so we understand the core metrics around the cooling environment, and we have full electrical metering and monitoring on all the electrical subsystems and PDUs.”
On top of this physical infrastructure, a series of “virtual sensors” has been deployed using software, meaning the Institute’s IT team “can create virtual power groups for different systems,” Ewing says.
These virtual power groups have been central to the efficiency effort, he adds. Referring back to the variety of sequencing machines deployed at the campus, Ewing explains: “A lot of the researchers at the Sanger Institute are making decisions about which hardware platforms to use, so if we have two platforms that are doing the same types of sequencing, we can monitor those and see which platform performs most efficiently.
“That’s where the institution is probably seeing most of its energy savings – choosing the technology platforms that support the science as efficiently as possible.”
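The comparison Ewing describes can be illustrated with a minimal Python sketch. It is not the EcoStruxure API or the institute's actual tooling: the PDU names, group names, energy readings, and sample counts are all hypothetical. It sums metered PDU energy into virtual power groups, then divides by the work completed to see which of two platforms doing the same type of sequencing uses less energy per sample.

```python
from collections import defaultdict

# Hypothetical mapping of metered PDUs to "virtual power groups".
PDU_TO_GROUP = {
    "pdu-a1": "platform_a",
    "pdu-a2": "platform_a",
    "pdu-b1": "platform_b",
}


def energy_by_group(readings_kwh, pdu_to_group):
    """Sum per-PDU energy readings (kWh) into their virtual power groups."""
    totals = defaultdict(float)
    for pdu, kwh in readings_kwh.items():
        totals[pdu_to_group[pdu]] += kwh
    return dict(totals)


def kwh_per_sample(group_kwh, samples_done):
    """Energy used per completed sample for each group."""
    return {group: group_kwh[group] / samples_done[group] for group in group_kwh}


def most_efficient(per_sample):
    """Return the group with the lowest energy per sample."""
    return min(per_sample, key=per_sample.get)


# Made-up figures for one monitoring period (assumptions, not Sanger data).
readings = {"pdu-a1": 120.0, "pdu-a2": 80.0, "pdu-b1": 150.0}
samples = {"platform_a": 400, "platform_b": 500}

totals = energy_by_group(readings, PDU_TO_GROUP)
per_sample = kwh_per_sample(totals, samples)
best = most_efficient(per_sample)
print(totals, per_sample, best)
```

Grouping by platform rather than by physical rack is the point of the exercise: a platform's servers may be spread across several PDUs, and the comparison only makes sense once their energy is added together.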
Mapping the future
The virtual sensors, managed via the EcoStruxure DCIM, are also playing an important role as the data center grows, Ewing says.
“They help with capacity planning as more infrastructure is moved into the environments,” he explains. “It helps figure out where there is spare network or power capacity, and whether infrastructure will fit within the physical boundaries of the data center.”
The sensitivity of some sequencing equipment means it has to be located in close proximity to the servers processing its data, Ewing adds. “We’re talking 50-150 meters away, so space planning and capacity are vitally important,” he says.
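A capacity-planning check of this kind can be sketched in a few lines of Python. This is a hedged illustration, not a description of how the DCIM software works internally: the `Rack` fields, rack names, and all figures are hypothetical, and only the 150-meter upper bound comes from Ewing's quote.

```python
from dataclasses import dataclass


@dataclass
class Rack:
    name: str
    power_capacity_kw: float
    power_used_kw: float
    free_units: int            # free rack units (U)
    distance_to_lab_m: float   # cable distance to the sequencing lab

    @property
    def spare_kw(self) -> float:
        return self.power_capacity_kw - self.power_used_kw


def candidate_racks(racks, needed_kw, needed_units, max_distance_m=150.0):
    """Return racks with enough spare power and space within range,
    ordered by spare power (most headroom first)."""
    fits = [
        r for r in racks
        if r.spare_kw >= needed_kw
        and r.free_units >= needed_units
        and r.distance_to_lab_m <= max_distance_m
    ]
    return sorted(fits, key=lambda r: r.spare_kw, reverse=True)


# Made-up inventory (assumptions for illustration only).
racks = [
    Rack("hall1-r01", 10.0, 9.0, 12, 60.0),    # too little spare power
    Rack("hall2-r07", 10.0, 4.0, 8, 120.0),    # fits
    Rack("hall4-r03", 10.0, 2.0, 20, 300.0),   # too far from the lab
    Rack("hall2-r09", 10.0, 6.5, 4, 90.0),     # fits, with less headroom
]

names = [r.name for r in candidate_racks(racks, needed_kw=3.0, needed_units=4)]
print(names)
```

A rack has to pass all three checks at once, which is why spare power alone is not enough: the rack with the most headroom in this made-up inventory is ruled out by distance.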

This requirement for compute to be in the vicinity of the lab is one reason why the Sanger Institute has no plans to embrace the cloud, Binley explains.
“Proximity is critical,” he says. “We’re offloading all that information into the data center for storage and rapid primary analysis. The cloud has had a significant impact on the world of IT, but it isn’t for us, not least because we would need 124 petabytes of storage – can you imagine how much that would cost? We’d constantly be putting data in and taking it out, so the ingress and egress charges would be significant.”
Instead of running up the world’s largest AWS bill, the Sanger Institute intends to expand its on-premises data center infrastructure as and when it proves necessary.
“The campus has 11MW of power to it, and we have 4.5MW available for the data center, which we’re not fully using yet,” Binley says. “That’s a massive asset as we look to expand and grow, and we expect to do that again over the next few years.”
Binley’s cause could be helped by plans to expand the Genome Campus. Permission has been granted for the first phase, which will see two new buildings constructed. It is hoped the site will eventually grow to support 9,000 jobs, rather than the 3,000 currently connected to its activities. More than 1,500 new homes are also being built in the area, as well as a new electricity substation.
“They’re talking about bringing in another 33MVA across the road, which is a significant amount of additional power,” Binley says. “As AI takes hold, we anticipate a greater need for power. I would say it could double, or even treble over the next five to ten years, though I’ve learned with science it’s difficult to predict what will happen.”
That brings us nicely back to Fred Sanger’s speech at the Nobel Banquet in 1980, where he added that science should “appeal to those with a good sense of adventure.” Certainly, it seems that, when it comes to AI, the Sanger Institute’s adventures are only just beginning.
This feature first appeared in the DCD Life Sciences supplement.
Read the original article: https://www.datacenterdynamics.com/en/analysis/the-wellcome-sanger-institute-data-in-its-dna/















