Faculty compile FDA database to track infectious disease mutations

Media Credit: Anthony Peltier | Staff Photographer

Raja Mazumder, a professor of biochemistry and molecular medicine, said the research team has been creating these datasets for nearly 15 years and is now ready to implement them in the public database.

A team of faculty members is compiling sequences of genes from infectious diseases into a database for the Food and Drug Administration to help researchers detect new mutations of viruses and create vaccines and tests.

Researchers from GW, Temple University and Embleema, a medical technology company that specializes in data management and analysis, received a $2 million contract in September to input sequences of viruses into a database that will annotate characteristics like their size, cleanliness and mutations. The team’s researchers said scientists can reference the database to match sequences they are studying with variants of viruses like COVID-19 to quickly detect new mutations and keep vaccines and treatments up to date.

Raja Mazumder, a professor of biochemistry and molecular medicine and the project team’s lead, said the team will develop standards for the database, like the minimum size and level of cleanliness of a genetic sequence. He said the team will use those standards to review the quality of the bacterial and viral sequences that researchers and health care practitioners submit to the database.

“This project is geared mostly towards detection like which portion of the virus is circulating in a certain population, so you’ll get the sample, sequence it and then you’ll use this resource to find out what is it,” Mazumder said.

Mazumder said his team will then use these standards to annotate the virus samples that researchers submit, delineating the changes occurring in their genetic makeup and their differences from other viruses and bacteria.

He said the annotations will also indicate when and where each sample was collected, allowing researchers to track diseases. He said researchers can analyze the data to find connections to other outbreaks, like those of Salmonella or the coronavirus.

“You have to have a reference database of pathogen genome that are correctly annotated and curated and so on so that once you have the sequences and you can use some piece of software to map it to what is known, so then you say ‘Oh look this is a SARS-CoV-2 strain’ or ‘This is a Salmonella strain that was actually found earlier in other outbreaks three months ago,’” Mazumder said.

He said the research team has been working to create these datasets and analyze the quality of submissions for nearly 15 years, and it is now ready to implement them in the public database. He said test manufacturers can now use the data to develop new diagnostic tests for viruses and bacteria.

“Our task is to create those reference data sets that people who developed diagnostic devices can use to test their device, but at the same time, FDA can use to check and see the submissions that are coming in,” Mazumder said.

Keith Crandall, a member of the team and the director of the Computational Biology Institute in the Milken Institute School of Public Health, said the team’s work will mainly focus on the coronavirus, HIV, influenza and Salmonella to diversify the database with viruses and bacteria.

“We’re trying to make sure that we have data that includes both viruses and bacteria because of the differences in complexity of those genomes and size of those genomes to be sure we’re building tools in our database to accommodate a wide variety of sizes of genomic data,” Crandall said.

Crandall said similar databases will allow public health officials and the FDA to quickly analyze new viruses.

“If you have a new virus that all of a sudden starts killing people, the first thing you want to do is figure out what that is and to identify whether it’s something you’ve already seen before or whether it’s new and what it’s close to,” Crandall said.

Robert Chu, the CEO of Embleema, said his company works on the technical elements of the project like the algorithms and software of the database. He said his team relies on GW researchers to analyze the biological side of the project, like the genetic sequences in the viruses and the bacteria that Embleema will put into the database.

“We understand bioinformatics when it’s data, but we don’t understand the biology of all these viruses and bacteria,” Chu said. “This is where George Washington University is so powerful, so that’s why it’s such a good combination to go in front of the FDA and a successful one too.”

Chu said scientists and researchers can use the database to quickly search for specific viruses, specific mutations or even a particular outbreak once the team releases the database publicly in summer 2022.

He said the group will initially focus on the coronavirus to track the virus’ mutations, like the Delta variant. He said the database – called FDA-ARGOS, Food and Drug Administration’s Database for Reference Grade Microbial Sequences – will monitor variants that affect the efficacy of vaccines.

“It’s very important to keep track of all those variants, all those mutations, variants, the same thing, and what ARGOS does is precisely keep tab of every variant that appears in the world, assess the impacts of the variants on vaccines,” Chu said.

Chu said Embleema joined the partnership with GW and Temple after the two universities had submitted the contract application early this year. He said the University has expertise in microbiology and genetics that sets the team apart from others that were also competing for the contract.

“When it comes to molecular biology, the ability to understand the biology behind the genomic sequencing and genomics, I think GW certainly has an edge, at least in our opinion,” Chu said.

Sarah Hendrick contributed reporting.
