Researchers at the University of Glasgow in the UK noted that most emerging infectious diseases of humans, such as COVID-19, are zoonotic, meaning they are caused by viruses originating from other animal species.
Identifying high-risk viruses earlier can improve research and surveillance priorities.
However, identifying zoonotic diseases prior to emergence is a major challenge because only a small minority of the estimated 1.67 million animal viruses are able to infect humans.
To develop machine learning models using viral genome sequences, the researchers first compiled a dataset of 861 virus species from 36 families.
Machine learning is the study of computer algorithms that can improve automatically through experience.
The researchers applied the best-performing model to analyse patterns in the predicted zoonotic potential of additional virus genomes sampled from a range of species.
The study, published in the journal PLOS Biology, found that viral genomes may have generalisable features that are independent of virus taxonomic relationships and may preadapt viruses to infect humans.
The researchers were able to develop machine learning models capable of identifying candidate zoonoses using viral genomes.
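As a rough illustration of what scoring viruses from genome sequence alone could look like, the minimal Python sketch below turns a genome string into numeric features and ranks genomes by a score. It is not the researchers' method: the article does not say which features or model they used, so dinucleotide frequencies are an assumed feature choice, and the names `dinucleotide_frequencies`, `rank_by_score` and the `score` callable are hypothetical stand-ins, with `score` taking the place of a trained model.

```python
from collections import Counter
from itertools import product

# All 16 possible dinucleotides over the DNA alphabet, in a fixed order.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_frequencies(genome: str) -> list[float]:
    """Return the relative frequency of each dinucleotide in a genome string."""
    seq = genome.upper()
    # Count overlapping pairs, skipping any pair with an ambiguous base (e.g. N).
    counts = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in "ACGT" and seq[i + 1] in "ACGT"
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


def rank_by_score(genomes: dict[str, str], score) -> list[tuple[str, float]]:
    """Rank named genomes from highest to lowest score.

    `score` is a hypothetical stand-in for a trained model: any callable
    that maps a feature vector to a number.
    """
    scored = [
        (name, score(dinucleotide_frequencies(seq)))
        for name, seq in genomes.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Placeholder score with no biological meaning: the frequency of the "CG" pair.
def toy_score(features: list[float]) -> float:
    return features[DINUCLEOTIDES.index("CG")]


ranking = rank_by_score({"virus_a": "CGCG", "virus_b": "AAAA"}, toy_score)
```

In a real workflow, `toy_score` would be replaced by a model trained on labelled virus genomes, and, as the researchers stress, any high-ranking virus would still need confirmatory laboratory testing.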
The researchers noted that these models have limitations, as computer models are only a preliminary step in identifying zoonotic viruses with the potential to infect humans.
Viruses flagged by the models will require confirmatory laboratory testing before major additional research investments are pursued, they said.
While these models predict whether viruses might be able to infect humans, the ability to infect is just one part of broader zoonotic risk, according to the researchers.
This risk is also influenced by the ability of the virus to transmit between humans, and the ecological conditions at the time of human exposure, they said.
“Our findings show that the zoonotic potential of viruses can be inferred to a surprisingly large extent from their genome sequence,” the authors of the study noted.
“By highlighting viruses with the greatest potential to become zoonotic, genome-based ranking allows further ecological and virological characterisation to be targeted more effectively,” they added.
Simon Babayan from the University of Glasgow noted that a genomic sequence is typically the first, and often the only, information available on newly discovered viruses.
“The more information we can extract from it, the sooner we might identify the virus’ origins and the zoonotic risk it may pose,” Babayan said.
“As more viruses are characterised, the more effective our machine learning models will become at identifying the rare viruses that ought to be closely monitored and prioritised for preemptive vaccine development,” he added.