Scientists have used artificial intelligence (AI) to detect a new class of mutations behind autism spectrum disorder. Many mutations in DNA that contribute to disease are not in actual genes but instead lie in the 99 per cent of the genome once considered "junk."
Even though scientists have recently come to understand that these vast stretches of DNA do in fact play critical roles, deciphering these effects on a wide scale has been impossible until now.
Using AI, a research team led by Princeton University in the US has decoded the functional impact of such mutations in people with autism.
The researchers believe this powerful method is generally applicable to discovering such genetic contributions to any disease.
In the study, published in the journal Nature Genetics, the researchers analysed the genomes of 1,790 families in which one child has autism spectrum disorder but other members do not.
The method sorted through 120,000 mutations to find those that affect the behaviour of genes in people with autism. Although the results do not identify the exact causes of autism cases, they point to thousands of possible contributors for researchers to study.
Much previous research has focused on identifying mutations in genes themselves. Genes are essentially instructions for making the many proteins that build and control the body. Mutations in genes result in mutated proteins whose functions are disrupted.
Other types of mutations, however, disrupt how genes are regulated. Mutations in these areas affect not what genes make but when and how much they make.
Until now, it was not possible to look across the entire genome for snippets of DNA that regulate genes and to predict how mutations in this regulatory DNA are likely to contribute to complex disease, the researchers said.
The study is the first proof that mutations in regulatory DNA can cause a complex disease.
"This method provides a framework for doing this analysis with any disease," said Olga Troyanskaya, a professor and senior author of the study. The approach could be particularly helpful for neurological disorders, cancer, heart disease and many other conditions that have eluded efforts to identify genetic causes.
"This transforms the way we need to think about the possible causes of those diseases," said Troyanskaya.
Most previous research on the genetic basis of disease has focused on the 20,000 known genes and the surrounding sections of DNA that regulate those genes. However, even this enormous amount of genetic information makes up only slightly more than one per cent of the 3.2 billion chemical pairs in the human genome.
The other 99 per cent has conventionally been thought of as "dark" or "junk," although recent research has begun to disrupt that idea.
The research team offers a method to make sense of this vast array of genomic data.
The system uses an AI technique called deep learning in which an algorithm performs successive layers of analysis to learn about patterns that would otherwise be impossible to discern.
In this case, the algorithm teaches itself how to identify biologically relevant sections of DNA and predicts whether those snippets play a role in any of more than 2,000 protein interactions that are known to affect the regulation of genes.
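Before a deep learning model can analyse DNA, the sequence has to be turned into numbers. The article does not say how the researchers did this; a common convention, assumed here purely for illustration, is "one-hot" encoding, in which each of the four DNA letters becomes a row with a single 1. The function name and the treatment of unknown letters are hypothetical choices, not details from the study.

```python
# Illustrative sketch only: one-hot encoding is a common convention for
# feeding DNA to neural networks, assumed here rather than taken from the study.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

# Hypothetical choice: any letter other than A, C, G or T maps to all zeros.
UNKNOWN = (0, 0, 0, 0)


def one_hot_encode(seq):
    """Encode a DNA string as a list of 4-element rows, one row per letter."""
    return [ONE_HOT.get(base, UNKNOWN) for base in seq.upper()]


if __name__ == "__main__":
    for base, row in zip("ACGTN", one_hot_encode("ACGTN")):
        print(base, row)
```

A model of the kind described would take a grid like this as input and return one score per regulatory feature, which in the study's case means more than 2,000 outputs.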
The system also predicts whether disrupting a single pair of DNA units would have a substantial effect on those protein interactions.
The algorithm "slides along the genome" analysing every single chemical pair in the context of the 1,000 chemical pairs around it, until it has scanned all mutations, Troyanskaya said. The system can thus predict the effect of mutating each and every chemical unit in the entire genome.
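The sliding scan can be pictured with a minimal sketch. It visits each position, takes the window of letters around it, and compares a score for the original window with the score after swapping the central letter. The scoring function below is a toy stand-in that simply counts a short motif; it is not the trained model, and the names `window_around`, `toy_score` and `mutation_effects` are hypothetical. The study describes a context of 1,000 chemical pairs, while the example uses a tiny window so it is easy to follow.

```python
BASES = "ACGT"


def window_around(seq, pos, flank):
    """Return the letters within `flank` of `pos`, padded with 'N' past the ends."""
    left = max(0, pos - flank)
    right = min(len(seq), pos + flank + 1)
    pad_left = "N" * (flank - (pos - left))
    pad_right = "N" * (flank - (right - 1 - pos))
    return pad_left + seq[left:right] + pad_right


def toy_score(window):
    """Toy stand-in for a trained model (an assumption): count 'TATA' motifs."""
    return float(sum(1 for i in range(len(window) - 3) if window[i:i + 4] == "TATA"))


def mutation_effects(seq, flank, score=toy_score):
    """Map (position, new letter) to the change in score caused by that swap."""
    effects = {}
    for pos, ref in enumerate(seq):
        ref_window = window_around(seq, pos, flank)
        ref_score = score(ref_window)
        for alt in BASES:
            if alt == ref:
                continue
            # The position being mutated always sits at index `flank` of its window.
            alt_window = ref_window[:flank] + alt + ref_window[flank + 1:]
            effects[(pos, alt)] = score(alt_window) - ref_score
    return effects


if __name__ == "__main__":
    for (pos, alt), delta in mutation_effects("GGTATAGG", flank=3).items():
        if delta != 0:
            print(pos, alt, delta)
```

Swapping in a real predictive model for `toy_score` would leave the scanning loop unchanged, which is why the same framework can in principle be reused across diseases.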
The result is a prioritised list of DNA sequences that are likely to regulate genes and of mutations that are likely to interfere with that regulation.
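One simple way to build such a prioritised list, sketched here as an assumption rather than as the researchers' actual procedure, is to rank mutations by the size of their predicted effect, whether it raises or lowers the score. The mutation labels and numbers below are invented purely for illustration, and `prioritise` is a hypothetical helper.

```python
def prioritise(effects, top_n):
    """Return the `top_n` mutations with the largest absolute predicted effect."""
    ranked = sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Invented example scores: (position, new letter) -> predicted change.
    example_effects = {
        (101, "A"): -0.82,
        (257, "G"): 0.05,
        (300, "T"): 0.47,
        (412, "C"): -0.10,
    }
    for mutation, delta in prioritise(example_effects, top_n=2):
        print(mutation, delta)
```

Ranking by absolute value treats mutations that strengthen and weaken a regulatory signal as equally worth a closer look; a real analysis might weigh them differently.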