Bioinformatic Tool Finds 71% of Batten-causing Mutations in Key Protein Regions
An analysis using Aminode, a new bioinformatic tool developed to help researchers predict the potential effects of genetic mutations, found that most disease-causing mutations, including those behind Batten disease, occur in protein regions essential to normal structure and function.
In fact, nearly three-quarters of the mutations known to cause Batten disease occur in these regions, an early analysis with the tool found.
Aminode was developed by a research team at the Jan and Dan Duncan Neurological Research Institute (NRI) at Texas Children’s Hospital to identify evolutionarily constrained regions (ECRs) in proteins. These are stretches of protein sequence that have remained largely unchanged across species over the course of evolution.
The tool is described in the study “Aminode: Identification of Evolutionary Constraints in the Human Proteome,” published in the journal Scientific Reports.
Proteins, like other biological components, are shaped by evolutionary pressure. The parts of a protein that are critical to its role, whether interacting with other molecules, moving within a cell, or helping to drive a chemical reaction, are under particularly strong constraint to remain unchanged, because changes there are likely to impair the protein’s function.
ECRs, the regions that remain unchanged, can therefore help researchers understand which parts of a protein are most critical to its activity, and evaluating them can help predict how a mutation will affect the protein’s structure and function. However, this kind of analysis is not commonly used, because it is time-consuming and requires specialized bioinformatics skills.
Aminode, “a user-friendly webtool for the routine and rapid inference of ECRs,” was developed to facilitate ECR analysis.
Aminode holds sequence data on all human proteins, as well as on the corresponding proteins of 62 other vertebrate species. Comparing these sequences yields a database of ECR profiles for human proteins, showing the relative rate at which amino acids at each position were substituted during evolution.
To validate the tool, the researchers examined where previously reported human genetic variants fall relative to ECRs, comparing variants known to cause disease with variants known to be harmless. They found that 67% of disease-causing mutations occurred within ECRs, compared with only 41% of harmless variants. In other words, mutations that cause problems tend to cluster in the regions most important for the protein’s function.
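The comparison behind these percentages is a simple proportion: for each group of variants, count how many fall inside an ECR and divide by the size of the group. The Python sketch that follows illustrates only that arithmetic, using invented ECR intervals and invented variant positions; the names `ECRS`, `in_ecr`, and `fraction_in_ecrs` are hypothetical, and the code does not reproduce how Aminode defines ECRs or the study’s actual data.

```python
from collections.abc import Iterable, Sequence

# Hypothetical ECRs for a single protein, as 1-based inclusive
# (start, end) residue positions. Invented for illustration; not Aminode output.
ECRS: list[tuple[int, int]] = [(10, 45), (120, 180), (300, 330)]


def in_ecr(position: int, ecrs: Sequence[tuple[int, int]]) -> bool:
    """Return True if a residue position lies inside any ECR interval."""
    return any(start <= position <= end for start, end in ecrs)


def fraction_in_ecrs(positions: Iterable[int], ecrs: Sequence[tuple[int, int]]) -> float:
    """Share of variant positions that fall within an ECR."""
    positions = list(positions)
    if not positions:
        raise ValueError("no variant positions given")
    hits = sum(in_ecr(p, ecrs) for p in positions)  # True counts as 1
    return hits / len(positions)


# Hypothetical residue positions of disease-causing and harmless variants.
pathogenic = [12, 30, 44, 125, 150, 179, 305, 200, 250]
benign = [5, 50, 90, 130, 210, 260, 310, 400]

if __name__ == "__main__":
    print(f"disease-causing variants in ECRs: {fraction_in_ecrs(pathogenic, ECRS):.0%}")
    print(f"harmless variants in ECRs: {fraction_in_ecrs(benign, ECRS):.0%}")
```

With these made-up numbers, 7 of 9 disease-causing variants (78%) but only 2 of 8 harmless variants (25%) land in ECRs, the same kind of gap the researchers reported with real data.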
The researchers found a similar pattern when they analyzed genetic variants of transcription factor EB (TFEB), an important regulator of molecular pathways associated with Batten disease. In this analysis, 71% of known Batten-associated variants fell within ECRs, which overlapped with essential functional structures of the protein, compared with 21% of variants not associated with disease.
The researchers concluded that Aminode and its ECR analysis “may help evaluate the potential pathogenicity [harmful potential] of variants of unknown significance,” meaning genetic variants whose effect on health has not yet been established.