Proteins play an important role in all life processes. From catalyzing reactions to protecting the body to supporting cell structure, proteins perform a wide variety of functions, each determined by the specific protein’s structure. Naturally occurring proteins have evolved to suit their specific functions in each organism. Synthetically designed proteins, however, have the potential to help solve many of the global problems we face today; for example, engineered bacteria could make enzymes that decompose plastics and reduce landfill waste, or produce designer proteins that harvest energy from sunlight for clean energy.
Direct experimental methods can create new proteins with the desired activities, but they are expensive and labor-intensive. Another strategy is to employ computer simulations, which have the potential to greatly streamline the process and reduce costs. However, despite a number of successes, computational protein design software still frequently makes inaccurate predictions of protein structure and interactions.
To solve this problem, two Yale groups are combining their expertise in an interdisciplinary effort led by Corey O’Hern, a physicist and materials scientist who is Associate Professor of Mechanical Engineering & Materials Science, and Lynne Regan, a biochemist and biophysicist who is Professor of Molecular Biophysics & Biochemistry and Chemistry.
In a paper published in Protein Engineering, Design, and Selection in July 2016, the team described a new computational model that helps solve the “repacking” problem, allowing them to accurately predict how each amino acid side chain fits into the core of a protein. Amino acids are the fundamental building blocks of proteins, so understanding how they are positioned within proteins is crucial to understanding protein structure. O’Hern explains, “It may sound trivial, but it is not because you have to try all side chain conformations to determine which one will fit. Our simple model performed as well as the state of the art software in repacking amino acid side chains.”
Other approaches include all possible energetic contributions to protein structure, such as steric interactions, electrostatic effects, van der Waals attractions, and hydrogen bonding. In contrast, the O’Hern and Regan team took a somewhat unconventional approach, modeling proteins by considering only steric interactions, the repulsive forces that prevent atoms from overlapping. In their approach, the amino acids are modeled as 3D puzzle pieces that are arranged to fit into the protein core without overlaps. The model can accurately predict how each amino acid must be positioned to best fit into the core, much as pieces in the 1980s video game Tetris must be in certain orientations to fit tightly together without overlapping.
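The idea of trying side chain conformations and keeping those that avoid overlaps can be illustrated with a minimal sketch. This is not the team’s model: the function names, the quadratic overlap penalty, the sphere radii, and the toy geometry (a single atom swinging in a circle about one rotation angle) are all assumptions chosen for illustration.

```python
import math

def overlap_energy(atom, radius, neighbors):
    """Purely repulsive energy: zero unless two spheres overlap.

    The quadratic penalty is an illustrative assumption, not the
    functional form used in the published model.
    """
    energy = 0.0
    for position, neighbor_radius in neighbors:
        distance = math.dist(atom, position)
        contact = radius + neighbor_radius
        if distance < contact:
            energy += (1.0 - distance / contact) ** 2
    return energy

def place_atom(chi_deg, bond_length=1.5):
    """Toy geometry: the side chain atom moves on a circle as chi rotates."""
    chi = math.radians(chi_deg)
    return (bond_length * math.cos(chi), bond_length * math.sin(chi), 0.0)

def allowed_angles(neighbors, radius=1.0, step=10):
    """Scan every rotation angle and keep those with no steric overlap."""
    return [
        chi
        for chi in range(0, 360, step)
        if overlap_energy(place_atom(chi), radius, neighbors) == 0.0
    ]

# Hypothetical neighboring atoms in the protein core: (position, radius).
core_neighbors = [
    ((2.5, 0.0, 0.0), 1.0),
    ((0.0, 2.5, 0.0), 1.0),
    ((-2.5, 0.0, 0.0), 1.0),
]

if __name__ == "__main__":
    print("Overlap-free angles:", allowed_angles(core_neighbors))
```

With three neighbors crowding one side of the circle, only a narrow window of angles on the open side remains overlap-free, which is the sense in which packing alone can pin down a side chain’s position.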
Regan explains, “Our intention was to determine how far we could go in protein structure prediction using the simplest model and only add in additional factors when the simplest model can no longer predict the experimentally observed data. That was our idea: a bottom-up approach rather than throwing everything in at the beginning.” Regan adds, “Surprisingly, we found that our model performs extremely well simply by avoiding steric overlaps. We didn’t need to explicitly put in any attraction or hydrogen bonding [or other factors].”
The team discovered that their simple model worked well on many more amino acids than they anticipated. Even so, they were able to identify its limits and simultaneously learn much about the dominant forces that determine protein structure. This point is well illustrated by comparing threonine and serine, the two amino acids that contain a hydroxyl functional group, which are typically considered similar in biochemistry textbooks. Although the position of the threonine side chain can be predicted by steric interactions alone, hydrogen bonding must be included to correctly position the serine side chain. O’Hern and Regan propose that this is because the steric interactions of the additional methyl group on threonine are dominant.
The team has already expanded its original studies to successfully repack multiple amino acid side chains simultaneously, and is working on calculating the energetic cost of mutating amino acids in protein cores and at interfaces. The O’Hern and Regan team is poised to apply its novel approach and combined expertise to design proteins for sustainability, biomedical, and pharmaceutical applications.

