Drugging The Undruggable

Art by Dahlia Kordit

Drug discovery is challenging. Pharmacologists often start by examining a protein that causes a disease. For the deepest insights, scientists often must arduously solve the protein’s structure, using the predictable diffraction of X-rays by a pure crystal of the protein to calculate the positions of its atoms. The process is laborious, as dozens of conditions must be fine-tuned to grow a good crystal of the pure protein. When all goes right, by observing the new structure and drawing on chemical expertise, pharmacologists design compounds that fit into and interact with the binding pocket of the target protein.

Unfortunately, this approach yields little insight into proteins that lack ordered, consistent structures. For these targets, the calculations break down and the three-dimensional structure solution gets messy. Many pharmacologists abandon such targets and look instead at upstream or downstream reactions. “Most diseases are caused by disordered proteins—not something that a pharma company would easily go after because they are large, disordered, and have no binding pockets,” said Pranam Chatterjee, assistant professor of biomedical engineering and computer science at Duke University.

The Chatterjee Lab aims to tackle this problem with artificial intelligence. In a study published in Science Advances, they introduced a new drug discovery pipeline, Peptide Prioritization via Contrastive Language-Image Pretraining (PepPrCLIP). Given only the amino acid sequence of a target protein, PepPrCLIP predicts which peptides, or short proteins, will bind it effectively. Pharmacologists can then modify a winning peptide so that, once bound to the disease-causing protein, it activates the cell’s innate targeted protein degradation pathway.

How can peptides be designed from a protein sequence alone, without the structural information indicating which parts of the protein are exposed and likely to bind? The group’s idea was to randomly generate naturalistic peptide sequences and predict how they would interact with the target protein.

Kalyan Palepu, co-first author of the publication, started from Meta’s protein language model, Evolutionary Scale Modeling 2 (ESM-2), which represents each peptide as a vector, a list of numbers. Because the twenty canonical amino acids, the building blocks of peptides, assemble in some combinations more readily than others due to their chemical properties, the ESM-2 vectors of natural peptides form clusters in the model’s embedding space. Palepu proposed that adding random noise to these vectors would generate a class of new, reasonably stable peptides. These form the library of candidate peptides for drug screening.
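The generation step can be sketched in a few lines of Python. Everything below is a toy stand-in: the embedding table is random rather than ESM-2’s learned representation, and real ESM-2 embeddings are contextual and far higher-dimensional, but the noise-then-decode idea is the same.

```python
import numpy as np

rng = np.random.default_rng(0)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty canonical residues

# Toy per-residue embedding table; in PepPrCLIP this role is played
# by ESM-2's learned representation (values here are arbitrary).
EMBED_DIM = 8
embed_table = rng.normal(size=(len(AMINO_ACIDS), EMBED_DIM))

def embed(peptide: str) -> np.ndarray:
    """Map each residue to its embedding vector."""
    return np.stack([embed_table[AMINO_ACIDS.index(aa)] for aa in peptide])

def decode(vectors: np.ndarray) -> str:
    """Snap each noised vector back to the nearest residue embedding."""
    out = []
    for v in vectors:
        dists = np.linalg.norm(embed_table - v, axis=1)
        out.append(AMINO_ACIDS[int(np.argmin(dists))])
    return "".join(out)

def perturb(peptide: str, noise_scale: float = 0.5) -> str:
    """Generate a naturalistic variant by adding Gaussian noise in embedding space."""
    noisy = embed(peptide) + rng.normal(scale=noise_scale, size=(len(peptide), EMBED_DIM))
    return decode(noisy)

seed = "ACDKLMNPQR"  # hypothetical starting peptide
library = {perturb(seed) for _ in range(100)}  # candidate library for screening
```

Because the noise is small relative to the spacing of the clusters, most perturbed vectors decode to sequences that still resemble natural peptides.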

Suhaas Bhat, the other co-first author, adapted OpenAI’s Contrastive Language-Image Pretraining (CLIP) model to predict protein-peptide interactions. CLIP trains on image-caption pairs, pulling an image and its caption together in embedding space when they match and pushing them apart when they do not. Using the same rationale, Bhat trained the PepPrCLIP model on documented interactions between short peptides and target proteins, so that it can predict the binding likelihood of a novel peptide to a protein from their sequences alone.
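A minimal sketch of that contrastive setup, with a crude bag-of-residues featurizer and random projection weights standing in for the real learned encoders (all names and values here are illustrative, not the published model):

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def featurize(seq: str) -> np.ndarray:
    """Bag-of-residues frequency vector (a crude stand-in for a learned encoder)."""
    v = np.zeros(len(AMINO_ACIDS))
    for aa in seq:
        v[AMINO_ACIDS.index(aa)] += 1
    return v / max(len(seq), 1)

rng = np.random.default_rng(1)
W_pep = rng.normal(size=(20, 16))   # peptide-branch projection (toy weights)
W_prot = rng.normal(size=(20, 16))  # protein-branch projection (toy weights)

def encode(seq: str, W: np.ndarray) -> np.ndarray:
    z = featurize(seq) @ W
    return z / np.linalg.norm(z)  # unit-normalize, as in CLIP

def binding_score(peptide: str, protein: str) -> float:
    """Cosine similarity of the two embeddings: higher = predicted binder."""
    return float(encode(peptide, W_pep) @ encode(protein, W_prot))

def contrastive_loss(peptides, proteins, temperature=0.07):
    """Symmetric InfoNCE loss over matched peptide-protein pairs (the CLIP objective)."""
    P = np.stack([encode(s, W_pep) for s in peptides])
    Q = np.stack([encode(s, W_prot) for s in proteins])
    logits = (P @ Q.T) / temperature
    labels = np.arange(len(peptides))
    def ce(l):  # cross-entropy of each row against its diagonal (matched) entry
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()
    return (ce(logits) + ce(logits.T)) / 2
```

Training would adjust the two projections to minimize this loss, so that true peptide-protein pairs score high and mismatched pairs score low.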

Researchers then applied the model to several target proteins, ranging from structured to disordered, and tested the predictions in cell cultures. Among the top hits, they found multiple peptides that effectively bound their target and marked it for degradation. One example is SS18-SSX1, a disordered protein produced by a chromosomal translocation that fuses genes on chromosomes 18 and X; it drives synovial sarcoma (SS), a cancer of the connective tissue around joints. “A lot of the oncoproteins that we care about when trying to target cancer are unstructured and wiggly, so classical techniques aren’t going to work,” Bhat said. Nevertheless, the peptide from PepPrCLIP’s fourth-ranked hit reduced SS18-SSX1 levels in cells by over forty percent, showing that it effectively tags the protein for degradation.

Instead of designing millions of options and testing them in the lab, PepPrCLIP allows researchers to assess most peptides computationally and then experiment with the top ten or twenty picks.
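That screening step amounts to ranking the candidate library by predicted score and keeping the best few. A hypothetical sketch (the scorer here is a placeholder; in the real pipeline it would be PepPrCLIP’s predicted binding likelihood):

```python
import heapq

def top_candidates(peptides, target, score_fn, k=20):
    """Rank a candidate library by predicted binding score; keep the top k for wet-lab tests."""
    return heapq.nlargest(k, peptides, key=lambda pep: score_fn(pep, target))

# Demo with a purely illustrative scorer (prefers length-matched peptides).
library = ["ACDK", "MNPQRS", "KLMNPQ", "AC", "RSTVWY"]
hits = top_candidates(library, target="MNPQRS",
                      score_fn=lambda p, t: -abs(len(p) - len(t)), k=3)
```

Only the handful of peptides in `hits` would move on to cell-culture experiments.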

Members of the Chatterjee Lab also developed PTM-Mamba, a protein language model that incorporates post-translational modifications (PTMs). PTMs are changes made to a protein after it is synthesized in the body, and they are important in physiological and disease pathways; many enzymes, for example, are modified by the addition of a phosphate group. By swapping ESM-2 for PTM-Mamba, PepPrCLIP can design peptides that selectively target the modified enzyme but not the unmodified version.

PepPrCLIP is now open source for academics, and the lab maintains a user-friendly, web-based code environment for public access. Since its publication, Chatterjee estimates, twenty labs have reported success with the model, and many more are learning to use it. The lab hopes that artificial intelligence will help uncover many drugs once considered off-limits to biochemical discovery.