In the half century since 1963, when Ramachandran published his seminal study of conformational preferences in the protein backbone, the ability to predict small-molecule and protein conformations has improved enormously. Initially, the bulk of work in computational modeling and design focused on developing a reliably predictive energy function derived from the fundamental physics of molecular interaction. The energy function is the calculation at the core of any molecular simulation, so a predictive function forms the basis of any useful simulation. This function was combined with approaches such as energy minimization, molecular dynamics, and Monte Carlo sampling to 1) reproduce the rapidly expanding pools of publicly available experimental X-ray crystal structures and increasingly precise quantum mechanical predictions and 2) make prospective predictions: binding energies to screen lead compounds, structures to generate lead compounds or understand structure-activity relationships (SAR), and dynamics to understand target molecules in physical detail and thus generate better leads.
Because drug discovery has historically centered on small molecules, most effort through the 1990s went into small-molecule elaboration. A satisfactory energy function for the protein receptor was also necessary, but comparatively little work was directed toward redesigning the protein receptor itself.
The relative inattention to potentials and methods for redesigning proteins themselves began to change rapidly in the late 1990s. Advances in our understanding of biologics (proteins as drugs), including the identification and optimization of monoclonal antibodies, opened new paths to drug discovery. An increasing number of biologic drugs were subsequently identified and brought to market, many of them major blockbusters. Biologic drugs now account for roughly a third of all commercial pharmaceutical revenues (seven of the top ten best-selling drugs in 2017) and about half of commercial research budgets, and those fractions are expected to continue to increase.
With the heightened focus on redesigned proteins as drugs, diagnostics, and enzymes comes a concomitant need for computational tools to aid in protein design. While the tools developed over the decades for small-molecule design provide a starting point, protein design presents its own challenges that require new scoring functions and approaches. In particular, protein design requires sampling very large numbers of possible changes in both sequence and conformation (typically hundreds of thousands, and frequently millions) in order to reliably characterize potential designs.
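To make the scale of that search space concrete, a back-of-the-envelope count helps. The numbers below (10 designable positions, 30 rotamers per residue type) are illustrative assumptions, not figures from any particular design platform:

```python
# Back-of-the-envelope size of a protein design search space.
# The position and rotamer counts are assumed for illustration.
n_positions = 10       # residue positions allowed to mutate
n_amino_acids = 20     # canonical amino acid types
n_rotamers = 30        # assumed average side-chain rotamers per type

# Sequence space alone: every position can take any amino acid.
sequences = n_amino_acids ** n_positions

# Joint sequence/conformation space: identity AND rotamer per position.
seq_conf_states = (n_amino_acids * n_rotamers) ** n_positions

print(f"{sequences:.2e} sequences")          # ~1.02e13
print(f"{seq_conf_states:.2e} total states")
```

Even a modest 10-position design problem yields roughly 10^13 sequences, which is why exhaustive enumeration is off the table and efficient stochastic sampling is essential.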
Standard approaches developed for small-molecule design (e.g., minimization, molecular dynamics, free-energy calculations), which are rooted in a high-resolution physics-based energy function and (typically) an all-atom, explicit-solvent representation of the system, are simply not practical for reliable protein design: they are computationally too costly to examine the number of changes required for satisfactory search convergence. Most widely used molecular design platforms, including all of the popular commercial platforms other than Rosetta/Bench, therefore have limited ability to carry out robust protein redesign.
Rosetta/Bench is different. Its scoring function reflects not only the core physics of the traditional energy function but also incorporates terms inferred from careful statistical analysis of the Protein Data Bank (Alford, JCTC, 2017). The result is a hybrid physical/statistical scoring function that, through careful calibration, obviates the need for explicit solvent and remains predictive even for reduced atomic representations. The scoring function in Rosetta/Bench is coupled with sophisticated algorithmic methods that further improve calculation efficiency. Both are integrated in a Monte Carlo approach, which allows additional improvements in efficiency through finely tuned rotamer libraries and clever mutation moves. The Monte Carlo implementation in Rosetta/Bench has been carefully coded to allow optimal parallelism.
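The Monte Carlo design loop described above can be sketched in miniature. The toy below is NOT Rosetta's implementation: the score function is an arbitrary stand-in for the hybrid physical/statistical score, residues and rotamers are plain integer labels, and all parameters (position count, step count, temperature) are assumed for illustration. It does, however, show the two ingredients named in the text: a mutation move that changes residue identity and rotamer at one position, and Metropolis acceptance:

```python
import math
import random

AMINO_ACIDS = list(range(20))   # 20 canonical residue types, as labels
ROTAMERS = list(range(5))       # assumed small rotamer set per type

def toy_score(state):
    # Arbitrary stand-in energy (lower is better); NOT a real score term.
    return sum((aa + rot) % 3 for aa, rot in state)

def metropolis_design(n_positions=15, n_steps=2000, kT=1.0, seed=0):
    rng = random.Random(seed)
    # Start from a random sequence/rotamer assignment.
    state = [(rng.choice(AMINO_ACIDS), rng.choice(ROTAMERS))
             for _ in range(n_positions)]
    energy = toy_score(state)
    best = (energy, list(state))
    for _ in range(n_steps):
        i = rng.randrange(n_positions)
        old = state[i]
        # Mutation move: new residue identity AND rotamer at one position.
        state[i] = (rng.choice(AMINO_ACIDS), rng.choice(ROTAMERS))
        new_energy = toy_score(state)
        delta = new_energy - energy
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            energy = new_energy
            if energy < best[0]:
                best = (energy, list(state))
        else:
            state[i] = old  # reject: restore previous residue/rotamer
    return best

energy, design = metropolis_design()
print("best toy score:", energy)
```

The occasional acceptance of uphill moves is what lets the search escape local minima, which matters when sequence and side-chain changes are coupled.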
The net result of these optimizations (scoring function, algorithmic methods, specialized Monte Carlo moves) is the ability to sample orders of magnitude more of sequence/conformation space than is possible with the approaches in most software packages, or even with laboratory screening/display technologies.
The overall approach in Rosetta/Bench has been widely validated in hundreds of publications from dozens of laboratories across many facets of protein design. Rosetta has repeatedly performed best in the biennial CASP competition, in which participants are asked to predict protein structure from sequence (Song, Structure, 2013). Work performed using Rosetta has also been responsible for an impressive number of firsts in the field of protein design, including: the first design of a novel protein with a structure and sequence never observed in nature (Kuhlman, Science, 2003); the first design of a novel protein-protein interaction (Fleishman, Science, 2011); the first design of a novel small-molecule-binding protein (Tinberg, Nature, 2013); the first design of a protein nanostructure (King, Nature, 2014); and the first design of a pH-dependent antibody binder (Strauch, PNAS, 2014).