Rosetta is the leading protein modeling tool, with proven performance in protein structure prediction, protein-ligand docking, protein-protein docking, antibody modeling, and structure modeling with experimental information (X-ray, NMR, Cryo-EM). Rosetta algorithms are tested and refined on real experimental data, and have consistently delivered actionable and verifiable wet-lab results.
Protein Structure Prediction
Rosetta consistently outperforms the competition in protein homology modeling at the biennial CASP competition and the weekly CAMEO contest (Song, Structure, 2013). Rosetta was the first software package to consistently predict small protein structures “ab initio”, that is, without a homologous template structure (Kim, Proteins, 2014).
Rosetta can be combined with experimental structural data to produce better structures, or new atomic-resolution structures that would otherwise be unobtainable. Low-quality X-ray data can be used to produce high-resolution structures (Adams, Ann. Rev. Biophys., 2013), and sparse NMR data can be transformed into useful structures (Lange, PNAS, 2012).
Rosetta is the world leader in computer design of proteins, and has achieved a number of “firsts”, including the first fully computationally designed and experimentally verified protein and the first protein-binding protein designed in a computer.
Protein-ligand Interaction Design
Rosetta has proven the ability to re-design natural enzymes to act on novel substrates, even in cases where traditional in vitro evolution methods have failed (Liu & Nivon, PNAS, 2014). Rosetta has also shown the ability to design nanomolar-affinity small-molecule binders “de novo” into previously inactive scaffolds (Tinberg, Nature, 2013). More recently, Rosetta has been used to build entirely new proteins with ligand-binding activity and concomitant fluorescence, an “artificial GFP” (Dou, Nature, 2018).
Protein-protein Interaction Design
Rosetta is able to design novel target-protein-binding activity into a huge number of inactive protein scaffolds. The hemagglutinin (influenza virus) binder, HB36, was the first-in-class computationally designed nanomolar protein-protein binder (Fleishman, Science, 2011). Since then, a BHRF1-protein binder was demonstrated using Rosetta protein engineering on a designed helical bundle (Procko, Cell, 2014), and more recently very high affinity IL-2 and IL-15 mimic proteins were designed and are being developed as candidate cancer therapeutics by Neoleukin (Silva, Nature, 2019).
Biologics Liability Design
Rosetta in Bench is able to design for improved protein stability or reduced immunogenicity. Rosetta has demonstrated a number of methods to computationally stabilize candidate biologics, including some in clinical trials (Korkegian, Science, 2005). More recently, a variety of methods have automated the process of enhancing protein stability in Rosetta (Lau, JBC, 2018). Rosetta has the built-in ability to detect likely immunogenic epitopes using a machine learning algorithm, as well as the unique ability to computationally remove those epitopes while leaving wild-type protein activity intact (King, PNAS, 2014).
CORE TECHNOLOGY: ROSETTA
Rosetta has been developed over the past 16 years, beginning in the lab of Prof. David Baker at the University of Washington and continuing in over 30 other labs around the world. Rosetta began as a tool for protein modeling, combining knowledge-based and physical modeling approaches with a consistent focus on actionable experimental results. Over the last 10 years Rosetta has evolved into the world-leading tool for computational protein design.
Rosetta primarily uses Monte Carlo-based sampling, informed by knowledge of protein structure from the Protein Data Bank (PDB). Protein backbones are primarily modeled using “fragments” derived from the PDB with an array of powerful bioinformatics tools, for example BLAST, all of which are built into Cyrus Bench behind the scenes. Protein sidechains are modeled using the now 30-year-old “rotamer” concept, with constant refinement over the years.
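The fragment-based Monte Carlo idea described above can be illustrated with a minimal sketch. This is not Rosetta's implementation: the energy function, the fragment library, and all function names below are hypothetical stand-ins chosen only to show the shape of the loop (propose a fragment insertion, score it, accept or reject with the Metropolis criterion).

```python
import math
import random


def toy_energy(torsions):
    """Hypothetical score: squared deviation from a helix-like (phi, psi) target."""
    target_phi, target_psi = -60.0, -45.0
    return sum((phi - target_phi) ** 2 + (psi - target_psi) ** 2
               for phi, psi in torsions) / 1000.0


def insert_fragment(torsions, fragment, position):
    """Return a copy of the backbone with the fragment's torsions pasted in at position."""
    new = list(torsions)
    new[position:position + len(fragment)] = fragment
    return new


def metropolis_accept(delta, temperature, rng):
    """Metropolis criterion: always accept downhill moves, sometimes accept uphill ones."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def fragment_monte_carlo(torsions, fragment_library, steps=2000, temperature=1.0, seed=0):
    """Run a fixed number of fragment-insertion moves and return the best state seen."""
    rng = random.Random(seed)
    current = list(torsions)
    current_e = toy_energy(current)
    best, best_e = current, current_e
    for _ in range(steps):
        fragment = rng.choice(fragment_library)
        position = rng.randrange(len(current) - len(fragment) + 1)
        trial = insert_fragment(current, fragment, position)
        trial_e = toy_energy(trial)
        if metropolis_accept(trial_e - current_e, temperature, rng):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e


# Illustrative inputs (assumptions): a 12-residue extended chain and three 3-residue fragments.
start = [(-120.0, 120.0)] * 12
library = [
    [(-60.0, -45.0)] * 3,   # helix-like fragment
    [(-120.0, 130.0)] * 3,  # strand-like fragment
    [(-65.0, -40.0)] * 3,   # slightly perturbed helix-like fragment
]
best, best_e = fragment_monte_carlo(start, library)
print(f"start energy {toy_energy(start):.2f}, best energy {best_e:.2f}")
```

In a real protocol the fragments would be drawn per position from structures in the PDB and the score would be a full energy function; here both are replaced by toy values so the sampling loop itself is easy to follow.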
Rosetta uses a combination of physical and knowledge-derived potentials to score proteins during modeling and design. For example, a physics-based Coulombic potential is employed alongside a statistically derived hydrogen-bonding potential. Each protocol is highly tuned on large amounts of experimental data, often with custom scoring methods, and has been tested multiple times in real experimental contexts. For example, a method to predict mutational free energies (Kellogg et al., Proteins, 2011) is tuned on large experimental datasets before deployment in Bench.
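As a rough illustration of how a physical term and a knowledge-derived term can be combined into one score, the sketch below adds a Coulomb term to a statistical term obtained by the inverse Boltzmann relation, E = -kT ln(P_observed / P_reference). The functional forms, weights, term names, and example numbers are assumptions made for illustration; they are not Rosetta's actual energy terms or parameters.

```python
import math

# Approximate Coulomb constant in kcal*Angstrom/(mol*e^2), as commonly used in molecular mechanics.
COULOMB_CONSTANT = 332.06
# Approximate kT in kcal/mol near room temperature (298 K).
KT_ROOM_TEMPERATURE = 0.593


def coulomb_energy(q1, q2, distance, dielectric=1.0):
    """Physics-based term: pairwise Coulomb energy between two point charges."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * distance)


def statistical_energy(p_observed, p_reference, kT=KT_ROOM_TEMPERATURE):
    """Knowledge-based term: inverse Boltzmann conversion of observed frequencies to energy."""
    return -kT * math.log(p_observed / p_reference)


def hybrid_score(terms, weights):
    """Weighted sum of named energy terms."""
    return sum(weights[name] * value for name, value in terms.items())


# Illustrative numbers only: one charge pair and one geometry bin from a hypothetical survey.
terms = {
    "coulomb": coulomb_energy(1.0, -1.0, 4.0, dielectric=10.0),
    "hbond_statistical": statistical_energy(0.30, 0.10),
}
weights = {"coulomb": 1.0, "hbond_statistical": 1.0}
total = hybrid_score(terms, weights)
print(f"total score: {total:.3f}")
```

A geometry seen more often in known structures than in the reference distribution gets a negative (favorable) statistical energy, so the two kinds of term can be summed on a common scale once their weights have been calibrated against experimental data.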
THE ROSETTA DIFFERENCE
In the half century since 1963, when Ramachandran published his seminal study of conformational preferences in the protein backbone, massive improvements in the ability to predict small molecule and protein conformations have been made. Initially, the bulk of work in the field of computational modeling and design was focused on developing a reliably predictive energy function that is derived from the fundamental physics of molecular interaction. The energy function is the calculation at the core of any simulation of a molecule, so a predictive function forms the basis of any useful simulation. This function was combined with approaches such as energy minimization, molecular dynamics and Monte Carlo, to 1) reproduce the rapidly expanding pools of publicly available experimental X-ray crystal structures and of increasingly-precise quantum mechanical predictions and 2) make prospective predictions (binding energies to screen lead compounds, structures to generate lead compounds or understand SAR, dynamics to understand target molecules in physical detail thus generating better leads).
As drug discovery has historically been focused on small molecules, most efforts up through the 1990s were focused on small molecule elaboration. A satisfactory energy function for the protein receptor was also necessary, but there was only a modest amount of work directed toward actual redesign of the protein receptor itself.
The relative inattention to potentials and methods that could allow redesign of proteins themselves started to change rapidly in the late 1990s. Astounding advancements in our understanding of biologics (proteins as drugs), including the identification and optimization of monoclonal antibodies, opened new paths to drug discovery. Subsequently, an increasing number of biologic drugs were identified and brought to market, many of them major blockbusters. Biologic drugs now account for roughly a third of all commercial pharmaceutical revenues (including seven of the top ten best-selling drugs in 2017) and about half of commercial research budgets, and those fractions are expected to continue to increase.
With the heightened focus on redesigned proteins as drugs, diagnostics and enzymes, there is a concomitant increased need for computational tools to aid in protein design. While the tools developed over the decades for small molecule design provide a starting point, protein design presents its own challenges that require new scoring functions and approaches. In particular, protein design requires that we be able to sample very large numbers of possible changes in both sequence and conformation (typically hundreds of thousands, and frequently millions) in order to reliably characterize potential designs.
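A quick calculation shows why the number of candidate changes grows so fast. The sketch below counts the sequence (and optionally sidechain-conformation) combinations for a handful of designable positions; the function name and the figure of 10 rotamers per residue are illustrative assumptions, not values taken from Rosetta.

```python
def design_space_size(n_positions, n_amino_acids=20, rotamers_per_residue=1):
    """Number of distinct combinations when each position varies independently."""
    return (n_amino_acids * rotamers_per_residue) ** n_positions


# Sequence-only space for a few design sizes.
for n in (5, 10, 20):
    print(f"{n} positions: {design_space_size(n):.2e} sequences")

# Adding an assumed 10 sidechain conformations per amino acid enlarges the space further.
print(f"5 positions with rotamers: {design_space_size(5, rotamers_per_residue=10):.2e}")
```

Even five freely designable positions already give millions of sequences before any conformational change is considered, which is why the search has to rely on efficient sampling and scoring rather than exhaustive enumeration.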
Standard approaches developed for small molecule design (e.g. minimization, molecular dynamics, free energy calculations), which are rooted in a high-resolution physics-based energy function and (typically) an all-atom explicit solvent representation of the system, are simply not practical for reliable protein design. They are computationally too costly to allow examination of the number of changes needed to reach satisfactory search convergence. Most widely used molecular design platforms, including all of the popular commercial platforms outside of Rosetta/Bench, therefore have limited ability to carry out robust protein redesign.
Rosetta/Bench is different. The scoring function reflects not only the core physics of the traditional energy function, but also incorporates terms that have been inferred from careful statistical analysis of the Protein Data Bank (Alford, JCTC, 2017). The resulting function is thus a hybrid physical/statistical scoring function which, through careful calibration, obviates the need for explicit solvent and is predictive even for reduced atomic representations. The scoring function in Rosetta/Bench is coupled with sophisticated algorithmic methods to further improve calculation efficiency. The scoring function and algorithmic methods are integrated in a Monte Carlo approach, which allows additional improvements in efficiency through use of finely tuned rotamer libraries and clever mutation moves. The implementation of Monte Carlo in Rosetta/Bench has been carefully coded to allow optimal parallelism.
The net result of these optimizations (scoring function, algorithmic methods, specialized Monte Carlo moves) is the ability to sample orders of magnitude more changes in sequence/conformational space than is possible using the approaches in most software packages or even laboratory screening/display technologies.
The overall approach in Rosetta/Bench has been widely validated in hundreds of publications from dozens of laboratories across many facets of protein design. Rosetta has repeatedly performed best in the biennial CASP competition, where participants are asked to predict protein structure from sequence (Song, Structure, 2013). Work performed using Rosetta has also been responsible for an impressive number of firsts in the field of protein design, including: the first design of a novel protein with a structure and sequence never observed in nature (Kuhlman, Science, 2003); the first design of a novel protein-protein interaction (Fleishman, Science, 2011); the first design of a novel small-molecule binding protein (Tinberg, Nature, 2013); the first design of a protein nanostructure (King, Nature, 2014); and the first design of a pH-dependent antibody binder (Strauch, PNAS, 2014).