The jwohlwend/boltz repository is the official home of the Boltz family of biomolecular interaction models. It provides open-source deep learning models for predicting biomolecular interactions, covering both the structure of biomolecular complexes and binding affinity. The code and the pre-trained weights are released under the MIT license and are free for academic and commercial use. The aim is to democratize access to advanced biomolecular modeling so that researchers and developers can apply these tools to problems such as drug discovery and molecular design.
The repository centers on the Boltz models, the latest of which is Boltz-2. Boltz-2 goes beyond its predecessor, Boltz-1, and models such as AlphaFold3 by jointly modeling complex structures and binding affinities rather than predicting structure alone. This matters for molecular design, where decisions depend on both how a molecule binds and how strongly. Boltz-2 is also computationally efficient: according to the repository, it approaches the accuracy of physics-based free-energy perturbation (FEP) methods while running approximately 1000 times faster. That speedup makes in silico screening practical for early-stage drug discovery, where large numbers of candidates must be evaluated quickly.
The repository gives instructions for installing and running the models. The package can be installed from PyPI or directly from GitHub, with or without GPU acceleration, and a fresh Python environment is recommended to avoid dependency conflicts. Inference runs through the `boltz predict` command, which takes a YAML input file (or a directory of YAML files for batch processing) that specifies the biomolecules and the requested predictions. Input formats and available options are documented in the repository.
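To make the YAML-plus-command workflow concrete, here is a minimal Python sketch that writes an input file and assembles the matching `boltz predict` command line. The YAML field names (`version`, `sequences`, `properties`), the `--use_msa_server` flag, the placeholder sequence and SMILES string, and the helper names `write_input` and `build_predict_command` are all assumptions made for illustration; check the repository's documentation for the authoritative schema and options.

```python
import tempfile
from pathlib import Path
from typing import List

# Assumed input layout: one protein chain "A" and one ligand "B", with an
# affinity request for the ligand. Sequence and SMILES are placeholders.
EXAMPLE_INPUT = """\
version: 1
sequences:
  - protein:
      id: A
      sequence: MVTPEGNVSLVDESLLVGVTDEDRAVRS
  - ligand:
      id: B
      smiles: 'CC(=O)Oc1ccccc1C(=O)O'
properties:
  - affinity:
      binder: B
"""


def write_input(path: Path) -> Path:
    """Write the example YAML input to `path` and return the path."""
    path.write_text(EXAMPLE_INPUT)
    return path


def build_predict_command(input_path: Path, use_msa_server: bool = True) -> List[str]:
    """Assemble the argument list for `boltz predict` without running it.

    `input_path` may be a single YAML file or a directory of YAML files.
    """
    cmd = ["boltz", "predict", str(input_path)]
    if use_msa_server:
        # Assumed flag: asks Boltz to fetch alignments from a remote MSA server.
        cmd.append("--use_msa_server")
    return cmd


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        yaml_path = write_input(Path(tmp) / "example.yaml")
        # Print the command instead of executing it; pass the list to
        # subprocess.run(...) once boltz is installed.
        print(" ".join(build_predict_command(yaml_path)))
```

Building the command as a list rather than a single string keeps it safe to hand to `subprocess.run` without shell quoting, and pointing `input_path` at a directory covers the batch case with no other changes.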
A key feature of Boltz-2 is binding affinity prediction. The affinity output has two main fields, `affinity_pred_value` and `affinity_probability_binary`. The predictions behind them are trained on different datasets and serve different purposes. `affinity_probability_binary` is meant to separate binders from decoys, which is useful at the hit-discovery stage. `affinity_pred_value` is meant to quantify the affinity of binders and how it changes with molecular modifications, which suits ligand optimization. The repository gives guidance on interpreting and using both outputs.
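The two-stage use of these fields can be sketched in Python as follows. This is an illustration under stated assumptions, not the repository's own tooling: the predictions are taken to be already loaded into a dictionary keyed by compound name, the 0.5 cutoff is a hypothetical choice, the numbers are made up, and the ranking assumes that a lower `affinity_pred_value` means stronger predicted binding (as it would if the value were a log-scale IC50). Verify that convention against the repository's documentation before relying on it.

```python
from typing import Dict, List

Prediction = Dict[str, float]

# Hypothetical cutoff for calling a compound a likely binder; tune per project.
BINDER_THRESHOLD = 0.5


def likely_binders(preds: Dict[str, Prediction],
                   threshold: float = BINDER_THRESHOLD) -> List[str]:
    """Hit discovery: keep compounds whose binder probability meets the cutoff."""
    return [
        name
        for name, p in preds.items()
        if p["affinity_probability_binary"] >= threshold
    ]


def rank_by_affinity(preds: Dict[str, Prediction], names: List[str]) -> List[str]:
    """Ligand optimization: order compounds by predicted affinity value.

    Assumes lower values mean stronger binding, so the best compound is first.
    """
    return sorted(names, key=lambda n: preds[n]["affinity_pred_value"])


# Made-up numbers for illustration only.
example = {
    "cmpd_a": {"affinity_pred_value": 0.8, "affinity_probability_binary": 0.74},
    "cmpd_b": {"affinity_pred_value": -1.2, "affinity_probability_binary": 0.91},
    "decoy": {"affinity_pred_value": 2.5, "affinity_probability_binary": 0.08},
}

hits = likely_binders(example)           # ["cmpd_a", "cmpd_b"]
ranked = rank_by_affinity(example, hits)  # ["cmpd_b", "cmpd_a"]
```

Keeping the filter and the ranking as separate steps mirrors the distinction in the text: the binary probability decides which compounds are worth considering at all, and the affinity value is compared only among those.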
The repository also emphasizes reproducibility and community engagement. The evaluation and training code for Boltz-2 is still under development, but the repository promises evaluation scripts and structural predictions for Boltz-2, Boltz-1, and other models on relevant benchmark datasets, which should make comparisons between models easier. Contributions and collaboration are encouraged through a Slack channel. The repository acknowledges the use of NVIDIA cuEquivariance kernels for acceleration on NVIDIA GPUs, and Tenstorrent hardware is supported through a fork. It also includes citation information for the Boltz papers and related resources.