ProteinMPNN by dauparas

Description: Code for the ProteinMPNN paper

Summary Information

Updated 29 minutes ago

Added to GitGenius on May 15th, 2026

Created on May 26th, 2022

Open Issues & Pull Requests: 88 (+0)

Number of forks: 478

Total Stargazers: 1,786 (+0)

Total Subscribers: 27 (+0)

Issue Activity (beta)

Open issues: 44

New in 7 days: 1

Closed in 7 days: 0

Avg open age: 485 days

Stale 30+ days: 43

Stale 90+ days: 40

Recent activity

Opened in 7 days: 1

Closed in 7 days: 0

Comments in 7 days: 0

Events in 7 days: 0

Top labels

No label distribution available yet.

Most active issues this week

No issue events were indexed in the last 7 days.

Repository Insights (GitGenius)

Median issue/PR response: 2.7 days

Mean response time: 78.1 days

90th percentile: 274.1 days

Tracked items: 36

Most active contributors

dauparas - 7 events, 3 issues
roccomoretti - 7 events, 6 issues
adrienchaton - 5 events, 2 issues
anar-rzayev - 4 events, 4 issues
kosonocky - 3 events, 2 issues

Detailed Description

ProteinMPNN is a deep learning method for protein sequence design conditioned on protein backbone structure. The repository contains the implementation code for the ProteinMPNN paper and is primarily composed of Jupyter Notebooks, although the core functionality lives in Python scripts. Given a three-dimensional backbone, the model generates novel sequences compatible with that structure, a task central to computational protein engineering and design.

The core functionality is accessed through the main script protein_mpnn_run.py, which initializes and runs the model, with supporting utility functions in protein_mpnn_utils.py. The repository provides several sets of pre-trained weights, organized by model type. Full-backbone models are available in the vanilla_model_weights and soluble_model_weights directories, each with multiple checkpoints. CA-only models are also provided for cases where only alpha-carbon coordinates are available; they are enabled with the --ca_only flag. Users can therefore work with whatever level of structural detail their input data provides.
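As a rough illustration of how the entry point described above might be driven from Python, the sketch below assembles a command line for protein_mpnn_run.py. The helper name build_mpnn_command, the input path, and the output folder are hypothetical, and the flags shown should be checked against the script's --help output for the version in use.

```python
import shlex
import sys


def build_mpnn_command(pdb_path, out_folder, num_seqs=2, temperature=0.1,
                       seed=37, ca_only=False, script="protein_mpnn_run.py"):
    """Assemble an argument list for the ProteinMPNN entry point.

    Hypothetical helper for illustration; it only builds the command
    and does not run anything.
    """
    cmd = [
        sys.executable, script,
        "--pdb_path", pdb_path,
        "--out_folder", out_folder,
        "--num_seq_per_target", str(num_seqs),
        "--sampling_temp", str(temperature),
        "--seed", str(seed),
    ]
    if ca_only:
        # Switch to the CA-only models when only alpha-carbon
        # coordinates are available.
        cmd.append("--ca_only")
    return cmd


# Placeholder paths; replace with a real PDB file and output folder.
cmd = build_mpnn_command("inputs/example.pdb", "outputs/example", ca_only=True)
print(shlex.join(cmd))

# To execute from a checkout of the repository, something like:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes the command easy to log or inspect before any GPU time is spent.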

The codebase includes extensive helper scripts for common protein design workflows, such as parsing PDB files, specifying which protein chains should be designed versus kept fixed, constraining specific residue positions, applying amino acid biases, and implementing symmetry constraints through residue tying. The repository demonstrates these capabilities through eight numbered example scripts covering scenarios from simple monomers to multi-chain complexes and homooligomers. Additional examples show how to use the model for scoring only, loading sequences from FASTA files, generating unconditional sequence probabilities similar to position-specific scoring matrices, and incorporating PSSM bias during design.
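To make the idea of amino acid biases more concrete, the following toy sketch shows one common way a per-residue-type bias can shift sampling probabilities: the bias is added to the model's logits before a temperature-scaled softmax. This is a generic illustration under that assumption, not the repository's implementation; the function name biased_probs and the uniform toy logits are made up for the example, and how ProteinMPNN combines bias and temperature internally may differ.

```python
import math

# The 20 standard amino acids, one-letter codes.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def biased_probs(logits, bias, temperature=1.0):
    """Return softmax((logits + bias) / temperature) as a list of floats."""
    scaled = [(l + b) / temperature for l, b in zip(logits, bias)]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Toy model output: every amino acid equally likely before biasing.
logits = [0.0] * len(ALPHABET)

bias = [0.0] * len(ALPHABET)
bias[ALPHABET.index("A")] = 1.0    # favour alanine
bias[ALPHABET.index("C")] = -2.0   # discourage cysteine

probs = biased_probs(logits, bias, temperature=0.5)
for letter in "ACG":
    print(letter, round(probs[ALPHABET.index(letter)], 4))
```

Lower temperatures sharpen the distribution, so the same bias has a stronger effect on which residues are sampled.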

Installation requires Python 3.0 or higher, PyTorch, and NumPy; the repository includes setup instructions covering conda environment creation and PyTorch installation. Model output includes several fields: a score (the average negative log probability of the designed residues), a global score (the same average taken over all residues), the fixed and designed chains, the model name, the git hash for reproducibility, the random seed, the sampling temperature, and a sample number when multiple sequences are generated.
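The difference between the score and the global score can be illustrated with a small calculation. The sketch below assumes per-residue log-probabilities and a boolean mask marking designed positions; the function name and the toy numbers are hypothetical and are not taken from the repository.

```python
def mean_negative_log_prob(log_probs, mask):
    """Average of -log p over the positions where mask is True."""
    selected = [-lp for lp, keep in zip(log_probs, mask) if keep]
    if not selected:
        raise ValueError("mask selects no positions")
    return sum(selected) / len(selected)


# Toy per-residue log-probabilities for a four-residue protein.
log_probs = [-0.2, -1.5, -0.7, -0.1]

# Only the two middle residues were designed; the others were fixed.
designed = [False, True, True, False]

# Score: averaged over designed residues only.
score = mean_negative_log_prob(log_probs, designed)

# Global score: averaged over every residue.
global_score = mean_negative_log_prob(log_probs, [True] * len(log_probs))

print(f"score={score:.3f} global_score={global_score:.3f}")
```

Lower values mean the model assigns higher probability to the sequence, so the two numbers can diverge when the fixed residues are much easier or harder to predict than the designed ones.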

According to GitGenius activity tracking, the median issue and pull request response latency is 65.7 hours (about 2.7 days), while the mean is far higher at 1875.2 hours (about 78.1 days) across 36 tracked items. The gap between the two indicates that at least half of the items receive a response within a few days, but a minority wait much longer. The primary maintainer, dauparas, has logged 7 events; roccomoretti and adrienchaton follow with 7 and 5 events respectively. The repository shares contributors with other major computational biology projects, including google-deepmind/alphafold3, which places it within the broader protein structure prediction and design ecosystem. GitGenius classifies the repository under protein design, sequence optimization, neural networks, machine learning, structure prediction, protein engineering, deep learning, bioinformatics, amino acid sequences, and computational biology.
