esm by facebookresearch

Description: Evolutionary Scale Modeling (esm): Pretrained language models for proteins

View facebookresearch/esm on GitHub ↗

Summary Information

Updated 14 minutes ago
Added to GitGenius on February 27th, 2026
Created on August 31st, 2020
Open Issues/Pull Requests: 115 (+0)
Number of forks: 785
Total Stargazers: 4,034 (+0)
Total Subscribers: 65 (+0)

Detailed Description

This repository, developed by the Meta Fundamental AI Research (FAIR) Protein Team, houses code and pretrained weights for a family of Transformer-based protein language models known as Evolutionary Scale Modeling (ESM). The models learn from protein sequences and are used to predict aspects of protein structure, function, and evolution. The repository also provides tools for researchers and developers who want to apply the models in their own protein science work.

The repository centers on several models, each designed for specific tasks. The flagship models are ESM-2 and ESMFold. ESM-2 is a general-purpose protein language model that predicts structure, function, and other properties directly from individual amino acid sequences. It is reported to outperform the other single-sequence protein language models it was tested against across a range of structure prediction benchmarks. ESMFold, built on ESM-2, takes a protein sequence as input and produces an end-to-end 3D structure prediction. Prediction of this kind is much faster and cheaper than experimental structure determination, although it complements experiments rather than replacing them.
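ESMFold returns its prediction as PDB text, and the upstream README reads the per-residue confidence (pLDDT) back from the B-factor field of that file. The following is a minimal standard-library sketch of that read-back, assuming standard fixed-column PDB `ATOM` records. The names `mean_bfactor` and `atom_line` and the two-atom sample are invented for illustration and are not part of the repository.

```python
def atom_line(serial, name, x, y, z, bfactor):
    """Build one fixed-column PDB ATOM record (hypothetical helper for the demo)."""
    return ("ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f"
            % (serial, name, "MET", "A", 1, x, y, z, 1.00, bfactor))


def mean_bfactor(pdb_text):
    """Average the B-factor field (PDB columns 61-66) over all ATOM records."""
    values = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) >= 66:
            values.append(float(line[60:66]))
    if not values:
        raise ValueError("no ATOM records found")
    return sum(values) / len(values)


# Made-up two-atom structure; the B-factors stand in for pLDDT values.
SAMPLE = "\n".join([
    atom_line(1, "N", 11.104, 13.207, 2.100, 80.0),
    atom_line(2, "CA", 12.560, 13.300, 2.300, 90.0),
])

if __name__ == "__main__":
    print(mean_bfactor(SAMPLE))
```

The average here is taken over atoms, not residues, which matches reading the whole B-factor column at once; a per-residue average would need grouping by residue number first.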

Beyond ESM-2 and ESMFold, the repository includes several other models. The MSA Transformer extracts embeddings from multiple sequence alignments (MSAs), which can improve structure prediction when an alignment is available. ESM-1v is tailored to predicting the effects of amino acid sequence variation, helping researchers estimate how mutations affect protein function. Finally, ESM-IF1 is an inverse folding model that can design sequences for a given structure or predict the functional effects of sequence variation given a structure.
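To make the variant-effect idea concrete: the ESM-1v paper describes scoring a substitution by masking the position and comparing the model's log-probability of the mutant residue with that of the wild-type residue. The sketch below shows only that arithmetic on hand-written logits. No model is loaded, and `log_softmax`, `score_substitution`, and `toy_logits` are hypothetical names and values chosen for illustration.

```python
import math


def log_softmax(logits):
    """Convert a {residue: logit} mapping into log-probabilities."""
    m = max(logits.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return {k: v - log_z for k, v in logits.items()}


def score_substitution(logits_at_masked_pos, wt, mt):
    """Log-odds of mutant vs. wild-type residue at one masked position.

    Negative scores mean the model finds the mutant less likely than the
    wild type; a real pipeline would take the logits from a model forward
    pass with the position masked.
    """
    logp = log_softmax(logits_at_masked_pos)
    return logp[mt] - logp[wt]


# Made-up logits over a three-letter toy alphabet (assumption, not model output).
toy_logits = {"A": 2.0, "G": 1.0, "V": 0.0}

if __name__ == "__main__":
    print(score_substitution(toy_logits, wt="A", mt="G"))
```

For a variant with several substitutions, the per-position scores would be summed.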

The repository offers more than pretrained models. It includes a quick-start guide and examples that show how to load and use the models. It also provides command-line interfaces for tasks such as bulk embedding extraction from FASTA files and structure prediction with ESMFold. In addition, it supports CPU offloading for inference with large models, so they can run on machines with limited GPU memory.
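As a rough illustration of what embedding extraction from a FASTA file involves, the sketch below parses FASTA text into `(label, sequence)` pairs and passes them to ESM-2 in the style of the README quick start. It is not the repository's command-line tool. `read_fasta` and `embed_mean` are names invented here; `embed_mean` assumes `torch` and the `fair-esm` package are installed and downloads the model weights on first use.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (label, sequence) pairs."""
    records, label, chunks = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                records.append((label, "".join(chunks)))
            label, chunks = line[1:].strip(), []
        else:
            if label is None:
                raise ValueError("sequence data found before any '>' header")
            chunks.append(line)
    if label is not None:
        records.append((label, "".join(chunks)))
    return records


def embed_mean(records, layer=33):
    """Return {label: mean-pooled ESM-2 embedding} for each record.

    Assumption: torch and fair-esm are installed; imports are kept local
    so that read_fasta stays usable without them.
    """
    import torch
    import esm

    model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
    batch_converter = alphabet.get_batch_converter()
    model.eval()  # disable dropout for deterministic results

    labels, _, tokens = batch_converter(records)
    lengths = (tokens != alphabet.padding_idx).sum(1).tolist()
    with torch.no_grad():
        out = model(tokens, repr_layers=[layer])
    reps = out["representations"][layer]
    # Token 0 is beginning-of-sequence and the last counted token is
    # end-of-sequence, so residues occupy positions 1 .. n-2.
    return {lab: reps[i, 1:n - 1].mean(0)
            for i, (lab, n) in enumerate(zip(labels, lengths))}


if __name__ == "__main__":
    demo = ">p1 demo\nMKT\nAYI\n\n>p2\nGG\n"
    print(read_fasta(demo))
```

Keeping the parser separate from the model call makes it easy to check the input on a machine without a GPU before running the expensive step.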

The repository also provides access to the ESM Metagenomic Atlas, an open resource containing predicted structures for hundreds of millions of metagenomic proteins. The atlas, which has been expanded since its initial release, is a useful resource for exploring the diversity of protein structures in the microbial world. In addition, the repository includes code and resources for protein design, including code accompanying the papers "Language models generalize beyond natural proteins" and "A high-level programming language for generative protein design."

New models, features, and datasets have been released over time. The "What's New" section of the README records these updates, including new model releases, improvements to existing models, and the expansion of the ESM Metagenomic Atlas. The README also provides citations for the models and datasets and a table of contents for navigating the available resources. The source code is released under the permissive MIT license, which allows broad use and adaptation.

In summary, the facebookresearch/esm repository combines pretrained protein language models with practical tools for applying them, all openly available.
