ESM-Drift

One-step protein backbone generation via drift-based generative modeling on ESMFold embeddings

Python
PyTorch
ESMFold
Generative Models

Overview

ESM-Drift applies the Generative Modeling via Drifting framework to ESMFold's representation space. The goal: generate novel protein backbone structures in a single forward pass, without the many denoising steps required by diffusion-based methods like RFdiffusion or Chroma.

ESMFold (Meta FAIR) folds a protein sequence into a 3D structure in two stages: a protein language model turns the sequence into dense per-residue embeddings that encode structural information, and a folding trunk decodes those embeddings into 3D coordinates. ESM-Drift treats this embedding space as the target for generative modeling: learn to map from noise to valid protein embeddings, then decode them through ESMFold's folding trunk to recover 3D coordinates.

Drift models define a deterministic ODE that transports samples from a source distribution (Gaussian noise) to the data distribution (real protein embeddings). In the one-step limit, this collapses to a single neural network evaluation, which is significantly faster than iterative DDPM/DDIM-style sampling, where every denoising step costs its own network evaluation.
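
As a toy illustration of the one-step limit, the sketch below transports Gaussian noise to a Gaussian target along straight-line paths, using a velocity field that is known in closed form. This is not the ESM-Drift network: `velocity`, `integrate_euler`, `generate_one_step`, and the parameters `mu` and `s` are hypothetical names chosen for the example, and the target distribution is assumed to be N(mu, s² I). Because every trajectory is straight, a multi-step Euler integrator and a single evaluation land on the same point.

```python
import numpy as np

def velocity(x, t, mu, s):
    """Closed-form velocity for straight paths from N(0, I) to N(mu, s^2 I).

    Assumes the deterministic pairing x1 = mu + s * x0, so that
    x_t = (1 - t) * x0 + t * x1 and dx_t/dt = x1 - x0.
    """
    scale = 1.0 - t + t * s          # coefficient of x0 inside x_t
    x0 = (x - t * mu) / scale        # recover the starting noise from x_t
    return mu + (s - 1.0) * x0       # equals x1 - x0

def integrate_euler(x0, mu, s, n_steps):
    """Iterative sampling: n_steps velocity evaluations from t=0 to t=1."""
    x = x0.copy()
    h = 1.0 / n_steps
    for k in range(n_steps):
        x = x + h * velocity(x, k * h, mu, s)
    return x

def generate_one_step(x0, mu, s):
    """One-step sampling: a single velocity evaluation at t=0."""
    return x0 + velocity(x0, 0.0, mu, s)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((1000, 8))  # source samples (Gaussian noise)
mu = np.full(8, 3.0)                 # toy target mean
s = 0.5                              # toy target standard deviation

x_multi = integrate_euler(x0, mu, s, n_steps=50)
x_one = generate_one_step(x0, mu, s)
print(np.max(np.abs(x_multi - x_one)))  # difference is at floating-point level
```

With a learned velocity field the trajectories are generally not perfectly straight, so the two samplers would no longer agree exactly; that gap is the cost of skipping the integrator.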

Method

  1. Encode real proteins

    Pass the amino-acid sequences of PDB structures through ESMFold (which takes sequences, not coordinates, as input) and extract the per-residue embedding vectors from the language model trunk. These embeddings form the training data distribution.

  2. Train the drift network

    A transformer-based network learns the drift velocity field: given a noisy embedding at time t, it predicts the direction toward a real protein embedding. Training uses a flow-matching loss over interpolated (noise, data) pairs.

  3. One-step generation

    At inference, sample Gaussian noise and apply the drift network in a single pass to produce a protein embedding. The ODE integrator is replaced by a direct prediction, trading sample quality for speed.

  4. Decode via ESMFold

    The generated embedding is passed to ESMFold's folding trunk and structure module, which decode it into 3D backbone coordinates (N, Cα, C, O atoms per residue). No inverse folding or sequence design step is needed for backbone-only generation.
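
The training step described in the method can be sketched in a few lines of NumPy. This is a minimal sketch, not the project's training code: it assumes linear interpolation between noise and data and a mean-squared-error objective, which is one common form of flow matching. The function names `flow_matching_batch` and `flow_matching_loss`, and the embedding shape `(4, 16, 32)`, are hypothetical, and random arrays stand in for real ESMFold embeddings.

```python
import numpy as np

def flow_matching_batch(x1, rng):
    """Build (x_t, t, target) from data embeddings x1 of shape (B, L, D)."""
    x0 = rng.standard_normal(x1.shape)         # Gaussian noise paired with each example
    t = rng.uniform(size=(x1.shape[0], 1, 1))  # one time per example, broadcast over L and D
    x_t = (1.0 - t) * x0 + t * x1              # assumed linear interpolation
    target = x1 - x0                           # velocity of the straight noise-to-data path
    return x_t, t, target

def flow_matching_loss(pred, target):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
x1 = rng.standard_normal((4, 16, 32))  # stand-in for a batch of real embeddings

x_t, t, target = flow_matching_batch(x1, rng)

# A real drift network would map (x_t, t) to a velocity prediction.
# Two reference predictors bracket what the loss measures:
loss_perfect = flow_matching_loss(target, target)                # exact velocities
loss_zero = flow_matching_loss(np.zeros_like(target), target)    # predicts no motion
print(loss_perfect, loss_zero)
```

Under this interpolation, following the target velocity for the remaining time `1 - t` from `x_t` lands exactly on the data point `x1`, which is what lets a well-trained network jump from noise to an embedding in one evaluation.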

Example Structures

The proteins below are from RCSB PDB and represent the compact, well-folded backbones ESM-Drift is trained to generate.