ESM-Drift
One-step protein backbone generation via drift-based generative modeling on ESMFold embeddings
Overview
ESM-Drift applies the Generative Modeling via Drifting framework to ESMFold's representation space. The goal: generate novel protein backbone structures in a single forward pass, without the many denoising steps required by diffusion-based methods like RFdiffusion or Chroma.
ESMFold (Meta FAIR) predicts a protein's 3D structure from its sequence alone. Its ESM-2 protein language model produces dense per-residue embeddings that encode structural information, and a folding trunk and structure module turn those embeddings into atomic coordinates. ESM-Drift treats this embedding space as the target for generative modeling: learn to map noise to valid protein embeddings, then decode them through ESMFold's folding trunk and structure module to recover 3D coordinates.
Drift models define a deterministic ODE that transports samples from a source distribution (Gaussian noise) to the data distribution (real protein embeddings). In the one-step limit, this collapses to a single neural network evaluation, compared with the tens to hundreds of evaluations typical of iterative DDPM/DDIM-style sampling.
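A minimal sketch of the one-step collapse, assuming the straight-line path x_t = (1 - t)·x0 + t·x1 common in flow matching (the page does not specify the path). `exact_velocity` is a hypothetical stand-in for the trained drift network: it is the exact field for a single target embedding, whose trajectories are straight, so one Euler step lands where fifty do. A field learned over many proteins generally has curved trajectories, so a single step only approximates the ODE solution, which is the quality-for-speed trade-off described under One-step generation.

```python
import random


def exact_velocity(x, t, target):
    """Hypothetical stand-in for the drift network.

    Under the linear path x_t = (1 - t) * x0 + t * x1, the velocity that
    carries x toward a single target x1 is (x1 - x) / (1 - t) for t < 1.
    """
    return [(x1 - xi) / (1.0 - t) for xi, x1 in zip(x, target)]


def euler_integrate(x0, velocity, num_steps):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with forward Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # always < 1, so exact_velocity never divides by zero
        x = [xi + dt * vi for xi, vi in zip(x, velocity(x, t))]
    return x


rng = random.Random(0)
target = [0.5, -1.2, 2.0]                   # toy 3-dim "protein embedding"
x0 = [rng.gauss(0.0, 1.0) for _ in target]  # sample from the Gaussian source


def field(x, t):
    return exact_velocity(x, t, target)


one_step = euler_integrate(x0, field, num_steps=1)     # single network evaluation
many_steps = euler_integrate(x0, field, num_steps=50)  # iterative ODE solve
```

Both results match `target` up to floating-point rounding, because the trajectories of this toy field are straight lines.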
Method
1. Encode real proteins
Pass the amino-acid sequences of PDB entries through ESMFold and extract the per-residue embedding vectors produced by its ESM-2 language model. These embeddings form the training data distribution.
2. Train the drift network
A transformer-based network learns the drift velocity field: given a noisy embedding at time t, it predicts the direction toward a real protein embedding. Training uses a flow-matching loss over interpolated (noise, data) pairs.
3. One-step generation
At inference, sample Gaussian noise and apply the drift network in a single pass to produce a protein embedding. The ODE integrator is replaced by a direct prediction, trading sample quality for speed.
4. Decode via ESMFold
The generated embedding is passed to ESMFold's folding trunk and structure module, which decode it into 3D backbone coordinates (N, Cα, C, O atoms per residue). No inverse folding or sequence design step is needed for backbone-only generation.
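The flow-matching objective from Train the drift network can be sketched in plain Python, assuming the straight-line path x_t = (1 - t)·x0 + t·x1, whose regression target is the constant velocity x1 - x0 (a common convention, not confirmed by this page). `flow_matching_loss` and the toy models are hypothetical; a real run would apply a transformer to per-residue ESMFold embeddings in a deep-learning framework.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path between noise x0 and data x1 at time t."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def flow_matching_loss(model, data_batch, rng):
    """Mean squared error between predicted and target velocities.

    For each data embedding x1: draw noise x0 ~ N(0, I) and t ~ U(0, 1),
    form x_t on the straight path, and regress model(x_t, t) onto x1 - x0.
    """
    total = 0.0
    for x1 in data_batch:
        x0 = [rng.gauss(0.0, 1.0) for _ in x1]
        t = rng.random()
        xt = interpolate(x0, x1, t)
        target_v = [b - a for a, b in zip(x0, x1)]
        pred_v = model(xt, t)
        total += sum((p - q) ** 2 for p, q in zip(pred_v, target_v)) / len(x1)
    return total / len(data_batch)


data = [[0.5, -1.2, 2.0]]  # one toy "protein embedding"


def oracle(x, t):
    """Exact velocity toward the single data point (hypothetical perfect model)."""
    return [(b - a) / (1.0 - t) for a, b in zip(x, data[0])]


def zero_model(x, t):
    """Untrained baseline that always predicts zero velocity."""
    return [0.0] * len(x)


oracle_loss = flow_matching_loss(oracle, data, random.Random(0))
zero_loss = flow_matching_loss(zero_model, data, random.Random(0))
```

The oracle reaches zero loss only because the toy data set has a single point; with many proteins the best achievable velocity is a conditional average over data points and the loss stays above zero.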
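The Encode real proteins and Decode via ESMFold steps both touch PDB-format data: ESMFold takes an amino-acid sequence as input, so training proteins must first be reduced to their sequences, and the decoded output is a set of N, Cα, C, O coordinates per residue. The sketch below uses only the standard library to read backbone atoms from fixed-column ATOM records, recover the one-letter sequence, and flag consecutive Cα-Cα distances far from the roughly 3.8 Å expected across trans peptide bonds, a quick sanity check for generated backbones. The helper names and the 0.5 Å tolerance are hypothetical choices, not part of ESM-Drift.

```python
import math

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}
BACKBONE = ("N", "CA", "C", "O")


def parse_backbone(pdb_text):
    """Return (sequence, residues) from ATOM records in fixed PDB columns.

    residues is a list of dicts mapping backbone atom names to (x, y, z),
    in file order. HETATM records, non-backbone atoms, and alternate
    locations other than blank/'A' are skipped.
    """
    residues, seq, last_key = [], [], None
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        atom, altloc = line[12:16].strip(), line[16]
        if atom not in BACKBONE or altloc not in (" ", "A"):
            continue
        key = (line[21], line[22:27])  # chain ID, residue number + insertion code
        if key != last_key:
            residues.append({})
            seq.append(THREE_TO_ONE.get(line[17:20], "X"))
            last_key = key
        residues[-1][atom] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return "".join(seq), residues


def ca_gaps(residues, expected=3.8, tol=0.5):
    """Indices i where the CA(i)-CA(i+1) distance is off by more than tol Angstrom."""
    bad = []
    for i in range(len(residues) - 1):
        a, b = residues[i].get("CA"), residues[i + 1].get("CA")
        if a is None or b is None or abs(math.dist(a, b) - expected) > tol:
            bad.append(i)
    return bad


def atom_line(serial, name, resname, chain, resseq, xyz):
    """Format one ATOM record in fixed PDB columns (toy helper for this example)."""
    x, y, z = xyz
    padded = (" " + name).ljust(4)
    return (f"ATOM  {serial:5d} {padded} {resname:3} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Toy 4-residue backbone with a chain break between residues 3 and 4.
ca_positions = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (15.0, 0.0, 0.0)]
offsets = {"N": (-1.0, 0.5, 0.0), "CA": (0.0, 0.0, 0.0),
           "C": (1.0, 0.5, 0.0), "O": (1.0, 1.7, 0.0)}
lines, serial = [], 1
for resseq, (resname, ca) in enumerate(zip(["MET", "LYS", "GLY", "TRP"], ca_positions), start=1):
    for name in BACKBONE:
        xyz = tuple(c + o for c, o in zip(ca, offsets[name]))
        lines.append(atom_line(serial, name, resname, "A", resseq, xyz))
        serial += 1

sequence, residues = parse_backbone("\n".join(lines))
print(sequence)           # MKGW
print(ca_gaps(residues))  # [2]
```

Chain breaks and cis peptide bonds (Cα-Cα near 2.9 Å) are also flagged, so treat hits as prompts for inspection rather than definite errors.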
Example Structures
The proteins below are from RCSB PDB and represent the compact, well-folded backbones ESM-Drift is trained to generate.