-
Towards a Generative Evolution Machine with DPLM-Evo
Xinyou Wang*†,
Liang Hong*†,
Jiasheng Ye†,
Zaixiang Zheng¶,
Yu Li, Shujian Huang, Quanquan Gu.
ICML 2026. arXiv:2605.00182.
Presents DPLM-Evo, an evolutionary discrete-diffusion framework that explicitly models substitution, insertion, and deletion during denoising — aligning diffusion with how proteins actually evolve. A decoupled latent-alignment space enables indel-aware, variable-length generation, while a contextualised evolutionary noising kernel injects biologically informed mutation patterns. Achieves state-of-the-art mutation-effect prediction on ProteinGym (single-sequence) and supports simulated evolution and targeted post-editing of proteins.
discrete diffusion
protein evolution
indel-aware generation
mutation effect prediction
-
Protein Autoregressive Modeling via Multiscale Structure Generation
Yanru Qu*†,
Cheng-Yen Hsieh*,
Zaixiang Zheng,
Ge Liu, Quanquan Gu.
ICML 2026 Spotlight. arXiv:2602.04883.
Introduces PAR, the first multi-scale autoregressive framework for protein backbone generation via coarse-to-fine next-scale prediction — akin to sculpting a statue from rough topology to fine detail. A flow-based backbone decoder turns the AR transformer's multi-scale embeddings into atom-level coordinates, and noisy context learning + scheduled sampling mitigate exposure bias. Enables flexible prompted and motif-scaffolding generation in a zero-shot manner, with favourable scaling behaviour.
autoregressive generation
multi-scale modeling
protein backbone design
motif scaffolding
-
An All-atom Protein Generative Model for Designing Protein Complexes
Ruizhe Chen*†,
Dongyu Xue*,
Xiangxin Zhou†,
Zaixiang Zheng,
Xiangxiang Zeng, Quanquan Gu.
ICML 2025. arXiv:2504.13075.
Introduces APM, an all-atom generative model purpose-built for protein complexes. By integrating atom-level information and training on multi-chain data, APM accurately models inter-chain interactions and can design binding-competent complexes from scratch. It unifies multi-chain folding and inverse-folding in one backbone, and supports both supervised fine-tuning and zero-shot sampling for downstream design tasks.
protein complexes
all-atom generative model
multi-chain modeling
binder design
-
Elucidating the Design Space of Multimodal Protein Language Models
Cheng-Yen Hsieh*,
Xinyou Wang*†,
Daiheng Zhang†,
Dongyu Xue, Fei Ye, Shujian Huang,
Zaixiang Zheng¶,
Quanquan Gu.
ICML 2025 Spotlight. arXiv:2504.11454.
Systematically elucidates the design space of multimodal protein language models that tokenize 3D structure. Identifies tokenization loss and inaccurate structure-token prediction as the main bottlenecks, and proposes improvements across generative modeling, structure-aware architecture, and data. The resulting 650M DPLM-2.1 cuts PDB folding RMSD from 5.52 to 2.36, outperforming 3B baselines and matching specialized folding models.
multimodal protein language model
structure tokenization
folding
representation learning
-
DPLM-2: A Multimodal Diffusion Protein Language Model
Xinyou Wang†,
Zaixiang Zheng¶,
Fei Ye, Dongyu Xue, Shujian Huang, Quanquan Gu.
ICLR 2025 ByteDance Highlight. arXiv:2410.13782.
Extends DPLM into a multimodal model that jointly diffuses over protein sequence and 3D structure. DPLM-2 learns a unified distribution over sequence and structure, enabling simultaneous structure-sequence co-generation, structure-conditioned design (inverse folding), and sequence-conditioned folding within one pre-trained backbone — a single model for the full sequence-structure design cycle.
multimodal protein model
sequence-structure co-generation
discrete diffusion
inverse folding
-
ProteinBench: A Holistic Evaluation of Protein Foundation Models
Fei Ye*,
Zaixiang Zheng*,
Dongyu Xue*,
Yuning Shen*,
Lihao Wang*,
Yiming Ma, Yan Wang, Xinyou Wang, Xiangxin Zhou, Quanquan Gu.
ICLR 2025. arXiv:2409.06744.
ProteinBench is a holistic evaluation framework for protein foundation models, built on three pillars: (i) a taxonomy of tasks spanning the main protein modalities; (ii) multi-metric evaluation along quality, novelty, diversity, and robustness; and (iii) user-oriented analyses that expose current strengths and blind spots. Released with a public leaderboard, evaluation dataset, and modular toolkit as a living benchmark for the field.
benchmark
protein foundation models
holistic evaluation
leaderboard
-
Antigen-Specific Antibody Design via Direct Energy-based Preference Optimization
Xiangxin Zhou*†,
Dongyu Xue*,
Ruizhe Chen*†,
Zaixiang Zheng,
Liang Wang, Quanquan Gu.
NeurIPS 2024. arXiv:2403.16576.
Casts antigen-specific antibody design as preference optimization over a pre-trained conditional diffusion model that jointly models antibody sequence and structure. AbDPO fine-tunes with a residue-level decomposed energy preference and uses gradient surgery to resolve conflicts between attractive and repulsive forces. Sets state-of-the-art on the RAbD benchmark, simultaneously lowering total energy and improving binding affinity.
antibody design
preference optimization
diffusion model
antigen binding
-
Diffusion Language Models Are Versatile Protein Learners
Xinyou Wang*†,
Zaixiang Zheng*¶,
Fei Ye, Dongyu Xue, Shujian Huang, Quanquan Gu.
ICML 2024 ByteDance Highlight. arXiv:2402.18567.
Introduces DPLM, a versatile diffusion-based protein language model pre-trained on evolutionary-scale sequences. DPLM unifies protein representation learning and unconditional / conditional generation under a single discrete-diffusion objective, scales to billions of parameters, and enables controllable generation from arbitrary partial contexts without task-specific retraining — a foundational step toward general-purpose protein foundation models. A toy sketch of the masked-diffusion corruption behind this objective follows this entry.
protein foundation model
discrete diffusion
representation learning
controllable generation
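Below is a minimal, hypothetical sketch of the absorbing-state ("masked") corruption that a discrete-diffusion objective of this kind builds on; it is not the DPLM implementation, and the names MASK_ID and corrupt are illustrative placeholders. The idea: sample a noise level t, mask each residue independently with probability t, and train the denoiser to recover the masked positions.

```python
# Toy sketch of absorbing-state ("masked") discrete diffusion corruption.
# Illustrative only; MASK_ID and corrupt are not from the DPLM codebase.
import numpy as np

MASK_ID = 0  # hypothetical index of the absorbing [MASK] token

def corrupt(tokens: np.ndarray, t: float, rng: np.random.Generator) -> np.ndarray:
    """Mask each position independently with probability t (the noise level)."""
    noisy = tokens.copy()
    mask = rng.random(tokens.shape) < t
    noisy[mask] = MASK_ID
    return noisy

rng = np.random.default_rng(0)
seq = rng.integers(1, 21, size=16)   # a toy "protein" of 16 residue ids (1..20)
t = rng.random()                     # uniform noise level in [0, 1)
x_t = corrupt(seq, t, rng)
# A denoiser would be trained with cross-entropy to predict the original
# residues at the masked positions of x_t; sampling reverses the process,
# starting from an all-MASK sequence and unmasking a few positions per step.
print(t, x_t)
```

Because any subset of positions can be held fixed while the rest stay masked, this style of corruption is what makes generation from arbitrary partial contexts natural.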
-
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Jiasheng Ye†,
Zaixiang Zheng¶,
Yu Bao, Lihua Qian, Quanquan Gu.
Preprint. arXiv:2308.12219. The first exploration of diffusion LLMs above 10B parameters with discrete diffusion.
Demonstrates that diffusion language models can become strong general-purpose language learners once scaled. The recipe: acquire knowledge via masked-language-model pretraining, then reprogram the pretrained MLM into a diffusion LM through diffusive adaptation, followed by task- and instruction-finetuning. Instruction tuning elicits zero- and few-shot in-context learning and reasoning, making this the first demonstration of a competent diffusion LM above 10B parameters.
diffusion language model
instruction finetuning
in-context learning
non-autoregressive generation
-
Structure-informed Language Models Are Protein Designers
Zaixiang Zheng*¶,
Yifan Deng*†,
Dongyu Xue, Yi Zhou, Fei Ye, Quanquan Gu.
ICML 2023 Oral. arXiv:2302.01649.
Reframes protein inverse folding as structure-conditioned language modeling: a pretrained protein language model is lightly adapted with structural cues to directly generate sequences that fold to a given backbone. LM-Design sets new state-of-the-art on CATH benchmarks with a small fraction of prior compute, showing that strong sequence priors + minimal structural conditioning rival heavy structure-native models.
inverse folding
protein language model
structure-conditioned generation
parameter-efficient adaptation
-
DINOISER: Diffused Conditional Sequence Learning by Manipulating Noises
Jiasheng Ye†,
Zaixiang Zheng¶,
Yu Bao, Lihua Qian, Mingxuan Wang.
TACL 2024, Oral at ACL 2024, ByteDance Highlight. arXiv:2302.10025.
Diagnoses why continuous-embedding diffusion models struggle with discrete sequences — the scale of noise is decisive. DINOISER adaptively determines the range of sampled noise scales during training to counter discreteness, and amplifies inference-time noise scales so the model faithfully leverages source conditions. Consistent gains across conditional sequence-generation benchmarks; a toy sketch of the noise-scale clipping idea follows this entry.
diffusion model
sequence generation
noise schedule
conditional generation
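As a rough illustration of the idea above (an assumption-laden sketch, not the paper's exact recipe): during training, noise scales are sampled only above a floor large enough that denoising token embeddings never becomes trivially easy, and inference deliberately favours larger scales so the model must rely on the source condition. The values of sigma_min and sigma_max and the uniform sampling below are placeholders.

```python
# Toy sketch of noise-scale manipulation for embedding-space diffusion.
# sigma_min, sigma_max, and the sampling scheme below are illustrative assumptions.
import numpy as np

def sample_training_sigma(rng, sigma_min=0.5, sigma_max=10.0, batch=4):
    """Sample noise scales from a clipped range so that tiny scales,
    which make denoising trivially easy on discrete embeddings, never occur."""
    return rng.uniform(sigma_min, sigma_max, size=batch)

def add_noise(x0, sigma, rng):
    """Standard Gaussian corruption of token embeddings at scale sigma."""
    return x0 + sigma[:, None, None] * rng.standard_normal(x0.shape)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 16))   # (batch, length, embed_dim) toy embeddings
sigma = sample_training_sigma(rng)     # clipped noise scales for training
x_sigma = add_noise(x0, sigma, rng)
# At inference, one would deliberately emphasise larger sigmas so the denoiser
# must rely on the source condition rather than on nearly clean target embeddings.
print(sigma, x_sigma.shape)
```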
-
Deep Equilibrium Non-autoregressive Sequence Learning
Zaixiang Zheng, Yi Zhou, Hao Zhou.
ACL 2023.
Views iterative non-autoregressive translation as seeking a fixed point of a state-update map, and models it with a deep-equilibrium (DEQ) layer. This gives constant-memory training and enables adaptive computation at inference, closing the gap with autoregressive NMT while preserving the parallel-decoding advantage. A toy fixed-point sketch follows this entry.
non-autoregressive translation
deep equilibrium model
iterative refinement
adaptive computation
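A minimal toy sketch of the deep-equilibrium view (not the paper's model): the decoder state is the fixed point z* = f(z*, x) of a state-update map, found by iterating until the update stops changing, so the amount of refinement adapts per input. The contraction map f below is invented purely for illustration.

```python
# Toy deep-equilibrium (DEQ) solve: iterate a state-update map to its fixed point.
# The linear-tanh map f below stands in for the refinement layer; illustrative only.
import numpy as np

def fixed_point(f, z0, tol=1e-6, max_iter=100):
    """Iterate z <- f(z) until the update is smaller than tol (adaptive computation)."""
    z = z0
    for step in range(max_iter):
        z_next = f(z)
        if np.linalg.norm(z_next - z) < tol:
            return z_next, step + 1
        z = z_next
    return z, max_iter

rng = np.random.default_rng(0)
x = rng.standard_normal(8)               # toy "source" conditioning vector
W = 0.1 * rng.standard_normal((8, 8))    # small scale keeps the map a contraction
f = lambda z: np.tanh(W @ z + x)         # state-update map conditioned on x
z_star, n_steps = fixed_point(f, np.zeros(8))
# z_star plays the role of the converged decoder states from which all target
# tokens are predicted in parallel; DEQ training differentiates through the
# fixed point implicitly, so memory does not grow with the iteration count.
print(n_steps, np.round(z_star, 3))
```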
-
LAFT: Cross-lingual Transfer for Text Generation by Language-Agnostic Finetuning
Xianze Wu†,
Zaixiang Zheng¶,
Hao Zhou, Yong Yu.
INLG 2022 Oral, Best Short Paper.
Studies how to transfer a multilingual generation model to low-resource target languages without parallel data. LAFT finetunes with a language-agnostic objective that disentangles content from language identity, yielding consistent gains across summarization and data-to-text benchmarks.
cross-lingual transfer
text generation
low-resource nlg
language-agnostic finetuning
-
The Volctrans GLAT System: Non-autoregressive Translation Meets WMT 2021
Lihua Qian*,
Yi Zhou*,
Zaixiang Zheng*,
Yaoming Zhu, Zehui Lin, Jiangtao Feng, Shanbo Cheng, Lei Li, Mingxuan Wang, Hao Zhou.
WMT 2021 — Rank #1 on German→English, beating strong autoregressive systems.
Our WMT'21 submission built on the Glancing Transformer for fully parallel (non-autoregressive) translation. To our knowledge, it was the first parallel system scaled to a WMT-level setting, achieving 35.0 BLEU on German→English — the top score in the task, outperforming all strong autoregressive counterparts.
non-autoregressive translation
glancing transformer
wmt 2021
parallel decoding
-
Duplex Sequence-to-Sequence Learning for Reversible Machine Translation
Zaixiang Zheng, Hao Zhou, Shujian Huang, Jiajun Chen, Jingjing Xu, Lei Li.
NeurIPS 2021.
Introduces REDER, a reversible duplex sequence-to-sequence architecture in which the same network can be run forward or backward to translate in either direction. Shares parameters between source→target and target→source, improving data efficiency and enabling dual-direction cycle consistency within a single model.
reversible machine translation
duality
parameter sharing
cycle consistency
-
Vocabulary Learning via Optimal Transport for Neural Machine Translation
Jingjing Xu, Hao Zhou, Chun Gan, Zaixiang Zheng, Lei Li.
ACL 2021 Oral Best Paper.
Recasts subword vocabulary construction as an optimal-transport problem that balances entropy and vocabulary size under a principled marginal-utility objective. VOLT yields a search-free algorithm that finds strong vocabularies in minutes — not hours of brute-force BPE sweeps — and transfers well across 40+ language pairs.
subword vocabulary
optimal transport
neural machine translation
tokenization
-
Improving Self-Attention Networks with Sequential Relations
Zaixiang Zheng, Shujian Huang, Rongxiang Weng, Xin-Yu Dai, Jiajun Chen.
IEEE/ACM TASLP 2020.
Injects explicit sequential-relation inductive biases — relative distance and local-order cues — into self-attention, complementing position embeddings. Consistent improvements on machine translation, language modelling, and NLU benchmarks, with minimal compute overhead.
self-attention
inductive bias
sequence modeling
position encoding
-
Towards Making the Most of Context in Neural Machine Translation
Zaixiang Zheng*,
Xiang Yue*†,
Shujian Huang, Jiajun Chen, Alexandra Birch.
IJCAI 2020 Oral.
A document-level NMT framework that jointly models each sentence's local context with the global context of the whole document, on both the source and target sides. One unified model handles any document length — including isolated sentences — without separate sentence- vs. document-level training. Up to +2.1 BLEU over Transformer baselines, with benefits extending far beyond the usual two-or-three-sentence window.
document-level translation
context modeling
neural machine translation
long-range dependency
-
Mirror-Generative Neural Machine Translation
Zaixiang Zheng, Hao Zhou, Shujian Huang, Lei Li, Xin-Yu Dai, Jiajun Chen.
ICLR 2020 Oral (8/8/8).
MGNMT unifies source-to-target and target-to-source translation, together with the source- and target-side language models, in a single mirror-symmetric latent-variable model. This joint generative formulation lets the model exploit non-parallel monolingual data from both sides and naturally supports semi-supervised learning, bidirectional decoding, and reranking in one framework.
generative machine translation
latent variable model
semi-supervised learning
bidirectional decoding
-
Dynamic Past and Future for Neural Machine Translation
Zaixiang Zheng, Zhaopeng Tu, Shujian Huang, Xin-Yu Dai, Jiajun Chen.
EMNLP 2019.
Extends past/future modelling in NMT with a dynamic capsule that adaptively segments translated versus untranslated content during decoding, instead of relying on a fixed split. Consistent BLEU gains and more interpretable coverage behaviour on multiple WMT language pairs.
neural machine translation
coverage modeling
capsule network
decoding
-
Modeling Past and Future for Neural Machine Translation
Zaixiang Zheng*,
Hao Zhou*,
Shujian Huang, Lili Mou, Xin-Yu Dai, Jiajun Chen, Zhaopeng Tu.
TACL 2018 (presented at ACL 2018).
Proposes to explicitly split the source representation at every decoding step into a past part (already translated) and a future part (still to translate), with recurrent update rules that preserve this bookkeeping throughout decoding. Reduces over- and under-translation and improves BLEU across multiple WMT benchmarks.
neural machine translation
coverage
decoding
sequence modeling