-
Towards a Generative Evolution Machine with DPLM-Evo
Xinyou Wang*†,
Liang Hong*†,
Jiasheng Ye†,
Zaixiang Zheng¶,
Yu Li, Shujian Huang, Quanquan Gu.
ICML 2026. arXiv:2605.00182.
Presents DPLM-Evo, an evolutionary discrete-diffusion framework that explicitly models substitution, insertion, and deletion during denoising — aligning diffusion with how proteins actually evolve. A decoupled latent-alignment space enables indel-aware, variable-length generation, while a contextualised evolutionary noising kernel injects biologically informed mutation patterns. Achieves state-of-the-art mutation-effect prediction on ProteinGym (single-sequence) and supports simulated evolution and targeted post-editing of proteins.
discrete diffusion
protein evolution
indel-aware generation
mutation effect prediction
-
Protein Autoregressive Modeling via Multiscale Structure Generation
Yanru Qu*†,
Cheng-Yen Hsieh*,
Zaixiang Zheng,
Ge Liu, Quanquan Gu.
ICML 2026 Spotlight. arXiv:2602.04883.
Introduces PAR, the first multi-scale autoregressive framework for protein backbone generation via coarse-to-fine next-scale prediction — akin to sculpting a statue from rough topology to fine detail. A flow-based backbone decoder turns the AR transformer's multi-scale embeddings into atom-level coordinates, and noisy context learning + scheduled sampling mitigate exposure bias. Enables flexible prompted and motif-scaffolding generation in a zero-shot manner, with favourable scaling behaviour.
autoregressive generation
multi-scale modeling
protein backbone design
motif scaffolding
-
An All-atom Protein Generative Model for Designing Protein Complexes
Ruizhe Chen*†,
Dongyu Xue*,
Xiangxin Zhou†,
Zaixiang Zheng,
Xiangxiang Zeng, Quanquan Gu.
ICML 2025. arXiv:2504.13075.
Introduces APM, an all-atom generative model purpose-built for protein complexes. By integrating atom-level information and training on multi-chain data, APM accurately models inter-chain interactions and can design binding-competent complexes from scratch. It unifies multi-chain folding and inverse-folding in one backbone, and supports both supervised fine-tuning and zero-shot sampling for downstream design tasks.
protein complexes
all-atom generative model
multi-chain modeling
binder design
-
Elucidating the Design Space of Multimodal Protein Language Models
Cheng-Yen Hsieh*,
Xinyou Wang*†,
Daiheng Zhang†,
Dongyu Xue, Fei Ye, Shujian Huang,
Zaixiang Zheng¶,
Quanquan Gu.
ICML 2025 Spotlight. arXiv:2504.11454.
Systematically elucidates the design space of multimodal protein language models that tokenize 3D structure. Identifies tokenization loss and inaccurate structure-token prediction as the main bottlenecks, and proposes improvements across generative modeling, structure-aware architecture, and data. The resulting 650M DPLM-2.1 cuts PDB folding RMSD from 5.52 to 2.36, outperforming 3B baselines and matching specialized folding models.
multimodal protein language model
structure tokenization
folding
representation learning
-
DPLM-2: A Multimodal Diffusion Protein Language Model
Xinyou Wang†,
Zaixiang Zheng¶,
Fei Ye, Dongyu Xue, Shujian Huang, Quanquan Gu.
ICLR 2025 ByteDance Highlight. arXiv:2410.13782.
Extends DPLM into a multimodal model that jointly diffuses over protein sequence and 3D structure. DPLM-2 learns a unified distribution over sequence and structure, enabling simultaneous structure-sequence co-generation, structure-conditioned design (inverse folding), and sequence-conditioned folding within one pre-trained backbone — a single model for the full sequence-structure design cycle.
multimodal protein model
sequence-structure co-generation
discrete diffusion
inverse folding
-
ProteinBench: A Holistic Evaluation of Protein Foundation Models
Fei Ye*,
Zaixiang Zheng*,
Dongyu Xue*,
Yuning Shen*,
Lihao Wang*,
Yiming Ma, Yan Wang, Xinyou Wang, Xiangxin Zhou, Quanquan Gu.
ICLR 2025. arXiv:2409.06744.
ProteinBench is a holistic evaluation framework for protein foundation models, built on three pillars: (i) a taxonomy of tasks spanning the main protein modalities; (ii) multi-metric evaluation along quality, novelty, diversity, and robustness; and (iii) user-oriented analyses that expose current strengths and blind spots. Released with a public leaderboard, evaluation dataset, and modular toolkit as a living benchmark for the field.
benchmark
protein foundation models
holistic evaluation
leaderboard
-
Antigen-Specific Antibody Design via Direct Energy-based Preference Optimization
Xiangxin Zhou*†,
Dongyu Xue*,
Ruizhe Chen*†,
Zaixiang Zheng,
Liang Wang, Quanquan Gu.
NeurIPS 2024. arXiv:2403.16576.
Casts antigen-specific antibody design as preference optimization over a pre-trained conditional diffusion model that jointly models antibody sequence and structure. AbDPO fine-tunes with a residue-level decomposed energy preference and uses gradient surgery to resolve conflicts between attractive and repulsive forces. Sets state-of-the-art on the RAbD benchmark, simultaneously lowering total energy and improving binding affinity.
antibody design
preference optimization
diffusion model
antigen binding
-
Diffusion Language Models Are Versatile Protein Learners
Xinyou Wang*†,
Zaixiang Zheng*¶,
Fei Ye, Dongyu Xue, Shujian Huang, Quanquan Gu.
ICML 2024 ByteDance Highlight. arXiv:2402.18567.
Introduces DPLM, a versatile diffusion-based protein language model pre-trained on evolutionary-scale sequences. DPLM unifies protein representation learning and unconditional / conditional generation under a single discrete-diffusion objective, scales to billions of parameters, and enables controllable generation from arbitrary partial contexts without task-specific retraining — a foundational step toward general-purpose protein foundation models. A toy sketch of the masked-diffusion corruption behind this objective follows this entry.
protein foundation model
discrete diffusion
representation learning
controllable generation
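Below is a minimal, hypothetical sketch of the absorbing-state ("masked") corruption that a discrete-diffusion objective of this kind builds on; it is not the DPLM implementation, and the names MASK_ID and corrupt are illustrative placeholders. The idea: sample a noise level t, mask each residue independently with probability t, and train the denoiser to recover the masked positions.

```python
# Toy sketch of absorbing-state ("masked") discrete diffusion corruption.
# Illustrative only; MASK_ID and corrupt are not from the DPLM codebase.
import numpy as np

MASK_ID = 0  # hypothetical index of the absorbing [MASK] token

def corrupt(tokens: np.ndarray, t: float, rng: np.random.Generator) -> np.ndarray:
    """Mask each position independently with probability t (the noise level)."""
    noisy = tokens.copy()
    mask = rng.random(tokens.shape) < t
    noisy[mask] = MASK_ID
    return noisy

rng = np.random.default_rng(0)
seq = rng.integers(1, 21, size=16)   # a toy "protein" of 16 residue ids (1..20)
t = rng.random()                     # uniform noise level in [0, 1)
x_t = corrupt(seq, t, rng)
# A denoiser would be trained with cross-entropy to predict the original
# residues at the masked positions of x_t; sampling reverses the process,
# starting from an all-MASK sequence and unmasking a few positions per step.
print(t, x_t)
```

Because any subset of positions can be held fixed while the rest stay masked, this style of corruption is what makes generation from arbitrary partial contexts natural.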
-
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Jiasheng Ye†,
Zaixiang Zheng¶,
Yu Bao, Lihua Qian, Quanquan Gu.
Preprint. arXiv:2308.12219. The first exploration of diffusion LLMs above 10B parameters with discrete diffusion.
Demonstrates that diffusion language models can become strong general-purpose language learners once scaled. The recipe: acquire knowledge via masked-language-model pretraining, then reprogram the pretrained MLM into a diffusion LM through diffusive adaptation, followed by task- and instruction-finetuning. Instruction tuning elicits zero- and few-shot in-context learning and reasoning, making this the first demonstration of a competent diffusion LM above 10B parameters.
diffusion language model
instruction finetuning
in-context learning
non-autoregressive generation
-
Structure-informed Language Models Are Protein Designers
Zaixiang Zheng*¶,
Yifan Deng*†,
Dongyu Xue, Yi Zhou, Fei Ye, Quanquan Gu.
ICML 2023 Oral. arXiv:2302.01649.
Reframes protein inverse folding as structure-conditioned language modeling: a pretrained protein language model is lightly adapted with structural cues to directly generate sequences that fold to a given backbone. LM-Design sets new state-of-the-art on CATH benchmarks with a small fraction of prior compute, showing that strong sequence priors + minimal structural conditioning rival heavy structure-native models.
inverse folding
protein language model
structure-conditioned generation
parameter-efficient adaptation
-
DINOISER: Diffused Conditional Sequence Learning by Manipulating Noises
Jiasheng Ye†,
Zaixiang Zheng¶,
Yu Bao, Lihua Qian, Mingxuan Wang.
TACL 2024, Oral at ACL 2024, ByteDance Highlight. arXiv:2302.10025.
Diagnoses why continuous-embedding diffusion models struggle with discrete sequences — the scale of noise is decisive. DINOISER adaptively determines the range of sampled noise scales during training to counter discreteness, and amplifies inference-time noise scales so the model faithfully leverages source conditions. Consistent gains across conditional sequence-generation benchmarks; a toy sketch of the noise-scale clipping idea follows this entry.
diffusion model
sequence generation
noise schedule
conditional generation
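As a rough illustration of the idea above (an assumption-laden sketch, not the paper's exact recipe): during training, noise scales are sampled only above a floor large enough that denoising token embeddings never becomes trivially easy, and inference deliberately favours larger scales so the model must rely on the source condition. The values of sigma_min and sigma_max and the uniform sampling below are placeholders.

```python
# Toy sketch of noise-scale manipulation for embedding-space diffusion.
# sigma_min, sigma_max, and the sampling scheme below are illustrative assumptions.
import numpy as np

def sample_training_sigma(rng, sigma_min=0.5, sigma_max=10.0, batch=4):
    """Sample noise scales from a clipped range so that tiny scales,
    which make denoising trivially easy on discrete embeddings, never occur."""
    return rng.uniform(sigma_min, sigma_max, size=batch)

def add_noise(x0, sigma, rng):
    """Standard Gaussian corruption of token embeddings at scale sigma."""
    return x0 + sigma[:, None, None] * rng.standard_normal(x0.shape)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 16))   # (batch, length, embed_dim) toy embeddings
sigma = sample_training_sigma(rng)     # clipped noise scales for training
x_sigma = add_noise(x0, sigma, rng)
# At inference, one would deliberately emphasise larger sigmas so the denoiser
# must rely on the source condition rather than on nearly clean target embeddings.
print(sigma, x_sigma.shape)
```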
-
Deep Equilibrium Non-autoregressive Sequence Learning
Zaixiang Zheng, Yi Zhou, Hao Zhou.
ACL 2023.
Views iterative non-autoregressive translation as seeking a fixed point of a state-update map, and models it with a deep-equilibrium (DEQ) layer. This gives constant-memory training and enables adaptive computation at inference, closing the gap with autoregressive NMT while preserving the parallel-decoding advantage. A toy fixed-point sketch follows this entry.
non-autoregressive translation
deep equilibrium model
iterative refinement
adaptive computation
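A minimal toy sketch of the deep-equilibrium view (not the paper's model): the decoder state is the fixed point z* = f(z*, x) of a state-update map, found by iterating until the update stops changing, so the amount of refinement adapts per input. The contraction map f below is invented purely for illustration.

```python
# Toy deep-equilibrium (DEQ) solve: iterate a state-update map to its fixed point.
# The linear-tanh map f below stands in for the refinement layer; illustrative only.
import numpy as np

def fixed_point(f, z0, tol=1e-6, max_iter=100):
    """Iterate z <- f(z) until the update is smaller than tol (adaptive computation)."""
    z = z0
    for step in range(max_iter):
        z_next = f(z)
        if np.linalg.norm(z_next - z) < tol:
            return z_next, step + 1
        z = z_next
    return z, max_iter

rng = np.random.default_rng(0)
x = rng.standard_normal(8)               # toy "source" conditioning vector
W = 0.1 * rng.standard_normal((8, 8))    # small scale keeps the map a contraction
f = lambda z: np.tanh(W @ z + x)         # state-update map conditioned on x
z_star, n_steps = fixed_point(f, np.zeros(8))
# z_star plays the role of the converged decoder states from which all target
# tokens are predicted in parallel; DEQ training differentiates through the
# fixed point implicitly, so memory does not grow with the iteration count.
print(n_steps, np.round(z_star, 3))
```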
-
LAFT: Cross-lingual Transfer for Text Generation by Language-Agnostic Finetuning
Xianze Wu†,
Zaixiang Zheng¶,
Hao Zhou, Yong Yu.
INLG 2022 Oral, Best Short Paper.
Studies how to transfer a multilingual generation model to low-resource target languages without parallel data. LAFT finetunes with a language-agnostic objective that disentangles content from language identity, yielding consistent gains across summarization and data-to-text benchmarks.
cross-lingual transfer
text generation
low-resource nlg
language-agnostic finetuning
-
The Volctrans GLAT System: Non-autoregressive Translation Meets WMT 2021
Lihua Qian*,
Yi Zhou*,
Zaixiang Zheng*,
Yaoming Zhu, Zehui Lin, Jiangtao Feng, Shanbo Cheng, Lei Li, Mingxuan Wang, Hao Zhou.
WMT 2021 — Rank #1 on German→English, beating strong autoregressive systems.
Our WMT'21 submission built on the Glancing Transformer for fully parallel (non-autoregressive) translation. To our knowledge, it was the first parallel system scaled to a WMT-level setting, achieving 35.0 BLEU on German→English — the top score in the task, outperforming all strong autoregressive counterparts.
non-autoregressive translation
glancing transformer
wmt 2021
parallel decoding
-
Duplex Sequence-to-Sequence Learning for Reversible Machine Translation
Zaixiang Zheng, Hao Zhou, Shujian Huang, Jiajun Chen, Jingjing Xu, Lei Li.
NeurIPS 2021.
Introduces REDER, a reversible duplex sequence-to-sequence architecture in which the same network can be run forward or backward to translate in either direction. Shares parameters between source→target and target→source, improving data efficiency and enabling dual-direction cycle consistency within a single model.
reversible machine translation
duality
parameter sharing
cycle consistency
-
Vocabulary Learning via Optimal Transport for Neural Machine Translation
Jingjing Xu, Hao Zhou, Chun Gan, Zaixiang Zheng, Lei Li.
ACL 2021 Oral Best Paper.
Recasts subword vocabulary construction as an optimal-transport problem that balances entropy and vocabulary size under a principled marginal-utility objective. VOLT yields a search-free algorithm that finds strong vocabularies in minutes — not hours of brute-force BPE sweeps — and transfers well across 40+ language pairs.
subword vocabulary
optimal transport
neural machine translation
tokenization
-
Improving Self-Attention Networks with Sequential Relations
Zaixiang Zheng, Shujian Huang, Rongxiang Weng, Xin-Yu Dai, Jiajun Chen.
IEEE/ACM TASLP 2020.
Injects explicit sequential-relation inductive biases — relative distance and local-order cues — into self-attention, complementing position embeddings. Consistent improvements on machine translation, language modelling, and NLU benchmarks, with minimal compute overhead.
self-attention
inductive bias
sequence modeling
position encoding
-
Towards Making the Most of Context in Neural Machine Translation
Zaixiang Zheng*,
Xiang Yue*†,
Shujian Huang, Jiajun Chen, Alexandra Birch.
IJCAI 2020 Oral.
A document-level NMT framework that jointly models each sentence's local context with the global context of the whole document, on both the source and target sides. One unified model handles any document length — including isolated sentences — without separate sentence- vs. document-level training. Up to +2.1 BLEU over Transformer baselines, with benefits extending far beyond the usual two-or-three-sentence window.
document-level translation
context modeling
neural machine translation
long-range dependency
-
Mirror-Generative Neural Machine Translation
Zaixiang Zheng, Hao Zhou, Shujian Huang, Lei Li, Xin-Yu Dai, Jiajun Chen.
ICLR 2020 Oral (8/8/8).
MGNMT unifies source-to-target and target-to-source translation, together with the source- and target-side language models, in a single mirror-symmetric latent-variable model. This joint generative formulation lets the model exploit non-parallel monolingual data from both sides and naturally supports semi-supervised learning, bidirectional decoding, and reranking in one framework.
generative machine translation
latent variable model
semi-supervised learning
bidirectional decoding
-
Dynamic Past and Future for Neural Machine Translation
Zaixiang Zheng, Zhaopeng Tu, Shujian Huang, Xin-Yu Dai, Jiajun Chen.
EMNLP 2019.
Extends past/future modelling in NMT with a dynamic capsule that adaptively segments translated versus untranslated content during decoding, instead of relying on a fixed split. Consistent BLEU gains and more interpretable coverage behaviour on multiple WMT language pairs.
neural machine translation
coverage modeling
capsule network
decoding
-
Modeling Past and Future for Neural Machine Translation
Zaixiang Zheng*,
Hao Zhou*,
Shujian Huang, Lili Mou, Xin-Yu Dai, Jiajun Chen, Zhaopeng Tu.
TACL 2018 (presented at ACL 2018).
Proposes to explicitly split the source representation at every decoding step into a past part (already translated) and a future part (still to translate), with recurrent update rules that preserve this bookkeeping throughout decoding. Reduces over- and under-translation and improves BLEU across multiple WMT benchmarks.
neural machine translation
coverage
decoding
sequence modeling