Zaixiang Zheng 郑在翔

Research Scientist · ByteDance Seed · AGI · AI for Science

Please reach out via my personal email if I don't respond promptly elsewhere.

Millennium Bridge, London, 2020.

About

I build multimodal foundation models and generative AI systems for machine intelligence and life sciences, spanning scalable generative modeling (e.g., LLMs & diffusion), post-training & alignment, and fast, steerable inference.

I am currently a research lead at ByteDance Seed — AI for Science, working with Prof. Quanquan Gu, where I lead the research and development of multimodal generative biomolecular foundation models — delivering the DPLM family of multimodal diffusion protein language models, and applying generative protein design to antibody design at all-atom resolution with energy-based preference alignment. I also pioneered diffusion language models (dLLMs) and built the first 10B-scale dLLM back in mid-2023.

🚀 To learn more about our research roadmap, visit our project page at https://bytedance.github.io/dplm.

Before joining ByteDance, I completed my five-year Ph.D. in Computer Science at NJUNLP Lab, Nanjing University (2016–2021), advised by Prof. Jiajun Chen and Prof. Shujian Huang. During my Ph.D., I spent a wonderful year at ILCC, University of Edinburgh with Prof. Alexandra Birch (2019–2020), and interned at ByteDance AI Lab with Prof. Hao Zhou and Prof. Lei Li (2020–2021).

News

  • Apr 2026 DPLM-Evo and PAR are accepted to ICML 2026. Congrats to the team — see you in Seoul!
  • May 2025 APM and DPLM-2.1 (Spotlight) are accepted to ICML 2025.
  • Apr 2025 Serving as an Area Chair for NeurIPS 2025.
  • Jan 2025 DPLM-2 and ProteinBench are accepted to ICLR 2025. DPLM-2 is a ByteDance ICLR 2025 Research Highlight.
  • May 2024 DPLM — a versatile diffusion protein foundation model — is accepted to ICML 2024. Invited talk at the ML for Protein Engineering seminar.
  • Apr 2024 DINOISER is accepted to TACL; also selected as an oral at ACL 2024.
  • Jan 2024 Serving as an Area Chair for ACL 2024.
  • Dec 2023 Serving as an Area Chair for NAACL 2024.
  • Nov 2023 Invited talk about LM-DESIGN at the ML for Protein Engineering seminar.
  • Oct 2023 Serving as an Area Chair for EACL 2024.
  • May–Oct 2023 Invited talks on deep generative sequence modeling for human languages and proteins at UC Santa Barbara, Tongji Univ., TechBeat, MLNLP seminar, IWNLG and SUFE.
  • Apr 2023 Deep Equilibrium Non-autoregressive Sequence Learning is accepted to Findings of ACL 2023.
  • Apr 2023 LM-Design is accepted to ICML 2023 as an oral presentation.
  • Nov 2022 Received the CIPS Best Doctoral Dissertation Award.
  • Jul 2022 Our LAFT paper received the Best Short-Paper Award at INLG 2022.
  • Oct 2021 REDER is accepted to NeurIPS 2021.
  • Jul 2021 Joined ByteDance as a research scientist.
  • Jun 2021 Passed my viva and received the Ph.D. degree.

Research Highlights

Protein LM for Design

LM-Design

The first reprogramming of pre-trained protein LMs into structure-conditioned masked diffusion for protein sequence design — +10–15% over prior SOTA.

DPLM Family

DPLM · DPLM-2 · DPLM-2.1 · DPLM-Evo

A unified multimodal protein foundation model built on discrete diffusion — scaling from sequence to sequence+structure, with bit-based tokens, flow-based decoding, and evolutionary edit-based diffusion (substitutions / insertions / deletions).

Steerable Generative Protein Design

AbDPO · APM · PAR

Fast and steerable protein generation for antibody & complex design — energy-based preference optimization, all-atom generative complex modeling, and the first multiscale autoregressive structure generator with 2.5× faster sampling.

dLLM Pioneer

Diffusion-LLM · DINOISER

The first 10B-scale discrete diffusion language model demonstrating scaling & instruction finetuning, and principled noise manipulation for Gaussian text diffusion.

Benchmark

ProteinBench

A unified task taxonomy and multi-axis evaluation (quality / novelty / diversity / robustness) of protein foundation models with transparent tooling.

NMT · Generative Models · Pre-LLM Era

MGNMT · REDER · Volctrans GLAT

Mirror-generative translation (ICLR 2020 Oral, 8/8/8), duplex reversible seq2seq (NeurIPS 2021), and the non-autoregressive system that ranked #1 at WMT 2021 De→En.

Selected Publications

[Google Scholar full list]  ·  project (co-)lead, * equal contribution, student/intern I mentored.

  1. Towards a Generative Evolution Machine with DPLM-Evo
    Xinyou Wang*, Liang Hong*, Jiasheng Ye, Zaixiang Zheng, Yu Li, Shujian Huang, Quanquan Gu.
    ICML 2026. arXiv:2605.00182.

    Presents DPLM-Evo, an evolutionary discrete-diffusion framework that explicitly models substitution, insertion, and deletion during denoising — aligning diffusion with how proteins actually evolve. A decoupled latent-alignment space enables indel-aware, variable-length generation, while a contextualised evolutionary noising kernel injects biologically informed mutation patterns. Achieves state-of-the-art mutation-effect prediction on ProteinGym (single-sequence) and supports simulated evolution and targeted post-editing of proteins.

    discrete diffusion · protein evolution · indel-aware generation · mutation effect prediction
  2. Protein Autoregressive Modeling via Multiscale Structure Generation
    Yanru Qu*, Cheng-Yen Hsieh*, Zaixiang Zheng, Ge Liu, Quanquan Gu.
    ICML 2026 Spotlight. arXiv:2602.04883.

    Introduces PAR, the first multi-scale autoregressive framework for protein backbone generation via coarse-to-fine next-scale prediction — akin to sculpting a statue from rough topology to fine detail. A flow-based backbone decoder turns the AR transformer's multi-scale embeddings into atom-level coordinates, and noisy context learning + scheduled sampling mitigate exposure bias. Enables flexible zero-shot prompted and motif-scaffolding generation, with favourable scaling behaviour.

    autoregressive generation · multi-scale modeling · protein backbone design · motif scaffolding
  3. An All-atom Protein Generative Model for Designing Protein Complexes
    Ruizhe Chen*, Dongyu Xue*, Xiangxin Zhou, Zaixiang Zheng, Xiangxiang Zeng, Quanquan Gu.
    ICML 2025. arXiv:2504.13075.

    Introduces APM, an all-atom generative model purpose-built for protein complexes. By integrating atom-level information and training on multi-chain data, APM accurately models inter-chain interactions and can design binding-competent complexes from scratch. It unifies multi-chain folding and inverse-folding in one backbone, and supports both supervised fine-tuning and zero-shot sampling for downstream design tasks.

    protein complexes · all-atom generative model · multi-chain modeling · binder design
  4. Elucidating the Design Space of Multimodal Protein Language Models
    Cheng-Yen Hsieh*, Xinyou Wang*, Daiheng Zhang, Dongyu Xue, Fei Ye, Shujian Huang, Zaixiang Zheng, Quanquan Gu.
    ICML 2025 Spotlight. arXiv:2504.11454.

    Systematically elucidates the design space of multimodal protein language models that tokenize 3D structure. Identifies tokenization loss and inaccurate structure-token prediction as the main bottlenecks, and proposes improvements across generative modeling, structure-aware architecture, and data. The resulting 650M DPLM-2.1 cuts PDB folding RMSD from 5.52 to 2.36, outperforming 3B baselines and matching specialized folding models.

    multimodal protein language model · structure tokenization · folding · representation learning
  5. DPLM-2: A Multimodal Diffusion Protein Language Model
    Xinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, Quanquan Gu.
    ICLR 2025; ByteDance Research Highlight. arXiv:2410.13782.

    Extends DPLM into a multimodal model that jointly diffuses over protein sequence and 3D structure. DPLM-2 learns a unified distribution over sequence and structure, enabling simultaneous structure-sequence co-generation, structure-conditioned design (inverse folding), and sequence-conditioned folding within one pre-trained backbone — a single model for the full sequence-structure design cycle.

    multimodal protein model · sequence-structure co-generation · discrete diffusion · inverse folding
  6. ProteinBench: A Holistic Evaluation of Protein Foundation Models
    Fei Ye*, Zaixiang Zheng*, Dongyu Xue*, Yuning Shen*, Lihao Wang*, Yiming Ma, Yan Wang, Xinyou Wang, Xiangxin Zhou, Quanquan Gu.
    ICLR 2025. arXiv:2409.06744.

    ProteinBench is a holistic evaluation framework for protein foundation models, built on three pillars: (i) a taxonomy of tasks spanning the main protein modalities; (ii) multi-metric evaluation along quality, novelty, diversity, and robustness; and (iii) user-oriented analyses that expose current strengths and blind spots. Released with a public leaderboard, evaluation dataset, and modular toolkit as a living benchmark for the field.

    benchmark · protein foundation models · holistic evaluation · leaderboard
  7. Antigen-Specific Antibody Design via Direct Energy-based Preference Optimization
    Xiangxin Zhou*, Dongyu Xue*, Ruizhe Chen*, Zaixiang Zheng, Liang Wang, Quanquan Gu.
    NeurIPS 2024. arXiv:2403.16576.

    Casts antigen-specific antibody design as preference optimization over a pre-trained conditional diffusion model that jointly models antibody sequence and structure. AbDPO fine-tunes with a residue-level decomposed energy preference and uses gradient surgery to resolve conflicts between attractive and repulsive forces. Sets state-of-the-art on the RAbD benchmark, simultaneously lowering total energy and improving binding affinity.

    antibody design · preference optimization · diffusion model · antigen binding
  8. Diffusion Language Models Are Versatile Protein Learners
    Xinyou Wang*, Zaixiang Zheng*, Fei Ye, Dongyu Xue, Shujian Huang, Quanquan Gu.
    ICML 2024; ByteDance Research Highlight. arXiv:2402.18567.

    Introduces DPLM, a versatile diffusion-based protein language model pre-trained on evolutionary-scale sequences. DPLM unifies protein representation learning and unconditional / conditional generation under a single discrete-diffusion objective, scales to billions of parameters, and enables controllable generation from arbitrary partial contexts without task-specific retraining — a foundational step toward general-purpose protein foundation models.

    protein foundation model · discrete diffusion · representation learning · controllable generation
  9. Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
    Jiasheng Ye, Zaixiang Zheng, Yu Bao, Lihua Qian, Quanquan Gu.
    Preprint, arXiv:2308.12219. The first exploration of dLLMs beyond 10B parameters with discrete diffusion.

    Demonstrates that diffusion language models can become strong general-purpose language learners once scaled. The recipe: acquire knowledge via masked-language-model pretraining, then reprogram the pretrained MLM into a diffusion LM through diffusive adaptation, followed by task- and instruction-finetuning. Instruction tuning elicits zero- and few-shot in-context learning and reasoning, making this the first demonstration of a competent diffusion LM above 10B parameters.

    diffusion language model · instruction finetuning · in-context learning · non-autoregressive generation
  10. Structure-informed Language Models Are Protein Designers
    Zaixiang Zheng*, Yifan Deng*, Dongyu Xue, Yi Zhou, Fei Ye, Quanquan Gu.
    ICML 2023 Oral. arXiv:2302.01649.

    Reframes protein inverse folding as structure-conditioned language modeling: a pretrained protein language model is lightly adapted with structural cues to directly generate sequences that fold to a given backbone. LM-Design sets new state-of-the-art on CATH benchmarks with a small fraction of prior compute, showing that strong sequence priors + minimal structural conditioning rival heavy structure-native models.

    inverse folding · protein language model · structure-conditioned generation · parameter-efficient adaptation
  11. DINOISER: Diffused Conditional Sequence Learning by Manipulating Noises
    Jiasheng Ye, Zaixiang Zheng, Yu Bao, Lihua Qian, Mingxuan Wang.
    TACL 2024 (oral at ACL 2024); ByteDance Research Highlight. arXiv:2302.10025.

    Diagnoses why continuous-embedding diffusion models struggle with discrete sequences — the scale of noise is decisive. DINOISER adaptively determines the range of sampled noise scales during training to counter discreteness, and amplifies inference-time noise scales so the model faithfully leverages source conditions. Consistent gains across conditional sequence-generation benchmarks.

    diffusion model · sequence generation · noise schedule · conditional generation
  12. Deep Equilibrium Non-autoregressive Sequence Learning
    Zaixiang Zheng, Yi Zhou, Hao Zhou.
    Findings of ACL 2023.

    Views iterative non-autoregressive translation as seeking a fixed point of a state-update map, and models it with a deep-equilibrium (DEQ) layer. This gives constant-memory training and enables adaptive computation at inference, closing the gap with autoregressive NMT while preserving the parallel-decoding advantage.

    non-autoregressive translation · deep equilibrium model · iterative refinement · adaptive computation
  13. LAFT: Cross-lingual Transfer for Text Generation by Language-Agnostic Finetuning
    Xianze Wu, Zaixiang Zheng, Hao Zhou, Yong Yu.
    INLG 2022 Oral; Best Short-Paper Award.

    Studies how to transfer a multilingual generation model to low-resource target languages without parallel data. LAFT finetunes with a language-agnostic objective that disentangles content from language identity, yielding consistent gains across summarization and data-to-text benchmarks.

    cross-lingual transfer · text generation · low-resource NLG · language-agnostic finetuning
  14. The Volctrans GLAT System: Non-autoregressive Translation Meets WMT 2021
    Lihua Qian*, Yi Zhou*, Zaixiang Zheng*, Yaoming Zhu, Zehui Lin, Jiangtao Feng, Shanbo Cheng, Lei Li, Mingxuan Wang, Hao Zhou.
    WMT 2021; ranked #1 on German→English, beating strong autoregressive systems.

    Our WMT'21 submission built on the Glancing Transformer for fully parallel (non-autoregressive) translation. To our knowledge the first parallel system scaled to a WMT-level setting, it achieves 35.0 BLEU on German→English — the top score in the task, outperforming all strong autoregressive counterparts.

    non-autoregressive translation · glancing transformer · WMT 2021 · parallel decoding
  15. Duplex Sequence-to-Sequence Learning for Reversible Machine Translation
    Zaixiang Zheng, Hao Zhou, Shujian Huang, Jiajun Chen, Jingjing Xu, Lei Li.
    NeurIPS 2021.

    Introduces REDER, a reversible duplex sequence-to-sequence architecture in which the same network parameters can be executed forward or backward to translate in both directions. Shares parameters between source→target and target→source, improving data efficiency and enabling dual-direction cycle consistency within a single model.

    reversible machine translation · duality · parameter sharing · cycle consistency
  16. Vocabulary Learning via Optimal Transport for Neural Machine Translation
    Jingjing Xu, Hao Zhou, Chun Gan, Zaixiang Zheng, Lei Li.
    ACL 2021 Oral; Best Paper Award.

    Recasts subword vocabulary construction as an optimal-transport problem that balances entropy and vocabulary size under a principled marginal-utility objective. VOLT yields a search-free algorithm that finds strong vocabularies in minutes — not hours of brute-force BPE sweeps — and transfers well across 40+ language pairs.

    subword vocabulary · optimal transport · neural machine translation · tokenization
  17. Improving Self-Attention Networks with Sequential Relations
    Zaixiang Zheng, Shujian Huang, Rongxiang Weng, Xin-Yu Dai, Jiajun Chen.
    IEEE/ACM TASLP 2020.

    Injects explicit sequential-relation inductive biases — relative distance and local-order cues — into self-attention, complementing position embeddings. Consistent improvements on machine translation, language modelling, and NLU benchmarks, with minimal compute overhead.

    self-attention · inductive bias · sequence modeling · position encoding
  18. Towards Making the Most of Context in Neural Machine Translation
    Zaixiang Zheng*, Xiang Yue*, Shujian Huang, Jiajun Chen, Alexandra Birch.
    IJCAI 2020 Oral.

    A document-level NMT framework that jointly models each sentence's local context with the global context of the whole document, in both source and target sides. One unified model handles any document length — including isolated sentences — without separate sentence- vs. document-level training. Up to +2.1 BLEU over Transformer baselines, with benefit extending far beyond the usual two-or-three-sentence window.

    document-level translation · context modeling · neural machine translation · long-range dependency
  19. Mirror-Generative Neural Machine Translation
    Zaixiang Zheng, Hao Zhou, Shujian Huang, Lei Li, Xin-Yu Dai, Jiajun Chen.
    ICLR 2020 Oral (8/8/8).

    MGNMT unifies source-to-target and target-to-source translation along with both language models into a single mirror-symmetric latent-variable model. This joint generative formulation lets the model exploit non-parallel monolingual data from both sides and naturally supports semi-supervised learning, bidirectional decoding, and reranking in one framework.

    generative machine translation · latent variable model · semi-supervised learning · bidirectional decoding
  20. Dynamic Past and Future for Neural Machine Translation
    Zaixiang Zheng, Zhaopeng Tu, Shujian Huang, Xin-Yu Dai, Jiajun Chen.
    EMNLP 2019.

    Extends past/future modelling in NMT with a dynamic capsule that adaptively segments translated versus untranslated content during decoding, instead of relying on a fixed split. Consistent BLEU gains and more interpretable coverage behaviour on multiple WMT language pairs.

    neural machine translation · coverage modeling · capsule network · decoding
  21. Modeling Past and Future for Neural Machine Translation
    Zaixiang Zheng*, Hao Zhou*, Shujian Huang, Lili Mou, Xin-Yu Dai, Jiajun Chen, Zhaopeng Tu.
    TACL 2018 (presented at ACL 2018).

    Proposes to explicitly split the source representation at every decoding step into a past part (already translated) and a future part (still to translate), with recurrent update rules that preserve this bookkeeping throughout decoding. Reduces over- and under-translation and improves BLEU across multiple WMT benchmarks.

    neural machine translation · coverage · decoding · sequence modeling

Awards & Talks

Awards

Invited Talks (selected)

Academic Services

Mentoring

I am fortunate and grateful to work with the following talented students and interns:

Misc.