Publications

The symbol (†) denotes student co-authors.

Beyond Pixel Histories: World Models with Persistent 3D State

Samuel Garcin, Thomas Walker, Steven McDonagh, Tim Pearce, Hakan Bilen, Tianyu He, Kaixin Wang, Jiang Bian

arXiv preprint arXiv:2603.03482 2026

A new world-model paradigm that simulates the evolution of a latent 3D scene: environment, camera, and renderer.

LIVE: Long-horizon Interactive Video World Modeling

Junchao Huang, Ziyang Ye, Xinting Hu, Tianyu He, Guiyu Zhang, Shaoshuai Shi, Jiang Bian, Li Jiang

arXiv preprint arXiv:2602.03747 2026

A framework that alleviates autoregressive error accumulation via a cycle-consistency constraint.

Luminark: Training-free, Probabilistically-Certified Watermarking for General Vision Generative Models

Jiayi Xu, Zhang Zhang, Yuanrui Zhang, Ruitao Chen, Yixian Xu, Tianyu He, Di He

arXiv preprint arXiv:2601.01085 2026

A training-free and probabilistically-certified watermarking method for general vision generative models.

Quotient-Space Diffusion Models

Yixian Xu, Yusong Wang, Shengjie Luo, Kaiyuan Gao, Tianyu He, Di He, Chang Liu

International Conference on Learning Representations (ICLR) Oral 2026

A formal framework for diffusion modeling on a general quotient space.

Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

Haoyu Wu, Diankun Wu, Tianyu He, Junliang Guo, Yang Ye, Yueqi Duan, Jiang Bian

International Conference on Learning Representations (ICLR) 2026

Combining video diffusion with 3D representation for geometrically consistent world modeling.

Fast Autoregressive Video Generation with Diagonal Decoding

Yang Ye, Junliang Guo, Haoyu Wu, Tianyu He, Tim Pearce, Tabish Rashid, Katja Hofmann, Jiang Bian

Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) 2026

Accelerating autoregressive video generation with a diagonal decoding strategy.

AR4D: Autoregressive 4D Generation from Monocular Videos

Hanxin Zhu, Tianyu He, Xiqian Yu, Junliang Guo, Zhibo Chen, Jiang Bian

Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) 2026

Autoregressive 4D content generation from monocular video input.

Memory Forcing: Spatio-Temporal Memory for Consistent Scene Generation on Minecraft

Junchao Huang, Xinting Hu, Boyao Han, Shaoshuai Shi, Zhuotao Tian, Tianyu He, Li Jiang

arXiv preprint arXiv:2510.03198 2025

A learning framework that pairs training protocols with a geometry-indexed spatial memory.

Reinforcement Learning with Inverse Rewards for World Model Post-training

Yang Ye, Tianyu He, Shuo Yang, Jiang Bian

arXiv preprint arXiv:2509.23958 2025

A post-training framework that derives verifiable reward signals by recovering input actions from generated videos using an Inverse Dynamics Model.

Sonic4D: Spatial Audio Generation for Immersive 4D Scene Exploration

Siyi Xie, Hanxin Zhu, Xinyi Chen, Tianyu He, Xin Li, Zhibo Chen

Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) 2025

Generating spatial audio for immersive 4D scene exploration.

Playing with Transformer at 30+ FPS via Next-Frame Diffusion

Xinle Cheng, Tianyu He, Jiayi Xu, Junliang Guo, Di He, Jiang Bian

arXiv preprint arXiv:2506.01380 2025

Achieving autoregressive video generation at 30+ FPS through next-frame diffusion.

MineWorld: A Real-Time and Open-Source Interactive World Model on Minecraft

Junliang Guo, Yang Ye, Tianyu He, Haoyu Wu, Yushu Jiang, Tim Pearce, Jiang Bian

arXiv preprint arXiv:2504.08388 2025

A real-time and open-source interactive world model built on Minecraft.

3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer

Jiajun Deng, Tianyu He, Li Jiang, Tianyu Wang, Feras Dayoub, Ian Reid

Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) 2025

A simple yet powerful 3D LMM designed to act as an intelligent assistant for comprehending, reasoning about, and interacting with the 3D world.

VidTwin: Video VAE with Decoupled Structure and Dynamics

Yuchi Wang, Junliang Guo, Xinyi Xie, Tianyu He, Xu Sun, Jiang Bian

Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) 2025

A video VAE approach that decouples structure and dynamics for improved video representation.

Video In-Context Learning: Autoregressive Transformers are Zero-Shot Video Imitators

Wentao Zhang, Junliang Guo, Tianyu He, Li Zhao, Linli Xu, Jiang Bian

International Conference on Learning Representations (ICLR) 2025

Demonstrating that autoregressive transformers can perform zero-shot video imitation via in-context learning.

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

Yuchi Wang, Junliang Guo, Jianhong Bai, Runyi Yu, Tianyu He, Xu Tan, Xu Sun, Jiang Bian

Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) 2025

Text-guided emotion and motion control for avatar generation.

UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing

Jianhong Bai, Tianyu He, Yuchi Wang, Junliang Guo, Haoji Hu, Zuozhu Liu, Jiang Bian

Proceedings of the ACM International Conference on Multimedia (ACM MM) 2025

A unified tuning-free framework for both video motion and appearance editing.

VidTok: A Versatile and Open-Source Video Tokenizer

Anni Tang, Tianyu He, Junliang Guo, Xinle Cheng, Li Song, Jiang Bian

arXiv preprint arXiv:2412.13061 2024

A versatile and open-source video tokenizer for video understanding and generation.

IGOR: Image-GOal Representations are the Atomic Control Units for Foundation Models in Embodied AI

Xiaoyu Chen, Junliang Guo, Tianyu He, Chuheng Zhang, Pushi Zhang, Derek Cathera Yang, Li Zhao, Jiang Bian

arXiv preprint arXiv:2411.00785 2024

Proposing latent action models for embodied AI foundation models.

Compositional 3D-aware Video Generation with LLM Director

Hanxin Zhu, Tianyu He, Anni Tang, Junliang Guo, Zhibo Chen, Jiang Bian

Advances in Neural Information Processing Systems (NeurIPS) 2024

Compositional 3D-aware video generation directed by large language models.

End-to-End Rate-Distortion Optimized 3D Gaussian Representation

Henan Wang, Hanxin Zhu, Tianyu He, Runsen Feng, Jiajun Deng, Jiang Bian, Zhibo Chen

European Conference on Computer Vision (ECCV) 2024

Formulating the compact 3D Gaussian learning as an end-to-end Rate-Distortion Optimization (RDO) problem.

Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?

Hanxin Zhu, Tianyu He, Xin Li, Bingchen Li, Zhibo Chen

Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) 2024

Investigating the sufficiency of vanilla MLP architectures in NeRF for few-shot view synthesis.

GAIA: Zero-Shot Talking Avatar Generation

Tianyu He, Junliang Guo, Runyi Yu, Yuchi Wang, Jialiang Zhu, Kaikai An, Leyi Li, Xu Tan, Chunyu Wang, Han Hu, HsiangTao Wu, Sheng Zhao, Jiang Bian

International Conference on Learning Representations (ICLR) 2024

Zero-shot talking avatar generation without subject-specific fine-tuning.

Memories are One-to-Many Mapping Alleviators in Talking Face Generation

Anni Tang, Tianyu He, Xu Tan, Jun Ling, Li Song

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2024

Using memory mechanisms to alleviate one-to-many mapping in talking face generation.

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder

Chenpeng Du, Qi Chen, Tianyu He, Xu Tan, Xie Chen, Kai Yu, Sheng Zhao, Jiang Bian

Proceedings of the ACM International Conference on Multimedia (ACM MM) 2023

High fidelity speech-driven talking face generation using diffusion autoencoder.

HiFace: High-Fidelity 3D Face Reconstruction by Learning Static and Dynamic Details

Zenghao Chai, Tianke Zhang, Tianyu He, Xu Tan, Tadas Baltrušaitis, HsiangTao Wu, Runnan Li, Sheng Zhao, Chun Yuan, Jiang Bian

Proceedings of the International Conference on Computer Vision (ICCV) 2023

High-fidelity 3D face reconstruction with both static and dynamic facial details.