Research
I'm interested in computer vision and generative AI, with a main focus on diffusion-based image and video generation.
FlowR2A: Learning Reward-to-Action Distribution for Multimodal Driving Planning
Xirui Li, Zhe Liu, Xiaoqing Ye, Wenhua Han, Yifeng Pan, Junyu Han, Hengshuang Zhao
arXiv, 2026.
arxiv /
project page /
code /
A flow-matching planner that learns the reward-to-action distribution from dense trajectory-reward pairs for multimodal driving planning.
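FlowR2A is described as a flow-matching planner. The sketch below is a rough, generic illustration of flow matching itself, not the FlowR2A model. It builds the regression target along a straight path from noise to an action and then samples with fixed Euler steps. The conditioning argument `cond`, all helper names, and the analytic `toy_velocity` field, which stands in for a trained network, are assumptions made for illustration.

```python
# Toy sketch of conditional flow matching on a straight (linear) path.
# Assumptions: `cond` stands in for the conditioning signal a trained network
# would receive; `toy_velocity` is an analytic field, not a learned model.

def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target along the linear path: d x_t / dt = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(velocity_fn, x0, x1, t, cond):
    """Mean squared error between predicted and target velocity for one sample."""
    x_t = interpolate(x0, x1, t)
    pred = velocity_fn(x_t, t, cond)
    tgt = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred, tgt)) / len(tgt)


def euler_sample(velocity_fn, x0, cond, steps=10):
    """Integrate dx/dt = v(x, t, cond) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt  # t stays below 1, so toy_velocity never divides by zero
        v = velocity_fn(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x, t, cond):
    """Hypothetical stand-in for a network: flows straight toward the action `cond`."""
    return [(g - xi) / (1.0 - t) for g, xi in zip(cond, x)]


noise = [0.5, -1.2]   # sample from the source (noise) distribution
action = [2.0, 3.0]   # hypothetical target action, e.g. a waypoint offset
print(flow_matching_loss(toy_velocity, noise, action, 0.3, action))  # near 0
print(euler_sample(toy_velocity, noise, action))  # approximately [2.0, 3.0]
```

Along the linear path, the optimal velocity toward a single target is exactly `x1 - x0`. For that reason the toy field drives the loss to nearly zero and the Euler sampler lands on the target. A real planner would learn this field from data.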
UniCon: A Simple Approach to Unifying Diffusion-based Conditional Generation
Xirui Li, Charles Herrmann, Kelvin C.K. Chan, Yinxiao Li, Deqing Sun, Chao Ma, Ming-Hsuan Yang
ICLR, 2025.
arxiv /
project page /
code /
A simple, unified framework that handles diverse conditional generation tasks, each involving a specific image-condition correlation, within a single diffusion model.
VidToMe: Video Token Merging for Zero-Shot Video Editing
Xirui Li, Chao Ma, Xiaokang Yang, Ming-Hsuan Yang
CVPR, 2024.
arxiv /
project page /
code /
A zero-shot video editing method that uses a pretrained image diffusion model. The key idea is to enforce temporal consistency by merging self-attention tokens across frames.
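The key idea above is merging self-attention tokens across frames. The sketch below shows one simplified way such merging can work. It borrows the bipartite matching scheme from Token Merging (ToMe): each token of a frame is matched to its most similar key-frame token by cosine similarity, and the `r` best-matching tokens are averaged into their matches. After attention, the merged positions are filled back in from the shared token. This is not the VidToMe implementation. The function names, the use of a single key frame, and the parameter `r` are hypothetical.

```python
import math

# Simplified ToMe-style bipartite merging of one frame's tokens into a key frame.
# Assumptions: tokens are plain float lists; the attention call that would run
# on the reduced token set is omitted.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)


def merge_into_keyframe(src, dst, r):
    """Average the r src tokens with the highest best-match similarity into
    their matched dst tokens.

    Returns (merged_dst, kept, mapping): `kept` lists the src indices left
    unmerged, and `mapping` sends each merged src index to its dst index.
    """
    matches = []
    for i, s in enumerate(src):
        scores = [cosine(s, d) for d in dst]
        j = max(range(len(dst)), key=scores.__getitem__)
        matches.append((scores[j], i, j))
    matches.sort(reverse=True)  # most similar pairs first
    mapping = {i: j for _, i, j in matches[:r]}

    sums = [list(d) for d in dst]
    counts = [1] * len(dst)
    for i, j in mapping.items():
        sums[j] = [a + b for a, b in zip(sums[j], src[i])]
        counts[j] += 1
    merged_dst = [[v / c for v in s] for s, c in zip(sums, counts)]
    kept = [i for i in range(len(src)) if i not in mapping]
    return merged_dst, kept, mapping


def unmerge(merged_dst, kept_tokens, kept, mapping, n_src):
    """Restore one token per src position: kept tokens pass through, and
    merged ones copy the shared dst token they were merged into."""
    out = [None] * n_src
    for k, i in enumerate(kept):
        out[i] = kept_tokens[k]
    for i, j in mapping.items():
        out[i] = list(merged_dst[j])
    return out


key = [[1.0, 0.0], [0.0, 1.0]]                # key-frame tokens (dst)
frame = [[0.9, 0.1], [0.1, 0.9], [1.0, 1.0]]  # another frame's tokens (src)
merged_dst, kept, mapping = merge_into_keyframe(frame, key, r=2)
print(len(merged_dst) + len(kept))  # 3 tokens enter attention instead of 5
restored = unmerge(merged_dst, [frame[i] for i in kept], kept, mapping, len(frame))
```

Because similar tokens from different frames share one representation through attention, their outputs are identical by construction. This offers one intuition for how merging can encourage temporal consistency.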
Frame Fusion with Vehicle Motion Prediction for 3D Object Detection
Xirui Li, Feng Wang, Naiyan Wang, Chao Ma
ICRA, 2024.
arxiv /
A detection enhancement method that improves 3D object detection by forwarding historical detection results to the current frame and fusing them with current detections.
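The general idea of forwarding past detections to the current frame and fusing them with current detections can be sketched roughly as follows. This is not the paper's method: the paper predicts vehicle motion, while this sketch swaps in a simple constant-velocity model. It also matches boxes by bird's-eye-view (BEV) center distance and fuses them by score-weighted averaging. The `Box` fields and the `radius` and `decay` parameters are hypothetical.

```python
import math
from dataclasses import dataclass, replace

# Assumptions: boxes are reduced to BEV centers with velocities; history boxes
# are forwarded with a constant-velocity model (a swapped-in simplification);
# `radius` and `decay` are hypothetical parameters.

@dataclass
class Box:
    x: float      # BEV center x (m)
    y: float      # BEV center y (m)
    vx: float     # velocity x (m/s)
    vy: float     # velocity y (m/s)
    score: float  # detection confidence


def forward(box, dt):
    """Move a past detection dt seconds ahead under constant velocity."""
    return replace(box, x=box.x + box.vx * dt, y=box.y + box.vy * dt)


def weighted(group, attr):
    """Score-weighted mean of one attribute over a group of boxes."""
    total = sum(b.score for b in group)
    return sum(getattr(b, attr) * b.score for b in group) / total


def fuse(current, history, dt, radius=2.0, decay=0.8):
    """Fuse current detections with history detections forwarded by dt seconds."""
    # Forward history to the current timestamp and down-weight its confidence.
    forwarded = [replace(forward(b, dt), score=b.score * decay) for b in history]
    used = set()
    fused = []
    for c in current:
        group = [c]
        for k, h in enumerate(forwarded):
            if k not in used and math.hypot(c.x - h.x, c.y - h.y) < radius:
                group.append(h)
                used.add(k)
        fused.append(Box(*(weighted(group, a) for a in ("x", "y", "vx", "vy")),
                         score=max(b.score for b in group)))
    # Unmatched forwarded boxes are kept; they may recover objects missed now.
    fused.extend(h for k, h in enumerate(forwarded) if k not in used)
    return fused


current = [Box(10.0, 0.0, 5.0, 0.0, 0.9)]
history = [Box(9.6, 0.0, 5.0, 0.0, 0.6), Box(30.0, 0.0, 0.0, 0.0, 0.5)]
out = fuse(current, history, dt=0.1)
print(len(out))  # 2: one fused box plus one unmatched forwarded box
```

Forwarding matters because a moving vehicle's past box lags behind its current position. Without motion compensation, the first history box here would sit 0.5 m behind the current detection.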