Biography
I completed my Ph.D. in 2025 at the Multimedia Laboratory (MMLab) of The Chinese University of Hong Kong (CUHK), where I was fortunate to be supervised by Prof. Hongsheng Li. I also work closely with Prof. Xihui Liu.
My research is driven by a passion for Artificial General Intelligence (AGI), with a focus on visual understanding and generation. I am dedicated to developing integrated systems that perceive, understand, and generate visual content, building on Multimodal Large Language Models (MLLMs).
Previously, I was a visiting scholar at MIT CSAIL, advised by Prof. Dina Katabi. I obtained my B.Eng. degree from Shanghai Jiao Tong University, where I was ranked 1st/157 and advised by Prof. Bingbing Ni.
Education
- [2021 - 2025] Ph.D. at MMLab, The Chinese University of Hong Kong.
- [2016 - 2020] B.Eng. in Information Engineering, Shanghai Jiao Tong University (Ranking: 1st/157).
- [2019 - 2020] Visiting Scholar at CSAIL, Massachusetts Institute of Technology.
Publications
(* indicates equal contribution)
- CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images
  C Duan*, K Sun*, Rongyao Fang*, M Zhang, Y Feng, Y Luo, Y Liu, K Wang, P Pei, X Cai, H Li, Y Ma, X Liu
  arXiv preprint, 2025 [Paper]
- FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
  arXiv preprint, 2025 [Paper]
- T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation
  arXiv preprint, 2025 [Paper]
- GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning
  arXiv preprint, 2025 [Paper]
- GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
  Rongyao Fang, C Duan, K Wang, L Huang, H Li, S Yan, H Tian, X Zeng, R Zhao, J Dai, X Liu, H Li
  Conference on Neural Information Processing Systems (NeurIPS), 2025 [Paper]
- CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms
  arXiv preprint, 2025 [Paper]
- SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving
  Conference on Computer Vision and Pattern Recognition (CVPR), 2025 [Paper]
- StreamChat: Chatting with Streaming Video
  arXiv preprint, 2024 [Paper]
- Puma: Empowering Unified MLLM with Multi-Granular Visual Generation
  International Conference on Computer Vision (ICCV), 2025 [Paper]
- Mimic Before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking
  International Journal of Computer Vision (IJCV), 2024 [Paper]
- FeatAug-DETR: Enriching One-to-Many Matching for DETRs with Feature Augmentation
  Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024 [Paper]
- FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
  European Conference on Computer Vision (ECCV), 2024 [Paper]
- Clip-Adapter: Better Vision-Language Models with Feature Adapters
  International Journal of Computer Vision (IJCV), 2024 [Paper]
- InstructSeq: Unifying Vision Tasks with Instruction-Conditioned Multi-Modal Sequence Generation
  arXiv preprint, 2023 [Paper]
- Point-M2AE: Multi-Scale Masked Autoencoders for Hierarchical Point Cloud Pre-Training
  Conference on Neural Information Processing Systems (NeurIPS), 2022 [Paper]
- RBGNet: Ray-Based Grouping for 3D Object Detection
  Conference on Computer Vision and Pattern Recognition (CVPR), 2022 [Paper]
- Tip-Adapter: Training-Free CLIP-Adapter for Better Vision-Language Modeling
  European Conference on Computer Vision (ECCV), 2022 [Paper]
- Learning Longterm Representations for Person Re-Identification Using Radio Signals
  Conference on Computer Vision and Pattern Recognition (CVPR), 2020 [Paper]
- Probabilistic Radiomics: Ambiguous Diagnosis with Controllable Shape Analysis
  International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2019 [Paper]
- Adversarial Attack and Defense on Point Sets
  arXiv preprint, 2019 [Paper]
Experience
- [Feb. 2024 - Aug. 2025] Research Intern, SenseTime.
- [Jun. 2022 - Apr. 2023] Research Intern, Shanghai AI Laboratory.
Selected Awards
- [2021] Hong Kong PhD Fellowship.
- [2020] Outstanding Graduates of Shanghai (Top 1%).
- [2017 & 2018] National Scholarship (Top 1%).
- [2017 & 2018] Zhiyuan College Honors Scholarship (Top 5%).