Rongyao Fang (方荣耀)

Research Scientist

Qwen VL Team
Alibaba Group

Email: rongyaofang@gmail.com

[Google Scholar] [GitHub] [CV]

Biography

I am currently a Research Scientist on the Alibaba Qwen VL Team, working on enhancing the web development capabilities of LLMs, including vision-to-code generation and vision-language agents for autonomous front-end development. Previously, my research centered on unified multimodal models for visual understanding and generation. With Artificial General Intelligence (AGI) as my long-term goal, I focus on building systems that bridge perception, reasoning, and autonomous creation.

I obtained my Ph.D. from the Multimedia Laboratory (MMLab) of The Chinese University of Hong Kong (CUHK) in 2025, where I was fortunate to be supervised by Prof. Hongsheng Li. I also worked closely with Prof. Xihui Liu.

Previously, I was a visiting scholar at MIT CSAIL, advised by Prof. Dina Katabi. I received my B.Eng. in Information Engineering from Shanghai Jiao Tong University, where I ranked 1st out of 157 and was advised by Prof. Bingbing Ni.

News

[Jun. 2026] Two papers accepted to ECCV 2026.
[Apr. 2026] One paper accepted to ICML 2026.
[Mar. 2026] Two papers accepted to ACL 2026.
[Feb. 2026] Three papers accepted to CVPR 2026.
[Jan. 2026] Two papers accepted to ICLR 2026.
[Sep. 2025] One paper accepted to NeurIPS 2025.
[Jul. 2025] One paper accepted to ICCV 2025.
[Feb. 2025] One paper accepted to CVPR 2025.

Education

[2021 - 2025] Ph.D. at MMLab, The Chinese University of Hong Kong.
[2016 - 2020] B.Eng. in Information Engineering, Shanghai Jiao Tong University (Ranking: 1st/157).
[2019 - 2020] Visiting Scholar at CSAIL, Massachusetts Institute of Technology.

Publications

(* indicates equal contribution)

The Verification Horizon: No Silver Bullet for Coding Agent Rewards

B Wang, C Zhang, D Liu, J Zhang, J Chen, M Chen, Rongyao Fang, S Zhang, X Wang, Y Jing, Z Ma, Z Cui (alphabetical order)

Qwen Team Official Technical Report, 2026 [Paper]
SpecV: Specification Verification for Robust Unified Multimodal Evaluation

W Yu*, Rongyao Fang*, Y Cai*, L Huang, Y Yang, X Zhuang, J Lin, Y Yuan, S Bai

European Conference on Computer Vision (ECCV), 2026
Roam2Room: A Unified Floorplan-to-Furnished Framework for Controllable Indoor Scene Generation

W Li, Z Qin, X Ju, Rongyao Fang, H Li

European Conference on Computer Vision (ECCV), 2026
UniAR: Unified Multimodal Autoregressive Modeling with Shared Context

W Peng*, L Meng*, Y Cai, X Zhuang, Y Yang, Rongyao Fang, C Wu, J Lin, Z Wu, S Bai

International Conference on Machine Learning (ICML), 2026 [Paper]
MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning

W Shi*, A Yu*, Rongyao Fang*, H Ren, K Wang, A Zhou, C Tian, X Fu, Y Hu, Z Lu, L Huang, S Liu, R Liu, H Li

Annual Meeting of the Association for Computational Linguistics (ACL), 2026 [Paper] [Project]
T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation

K Sun, Rongyao Fang, C Duan, X Liu, X Liu

Findings of the Association for Computational Linguistics (ACL Findings), 2026 [Paper]
CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images

C Duan*, K Sun*, Rongyao Fang*, M Zhang, Y Feng, Y Luo, Y Liu, K Wang, P Pei, X Cai, H Li, Y Ma, X Liu

Conference on Computer Vision and Pattern Recognition (CVPR Findings), 2026 [Paper] [Best Paper Award, 2nd Workshop on Knowledge-Intensive Multimodal Reasoning, CVPR 2026]
UniVerse: Empower Unified Generation with Reasoning and Knowledge

K Sun, W Jin, C Duan, Rongyao Fang, X Liu, Y Niu, C Wang, A Li, X Liu

Conference on Computer Vision and Pattern Recognition (CVPR), 2026 [Paper]
DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation

D Jiang, R Zhang, H Li, Z Zong, Z Guo, J He, C Guo, J Ye, Rongyao Fang, W Li, R Liu, H Li

Conference on Computer Vision and Pattern Recognition (CVPR Findings), 2026 [Paper]
Qwen3-VL Technical Report

Official Technical Report of Qwen3-VL, Alibaba Group, 2025 [Paper]
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark

Rongyao Fang, A Yu, C Duan, L Huang, S Bai, Y Cai, K Wang, S Liu, X Liu, H Li

International Conference on Learning Representations (ICLR), 2026 [Paper]
GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning

C Duan*, Rongyao Fang*, Y Wang*, K Wang, L Huang, X Zeng, H Li, X Liu

International Conference on Learning Representations (ICLR), 2026 [Paper]
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Rongyao Fang, C Duan, K Wang, L Huang, H Li, S Yan, H Tian, X Zeng, R Zhao, J Dai, X Liu, H Li

Conference on Neural Information Processing Systems (NeurIPS), 2025 [Paper]
CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms

S Yan, J Han, J Tsai, H Xue, Rongyao Fang, L Hong, Z Guo, R Zhang

arXiv preprint, 2025 [Paper]
SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving

X Chen, L Huang, T Ma, Rongyao Fang, S Shi, H Li

Conference on Computer Vision and Pattern Recognition (CVPR), 2025 [Paper]
StreamChat: Chatting with Streaming Video

J Liu, Z Yu, S Lan, S Wang, Rongyao Fang, J Kautz, H Li, JM Alvarez

arXiv preprint, 2024 [Paper]
PUMA: Empowering Unified MLLM with Multi-Granular Visual Generation

Rongyao Fang, C Duan, K Wang, H Li, H Tian, X Zeng, R Zhao, J Dai, H Li, X Liu

International Conference on Computer Vision (ICCV), 2025 [Paper]
Mimic Before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking

P Gao, Z Lin, R Zhang, Rongyao Fang, H Li, H Li, Y Qiao

International Journal of Computer Vision (IJCV), 2024 [Paper]
FeatAug-DETR: Enriching One-to-Many Matching for DETRs with Feature Augmentation

Rongyao Fang, P Gao, A Zhou, Y Cai, S Liu, J Dai, H Li

Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024 [Paper]
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

L Huang*, Rongyao Fang*, A Zhang, G Song, S Liu, Y Liu, H Li

European Conference on Computer Vision (ECCV), 2024 [Paper]
CLIP-Adapter: Better Vision-Language Models with Feature Adapters

P Gao, S Geng, R Zhang, T Ma, Rongyao Fang, Y Zhang, H Li, Y Qiao

International Journal of Computer Vision (IJCV), 2024 [Paper]
InstructSeq: Unifying Vision Tasks with Instruction-Conditioned Multi-Modal Sequence Generation

Rongyao Fang, S Yan, Z Huang, J Zhou, H Tian, J Dai, H Li

arXiv preprint, 2023 [Paper]
Point-M2AE: Multi-Scale Masked Autoencoders for Hierarchical Point Cloud Pre-Training

R Zhang, Z Guo, P Gao, Rongyao Fang, B Zhao, D Wang, Y Qiao, H Li

Conference on Neural Information Processing Systems (NeurIPS), 2022 [Paper]
RBGNet: Ray-Based Grouping for 3D Object Detection

H Wang, S Shi, Z Yang, Rongyao Fang, Q Qian, H Li, B Schiele, L Wang

Conference on Computer Vision and Pattern Recognition (CVPR), 2022 [Paper]
Tip-Adapter: Training-Free CLIP-Adapter for Better Vision-Language Modeling

R Zhang*, Rongyao Fang*, P Gao*, W Zhang, K Li, J Dai, Y Qiao, H Li

European Conference on Computer Vision (ECCV), 2022 [Paper]
Learning Longterm Representations for Person Re-Identification Using Radio Signals

L Fan*, T Li*, Rongyao Fang*, R Hristov, Y Yuan, D Katabi

Conference on Computer Vision and Pattern Recognition (CVPR), 2020 [Paper]
Probabilistic Radiomics: Ambiguous Diagnosis with Controllable Shape Analysis

J Yang*, Rongyao Fang*, B Ni, Y Li, Y Xu, L Li

International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2019 [Paper]
Adversarial Attack and Defense on Point Sets

J Yang*, Q Zhang*, Rongyao Fang*, B Ni, J Liu, Q Tian

arXiv preprint, 2019 [Paper]

Experience

[Sep. 2025 - Present] Research Scientist, Alibaba Qwen VL Team.
[Feb. 2024 - Aug. 2025] Research Intern, SenseTime.
[Jun. 2022 - Apr. 2023] Research Intern, Shanghai AI Laboratory.

Selected Awards

[2021] Hong Kong PhD Fellowship.
[2020] Outstanding Graduate of Shanghai (Top 1%).
[2017 & 2018] National Scholarship (Top 1%).
[2017 & 2018] Zhiyuan College Honors Scholarship (Top 5%).