The CVNext lab focuses on advancing general-purpose embodied intelligence, building upon foundations in long video understanding and reasoning in dynamic, complex scenes. The core objective is to develop open, adaptive embodied agents that tightly integrate environment perception, interactive reasoning, and personalized adaptation and decision-making. Ultimately, the research aims to establish both theoretical frameworks and practical systems for general and domain-specific embodied agents, contributing to scalable, transferable, and real-world embodied AI. Our main research directions include:
- Interactive 3D Scene Reconstruction and Generation
- Unified World-Reasoning-Action Modeling for Embodied Agents
- Personalized Adaptation with Active Perception
Professor
Gaoang Wang [Web]
Assistant Professor
Office: C417, ZJUI Building
Email: gaoangwang@intl.zju.edu.cn
Research Interests:
- Visual Perception
- Transfer Learning
- Spatial Intelligence
- Embodied Intelligence
Ph.D. Students
guanhongwang@zju.edu.cn
Multi-modality Learning
Video Understanding
Vision and Language
Zhonghan Zhao
zhaozhonghan@zju.edu.cn
Embodied AI
Reinforcement Learning
In-context Learning
Chenlu Zhan
(Main Advisor: Hongwei Wang)
chenlu.22@intl.zju.edu.cn
Medical Vision Language
Medical Multimodality
Visual-Language Pretraining
Wendi Hu
3200105651@zju.edu.cn
Multi-object Tracking
Kewei Wei
3200104125@zju.edu.cn
Multimodality Learning
Tielong Cai
tielong.22@intl.zju.edu.cn
Generative Model
Embodied AI
Master Students
Xuan Wang
xuanw@zju.edu.cn
Multi-modality Learning
Embodied AI
Fang Liang
3D Vision
Image Reconstruction
Dongping Li
dongping.23@intl.zju.edu.cn
Multi-modality Learning
Active Perception
Unified Model
Junsheng Huang
junsheng.24@intl.zju.edu.cn
3D Vision
Multi-modality Learning
Tianci Tang
tianci_tang@tiu.edu.cn
Embodied AI
Diffusion Model
Yizhi Li
yizhi.20@intl.zju.edu.cn
Multi-modality Learning
Computer Vision
Xuexiang Wen
xuexiang.24@intl.zju.edu.cn
Multi-modality Learning
Jiawu Zhang
2540614031@qq.com
Multi-modal Large Models for Logistics
Bocheng Hu
bocheng.25@intl.zju.edu.cn
Motion Generation
Vision–Language Models (VLMs)
Vision–Language–Action Models (VLAs)
Jie Cao
jie.25@intl.zju.edu.cn
Multi-modality Learning
Haonan Zhou
haonan1.25@intl.zju.edu.cn
3D Scene Generation
Xiaohan Chen
xiaohan.25@intl.zju.edu.cn
Multi-modality Learning
Large Language Models (LLMs)
Alumni
Shengyu Hao
shengyuhao@zju.edu.cn
Multi-object Tracking
Representation Learning
Domain Adaptation
Xiaoyue Li
(Main Advisor: Mark Butala)
xiaoyue98@zju.edu.cn
Image Generation
Image Reconstruction
Medical Image Inverse Problems
Shidong Cao
shidong.22@intl.zju.edu.cn
Generative Models
Multi-modality Learning
Graph Machine Learning
Meiqi Sun
meiqi.22@intl.zju.edu.cn
Animal Action Recognition
Animal Pose Estimation
Xuechen Guo
xuechen.22@intl.zju.edu.cn
Computer Vision
Multi-modality Learning
Jianshu Guo
jianshu.22@intl.zju.edu.cn
Diffusion Model
Vision Language
Chang Su
changs.19@intl.zju.edu.cn
Smart City
Yichen Xu
Wenhao Chai (Alumni) [Web]
wchai@uw.edu
Multi-modality Representation
Unified Perception Model
Embodied Intelligence