The CVNext lab focuses on advancing general-purpose embodied intelligence, building on foundations in long video understanding and reasoning in dynamic, complex scenes. The core objective is to develop open, adaptive embodied agents that tightly integrate environment perception, interactive reasoning, personalized adaptation, and decision-making. Ultimately, the research aims to establish both theoretical frameworks and practical systems for general and domain-specific embodied agents, contributing to scalable, transferable, real-world embodied AI. Our main research directions include:
  • Interactive 3D Scene Reconstruction and Generation
  • Unified World-Reasoning-Action Modeling for Embodied Agents
  • Personalized Adaptation with Active Perception

Professor


Gaoang Wang [Web]

Assistant Professor

Office: C417, ZJUI Building
Email: gaoangwang@intl.zju.edu.cn

Research Interests:

  • Visual Perception
  • Transfer Learning
  • Spatial Intelligence
  • Embodied Intelligence


News:

  • [Nov. 2025] One paper was accepted by IJCV, 2025.
  • [Nov. 2025] Three papers were accepted by AAAI 2026, including one oral paper.
  • [Oct. 2025] We received an Outstanding Paper Award at the ICCV KnowledgeMR workshop.
  • [Aug. 2025] One paper was accepted by TPAMI, 2025.
  • [Jul. 2025] One paper was accepted by ECAI 2025.
  • [Jul. 2025] One paper was accepted by ICCV Findings Workshop, 2025.
  • [Jun. 2025] One paper was accepted by TIP, 2025.
  • [Jun. 2025] One paper was accepted by ICCV 2025.
  • [May 2025] One paper was accepted by Information Fusion, 2025.
  • [May 2025] One paper was accepted by ICML 2025.
  • [Apr. 2025] One paper was accepted by CVPR Workshop on Urban Scene Modeling, 2025.
  • [Mar. 2025] One paper was accepted by TVCG, 2025.
  • [Feb. 2025] One paper was accepted by TCSVT, 2025.
  • [Feb. 2025] One paper was accepted by CVPR 2025.
  • [Jan. 2025] One paper was accepted by MIA, 2025.
  • [Jan. 2025] One paper was accepted by TMM, 2025.
  • [Dec. 2024] Two papers were accepted by ICASSP 2025.
  • [Dec. 2024] One paper was accepted by AAAI 2025.
  • [Sep. 2024] One paper was accepted by NeurIPS 2024.
  • [Jul. 2024] One paper was accepted by MICCAI Workshop on Deep Generative Models, 2024.
  • [Jun. 2024] Two papers were accepted by ACM MM 2024.
  • [Jun. 2024] One paper was accepted by ECCV 2024.
  • [Jun. 2024] One paper was accepted by PRCV 2024.
  • [Apr. 2024] One paper was accepted by TMM, 2024.
  • [Mar. 2024] "Long-term Video Question Answering Competition (LOVEU@CVPR'24 Track 1)" was released. More details can be found here.
  • [Mar. 2024] One paper was accepted by ICLR Workshop on LLM Agents, 2024.
  • [Feb. 2024] Three papers were accepted by CVPR 2024.
  • [Dec. 2023] Two papers were accepted by ICASSP 2024.
  • [Dec. 2023] Two papers were accepted by AAAI 2024.
  • [Dec. 2023] One paper was accepted by Neurocomputing, 2023.
  • [Sep. 2023] One paper was accepted by IJCV, 2023.
  • [Sep. 2023] One paper was accepted by TMM, 2023.
  • [Aug. 2023] One paper was accepted by PRCV 2023.
  • [Jul. 2023] Three papers were accepted by ICCV 2023.
  • [Jun. 2023] One paper was accepted by MICCAI 2023.
  • [May 2023] One paper was accepted by Findings of ACL 2023.
  • [Apr. 2023] Two papers were accepted by IJCAI 2023.
  • [Apr. 2023] One paper was accepted by CVPR workshop, Computer Vision for Fashion, Art, and Design, 2023.
  • [Mar. 2023] One paper was accepted by ICME 2023.
  • [Mar. 2023] One paper was accepted by ICASSP 2023.
  • [Feb. 2023] One paper was accepted by CVPR 2023.
  • [Feb. 2023] One paper was accepted by TAI, 2023.
  • [Nov. 2022] One paper was accepted by TMI, 2022.
  • [Jul. 2022] One paper was accepted by ECCV 2022.
  • [Apr. 2022] One paper was accepted by CVPR workshop, the 2nd Workshop on Sketch-Oriented Deep Learning, 2022.
  • [Mar. 2022] One paper was accepted by ICME 2022.
  • [Jan. 2022] One paper was accepted by TMM, 2022.
  • [Aug. 2021] One paper was accepted by CVIU, 2021.
  • [Jul. 2021] One paper was accepted by ICCV 2021.
  • [Apr. 2021] One paper was accepted by CVPR workshop, the Workshop on Autonomous Driving, 2021.
  • [Jan. 2021] ROD2021 Challenge @ICMR 2021 was released.



    Ph.D. Students


    Shengyu Hao
    shengyuhao@zju.edu.cn
    Multi-object Tracking
    Representation Learning
    Domain Adaptation

    Guanhong Wang
    guanhongwang@zju.edu.cn
    Multi-modality Learning
    Video Understanding
    Vision and Language

    Wenhao Hu [Web]
    whu@zju.edu.cn
    3D Vision
    Generative Models
    Anomaly Detection

    Zhonghan Zhao
    zhaozhonghan@zju.edu.cn
    Embodied AI
    Reinforcement Learning
    In-context Learning

    Xiaoyue Li
    (Main Advisor: Mark Butala)
    xiaoyue98@zju.edu.cn
    Image Generation
    Image Reconstruction
    Medical Image Inverse Problems

    Chenlu Zhan
    (Main Advisor: Hongwei Wang)
    chenlu.22@intl.zju.edu.cn
    Medical Vision Language
    Medical Multimodality
    Visual-Language Pretraining

    Wendi Hu
    3200105651@zju.edu.cn
    Multi-object Tracking

    Kewei Wei
    3200104125@zju.edu.cn
    Multi-modality Learning

    Ke Ma
    (Main Advisor: Hongwei Wang)
    kema@berkeley.edu
    Generative Models
    3D Vision

    Tielong Cai
    tielong.22@intl.zju.edu.cn
    Generative Models
    Embodied AI



    Master's Students


    Shidong Cao
    shidong.22@intl.zju.edu.cn
    Generative Models
    Multi-modality Learning
    Graph Machine Learning

    Yichen Ouyang [Web]
    22271110@zju.edu.cn
    Generative Models
    3D Vision
    Multi-modality Learning

    Meiqi Sun
    meiqi.22@intl.zju.edu.cn
    Animal Action Recognition
    Animal Pose Estimation

    Xuechen Guo
    xuechen.22@intl.zju.edu.cn
    Computer Vision
    Multi-modality Learning

    Jianshu Guo
    jianshu.22@intl.zju.edu.cn
    Diffusion Models
    Vision Language

    Chang Su
    changs.19@intl.zju.edu.cn
    Smart City

    Enxin Song [Web]
    enxin.23@intl.zju.edu.cn
    Video Understanding
    Image Generation

    Xuan Wang
    xuanw@zju.edu.cn
    Multi-modality Learning
    Embodied AI

    Fang Liang
    3D Vision
    Image Reconstruction

    Dongping Li
    dongping.23@intl.zju.edu.cn
    Multi-modality Learning
    Active Perception
    Unified Model

    Junsheng Huang
    junsheng.24@intl.zju.edu.cn
    3D Vision
    Multi-modality Learning

    Tianci Tang
    tianci_tang@tiu.edu.cn
    Embodied AI
    Diffusion Models

    Yizhi Li
    yizhi.20@intl.zju.edu.cn
    Multi-modality Learning
    Computer Vision

    Xuexiang Wen
    xuexiang.24@intl.zju.edu.cn
    Multi-modality Learning

    Jiawu Zhang
    2540614031@qq.com
    Multi-modal Large Models for Logistics



    Research Assistants


    Jie Deng
    dengj325@gmail.com
    3D Scene Generation



    Undergraduate Students


    Yichen Xu

    Wenhao Chai (Alumni) [Web]
    wchai@uw.edu
    Multi-modality Representation
    Unified Perception Model
    Embodied Intelligence