 
   
    
      MovieChat: From Dense Token to Sparse Memory in Long Video Understanding
    
    
     Enxin Song, Wenhao Chai, Guanhong Wang, Yucheng Zhang, Haoyang Zhou, Feiyang Wu, Haozhe Chi, Xun Guo, Tian Ye, Yanting Zhang, Yan Lu, Jenq-Neng Hwang, Gaoang Wang
    
    
    Computer Vision and Pattern Recognition (CVPR), 2024
    
    
    [Paper]
    [Code]
    [Dataset]
    [Website]
    
    
    
      MovieChat achieves state-of-the-art performance in long video understanding by introducing a memory mechanism.
    
  
 
     
      
        MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant
      
      
      Chenlu Zhan, Yu Lin, Gaoang Wang, Hongwei Wang, Jian Wu
      
      
      Computer Vision and Pattern Recognition (CVPR), 2024
      
      
      [Paper]
      
      
        MedM2G unifies medical image-to-text and text-to-image conversion, as well as the generation of various medical modalities such as CT, MRI, and X-ray.
    
 
   
    
      UniAP: Towards Universal Animal Perception in Vision via Few-Shot Learning
    
    
    Meiqi Sun, Zhonghan Zhao, Wenhao Chai, Hanjun Luo, Shidong Cao, Yanting Zhang, Jenq-Neng Hwang, Gaoang Wang
    
    
    Association for the Advancement of Artificial Intelligence (AAAI), 2024
    
    
    [Paper]
    [Code]
    [Website]
    
    
    
      UniAP is a novel Universal Animal Perception model that leverages few-shot learning to enable cross-species perception across various visual tasks.
    
  
 
   
    
      Multi-Step Denoising Scheduled Sampling: Towards Alleviating Exposure Bias for Diffusion Models
    
    
    Zhiyao Ren, Yibing Zhan, Liang Ding, Gaoang Wang, Chaoyue Wang, Zhongyi Fan, Dacheng Tao
    
    
    Association for the Advancement of Artificial Intelligence (AAAI), 2024
    
    
    [Paper]
    
    
    We propose a multi-step denoising scheduled sampling (MDSS) strategy to alleviate the exposure bias for DDPMs.
    
  
 
   
    
      DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes
    
    
    Shengyu Hao, Peiyuan Liu, Yibing Zhan, Kaixun Jin, Zuozhu Liu, Mingli Song, Jenq-Neng Hwang, Gaoang Wang
    
    
    International Journal of Computer Vision (IJCV), 2023
    
    
    [Paper]
    [Dataset]
    [Code]
    
    
    
      A new cross-view multi-object tracking dataset for DIVerse Open scenes with densely tracked pedestrians.
    
  
 
     
      
        Self-paced Multi-grained Cross-modal Interaction Modeling for Referring Expression Comprehension
      
      
      Peihan Miao, Wei Su, Gaoang Wang, Xuewei Li, Xi Li
      
      
      IEEE Transactions on Image Processing (TIP), 2023
      
      
      [Paper]
      
      
        We propose a Self-paced Multi-grained Cross-modal Interaction Modeling framework, which improves the language-to-vision localization ability through innovations in network structure and learning mechanism.
    
 
   
    
      DiffFashion: Reference-based Fashion Design with Structure-aware Transfer by Diffusion Models
    
    
    Shidong Cao, Wenhao Chai, Shengyu Hao, Yanting Zhang, Hangyue Chen, Gaoang Wang
    
    
    IEEE Transactions on Multimedia (TMM), 2023
    
    
    [Paper]
    [Code]
    
    
    
      We focus on a new fashion design task, where we aim to transfer a reference appearance image onto a clothing image while preserving the structure of the clothing image. 
    
  
 
   
    
    StableVideo: Text-driven Consistency-aware Diffusion Video Editing
    
    
    Wenhao Chai, Xun Guo, Gaoang Wang, Yan Lu
    
    
    International Conference on Computer Vision (ICCV), 2023
    
    
    [Website]
    [Paper]
    [Demo]
    [Code]
    
    
    
    We introduce temporal dependency into existing text-driven diffusion models, which allows them to generate a consistent appearance for the new objects.
    
  
 
   
    
    Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation
    
    
    Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang, Gaoang Wang
    
    
    International Conference on Computer Vision (ICCV), 2023
    
    
    [Paper]
    [Code]
    
    
    
    A simple yet effective framework for unsupervised domain adaptation in 3D human pose estimation.
    
  
 
     
      
        Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection
      
      
      Longrong Yang, Xianpan Zhou, Xuewei Li, Liang Qiao, Zheyang Li, Ziwei Yang, Gaoang Wang, Xi Li
      
      
      International Conference on Computer Vision (ICCV), 2023
      
      
      [Paper]
      
      
        A novel distillation method with cross-task consistent protocols, tailored for dense object detection.
    
 
   
    
    PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Enhanced 3D Human Pose Estimation
    
    
    Hanbing Liu, Jun-Yan He, Zhi-Qi Cheng, Wangmeng Xiang, Qize Yang, Wenhao Chai, Gaoang Wang, Xu Bao, Bin Luo, Yifeng Geng, Xuansong Xie 
    
    
    ACM Multimedia (ACM MM), 2023
    
    
    [Paper]
    [Code]
    
    
    
    PoSynDA offers a state-of-the-art domain adaptation solution for 3D pose estimation.
    
  
 
     
      
        SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation
      
      
      Xuewei Li, Tao Wu, Zhongang Qi, Gaoang Wang, Ying Shan, Xi Li
      
      
      International Joint Conference on Artificial Intelligence (IJCAI), 2023
      
      
      [Paper]
      
      
        As an important and challenging problem in computer vision, PAnoramic Semantic Segmentation gives complete scene perception based on an ultra-wide angle of view.
    
 
     
      
        Language Adaptive Weight Generation for Multi-task Visual Grounding
      
      
      Wei Su, Peihan Miao, Huanzhang Dou, Gaoang Wang, Liang Qiao, Zheyang Li, Xi Li
      
      
      IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
      
      
      [Paper]
      
      
        Despite impressive performance in visual grounding, prevailing approaches usually exploit the visual backbone in a passive way.
 
     
      
        Temporal Constrained Feasible Subspace Learning for Human Pose Forecasting
      
      
      Gaoang Wang, Mingli Song
      
      
      International Joint Conference on Artificial Intelligence (IJCAI), 2023
      
      
      [Paper]
      
      
        Human pose forecasting is a sequential modeling task that aims to predict future poses from historical motions.
 
     
      
        Hierarchical Semi-Supervised Contrastive Learning for Contamination-Resistant Anomaly Detection
      
      
      Gaoang Wang, Yibing Zhan, Xinchao Wang, Mingli Song, Klara Nahrstedt
      
      
      European Conference on Computer Vision (ECCV), 2022
      
      
      [Paper]
      
      
        Contrastive learning has provided a successful way to learn sample representations that enable effective discrimination of anomalies.
    
 
     
      
        Split and Connect: A Universal Tracklet Booster for Multi-Object Tracking
      
      
      Gaoang Wang, Yizhou Wang, Renshu Gu, Weijie Hu, Jenq-Neng Hwang
      
      
      IEEE Transactions on Multimedia (TMM), 2022
      
      
      [Paper]
      
      
        We propose a novel tracklet boosting model, consisting of a Splitter and a Connector, to directly address the temporal association errors that exist in almost all trackers in the MOT field.
    
 
     
      
        Track without Appearance: Learn Box and Tracklet Embedding with Local and Global Motion Patterns for Vehicle Tracking
      
      
      Gaoang Wang, Renshu Gu, Zuozhu Liu, Weijie Hu, Mingli Song, Jenq-Neng Hwang
      
      
      IEEE International Conference on Computer Vision (ICCV), 2021
      
      
      [Paper]
      [Code]
      
      
      
        We explore the significance of motion patterns for vehicle tracking without appearance information.