MovieChat: From Dense Token to Sparse Memory in Long Video Understanding
Enxin Song, Wenhao Chai, Guanhong Wang, Yucheng Zhang, Haoyang Zhou, Feiyang Wu, Haozhe Chi, Xun Guo, Tian Ye, Yanting Zhang, Yan Lu, Jenq-Neng Hwang, Gaoang Wang
Computer Vision and Pattern Recognition (CVPR), 2024
[Paper]
[Code]
[Dataset]
[Website]
MovieChat achieves state-of-the-art performance in long video understanding by introducing a memory mechanism.
MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant
Chenlu Zhan, Yu Lin, Gaoang Wang, Hongwei Wang, Jian Wu
Computer Vision and Pattern Recognition (CVPR), 2024
[Paper]
MedM2G supports unified conversion between medical images and text in both directions, as well as unified generation of various medical modalities such as CT, MRI, and X-ray.
UniAP: Towards Universal Animal Perception in Vision via Few-Shot Learning
Meiqi Sun, Zhonghan Zhao, Wenhao Chai, Hanjun Luo, Shidong Cao, Yanting Zhang, Jenq-Neng Hwang, Gaoang Wang
Association for the Advancement of Artificial Intelligence (AAAI), 2024
[Paper]
[Code]
[Website]
UniAP is a novel Universal Animal Perception model that leverages few-shot learning to enable cross-species perception across various visual tasks.
Multi-Step Denoising Scheduled Sampling: Towards Alleviating Exposure Bias for Diffusion Models
Zhiyao Ren, Yibing Zhan, Liang Ding, Gaoang Wang, Chaoyue Wang, Zhongyi Fan, Dacheng Tao
Association for the Advancement of Artificial Intelligence (AAAI), 2024
[Paper]
We propose a multi-step denoising scheduled sampling (MDSS) strategy to alleviate the exposure bias for DDPMs.
DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes
Shengyu Hao, Peiyuan Liu, Yibing Zhan, Kaixun Jin, Zuozhu Liu, Mingli Song, Jenq-Neng Hwang, Gaoang Wang
International Journal of Computer Vision (IJCV), 2023
[Paper]
[Dataset]
[Code]
A new cross-view multi-object tracking dataset for DIVerse Open scenes with densely tracked pedestrians.
Self-paced Multi-grained Cross-modal Interaction Modeling for Referring Expression Comprehension
Peihan Miao, Wei Su, Gaoang Wang, Xuewei Li, Xi Li
IEEE Transactions on Image Processing (TIP), 2023
[Paper]
We propose a Self-paced Multi-grained Cross-modal Interaction Modeling framework, which improves the language-to-vision localization ability through innovations in network structure and learning mechanism.
DiffFashion: Reference-based Fashion Design with Structure-aware Transfer by Diffusion Models
Shidong Cao, Wenhao Chai, Shengyu Hao, Yanting Zhang, Hangyue Chen, Gaoang Wang
IEEE Transactions on Multimedia (TMM), 2023
[Paper]
[Code]
We focus on a new fashion design task, where we aim to transfer a reference appearance image onto a clothing image while preserving the structure of the clothing image.
StableVideo: Text-driven Consistency-aware Diffusion Video Editing
Wenhao Chai, Xun Guo, Gaoang Wang, Yan Lu
International Conference on Computer Vision (ICCV), 2023
[Website]
[Paper]
[Demo]
[Code]
We introduce temporal dependency to existing text-driven diffusion models, which allows them to generate consistent appearance for the new objects.
Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation
Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang, Gaoang Wang
International Conference on Computer Vision (ICCV), 2023
[Paper]
[Code]
A simple yet effective framework of unsupervised domain adaptation for 3D human pose estimation.
Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection
Longrong Yang, Xianpan Zhou, Xuewei Li, Liang Qiao, Zheyang Li, Ziwei Yang, Gaoang Wang, Xi Li
International Conference on Computer Vision (ICCV), 2023
[Paper]
A novel distillation method with cross-task consistent protocols, tailored for dense object detection.
PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Enhanced 3D Human Pose Estimation
Hanbing Liu, Jun-Yan He, Zhi-Qi Cheng, Wangmeng Xiang, Qize Yang, Wenhao Chai, Gaoang Wang, Xu Bao, Bin Luo, Yifeng Geng, Xuansong Xie
ACM Multimedia (ACM MM), 2023
[Paper]
[Code]
PoSynDA offers a state-of-the-art domain adaptation solution for 3D pose estimation.
SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation
Xuewei Li, Tao Wu, Zhongang Qi, Gaoang Wang, Ying Shan, Xi Li
International Joint Conference on Artificial Intelligence (IJCAI), 2023
[Paper]
As an important and challenging problem in computer vision, PAnoramic Semantic Segmentation gives complete scene perception based on an ultra-wide angle of view.
Language Adaptive Weight Generation for Multi-task Visual Grounding
Wei Su, Peihan Miao, Huanzhang Dou, Gaoang Wang, Liang Qiao, Zheyang Li, Xi Li
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[Paper]
Despite their impressive performance in visual grounding, the prevailing approaches usually exploit the visual backbone in a passive way.
Temporal Constrained Feasible Subspace Learning for Human Pose Forecasting
Gaoang Wang, Mingli Song
International Joint Conference on Artificial Intelligence (IJCAI), 2023
[Paper]
Human pose forecasting is a sequential modeling task that aims to predict future poses from historical motions.
Hierarchical Semi-Supervised Contrastive Learning for Contamination-Resistant Anomaly Detection
Gaoang Wang, Yibing Zhan, Xinchao Wang, Mingli Song, Klara Nahrstedt
European Conference on Computer Vision (ECCV), 2022
[Paper]
Contrastive learning has provided a successful way to learn sample representations that enable effective discrimination of anomalies.
Split and Connect: A Universal Tracklet Booster for Multi-Object Tracking
Gaoang Wang, Yizhou Wang, Renshu Gu, Weijie Hu, Jenq-Neng Hwang
IEEE Transactions on Multimedia (TMM), 2022
[Paper]
We propose a novel tracklet boosting model, consisting of a Splitter and a Connector, to directly address the temporal association errors that exist in almost all trackers in the MOT field.
Track without Appearance: Learn Box and Tracklet Embedding with Local and Global Motion Patterns for Vehicle Tracking
Gaoang Wang, Renshu Gu, Zuozhu Liu, Weijie Hu, Mingli Song, Jenq-Neng Hwang
IEEE International Conference on Computer Vision (ICCV), 2021
[Paper]
[Code]
We explore the significance of motion patterns for vehicle tracking without appearance information.