HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation

Wentian Qu1,2
Jiahe Li1,2
Jian Cheng1,2
Jian Shi3,2
Chenyu Meng1,2
Cuixia Ma1,2
Hongan Wang1,2

1Institute of Software, Chinese Academy of Sciences
2University of Chinese Academy of Sciences
3Institute of Automation, Chinese Academy of Sciences
4Google


We propose a new 3DGS-based data augmentation framework for bimanual hand-object interaction that augments existing datasets with diverse hand-object poses and viewpoints. Our method improves the performance of the baselines, yielding more accurate pose and contact estimates.



Abstract

Understanding bimanual hand-object interaction plays an important role in robotics and virtual reality. However, due to significant occlusions between the hands and the object, as well as high degree-of-freedom motions, it is challenging to collect and annotate a high-quality, large-scale dataset, which prevents further improvement of bimanual hand-object interaction baselines. In this work, we propose a new 3D Gaussian Splatting (3DGS) based data augmentation framework for bimanual hand-object interaction, which is capable of augmenting existing datasets into large-scale photorealistic data with diverse hand-object poses and viewpoints. First, we use mesh-based 3DGS to model objects and hands, and we design a super-resolution module to address the rendering blur caused by multi-resolution input images. Second, we extend the single-hand grasping pose optimization module to the bimanual setting to generate diverse poses of bimanual hand-object interaction, which significantly expands the pose distribution of the dataset. Third, we analyze the impact of different aspects of the proposed data augmentation on the understanding of bimanual hand-object interaction. We apply our data augmentation to two benchmarks, H2O and Arctic, and verify that our method improves the performance of the baselines.




Overview

We propose a 3DGS-based data augmentation framework, Hand-Object Gaussian Splatting Augmentation (HOGSA), for bimanual hand-object interaction understanding. First, we use mesh-based 3DGS to model the hands and the object from hand-object interaction images, which lets us efficiently synthesize interaction images for any input hand-object pose and viewpoint. Second, to enhance the pose diversity of the dataset, we use the pose optimization module to generate diverse poses of the two hands and the object, which drive the hand-object Gaussian splatting model to render images of novel interaction poses. Third, to ensure the realism of the rendered images, we design the super-resolution module to improve the quality of the coarse images generated by 3DGS. Finally, we combine our augmented dataset with the original dataset to refine the bimanual hand-object interaction baseline, and we systematically analyze the aspects of the augmented dataset that affect interaction understanding accuracy.
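The four stages above can be read as a simple data flow. The following is a minimal sketch of that flow only, not the authors' implementation: `Sample`, `augment`, `build_training_set`, and the three placeholder stage functions (`jitter_pose`, `fake_render`, `fake_super_resolve`) are hypothetical names introduced for illustration, and the placeholders stand in for the pose optimization module, the 3DGS renderer, and the super-resolution module respectively.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Sample:
    image: object            # captured or rendered image (any array-like)
    pose: Sequence[float]    # flattened two-hand + object pose parameters
    synthetic: bool = False  # True for samples produced by augmentation


def augment(
    original: List[Sample],
    optimize_pose: Callable[[Sequence[float]], Sequence[float]],
    render: Callable[[Sequence[float]], object],
    super_resolve: Callable[[object], object],
    poses_per_sample: int = 2,
) -> List[Sample]:
    """Generate synthetic samples: novel pose -> coarse render -> refined image."""
    augmented = []
    for sample in original:
        for _ in range(poses_per_sample):
            new_pose = optimize_pose(sample.pose)  # novel interaction pose
            coarse = render(new_pose)              # coarse image from the 3DGS model
            refined = super_resolve(coarse)        # realism enhancement
            augmented.append(Sample(refined, new_pose, synthetic=True))
    return augmented


def build_training_set(original: List[Sample], augmented: List[Sample]) -> List[Sample]:
    """Combine original and augmented data for fine-tuning the baseline."""
    return list(original) + list(augmented)


# Placeholder stages (assumptions for illustration only).
rng = random.Random(0)


def jitter_pose(pose):
    # Stand-in for pose optimization: small random perturbation of each parameter.
    return [p + rng.uniform(-0.1, 0.1) for p in pose]


def fake_render(pose):
    # Stand-in for the 3DGS renderer: returns a tagged dictionary instead of pixels.
    return {"pose": list(pose), "quality": "coarse"}


def fake_super_resolve(image):
    # Stand-in for the super-resolution module: marks the image as refined.
    return {**image, "quality": "refined"}


original = [
    Sample(image=None, pose=[0.0, 0.5, 1.0]),
    Sample(image=None, pose=[0.2, 0.1, 0.3]),
]
augmented = augment(original, jitter_pose, fake_render, fake_super_resolve)
train_set = build_training_set(original, augmented)
print(len(original), len(augmented), len(train_set))  # prints: 2 4 6
```

Passing the stages in as callables keeps the sketch independent of any particular renderer or network, so each placeholder could be swapped for a real component without changing the loop.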

Overview of our data augmentation framework for bimanual hand-object interaction. Based on the original dataset, we first establish mesh-based 3DGS models and feed the original poses into the pose optimization module to expand the diversity of interactions. The novel poses and the 3DGS models are combined to render low-quality images, which are then fed into the super-resolution module to further enhance realism. With these modules, we can automatically build an expanded dataset and fine-tune the interaction understanding baseline to improve performance.
Examples from HOGSA, which contain diverse interaction poses while preserving image realism.
The augmented data used to train the baseline. Compared with the original data, our images remain realistic and cover more varied poses.



Paper and Code

W. Qu, J. Li, J. Cheng, J. Shi, C. Meng, C. Ma, H. Wang, X. Deng, Y. Zhang

HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation

AAAI, 2025.

[Paper]     [Code, Coming Soon]     [Dataset, Coming Soon]    



Results

Qualitative results of our data augmentation method HOGSA on the baseline. After optimization, the model covers a wider range of interaction poses and achieves a more accurate estimation of the pose and contact area.
Ablation study on the super-resolution module (SRM). With SRM, the realism of the images rendered by HOGS improves significantly, especially the texture details of the object. This greatly reduces the gap between synthetic and real data.



Acknowledgements

This work was supported in part by the National Science and Technology Major Project (2022ZD0119404), the National Natural Science Foundation of China (62473356, 62373061), the Beijing Natural Science Foundation (L232028), the CAS Major Project (RCJJ-145-24-14), the Science and Technology Innovation Key R&D Program of Chongqing (CSTB2023TIAD STX0027), and Beijing Hospitals Authority Clinical Medicine Development of Special Funding Support No. ZLRK202330. The website is modified from this template.