Recurrent 3D Hand Pose Estimation Using
Cascaded Pose-guided 3D Alignments

Illustration of our cascaded pose-guided 3D alignments and feature extraction in 3D space. Point clouds of the same hand pose seen from different viewpoints (row 1 vs. row 2) and point clouds of different hand poses (rows 1 and 2 vs. row 3) differ considerably. After palm alignment and finger alignment, however, their palm and finger parts, respectively, become similar (see columns 3 and 7). The hand shapes in the fourth and sixth columns are shown only to illustrate the alignment transformations more clearly.
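The caption states that palm alignment makes palm regions from different viewpoints look alike. One simple way to get that effect is sketched below: build an orthonormal frame from a few estimated palm joints and express every point in that frame, which cancels any global rotation and translation of the hand. This is an illustrative sketch, not the paper's exact transformation. The choice of joints (`wrist`, `index_mcp`, `pinky_mcp`) and the functions `palm_frame` and `align_points` are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(a: Vec) -> Vec:
    n = math.sqrt(_dot(a, a))
    if n == 0.0:
        raise ValueError("degenerate palm joints")
    return (a[0] / n, a[1] / n, a[2] / n)


def palm_frame(wrist: Vec, index_mcp: Vec, pinky_mcp: Vec) -> Tuple[Vec, Tuple[Vec, Vec, Vec]]:
    """Build a right-handed orthonormal frame from three (hypothetical) palm joints."""
    mid = ((index_mcp[0] + pinky_mcp[0]) / 2.0,
           (index_mcp[1] + pinky_mcp[1]) / 2.0,
           (index_mcp[2] + pinky_mcp[2]) / 2.0)
    x_axis = _normalize(_sub(mid, wrist))                   # wrist -> middle of knuckles
    z_axis = _normalize(_cross(_sub(index_mcp, wrist),
                               _sub(pinky_mcp, wrist)))     # normal of the palm plane
    y_axis = _cross(z_axis, x_axis)                         # completes the right-handed frame
    return wrist, (x_axis, y_axis, z_axis)


def align_points(points: Iterable[Vec], origin: Vec,
                 axes: Tuple[Vec, Vec, Vec]) -> List[Vec]:
    """Express points in the palm frame: p' = R^T (p - origin), R = [x y z]."""
    x_axis, y_axis, z_axis = axes
    aligned = []
    for p in points:
        d = _sub(p, origin)
        aligned.append((_dot(d, x_axis), _dot(d, y_axis), _dot(d, z_axis)))
    return aligned
```

Because the frame is derived from the joints themselves, rotating and translating the whole hand (joints and points together) leaves the aligned coordinates unchanged; the finger stage could reuse the same idea with a per-finger frame.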

Network Architecture

Illustration of our recurrent hand pose network using cascaded pose-guided alignments. We first convert the foreground hand depth image into a point cloud. We then run multiple recurrent iterations to estimate the 3D hand pose, inserting several LSTM modules among the palm stages to refine the hand pose. In each recurrent iteration, a multi-stage network (i.e., global, palm, and finger stages) predicts the hand joints through iterative pose regression and cascaded pose-guided 3D alignment, and the hand pose from the previous iteration is used to align the input point cloud of the current iteration. “PointNet Encoder” denotes the part of PointNet++ before its last multi-layer perceptron (MLP). “A_0,g” is the transformation derived from the hand pose P_0,global estimated by the global stage in the initial recurrent iteration; “A_t,p” are the transformations that align each finger, derived from the hand pose P_t,palm estimated by the palm stage in the t-th recurrent iteration; and “A_t” is the transformation derived from the composite hand pose “P_t” of the palm and finger stages in the t-th recurrent iteration. “⊗” denotes matrix multiplication.
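Two operations in this pipeline are concrete enough to sketch: turning the depth image into a point cloud, and chaining alignment transformations with matrix multiplication (“⊗”). The sketch below assumes a standard pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, marks background pixels with depth 0, and represents each transformation as a 4×4 homogeneous matrix. These representation choices are assumptions, not details confirmed by the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Mat4 = List[List[float]]


def depth_to_points(depth: Sequence[Sequence[float]],
                    fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Back-project foreground pixels with a pinhole model (assumed intrinsics)."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:            # assumption: 0 marks background / missing depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def matmul4(a: Mat4, b: Mat4) -> Mat4:
    """4x4 matrix product; matmul4(a, b) applies b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply_transform(t: Mat4, points: Sequence[Point]) -> List[Point]:
    """Apply a homogeneous rigid transform to 3D points."""
    out = []
    for x, y, z in points:
        out.append(tuple(t[i][0] * x + t[i][1] * y + t[i][2] * z + t[i][3]
                         for i in range(3)))
    return out
```

For example, a hypothetical rotation `R` followed by a translation `T` can be merged into one matrix with `matmul4(T, R)` and applied once with `apply_transform`, which is how a chain of alignments can be collapsed before the points are fed to the next stage.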

This work was supported in part by the National Key R&D Program of China under Grant 2021YFF0307702, the National Natural Science Foundation of China (No. 61473276), the Beijing Natural Science Foundation (L182052), and the Distinguished Young Researcher Program, Institute of Software, Chinese Academy of Sciences. The website is modified from this template.
