Abstract and 1. Introduction
Related Work
2.1. Motion Reconstruction from Sparse Input
2.2. Human Motion Generation
SAGE: Stratified Avatar Generation and 3.1. Problem Statement and Notation
3.2. Disentangled Motion Representation
3.3. Stratified Motion Diffusion
3.4. Implementation Details
Experiments and Evaluation Metrics
4.1. Dataset and Evaluation Metrics
4.2. Quantitative and Qualitative Results
4.3. Ablation Study
Conclusion and References
Supplementary Material
A. Extra Ablation Studies
B. Implementation Details
We train and evaluate our method on AMASS [25], which unifies multiple motion capture datasets [2, 4, 6, 12, 23, 26, 28, 37, 41–43] as SMPL [22] representations.
We report several metrics for evaluation and comparison. Mean per joint rotation error (MPJRE) and mean per joint position error (MPJPE) measure the average relative rotation error and the average position error across all joints, respectively. We also report the average position error of the root joint (Root PE), hand joints (Hand PE), upper-body joints (Upper PE), and lower-body joints (Lower PE).
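As a rough illustration of these accuracy metrics, the following sketch computes a position error over all joints or over a subset of joints, plus a rotation error. It rests on several assumptions that are not stated in the text: joint positions are stored as arrays of shape `(frames, joints, 3)`, joint rotations as rotation matrices of shape `(frames, joints, 3, 3)`, and the rotation error is taken as the geodesic angle between predicted and ground-truth rotations, which is one common choice and may differ from the exact definition used in the paper. The function names and the toy joint groups are hypothetical.

```python
import numpy as np


def mpjpe(pred, gt, joints=None):
    """Mean Euclidean distance between predicted and ground-truth joints.

    pred, gt: arrays of shape (frames, joints, 3).
    joints:   optional list of joint indices; if given, the average is
              restricted to those joints (e.g. a root, hand, or body-part group).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if joints is not None:
        pred, gt = pred[:, joints], gt[:, joints]
    # Per-frame, per-joint distance, then averaged over frames and joints.
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def mean_rotation_error_deg(rot_pred, rot_gt):
    """Mean geodesic angle (degrees) between rotation matrices.

    rot_pred, rot_gt: arrays of shape (frames, joints, 3, 3).
    Assumption: geodesic angle is used as the rotation error.
    """
    rot_pred = np.asarray(rot_pred, dtype=float)
    rot_gt = np.asarray(rot_gt, dtype=float)
    # Relative rotation R_pred^T @ R_gt for every frame and joint.
    rel = np.swapaxes(rot_pred, -1, -2) @ rot_gt
    trace = np.trace(rel, axis1=-2, axis2=-1)
    # Clip to guard against values slightly outside [-1, 1] from round-off.
    cos_angle = np.clip((trace - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)).mean())


# Hypothetical joint groups for a toy 4-joint skeleton (not the SMPL layout).
TOY_GROUPS = {"root": [0], "hand": [2, 3]}
```

Under these assumptions, a part-specific error such as Root PE or Hand PE would be obtained by passing the corresponding index list, for example `mpjpe(pred, gt, joints=TOY_GROUPS["root"])`.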
Beyond reconstruction accuracy, we also evaluate the spatial and temporal consistency of the generated sequences, since it contributes significantly to visual quality. Specifically, we compute the mean per joint velocity error (MPJVE) and Jitter: MPJVE measures the average velocity error over all body joints, and Jitter quantifies the average jerk (the time derivative of acceleration) over all body joints. For both metrics, lower values indicate better results.
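The two temporal metrics can be sketched with finite differences. This is a minimal illustration, not the authors' evaluation code: it assumes joint positions of shape `(frames, joints, 3)` sampled at a fixed frame rate, approximates velocity by a first-order difference and jerk by a third-order difference, and uses a hypothetical default of 60 frames per second. The resulting units depend on the units of the input positions.

```python
import numpy as np


def mpjve(pred, gt, fps=60.0):
    """Mean per joint velocity error via first-order finite differences.

    pred, gt: arrays of shape (frames, joints, 3).
    fps:      frame rate in Hz (assumed value; set it to match the data).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Velocity is approximated as (x[t+1] - x[t]) * fps.
    vel_pred = np.diff(pred, axis=0) * fps
    vel_gt = np.diff(gt, axis=0) * fps
    return float(np.linalg.norm(vel_pred - vel_gt, axis=-1).mean())


def jitter(positions, fps=60.0):
    """Mean jerk magnitude via third-order finite differences.

    positions: array of shape (frames, joints, 3) with at least 4 frames.
    """
    positions = np.asarray(positions, dtype=float)
    # Jerk is the third time derivative of position, hence the fps**3 scaling.
    jerk = np.diff(positions, n=3, axis=0) * fps**3
    return float(np.linalg.norm(jerk, axis=-1).mean())
```

Note that `jitter` takes a single sequence, so it can be evaluated on a prediction alone, while `mpjve` compares a prediction against ground truth. A motion with constant acceleration has zero jerk, so it would score zero on this jitter measure.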
:::info Authors:
(1) Han Feng, Wuhan University (equal contribution; authors listed in alphabetical order);
(2) Wenchao Ma, Pennsylvania State University (equal contribution; authors listed in alphabetical order);
(3) Quankai Gao, University of Southern California;
(4) Xianwei Zheng, Wuhan University;
(5) Nan Xue, Ant Group (xuenan@ieee.org);
(6) Huijuan Xu, Pennsylvania State University.
:::
:::info This paper is available on arXiv under the CC BY 4.0 DEED license.
:::


