(*: indicates joint first authors)
Yan Zhao1   Hao Dong1

Articulated objects (\emph{e.g.}, doors and drawers) exist everywhere in our life. Unlike rigid objects, articulated objects have higher degrees of freedom and are rich in geometries, semantics, and part functions. Modelling different kinds of parts and articulations with neural networks plays an essential role in articulated object understanding and manipulation, and will further benefit the 3D vision and robotics communities. Most previous works directly encode articulated objects into a latent space without an explicit and interpretable representation. To provide an interpretable representation for articulated object modelling, in this paper we introduce a novel framework that explicitly disentangles the part motion of articulated objects by predicting the movements of articulated parts. We utilise spatially continuous neural implicit representations to model part motion smoothly in space, and we achieve, for the first time, few-shot generalisation to novel object categories and different joint motions (\emph{e.g.}, rotation and displacement along different axes).
Figure 1. There are a plethora of 3D objects around us in the real world. Compared to rigid objects with only 6 degrees of freedom (DoF), articulated objects (\emph{e.g.}, doors and drawers) additionally contain semantically and functionally important articulated parts (\emph{e.g.}, the screen of a laptop), resulting in higher DoFs in state space and more complicated geometries and functions. Therefore, understanding and representing articulated objects with diverse geometries and functions is an essential but challenging task for 3D computer vision.
Figure 2. Overview of our proposed framework. Our framework receives two point clouds of the same articulated object under two different part poses and generates the object point cloud under a new part pose. It aggregates the geometric information and the pose information into a spatially continuous Transformation Grid. During inference, conditioned on the new part pose, it decodes the transformation of each point by querying the Grid at that point, generating the input object under the novel pose.
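To make the pipeline in Figure 2 concrete, below is a minimal PyTorch sketch of the described flow: encode two observations into a spatially continuous Transformation Grid, query the grid at each point, and decode a per-point transformation conditioned on the target part pose. All module names, tensor shapes, and the simplification of the output to a per-point displacement are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TransformationGridSketch(nn.Module):
    """Hypothetical sketch: two point clouds + their part poses -> continuous
    feature grid -> per-point transformation under a queried new part pose."""

    def __init__(self, feat_dim=128, grid_res=16, pose_dim=7):
        super().__init__()
        # PointNet-style per-point encoder shared by both input clouds.
        self.point_encoder = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim),
        )
        # Projects pooled geometry plus the two observed part poses into grid features.
        self.grid_head = nn.Linear(feat_dim + 2 * pose_dim, feat_dim * grid_res ** 3)
        self.feat_dim, self.grid_res = feat_dim, grid_res
        # Decodes (interpolated grid feature, query point, target pose) into a
        # per-point transformation (reduced here to a 3-D displacement for brevity).
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim + 3 + pose_dim, 128), nn.ReLU(),
            nn.Linear(128, 3),
        )

    def forward(self, pc_a, pc_b, pose_a, pose_b, target_pose):
        # pc_a, pc_b: (B, N, 3) point clouds of one object under two part poses,
        # assumed normalised to [-1, 1]^3; poses: (B, pose_dim).
        B, N, _ = pc_a.shape
        feats = self.point_encoder(torch.cat([pc_a, pc_b], dim=1)).max(dim=1).values
        grid = self.grid_head(torch.cat([feats, pose_a, pose_b], dim=-1))
        grid = grid.view(B, self.feat_dim,
                         self.grid_res, self.grid_res, self.grid_res)

        # Query the spatially continuous grid at every input point via
        # trilinear interpolation.
        query = pc_a.view(B, N, 1, 1, 3)
        point_feats = F.grid_sample(grid, query, align_corners=True)
        point_feats = point_feats.view(B, self.feat_dim, N).transpose(1, 2)

        # Condition on the new part pose and decode each point's displacement.
        cond = target_pose.unsqueeze(1).expand(B, N, -1)
        delta = self.decoder(torch.cat([point_feats, pc_a, cond], dim=-1))
        return pc_a + delta  # object point cloud under the novel part pose
```

In this sketch the grid is produced by a single linear head and queried with `grid_sample`; the actual framework may build the Transformation Grid and decode transformations differently (e.g., predicting full rigid transforms rather than displacements).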
Figure 3. Qualitative results.
If you have any questions, please feel free to contact Ruihai Wu at wuruihai_at_pku_edu_cn and Yushi Du at yushidu_at_icloud_com.