Hi there! I am a research scientist at NVIDIA Research. I obtained my PhD at the Max Planck Institute for Intelligent Systems and ETH Zürich, co-advised by Michael Black and Siyu Tang. Prior to that, I received my Master's degree in Optics and Photonics from the Karlsruhe Institute of Technology and my Bachelor's degree in Physics from Peking University.
My research uses machine learning to solve computer vision and graphics problems, with a current focus on generative modeling and reconstruction of dynamic 3D/4D scenes.
Email / Google Scholar / Twitter / GitHub
@inproceedings{zhang2024degrees,
title={Degrees of Freedom Matter: Inferring Dynamics from Point Trajectories},
author={Zhang, Yan and Prokudin, Sergey and Mihajlovic, Marko and Ma, Qianli and Tang, Siyu},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
pages = {2018--2028},
month = jun,
year={2024}
}
How can we infer scene dynamics from sparse point trajectory observations? We show a simple yet effective solution using a spatiotemporal MLP with carefully designed regularizations. No scene-specific priors are needed.
@inproceedings{prokudin2023dynamic,
title={Dynamic Point Fields},
author={Prokudin, Sergey and Ma, Qianli and Raafat, Maxime and Valentin, Julien and Tang, Siyu},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
pages = {7964--7976},
month = oct,
year={2023}
}
Explicit point-based representation + implicit deformation field = dynamic surface models with instant inference and high-quality geometry. Robust single-scan animation of challenging clothing types, even under extreme poses.
@inproceedings{zhang2023egohmr,
title = {Probabilistic Human Mesh Recovery in 3D Scenes from Egocentric Views},
author = {Zhang, Siwei and Ma, Qianli and Zhang, Yan and Aliakbarian, Sadegh and Cosker, Darren and Tang, Siyu},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
pages = {7989--8000},
month = oct,
year = {2023}
}
Generative human mesh recovery for images with body occlusions and truncations: scene-conditioned diffusion model + collision-guided sampling = accurate pose estimation on observed body parts and plausible generation of unobserved parts.
@inproceedings{SkiRT:3DV:2022,
title = {Neural Point-based Shape Modeling of Humans in Challenging Clothing},
author = {Ma, Qianli and Yang, Jinlong and Black, Michael J. and Tang, Siyu},
booktitle = {International Conference on 3D Vision (3DV)},
pages = {679--689},
month = sep,
year = {2022}
}
The power of point-based digital human representations further unleashed: SkiRT models the dynamic shapes of 3D clothed humans, including those wearing challenging outfits such as skirts and dresses.
@inproceedings{Egobody:ECCV:2022,
title = {{EgoBody}: Human Body Shape and Motion of Interacting People from Head-Mounted Devices},
author = {Zhang, Siwei and Ma, Qianli and Zhang, Yan and Qian, Zhiyin and Kwon, Taein and Pollefeys, Marc and Bogo, Federica and Tang, Siyu},
booktitle = {European Conference on Computer Vision (ECCV)},
month = oct,
year = {2022}
}
A large-scale dataset of accurate 3D body shape, pose and motion of humans interacting in 3D scenes, with multi-modal streams from third-person and egocentric views, captured by Azure Kinects and a HoloLens 2.
@inproceedings{POP:ICCV:2021,
title = {The Power of Points for Modeling Humans in Clothing},
author = {Ma, Qianli and Yang, Jinlong and Tang, Siyu and Black, Michael J.},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
pages = {10974--10984},
month = oct,
year = {2021},
}
PoP: a point-based, unified model for multiple subjects and outfits that can turn a single static 3D scan into an animatable avatar with natural pose-dependent clothing deformations.
@inproceedings{MetaAvatar:NeurIPS:2021,
title = {{MetaAvatar}: Learning Animatable Clothed Human Models from Few Depth Images},
author={Wang, Shaofei and Mihajlovic, Marko and Ma, Qianli and Geiger, Andreas and Tang, Siyu},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
volume={34},
pages={2810--2822},
month=dec,
year={2021}
}
Creating avatars of unseen subjects from as few as eight monocular depth images using a meta-learned, multi-subject, articulated neural signed distance field model for clothed humans.
@inproceedings{SCALE:CVPR:2021,
title = {{SCALE}: Modeling Clothed Humans with a Surface Codec of Articulated Local Elements},
author = {Ma, Qianli and Saito, Shunsuke and Yang, Jinlong and Tang, Siyu and Black, Michael J.},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
pages = {16082--16093},
month = jun,
year = {2021},
}
Modeling pose-dependent shapes of clothed humans explicitly with hundreds of articulated surface elements: the clothing deforms naturally even in the presence of topological change.
@inproceedings{SCANimate:CVPR:2021,
title={{SCANimate}: Weakly Supervised Learning of Skinned Clothed Avatar Networks},
author={Saito, Shunsuke and Yang, Jinlong and Ma, Qianli and Black, Michael J.},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
pages={2886--2897},
month=jun,
year={2021}
}
Cycle-consistent implicit skinning fields + locally pose-aware implicit function = a fully animatable avatar with implicit surface from raw scans without surface registration.
@inproceedings{PLACE:3DV:2020,
title = {{PLACE}: Proximity Learning of Articulation and Contact in {3D} Environments},
author = {Zhang, Siwei and Zhang, Yan and Ma, Qianli and Black, Michael J. and Tang, Siyu},
booktitle = {International Conference on 3D Vision (3DV)},
pages = {642--651},
month = nov,
year = {2020}
}
An explicit representation for 3D person-scene contact relations that enables automated synthesis of realistic humans posed naturally in a given scene.
@inproceedings{CAPE:CVPR:20,
title = {Learning to Dress {3D} People in Generative Clothing},
author = {Ma, Qianli and Yang, Jinlong and Ranjan, Anurag and Pujades, Sergi and Pons-Moll, Gerard and Tang, Siyu and Black, Michael J.},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
pages={6468--6477},
month = jun,
year = {2020}
}
CAPE: a graph-CNN-based generative model and a large-scale dataset of 3D clothed human meshes in varied poses and garment types.