Benjamin Busam is Professor and Director of Photogrammetry and Remote Sensing at the Technical University of Munich. He previously served as Head of Research at FRAMOS Imaging Systems and led the 3D Computer Vision Team at Huawei Research London (2018–2020). Benjamin studied Mathematics in Munich, Paris, and Melbourne before earning a PhD in Computer Science. He regularly serves on the programme committees of leading computer vision and robotics conferences including CVPR, ICCV, ECCV, IROS, and ICRA. His team’s research centres on 3D computer vision and robotics for embodied AI, with a focus on multi-modal sensor fusion, self- and physics-supervised learning, and robust perception of photometrically challenging objects.
Achieving human-like 3D perception requires unifying geometry, appearance, and semantics. This talk explores advances in category-level object pose estimation, a necessary step toward generalizable 3D understanding beyond controlled lab setups. We discuss how systems can infer object pose under real-world variations in lighting, texture, and occlusion by fusing geometric reasoning with photometric and semantic cues from RGB-D data, drawing inspiration from 3D reconstruction and registration pipelines. By leveraging global category priors and semantic shape reconstruction, we outline a path toward physically grounded, data-driven perception in which machines can perceive and reason about the 3D world with human-like fluency.
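As a concrete illustration of the alignment step at the heart of many such pipelines: category-level methods commonly predict canonical object coordinates (e.g., NOCS maps) and recover 6D pose and scale by fitting a similarity transform between those coordinates and depth points back-projected from the RGB-D frame. The sketch below is a minimal NumPy version of the standard Umeyama alignment used for this fit; the function name and the synthetic sanity check are illustrative assumptions, not material from the talk.

    import numpy as np

    def umeyama_alignment(canonical_pts, camera_pts):
        """Fit (s, R, t) so that camera_pts ~ s * R @ canonical_pts + t.

        canonical_pts, camera_pts: (N, 3) corresponding 3D points, e.g.
        predicted canonical (NOCS) coordinates and back-projected depth.
        """
        mu_c = canonical_pts.mean(axis=0)
        mu_w = camera_pts.mean(axis=0)
        xc = canonical_pts - mu_c
        xw = camera_pts - mu_w

        # Cross-covariance between the centred point sets.
        sigma = xw.T @ xc / len(canonical_pts)
        U, D, Vt = np.linalg.svd(sigma)

        # Guard against reflections so R stays a proper rotation.
        S = np.eye(3)
        if np.linalg.det(U) * np.linalg.det(Vt) < 0:
            S[2, 2] = -1.0

        R = U @ S @ Vt
        var_c = (xc ** 2).sum() / len(canonical_pts)
        s = np.trace(np.diag(D) @ S) / var_c
        t = mu_w - s * R @ mu_c
        return s, R, t

    # Synthetic sanity check against a known ground-truth transform.
    rng = np.random.default_rng(0)
    pts = rng.uniform(-0.5, 0.5, size=(100, 3))   # canonical-space points
    a = np.pi / 6
    R_gt = np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])
    s_gt, t_gt = 0.2, np.array([0.1, -0.3, 1.5])
    obs = s_gt * pts @ R_gt.T + t_gt              # simulated depth points

    s, R, t = umeyama_alignment(pts, obs)
    assert np.allclose(s, s_gt) and np.allclose(R, R_gt) and np.allclose(t, t_gt)

In practice this closed-form fit is typically wrapped in RANSAC so that the occlusions and photometric failure modes discussed above do not corrupt the estimate.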

He Wang specializes in 3D computer vision, robotics, and machine learning, with a focus on object manipulation and interaction in dynamic environments. His work includes algorithms for generalizable transparent object reconstruction and 6-DoF grasp detection, making him an ideal speaker on innovations in category-level pose estimation, where adaptability and generalizability are crucial.
Hyung Jin Chang is known for his work in machine learning and computer vision, with applications to dynamic scene understanding and motion analysis. His research contributes directly to category-level pose estimation by improving the adaptability and accuracy of algorithms in changing environments. He is also the author of HS-Pose, a state-of-the-art category-level pose estimation method.

Muhammad Zubair Irshad specializes in robotics and 3D perception, particularly 3D perception for robotics using inductive priors. His work on omni-scene reconstruction offers attendees insights into understanding object poses in complex scenes from a single RGB image.
Taeyeop Lee is a Postdoctoral Researcher at KAIST. His research focuses on Physical AI, integrating computer vision and robotics. He has collaborated with NVIDIA Research and Adobe Research. His recent works, including Any6D, DeLTa, and GraspClutter6D, push the boundaries of 6D object pose estimation and robot manipulation in complex, real-world environments.
This talk explores the evolution from manufacturing manipulation to physical-level manipulation in robotics. While manufacturing systems achieve high precision, accuracy, and proven safety, they lack generalization capabilities and remain limited to single-task operations. Our goal is to develop Physical AI that combines manufacturing-level precision and safety with the generalization required for multi-task manipulation.
Xiaolong Wang is recognized for his contributions to computer vision, machine learning, and robotics, with specific expertise in learning visual representations that connect image understanding to 3D structure and robotics. His research on sim-to-real generalizable feature learning and 6D pose estimation aligns well with the workshop's theme of enhancing category-level pose estimation in the wild.

Yan Xu's research sits at the intersection of computer vision, robotics, and embodied AI, focusing on learning actionable representations from natural data, which is pivotal for autonomous systems that rely on category-level pose estimation. His work on object pose estimation and object localization is highly relevant to developing accurate and scalable pose estimation algorithms.