IDS Interdisciplinary Seminar - by Professor Ping LUO

Title: Unveiling Large Language Models for Visual Perception, Generation, Interaction, and Beyond

Speaker: Professor Ping LUO
Associate Director (Innovation and Outreach), HKU IDS
Associate Professor, Dept of CS, HKU
Date: June 26, 2024
Time: 3:00pm – 4:00pm 
Venue: IDS Seminar Room, P603, Graduate House / Zoom
Mode: Hybrid. Seats for on-site participants are limited. A confirmation email will be sent to participants who have successfully registered.


This presentation is divided into three parts. First, we review a series of advances that have redefined the landscape of image and video generation, such as GenTron (CVPR’24), Video DiT (ICLR’24), MotionCtrl (SIGGRAPH’24), and the PixArt series: PixArt-alpha (ICLR’24), PixArt-delta (arXiv:2401.05252), and PixArt-sigma (arXiv:2403.04692). Second, we introduce how to unify perception and generation capabilities in a single multimodal LLM, such as the instance-level vision-language models for image understanding and generation tasks (RegionGPT, CVPR’24). These models differ from traditional multimodal LLMs fine-tuned on image-text pairs, which often struggle to capture detailed instance-level visual concepts. Third, building on the success of large multimodal models in high-level understanding, we design a multimodal code generation framework, RoboCodeX (ICML’24), that converts task plans into precise robotic actions while remaining adaptable across diverse scenarios. Our approach seeks to integrate high-level cognitive processing with practical robotic applications, paving the way for greater robotic autonomy and versatility.


Professor Ping LUO
Associate Professor @ The University of Hong Kong
Professor Ping Luo’s research aims at 1) developing differentiable, meta, and reinforcement learning algorithms that enable machines and devices to solve complex tasks with greater autonomy, 2) understanding the foundations of deep learning algorithms, and 3) enabling applications in computer vision and artificial intelligence. Professor Luo received his PhD degree in Information Engineering from the Chinese University of Hong Kong (CUHK) in 2014, supervised by Prof. Xiaoou Tang (founder of SenseTime Group Ltd.) and Prof. Xiaogang Wang. He was a Research Director at SenseTime Research. He has published 70+ peer-reviewed articles (including 20 first-author papers) in top-tier conferences and journals such as TPAMI, IJCV, ICML, ICLR, NeurIPS, and CVPR. He has won a number of competitions and awards, including first runner-up in the 2014 ImageNet ILSVRC Challenge, first place in the 2017 DAVIS Challenge on Video Object Segmentation, a gold medal in the 2017 YouTube-8M Video Classification Challenge, first place in the 2018 Drivable Area Segmentation Challenge for Autonomous Driving, the 2011 HK PhD Fellow Award, and the 2013 Microsoft Research Fellow Award (awarded to ten PhD students in Asia).

For information, please contact: