IDS Interdisciplinary Seminar - by Professor Ping LUO
Title: Unveiling Large Language Models for Visual Perception, Generation, Interaction, and Beyond
Speaker: Professor Ping LUO
Associate Director (Innovation and Outreach), HKU IDS
Associate Professor, Dept of CS, HKU
Date: June 26, 2024
Time: 3:00pm – 4:00pm
Venue: IDS Seminar Room, P603, Graduate House / Zoom
Mode: Hybrid. Seats for on-site participants are limited. A confirmation email will be sent to participants who have successfully registered.
Abstract
This presentation is divided into three parts. Firstly, we
will go through a series of advances that have redefined the
landscape of image and video generation, including GenTron (CVPR’24),
Video DiT (ICLR’24), MotionCtrl (SIGGRAPH’24), and the PixArt
series: PixArt-alpha (ICLR’24), PixArt-delta (arXiv:2401.05252), and
PixArt-sigma (arXiv:2403.04692). Secondly, we introduce how to unify
perception and generation capabilities in a single multimodal LLM, such as
instance-level vision-language models for image understanding and
generation tasks (RegionGPT, CVPR’24). These models are distinct from
traditional multimodal LLMs fine-tuned on image-text pairs, which
often struggle to capture detailed instance-level visual
concepts. Thirdly, building on the success of large multimodal models
in high-level understanding, we design a multimodal code generation
framework, RoboCodeX (ICML’24), crafted to convert task plans into
precise robotic actions, ensuring adaptability across diverse
scenarios. Our approach seeks to seamlessly integrate high-level
cognitive processing with practical robotic applications, paving the
way for enhanced robotic autonomy and versatility.
Speaker
Professor Ping LUO
Associate Professor @ The University of Hong Kong
Professor Ping Luo’s research aims at 1) developing Differentiable/Meta/Reinforcement Learning algorithms that enable machines and devices to solve complex tasks with greater autonomy, 2) understanding the foundations of deep learning algorithms, and 3) enabling applications in Computer Vision and Artificial Intelligence. Professor Ping Luo received his PhD degree in Information Engineering in 2014 from the Chinese University of Hong Kong (CUHK), supervised by Prof. Xiaoou Tang (founder of SenseTime Group Ltd.) and Prof. Xiaogang Wang. He was a Research Director at SenseTime Research. He has published 70+ peer-reviewed articles (including 20 first-author papers) in top-tier conferences and journals such as TPAMI, IJCV, ICML, ICLR, NeurIPS and CVPR. He has won a number of competitions and awards, including first runner-up in the 2014 ImageNet ILSVRC Challenge, first place in the 2017 DAVIS Challenge on Video Object Segmentation, a gold medal in the 2017 YouTube-8M Video Classification Challenge, first place in the 2018 Drivable Area Segmentation Challenge for Autonomous Driving, the 2011 HK PhD Fellow Award, and the 2013 Microsoft Research Fellow Award (ten PhDs in Asia).
For more information, please contact:
Email: datascience@hku.hk
- June 12, 2024