HKU IDS Seminar: Controllable Creation with Multi-modal Cues

Speaker:
Ms Nanxuan (Cherry) Zhao
Research Scientist, Adobe Research
Nanxuan (Cherry) Zhao is a Research Scientist at Adobe Research, where she leads work on instruction-based image editing, a technology that has been adopted in Adobe products. Previously, she was an Assistant Professor in the Department of Computer Science at the University of Bath. Her research spans computer graphics, computer vision, and human–computer interaction, with a recent focus on data-driven content generation and editing, as well as graphic design. She is particularly interested in multi-modal control for customizing pixels, vector graphics, and 3D shapes. She has served the research community in roles including (Lead) Area Chair for CVPR 2026, Poster Chair for Pacific Graphics 2026, Area Chair for 3DV 2026, and Associate Editor for IEEE CG&A, and on the program committees for SIGGRAPH and SIGGRAPH Asia.
Abstract:
Modern generative models can synthesize images, graphics, and 3D content with remarkable fidelity, but translating human intent into precise control remains a central challenge. In this talk, I will discuss how we can steer creation and editing with a rich set of multi-modal cues, such as natural language instructions, visual examples, layout and vector constraints, and 3D priors, to achieve controllable, reusable, and user-aligned results.

