HKU IDS Scholar Seminar Series #24: Unlocking Interpretable Control for Large Language Models and Beyond via Sparse Autoencoders - HKU Musketeers Foundation Institute of Data Science

HKU IDS Scholar Seminar Series #24:

Unlocking Interpretable Control for Large Language Models and Beyond via Sparse Autoencoders

Speaker

Prof Difan Zou, Assistant Professor, HKU IDS & Department of Computer Science, School of Computing and Data Science, HKU

Date

Mar 20, 2026 (Fri)

Time

11:00am – 12:00 noon

Venue

Tam Wing Fan Innovation Wing Two | Zoom

Mode

Hybrid. Seats for on-site participants are limited. A confirmation email will be sent to participants who have successfully registered.

Abstract

Interpretability is the foundation of trustworthy and controllable large language model (LLM) deployment, yet extracting actionable insights from black-box models remains a critical challenge. Sparse Autoencoders (SAEs) have emerged as a transformative solution, decomposing dense LLM hidden states into sparse, human-understandable features that reveal the model’s internal mechanisms.

In this talk, I will discuss how SAEs enable genuine interpretability by isolating semantically coherent latent features, offering a clear window into which internal representations correspond to specific concepts. I will show core applications of this interpretability: targeted model unlearning that precisely removes unwanted knowledge without sacrificing general capability, controlled model steering that refines behavior through meaningful feature interventions, and extensions to emerging architectures such as diffusion language models (DLMs), where SAEs unlock new possibilities for interpretation and manipulation. I will also share key considerations for balancing interpretability, utility, and robustness, and outline how SAE-driven insights are turning theoretical interpretability into a practical tool for building more reliable, responsible AI systems. Join us to explore how SAEs are redefining what it means to understand and control LLMs.
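For readers new to the topic, the decomposition and feature intervention mentioned in the abstract can be illustrated with a toy example. This is only a minimal sketch under stated assumptions, not the speaker's method: the weights are hand-picked rather than learned (practical SAEs are typically trained with a reconstruction loss plus a sparsity penalty), and the names `encode`, `decode`, and `steer` are hypothetical.

```python
# Toy sparse autoencoder (SAE) with hand-picked weights.
# All names and values are illustrative assumptions, not taken from the talk.

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def encode(h, w_enc, b_enc):
    """Map a dense hidden state to non-negative feature activations."""
    pre = matvec(w_enc, h)
    return relu([p + b for p, b in zip(pre, b_enc)])

def decode(f, w_dec, b_dec):
    """Reconstruct the hidden state as a weighted sum of feature directions."""
    out = list(b_dec)
    for act, direction in zip(f, w_dec):
        for i, d in enumerate(direction):
            out[i] += act * d
    return out

def steer(h, w_enc, b_enc, w_dec, b_dec, feature, value):
    """Set one feature's activation to `value`, then decode."""
    f = encode(h, w_enc, b_enc)
    f[feature] = value
    return decode(f, w_dec, b_dec)

# 2-dimensional hidden state, 4 features (an overcomplete dictionary).
W_ENC = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
B_ENC = [0.0, 0.0, 0.0, 0.0]
W_DEC = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]  # one direction per feature
B_DEC = [0.0, 0.0]

h = [2.0, -3.0]
features = encode(h, W_ENC, B_ENC)               # only 2 of the 4 features fire
reconstruction = decode(features, W_DEC, B_DEC)  # recovers h in this toy setup
# Ablate feature 3 (the "negative second coordinate" direction) and decode.
ablated = steer(h, W_ENC, B_ENC, W_DEC, B_DEC, feature=3, value=0.0)
```

In this toy setup each feature corresponds to a single direction in the hidden space, so zeroing one activation removes exactly that component from the reconstruction while leaving the others unchanged.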

Speaker

Prof Difan ZOU

Assistant Professor @ HKU IDS & SCDS

Professor Difan Zou is an Assistant Professor in HKU IDS and the Department of Computer Science, School of Computing and Data Science, at The University of Hong Kong. He received his Ph.D. in Computer Science from the University of California, Los Angeles (UCLA). He received a B.S. degree in Applied Physics from the School of the Gifted Young, USTC, and an M.S. degree in Electrical Engineering from USTC. He has published multiple papers at top-tier machine learning conferences, including ICML, NeurIPS, ICLR, and COLT. He is a recipient of the Bloomberg Data Science Ph.D. Fellowship. His research interests are broadly in machine learning, optimization, and learning structured data (e.g., time-series or graph data), with a focus on the theoretical understanding of optimization and generalization in deep learning problems.

For the full biography of Prof. ZOU, please refer to: https://datascience.hku.hk/people/difan-zou/

Moderator

Prof Guodong LI

Associate Head (Research) & Professor @ HKU IDS & SCDS

Professor Guodong Li joined the Department of Statistics & Actuarial Science, The University of Hong Kong, in 2009 as an Assistant Professor, and is currently a Professor. Before that, he was an Assistant Professor in the Division of Mathematical Sciences, Nanyang Technological University, Singapore, a position he took up after receiving his PhD degree in statistics from The University of Hong Kong in 2007. He received his Bachelor's and Master's degrees in Statistics from Peking University.

For the full biography of Prof. Li, please refer to: https://datascience.hku.hk/people/professor-guodong-li/