HKU IDS Scholar Seminar Series #10: Why Larger Language Models Do In-context Learning Differently?

Title: Why Larger Language Models Do In-context Learning Differently?
Speaker: Prof. Yingyu LIANG, Associate Professor, IDS and Department of Computer Science, HKU
Date: July 11, 2024
Time: 10:30am – 11:30am

Venue: IDS Seminar Room, P603, Graduate House / Zoom
Mode: Hybrid. Seats for on-site participants are limited. A confirmation email will be sent to participants who have successfully registered. 

Abstract

Large language models (LLMs) have emerged as a powerful tool for AI, with the key ability of in-context learning (ICL): they can perform well on unseen tasks given only a brief series of task examples, without any updates to the model parameters. One recent and intriguing observation is that models of different scales may exhibit different ICL behaviors: larger models tend to be more sensitive to noise in the test context. This work studies that observation theoretically, aiming to improve our understanding of LLMs and ICL. We analyze two stylized settings: (1) linear regression with one-layer, single-head linear transformers and (2) parity classification with two-layer transformers with multiple attention heads (non-linear data and a non-linear model). In both settings, we give closed-form optimal solutions and find that smaller models emphasize important hidden features while larger ones cover more hidden features; thus, smaller models are more robust to noise while larger ones are more easily distracted, leading to different ICL behaviors. This sheds light on where transformers pay attention and how that affects ICL. Preliminary experimental results on large base and chat models provide positive support for our analysis. This is joint work with Zhenmei Shi, Junyi Wei, and Zhuoyan Xu, and will appear in ICML’24.
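To make setting (1) concrete, below is a minimal numpy sketch of in-context linear regression with a one-layer, single-head linear attention model. It follows a common formulation from the ICL-theory literature rather than the paper's exact construction; the names (W_KQ, W_V, linear_attention_predict), the merged key-query parameterization, and the initialization scale are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 64                      # feature dimension, number of in-context examples

# One regression task per prompt: a hidden weight vector w_star.
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))       # in-context inputs
y = X @ w_star                    # in-context labels
x_query = rng.normal(size=d)      # unlabeled query input

# Each token stacks (x_i, y_i); the query token carries a zero label slot.
Z = np.vstack([np.hstack([X, y[:, None]]),
               np.hstack([x_query, [0.0]])])   # shape (n+1, d+1)

# One-layer, single-head linear attention (no softmax), with the key and
# query matrices merged into one product W_KQ (an illustrative choice).
W_KQ = 0.1 * rng.normal(size=(d + 1, d + 1))
W_V = 0.1 * rng.normal(size=(d + 1, d + 1))

def linear_attention_predict(Z, W_KQ, W_V):
    """f(Z) = Z + (Z W_KQ Z^T / n) Z W_V; prediction = label slot of the query token."""
    n_ctx = Z.shape[0] - 1
    attn = (Z @ W_KQ @ Z.T) / n_ctx
    out = Z + attn @ Z @ W_V
    return out[-1, -1]

print("model prediction:", linear_attention_predict(Z, W_KQ, W_V))

# For comparison: the one-gradient-step least-squares estimator that trained
# linear attention is known to approximate in related ICL analyses.
print("one-step GD estimator:", x_query @ (X.T @ y) / n)
print("true label:", x_query @ w_star)

With random weights the prediction is of course uninformative; the point is the prompt format (labeled pairs followed by an unlabeled query) and the fact that the model's output is read off the query token's label slot, which is the quantity the closed-form optimal solutions characterize.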

Speaker

Prof. Yingyu LIANG
Associate Professor @ HKU IDS & Department of Computer Science
Prof. Yingyu Liang is an Associate Professor in the Musketeers Foundation Institute of Data Science and the Department of Computer Science at The University of Hong Kong. He is also an Associate Professor in the Department of Computer Sciences at the University of Wisconsin-Madison. Before that, he was a postdoc at Princeton University. He received his Ph.D. in 2014 from Georgia Tech, and his M.S. (2010) and B.S. (2008) from Tsinghua University. He is a recipient of the NSF CAREER award. His research group aims to provide theoretical foundations for modern machine learning models and to design efficient algorithms for real-world applications. Recent focuses include optimization and generalization in deep learning, robust machine learning, and their applications. For the full biography of Prof. Liang, please refer to: https://datascience.hku.hk/people/yingyu-liang/

Moderator

Prof. Yi Ma
Director; Professor, Chair of Artificial Intelligence @ HKU IDS & Department of Computer Science 

Professor Yi Ma is a Chair Professor in the Musketeers Foundation Institute of Data Science (HKU IDS) and the Department of Computer Science at The University of Hong Kong. He took up the Directorship of HKU IDS on January 12, 2023. He is also a Professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. He has published about 60 journal papers, 120 conference papers, and three textbooks in computer vision, generalized principal component analysis, and high-dimensional data analysis.

Professor Ma’s research interests cover computer vision, high-dimensional data analysis, and intelligent systems. For the full biography of Professor Ma, please refer to: https://datascience.hku.hk/people/yi-ma/

For information, please contact:
Email: datascience@hku.hk