IDS Seminar - by Dr. Yuandong Tian from Meta
Mode: Hybrid. Seats for on-site participants are limited. A confirmation email will be sent to participants who have successfully registered.
Large Language Models (LLMs) have demonstrated remarkable efficacy across diverse applications, with the multi-layer Transformer architecture and self-attention playing a pivotal role. In this talk, we analyze the training dynamics of self-attention in 1-layer and multi-layer Transformers in a mathematically rigorous manner. This analysis characterizes the training dynamics of self-attention and how tokens are composed to form high-level latent patterns. Our theoretical insights are corroborated by extensive experimental evidence. Notably, one property called “contextual sparsity” enables us to develop novel approaches such as Deja Vu and H2O that substantially accelerate LLM inference. Finally, further study of the attention behavior yields positional interpolation (PI), which extends the context window of pre-trained models with very few fine-tuning steps.
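The core idea behind positional interpolation can be illustrated with a minimal sketch: rather than extrapolating rotary position indices beyond the pre-trained context length, PI linearly rescales positions so the extended window maps back into the trained range. The function names, dimensions, and lengths below are illustrative, not the speaker's actual implementation.

```python
import math

def rope_angles(position, dim=8, base=10000.0):
    """Standard RoPE: rotation angles for one position across frequency bands."""
    return [position / (base ** (2 * i / dim)) for i in range(dim // 2)]

def interpolated_position(position, train_len, target_len):
    """PI: linearly rescale positions so [0, target_len) maps onto [0, train_len)."""
    return position * (train_len / target_len)

# Suppose a model pre-trained on 2048 tokens is extended to 8192 tokens.
# Position 8000 is rescaled to 8000 * 2048 / 8192 = 2000, which lies
# inside the range the model saw during pre-training.
pos = interpolated_position(8000, train_len=2048, target_len=8192)
angles = rope_angles(pos)
```

The rescaled positions keep attention scores in a regime the model was trained on, which is why only a few fine-tuning steps are needed to adapt.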
Dr. Yuandong Tian is a Research Scientist and Senior Manager at Meta AI Research (FAIR), working on reinforcement learning, optimization, and the understanding of neural networks. He was the project lead for story generation (2023) and the OpenGo project (2018). He is the first-author recipient of a 2021 ICML Outstanding Paper Honorable Mention and a 2013 ICCV Marr Prize Honorable Mention, and also received the 2022 CGO Distinguished Paper Award. Prior to that, he worked on the Google self-driving car team from 2013 to 2014, and received a Ph.D. from the Robotics Institute, Carnegie Mellon University, in 2013. He has served as an area chair for NeurIPS, ICML, AAAI, and AISTATS.
Professor Yi Ma is a Chair Professor in the Musketeers Foundation Institute of Data Science (HKU IDS) and the Department of Computer Science at the University of Hong Kong. He took up the Directorship of HKU IDS on January 12, 2023. He is also a Professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. He has published about 60 journal papers, 120 conference papers, and three textbooks in computer vision, generalized principal component analysis, and high-dimensional data analysis.
Professor Ma’s research interests cover computer vision, high-dimensional data analysis, and intelligent systems. For the full biography of Professor Ma, please refer to: https://datascience.hku.hk/people/yi-ma/
For information, please contact: