
Overview

The HKU Musketeers Foundation Institute of Data Science (IDS) is the first of the ten research institutes in the University’s technological development blueprint for the next decade. Drawing on strengths in varied disciplines, the Institute aspires to bring together our expertise and the University-wide interest in establishing critical mass in key areas, and to explore frontier research and applications in data science, computing, mathematics, and statistics. Researchers recruited under the Data Science Cluster of the HKU-100 Recruitment Campaign, launched in 2021, and their research postgraduate students will be jointly appointed by the Institute and by Departments in STEM and non-STEM disciplines.

The Institute on the main HKU campus will serve as the hub for a nexus formed of branches, satellite centers, labs, and other research institutes on the Mainland – in particular in the Greater Bay Area – to connect and facilitate collaborations with local partners in industry, business, government sectors, and other research institutions where data is collected.

Our goal is to establish a world-class institute in this key area and to attract the best talent from across the globe to work with us. The Institute is home to IDS researchers newly recruited under the HKU-100 Recruitment Campaign and to students conducting their own academic research in data science. It aims to host, facilitate, and promote research across a wide range of data science areas through research funding support, multidisciplinary research programs, and a network of connections between the talent pool of data scientists at IDS and domain experts on campus.

What is Data Science?

Data science is a multidisciplinary field or approach that utilizes a fusion of analytical methods from machine learning, data mining, and statistics to extract insights from raw data. A typical data science pipeline involves:

  1. identifying and framing a real-world problem,
  2. collecting and processing data,
  3. designing and training models, and
  4. testing and deploying models.
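
The four steps above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not describe any IDS project: the problem (predicting y from x), the synthetic data generated by `make_data`, and the choice of a one-feature least-squares line as the model. The helper names `make_data`, `split`, `fit_line`, and `mean_squared_error` are hypothetical.

```python
import random

# Step 1 (frame the problem): predict a numeric target y from one feature x.
# This toy problem is an assumption chosen only to keep the sketch short.


def make_data(n, seed=0):
    """Step 2 (collect): synthetic (x, y) pairs from y = 2x + 1 plus small noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]


def split(data, train_fraction=0.8):
    """Step 2 (process): hold out the tail of the data for testing."""
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]


def fit_line(train):
    """Step 3 (design and train): ordinary least squares for a single feature."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(model, data):
    """Step 4 (test): average squared prediction error on held-out data."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in data) / len(data)


data = make_data(200)
train, test = split(data)
model = fit_line(train)
test_mse = mean_squared_error(model, test)

# Step 4 (deploy): in this sketch, "deployment" is just calling the fitted model.
slope, intercept = model
prediction = slope * 4.0 + intercept
print(f"slope={slope:.3f} intercept={intercept:.3f} test_mse={test_mse:.4f}")
```

In practice each step is far more involved (real data collection, cleaning, model selection, monitoring after deployment), but the structure of holding out test data and evaluating the trained model on it carries over.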

Why Data Science? 

Living in the era of big data, we have all witnessed unprecedented and revolutionary changes driven by artificial intelligence (AI) and cloud computing. AI profoundly enhances our understanding of data, while cloud computing makes it possible to process data efficiently. The science behind data-related technologies has become a productive force driving the development of our civilization and society. The amount of data produced daily has been increasing exponentially. Data has become extremely valuable to society, so data science is no longer just a technical topic. Not only does it help companies provide better customer service and increase profits, but it also makes our daily lives easier and more efficient. Data science can be applied to many real-world domains, including the following:

  1. Education: Using conversational agents as virtual academic advisors with course databases.
  2. Healthcare: Retrieving information from electronic health record databases about demographics, diagnoses, procedures, prescriptions, and laboratory tests.
  3. Public Policy: Promoting transparency, e.g., helping people receive instant updates and statistics about COVID-19 cases.
  4. Industry: Building AI-based customer relationship management systems for, e.g., product information, flight booking, and restaurant reservations.

Numerous new mathematical challenges have arisen accordingly, yet research on the related topics (e.g., algorithm design for machine learning and explainable mathematical theory) is still in its infancy. Computational mathematics is the foundation on which data science develops, with crucial topics including optimization, operational research, and scientific computing. Research on computational mathematics for data science is both fundamental and cross-disciplinary, spanning many different applications. We have the exciting opportunity to consider what we can explore to understand and make better use of the knowledge in data science. The sky is the limit.