
HKU IDS

Research Postgraduate Programme

DATA8006 - Advanced Data Mining (Foundation)

Course Instructor


Dr Chao HUANG 

Course Description 

Data mining is the automatic discovery of statistically interesting and potentially useful patterns from large amounts of data. The goal of the course is to study the main methods used today for data mining on various kinds of data (e.g., web, spatial, and temporal data). These methods allow new scientific discoveries and intelligent business decisions to be made. Topics include i) Data Mining Architecture; ii) Data Preprocessing; iii) On-Line Analytical Processing (OLAP); iv) Classification; v) Clustering; vi) Mining Association Rules; vii) Recommendation; viii) Advanced Data Mining Techniques. These are powerful tools for data analysts to process data and to extract interesting patterns and models from it. Additionally, research on Graph Representation Learning has grown rapidly in the data mining and machine learning communities. This course will also cover recent core techniques and advances in graph representation research for modeling a variety of real-world applications and problems, including machine learning with graphs, heterogeneous graph mining, recommendation with graphs, and spatio-temporal graph learning.

The course includes 3 hours of lectures (by the instructor) per week. Homework includes both written exercises and programming exercises. Depending on the instructor or the need, the course can be offered with midterm and final exams or with a course project (including a midterm proposal, a final presentation, and a report). The weighting of coursework and examination is subject to approval.

Prerequisites 

Prior knowledge of undergraduate-level data structures, probability, and programming may facilitate your understanding of the knowledge discovery process, which includes data collection, data cleaning, model building, model testing, and evaluation. This course is intended for graduate students seeking to understand various data mining tasks and the principal algorithms for addressing those tasks.