HKU IDS Scholar

Professor Jinhong DU
Assistant Professor
HKU Musketeers Foundation Institute of Data Science and
Department of Statistics and Actuarial Science, School of Computing and Data Science, HKU
Key expertise
Statistics and Biostatistics
About me
Prof. Jinhong Du is an HKU-100 Assistant Professor at the University of Hong Kong, beginning in Fall 2025. He holds joint appointments in the HKU Musketeers Foundation Institute of Data Science and the Department of Statistics and Actuarial Science, School of Computing and Data Science.
His research bridges statistical theory and high-impact applications, focusing on causal inference, interpretable machine learning, high-dimensional statistics, and statistical genomics. Prof. Du’s work has been published in leading venues across multiple disciplines, including premier statistics journals such as the Journal of the American Statistical Association and the Journal of the Royal Statistical Society, Series B; top machine learning conferences such as NeurIPS and ICML; and prominent scientific journals such as the Proceedings of the National Academy of Sciences.
He earned his Ph.D. in Statistics and Machine Learning from Carnegie Mellon University, an M.S. in Statistics from the University of Chicago, and a B.S. in Statistics from Sun Yat-sen University.
Current Research Project
Feature importance quantification faces a fundamental challenge: when predictors are correlated, standard methods systematically underestimate their contributions. We prove that major existing approaches target identical population functionals under squared-error loss, revealing why they share this correlation-induced bias. To address this limitation, we introduce Disentangled Feature Importance (DFI), a nonparametric generalization of the classical R-squared decomposition via optimal transport. DFI transforms correlated features into independent latent variables using a transport map, eliminating correlation distortion. Importance is computed in this disentangled space and attributed back through the transport map’s sensitivity. DFI provides a principled decomposition of importance scores that sum to the total predictive variability for latent additive models and to interaction-weighted functional ANOVA variances more generally, under arbitrary feature dependencies.
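The disentangle-then-attribute workflow can be illustrated with a small numerical sketch. The code below is not the DFI estimator itself: it stands in a linear whitening map for the transport step (a stand-in for the Bures-Wasserstein map under Gaussian-type assumptions), uses permutation-style loss inflation as a crude surrogate for the latent importance functional, and pushes latent scores back to the original features through the squared entries of the map’s Jacobian. The simulated data, the random forest regression, and the attribution rule are all illustrative assumptions, not part of the published methodology.

```python
# Minimal sketch of the disentangle-then-attribute idea (NOT the DFI estimator):
# whiten correlated features with a linear map, score each latent coordinate by
# permutation loss inflation, and attribute back through the map's sensitivity.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Correlated features: x2 is nearly a copy of x1, x3 is independent.
n = 2000
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = x1 + 0.5 * x3 + 0.1 * rng.normal(size=n)

# Linear "transport" to uncorrelated latents: Z = (X - mean) @ W with
# W = Sigma^{-1/2}, the symmetric inverse square root of the covariance.
mu, Sigma = X.mean(axis=0), np.cov(X, rowvar=False)
evals, evecs = np.linalg.eigh(Sigma)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T
Z = (X - mu) @ W

# Fit a flexible regression in the latent (disentangled) space.
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(Z, y)
baseline_mse = np.mean((y - model.predict(Z)) ** 2)

# Latent importance: loss inflation when one latent coordinate is permuted
# (a crude surrogate for the population functional targeted by DFI).
latent_imp = np.empty(Z.shape[1])
for j in range(Z.shape[1]):
    Zp = Z.copy()
    Zp[:, j] = rng.permutation(Zp[:, j])
    latent_imp[j] = np.mean((y - model.predict(Zp)) ** 2) - baseline_mse

# Attribute back through the map's sensitivity: the map is linear with
# dZ_j/dX_k = W[k, j], so feature k receives sum_j W[k, j]^2 * imp_j.
feature_imp = (W ** 2) @ latent_imp
print("latent importances :", np.round(latent_imp, 3))
print("feature importances:", np.round(feature_imp, 3))
```

The intent of the toy example is that the signal shared by the correlated pair is credited through the latent coordinates and then split between both features by the transport map, rather than being discounted by conditioning, which mirrors the correlation distortion DFI is designed to avoid.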
We develop a comprehensive semiparametric theory for DFI. For general transport maps, we establish root-n consistency and asymptotic normality of importance estimators in the latent space, which extends to the original feature space for the Bures-Wasserstein map. Notably, our estimators achieve second-order estimation error. By design, DFI avoids the computational burden of repeated submodel refitting and the challenges of conditional covariate distribution estimation, thereby achieving computational efficiency.
Selected Publications
- Jin-Hong Du, Larry Wasserman, and Kathryn Roeder. “Simultaneous inference for generalized linear models with unmeasured confounders”. Journal of the American Statistical Association (2025).
- Yaoming Zhen and Jin-Hong Du. “Network-based neighborhood regression”. Journal of the American Statistical Association (2025).
- Jin-Hong Du, Zhenghao Zeng, Edward H. Kennedy, Larry Wasserman, and Kathryn Roeder. “Causal inference for genomic data with multiple heterogeneous outcomes”. Journal of the American Statistical Association (2025).
- Jin-Hong Du, Pratik Patil, and Arun Kumar Kuchibhotla. “Subsample ridge ensembles: equivalences and generalized cross-validation”. Proceedings of the 40th International Conference on Machine Learning (oral, 2023).
- Jin-Hong Du, Zhanrui Cai, and Kathryn Roeder. “Robust probabilistic modeling for single-cell multimodal mosaic integration and imputation via scVAEIT”. Proceedings of the National Academy of Sciences (2022).
Research Interests
Causal inference, Interpretable machine learning, Statistical network analysis, High-dimensional statistics, Statistical genomics