I am Hengxu Yu, a final-year Ph.D. student in Data Science at the Chinese University of Hong Kong, Shenzhen. I am fortunate to be advised by Prof. Xiao Li. My research interests lie in the field of optimization, particularly in the design and analysis of algorithms for training large language models.
Research Interests
My research focuses on advancing the theoretical understanding of the efficiency and reliability of optimization methods, with the ultimate goal of designing faster and more robust algorithms. These innovations have wide-ranging applications, with a particular emphasis on improving the training of large language models.
Publications
Preprints & Papers
- Yu, H., & Li, X. (2023). High Probability Guarantees for Random Reshuffling. arXiv:2311.11841. (under revision at SIAM Journal on Optimization)
We establish a set of high-probability finite-time complexity guarantees for random reshuffling (RR), including finding a stationary point, a stopping criterion that yields a last-iterate result, and convergence to second-order stationary points.
- Luo, Q., Yu, H., & Li, X. (2024). BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models. arXiv:2404.02827. (NeurIPS 2024)
BAdam is a block coordinate descent method that uses Adam as its inner solver for fine-tuning large language models. It is memory-efficient, enabling full-parameter training of Llama 3-8B on a single RTX 3090 (24 GB) GPU.
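The core idea can be sketched in a few lines of PyTorch: only one parameter block is active at a time, so Adam's optimizer state exists only for that block. The toy model, block partition, step count, and learning rate below are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of block coordinate descent with Adam as the inner solver
# (the idea behind BAdam). All choices here are illustrative, not the
# paper's exact setup.
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Linear(8, 1))
# Partition parameters into blocks (here: one block per layer).
blocks = [list(model[0].parameters()), list(model[1].parameters())]
x, y = torch.randn(32, 4), torch.randn(32, 1)
loss_fn = torch.nn.MSELoss()

def adam_on_block(block, steps=20, lr=1e-2):
    # Freeze everything except the active block; a fresh Adam instance
    # keeps optimizer state only for that block, which is the source of
    # the memory savings.
    for p in model.parameters():
        p.requires_grad_(False)
    for p in block:
        p.requires_grad_(True)
    opt = torch.optim.Adam(block, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

initial = loss_fn(model(x), y).item()
for epoch in range(3):       # rotate through the blocks repeatedly
    for block in blocks:
        final = adam_on_block(block)
```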
- Random Reshuffling Escapes Saddle Points Efficiently. (work in progress)
By deriving a novel concentration inequality for matrix products under sampling without replacement, which may be of broader interest, we prove that introducing a simple perturbation at each epoch enables RR to find second-order stationary points with the same complexity as that for first-order stationary points. Furthermore, this complexity is achieved with high probability, characterizing the performance of a single execution of RR.
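The perturbation scheme described above can be illustrated on a toy finite-sum least-squares problem: each epoch adds a small random perturbation to the iterate and then sweeps the data in a freshly shuffled order. The step size, perturbation radius, and problem instance below are arbitrary illustrative choices, not values from the analysis.

```python
# Illustrative sketch of perturbed random reshuffling (RR) on a toy
# finite-sum least-squares objective; parameters are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((16, 4))    # 16 component functions in R^4
b = rng.standard_normal(16)
x = np.zeros(4)
lr, radius = 0.05, 1e-3

for epoch in range(50):
    # Simple perturbation at the start of each epoch: this is the
    # ingredient that lets RR escape strict saddle points.
    x += radius * rng.standard_normal(4)
    # One pass over the data, sampled without replacement.
    for i in rng.permutation(len(b)):
        grad_i = (A[i] @ x - b[i]) * A[i]
        x -= lr * grad_i

residual = np.linalg.norm(A @ x - b)
```

The quadratic here has no saddle points, so the sketch only shows the mechanics (per-epoch perturbation plus without-replacement sweeps); the saddle-escape behavior matters for nonconvex objectives.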
Skills
Coding: Python (PyTorch and JAX), MATLAB, LaTeX, Git, Linux.
Education
- 2020 – 2026 (expected) Ph.D., Data Science, The Chinese University of Hong Kong, Shenzhen.
- 2016 – 2020 B.S., Mathematics, Lanzhou University.
Teaching Experience
- Spring and Fall 2023: Teaching Assistant, MAT3007: Optimization, CUHK-SZ.
- Spring 2022: Teaching Assistant, DDA4250: Mathematical Introduction to Deep Learning, CUHK-SZ.
- Fall 2021: Teaching Assistant, CSC3001: Discrete Mathematics, CUHK-SZ.
Awards
- 2020, 2021: SRIBD PhD Fellowship, Shenzhen Research Institute of Big Data.
- 2018: Third Prize in the National Finals, The Chinese Mathematics Competition.
- 2017, 2018: Outstanding Student Prize, Lanzhou University.