Hi, my name is Zizhao Mo (莫梓钊). I received my Ph.D. degree in Computer Science from the University of Macau, where I was fortunate to be advised by Prof. Huanle Xu at the Cloud and Distributed Systems Lab.

My research interests broadly revolve around machine learning systems (MLSys), with a focus on scheduling, parallelization optimization, and resource management topics. The optimization goal of my research is to improve the efficiency of ML workloads in terms of latency, cost, and throughput. I also have a specific interest in designing efficient systems over heterogeneous resources, including CPU-GPU and heterogeneous GPU platforms.

My research projects:

Resource allocation in GPU clusters. Designing fine-grained scheduling policies in the (heterogeneous) GPU cluster to optimize the performance and resource efficiency for deep learning training jobs.
LLM inference optimization. Proposing optimization techniques for the LLM inference service, primarily focusing on the improvement of system-level metrics like throughput and latency.

🔥 News

2026.03: I give a talk at Cloudflare on March 3rd.
2025.06: 🎉🎉 I defended my Ph.D. thesis!

📝 Publications

Serving Hybrid LLM Loads with SLO Guarantees Using CPU-GPU Attention Piggybacking

Accepted by SIGMOD’26 (CCF-A, CSRanking), to appear.
Authors: Zizhao Mo, Junlin Chen, Huanle Xu, Cheng-Zhong Xu

Hetis: Serving LLMs in Heterogeneous GPU Clusters with Fine-grained and Dynamic Parallelism

Accepted by SC’25 (CCF-A, CSRanking).
Authors: Zizhao Mo, Jianxiong Liao, Huanle Xu, Zhi Zhou, Cheng-Zhong Xu

Fast and Fair Training for Deep Learning in Heterogeneous GPU Clusters

Accepted by ICS’25 (CCF-B, CSRanking).
Authors: Zizhao Mo, Huanle Xu, Wing Cheong Lau

Heet: Accelerating Elastic Training in Heterogeneous Deep Learning Clusters

Accepted by ASPLOS’24 (CCF-A, CSRanking).
Authors: Zizhao Mo, Huanle Xu, Cheng-Zhong Xu

Optimal Resource Efficiency with Fairness in Heterogeneous GPU Clusters

Accepted by Middleware’24 (CCF-B).
Authors: Zizhao Mo, Huanle Xu, Wing Cheong Lau

Derm: SLA-aware Resource Management for Highly Dynamic Microservices

Accepted by ISCA’24 (CCF-A, CSRanking).
Authors: Liao Chen, Shutian Luo, Chenyu Lin, Zizhao Mo, Huanle Xu, Kejiang Ye, Cheng-Zhong Xu

Interference-aware Multiplexing for Deep Learning in GPU Clusters: A Middleware Approach

Accepted by SC’23 (CCF-A, CSRanking).
Authors: Wenyan Chen, Zizhao Mo, Huanle Xu, Kejiang Ye, Cheng-Zhong Xu

🎖 Honors and Awards

2021 - 2025: Ph.D. Scholarship. University of Macau.
2024: Travel grant for ASPLOS’24.
2021: Arthur and Louis May Scholarship. Hong Kong University of Science and Technology.

🔖 Academic Activities

Reviewer for:

IEEE Transactions on Computers (IEEE TC)
ACM Transactions on Architecture and Code Optimization (ACM TACO)
IEEE Transactions on Services Computing (IEEE TSC)
Journal of Systems Architecture (JSA)
IEEE Transactions on Consumer Electronics

Program committee member of:

MLSys’26 AE

📖 Education

2021 - 2025: Ph.D. in Computer Science, University of Macau.
2020 - 2021: MSc in Information Technology, Hong Kong University of Science and Technology.
2014 - 2018: B.Eng. in Software Engineering, South China University of Technology.