Load Balancing under Data Locality: Extending Mean-Field Framework to Constrained Large-Scale Systems

Stochastics Seminar
Thursday, February 29, 2024 - 3:30pm for 1 hour (actually 50 minutes)
Skiles 006
Debankur Mukherjee – Georgia Tech
Cheng Mao

Large-scale parallel-processing infrastructures such as data centers and cloud networks form the cornerstone of the modern digital environment. Central to their efficiency are resource management policies, especially load balancing algorithms (LBAs), which are crucial for meeting the stringent delay requirements of tasks. A contemporary challenge in designing LBAs for today's data centers is navigating data locality constraints, which dictate which tasks can be assigned to which servers. These constraints can be naturally modeled as a bipartite graph between servers and task types. Most LBA heuristics rely on the accuracy of the mean-field approximation. However, the non-exchangeability among servers induced by data locality invalidates this mean-field framework, causing real-world system behavior to diverge significantly from theoretical predictions. From a foundational standpoint, advancing our understanding in this domain demands the study of stochastic processes on large graphs, which in turn requires fundamental advances in classical analytical tools.

In this presentation, we will delve into recent advances in extending the accuracy of the mean-field approximation to a broad class of graphs. In particular, we will discuss how to design resource-efficient, asymptotically optimal data locality constraints, and how the system behavior changes fundamentally depending on whether the above bipartite graph is an expander, a spatial graph, or inhomogeneous in nature.