Breaking the Curse of Dimensionality: Graphs, Probability Measures, and Data

Series
Applied and Computational Mathematics Seminar
Time
Monday, January 26, 2026 - 2:00pm for 1 hour (actually 50 minutes)
Location
Skiles 005 and https://gatech.zoom.us/j/94954654170
Speaker
James Murphy – Tufts University – JM.Murphy@tufts.edu – https://jmurphy.math.tufts.edu/
Organizer
Wenjing Liao

The curse of dimensionality renders statistical and machine learning in high dimensions intractable without additional assumptions on the underlying data.  We consider geometric models for data that allow for mathematical performance guarantees and efficient algorithms that break the curse.  The first part of the talk develops a family of data-driven metrics that balance between density and geometry in the underlying data.  We consider discrete graph operators based on these metrics, and prove performance guarantees for clustering with them in the spectral graph paradigm.  Fast algorithms based on Euclidean nearest-neighbor graphs are proposed and connections with continuum operators on manifolds are developed. 
 
In the second part of the talk, we move away from Euclidean spaces and focus on representation learning of probability measures in Wasserstein space.  We introduce a general barycentric coding model in which data are represented as Wasserstein barycenters of a set of fixed reference measures.  Leveraging the geometry of Wasserstein space, we develop a tractable optimization program to learn the barycentric coordinates when given access to the densities of the underlying measures.  We provide a consistent statistical procedure for learning these coordinates when the measures are accessed only by i.i.d. samples.  Our consistency results and algorithms exploit entropic regularization of optimal transport maps, thereby allowing our barycentric modeling approach to scale efficiently.  Extensions to learning suitable reference measures and linearizations of our barycentric coding model will be discussed.  Throughout the talk, applications to synthetic and real data demonstrate the efficacy of our methods.
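
For orientation, the barycentric coding model described above can be sketched with the standard definition of a Wasserstein barycenter. The notation here is an assumption made for illustration and is not taken from the abstract: reference measures \mu_1, ..., \mu_m, coordinates \lambda in the probability simplex \Delta^{m-1}, and the 2-Wasserstein distance W_2 on measures over R^d with finite second moment.

```latex
% Assumed notation (not from the abstract): \mu_i are the fixed reference
% measures, \lambda the barycentric coordinates, W_2 the 2-Wasserstein distance.
\[
  \nu_\lambda \in \operatorname*{arg\,min}_{\nu \in \mathcal{P}_2(\mathbb{R}^d)}
  \sum_{i=1}^{m} \lambda_i \, W_2^2(\nu, \mu_i),
  \qquad
  \lambda \in \Delta^{m-1}
  = \Bigl\{ \lambda \in \mathbb{R}^m : \lambda_i \ge 0,\ \sum_{i=1}^{m} \lambda_i = 1 \Bigr\}.
\]
```

Under this reading, learning the barycentric coordinates means recovering \lambda from an observed measure \nu_\lambda, given either its density or i.i.d. samples from it.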