A Mathematical Perspective On Contrastive Learning

Series
Applied and Computational Mathematics Seminar
Time
Monday, September 15, 2025 - 2:00pm for 1 hour (actually 50 minutes)
Location
Skiles 005
Speaker
Prof. Ricardo Baptista – University of Toronto
Organizer
Molei Tao

Please Note: Speaker will be in person

Multimodal contrastive learning is a methodology for linking different data modalities, such as images and text. It is typically framed as the identification of a set of encoders—one for each modality—that align representations within a common latent space. In this presentation, we interpret contrastive learning as the optimization of encoders that define conditional probability distributions, for each modality conditioned on the other, in a way consistent with the available data. This probabilistic perspective suggests two natural generalizations of contrastive learning: (i) the introduction of novel probabilistic loss functions, and (ii) the use of alternative metrics for measuring alignment in the common latent space. We investigate these generalizations of the classical approach in the multivariate Gaussian setting by viewing latent space identification as a low-rank matrix approximation problem. The proposed framework is further studied through numerical experiments on multivariate Gaussians, the labeled MNIST dataset, and a data assimilation application in oceanography.
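
As a rough illustration of the probabilistic reading described in the abstract, the sketch below shows how a pair of encoders can induce conditional distributions over a batch, using a standard symmetric InfoNCE-style loss. It is not taken from the talk. The function names, the use of cosine similarity, and the temperature parameter `tau` are all assumptions made for this example, and the "encoder outputs" are hand-written toy vectors rather than learned representations.

```python
import math

def log_softmax(row):
    # Numerically stable log-softmax of a list of scores.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def normalize(vec):
    # Scale a (nonzero) vector to unit Euclidean norm.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def similarity_matrix(us, vs, tau):
    # Entry (i, j) is the cosine similarity of u_i and v_j divided by tau.
    # Cosine similarity is an assumed choice of alignment measure.
    us = [normalize(u) for u in us]
    vs = [normalize(v) for v in vs]
    return [[sum(a * b for a, b in zip(u, v)) / tau for v in vs] for u in us]

def conditional_probs(us, vs, tau=0.1):
    # Row i is a distribution over the batch items v_j given u_i,
    # obtained as a softmax over the scaled similarities.
    sims = similarity_matrix(us, vs, tau)
    return [[math.exp(lp) for lp in log_softmax(row)] for row in sims]

def symmetric_contrastive_loss(us, vs, tau=0.1):
    # Average negative log-probability of the true pair (i, i),
    # computed in both directions: v given u, and u given v.
    sims = similarity_matrix(us, vs, tau)
    n = len(sims)
    sims_t = [[sims[i][j] for i in range(n)] for j in range(n)]
    loss_uv = -sum(log_softmax(sims[i])[i] for i in range(n)) / n
    loss_vu = -sum(log_softmax(sims_t[j])[j] for j in range(n)) / n
    return 0.5 * (loss_uv + loss_vu)

# Toy embeddings standing in for the outputs of two hypothetical encoders.
aligned_u = [[1.0, 0.0], [0.0, 1.0]]
aligned_v = [[1.0, 0.0], [0.0, 1.0]]
swapped_v = [[0.0, 1.0], [1.0, 0.0]]  # pairs deliberately mismatched

probs = conditional_probs(aligned_u, aligned_v)
loss_aligned = symmetric_contrastive_loss(aligned_u, aligned_v)
loss_swapped = symmetric_contrastive_loss(aligned_u, swapped_v)
```

In this toy setting the correctly paired embeddings give a much smaller loss than the mismatched ones, which is the sense in which minimizing the loss makes the induced conditional distributions consistent with the paired data. Replacing the cross-entropy terms or the similarity measure would correspond, loosely, to the two kinds of generalization the abstract mentions.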