Seminars and Colloquia by Series

Neural Oracle Search on N-Best Hypotheses

Series
Applied and Computational Mathematics Seminar
Time
Monday, September 12, 2022 - 14:00 for 1 hour (actually 50 minutes)
Location
Skiles 005 and https://gatech.zoom.us/j/98355006347
Speaker
Tongzhou Chen, Google

In this talk, we propose a Neural Oracle Search (NOS) model for Automatic Speech Recognition (ASR) that selects the most likely hypothesis using a sequence of acoustic representations and multiple hypotheses as input. The model provides a sequence-level score for each audio-hypothesis pair, obtained by integrating information from multiple sources: the input acoustic representations, the N-best hypotheses, additional 1st-pass statistics, and unpaired textual information through an external language model. These scores map the search problem of identifying the most likely hypothesis to a sequence classification problem. The proposed model is defined broadly enough to serve either as an alternative to beam search in the 1st pass or as a 2nd-pass rescoring step. It achieves up to 12% relative reduction in Word Error Rate (WER) across several languages over state-of-the-art baselines, with relatively few additional parameters. In addition, we investigate the use of the NOS model on top of a 1st-pass multilingual model and show that, like the 1st-pass model, the NOS model can be made multilingual.
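As a rough illustration of the selection step only, the sketch below scores each audio-hypothesis pair and picks the argmax; `score_pair` is a hypothetical stand-in for the learned sequence-level scorer, not the model from the talk.

```python
# A minimal, hypothetical sketch of N-best selection as classification.
import numpy as np

def score_pair(acoustic_feats: np.ndarray, hypothesis: str) -> float:
    """Stand-in for the learned audio-hypothesis scorer. Assumption: a real
    NOS model would fuse acoustics, 1st-pass statistics, and an external
    language model here; this toy proxy just matches lengths."""
    return -abs(len(hypothesis.split()) - acoustic_feats.shape[0] / 10.0)

def select_hypothesis(acoustic_feats, nbest):
    """Map hypothesis search to classification: argmax over pair scores."""
    scores = [score_pair(acoustic_feats, h) for h in nbest]
    return nbest[int(np.argmax(scores))]

feats = np.random.randn(50, 80)   # 50 frames of 80-dim acoustic features
nbest = ["the cat sat", "the cat sat down", "a cat sat"]
print(select_hypothesis(feats, nbest))
```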

Convergence of denoising diffusion models

Series
Applied and Computational Mathematics Seminar
Time
Monday, August 29, 2022 - 14:00 for 1 hour (actually 50 minutes)
Location
Skiles 005
Speaker
Valentin De Bortoli, CNRS and ENS Ulm
Generative modeling is the task of drawing new samples from an underlying distribution known only via an empirical measure. A myriad of models tackle this problem, with applications in image and speech processing, medical imaging, forecasting, and protein modeling, to cite a few. Among these methods, score-based generative models (or diffusion models) are a powerful new class of generative models that exhibit remarkable empirical performance. They consist of a "noising" stage, whereby a diffusion is used to gradually add Gaussian noise to data, and a generative model, which entails a "denoising" process defined by approximating the time-reversal of the diffusion.
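As a concrete instance of this noising/denoising pair, one common choice in the score-based literature (an assumption here, not necessarily the exact setting of the talk) is the Ornstein-Uhlenbeck forward process and its time-reversal:

```latex
% Forward (noising) diffusion and its time-reversal (denoising).
\begin{align*}
  \mathrm{d}X_t &= -X_t\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}B_t,
    \qquad X_0 \sim p_{\mathrm{data}}, \\
  \mathrm{d}Y_t &= \bigl(Y_t + 2\,\nabla \log p_{T-t}(Y_t)\bigr)\,\mathrm{d}t
    + \sqrt{2}\,\mathrm{d}B_t, \qquad Y_0 \sim p_T,
\end{align*}
```

where $p_t$ denotes the law of $X_t$; in practice the score $\nabla \log p_t$ is replaced by a learned approximation.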

In this talk I will present some of their theoretical guarantees, with an emphasis on their behavior under the so-called manifold hypothesis. These guarantees are non-vacuous and provide insight into the empirical behavior of these models. I will show how these results imply generalization bounds on denoising diffusion models. This presentation is based on https://arxiv.org/abs/2208.05314.

Recent advances on structure-preserving algorithms

Series
Applied and Computational Mathematics Seminar
Time
Monday, April 25, 2022 - 14:00 for 1 hour (actually 50 minutes)
Location
https://gatech.zoom.us/j/96551543941
Speaker
Philippe G. LeFloch, Sorbonne Univ. and CNRS
Structure-preserving methodologies have led to interesting advances in the design of computational algorithms: one observes that an (obvious or hidden) structure is enjoyed by the problem under consideration, and one then designs numerical approximations enjoying the same structure at the discrete level. For problems involving a large number of dimensions, for instance in mathematical finance and machine learning, I have introduced the 'transport-based mesh-free method', which uses a reproducing kernel and a transport mapping in a way that is reminiscent of Lagrangian methods developed in computational fluid dynamics. This method is now implemented in a Python library (CodPy) and used in industrial applications.
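To illustrate only the reproducing-kernel ingredient (the transport mapping and the actual CodPy implementation are not reproduced here), a minimal Gaussian-kernel interpolation might look as follows:

```python
# A minimal mesh-free kernel interpolation sketch (illustrative only).
import numpy as np

def gauss_kernel(X, Y, h=0.5):
    """Gaussian reproducing kernel k(x, y) = exp(-|x - y|^2 / (2 h^2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * h * h))

# Interpolate f(x) = sin(2*pi*x) from scattered sample sites.
X = np.random.rand(20, 1)                       # scattered sample sites
f = np.sin(2 * np.pi * X[:, 0])                 # sampled values
alpha = np.linalg.solve(gauss_kernel(X, X) + 1e-10 * np.eye(20), f)

Xq = np.linspace(0, 1, 5)[:, None]              # query points
print(gauss_kernel(Xq, X) @ alpha)              # kernel interpolant at Xq
```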
 
In compressible fluid dynamics, astrophysics, and cosmology, one needs to compute with propagating singularities such as shock waves, moving interfaces, or gravitational singularities. I will give an overview of recent progress on structure-preserving algorithms in the presence of small-scale-dependent waves that drive the global flow dynamics. I recently introduced asymptotic-preserving and dissipation-preserving methods adapted to such problems. This lecture is based on joint collaborations with F. Beyer (Dunedin), J.-M. Mercier (Paris), S. Miryusupov (Paris), and Y. Cao (Shenzhen). Blog: philippelefloch.org

Sampling Approximately Low-Rank Ising Models: MCMC meets Variational Methods

Series
Applied and Computational Mathematics Seminar
Time
Monday, April 18, 2022 - 14:00 for 1 hour (actually 50 minutes)
Location
Hybrid: Skiles 005 and https://gatech.zoom.us/j/96551543941
Speaker
Holden Lee, Duke University

MCMC and variational inference are two competing paradigms for the problem of sampling from a given probability distribution. In this talk, I'll show how they can work together to give the first polynomial-time sampling algorithm for approximately low-rank Ising models. Sampling was previously known to be tractable when all eigenvalues of the interaction matrix fit in an interval of length 1; however, a single outlier can cause Glauber dynamics to mix torpidly. Our result covers the case when all but O(1) eigenvalues lie in an interval of length 1. To deal with positive eigenvalues, we use a temperature-based heuristic for MCMC called simulated tempering, while to deal with negative eigenvalues, we define a nonconvex variational problem over Ising models, solved using SGD. Our result has applications to sampling from Hopfield networks with a fixed number of patterns, Bayesian clustering models with low-dimensional contexts, and antiferromagnetic/ferromagnetic Ising models on expander graphs.
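For context, the sketch below implements the plain Glauber dynamics baseline on a toy Ising model; the talk's contribution is precisely what to do when this chain mixes slowly, so the couplings and parameters here are illustrative only.

```python
# Glauber dynamics for an Ising model with density prop. to exp(x^T J x / 2).
import numpy as np

rng = np.random.default_rng(0)
n = 50
J = rng.standard_normal((n, n)) / n
J = (J + J.T) / 2                            # symmetric toy couplings
x = rng.choice([-1, 1], size=n)              # random spin initialization

for _ in range(10_000):
    i = rng.integers(n)
    field = J[i] @ x - J[i, i] * x[i]        # local field, excluding self-coupling
    p_plus = 1 / (1 + np.exp(-2 * field))    # P(x_i = +1 | other spins)
    x[i] = 1 if rng.random() < p_plus else -1

print(x[:10])
```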

Learning Operators with Coupled Attention

Series
Applied and Computational Mathematics Seminar
Time
Monday, April 11, 2022 - 14:00 for 1 hour (actually 50 minutes)
Location
https://gatech.zoom.us/j/96551543941
Speaker
Paris Perdikaris, University of Pennsylvania

Supervised operator learning is an emerging machine learning paradigm with applications to modeling the evolution of spatio-temporal dynamical systems and approximating general black-box relationships between functional data. We propose a novel operator learning method, LOCA (Learning Operators with Coupled Attention), motivated by the recent success of the attention mechanism. In our architecture, the input functions are mapped to a finite set of features, which are then averaged with attention weights that depend on the output query locations. By coupling these attention weights with an integral transform, LOCA is able to explicitly learn correlations in the target output functions, enabling us to approximate nonlinear operators even when the number of output function measurements in the training set is very small. Our formulation is accompanied by rigorous approximation-theoretic guarantees on the universal expressiveness of the proposed model. Empirically, we evaluate the performance of LOCA on several operator learning scenarios involving systems governed by ordinary and partial differential equations, as well as a black-box climate prediction problem. Through these scenarios we demonstrate state-of-the-art accuracy, robustness with respect to noisy input data, and a consistently small spread of errors over testing data sets, even for out-of-distribution prediction tasks.
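A minimal sketch of the coupled-attention averaging, with illustrative shapes and randomly initialized weights standing in for trained ones (not the authors' implementation):

```python
# Query-dependent attention-weighted averaging of input-function features.
import numpy as np

def softmax(z):
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

n_feat, d_feat, d_query = 16, 8, 2
V = np.random.randn(n_feat, d_feat)   # features extracted from the input function
W = np.random.randn(d_query, n_feat)  # query-to-score map (learned in practice)

def loca_output(y):
    """Output at query location y: attention-weighted average of features."""
    weights = softmax(y @ W)          # attention weights depend on the query
    return weights @ V                # averaged feature -> output representation

print(loca_output(np.array([0.3, -0.7])).shape)   # (8,)
```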
 

The Approximation Properties of Convex Hulls, Greedy Algorithms, and Applications to Neural Networks

Series
Applied and Computational Mathematics Seminar
Time
Monday, April 4, 2022 - 14:00 for 1 hour (actually 50 minutes)
Location
Hybrid: Skiles 005 and https://gatech.zoom.us/j/96551543941
Speaker
Jonathan Siegel, Penn State Mathematics Department

Given a collection of functions in a Banach space, typically called a dictionary in machine learning, we study the approximation properties of its convex hull. Specifically, we develop techniques for bounding the metric entropy and n-widths, which are fundamental quantities in approximation theory that control the limits of linear and non-linear approximation. Our results generalize existing methods by taking the smoothness of the dictionary into account, and in particular give sharp estimates for shallow neural networks. Consequences of these results include the optimal approximation rates attainable by shallow neural networks, the fact that shallow neural networks dramatically outperform linear methods of approximation, and indeed that shallow neural networks outperform all continuous methods of approximation on the associated convex hull. Next, we discuss greedy algorithms for constructing approximations by non-linear dictionary expansions. Specifically, we give sharp rates for the orthogonal greedy algorithm for dictionaries with small metric entropy, and for the pure greedy algorithm. Finally, we give numerical examples showing that greedy algorithms can be used to solve PDEs with shallow neural networks.
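As a concrete instance of the greedy schemes discussed, a minimal orthogonal greedy algorithm over a finite dictionary might look as follows (toy random dictionary; a shallow-network dictionary would replace the columns of `D`):

```python
# Orthogonal greedy algorithm (OGA): pick the best-correlated atom, then
# re-project the target onto the span of all atoms selected so far.
import numpy as np

def oga(f, D, n_steps):
    """Greedily approximate vector f from the columns of dictionary D."""
    selected, residual = [], f.copy()
    for _ in range(n_steps):
        k = int(np.argmax(np.abs(D.T @ residual)))    # best-correlated atom
        if k not in selected:
            selected.append(k)
        B = D[:, selected]
        coef, *_ = np.linalg.lstsq(B, f, rcond=None)  # orthogonal projection
        residual = f - B @ coef
    return selected, residual

rng = np.random.default_rng(1)
D = rng.standard_normal((100, 500))
D /= np.linalg.norm(D, axis=0)        # normalize dictionary atoms
f = rng.standard_normal(100)
idx, r = oga(f, D, 10)
print(len(idx), np.linalg.norm(r))    # selected atoms and residual norm
```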

How Differential Equation Insights Benefit Deep Learning

Series
Applied and Computational Mathematics Seminar
Time
Monday, March 28, 2022 - 14:00 for 1 hour (actually 50 minutes)
Location
https://gatech.zoom.us/j/96551543941 (note: Zoom, not Bluejeans)
Speaker
Prof. Bao Wang, University of Utah

We will present a new class of continuous-depth deep neural networks, motivated by the ODE limit of the classical momentum method, named heavy-ball neural ODEs (HBNODEs). HBNODEs enjoy two properties that imply practical advantages over NODEs: (i) the adjoint state of an HBNODE also satisfies an HBNODE, accelerating both forward and backward ODE solvers, thus significantly accelerating learning and improving the utility of the trained models; (ii) the spectrum of HBNODEs is well structured, enabling effective learning of long-term dependencies from complex sequential data.
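Schematically, the classical momentum recursion and its ODE limit, which motivate the HBNODE parametrization, can be written as follows (notation is illustrative):

```latex
% Momentum method and its heavy-ball ODE limit.
\[
  x_{k+1} = x_k - s\,\nabla f(x_k) + \beta\,(x_k - x_{k-1})
  \quad\longrightarrow\quad
  \ddot{x}(t) + \gamma\,\dot{x}(t) = -\nabla f\bigl(x(t)\bigr),
\]
% HBNODE replaces the gradient field with a learned network:
\[
  \ddot{h}(t) + \gamma\,\dot{h}(t) = f_\theta\bigl(h(t), t\bigr), \qquad \gamma > 0,
\]
```

where $f_\theta$ is a learned network standing in for the gradient field $-\nabla f$.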

Second, we will extend HBNODEs to graph learning by leveraging diffusion on graphs, resulting in new algorithms for deep graph learning. The new algorithms are more accurate than existing deep graph learning algorithms, more scalable to deep architectures, and suitable for learning in low-labeling-rate regimes. Moreover, we will present a fast-multipole-method-based efficient attention mechanism for modeling interactions between graph nodes.

Third, if time permits, we will discuss proximal algorithms for accelerating the training of continuous-depth neural networks.

Low-dimensional Modeling for Deep Learning

Series
Applied and Computational Mathematics Seminar
Time
Monday, March 14, 2022 - 14:00 for 1 hour (actually 50 minutes)
Location
https://gatech.zoom.us/j/96551543941
Speaker
Zhihui Zhu, University of Denver

In the past decade, the revival of deep neural networks has led to dramatic success in numerous applications ranging from computer vision to natural language processing to scientific discovery and beyond. Nevertheless, the practice of deep networks has been shrouded in mystery, as our theoretical understanding of the success of deep learning remains elusive.

In this talk, we will exploit low-dimensional modeling to help understand and improve deep learning performance. We will first provide a geometric analysis for understanding neural collapse, an intriguing empirical phenomenon that persists across different neural network architectures and a variety of standard datasets. We will then utilize our understanding of neural collapse to improve training efficiency. Next, we will exploit principled methods for dealing with sparsity and sparse corruptions to address the challenge of overfitting in modern deep networks in the presence of training data corruptions. We will introduce a principled approach for robustly training deep networks with noisy labels and for robustly recovering natural images via the deep image prior.
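For reference, neural collapse refers to the last-layer (centered) class means converging to a simplex equiangular tight frame; in standard notation (with $K$ classes and $P$ a partial orthogonal matrix), the limiting frame is

```latex
% Simplex equiangular tight frame (ETF) of neural collapse.
\[
  M \;=\; \sqrt{\tfrac{K}{K-1}}\; P \Bigl( I_K - \tfrac{1}{K}\,\mathbf{1}_K \mathbf{1}_K^{\top} \Bigr),
  \qquad P^{\top} P = I_K,
\]
```

with within-class features collapsing to the corresponding class means.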

Symmetry-preserving machine learning for computer vision, scientific computing, and distribution learning

Series
Applied and Computational Mathematics Seminar
Time
Monday, March 7, 2022 - 14:00 for 1 hour (actually 50 minutes)
Location
https://gatech.zoom.us/j/96551543941 (note: Zoom, not Bluejeans)
Speaker
Prof. Wei Zhu, UMass Amherst

Please Note: The talk will be hosted on Zoom, not BlueJeans, from now on.

Symmetry is ubiquitous in machine learning and scientific computing. Robust incorporation of a symmetry prior into the learning process has been shown to achieve significant model improvement for various learning tasks, especially in the small-data regime.

In the first part of the talk, I will explain a principled framework of deformation-robust symmetry-preserving machine learning. The key idea is the spectral regularization of the (group) convolutional filters, which ensures that symmetry is robustly preserved in the model even if the symmetry transformation is “contaminated” by nuisance data deformation.
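As a hedged sketch of one possible form of such a regularizer (the talk's precise construction may differ), one can penalize the largest singular value of a flattened convolutional filter, estimated by power iteration:

```python
# Spectral penalty on a convolutional filter via power iteration.
import numpy as np

def spectral_norm(W, n_iter=50):
    """Largest singular value of matrix W, estimated by power iteration."""
    v = np.random.randn(W.shape[1])
    for _ in range(n_iter):
        u = W @ v; u /= np.linalg.norm(u)
        v = W.T @ u; v /= np.linalg.norm(v)
    return float(u @ W @ v)

filt = np.random.randn(64, 3 * 3 * 32)   # out_channels x (k*k*in_channels)
lam = 1e-2                               # regularization weight (assumption)
penalty = lam * spectral_norm(filt) ** 2 # term added to the training loss
print(penalty)
```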
 
In the second part of the talk, I will demonstrate how to incorporate additional structural information (such as group symmetry) into generative adversarial networks (GANs) for data-efficient distribution learning. This is accomplished by developing new variational representations for divergences between probability measures with embedded structures. We study, both theoretically and empirically, the effect of structural priors in the two GAN players. The resulting structure-preserving GAN is able to achieve significantly improved sample fidelity and diversity—almost an order of magnitude measured in Fréchet Inception Distance—especially in the limited data regime. 
 

Neural Networks with Inputs Based on Domain of Dependence and A Converging Sequence for Solving Conservation Laws

Series
Applied and Computational Mathematics Seminar
Time
Monday, February 28, 2022 - 14:00 for 1 hour (actually 50 minutes)
Location
https://bluejeans.com/457724603/4379
Speaker
Haoxiang Huang, Georgia Tech

Recent research on solving partial differential equations with deep neural networks (DNNs) has demonstrated that spatiotemporal function approximators defined by auto-differentiation are effective for approximating nonlinear problems. However, it remains a challenge to resolve discontinuities in nonlinear conservation laws using forward methods with DNNs without beginning with part of the solution. In this study, we incorporate first-order numerical schemes into DNNs to set up the loss-function approximator, instead of relying on auto-differentiation from traditional deep learning frameworks such as TensorFlow, thereby improving the effectiveness of capturing discontinuities in Riemann problems. We introduce a novel neural network method. A local low-cost solution is first used as the input of a neural network to predict the high-fidelity solution at a space-time location. The challenge lies in the fact that there is no way to distinguish a smeared discontinuity from a steep smooth solution in the input, resulting in “multiple predictions” from the neural network. To overcome this difficulty, two solutions of the conservation laws from a converging sequence, computed with low-cost numerical schemes on a local domain of dependence of the space-time location, serve as the input. Despite smeared input solutions, the output provides sharp approximations to solutions containing shocks and contact surfaces, and, once trained, the method is efficient to use. It works not only for discontinuities but also for smooth regions of the solution, implying broader applications to other differential equations.
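A schematic of the input/output pairing described above, with all names and resolutions illustrative: two coarse upwind solutions of a linear advection Riemann problem, drawn from a converging sequence and restricted to a local domain of dependence, form the network input at a query point.

```python
# Building network inputs from two coarse solutions in a converging sequence.
import numpy as np

def coarse_solution(u0, nx, nt, cfl=0.9):
    """First-order upwind solve of u_t + u_x = 0 on a periodic grid."""
    u = u0(np.linspace(0, 1, nx, endpoint=False))
    for _ in range(nt):
        u = u - cfl * (u - np.roll(u, 1))   # upwind difference, wave speed 1
    return u

u0 = lambda x: np.where(x < 0.5, 1.0, 0.0) # Riemann-type initial data
u_coarse = coarse_solution(u0, 50, 25)     # two resolutions of the same
u_finer  = coarse_solution(u0, 100, 50)    # converging sequence

i = 25                                     # query cell on the coarse grid
stencil = slice(i - 2, i + 3)              # local domain of dependence
features = np.concatenate([u_coarse[stencil], u_finer[2*i - 4 : 2*i + 5]])
print(features.shape)                      # network input at this (x, t)
```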
