Seminars and Colloquia by Series

Exploring Conditional Computation in Transformer models

Series
Applied and Computational Mathematics Seminar
Time
Monday, September 30, 2024 - 14:00 for 1 hour (actually 50 minutes)
Location
Skiles 005 and ONLINE
Speaker
Xin Wang, Google Research

The Transformer architecture (Vaswani et al. 2017) is a popular deep learning architecture that today forms the foundation for most tasks in natural language processing and the backbone of all current state-of-the-art language models. Central to its success is the attention mechanism, which allows the model to weigh the importance of different input tokens. However, Transformers can become computationally expensive, especially for large-scale tasks. To address this, researchers have explored techniques for conditional computation, which selectively activate parts of the model based on the input. In this talk, we present two case studies of conditional computation in Transformer models. In the first case, we examine the routing mechanism in Mixture-of-Experts (MoE) Transformer models, and show theoretical and empirical evidence that the router’s ability to route intelligently confers a significant advantage to MoE models. In the second case, we introduce Alternating Updates (AltUp), a method to take advantage of increased residual stream width in Transformer models without increasing the computation cost.
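For readers unfamiliar with conditional computation, the following is a minimal sketch of top-k expert routing in an MoE layer. It is an illustration under stated assumptions, not the speaker's implementation: the linear softmax router, the names `moe_layer`, `router_W`, and `experts`, and the toy linear experts are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def moe_layer(x, router_W, experts, top_k=1):
    """Route one token to its top_k experts and mix their outputs.

    Assumption: the router is a single linear map followed by a softmax.
    Only the chosen experts are evaluated, which is the source of the
    compute savings in conditional computation.
    """
    probs = softmax(matvec(router_W, x))
    ranked = sorted(range(len(experts)), key=lambda e: probs[e], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(probs[e] for e in chosen)
    out = [0.0] * len(x)
    for e in chosen:
        gate = probs[e] / norm  # renormalised gate weight
        y = experts[e](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


# Toy setup: three random linear experts acting on 4-dimensional tokens.
random.seed(0)
dim, n_experts = 4, 3
router_W = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
expert_Ws = [[[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
             for _ in range(n_experts)]
experts = [lambda x, W=W: matvec(W, x) for W in expert_Ws]

token = [1.0, -0.5, 0.25, 2.0]
output, chosen = moe_layer(token, router_W, experts, top_k=1)
```

With `top_k=1` the gate weight is 1, so the layer output equals the output of the single selected expert and the other experts are never evaluated.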

Speaker's brief introduction: Xin Wang is a research engineer on the Algorithms team at Google Research. Xin finished his PhD in Mathematics at Georgia Institute of Technology before coming to Google. Xin's research interests include efficient computing, memory mechanisms for machine learning, and optimization.

The talk will be presented online at https://gatech.zoom.us/j/93087689904

Maximal volume matrix cross approximation for image compression and least squares solution

Series
Applied and Computational Mathematics Seminar
Time
Monday, September 16, 2024 - 14:00 for 1 hour (actually 50 minutes)
Location
Skiles 005
Speaker
Zhaiming Shen, Georgia Tech

We study the classic matrix cross approximation based on maximal volume submatrices. Our main results consist of an improvement of the classic estimate for matrix cross approximation and a greedy approach for finding maximal volume submatrices. More precisely, we present a new proof of the classic estimate with an improved constant. We also present a family of greedy maximal volume algorithms that improve the computational efficiency of matrix cross approximation. The proposed algorithms are shown to have theoretical guarantees of convergence. Finally, we present two applications: image compression and the least squares approximation of continuous functions. Our numerical results demonstrate the effective performance of our approach.
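As background for the abstract above, here is a minimal sketch of a greedy cross approximation. It uses the classical scheme that pivots on the largest residual entry at each step (cross approximation with full pivoting), which is swapped in for illustration and is not the family of algorithms proposed in the talk. The function name `greedy_cross` and the test matrix are hypothetical.

```python
def greedy_cross(A, rank, tol=1e-12):
    """Greedy cross approximation with full pivoting.

    At each step the largest-magnitude entry of the residual is chosen as
    the pivot, and the rank-one "cross" through that entry is subtracted.
    Returns the selected row indices, column indices, and final residual.
    """
    R = [row[:] for row in A]  # residual, starts as a copy of A
    n_rows, n_cols = len(R), len(R[0])
    rows, cols = [], []
    for _ in range(rank):
        i, j = max(((a, b) for a in range(n_rows) for b in range(n_cols)),
                   key=lambda ab: abs(R[ab[0]][ab[1]]))
        pivot = R[i][j]
        if abs(pivot) <= tol:
            break  # residual is numerically zero
        col = [R[k][j] for k in range(n_rows)]
        row = R[i][:]
        for k in range(n_rows):
            for l in range(n_cols):
                R[k][l] -= col[k] * row[l] / pivot
        rows.append(i)
        cols.append(j)
    return rows, cols, R


# A 4 x 5 matrix of exact rank 2, built as a sum of two outer products.
u, v = [1, 2, 3, 4], [1, 0, 2, 1, 3]
w, z = [1, -1, 0, 2], [0, 1, 1, -1, 2]
A = [[float(u[i] * v[j] + w[i] * z[j]) for j in range(5)] for i in range(4)]

rows, cols, residual = greedy_cross(A, rank=2)
max_residual = max(abs(x) for r in residual for x in r)
```

Because the test matrix has exact rank 2, two crosses reproduce it up to rounding error, so `max_residual` is close to zero.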

Poisson Meets Poisson: Implicit boundary integral method for linearized Poisson Boltzmann equation

Series
Applied and Computational Mathematics Seminar
Time
Monday, August 26, 2024 - 14:00 for 1 hour (actually 50 minutes)
Location
Skiles 005
Speaker
Yimin Zhong, Auburn University

In this talk, I will give an introduction to the implicit boundary integral method based on the co-area formula, which provides a simple quadrature rule for boundary integrals on general surfaces. Then, I will focus on the application to solving the linearized Poisson Boltzmann equation, which is used to model the electric potential of protein molecules in a solvent. Near the singularity, I will briefly discuss the choices of regularization and correction and illustrate the effect of each. Finally, I will present the numerical analysis behind the error estimate.

Degeneracy of eigenvalues and singular values of parameter dependent matrices

Series
Applied and Computational Mathematics Seminar
Time
Monday, May 6, 2024 - 14:00 for 1 hour (actually 50 minutes)
Location
Skiles 005 and https://gatech.zoom.us/j/93530218689?pwd=SFkzMXZyZXhZOTdRazhyL1BoVXprdz09
Speaker
Alessandro Pugliese, Università degli Studi di Bari Aldo Moro

Speaker will present in person.

Hermitian matrices have real eigenvalues and an orthonormal set of eigenvectors. Do smooth Hermitian matrix valued functions have smooth eigenvalues and eigenvectors? Starting from this question, we will first review known results on the smooth eigenvalue and singular value decompositions of matrices that depend on one or several parameters, and then focus on our contribution: devising topological tools to detect and approximate parameter values where eigenvalues or singular values of a matrix valued function are degenerate (i.e. repeated or zero).
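A standard textbook example, not taken from the talk itself, shows why the answer to the question above can be negative once the matrix depends on two parameters:

```latex
A(x, y) = \begin{pmatrix} x & y \\ y & -x \end{pmatrix},
\qquad
\lambda_{\pm}(x, y) = \pm\sqrt{x^{2} + y^{2}}.
```

The entries of A are linear in (x, y), yet the two eigenvalues coincide only at the origin, where they are continuous but not differentiable. The origin is therefore an isolated parameter value at which the eigenvalues are degenerate.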

The talk will be based on joint work with Luca Dieci (Georgia Tech) and Alessandra Papini (Univ. of Florence).

Generative modeling through time reversal and reflection of diffusion processes

Series
Applied and Computational Mathematics Seminar
Time
Monday, April 29, 2024 - 14:00 for 1 hour (actually 50 minutes)
Location
Skiles 005 and https://gatech.zoom.us/j/98355006347
Speaker
Nicole Yang, Emory University

Please Note: Speaker will present in person.

In this talk, we discuss generative modeling algorithms motivated by the time reversal and reflection properties of diffusion processes. Score-based diffusion models (SBDM) have recently emerged as state-of-the-art approaches for image generation. We develop SBDMs in the infinite-dimensional setting, that is, we model the training data as functions supported on a rectangular domain. Besides the quest for generating images at ever higher resolution, our primary motivation is to create a well-posed infinite-dimensional learning problem so that we can discretize it consistently at multiple resolution levels. We demonstrate how to overcome two shortcomings of current SBDM approaches in the infinite-dimensional setting by ensuring the well-posedness of forward and reverse processes, and derive the convergence of the approximation of multilevel training. We illustrate that approximating the score function with an operator network is beneficial for multilevel training.

In the second part of this talk, we propose the Reflected Schrödinger Bridge algorithm: an entropy-regularized optimal transport approach tailored for generating data within diverse bounded domains. We derive reflected forward-backward stochastic differential equations with Neumann and Robin boundary conditions, extend divergence-based likelihood training to bounded domains, and demonstrate its scalability in constrained generative modeling.

Monotone generative modeling via a geometry-preserving mapping

Series
Applied and Computational Mathematics Seminar
Time
Monday, April 15, 2024 - 14:00 for 1 hour (actually 50 minutes)
Location
Skiles 005 and https://gatech.zoom.us/j/98355006347
Speaker
Wonjun Lee, University of Minnesota, Twin Cities

Generative Adversarial Networks (GANs) are powerful tools for creating new content, but they face challenges such as sensitivity to starting conditions and mode collapse. To address these issues, we propose a deep generative model that utilizes the Gromov-Monge embedding (GME). The GME helps identify the low-dimensional structure of the underlying measure of the data and then maps it, while preserving its geometry, into a measure in a low-dimensional latent space, which is then optimally transported to the reference measure. We guarantee the preservation of the underlying geometry by the GME and the c-cyclical monotonicity of the generative map, where c is an intrinsic embedding cost employed by the GME. The latter property is a first step in guaranteeing better robustness to parameter initialization and to mode collapse. Numerical experiments demonstrate the effectiveness of our approach in generating high-quality images, avoiding mode collapse, and exhibiting robustness to different starting conditions.

Diffusion Models: Theory and Applications (in PDEs)

Series
Applied and Computational Mathematics Seminar
Time
Monday, April 8, 2024 - 14:00 for 1 hour (actually 50 minutes)
Location
Skiles 005 and https://gatech.zoom.us/j/98355006347
Speaker
Yulong Lu, University of Minnesota, Twin Cities

Diffusion models, particularly score-based generative models (SGMs), have emerged as powerful tools in diverse machine learning applications, spanning from computer vision to modern language processing. In the first part of this talk, we delve into the generalization theory of SGMs, exploring their capacity for learning high-dimensional distributions. Our analysis shows that SGMs achieve a dimension-free generation error bound when applied to a class of sub-Gaussian distributions characterized by certain low-complexity structures. In the second part of the talk, we consider the application of diffusion models to solving partial differential equations (PDEs). Specifically, we present the development of a physics-guided diffusion model designed for reconstructing high-fidelity solutions from their low-fidelity counterparts. This application showcases the adaptability of diffusion models and their potential in scientific computation.

Accelerating Molecular Discovery with Machine Learning: A Geometric, Sampling and Optimization Perspective

Series
Applied and Computational Mathematics Seminar
Time
Monday, April 1, 2024 - 14:00 for 1 hour (actually 50 minutes)
Location
Skiles 005 and https://gatech.zoom.us/j/98355006347
Speaker
Yuanqi Du, Cornell University

Please Note: Speaker will present in person.

Bio: Yuanqi Du is a PhD student in the Department of Computer Science at Cornell University, studying AI and its intersection with scientific discovery, advised by Prof. Carla P. Gomes. His research interests include Geometric Deep Learning, Probabilistic Machine Learning, Sampling, Optimization, and AI for Science (with a focus on molecular discovery). Aside from his research, he is passionate about education and community building. He leads the organization of a series of events such as the Learning on Graphs conference and the AI for Science and Probabilistic Machine Learning workshops at ML conferences, as well as an educational initiative (AI for Science101) to bridge the AI and Science communities.

Recent advancements in machine learning have paved the way for groundbreaking opportunities in the realm of molecular discovery. At the forefront of this evolution are improved computational tools with proper inductive biases and efficient optimization. In this talk, I will delve into our efforts around these themes from a geometry, sampling and optimization perspective. I will first introduce how to encode symmetries in the design of neural networks and the balance of expressiveness and computational efficiency. Next, I will discuss how generative models enable a wide range of design and optimization tasks in molecular discovery. In the third part, I will talk about how the advancements in stochastic optimal control, sampling and optimal transport can be applied to find transition states in chemical reactions.

Function approximation with one-bit Bernstein polynomials and one-bit neural networks

Series
Applied and Computational Mathematics Seminar
Time
Monday, March 25, 2024 - 14:00 for 1 hour (actually 50 minutes)
Location
Skiles 005 and https://gatech.zoom.us/j/98355006347
Speaker
Weilin Li, City College of New York

The celebrated universal approximation theorems for neural networks typically state that every sufficiently nice function can be arbitrarily well approximated by a neural network with carefully chosen real parameters. With the emergence of large neural networks and a desire to use them on low power devices, there has been increased interest in neural network quantization (i.e., the act of replacing its real parameters with ones from a much smaller finite set). In this talk, we ask whether it is even possible to quantize neural networks without sacrificing their approximation power, especially in the extreme one-bit {+1,-1} case. We present several naive quantization strategies that yield universal approximation theorems by quantized neural networks, and discuss their advantages and disadvantages. From there, we offer an alternative approach based on Bernstein polynomials and show that {+1,-1} linear combinations of multivariate Bernstein polynomials can efficiently approximate smooth functions. This strategy can be implemented by means of a one-bit neural network and computed from point samples/queries. Joint work with Sinan Gunturk.
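To make the idea of a {+1,-1} linear combination of Bernstein polynomials concrete, here is a minimal univariate sketch. It assumes a first-order sigma-delta quantizer applied to point samples of the target function, which is one simple way to obtain sign coefficients and is not necessarily the construction presented in the talk. The names `bernstein_basis`, `one_bit_coefficients`, and `evaluate`, and the test function `f`, are hypothetical.

```python
import math


def bernstein_basis(n, x):
    """Values C(n,k) x^k (1-x)^(n-k) for k = 0..n, computed in log space.

    Requires 0 < x < 1 so that both logarithms are finite.
    """
    log_x, log_1mx = math.log(x), math.log(1.0 - x)
    return [
        math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * log_x + (n - k) * log_1mx)
        for k in range(n + 1)
    ]


def one_bit_coefficients(samples):
    """First-order sigma-delta quantization of samples in [-1, 1] to signs.

    The running error `state` stays in [-1, 1], so the signs track the
    samples on average even though each one is only +1 or -1.
    """
    signs, state = [], 0.0
    for y in samples:
        q = 1 if state + y >= 0 else -1
        state += y - q
        signs.append(q)
    return signs


def evaluate(coeffs, n, x):
    """Evaluate the Bernstein expansion with the given coefficients at x."""
    return sum(c * b for c, b in zip(coeffs, bernstein_basis(n, x)))


def f(t):
    """Smooth target with values in [-1/2, 1/2] (an arbitrary test choice)."""
    return 0.5 * math.sin(2 * math.pi * t)


n = 1000
samples = [f(k / n) for k in range(n + 1)]
signs = one_bit_coefficients(samples)
errors = {x: abs(evaluate(signs, n, x) - f(x)) for x in (0.3, 0.5, 0.7)}
```

The quantization error telescopes against differences of neighbouring basis functions, which are small away from the endpoints, so the one-bit expansion stays close to the target at interior points.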

Diffusion Models for Arbitrary Discrete Markov Processes

Series
Applied and Computational Mathematics Seminar
Time
Monday, March 4, 2024 - 14:00 for 1 hour (actually 50 minutes)
Location
Skiles 005 and https://gatech.zoom.us/j/98355006347
Speaker
Zachary Fox, Oak Ridge National Laboratory

Please Note: Speaker will present in person.

Diffusion models have become ubiquitous for image generation and are increasingly being used for scientific applications. To date, many flavors of diffusion models have been developed by varying not only the stochastic process that noises the data but also the domain on which these processes act. Typically, generative diffusion models rely on a Gaussian diffusion process for training the backward transformations, which can then be used to generate samples from Gaussian noise. However, real-world data, including data from many scientific applications, often live in discrete state spaces. Here we develop a theoretical formulation for arbitrary discrete-state Markov processes in the forward diffusion process using exact analysis. We relate the theory to the existing continuous-state Gaussian diffusion in discrete and continuous time. This approach is validated using a simple stochastic decay process, in which the reverse process generates images from a single all-black image, rather than a noisy prior distribution.
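The forward half of such a decay process can be sketched in a few lines. This is an illustration under an assumption, not the speaker's model: each unit of integer pixel intensity is taken to decay independently at a constant rate (a pure-death process), so a pixel at time t is a binomial thinning of its initial value. The function name `forward_decay` and the `rate` parameter are hypothetical, and the learned reverse process is not shown.

```python
import math
import random


def forward_decay(image, t, rate=1.0, rng=random):
    """Noise an integer-valued image with an assumed pure-death process.

    Each unit of intensity survives to time t with probability
    exp(-rate * t), independently of all others.
    """
    survive = math.exp(-rate * t)
    return [
        [sum(1 for _ in range(pixel) if rng.random() < survive) for pixel in row]
        for row in image
    ]


rng = random.Random(0)
image = [[0, 3, 7], [15, 8, 1]]  # toy image with integer intensities

partly_decayed = forward_decay(image, t=0.5, rng=rng)
fully_decayed = forward_decay(image, t=50.0, rng=rng)
```

For large t every pixel has decayed to zero, so the forward process ends at a single all-black image regardless of the starting image, which matches the deterministic prior described in the abstract.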
