Energy-Based Models
with Applications to Speech and Language Processing
ICASSP2022 Tutorial
14:00-17:30 (UTC+8), 22 May, 2022
Energy-Based Models (EBMs) are an important class of probabilistic models, also known as random fields and undirected graphical models. EBMs are radically different from other popular probabilistic models that are self-normalized (i.e., sum to one), such as hidden Markov models (HMMs), auto-regressive models, Generative Adversarial Nets (GANs) and Variational Auto-encoders (VAEs). In recent years, EBMs have attracted increasing interest not only from core machine learning but also from application domains such as speech, vision, natural language processing (NLP) and so on, with significant theoretical and algorithmic progress. To the best of our knowledge, there have been no tutorials on EBMs with applications to speech and language processing. The sequential nature of speech and language also presents special challenges and requires treatment different from that of fixed-dimensional data (e.g., images).
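For concreteness, an EBM defines a probability distribution through an energy function E_\theta(x), without requiring the model itself to be normalized (the notation below is a common convention, not taken verbatim from the tutorial):

p_\theta(x) = \exp(-E_\theta(x)) / Z(\theta), \quad Z(\theta) = \sum_x \exp(-E_\theta(x))

The normalizing constant Z(\theta) is generally intractable, which is precisely what makes learning and inference in EBMs different from self-normalized models.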
The purpose of this tutorial is to present a systematic introduction to energy-based models, covering both algorithmic progress and applications in speech and language processing, organized into four chapters. First, we will introduce the basics of EBMs, including classic models, recent models parameterized by neural networks, and various learning algorithms from classic methods to the most advanced ones. The next three chapters will present how to apply EBMs in three different scenarios: 1) EBMs for language modeling, 2) EBMs for speech recognition and natural language labeling, and 3) EBMs for semi-supervised natural language labeling. In addition, we will introduce open-source toolkits to help the audience get familiar with the techniques for developing and applying energy-based models.
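As a taste of the learning algorithms covered in the first chapter, the standard maximum-likelihood gradient for an EBM (a textbook identity, not a result specific to this tutorial) contrasts the energy gradient on observed data with its expectation under the model:

\nabla_\theta \log p_\theta(x) = -\nabla_\theta E_\theta(x) + \mathbb{E}_{x' \sim p_\theta}\left[ \nabla_\theta E_\theta(x') \right]

The second term requires sampling from the model (e.g., by MCMC), which is the central computational challenge that the various learning algorithms address.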
Slides: ICASSP2022 Tutorial Energy-Based Models with Applications to Speech and Language Processing.pdf