Energy-Based Models
with Applications to Speech and Language Processing
ICASSP2022 Tutorial
14:00-17:30 (UTC+8), 22 May, 2022
Energy-Based Models (EBMs) are an important class of probabilistic models, also known as random fields and undirected graphical models. EBMs are radically different from other popular probabilistic models that are self-normalized (i.e., sum to one), such as hidden Markov models (HMMs), auto-regressive models, Generative Adversarial Nets (GANs) and Variational Auto-encoders (VAEs). In recent years, EBMs have attracted increasing interest not only from core machine learning but also from application domains such as speech, vision, natural language processing (NLP) and so on, with significant theoretical and algorithmic progress. To the best of our knowledge, there have been no tutorials on EBMs with applications to speech and language processing. The sequential nature of speech and language also presents special challenges and requires treatment different from that of fixed-dimensional data (e.g., images).
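For concreteness, an EBM defines a probability distribution through an energy function E_\theta(x), without requiring the model itself to be normalized (the notation below is a common convention, not taken verbatim from the tutorial):

p_\theta(x) = \exp(-E_\theta(x)) / Z(\theta), \quad Z(\theta) = \sum_x \exp(-E_\theta(x))

The normalizing constant Z(\theta) is generally intractable, which is precisely what makes learning and inference in EBMs different from self-normalized models.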
The purpose of this tutorial is to present a systematic introduction to energy-based models, covering both algorithmic progress and applications in speech and language processing, organized into four chapters. First, we will introduce the basics of EBMs, including classic models, recent models parameterized by neural networks, and various learning algorithms from classic methods to the most advanced ones. The next three chapters will present how to apply EBMs in three different scenarios: 1) EBMs for language modeling, 2) EBMs for speech recognition and natural language labeling, and 3) EBMs for semi-supervised natural language labeling. In addition, we will introduce open-source toolkits to help the audience get familiar with the techniques for developing and applying energy-based models.
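As a taste of the learning algorithms covered in the first chapter, the standard maximum-likelihood gradient for an EBM (a textbook identity, not a result specific to this tutorial) contrasts the energy gradient on observed data with its expectation under the model:

\nabla_\theta \log p_\theta(x) = -\nabla_\theta E_\theta(x) + \mathbb{E}_{x' \sim p_\theta}\left[ \nabla_\theta E_\theta(x') \right]

The second term requires sampling from the model (e.g., by MCMC), which is the central computational challenge that the various learning algorithms address.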
Slides: ICASSP2022 Tutorial Energy-Based Models with Applications to Speech and Language Processing.pdf