2024
May 30, 2024: Invited short course for China Mobile: “Artificial Intelligence Development Trends and Industry Application Practices (人工智能发展趋势与行业应用实践)” (2 hours), Beijing, China.
Aug. 16, 2024: Organize Special Session at NCMMSC 2024: "Multilingual Speech and Language AI Technologies" (SALT-Multilingual 2nd) (多语言智能语音语言技术第二届研讨会), Urumqi (乌鲁木齐), China.
April, 2024:
We are organizing the 2nd FutureDial Challenge - Dialog Systems with Retrieval Augmented Generation (FutureDial-RAG), co-located with SLT 2024.
Challenge Website |
Chinese Blog
March, 2024: From delivering a tutorial titled "Energy-Based Models with Applications to Speech and Language Processing" at ICASSP 2022, to being invited to write a monograph with the same title for the "Foundations and Trends® in Signal Processing" series, and now to its official publication, I am humbled to contribute a stepping stone to this promising field!
URL |
arxiv
2023
Dec. 9, 2023: Organize Special Session at NCMMSC 2023: "Semi-Supervised Speech and Language AI Technologies" (SALT-Semi) (半监督智能语音语言技术), Suzhou (苏州), China.
URL
Nov. 19, 2023: Invited Talk at SpeechHome Conference on Speech Technology 2023: "Some Thoughts and Projections on Speech Foundation Models" (语音大模型的若干思考与猜测), Beijing, China.
slides
Oct. 14, 2023: Invited Talk at NLPCC 2023: "Knowledge-Retrieval Dialog Systems with Semi-Supervision" (半监督知识检索对话系统), Foshan (佛山), China.
URL
Sep. 23, 2023: Organize CCF workshop: "Multilingual Speech and Language AI Technologies" (SALT-Multilingual) (多语言智能语音语言技术), Nanning (南宁), China.
Aug. 24, 2023: Talk at 2023 Interspeech: "Knowledge-Retrieval Task-Oriented Dialog Systems with Semi-Supervision" (半监督式知识检索的任务导向对话系统), Dublin, Ireland.
video |
slides
July 8, 2023: Invited talk at 2023 World Artificial Intelligence Conference (WAIC): "The philosophy of semi-supervision and alignment behind ChatGPT" (ChatGPT背后的半监督和对齐的哲学), Shanghai, China.
video
June 9, 2023: Panel Discussion at 2023 BAAI Conference for the session "Large-scale model new infrastructure and intellectual operation" (大模型新基建与智力运营), Beijing, China.
Apr 26, 2023: Invited talk at 2023 China Mobile Cloud Conference: "Progress and Future Challenges of Large-scale Human-Machine Dialog" (大模型人机对话的进展及未来挑战), Suzhou, China.
video
Mar 23, 2023: A Chinese blog: "A rigorous discussion of ChatGPT's progress, shortcomings, and AGI challenges" (严谨谈谈ChatGPT的进步、不足及AGI挑战).
URL
2022
Nov 20, 2022: Invited Tutorial (2 hours) "Data-efficient Multilingual and Crosslingual Speech Recognition" (数据高效的多语言与跨语言语音识别) at CCF Advanced Disciplines Lectures (ADL), 2022.
slides
May 22, 2022: Invited Tutorial (3.5 hours) "Energy-Based Models with Applications to Speech and Language Processing" at ICASSP 2022.
slides & videos
April 29, 2022: SereTOD Challenge Kickoff Seminar (EMNLP2022半监督和强化对话系统挑战赛发布暨研讨会).
Chinese Blog |
Seminar video
Mar 25, 2022: The Department (Electronic Engineering Department, Tsinghua University) is launching Speech And Language Technologies (THUEE-SALT) Frontiers Lecture Series (语音语言技术前沿讲座).
URL
Feb, 2022: We are organizing the SereTOD workshop "Towards Semi-Supervised and Reinforced Task-Oriented Dialog Systems" and its related challenge, co-located with EMNLP 2022.
Workshop Call for Papers |
Challenge Call for Participation |
Chinese Blog
2021
Dec 18, 2021: Invited Talk "Data-efficient Multilingual and Crosslingual Speech Recognition" (数据高效的多语言与跨语言语音识别) at CCF First Seminar on Southeast-Asia Language Technology and Application (首届东南亚非通用语种处理技术与应用研讨会), Nanning.
URL |
slides
Nov 9, 2021: Invited Tutorial (3 hours) "State-of-the-Art of End-to-End Speech Recognition" at The 6th Asian Conference on Pattern Recognition (ACPR2021), Jeju Island, Korea.
URL |
slides |
video (part 1) |
video (part 2)
Sep 24, 2021: Invited Talk "Semi-Supervised Task-Oriented Dialog Systems and Natural Language Labeling" at UIUC NLP Seminar.
slides
July 9, 2021: Invited Talk "Data-efficient Automatic Speech Recognition" at Apple office, Beijing.
slides
June 5, 2021:
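Chinese Blog: "Talk at the 2021 China Mobile Jiutian AI New Technology Forum: Building Strongly Generalizable Man-Machine Spoken Dialogue Systems Towards Strong Artificial Intelligence" (2021中国移动九天人工智能新技术论坛报告——建设强泛化人机语音对话系统迈向强人工智能)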
Invited talk "Building Strongly Generalizable Man-Machine Spoken Dialogue Systems Towards Strong Artificial Intelligence" at China Mobile AI Forum 2021.
video
June 2021: Congratulations to Hong Liu: Tsinghua University Outstanding Undergraduate Thesis - "Data-efficient end-to-end dialog systems".
May 31, 2021:
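Chinese Blog: "Project from Zhijian Ou's research group in the Department of Electronic Engineering wins First Prize in the national radio and television MediaAIAC competition" (电子系欧智坚研究组项目获国家广电MediaAIAC一等奖)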
Congratulations: First-class prize in Media Artificial Intelligence Application Innovation National Competition.
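Mar 29, 2021: Chinese Blog: "Talk at the Language Acoustics Session of the 2021 National Conference on Acoustics: A first exploration of the third generation of speech recognition technology" (2021全国声学大会语言声学分论坛报告—第三代语音识别技术初探)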
Part-1 Part-2.
Invited talk "On exploring the third generation of speech recognition technology" at National Conference on Acoustics, Shanghai.
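Mar 14, 2021: Chinese Blog: "The Tsinghua SPMI group is hiring postdocs, with annual salary starting at 300,000 RMB; send us your CV" (清华SPMI课题组招聘博士后,年薪30万元起,小伙伴快砸简历)
Feb 3, 2021: Chinese Blog: "SPMI@SLT2021: Combining efficient neural architecture search via straight-through gradients with end-to-end speech recognition" (SPMI@SLT2021: 基于直通梯度的高效神经结构搜索与端到端语音识别融合)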
Jan 30, 2021: Invited Talk "Neural Architecture Search for End-to-end Speech Recognition via Straight-Through Gradients" at SLT2021 Children Speech Recognition Challenge (CSRC) Workshop, Virtual, 2021/1/30.
slides
2020
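Dec 28, 2020: Chinese Blog: "Ten figures and ten sentences for a quick look at SPMI's ten papers of 2020" (十张图十句话带你快速了解SPMI2020的十篇论文)
Nov 23, 2020: Chinese Blog: "SPMI@EMNLP2020: Semi-supervised learning of end-to-end dialog models with latent dialog states" (SPMI@EMNLP2020: 隐对话状态端到端对话模型的半监督学习)
Oct 29, 2020: Chinese Blog: "SPMI@INTERSPEECH2020: Data-efficient end-to-end speech recognition, and knowledge-infused word representation learning" (SPMI@INTERSPEECH2020: 数据高效端到端语音识别,知识融合的词表示学习)
Aug 3, 2020: Chinese Blog: "SPMI@ICASSP2020: Mixed-feature random field language models with applications to speech recognition" (SPMI@ICASSP2020: 混合特征随机场语言模型及其语音识别应用)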
June 2020: Congratulations to Silin Gao: Tsinghua University Outstanding Undergraduate Thesis - "Mixed-feature Random Field Language Models with Applications to Speech Recognition".
June 2020: Co-chair the online workshop "Frontiers in Speech Conversation and Auditory", organized by CCF Speech Conversation and Auditory TC.
2019
Nov. 2019: Invited Talk "Marrying Neural Networks and Undirected Graphical Models for Efficient Speech Recognition" (联合神经网络与无向图模型的高效语音识别) at ASRU2019 Code-Switching Challenge Seminar (ASRU2019中英混杂语音识别挑战赛线下技术交流会), Beijing, 2019/11/23.
slides
Nov. 2019: We release CAT (CRF-based ASR Toolkit) for data-efficient end-to-end speech recognition. Hope it's fun!
code at github
Oct. 2019: Co-chair the panel "Frontiers and challenges of intelligent speech processing" at CNCC (China National Computer Congress) 2019, Suzhou, China.
August 2019: Chair the session "Speech perception and recognition" at CCF Speech Conversation and Auditory TC Annual Meeting, Xining, China.
July 2019: Invited to teach a short course "Theory and Applications of Probabilistic Graphical Models" (1 credit) at Xinjiang University, China.
June 2019: Congratulations to Keyu An: Tsinghua University Outstanding Undergraduate Thesis - "Random Fields based Speech Recognition".
April 2019: Invited Talk "Toward Simple Speaker Recognition and Speech Recognition" (简洁的说话人识别及语音识别) at The Symposium on Speaker Recognition Research and Application, Kun-shan, China.
slides
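Mar 24, 2019: Chinese Blog: "SPMI@ICASSP2019: Building simple and flexible end-to-end speech recognition based on conditional random fields" (SPMI@ICASSP2019: 基于条件随机场打造简洁灵活的端到端语音识别)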
2018
Nov. 2018: Congratulations to Yutian Li: ISCSLP (International Symposium on Chinese Spoken Language Processing) Best Student Paper Award.
News
Nov. 26, 2018: Tutorial "Language Modeling: State of the Art" at the 11th International Symposium on Chinese Spoken Language Processing (ISCSLP), Taipei, given by Zhijian Ou and Bin Wang.
URL |
slides
Oct. 2018: Invited Talk "Neural random fields with applications to modeling of languages and images" at Apple Boston, Toyota Technological Institute at Chicago (TTIC), and UIUC (link).
July 2018: Congratulations to Bin Wang: Tsinghua University Outstanding Ph.D. Thesis.
Thesis pdf
Feb. 2018: We release the Iqiyi movie dialog dataset collected and labeled under a crowdsourcing Wizard-of-Oz framework, which is used in our paper - "Tracking of enriched dialog states for flexible conversational information access", published at ICASSP 2018.
arxiv |
data readme |
data download
2017
Oct. 2017: Invited Talk "Future Artificial Intelligence with Efficient Learning" at Tsinghua University Science and Technology Forum, Su-zhou, China.
Oct. 2017: Bin Wang presents the PAMI paper “Learning Trans-dimensional Random Fields with Applications to Language Modeling” at the Special Session on Journal Papers, NCMMSC (National Conference on Man-Machine Speech Communication) 2017, Lian-yun-gang, China.
slides
2016
Dec. 2016: Present “The THU-SPMI SRE-16 System with Joint Bayesian Scoring and Ladder Network based Feature Learning” at the 2016 NIST Speaker Recognition Workshop (SRE2016), San Diego, USA.
pdf |
teaser presentation |
poster
Nov. 2016: We release the SPMI Array processing toolkit, which was used in our work - "The THU-SPMI CHiME-4 system: Lightweight design with advanced multi-channel processing, feature enhancement, and language modeling", published at the CHiME Workshop, 2016.
The toolkit implements a bundle of beamforming methods, including MVDR, PMWF, GSC, and ME, and we compare these techniques with the same back-end. See the paper for details.
pdf |
poster |
SPMIArray toolkit at github
Nov. 2016: We release the 9-hour real-world audio stream recorded from a public broadcast radio in Beijing with its motif annotation, which is used in our paper - "Scalable Discovery of Audio Fingerprint Motifs in Broadcast Streams With Determinantal Point Process Based Motif Clustering", published in IEEE/ACM Transactions on Audio, Speech and Language Processing, 2016.
pdf |
data readme |
data download
Oct. 17, 2016: Tutorial "Undirected Graphical Models: Theory and Applications to Speech and Language Processing" at the 10th International Symposium on Chinese Spoken Language Processing (ISCSLP), Tianjin, China.
URL |
slides
August 2016: We release the SPMILM toolkit at github, which not only implements TRF-LMs, but also includes third-party code for ngram-LMs, RNN-LMs and LSTM-LMs. Example scripts are provided to evaluate different LMs by rescoring n-best lists.
SPMILM toolkit at github |
doxygen documentation
July 2016: We release the code for the SSP 2016 paper "Block-Wise MAP Inference for Determinantal Point Processes with Application to Change-Point Detection"! The change-point selection problem is elegantly addressed through DPP-based subset selection. Experiments on five real-world data sets are shown.
pdf |
poster |
longer version at arxiv |
code readme |
code download
Jan. 2016: Invited Talk "Machine Intelligence with Speech and Language" at Tsinghua Data Science Institute.
URL |
slides
2015
Oct. 2015: Present "Trans-dimensional Random Fields (TDRF) for Sequence Modeling" at SANE workshop, held at Google in New York City, NY.
We point out the potential of applying random fields to sequence modeling, as demonstrated by their success in language modeling.
poster
July 2015: We release the code for our ACL-2015 Long Paper "Trans-dimensional Random Fields for Language Modeling"! The TRF-LMs perform as well as the RNN-LMs but are computationally more efficient in computing sentence probability (200x faster).
pdf |
poster |
code readme |
code download
June 2015: Invited Talk "Probabilistic Modeling of Speech and Language" at IBM TJ Watson Research Center, New York.
slides
June 2015: Invited Talk "Probabilistic Modeling of Speech and Language" at Mitsubishi Electric Research Laboratories (MERL), Boston.
May 2015: Invited Talk "Probabilistic Modeling of Speech and Language" at ICSI, Berkeley.
abstract
March 2015: Invited Talk "Probabilistic Modeling of Speech" at the DSP Seminar, UIUC.
slides
2014
August 2014: A brief introduction to the SPMI (Speech Processing and Machine Intelligence) Lab, as orientation material for incoming postgraduate students in the EE Department.
two slides in Chinese