Zhijian Ou @ Tsinghua University

Talk videos can be found here at bilibili, Chinese blogs here at zhihu :)

Book

Zhijian Ou.
Energy-Based Models with Applications to Speech and Language Processing.
Foundations and Trends® in Signal Processing, Vol. 18, No. 1-2, pp 1-199, Now Publishers, 2024.
URL | arxiv
Zhijian Ou, Gang Li.
Probability Theory and Stochastic Process (in Chinese).
Tsinghua University Press, 2022.
URL

2026

Ziwei Li, Lukuang Dong, Saierdaer Yusuyin, Xianyu Zhao, Zhijian Ou.
Phonemes vs. Projectors: An Investigation of Speech-Language Interfaces for LLM-based ASR.
arxiv
Lukuang Dong, Ziwei Li, Saierdaer Yusuyin, Xianyu Zhao, Zhijian Ou.
Advancing LLM-based phoneme-to-grapheme for multilingual speech recognition.
arxiv
Hanwen Liu, Saierdaer Yusuyin, Hao Huang, Zhijian Ou.
CTC-TTS: LLM-based dual-streaming text-to-speech with CTC alignment.
arxiv
Qiong Wu, Mingyu Wang, Ying Hu, Xinchun Ma, Zhijian Ou.
A Dual Consistency Training (DCT) strategy for polyphonic sound event detection.
Neurocomputing, 2026, Vol. 673, Page 132880.
URL

2025

Hongyu Cao, Yuxuan Wu, Yucheng Cai, Xianyu Zhao, Zhijian Ou.
Improving End-to-End Training of Retrieval-Augmented Generation Models via Joint Stochastic Approximation.
arxiv
Saierdaer Yusuyin, Te Ma, Hao Huang, Zhijian Ou.
Pronunciation-Lexicon Free Training for Phoneme-based Crosslingual ASR via Joint Stochastic Approximation.
IEEE Transactions on Audio, Speech and Language Processing, 2026, Vol. 34, Page 272-284.
URL | arxiv | code
Yucheng Cai, Yuxuan Wu, Yi Huang, Junlan Feng, Zhijian Ou.
Knowledge Augmented Finetuning Matters in both RAG and Agent Based Dialog Systems.
2025 National Conference on Man-Machine Speech Communication (NCMMSC).
pdf | arxiv
Te Ma, Nanjie Li, Hao Huang, Zhijian Ou.
Phoneme-based speech recognition driven by large language models and sampling marginalization.
2025 National Conference on Man-Machine Speech Communication (NCMMSC).
pdf
Te Ma, Min Bi, Saierdaer Yusuyin, Hao Huang, Zhijian Ou.
LLM-based phoneme-to-grapheme for phoneme-based speech recognition.
INTERSPEECH 2025.
pdf | slides | code
Xiangzhu Kong, Hao Huang, Zhijian Ou.
Lightweight and Robust Multi-Channel End-to-End Speech Recognition with Spherical Harmonic Transform.
INTERSPEECH 2025.
pdf | poster | code
Jiabo Jing, Ying Hu, Hao Huang, Liang He, Zhijian Ou.
A Joint Network for Singing Melody Extraction from Polyphonic Music with Attention Aggregation and Self-Consistency Training.
INTERSPEECH 2025.
Yucheng Cai, Ke Li, Yi Huang, Junlan Feng, Zhijian Ou.
Entriever: Energy-based Retriever for Knowledge-Grounded Dialog Systems.
ACL 2025 Findings.
pdf | arxiv | poster | code
Saierdaer Yusuyin, Te Ma, Hao Huang, Wenbo Zhao, Zhijian Ou.
Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision.
IEEE Transactions on Audio, Speech and Language Processing, 2025, Vol. 33, Page 1440-1453.
URL | arxiv | code
Ying Hu, Qin Yang, Wenbing Wei, Li Lin, Liang He, Zhijian Ou, Wenzhong Yang.
MN-Net: Speech Enhancement Network via Modeling the Noise.
IEEE Transactions on Audio, Speech and Language Processing, 2025, Vol. 33, Page 1208-1219.
URL

2024

Yucheng Cai, Si Chen, Yuxuan Wu, Yi Huang, Junlan Feng, Zhijian Ou.
The 2nd FutureDial Challenge: Dialog Systems with Retrieval Augmented Generation (FutureDial-RAG).
SLT, 2024.
arxiv | poster | Challenge website
Yuetong Zhao, Hongyu Cao, Xianyu Zhao, Zhijian Ou.
An Empirical Study of Retrieval Augmented Generation with Chain-of-Thought.
ISCSLP, 2024.
arxiv | code
Lukuan Dong, Donghong Qin, Fengbo Bai, Fanhua Song, Yan Liu, Chen Xu, Zhijian Ou.
Low-Resourced Speech Recognition for Iu Mien Language via Weakly-Supervised Phoneme-based Multilingual Pre-training.
ISCSLP, 2024.
arxiv
Wenbo Zhao, Ziwei Li, Chuan Yu, Zhijian Ou.
CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR.
ISCSLP, 2024.
arxiv
Xiangzhu Kong, Tianqi Ning, Hao Huang, Zhijian Ou.
A Streaming Multi-Channel End-to-End Speech Recognition System with Realistic Evaluations.
ISCSLP, 2024.
arxiv | code
Yucheng Cai, Wentao Ma, Yuchuan Wu, Shuzheng Si, Yuan Shao, Zhijian Ou, Yongbin Li.
UniPCM: Universal Pre-trained Conversation Model with Task-aware Automatic Prompt.
LREC-COLING, 2024.
pdf

2023

Qiyan Song, Zhijian Ou.
Modified Frequency-Sliding Generalized Cross Correlation for Time Delay Difference Estimation of Microphone Array.
IEEE Sensors Journal, 2023, Volume 23, Issue 24, Page 31038-31049.
URL
Hong Liu, Yucheng Cai, Yuan Zhou, Zhijian Ou, Yi Huang, Junlan Feng.
Prompt Pool Based Class-Incremental Continual Learning for Dialog State Tracking.
ASRU, 2023.
pdf | arxiv | slides | code
Xinwei Zhang, Zhiqiang Tan, Zhijian Ou.
Persistently Trained, Diffusion-assisted Energy-based Models.
Stat, 2023.
pdf | URL | arxiv
Yucheng Cai, Hong Liu, Zhijian Ou, Yi Huang, Junlan Feng.
Knowledge-Retrieval Task-Oriented Dialog Systems with Semi-Supervision.
INTERSPEECH, 2023.
pdf | arxiv | slides | video | code
Hong Liu, Zhaobiao Lv, Zhijian Ou, Wenbo Zhao, Qing Xiao.
Exploring Energy-based Language Models with Different Architectures and Training Methods for Speech Recognition.
INTERSPEECH, 2023.
pdf | arxiv | slides | code
Hong Liu, Yucheng Cai, Zhenru Lin, Zhijian Ou, Yi Huang, Junlan Feng.
Variational Latent-State GPT for Semi-Supervised Task-Oriented Dialog Systems.
IEEE/ACM Transactions on Audio, Speech and Language Processing, 2023, Vol. 31, Page 970-984.
URL | pdf | arxiv | code

2022

Hong Liu, Hao Peng, Zhijian Ou, Juanzi Li, Yi Huang, Junlan Feng.
Information Extraction and Human-Robot Dialogue towards Real-life Tasks: A Baseline Study with the MobileCS Dataset.
EMNLP 2022 SereTOD Workshop.
pdf | arxiv | code at github
Hong Liu, Yucheng Cai, Zhijian Ou, Yi Huang, Junlan Feng.
A Generative User Simulator with GPT-based Architecture and Goal State Tracking for Reinforced Multi-Domain Dialog Systems.
EMNLP 2022 SereTOD Workshop.
pdf | arxiv | code at github
Hong Liu, Yucheng Cai, Zhijian Ou, Yi Huang, Junlan Feng.
Building Markovian Generative Architectures over Pretrained LM Backbones for Efficient Task-Oriented Dialog Systems.
SLT, 2022.
pdf | arxiv | code at github
Keyu An, Ji Xiao, Zhijian Ou.
Exploiting Single-Channel Speech for Multi-Channel End-to-End Speech Recognition: A Comparative Study.
ISCSLP, 2022.
pdf | arxiv | code at github
Yucheng Cai, Hong Liu, Zhijian Ou, Yi Huang, Junlan Feng.
Advancing Semi-Supervised Task Oriented Dialog Systems by JSA Learning of Discrete Latent Variable Models.
SIGDIAL, 2022.
pdf | arxiv | code at github
Huahuan Zheng, Keyu An, Zhijian Ou, Chen Huang, Ke Ding, Guanglu Wan.
An Empirical Study of Language Model Integration for Transducer based Speech Recognition.
INTERSPEECH, 2022.
pdf | arxiv | code at CAT toolkit
Keyu An, Huahuan Zheng, Zhijian Ou, Hongyu Xiang, Ke Ding, Guanglu Wan.
CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR.
INTERSPEECH, 2022.
pdf | arxiv | code at CAT toolkit
Zhijian Ou, Junlan Feng, Juanzi Li, Yakun Li, Hong Liu, Hao Peng, Yi Huang, Jiangjiang Zhao.
A Challenge on Semi-Supervised and Reinforced Task-Oriented Dialog Systems.
arxiv | challenge website
Hong Liu, Zhijian Ou, Yi Huang, Junlan Feng.
Jointly Reinforced User Simulator and Task-oriented Dialog System with Simplified Generative Architecture.
An early version of Markovian Generative Architectures (MGA) and Generative User Simulator (GUS)
arxiv

2021

Huahuan Zheng*, Wenjie Peng*, Zhijian Ou, Jinsong Zhang. (* Equal contribution and random listing)
Advancing CTC-CRF Based End-to-End Speech Recognition with Wordpieces and Conformers.
arxiv | code at CAT toolkit
Yinpei Dai, Yichi Zhang, Hong Liu, Zhijian Ou, Yi Huang, Junlan Feng.
Elastic CRFs for Open-Ontology Slot Filling.
Applied Sciences, vol.11, 2021. (Selected Papers from 16th National Conference on Man-Machine Speech Communication (NCMMSC2021))
URL | pdf
Chengrui Zhu, Keyu An, Huahuan Zheng, Zhijian Ou.
Multilingual and crosslingual speech recognition using phonological-vector based phone embeddings.
IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2021.
pdf | slides | video | code at CAT toolkit | arxiv
Yunfu Song, Huahuan Zheng, Zhijian Ou.
An empirical comparison of joint-training and pre-training for domain-agnostic semi-supervised learning via energy-based models.
IEEE Workshop on Machine Learning for Signal Processing (MLSP), 2021.
pdf | slides | video | code at github
Keyu An, Yi Zhang, Zhijian Ou.
Deformable TDNN with adaptive receptive fields for speech recognition.
INTERSPEECH, 2021.
pdf | code at CAT toolkit
Huahuan Zheng, Keyu An, Zhijian Ou.
Efficient Neural Architecture Search for End-to-end Speech Recognition via Straight-Through Gradients.
SLT, 2021.
pdf | arxiv | code at github
Fan Yu, Zhuoyuan Yao, Xiong Wang, Keyu An, Lei Xie, Zhijian Ou, Bo Liu, Xiulin Li, Guanqiong Miao.
The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines.
SLT, 2021.
pdf | arxiv | challenge website

2020

Yichi Zhang, Zhijian Ou, Huixin Wang, Junlan Feng.
A Probabilistic End-To-End Task-Oriented Dialog Model with Latent Belief States towards Semi-Supervised Learning.
EMNLP, 2020.
pdf | arxiv | code at github
Keyu An, Hongyu Xiang, Zhijian Ou.
CAT: A CTC-CRF based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches towards Data Efficiency and Low Latency.
INTERSPEECH, 2020.
pdf | arxiv | code at github
Yichi Zhang, Yinpei Dai, Zhijian Ou, Huixin Wang, Junlan Feng.
Improved Learning of Word Embeddings with Word Definitions and Semantic Injection.
INTERSPEECH, 2020.
pdf | code at github
Zhijian Ou, Yunfu Song.
Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models.
UAI, 2020.
pdf | arxiv | presentation video | code at github
Silin Gao, Yichi Zhang, Zhijian Ou and Zhou Yu.
Paraphrase Augmented Task-Oriented Dialog Generation.
ACL, 2020.
pdf | code at github
Yunfu Song, Zhijian Ou, Zitao Liu, Songfan Yang.
Upgrading CRFs to JRFs and its benefits to sequence modeling and labeling.
ICASSP, Barcelona, Spain, 2020.
URL | pdf | slides
Silin Gao, Zhijian Ou, Wei Yang, Huifang Xu.
Integrating discrete and neural features via mixed-feature trans-dimensional random field language models.
ICASSP, Barcelona, Spain, 2020. (oral)
URL | pdf | slides
Keyu An, Hongyu Xiang, Zhijian Ou.
CAT: CRF-based ASR Toolkit.
arxiv | code at github
Yichi Zhang, Zhijian Ou, Zhou Yu.
Task-Oriented Dialog Systems that Consider Multiple Appropriate Responses under the Same Context.
AAAI, New York, USA, 2020.
pdf | code at github
Yunfu Song, Zhijian Ou.
Generative Modeling by Inclusive Neural Random Fields with Applications in Image Generation and Anomaly Detection.
pdf | arxiv | code at github

2019

Zhiqiang Tan, Yunfu Song, Zhijian Ou.
Calibrated Adversarial Algorithms for Generative Modeling.
Stat, 2019.
URL | pdf
Yunfu Song, Zhijian Ou.
Semi-supervised Seq2seq Joint-stochastic-approximation Autoencoders with Applications to Semantic Parsing.
IEEE Signal Processing Letters, vol. 27, p.31-35, 2019.
URL | pdf | code at github
Hongyu Xiang, Zhijian Ou.
CRF-based Single-stage Acoustic Modeling with CTC Topology.
ICASSP, Brighton, UK, 2019.
URL | pdf | slides | oral video | code at github
Kai Hu, Zhijian Ou, Min Hu, Junlan Feng.
Neural CRF Transducers for Sequence Labeling.
ICASSP, Brighton, UK, 2019.
URL | pdf | poster | SPMISeq toolkit at github

2018

Zhijian Ou.
A Review of Learning with Deep Generative Models from Perspective of Graphical Modeling.
arxiv
Yunfu Song, Zhijian Ou.
Learning Neural Random Fields with Inclusive Auxiliary Generators.
pdf | arxiv | code at github
Yunfu Song, Zhijian Ou.
Joint-stochastic-approximation Random Fields with Application to Semi-supervised Learning.
pdf | arxiv
Wenbo He, Zhijian Ou.
Joint-stochastic-approximation Autoencoders with Application to Semi-supervised Learning.
pdf | arxiv
Yutian Li, Zhijian Ou.
THU-SPMI System For NIST 2018 Speaker Recognition Evaluation.
NIST SRE-18 Workshop, Athens, Greece, 2018 Dec.
poster
Bin Wang, Zhijian Ou.
Improved training of neural trans-dimensional random field language models with dynamic noise-contrastive estimation.
IEEE Workshop on Spoken Language Technology (SLT), Athens, Greece, 2018 Dec.
pdf | poster | arxiv
Zhangyu Xiao, Zhijian Ou, Wei Chu, Hui Lin.
Hybrid CTC-Attention based End-to-End Speech Recognition using Subword Units.
International Symposium on Chinese Spoken Language Processing (ISCSLP), Taipei, 2018 Nov.
pdf | slides | arxiv
Yutian Li, Feng Gao, Zhijian Ou, Jiasong Sun.
Angular Softmax Loss for End-to-end Speaker Verification. (Best Student Paper Award)
International Symposium on Chinese Spoken Language Processing (ISCSLP), Taipei, 2018 Nov.
pdf | slides | arxiv
Yichi Zhang, Zhijian Ou.
Learning Sparse Structured Ensembles With Stochastic Gradient MCMC Sampling and Network Pruning.
IEEE Workshop on Machine Learning for Signal Processing (MLSP), Aalborg, Denmark, 2018 Sept.
pdf | slides | longer version at arxiv
Yinpei Dai, Zhijian Ou, Dawei Ren, Pengfei Yu.
Tracking of enriched dialog states for flexible conversational information access.
ICASSP, Calgary, Canada, 2018.
pdf | poster | data readme | data download | code at github
Bin Wang, Zhijian Ou.
Learning neural trans-dimensional random field language models with noise-contrastive estimation.
ICASSP, Calgary, Canada, 2018.
pdf | poster
Bin Wang, Zhijian Ou, Zhiqiang Tan.
Learning Trans-dimensional Random Fields with Applications to Language Modeling.
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2018, 40(4):876-890.
pdf (including appendices) | SPMILM toolkit at github | doxygen documentation

2017

Bin Wang, Zhijian Ou.
Language modeling with Neural trans-dimensional random fields.
IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Okinawa, Japan, 2017 Dec.
pdf | poster
Yiyan Wang, Haotian Xu, Zhijian Ou.
Joint Bayesian Gaussian discriminant analysis for speaker verification.
ICASSP, New Orleans, USA, 2017 Mar.
pdf | poster

2016

Yiyan Wang, Haotian Xu, Zhijian Ou.
The THU-SPMI SRE-16 System with Joint Bayesian Scoring and Ladder Network based Feature Learning.
NIST SRE-16 Workshop, San Diego, USA, 2016 Dec.
pdf | teaser presentation | poster
Hongyu Xiang, Bin Wang and Zhijian Ou.
The THU-SPMI CHiME-4 system: Lightweight design with advanced multi-channel processing, feature enhancement, and language modeling.
CHiME Workshop, San Francisco, USA, 2016 Sept.
pdf | poster | SPMIArray toolkit at github
Bin Wang, Zhijian Ou, Yong He, Akinori Kawamura.
Model Interpolation with Trans-dimensional Random Field Language Models for Speech Recognition.
arxiv
Haotian Xu, Zhijian Ou.
Scalable Discovery of Audio Fingerprint Motifs in Broadcast Streams With Determinantal Point Process Based Motif Clustering.
IEEE/ACM Transactions on Audio, Speech and Language Processing, 2016, 24(5).
pdf | data readme | data download
Haotian Xu, Zhijian Ou.
Joint Stochastic Approximation Learning of Helmholtz Machines.
International Conference on Learning Representations (ICLR) 2016 Workshop Track, Puerto Rico, USA, 2016 May.
pdf | poster
Ruobai Wang, Yang Zhang, Zhijian Ou and Mark Hasegawa-Johnson.
Use of Particle Filtering and MCMC for Inference in Probabilistic Acoustic Tube Model.
IEEE Workshop on Statistical Signal Processing (SSP), Palma de Mallorca, Spain, 2016 June.
pdf | poster
Jinye Zhang, Zhijian Ou.
Block-Wise MAP Inference for Determinantal Point Processes with Application to Change-Point Detection.
IEEE Workshop on Statistical Signal Processing (SSP), Palma de Mallorca, Spain, 2016 June.
pdf | poster | longer version at arxiv | code readme | code download

2015

Yang Zhang, Zhijian Ou and Mark Hasegawa-Johnson.
Incorporating AM-FM effect in voiced speech for probabilistic acoustic tube model.
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, USA, 2015 Oct.
pdf | poster
Bin Wang, Zhijian Ou and Zhiqiang Tan.
Trans-dimensional Random Fields for Language Modeling.
Annual Meeting of the Association for Computational Linguistics (ACL Long Paper), Beijing, China, 2015 July.
pdf | poster | code readme | code download

2014

Yang Zhang, Zhijian Ou, Mark Hasegawa-Johnson.
Improvement of Probabilistic Acoustic Tube Model for Speech Decomposition.
ICASSP, Florence, Italy, 2014 May.
pdf | poster
Bin Wang, Zhijian Ou, Jian Li, Akinori Kawamura.
Joint-Character-POC N-gram Language Modeling For Chinese Speech Recognition.
International Symposium on Chinese Spoken Language Processing (ISCSLP), Singapore, 2014 Sept.
pdf | slides

2012

Xin He, Zhijian Ou, Jiasong Sun.
Joint N-gram Chinese Language Modeling with an Application to Chinese Word Segmentation.
IEEE International Conference on Audio, Language and Image Processing (ICALIP), Shanghai, 2012.
pdf
Zhijian Ou, Yang Zhang.
Probabilistic Acoustic Tube: A Probabilistic Generative Model of Speech for Speech Analysis/Synthesis.
International Conference on Artificial Intelligence and Statistics (AISTATS), La Palma, Spain, 2012 Apr.
pdf | appendix | demo | poster
Zhijian Ou, Huaqing Luo.
CRF-based Confidence Measures of Recognized Candidates for Lattice-based Audio Indexing.
ICASSP, Kyoto, Japan, 2012 Mar.
pdf | poster
Zhijian Ou, Kan Deng.
Combining Eigenvoice Speaker Modeling and VTS-based Environment Compensation for Robust Speech Recognition.
ICASSP, Kyoto, Japan, 2012 Mar.
pdf | poster

2011

Yun Wang, Zhijian Ou.
Combining HMM-based Melody Extraction and NMF-based Soft Masking for Separating Voice and Accompaniment from Monaural Audio.
ICASSP, Prague, Czech Republic, 2011,5.
pdf | oral video | demo | github code

2010

Zhijian Ou, Ji Xiao.
A Study of Large Vocabulary Speech Recognition Decoding Using Finite-state Graphs.
International Symposium on Chinese Spoken Language Processing (ISCSLP), Tainan, Taiwan, 2010 Dec.
pdf
Yimin Tan, Zhijian Ou.
Topic-weak-correlated Latent Dirichlet Allocation.
International Symposium on Chinese Spoken Language Processing (ISCSLP), Tainan, Taiwan, 2010 Dec.
pdf | poster | code readme | code download
Yi Sun, Zhijian Ou, Wei Hu, Yimin Zhang.
Excited Commentator Speech Detection with Unsupervised Model Adaptation for Soccer Highlight Extraction.
IEEE International Conference on Audio, Language and Image Processing (ICALIP), Shanghai, 2010 Nov.
pdf | slides
Qin Shi, Kun Li, Shilei Zhang, Stephen Chu, Ji Xiao, Zhijian Ou.
Spoken English assessment system for non-native speakers using acoustic and prosodic features.
INTERSPEECH, Makuhari, Japan, 2010,9.
pdf
Nan Ding, Zhijian Ou.
Variational nonparametric Bayesian hidden Markov model.
ICASSP, Dallas, USA, 2010,3.
pdf | poster

2008

Zhijian Ou, Jun Luo.
Eigenspace Estimation with Missing Values and its Application to Eigenvoice Adaptation for Speech Recognition.
IEEE International Conference on Audio, Language and Image Processing (ICALIP), Shanghai, 2008 July.
pdf
Cong Li, Zhijian Ou, Wei Hu, Tao Wang, Yimin Zhang.
Caption-aided speech detection in videos.
ICASSP, Las Vegas, USA, 2008 Apr.
pdf | poster
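Yi Sun, Zhijian Ou, Wei Hu.
Excited Commentary Detection with Unsupervised Adaptation and Highlight Extraction for Sports Games (in Chinese).
Computer Applications and Software, 2008, 25(11): 20~22.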

2007

Zhijian Ou, Jun Luo.
Latent correlation analysis of HMM parameters for speech recognition.
ICASSP, Hawaii, USA, 2007,4.
pdf | poster
Hui Lin, Zhijian Ou.
Switching Auxiliary Chains for Speech Recognition.
IEEE Signal Processing Letters, 2007, 14(8): 568~571.
pdf
Xianyu Zhao, Zhijian Ou.
Closely Coupled Array Processing and Model-Based Compensation for Microphone Array Speech Recognition.
IEEE Transactions on Audio, Speech and Language Processing, 2007, 15(3): 1114~1122.
pdf
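Yi Sun, Zhijian Ou, Jiasong Sun.
A Hierarchical Message Passing Algorithm for Inference in Graphical Models (in Chinese).
National Conference on Pattern Recognition 2007, Beijing, 2007,12. (Science Press)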
pdf

2006

Hui Lin, Zhijian Ou.
Switching Auxiliary Chains for Speech Recognition based on Dynamic Bayesian Networks.
International Conference on Pattern Recognition (ICPR), Hong Kong, 2006 Aug.
pdf
Hui Lin, Zhijian Ou.
Partial-tied-mixture Auxiliary Chain Models for Speech Recognition Based on Dynamic Bayesian Networks.
IEEE International Conference on Systems, Man and Cybernetics (SMC), Taipei, Taiwan, 2006 Oct.
pdf
Hui Lin, Zhijian Ou, Xi Xiao.
Generalized Time-series Active Search with Kullback-Leibler Distance for Audio Fingerprinting.
IEEE Signal Processing Letters, 2006, 13(8): 464~468.
pdf
Jun Luo, Zhijian Ou, Zuoying Wang.
Eigenvoice-based MAP adaptation within correlation subspace.
Frontiers of Electrical and Electronic Engineering in China - Selected Publications from Chinese Universities, Higher Education Press and Springer-Verlag, 2006, Vol.1, No.2: 130~134.
Xianyu Zhao, Zhijian Ou, Zuoying Wang.
Using Vector Taylor Series with Noise Clustering for Speech Recognition in Non-stationary Noisy Environments.
High Technology Letters, 2006, 12(1): 18~23.
pdf
Jun Luo, Zhijian Ou.
An Efficient Spoken Keyword Retrieval System (in Chinese).
Journal on Communications, 2006, 27(2): 113~118.
pdf

2005

Jun Luo, Zhijian Ou, Zuoying Wang.
Discriminative Speaker Adaptation with Eigenvoices.
EUROSPEECH, Lisbon, Portugal, 2005,9.
pdf
Xianyu Zhao, Zhijian Ou, Minhua Chen, Zuoying Wang.
Closely coupled array processing and model-based compensation for microphone array speech recognition.
ICASSP, Philadelphia, USA, 2005,3.
pdf
Xianyu Zhao, Zhijian Ou, Zuoying Wang.
Space Discriminative Function For Microphone Array Robust Speech Recognition.
High Technology Letters, 2005, 11(4): 351~354.
pdf
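Jingying Wang, Zuoying Wang, Zhijian Ou.
Application of the Maximum Likelihood Linear Regression Speaker Adaptation Algorithm to LPHMM (in Chinese). (Conference Outstanding Paper)
8th National Conference on Man-Machine Speech Communication (NCMMSC) (collected in: Technical Acoustics, Vol.24: 206~209), Beijing, 2005,10.
Jun Luo, Zhijian Ou.
An Efficient Spoken Keyword Retrieval System (in Chinese). (Recommended for publication in Journal on Communications)
National Symposium on Network and Information Security Technology (NetSec2005), Beijing, 2005,8, Vol.2, 121~128.
Jun Luo, Zhijian Ou, Zuoying Wang.
A Two-Stage Keyword Retrieval System Based on Pinyin Graphs (in Chinese).
Journal of Tsinghua University, 2005, 45(10): 1356~1359.
Xianyu Zhao, Zhijian Ou, Zuoying Wang.
A Study of VTS-based Robust Speech Recognition (in Chinese).
Journal of Tsinghua University, 2005, 45(7): 892~895.
Shucai Xiao, Zhijian Ou, Zuoying Wang.
Speaker Clustering Algorithms in Speech Recognition (in Chinese).
Journal of Chinese Information Processing, 2005, 19(4): 84~88.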

2004

Zhijian Ou, Zuoying Wang.
Discriminative combination of multiple linear predictions for speech recognition.
ICSLP, Jeju, Korea, 2004 Oct.
pdf | slides
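Zhijian Ou, Jun Luo, Dadong Xie, Xianyu Zhao, Hui Lin, Zuoying Wang.
Research and Implementation of a Multi-Functional Speech/Audio Information Retrieval System (in Chinese).
National Symposium on Network and Information Security Technology (NetSec2004), Beijing, 2004,8. 106~112.
Xianyu Zhao, Zhijian Ou, Zuoying Wang.
A Study of Unsupervised Vector Taylor Series Expansion Using Word Graphs for Robust Speech Recognition (in Chinese).
Chinese High Technology Letters, 2004, 14: 17~22.
Jun Luo, Zhijian Ou, Zuoying Wang.
A Fast MAP Adaptation Algorithm Based on Eigenvoice Analysis in the Correlation Subspace (in Chinese).
Journal of Tsinghua University, 2004, 44(6): 829~832.
pdf
Jun Luo, Zhijian Ou, Zuoying Wang.
Application of Speaker Adaptive Training to Continuous Speech Recognition (in Chinese).
Journal of Chinese Information Processing, 2004, 18(3): 61~65.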

2003

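Zhijian Ou, Zuoying Wang.
A Study of Polynomial-Fitting Speech Trajectory Models in Chinese Continuous Speech Recognition (in Chinese).
Acta Electronica Sinica, 2003, 31(4): 608~611.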
pdf

2002

Zhijian Ou, Zuoying Wang.
A combined model of statics-dynamics of speech optimized using maximum mutual information.
ICSLP, Denver, Colorado, 2002,9. 2629~2632.
pdf
Zhijian Ou, Zuoying Wang.
A new combined model of statics-dynamics of speech.
ICASSP, Orlando, Florida, 2002,5. 965~968.
pdf | poster
Zhijian Ou, Zuoying Wang.
From Linear Prediction HMM to a New Hybrid Model for Speech Recognition (in Chinese).
Acta Electronica Sinica, 2002, 30(9): 1313~1316.
pdf

2001

Zhijian Ou, Zuoying Wang.
A New DP-like Speaker Clustering Algorithm.
EUROSPEECH, Aalborg, Denmark, 2001,9. 791~794.
pdf
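Zhijian Ou, Zuoying Wang.
A DDBHMM-based Hybrid Model Exploiting Inter-Frame Correlation (in Chinese).
6th National Conference on Man-Machine Speech Communication (NCMMSC), Shenzhen, 2001,11. 277~280.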
