Dan Povey's homepage

Dan Povey's publications

Publications are listed in reverse chronological order. The list may not be complete.

2024

"Zipformer: A faster and better encoder for automatic speech recognition", Zengwei Yao, Liyong Guo, Xiaoyu Yang, Wei Kang, Fangjun Kuang, Yifan Yang, Zengrui Jin, Long Lin, Daniel Povey, ICLR 2024 (submitted) (pdf)

"Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context", Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Yifan Yang, Liyong Guo, Long Lin, Daniel Povey, ICASSP 2024 (pdf)

"PromptASR for contextualized ASR with controllable style", Xiaoyu Yang, Wei Kang, Zengwei Yao, Yifan Yang, Liyong Guo, Fangjun Kuang, Long Lin, Daniel Povey, ICASSP 2024 (pdf)

"Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS", Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen, ICASSP 2024 (pdf)

2023

"Learning from Flawed Data: Weakly Supervised Automatic Speech Recognition", Dongji Gao, Hainan Xu, Desh Raj, Leibny Paola Garcia Perera, Daniel Povey, Sanjeev Khudanpur, ASRU 2023 (pdf)

"Alternative pseudo-labeling for semi-supervised automatic speech recognition", Han Zhu, Dongji Gao, Gaofeng Cheng, Daniel Povey, Pengyuan Zhang, Yonghong Yan, IEEE/ACM Transactions on Audio, Speech, and Language Processing (pdf)

"Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts", Dongji Gao, Matthew Wiesner, Hainan Xu, Leibny Paola Garcia, Daniel Povey, Sanjeev Khudanpur, Interspeech 2023 (pdf)

"GPU-accelerated guided source separation for meeting transcription", Desh Raj, Daniel Povey, Sanjeev Khudanpur, Interspeech 2023 (pdf)

"Delay-penalized transducer for low-latency streaming ASR", Wei Kang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Long Lin, Piotr Żelasko, Daniel Povey, ICASSP 2023 (pdf)

"Fast and parallel decoding for transducer", Wei Kang, Liyong Guo, Fangjun Kuang, Long Lin, Mingshuang Luo, Zengwei Yao, Xiaoyu Yang, Piotr Żelasko, Daniel Povey, ICASSP 2023 (pdf)

"Building Keyword Search System from End-To-End Asr Systems", Ruizhe Huang, Matthew Wiesner, Leibny Paola Garcia-Perera, Dan Povey, Jan Trmal, Sanjeev Khudanpur, ICASSP 2023 (pdf)

"Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation", Liyong Guo, Xiaoyu Yang, Quandong Wang, Yuxiang Kong, Zengwei Yao, Fan Cui, Fangjun Kuang, Wei Kang, Long Lin, Mingshuang Luo, Piotr Żelasko, Daniel Povey, ICASSP 2023 (pdf)

2022

"Pruned RNN-T for fast, memory-efficient ASR training", Fangjun Kuang, Liyong Guo, Wei Kang, Long Lin, Mingshuang Luo, Zengwei Yao, Daniel Povey, Interspeech 2022, (pdf)

2021

"Multistream CNN for robust acoustic modeling", Kyu J Han, Jing Pan, Venkata Krishna Naveen Tadala, Tao Ma, Dan Povey, ICASSP 2021 (pdf)

"LET-Decoder: A WFST-based lazy-evaluation token-group decoder with exact lattice generation", Hang Lv, Daniel Povey, Mahsa Yarmohammadi, Ke Li, Yiming Wang, Lei Xei, Sanjeev Khudanpur, IEEE Signal Processing Letters (submitted) (pdf)

"A parallelizable lattice rescoring strategy with neural language models", Ke Li, Daniel Povey, Sanjeev Khudanpur, ICASSP 2021 (pdf)

"An asynchronous WFST-based decoder for automatic speech recognition", Hang Lv, Zhehuai Chen, Hainan Xu, Daniel Povey, Lei Xie, Sanjeev Khudanpur, ICASSP 2021 (pdf)

"Wake word detection with streaming transformers", Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur, ICASSP 2021 (pdf)

"DOVER-Lap: A method for combining overlap-aware diarization outputs", Desh Raj, Paola Garcia, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur, IEEE SLT 2021 (pdf)

2020

Notes on modified gravity, v1.0, v1.1.

"Efficient MDI adaptation for n-gram language models", Ruizhe Huang, Ke Li, Ashish Arora, Daniel Povey, Sanjeev Khudanpur, Interspeech 2020 (pdf)

"Multistream CNN for robust acoustic modeling", Kyu J. Han, Jing Pan, Venkata Krishna Naveen Tadala, Tao Ma, Dan Povey, Interspeech 2020 (pdf)

"Neural language modeling with implicit cache pointers", Ke Li, Daniel Povey, Sanjeev Khudanpur, Interspeech 2020 (pdf)

"Wake word detection with alignment-free lattice-free MMI", Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur, Interspeech 2020 (pdf)

"PyChain: A fully parallelized PyTorch implementation of LF-MMI for end-to-end ASR", Yiwen Shao, Yiming Wang, Daniel Povey, Sanjeev Khudanpur, Interspeech 2020 (pdf)

"OOV recovery with efficient 2nd pass decoding and open-vocabulary word-level RNNLM rescoring for hybrid ASR", Xiaohui Zhang, Daniel Povey, Sanjeev Khudanpur, ICASSP 2020 (pdf)

"An empirical study of transformer-based neural language model adaptation", Ke Li, Zhe Liu, Tianxing He, Hongzhao Huang, Fuchun Peng, Daniel Povey, Sanjeev Khudanpur, ICASSP 2020 (pdf)

2019

"Probing the information encoded in x-vectors", Desh Raj, David Snyder, Daniel Povey, Sanjeev Khudanpur, IEEE ASRU 2019 (pdf)

"Incremental lattice determinization for WFST decoders", Zhehuai Chen, Mahsa Yarmohammadi, Hainan Xu, Hang Lv, Lei Xie, Daniel Povey, Sanjeev Khudanpur, IEEE ASRU 2019 (pdf)

"X-vector DNN refinement with full-length recordings for speaker recognition", Daniel Garcia-Romero, David Snyder, Gregory Sell, Alan McCree, Daniel Povey, Sanjeev Khudanpur, Interspeech 2019 (pdf)

"The JHU ASR system for VOiCES from a Distance challenge 2019", Yiming Wang, David Snyder, Hainan Xu, Vimal Manohar, Phani Shankar Nidadavolu, Daniel Povey, Sanjeev Khudanpur, Interspeech 2019 (pdf)

"The JHU speaker recognition system for the VOiCES 2019 challenge", David Snyder, Jesus Villalba, Nanxin Chen, Daniel Povey, Gregory Sell, Najim Dehak, Sanjeev Khudanpur, Interspeech 2019 (pdf)

"Speaker recognition benchmark using the CHiME-5 corpus", Daniel Garcia-Romero, David Snyder, Shinji Watanabe, Gregory Sell, Alan McCree, Daniel Povey, Sanjeev Khudanpur, Interspeech 2019 (pdf)

"State-of-the-art speaker recognition for telephone and video speech: the JHU-MIT submission for NIST SRE18", Jesus Villalba, Nanxin Chen, David Snyder, Daniel Garcia-Romero, Alan McCree, Gregory Sell, Jonas Borgstrom, Fred Richardson, Suwon Shon, Francois Grondin, Reda Dehak, Leibny Paola Garcia-Perera, Daniel Povey, Pedro A. Torres-Carrasquillo, Sanjeev Khudanpur, Najim Dehak, Interspeech 2019 (pdf)

"Robust document representations for cross-lingual information retrieval in low-resource settings", Mahsa Yarmohammadi, Xutai Ma, Sorami Hisamoto, Muhammad Rahman, Yiming Wang, Hainan Xu, Daniel Povey, Philipp Koehn, Kevin Duh, Proceedings of Machine Translation Summit XVII Volume 1: Research Track (pdf)

"Using ASR methods for OCR", Ashish Arora, Chun Chieh Chang, Babak Rekabdar, Daniel Povey, David Etter, Desh Raj, Hossein Hadian, Jan Trmal, Paola Garcia, Shinji Watanabe, Vimal Manohar, Yiwen Shao, Sanjeev Khudanpur, ICDAR 2019 (pdf)

"Speaker recognition for multi-speaker conversations using x-vectors", David Snyder, Daniel Garcia-Romero, Gregory Sell, Alan McCree, Daniel Povey, Sanjeev Khudanpur, ICASSP 2019 (pdf)

2018

"Notes on the derivative of SVD", Daniel Povey, unpublished notes, (pdf)

"A teacher-student learning approach for unsupervised domain adaptation of sequence-trained ASR models", Vimal Manohar, Pegah Ghahremani, Daniel Povey, Sanjeev Khudanpur, IEEE SLT 2018 (pdf)

"Improving LF-MMI using unconstrained supervisions for ASR", Hossein Hadian, Daniel Povey, Hossein Sameti, Jan Trmal, Sanjeev Khudanpur, IEEE SLT 2018 (pdf)

"Output-Gate Projected Gated Recurrent Unit for Speech Recognition", Gaofeng Cheng, Daniel Povey, Lu Huang, Ji Xu, Sanjeev Khudanpur, Yonghong Yan, Interspeech 2018 (pdf)

"Emotion Identification from raw speech signals using DNNs", Mousmita Sarma, Pegah Ghahremani, Daniel Povey, Nagendra Kumar Goel, Kandarpa Kumar Sarma, Najim Dehak, Interspeech 2018 (pdf)

"Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge", Gregory Sell, David Snyder, Alan McCree, Daniel Garcia-Romero, Jesus Villalba, Matthew Maciejewski, Vimal Manohar, Najim Dehak, Daniel Povey, Shinji Watanabe, Sanjeev Khudanpur, Interspeech 2018 (pdf)

"End-to-End Deep Neural Network Age Estimation", Pegah Ghahremani, Phani Sankar Nidadavolu, Nanxin Chen, Jesus Villalba, Daniel Povey, Sanjeev Khudanpur, Najim Dehak, Interspeech 2018 (pdf)

"Acoustic Modeling from Frequency-Domain Representations of Speech", Pegah Ghahremani, Hossein Hadian, Hang Lv, Daniel Povey, Sanjeev Khudanpur, Interspeech 2018 (pdf)

"Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition", Ke Li, Hainan Xu, Yiming Wang, Daniel Povey, Sanjeev Khudanpur, Interspeech 2018 (pdf)

"Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification", Yingke Zhu, Tom Ko, David Snyder, Brian Mak, Daniel Povey, Interspeech 2018 (pdf)

"Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks", Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohamadi, Sanjeev Khudanpur, Interspeech 2018 (pdf)

"A GPU-based WFST Decoder with Exact Lattice Generation", Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Daniel Povey, Sanjeev Khudanpur, Interspeech 2018 (pdf)

"Spoken Language Recognition using X-vectors", David Snyder, Daniel Garcia-Romero, Alan McCree, Gregory Sell, Daniel Povey, Sanjeev Khudanpur, Odyssey 2018 (pdf)

"Semi-Supervised Training of Acoustic Models using Lattice-Free MMI", Vimal Manohar, Hossein Hadian, Daniel Povey, Sanjeev Khudanpur, ICASSP 2018 (pdf)

"X-vectors: Robust DNN Embeddings for Speaker Recognition", David Snyder, Daniel Garcia-Romero, Gregory Sell, Daniel Povey, Sanjeev Khudanpur, ICASSP 2018 (pdf)

"A Time-Restricted Self-Attention Layer for ASR", Daniel Povey, Hossein Hadian, Pegah Ghahremani, Ke Li, Sanjeev Khudanpur, ICASSP 2018 (pdf)

"End-to-end speech recognition using lattice-free MMI", Hossein Hadian, Hossein Sameti, Daniel Povey, Sanjeev Khudanpur, Interspeech 2018 (pdf)

"Neural network language modeling with letter-based features and importance sampling", Hainan Xu, Ke Li, Yiming Wang, Jian Wang, Shiyin Kang, Xie Chen, Daniel Povey and Sanjeev Khudanpur, ICASSP 2018 (pdf)

"A pruned RNNLM lattice-rescoring algorithm for automatic speech recognition", Hainan Xu, Tongfei Chen, Dongji Gao, Yiming Wang, Ke Li, Nagendra Goel, Yishay Carmiel, Daniel Povey and Sanjeev Khudanpur, ICASSP 2018 (pdf)

2017

"JHU Kaldi System for Arabic MGB-3 ASR Challenge using Diarization, Audio-transcript Alignment and Transfer Learning": Vimal Manohar, Daniel Povey, Sanjeev Khudanpur, ASRU 2017 (pdf)

"Investigation of Transfer Learning for ASR using LF-MMI Trained Neural Networks": Pegah Ghahremani, Vimal Manohar, Hossein Hadian, Daniel Povey, Sanjeev Khudanpur, ASRU 2017 (pdf)

"Backstitch: Counteracting Finite-sample Bias via Negative Steps": Yiming Wang, Hossein Hadian, Shuoyang Ding, Ke Li, Hainan Xu, Xiaohui Zhang, Daniel Povey and Sanjeev Khudanpur, NIPS (submitted), 2017. (pdf)

"Acoustic data-driven lexicon learning based on a greedy pronunciation selection framework", Xiaohui Zhang, Vimal Manohar, Daniel Povey, Sanjeev Khudanpur, Interspeech 2017 (pdf)

"Deep Neural Network Embeddings for Text-Independent Speaker Verification", David Snyder, Daniel Garcia-Romero, Daniel Povey and Sanjeev Khudanpur, Interspeech 2017 (pdf)

"Low latency modeling of temporal contexts": Vijayaditya Peddinti, Yiming Wang, Daniel Povey and Sanjeev Khudanpur, IEEE Signal Processing Letters, 2017. (pdf)

"Backstitch: Counteracting Finite-sample Bias via Negative Steps": Yiming Wang, Vijayaditya Peddinti, Hainan Xu, Xiaohui Zhang, Daniel Povey, Sanjeev Khudanpur, Interspeech 2017 (pdf)

"An exploration of dropout with LSTMs" Gaofeng Cheng, Vijayaditya Peddinti, Daniel Povey, Vimal Manohar Sanjeev Khudanpur and Yonghong Yan, Interspeech 2017 (pdf)

"A study on data augmentation of reverberant speech for robust speech recognition", Tom Ko, Vijayaditya Peddinti, Daniel Povey, Michael L. Seltzer and Sanjeev Khudanpur, ICASSP 2017 (pdf)

"Speaker diarization using deep neural network embeddings", "Daniel Garcia-Romero, David Snyder, Gregory Sell, Daniel Povey, and Alan McCree", ICASSP 2017 (pdf)

2016

"Deep Neural Network-based Speaker Embeddings for End-to-end Speaker Verification", David Snyder, Pegah Ghahremani, Daniel Povey, Daniel Garcia-Romero, Yishay Carmiel and Sanjeev Khudanpur, IEEE Spoken Language Workshop (SLT) 2016 (pdf)

"Far-field ASR without parallel data", Vijayaditya Peddinti, Vimal Manohar, Yiming Wang, Daniel Povey and Sanjeev Khudanpur, Interspeech 2016, (pdf)

"Purely sequence-trained neural networks for ASR based on lattice-free MMI", Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahrmani, Vimal Manohar, Xingyu Na, Yiming Wang and Sanjeev Khudanpur, Interspeech 2016, (pdf) (slides,pptx)

"Acoustic modelling from the signal domain using CNN" Pegah Ghahremani, Vimal Manohar, Daniel Povey and Sanjeev Khudanpur, Interspeech 2016, (pdf)

2015

"Time delay Deep Neural Network-based Universal Background Models for Speaker Recognition", David Snyder, Daniel Garcia-Romero, Daniel Povey, ASRU 2015 (pdf)

"Pronunciation and Silence Probability Modeling for ASR", Guoguo Chen, Hainan Xu, Minhua Wu, Daniel Povey and Sanjeev Khudanpur, Interspeech 2015 (pdf)

"JHU aspire system: Robust LVCSR with TDNNs, ivector adaptation and RNN-LMs.", Peddinti, V., Chen, G., Manohar, V., Ko, T., Povey, D., & Khudanpur, S., ASRU 2015 (pdf)

"A Diversity-Penalizing Ensemble Training Method for Deep Learning", Xiaohui Zhang, Daniel Povey and Sanjeev Khudanpur, Interspeech 2015 (pdf)

"Modeling Phonetic Context with Non-random Forests for Speech Recognition", Hainan Xu, Guoguo Chen, Daniel Povey and Sanjeev Khudanpur, Interspeech 2015 (pdf)

"A time delay neural network architecture for efficient modeling of long temporal contexts", Vijayaditya Peddinti, Daniel Povey and Sanjeev Khudanpur, Interspeech 2015 (pdf)

"Reverberation robust acoustic modeling using i-vectors with time delay neural networks", Vijayaditya Peddinti, Guoguo Chen, Daniel Povey and Sanjeev Khudanpur, Interspeech 2015 (pdf)

"Audio Augmentation for Speech Recognition", Tom Ko, Vijayaditya Peddinti, Daniel Povey and Sanjeev Khudanpur, Interspeech 2015 (pdf)

"Semi-supervised Maximum Mutual Information Training of Deep Neural Network Acoustic Models", Vimal Manohar, Daniel Povey and Sanjeev Khudanpur, Interspeech 2015 (pdf)

"Parallel training of Deep Neural Networks with Natural Gradient and Parameter Averaging", Daniel Povey, Xiaohui Zhang and Sanjeev Khudanpur, ICLR Workshop 2015 (ArXiv) (poster)

"LibriSpeech: an ASR corpus based on public domain audio books", Vassil Panayotov, Guoguo Chen, Daniel Povey and Sanjeev Khudanpur, ICASSP 2015 (pdf)

2014

"A keyword search system using open source software", Jan Trmal, Guoguo Chen, Dan Povey, Sanjeev Khudanpur, Pegah Ghahremani, Xiaohui Zhang, Vimal Manohar, Chunxi Liu, Aren Jansen, Dietrich Klakow, David Yarowsky, Florian Metze, Proc. SLT, 2014 (pdf)

"Improving Deep Neural Network Acoustic Models using Generalized Maxout Networks", Xiaohui Zhang, Jan Trmal, Daniel Povey and Sanjeev Khudanpur, ICASSP 2014 (pdf)

"Improving Speaker Recognition Performance in the Domain Adaptation Challenge using Deep Neural Networks", D. Garcia-Romero, X. Zhang, A. McCree, and D. Povey, Proc. SLT, 2014 (pdf)

"A Pitch Extraction Algorithm Tuned for Automatic Speech Recognition", Pegah Ghahremani, Bagher BabaAli, Daniel Povey, Korbinian Riedhammer, Jan Trmal and Sanjeev Khudanpur, ICASSP 2014 (pdf)

"Some Insights from Translating Conversational Telephone Speech", G. Kumar, M. Post, D. Povey and S. Khudanpur, ICASSP 2014 (pdf)

2013

"Quantifying the value of pronunciation lexicons for keyword search in lowresource languages," Guoguo Chen, Sanjeev Khudanpur, Daniel Povey, Jan Trmal, David Yarowsky, and Oguz Yilmaz. ICASSP 2013, pp. 8560-8564. (pdf)

"Using Proxies for OOV keywords in the Keyword Search Task", Guoguo Chen, Oguz Yilmaz, Jan Trmal, Daniel Povey, and Sanjeev Khudanpur, ASRU 2013, (pdf)

"Improved feature processing for Deep Neural Networks", S. P Rath, D. Povey, K. Vesely and J. Cernocky, Interspeech 2013 (pdf)

"Sequence-discriminative training of deep neural networks", K. Vesely, A. Ghoshal, L. Burget and D. Povey, Interspeech 2013 (pdf)

"Combining forward and backward search in decoding", M. Hannemann, D. Povey and G. Zweig, ICASSP 2013 (pdf)

2012

"Krylov Subspace Descent for Deep Learning", Oriol Vinyals and D. Povey, AISTATS 2012 (pdf)

"Generating exact lattices in the WFST framework", D. Povey, M. Hannemann et. al, ICASSP 2012 (pdf)

"Revisiting Semi-continuous Hidden Markov Models", K. Reidhammer, T. Bocklet, A. Ghoshal and D. Povey, ICASSP 2012 (pdf)

"Modeling Gender Dependency in the Subspace GMM Framework", Ngoc Thang Vu, Tanja Schultz and D. Povey, ICASSP 2012 (pdf)

"Revisiting Recurrent Neural Networks for Robust ASR", Oriol Vinyals, Suman V. Ravuri, Daniel Povey, ICASSP 2012 (pdf)

2011

"The Kaldi Speech Recognition Toolkit", D. Povey, A. Ghoshal et. al, ASRU 2011 (accepted) (pdf)

"Speaker Adaptation with an Exponential Transform", Daniel Povey, Geoffrey Zweig and Alex Acero, ASRU 2011 (accepted) (pdf) (+techreport)

"The Subspace Gaussian Mixture Model– a Structured Model for Speech Recognition", D. Povey, Lukas Burget et. al Computer Speech and Language, 2011 (pdf)

"A basis representation of constrained MLLR transforms for robust adaptation", Daniel Povey and Kaisheng Yao, Computer Speech and Language, 2011. (pdf)

"Minimum Bayes Risk decoding and system combination based on a recursion for edit distance", Haihua Xu, Daniel Povey, Lidia Mangu and Jie Zhu, Computer Speech and Language, 2011. (pdf)

"A Basis Method for Robust Estimation of Constrained MLLR", Daniel Povey and Kaisheng Yao, ICASSP 2011 (pdf)

"A Symmetrization of the Subspace Gaussian Mixture Model", Daniel Povey, Martin Karafiat, Arnab Ghoshal, Petr Schwarz, ICASSP 2011 (pdf)

"State-Level Data Borrowing for Low-Resource Speech Recognition Based on Subspace GMMs", Yanmin Qian, Daniel Povey and Jia Lu, Interspeech 2011 (pdf)

2010

The Symmetric Subspace Gaussian Mixture Model: Microsoft Research technical report MSR-TR-2010-138 (pdf)

"Subspace Gaussian Mixture Models for Speech Recognition", D. Povey, Lukas Burget et al., ICASSP 2010. (pdf)

"A Novel Estimation of feature-space MLLR for Full-covariance Models", Arnab Ghoshal, D. Povey et al., ICASSP 2010 (pdf)

"An Improved Consensus-like Method for Minimum Bayes Risk Decoding and Lattice Combination", Haihua Xu, D. Povey, L. Mangu, Jie Zhu, ICASSP 2010 (pdf)

"Multilingual Acoustic Modeling For Speech Recognition Based On Subspace Gaussian Mixture Models", Lukas Burget, Petr Schwarz et. al, ICASSP 2010 (pdf)

"Approaches To Automatic Lexicon Learning With Limited Training Examples", Nagendra Goel, Samuel Thomas et. al, ICASSP 2010 (pdf)

Stephen Chu, Daniel Povey et al., .The 2009 IBM GALE Mandarin Broadcast News Transcription System., ICASSP 2010

Hagen Soltau, George Saon et al., .The IBM 2008 GALE Arabic Speech Transcription System., ICASSP 2010.

Stephen Chu and Daniel Povey, .Speaking Rate Adaptation using Continuous Frame Rate Normalization., ICASSP 2010 (pdf)

"Notes for Affine Transform-based VTLN", Daniel Povey, 2010. (pdf) . These notes were never published, but I'm putting them up here as they are referred to from some Kaldi code

"Approaches to Speech Recognition based on Speaker Recognition Techniques", chapter in forthcoming GALE book (pdf)

2009

For closing presentations from JHU 2009 workshop, see here

"A Tutorial-Style Introduction To Subspace Gaussian Mixture Models For Speech Recognition", Microsoft Research technical report MSR-TR-2009-111(pdf)

Lecture on "estimation techniques in speech recognition" given at JHU CLSP summer school (pdf)

Lecture on "Subspace based/Universal Background Model (UBM) based speech modeling" given at JHU CLSP summer school (pdf)

Lab tutorial on estimation for speech in Octave/Matlab, given at JHU CLSP summer school (pdf)

"Subspace Gaussian Mixture Models for Speech Recognition", Povey D., Microsoft Research technical report MSR-TR-2009-64 (pdf)

"Minimum Hypothesis Phone Error as a Decoding Method for Speech Recognition", Haihua Xu, Daniel Povey, Jie Zhu and Guanyong Wu, Interspeech 2009 (pdf) (slides,pdf)

2008

Dan Povey & Brian Kingsbury, "Monte Carlo Model-Space Noise Adaptation for Speech Recognition", Interspeech 2008 (pdf)

Daniel Povey, Hong-Kwang J. Kuo, Hagen Soltau, "Fast Speaker Adaptive Training for Speech Recognition", Interspeech 2008 (pdf)

Daniel Povey, Hong-Kwang J. Kuo, "XMLLR for Improved Speaker Adaptation in Speech Recognition", Interspeech 2008 (pdf)

George Saon and Daniel Povey, "Penalty Function Maximization for Large Margin HMM Training", Interspeech 2008 (pdf)

Daniel Povey, Dimitri Kanevsky, Brian Kingsbury, Bhuvana Ramabhadran, George Saon & Karthik Visweswariah, "Boosted MMI for Model and Feature Space Discriminative Training", ICASSP 2008. (pdf)

Balakrishnan Varadarajan & Daniel Povey, “Quick FMLLR for Speaker Adaptation in Speech Recognition”, ICASSP 2008 (pdf)

Daniel Povey, Stephen M Chu & Balakrishnan Varadarajan, "Universal Background Model Based Speech Recognition", ICASSP 2008 (pdf)

2007

Daniel Povey & Brian Kingsbury, "Evaluation of Proposed Modifications to MPE for Large Scale Discriminative Training", ICASSP 2007 (pdf)

2006

D. Povey & George Saon, "Feature and model space speaker adaptation with full covariance Gaussians," ICSLP 2006. (pdf)

D. Povey, "SPAM and full covariance for speech recognition," ICSLP 2006. (pdf)

J. Pelecanos, Daniel Povey, Ganesh Ramaswamy, "Secondary Classification for GMM Based Speaker Recognition," ICASSP 2006. (pdf)

Ghinwa Choueiter, Daniel Povey, Stanley Chen & Geoffrey Zweig, "Morpheme-based language modeling for Arabic LVCSR", ICASSP 2006. (pdf)

Geoffrey Zweig, Olivier Siohan, George Saon, Bhuvana Ramabhadran, Daniel Povey, Lidia Mangu and Brian Kingsbury, "Automated Quality Monitoring in the Call Center with ASR and Maximum Entropy", ICASSP 2006. (pdf)

Stanley Chen, Brian Kingsbury, Lidia Mangu, Daniel Povey, George Saon, Hagen Soltau & Geoffrey Zweig, "Advances in Speech Transcription at IBM under the DARPA EARS Program," 2006, IEEE Transactions on Audio, Speech and Language processing, Vol. 14 , Issue 5, pp. 1596-1608 (pdf)

2005

George Saon, Daniel Povey & Geoffrey Zweig, "Anatomy of an extremely fast LVCSR decoder," Interspeech 2005. (pdf)(poster,pdf)

Hagen Soltau, Brian Kingsbury, Lidia Mangu, Daniel Povey, George Saon & Geoffrey Zweig, "The IBM 2004 Conversational Telephony System for Rich Transcription," ICASSP 2005 (pdf)

Daniel Povey, Brian Kingsbury, Lidia Mangu, George Saon, Hagen Soltau & Geoffrey Zweig, "fMPE: Discriminatively Trained Features for Speech Recognition," ICASSP 2005 (pdf)

Jing Huang & Daniel Povey, "Discriminatively Trained Features using fMPE for Multi-Stream Audio-Visual Speech Recognition," Interspeech 2005 (pdf)

T. Hain, P.C. Woodland, G. Evermann, M.J.F. Gales, Xunying Liu, G.L. Moore, D. Povey, Lan Wang, "Automatic transcription of conversational telephone speech", IEEE Trans. on Speech and Audio Processing, Nov. 2005, vol. 13, Issue 6, pp. 1173-1185. (pdf)

Daniel Povey, "Improvements to fMPE for discriminative training of features," Interspeech 2005 (pdf)

2004

Daniel Povey, Brian Kingsbury, Lidia Mangu, George Saon, Hagen Soltau, Geoffrey Zweig, "fMPE: Discriminatively trained features for speech recognition," RT'04 meeting, 2004. (pdf)

D. Povey, "Phone Duration Modeling for LVCSR," ICASSP 2004 (pdf)

G. Saon, S. Dharanipragada, D. Povey, "Feature space Gaussianization", ICASSP 2004. (pdf)

2003

Roongroj Nopuswanchai & D. Povey, "Discriminative training for HMM-based offline handwritten character recognition", Proc. Int'l Conf. on Document Analysis and Recognition, 2003. (pdf)

D. Povey, M.J.F. Gales, D.Y. Kim & P.C. Woodland, "MMI-MAP and MPE-MAP for Acoustic Model Adaptation," Eurospeech 2003. (pdf) (slides,pdf)

Daniel Povey, "Recent work on Discriminative Training," Talk given to one day meeting for young speech researchers, London, Apr 24th 2003 (pdf)

Daniel Povey, "Discriminative Training for Large Vocabulary Speech Recognition," PhD thesis, Cambridge University Engineering Dept, 2003 (pdf)

D. Povey, P.C. Woodland, and M.J.F. Gales, "Discriminative MAP for Acoustic Model Adaptation," ICASSP 2003. (ps)

Daniel Povey, "Minimum Phone Error - Better than MMI," talk given at IBM, 2003 (pdf)

M.J.F. Gales, Y. Dong, D. Povey and P.C. Woodland. "Porting: SwitchBoard to the VoiceMail Task." ICASSP 2003. (ps)

2002

D. Povey & P.C. Woodland, "Minimum Phone Error and I-Smoothing for Improved Discriminative Training," ICASSP 2002 (pdf) (slides,long version,pdf) (slides,short version,pdf)

Phil Woodland, Gunnar Evermann, Mark Gales, Thomas Hain, Andrew Liu, Gareth Moore, Dan Povey & Lan Wang: "CU-HTK April 2002 Switchboard System", Rich Transcription Workshop 2002. (pdf)

Young, S., Evermann, G., Kershaw, D., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V., Woodland, P., The HTK book (for HTK version 3.2). Technical Report, Cambridge University, Engineering Department, 2002. (pdf)

2001

T. Hain, P.C. Woodland, G. Evermann and D. Povey. "New features in the CU-HTK system for transcription of conversational telephone speech", ICASSP 2001. (pdf)

D. Povey & P.C. Woodland, "Improved Discriminative Training Techniques for Large Vocabulary Speech Recognition," ICASSP 2001 (pdf)

2000

D. Povey and P. C. Woodland, "Large-scale MMIE Training for Conversational Telephone Speech Recognition", Proc. NIST Speech Transcription Workshop, College Park, MD, 2000. (pdf)

P.C. Woodland and D. Povey, "Large Scale Discriminative Training for Speech Recognition", in ASR 2000. (pdf)

1999


D. Povey & P.C. Woodland, "Frame Discrimination Training of HMMs for Large Vocabulary Speech Recognition," Technical report, Cambridge University Engineering Dept., 1999. (pdf)

D. Povey & P.C. Woodland, "Frame Discrimination training of HMMs for Large Vocabulary Speech Recognition," ICASSP 1999 (pdf) (slides,pdf)

Daniel Povey, "Implementation of Frame Discrimination on a large task," MPhil thesis, Cambridge University Engineering Dept, 1999 (pdf)

Contact
The Center for Language and Speech Processing
Hackerman Hall 226
3400 North Charles Street
Baltimore, MD 21218
dpovey AT gmail DOT com