Dan Povey's homepage

Dan Povey's publications

These are listed in reverse chronological order. The list may not be complete.

2017

"Backstitch: Counteracting Finite-sample Bias via Negative Steps": Yiming Wang, Hossein Hadian, Shuoyang Ding, Ke Li, Hainan Xu, Xiaohui Zhang, Daniel Povey and Sanjeev Khudanpur, NIPS (submitted), 2017. (pdf)

"Deep Neural Network Embeddings for Text-Independent Speaker Verification", David Snyder, Daniel Garcia-Romero, Daniel Povey and Sanjeev Khudanpur, Interspeech 2017 (pdf)

"Low latency modeling of temporal contexts": Vijayaditya Peddinti, Yiming Wang, Daniel Povey and Sanjeev Khudanpur, IEEE Signal Processing Letters (submitted), 2017. (pdf)

"Backstitch: Counteracting Finite-sample Bias via Negative Steps": Yiming Wang, Vijayaditya Peddinti, Hainan Xu, Xiaohui Zhang, Daniel Povey, Sanjeev Khudanpur, Interspeech 2017 (pdf)

"An exploration of dropout with LSTMs" Gaofeng Cheng, Vijayaditya Peddinti, Daniel Povey, Vimal Manohar Sanjeev Khudanpur and Yonghong Yan, Interspeech 2017 (pdf)

"A study on data augmentation of reverberant speech for robust speech recognition", Tom Ko, Vijayaditya Peddinti, Daniel Povey, Michael L. Seltzer and Sanjeev Khudanpur, ICASSP 2017 (pdf)

"Speaker diarization using neural network embeddings", "Daniel Garcia-Romero, David Snyder, Gregory Sell, Daniel Povey, and Alan McCree", ICASSP 2017 (pdf)

2016

"Deep Neural Network-based Speaker Embeddings for End-to-end Speaker Verification", David Snyder Pegah Ghahremani, Daniel Povey, Daniel Garcia-Romero, Yishay Carmiel and Sanjeev Khudanpur, IEEE Spoken Language Workshop (SLT) 2016 (pdf)

"Far-field ASR without parallel data", Vijayaditya Peddinti, Vimal Manohar, Yiming Wang, Daniel Povey and Sanjeev Khudanpur, Interspeech 2016, (pdf)

"Purely sequence-trained neural networks for ASR based on lattice-free MMI", Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahrmani, Vimal Manohar, Xingyu Na, Yiming Wang and Sanjeev Khudanpur, Interspeech 2016, (pdf) (slides,pptx)

"Acoustic modelling from the signal domain using CNN" Pegah Ghahremani, Vimal Manohar, Daniel Povey and Sanjeev Khudanpur, Interspeech 2016, (pdf)

2015

"Time delay Deep Neural Network-based Universal Background Models for Speaker Recognition", David Snyder, Daniel Garcia-Romero, Daniel Povey, ASRU 2015 (pdf)

"Pronunciation and Silence Probability Modeling for ASR", Guoguo Chen, Hainan Xu, Minhua Wu, Daniel Povey and Sanjeev Khudanpur, Interspeech 2015 (pdf)

"JHU aspire system: Robust LVCSR with TDNNs, ivector adaptation and RNN-LMs.", Peddinti, V., Chen, G., Manohar, V., Ko, T., Povey, D., & Khudanpur, S., ASRU 2015 (pdf)

"A Diversity-Penalizing Ensemble Training Method for Deep Learning", Xiaohui Zhang, Daniel Povey and Sanjeev Khudanpur, Interspeech 2015 (pdf)

"Modeling Phonetic Context with Non-random Forests for Speech Recognition", Hainan Xu, Guoguo Chen, Daniel Povey and Sanjeev Khudanpur, Interspeech 2015 (pdf)

"A time delay neural network architecture for efficient modeling of long temporal contexts", Vijayaditya Peddinti, Daniel Povey and Sanjeev Khudanpur, Interspeech 2015 (pdf)

"Reverberation robust acoustic modeling using i-vectors with time delay neural networks", Vijayaditya Peddinti, Guoguo Chen, Daniel Povey and Sanjeev Khudanpur, Interspeech 2015 (pdf)

"Audio Augmentation for Speech Recognition", Tom Ko, Vijayaditya Peddinti, Daniel Povey and Sanjeev Khudanpur, Interspeech 2015 (pdf)

"Semi-supervised Maximum Mutual Information Training of Deep Neural Network Acoustic Models", Vimal Manohar, Daniel Povey and Sanjeev Khudanpur, Interspeech 2015 (pdf)

"Parallel training of Deep Neural Networks with Natural Gradient and Parameter Averaging", Daniel Povey, Xiaohui Zhang and Sanjeev Khudanpur, ICLR Workshop 2015 (ArXiv) (poster)

"LibriSpeech: an ASR corpus based on public domain audio books", Vassil Panayotov, Guoguo Chen, Daniel Povey and Sanjeev Khudanpur, ICASSP 2015 (pdf)

2014

"A keyword search system using open source software", Jan Trmal, Guoguo Chen, Dan Povey, Sanjeev Khudanpur, Pegah Ghahremani, Xiaohui Zhang, Vimal Manohar, Chunxi Liu, Aren Jansen, Dietrich Klakow, David Yarowsky, Florian Metze, Proc. SLT, 2014 (pdf)

"Improving Deep Neural Network Acoustic Models using Generalized Maxout Networks", Xiaohui Zhang, Jan Trmal, Daniel Povey and Sanjeev Khudanpur, ICASSP 2014 (pdf)

"Improving Speaker Recognition Performance in the Domain Adaptation Challenge using Deep Neural Networks", D. Garcia-Romero, X. Zhang, A. McCree, and D. Povey, Proc. SLT, 2014 (pdf)

"A Pitch Extraction Algorithm Tuned for Automatic Speech Recognition", Pegah Ghahremani, Bagher BabaAli, Daniel Povey, Korbinian Riedhammer, Jan Trmal and Sanjeev Khudanpur, ICASSP 2014 (pdf)

"Some Insights from Translating Conversational Telephone Speech", G. Kumar, M. Post, D. Povey and S. Khudanpur, ICASSP 2014 (pdf)

2013

"Quantifying the value of pronunciation lexicons for keyword search in lowresource languages," Guoguo Chen, Sanjeev Khudanpur, Daniel Povey, Jan Trmal, David Yarowsky, and Oguz Yilmaz. ICASSP 2013, pp. 8560-8564. (pdf)

"Using Proxies for OOV keywords in the Keyword Search Task", Guoguo Chen, Oguz Yilmaz, Jan Trmal, Daniel Povey, and Sanjeev Khudanpur, ASRU 2013, (pdf)

"Improved feature processing for Deep Neural Networks", S. P Rath, D. Povey, K. Vesely and J. Cernocky, Interspeech 2013 (pdf)

"Sequence-discriminative training of deep neural networks", K. Vesely, A. Ghoshal, L. Burget and D. Povey, Interspeech 2013 (pdf)

"Combining forward and backward search in decoding", M. Hannemann, D. Povey and G. Zweig, ICASSP 2013 (pdf)

2012

"Krylov Subspace Descent for Deep Learning", Oriol Vinyals and D. Povey, AISTATS 2012 (pdf)

"Generating exact lattices in the WFST framework", D. Povey, M. Hannemann et. al, ICASSP 2012 (pdf)

"Revisiting Semi-continuous Hidden Markov Models", K. Reidhammer, T. Bocklet, A. Ghoshal and D. Povey, ICASSP 2012 (pdf)

"Modeling Gender Dependency in the Subspace GMM Framework", Ngoc Thang Vu, Tanja Schultz and D. Povey, ICASSP 2012 (pdf)

"Revisiting Recurrent Neural Networks for Robust ASR", Oriol Vinyals, Suman V. Ravuri, Daniel Povey, ICASSP 2012 (pdf)

2011

"The Kaldi Speech Recognition Toolkit", D. Povey, A. Ghoshal et. al, ASRU 2011 (accepted) (pdf)

"Speaker Adaptation with an Exponential Transform", Daniel Povey, Geoffrey Zweig and Alex Acero, ASRU 2011 (accepted) (pdf) (+techreport)

"The Subspace Gaussian Mixture Model– a Structured Model for Speech Recognition", D. Povey, Lukas Burget et. al Computer Speech and Language, 2011 (pdf)

"A basis representation of constrained MLLR transforms for robust adaptation", Daniel Povey and Kaisheng Yao, Computer Speech and Language, 2011. (pdf)

"Minimum Bayes Risk decoding and system combination based on a recursion for edit distance", Haihua Xu, Daniel Povey, Lidia Mangu and Jie Zhu, Computer Speech and Language, 2011. (pdf)

"A Basis Method for Robust Estimation of Constrained MLLR", Daniel Povey and Kaisheng Yao, ICASSP 2011 (pdf)

"A Symmetrization of the Subspace Gaussian Mixture Model", Daniel Povey, Martin Karafiat, Arnab Ghoshal, Petr Schwarz, ICASSP 2011 (pdf)

"State-Level Data Borrowing for Low-Resource Speech Recognition Based on Subspace GMMs", Yanmin Qian, Daniel Povey and Jia Lu, Interspeech 2011 (pdf)

2010

The Symmetric Subspace Gaussian Mixture Model: Microsoft Research technical report MSR-TR-2010-138 (pdf)

"Subspace Gaussian Mixture Models for Speech Recognition", D. Povey, Lukas Burget et al., ICASSP 2010. (pdf)

"A Novel Estimation of feature-space MLLR for Full-covariance Models", Arnab Ghoshal, D. Povey et al., ICASSP 2010 (pdf)

"An Improved Consensus-like Method for Minimum Bayes Risk Decoding and Lattice Combination", Haihua Xu, D. Povey, L. Mangu, Jie Zhu, ICASSP 2010 (pdf)

"Multilingual Acoustic Modeling For Speech Recognition Based On Subspace Gaussian Mixture Models", Lukas Burget, Petr Schwarz et. al, ICASSP 2010 (pdf)

"Approaches To Automatic Lexicon Learning With Limited Training Examples", Nagendra Goel, Samuel Thomas et. al, ICASSP 2010 (pdf)

Stephen Chu, Daniel Povey et al., "The 2009 IBM GALE Mandarin Broadcast News Transcription System", ICASSP 2010

Hagen Soltau, George Saon et al., "The IBM 2008 GALE Arabic Speech Transcription System", ICASSP 2010

Stephen Chu and Daniel Povey, "Speaking Rate Adaptation using Continuous Frame Rate Normalization", ICASSP 2010 (pdf)

"Notes for Affine Transform-based VTLN", Daniel Povey, 2010. (pdf) . These notes were never published, but I'm putting them up here as they are referred to from some Kaldi code

"Approaches to Speech Recognition based on Speaker Recognition Techniques", chapter in forthcoming GALE book (pdf)

2009

For closing presentations from JHU 2009 workshop, see here

"A Tutorial-Style Introduction To Subspace Gaussian Mixture Models For Speech Recognition", Microsoft Research technical report MSR-TR-2009-111(pdf)

Lecture on "estimation techniques in speech recognition" given at JHU CLSP summer school (pdf)

Lecture on "Subspace based/Universal Background Model (UBM) based speech modeling" given at JHU CLSP summer school (pdf)

Lab tutorial on estimation for speech in Octave/Matlab, given at JHU CLSP summer school (pdf)

"Subspace Gaussian Mixture Models for Speech Recognition", Povey D., Microsoft Research technical report MSR-TR-2009-64 (pdf)

"Minimum Hypothesis Phone Error as a Decoding Method for Speech Recognition", Haihua Xu, Daniel Povey, Jie Zhu and Guanyong Wu, Interspeech 2009 (pdf) (slides,pdf)

2008

Dan Povey & Brian Kingsbury, "Monte Carlo Model-Space Noise Adaptation for Speech Recognition", Interspeech 2008 (pdf)

Daniel Povey, Hong-Kwang J. Kuo, Hagen Soltau, "Fast Speaker Adaptive Training for Speech Recognition", Interspeech 2008 (pdf)

Daniel Povey, Hong-Kwang J. Kuo, "XMLLR for Improved Speaker Adaptation in Speech Recognition", Interspeech 2008 (pdf)

George Saon and Daniel Povey, "Penalty Function Maximization for Large Margin HMM Training", Interspeech 2008 (pdf)

Daniel Povey, Dimitri Kanevsky, Brian Kingsbury, Bhuvana Ramabhadran, George Saon & Karthik Visweswariah, "Boosted MMI for Model and Feature Space Discriminative Training", ICASSP 2008. (pdf)

Balakrishnan Varadarajan & Daniel Povey, “Quick FMLLR for Speaker Adaptation in Speech Recognition”, ICASSP 2008 (pdf)

Daniel Povey, Stephen M Chu & Balakrishnan Varadarajan, "Universal Background Model Based Speech Recognition", ICASSP 2008 (pdf)

2007

Daniel Povey & Brian Kingsbury, "Evaluation of Proposed Modifications to MPE for Large Scale Discriminative Training", ICASSP 2007 (pdf)

2006

D. Povey & George Saon, "Feature and model space speaker adaptation with full covariance Gaussians," ICSLP 2006. (pdf)

D. Povey, "SPAM and full covariance for speech recognition," ICSLP 2006. (pdf)

J. Pelecanos, Daniel Povey, Ganesh Ramaswamy, "Secondary Classification for GMM Based Speaker Recognition," ICASSP 2006. (pdf)

Ghinwa Choueiter, Daniel Povey, Stanley Chen & Geoffrey Zweig, "Morpheme-based language modeling for Arabic LVCSR", ICASSP 2006. (pdf)

Geoffrey Zweig, Olivier Siohan, George Saon, Bhuvana Ramabhadran, Daniel Povey, Lidia Mangu and Brian Kingsbury, "Automated Quality Monitoring in the Call Center with ASR and Maximum Entropy", ICASSP 2006. (pdf)

Stanley Chen, Brian Kingsbury, Lidia Mangu, Daniel Povey, George Saon, Hagen Soltau & Geoffrey Zweig, "Advances in Speech Transcription at IBM under the DARPA EARS Program," 2006, IEEE Transactions on Audio, Speech and Language Processing, Vol. 14, Issue 5, pp. 1596-1608 (pdf)

2005

George Saon, Daniel Povey & Geoffrey Zweig, "Anatomy of an extremely fast LVCSR decoder," Interspeech 2005. (pdf) (poster,pdf)

Hagen Soltau, Brian Kingsbury, Lidia Mangu, Daniel Povey, George Saon & Geoffrey Zweig, "The IBM 2004 Conversational Telephony System for Rich Transcription," ICASSP 2005 (pdf)

Daniel Povey, Brian Kingsbury, Lidia Mangu, George Saon, Hagen Soltau & Geoffrey Zweig, "fMPE: Discriminatively Trained Features for Speech Recognition," ICASSP 2005 (pdf)

Jing Huang & Daniel Povey, "Discriminatively Trained Features using fMPE for Multi-Stream Audio-Visual Speech Recognition," Interspeech 2005 (pdf)

T. Hain, P.C. Woodland, G. Evermann, M.J.F. Gales, Xunying Liu, G.L. Moore, D. Povey, Lan Wang, "Automatic transcription of conversational telephone speech", IEEE Trans. on Speech and Audio Processing, Nov. 2005, vol. 13, Issue 6, pp. 1173-1185. (pdf)

Daniel Povey, "Improvements to fMPE for discriminative training of features," Interspeech 2005 (pdf)

2004

Daniel Povey, Brian Kingsbury, Lidia Mangu, George Saon, Hagen Soltau, Geoffrey Zweig, "fMPE: Discriminatively trained features for speech recognition," RT'04 meeting, 2004. (pdf)

D. Povey, "Phone Duration Modeling for LVCSR," ICASSP 2004 (pdf)

G. Saon, S. Dharanipragada, D. Povey, "Feature space Gaussianization", ICASSP 2004. (pdf)

2003

Roongroj Nopsuwanchai & D. Povey, "Discriminative training for HMM-based offline handwritten character recognition", Proc. Int'l Conf. on Document Analysis and Recognition, 2003. (pdf)

D. Povey, M.J.F. Gales, D.Y. Kim & P.C. Woodland, "MMI-MAP and MPE-MAP for Acoustic Model Adaptation," Eurospeech 2003. (pdf) (slides,pdf)

Daniel Povey, "Recent work on Discriminative Training," Talk given to one day meeting for young speech researchers, London, Apr 24th 2003 (pdf)

Daniel Povey, "Discriminative Training for Large Vocabulary Speech Recognition," PhD thesis, Cambridge University Engineering Dept, 2003 (pdf)

D. Povey, P.C. Woodland, and M.J.F. Gales. Discriminative MAP for Acoustic Model Adaptation. In Proc. ICASSP, 2003. (ps)

Daniel Povey, "Minimum Phone Error - Better than MMI," talk given at IBM, 2003 (pdf)

M.J.F. Gales, Y. Dong, D. Povey and P.C. Woodland. "Porting: SwitchBoard to the VoiceMail Task." ICASSP 2003. (ps)

2002

D. Povey & P.C. Woodland, "Minimum Phone Error and I-Smoothing for Improved Discriminative Training," ICASSP 2002 (pdf) (slides,long version,pdf) (slides,short version,pdf)

Phil Woodland, Gunnar Evermann, Mark Gales, Thomas Hain, Andrew Liu, Gareth Moore, Dan Povey & Lan Wang: "CU-HTK April 2002 Switchboard System", Rich Transcription Workshop 2002. (pdf)

Young, S., Evermann, G., Kershaw, D., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V., Woodland, P., The HTK book (for HTK version 3.2). Technical Report, Cambridge University, Engineering Department, 2002. (pdf)

2001

T. Hain, P.C. Woodland, G. Evermann and D. Povey. "New features in the CU-HTK system for transcription of conversational telephone speech", ICASSP 2001. (pdf)

D. Povey & P.C. Woodland, "Improved Discriminative Training Techniques for Large Vocabulary Speech Recognition," ICASSP 2001 (pdf)

2000

D. Povey and P. C. Woodland, "Large-scale MMIE Training for Conversational Telephone Speech Recognition", Proc. NIST Speech Transcription Workshop, College Park, MD, 2000. (pdf)

P.C. Woodland and D. Povey, "Large Scale Discriminative Training for Speech Recognition", in ASR 2000. (pdf)

1999


D. Povey & P.C. Woodland, "Frame Discrimination Training of HMMs for Large Vocabulary Speech Recognition," Technical report, Cambridge University Engineering Dept., 1999. (pdf)

D. Povey & P.C. Woodland, "Frame Discrimination training of HMMs for Large Vocabulary Speech Recognition," ICASSP 1999 (pdf) (slides,pdf)

Daniel Povey, "Implementation of Frame Discrimination on a large task," MPhil thesis, Cambridge University Engineering Dept, 1999 (pdf)

Contact
The Center for Language and Speech Processing
Hackerman Hall 226
3400 North Charles Street
Baltimore, MD 21218
dpovey AT gmail DOT com