Pervasive Artificial Intelligence Research (PAIR) Labs
Deep Intelligence Based Spoken Language Processing
Principal Investigator: Professor Jia-Ching Wang
Summary
Speech is not only the most natural means of communication among people but also one of the most effective means of human-computer interaction. Enabling a computer to process spoken language as a human does is a longstanding problem that scholars have been trying to solve for decades; deep learning now brings this goal within reach. To this end, this project develops a plan for processing spoken language that integrates speech processing, acoustic signal processing, natural language processing, and deep learning techniques. Key techniques will be developed, including intelligent multi-channel speech processing and speech separation, robust speech recognition, spoken language translation, speech emotion recognition, and open-domain dialogue. Native languages and dialects, such as Taiwanese Minnan, receive particular attention.
Keywords
Spoken language processing, speech separation, speech recognition, spoken language translation, speech emotion recognition, dialogue system, deep learning
Innovations
- For front-end processing, we propose a deep learning-based multi-channel speech enhancement algorithm that integrates beamforming techniques with deep neural networks.
- For speech separation, we present a Gaussian process (GP) regression-based single-channel source separation (SCSS) model. Each source estimate is given by the predictive mean of a GP regression model, and hyper-parameter learning is carried out with the nonlinear conjugate gradient algorithm.
- We propose the hierarchical extreme learning machine (HELM) as an alternative model for audio-visual speech enhancement.
- To achieve robust speech recognition, we propose a novel use of graph-regularization-based methods to enhance speech features, preserving the inherent manifold structures of the magnitude modulation spectra while excluding irrelevant ones.
- For native language speech recognition, we have developed a Taiwanese Minnan speech recognizer.
- For machine translation, we present a bidirectional English-Chinese translator.
- We propose an acoustic speech emotion recognition system that uses a deep learning-based multiple-feature-extraction network together with a new recurrent neural network that we have developed.
- To understand the semantics of a dialogue, we develop language understanding techniques for dialogue systems.
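The multi-channel front end above combines classical beamforming with a learned enhancement stage. As a minimal sketch of the conventional beamforming half, the following delay-and-sum example aligns microphone channels by known integer-sample delays and averages them; the function name, delay format, and sample rate are illustrative assumptions, not the project's actual algorithm, and the DNN refinement stage is omitted.

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Delay-and-sum beamforming sketch (assumed interface).

    channels: (n_mics, n_samples) array of microphone signals.
    delays:   per-microphone delays in samples (circular shift here,
              for simplicity; a real system would use fractional,
              non-circular delays estimated from the array geometry).
    Returns the aligned average, which a DNN-based enhancement stage
    could then refine (e.g., by mask estimation).
    """
    n_mics = channels.shape[0]
    out = np.zeros(channels.shape[1])
    for m in range(n_mics):
        # Undo each channel's delay so all signals line up in time
        out += np.roll(channels[m], -delays[m])
    return out / n_mics
```

Averaging the aligned channels reinforces the target signal while uncorrelated noise partially cancels, which is why beamforming is a natural front end before a learned enhancement model.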
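The GP-regression separation bullet uses the predictive mean of a GP as the source estimate. The sketch below shows that core computation for standard GP regression with an RBF kernel; the kernel choice, hyper-parameter values, and function names are illustrative assumptions (the project learns hyper-parameters with nonlinear conjugate gradients, which is not shown here).

```python
import numpy as np

def gp_predictive_mean(X_train, y_train, X_test, length=1.0, noise=1e-2):
    """Predictive mean of GP regression with an RBF kernel (sketch).

    In a separation setting, a model of this form would map mixture
    features to source features, and the predictive mean would serve
    as the estimated source. Hyper-parameters here are illustrative.
    """
    def rbf(A, B):
        # Squared Euclidean distances between all pairs of rows
        d2 = (np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :]
              - 2 * A @ B.T)
        return np.exp(-0.5 * d2 / length**2)

    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    K_star = rbf(X_test, X_train)
    # Predictive mean: K_* K^{-1} y
    return K_star @ np.linalg.solve(K, y_train)
```

In practice the length scale and noise variance would be fit by maximizing the marginal likelihood rather than fixed by hand.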
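The HELM bullet relies on the extreme learning machine principle: hidden weights are random and fixed, and only the output layer is solved in closed form. The single-layer sketch below shows that mechanism; a hierarchical ELM stacks such layers. The interface, activation, and hidden-layer size are assumptions for illustration, not the project's configuration.

```python
import numpy as np

def elm_fit_predict(X, y, X_test, n_hidden=64, seed=0):
    """One-layer extreme learning machine sketch (assumed interface).

    Hidden weights are drawn at random and never trained; the output
    weights are obtained analytically by least squares, which is what
    makes ELM training fast compared with backpropagation.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                        # random feature map
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # closed-form output layer
    return np.tanh(X_test @ W + b) @ beta
```

For speech enhancement, X would hold noisy (and, in the audio-visual case, visual) features and y the clean targets; the hierarchical variant feeds one layer's output into the next.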
Benefits
- This project develops an English-learning robot based on ASUS Zenbo. Through intelligent speech recognition technology, the robot enables users to practice pronunciation and articulation in non-English-speaking environments.
- The robustness techniques alleviate the undesired impact of environmental distortions so that automatic speech recognition systems maintain an acceptable performance level.
- The spoken language processing techniques developed herein will be deployed on an intelligent interactive platform applicable to real-world applications.