Method and apparatus for large vocabulary continuous speech recognition using a hybrid phoneme-word lattice
Abstract
A method and apparatus combining the advantages of phonetic search, such as rapid implementation and deployment and medium accuracy, comprising steps and components for receiving the audio signal captured in the call center environment, extracting a multiplicity of feature vectors from the audio signal, creating a phoneme lattice from the multiplicity of feature vectors, wherein the phoneme lattice comprises one or more allophones and each allophone comprises two or more phonemes, creating a hybrid phoneme-word lattice from the phoneme lattice, and extracting the word by analyzing the hybrid phoneme-word lattice.
27 Claims
1. A method for extracting a term comprising at least one word from an audio signal captured in a call center environment, comprising:
receiving the audio signal captured in an environment;
extracting a multiplicity of vectors of spectrum-based features from the audio signal, wherein the spectrum-based features comprise at least any one of Mel Frequency Cepstral Coefficients (MFCC), Delta Cepstral Mel Frequency Coefficients (DMFCC), or spectral energy transform;
creating a phoneme lattice from the multiplicity of feature vectors, the phoneme lattice comprising at least one allophone, the at least one allophone comprising at least two phonemes and determined as most probable and correspondingly assigned a probability score;
creating a hybrid phoneme-word lattice from the phoneme lattice by utilizing a speech model and a non-speech model that were created from the audio signal captured in the call center environment; and
extracting the word by analyzing the hybrid phoneme-word lattice.
Dependent claims: 2-14
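The feature-extraction step of claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the 8 kHz sample rate, 25 ms frames, 10 ms hop, 26 mel filters, 13 cepstral coefficients, and the use of a simple central difference for the delta coefficients are all assumptions chosen for illustration, and the function names (`mfcc`, `delta`, `mel_filterbank`) are hypothetical.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def dct_ii(x, n_out):
    """Unnormalized DCT-II along the last axis, keeping the first n_out terms."""
    n = x.shape[1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def mfcc(signal, sample_rate=8000, frame_len=200, hop=80,
         n_fft=256, n_filters=26, n_ceps=13):
    """One MFCC vector per frame; assumes len(signal) >= frame_len."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    return dct_ii(np.log(energies + 1e-10), n_ceps)  # epsilon avoids log(0)

def delta(features):
    """First-order (delta) coefficients as a central difference over frames."""
    padded = np.pad(features, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0

# Synthetic one-second test signal standing in for captured call audio.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
signal = np.sin(2 * np.pi * 440.0 * t) + 0.01 * rng.standard_normal(8000)

static = mfcc(signal)
vectors = np.hstack([static, delta(static)])  # MFCC + delta per frame
print(vectors.shape)
```

Each row of `vectors` is one spectrum-based feature vector of the kind the claim feeds into phoneme-lattice creation; a production system would typically add pre-emphasis, liftering, and normalization, which are omitted here.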
15. An apparatus for extracting a term comprising an at least one word from an audio signal captured in a call center environment, comprising:
a capture device for capturing the audio signal in an environment;
a feature extraction component for extracting a multiplicity of vectors of spectrum-based features from the audio signal, wherein the spectrum-based features comprise at least any one of Mel Frequency Cepstral Coefficients (MFCC), Delta Cepstral Mel Frequency Coefficients (DMFCC), or spectral energy transform;
an allophone decoding component for creating a phoneme lattice from the multiplicity of feature vectors, the phoneme lattice comprising at least one allophone, the at least one allophone comprising at least two phonemes and determined as most probable and correspondingly assigned a probability score;
a word decoding component for creating a hybrid phoneme-word lattice from the phoneme lattice by utilizing a speech model and a non-speech model that were created from the audio signal captured in the call center environment; and
an analysis component for analyzing the hybrid phoneme-word lattice.
Dependent claims: 16-24
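To make the roles of the word decoding and analysis components more concrete, the following toy sketch represents a phoneme lattice as scored edges, adds word edges wherever a phoneme path spells a lexicon entry, and then picks out words above a score threshold. The claims do not specify how the hybrid lattice is built, so matching against a pronunciation lexicon is an assumed stand-in technique; the `Edge` type, the `LEXICON` entries, the scores, and all function names are hypothetical, and the speech and non-speech models and the allophone grouping recited in the claim are not modeled.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    start: int            # start node (position in time)
    end: int              # end node
    label: str            # phoneme or word label
    logp: float           # log-probability score
    kind: str = "phoneme"

# Hypothetical pronunciation lexicon (illustrative only).
LEXICON = {"bill": ["b", "ih", "l"], "pill": ["p", "ih", "l"]}

# Toy phoneme lattice with competing hypotheses per time slot.
PHONEME_LATTICE = [
    Edge(0, 1, "b", math.log(0.6)), Edge(0, 1, "p", math.log(0.4)),
    Edge(1, 2, "ih", math.log(0.9)), Edge(1, 2, "eh", math.log(0.1)),
    Edge(2, 3, "l", math.log(0.8)), Edge(2, 3, "r", math.log(0.2)),
]

def match_paths(edges, node, phones):
    """Yield (end_node, logp) for each phoneme path from node spelling phones."""
    if not phones:
        yield node, 0.0
        return
    for e in edges:
        if e.kind == "phoneme" and e.start == node and e.label == phones[0]:
            for end, logp in match_paths(edges, e.end, phones[1:]):
                yield end, e.logp + logp

def build_hybrid_lattice(phoneme_edges, lexicon):
    """Keep all phoneme edges and add a word edge for every lexicon match."""
    hybrid = list(phoneme_edges)
    starts = sorted({e.start for e in phoneme_edges})
    for word, phones in lexicon.items():
        for s in starts:
            for end, logp in match_paths(phoneme_edges, s, phones):
                hybrid.append(Edge(s, end, word, logp, kind="word"))
    return hybrid

def extract_words(hybrid, min_logp):
    """Return word edges scoring at least min_logp, best first."""
    hits = [e for e in hybrid if e.kind == "word" and e.logp >= min_logp]
    return sorted(hits, key=lambda e: e.logp, reverse=True)

hybrid = build_hybrid_lattice(PHONEME_LATTICE, LEXICON)
words = extract_words(hybrid, min_logp=math.log(0.3))
for w in words:
    print(w.label, round(math.exp(w.logp), 3))
```

Because the phoneme edges stay in the hybrid structure alongside the word edges, a term missing from the lexicon could still be searched phonetically, which is the kind of combination the abstract alludes to.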
25. A non-transitory computer readable storage medium containing a set of instructions for a general purpose computer, the set of instructions comprising:
capturing an audio signal in a call center environment;
extracting a multiplicity of vectors of spectrum-based features from the audio signal, wherein the spectrum-based features comprise at least any one of Mel Frequency Cepstral Coefficients (MFCC), Delta Cepstral Mel Frequency Coefficients (DMFCC), or spectral energy transform;
creating a phoneme lattice from the multiplicity of feature vectors, the phoneme lattice comprising at least one allophone, the at least one allophone comprising at least two phonemes and determined as most probable and correspondingly assigned a probability score;
creating a hybrid phoneme-word lattice from the phoneme lattice by utilizing a speech model and a non-speech model that were created from the audio signal captured in the call center environment; and
analyzing the hybrid phoneme-word lattice.
Dependent claims: 26, 27
Specification