Unsupervised HMM adaptation based on speech-silence discrimination
First Claim
1. A method for discriminating between speech and background regions, comprising:
segmenting an input utterance into speech and background regions without knowledge of the lexical content of the input utterance to create a segmented input string;
introducing insertion errors into the background regions that are error prone to generate error laden background strings;
statistically modeling the segmented input string and the error laden background strings using a discriminative training algorithm to generate a model with adapted parameters;
decoding the input utterance using the model with the adapted parameters; and
outputting a recognized string based on the decoding step.
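As an illustration only, the "introducing insertion errors" step can be read as building competing label strings in which a background region is mislabeled as a word. The sketch below is one possible reading under stated assumptions, not the procedure defined in the specification: the `Segment` tuple layout, the `insert_errors` helper, the `"background"` label, and the tiny vocabulary are all hypothetical, and every background region is treated as error prone.

```python
from typing import List, Tuple

# Hypothetical layout: (label, start_frame, end_frame), end exclusive.
Segment = Tuple[str, int, int]


def insert_errors(segments: List[Segment], vocab: List[str]) -> List[List[Segment]]:
    """Build competing strings by relabeling one background region as a word.

    Assumption: every background region counts as error prone. A real system
    would select only the regions where insertions are likely.
    """
    competitors = []
    for i, (label, start, end) in enumerate(segments):
        if label != "background":
            continue
        for word in vocab:
            corrupted = list(segments)          # copy the correct segmentation
            corrupted[i] = (word, start, end)   # simulate an insertion error
            competitors.append(corrupted)
    return competitors


# Segmented input string: background, speech, background.
segments = [("background", 0, 30), ("speech", 30, 120), ("background", 120, 150)]
competitors = insert_errors(segments, ["one", "two"])
```

A discriminative trainer would then treat `segments` as the correct string and each entry of `competitors` as an incorrect one to be pushed away from.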
Abstract
An unsupervised, discriminative, sentence-level HMM adaptation method based on speech-silence classification is presented. Silence and speech regions are determined using either a speech end-pointer or the segmentation obtained from the recognizer in a first pass. A discriminative training procedure, using GPD (generalized probabilistic descent) or any other discriminative training algorithm in conjunction with the HMM-based recognizer, is then used to increase the discrimination between silence and speech.
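The abstract mentions a speech end-pointer as one way to obtain the speech and silence regions without any transcript. A minimal sketch of a frame-energy end-pointer follows. It is an assumption-laden illustration rather than the end-pointer used in the patent: the function names, the 80-sample frame length, the -30 dB threshold, and the synthetic test signal are all hypothetical choices.

```python
import math
from typing import List, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Log energy (dB) of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))  # floor avoids log(0)
    return energies


def segment(samples: List[float], frame_len: int = 80,
            threshold_db: float = -30.0) -> List[Tuple[str, int, int]]:
    """Label frames by energy and merge runs into (label, start, end) regions.

    Frame indices are used for start and end, with end exclusive.
    """
    regions: List[Tuple[str, int, int]] = []
    for i, energy in enumerate(frame_energies(samples, frame_len)):
        label = "speech" if energy > threshold_db else "background"
        if regions and regions[-1][0] == label:
            regions[-1] = (label, regions[-1][1], i + 1)  # extend current run
        else:
            regions.append((label, i, i + 1))             # start a new run
    return regions


# Synthetic utterance: near-silence, a 440 Hz tone at 8 kHz sampling, near-silence.
quiet = [0.001] * 400
loud = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
regions = segment(quiet + loud + quiet)
```

On this synthetic signal the result is three regions: background, speech, background. Real end-pointers typically add smoothing and hangover logic so that short pauses inside a word are not labeled as background.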
88 Citations
12 Claims
7. A system for decoding of speech information comprising:
means for segmenting an input utterance into speech and background regions without knowledge of the lexical content of the input utterance to create a segmented input string;
means for introducing insertion errors into the background regions that are error prone to generate error laden background strings;
means for statistically modeling the segmented input string and the error laden background strings using a discriminative training algorithm to generate a model with adapted parameters;
means for decoding the input utterance using the model with the adapted parameters; and
means for outputting a recognized string based on the decoded input utterance.
Specification