Recognition of a speech utterance available in spelled form
Abstract
This invention relates to a method of recognizing a speech utterance (s) available in spelled form, comprising a first processing stage in which a corresponding letter sequence (r) is estimated by means of a letter speech recognition unit (2) based on hidden Markov models, and a second processing stage (3) in which the estimated result (r) produced by the first processing stage is post-processed utilizing a statistical letter sequence model (4) and a statistical model (5) for the speech recognition unit (2), a dynamic programming method being used during the post-processing. To provide robust and efficient speech recognition procedures for system control by means of speech signals, a grid structure on which the dynamic programming is based, and whose node points are provided for the assignment to accumulated probability values, is converted into a tree structure, and an A* algorithm is used for finding an optimum tree path. The invention also relates to a speech control device in which a complete word is input as a control signal and at least part of this word is input in spelled form, the result of the letter speech recognition being used within the scope of the word speech recognition.
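The second processing stage can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the patented implementation: each tree node is a prefix of a candidate spoken letter sequence, path costs are negative log probabilities, and A* returns the cheapest complete path. The function name `astar_decode`, the three-letter alphabet, and all probability values are hypothetical. The recognizer model here covers letter substitutions only, a simplifying assumption; insertions and deletions are left out.

```python
import heapq
import math


def astar_decode(recognized, alphabet, confusion, bigram, start="^"):
    """Find the most probable spoken letter sequence for `recognized`.

    confusion[(spoken, heard)] is P(heard | spoken); bigram[(prev, cur)] is
    P(cur | prev). Costs are negative log probabilities, so lower is better.
    """
    n = len(recognized)
    cheapest_bigram = min(-math.log(p) for p in bigram.values())
    # remaining[i]: optimistic (never overestimating) cost of positions i..n-1.
    remaining = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        cheapest_match = min(
            -math.log(confusion[(s, recognized[i])]) for s in alphabet
        )
        remaining[i] = remaining[i + 1] + cheapest_match + cheapest_bigram

    # Each heap entry is a tree node: (estimated total, cost so far, prefix).
    heap = [(remaining[0], 0.0, "")]
    while heap:
        _, cost, prefix = heapq.heappop(heap)
        i = len(prefix)
        if i == n:
            return cost, prefix  # first complete path popped is optimal
        prev = prefix[-1] if prefix else start
        for s in alphabet:
            step = -math.log(bigram[(prev, s)]) - math.log(
                confusion[(s, recognized[i])]
            )
            heapq.heappush(
                heap, (cost + step + remaining[i + 1], cost + step, prefix + s)
            )
    raise ValueError("no hypothesis found")


# Hypothetical toy models: "^" marks the start of the sequence.
ALPHABET = "AMN"
CONFUSION = {(s, h): 0.8 if s == h else 0.1 for s in ALPHABET for h in ALPHABET}
BIGRAM = {
    ("^", "A"): 0.8, ("^", "M"): 0.1, ("^", "N"): 0.1,
    ("A", "A"): 0.1, ("A", "M"): 0.1, ("A", "N"): 0.8,
    ("M", "A"): 0.4, ("M", "M"): 0.3, ("M", "N"): 0.3,
    ("N", "A"): 0.45, ("N", "M"): 0.1, ("N", "N"): 0.45,
}

cost, best = astar_decode("AMNA", ALPHABET, CONFUSION, BIGRAM)
print(best)  # ANNA
```

In this toy setup the first stage output "AMNA" is corrected to "ANNA", because the letter sequence model makes "AN" far more likely than "AM" and this outweighs the cost of assuming one misrecognized letter. The heuristic sums the cheapest possible cost of every remaining position, so it never overestimates and the first complete path taken from the queue is optimal.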
5 Claims
1. A method of recognizing a speech utterance (s) available in spelled form, comprising:
- a first processing stage in which a corresponding letter sequence (r) is estimated by means of a letter speech recognition unit (2) based on hidden Markov Models, said letter speech recognition unit not using a letter grammar which denotes probabilities of the occurrence of different possible letter combinations; and
- a second processing stage (3) in which the estimated result (r) produced by the first processing stage utilizing a statistical letter sequence model (4) and a statistical model (5) for the speech recognition unit (2) is post-processed, wherein a dynamic programming method is used during the post-processing, wherein a grid structure on which the dynamic programming is based and whose node points are provided for the assignment to accumulated probability values, is converted into a tree structure and an A* algorithm is used for finding an optimum tree path.
5. A method of system control by means of speech signals (w, s), comprising the steps of:
- inputting a whole word (w) serving as a control signal, at least part of this word being input in spelled form (s);
- recognizing the whole word (w) that is input using word speech recognition (7) and letter speech recognition (1) for recognizing the spelled part (s), the letter speech recognition comprising:
  - a first processing stage in which a corresponding letter sequence (r) is estimated by means of a letter speech recognition unit (2) based on hidden Markov Models, said letter speech recognition unit not using a letter grammar which denotes probabilities of the occurrence of different possible letter combinations; and
  - a second processing stage (3) in which the estimated result (r) produced by the first processing stage utilizing a statistical letter sequence model (4) and a statistical model (5) for the speech recognition unit (2) is post-processed, wherein a dynamic programming method is used during the post-processing, wherein a grid structure on which the dynamic programming is based and whose node points are provided for the assignment to accumulated probability values, is converted into a tree structure and an A* algorithm is used for finding an optimum tree path; and
- restricting a vocabulary assigned to the word speech recognition (7) to the recognition results of the letter speech recognition (1).
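The final step of claim 5, restricting the word recognizer's vocabulary to the results of the letter recognition, might be sketched as follows. This is an illustration under stated assumptions, not the claimed method itself: the function name `restrict_vocabulary`, the word list, and the hypotheses are hypothetical, and the spelled part is assumed to be the beginning of the word.

```python
def restrict_vocabulary(vocabulary, letter_hypotheses):
    """Keep only the words that start with one of the recognized letter strings.

    Assumption: the spelled part (s) is a word-initial fragment of the whole
    word (w); matching is case-insensitive.
    """
    prefixes = tuple(h.upper() for h in letter_hypotheses)
    return [word for word in vocabulary if word.upper().startswith(prefixes)]


# Hypothetical vocabulary of the word speech recognition and two
# letter-recognition results for the spelled part.
VOCAB = ["Aachen", "Amsterdam", "Antwerp", "Berlin"]
restricted = restrict_vocabulary(VOCAB, ["AA", "AM"])
print(restricted)  # ['Aachen', 'Amsterdam']
```

The word recognizer then has to choose only among the remaining entries, which is what makes the combined use of whole-word and spelled input more robust than whole-word recognition over the full vocabulary.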
Specification