Method for recognizing speech
First Claim
1. A method for recognizing speech, comprising:
(a) receiving a speech phrase;
(b) generating a signal being representative of said speech phrase;
(c) pre-processing and storing said signal with respect to a determined set of rules;
(d) generating from said pre-processed signal at least one series of hypothesis speech elements;
(e) determining at least one series of words being most probable to correspond to said speech phrase by applying a predefined language model to said at least one series of hypothesis speech elements, wherein determining said at least one series of words further comprises:
(1) determining at least one sub-word, word, or a combination of words most probably being contained as a seed sub-phrase in said received speech phrase, wherein said seed sub-phrase is recognized with an appropriately high degree of reliability, such that segments of speech that are recognized with high reliability are used to constrain the search in other areas of the speech signal where the language model employed cannot adequately restrict the search; and
(2) continuing determining words or combinations of words, which are consistent with said seed sub-phrase as at least a first successive sub-phrase which is contained in said received speech phrase, by inserting additional, paired and/or higher order information, including semantic and/or pragmatic information, between the sub-phrases, thereby decreasing the burden of searching, wherein said semantic information includes description of said sub-phrases and said pragmatic information includes connecting information connecting said sub-phrases to actual situation, application, and/or action, wherein the predefined language model contains a low-perplexity recognition grammar obtained from a conventional recognition grammar by:
(3) identifying and extracting word classes of high-perplexity from the conventional grammar;
(4) generating a phonetic, phonemic and/or syllabic description of the high-perplexity word classes, in particular by applying a sub-word-unit grammar compiler to them, to produce a sub-word-unit grammar for each high-perplexity word class; and
(5) merging the sub-word-unit grammars with the remaining low-perplexity part of the conventional grammar to yield said low-perplexity recognition grammar; and
wherein a language model is used containing at least a recognition grammar built up by at least a low-perplexity part and a high-perplexity part, each of which being representative for distinct low- and high-perplexity classes of speech elements; and
wherein word classes are used as classes for speech elements or fragments.
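Steps (3) to (5) can be illustrated with a small sketch. It assumes that a word class's perplexity is measured as its size (the perplexity of a uniform distribution over its members), uses a hypothetical threshold of 4 to separate low- from high-perplexity classes, and, lacking a real phonetic transcription, treats letters as stand-in sub-word units. The grammar layout, class contents and function names are illustrative assumptions, not taken from the patent.

```python
# Hypothetical conventional grammar: a sentence template whose slots are word classes.
# Class names, members and the perplexity threshold are illustrative assumptions.
CONVENTIONAL_GRAMMAR = {
    "template": ["COMMAND", "CITY"],
    "classes": {
        "COMMAND": ["navigate to", "drive to"],
        "CITY": ["berlin", "bonn", "bremen", "hamburg", "hannover",
                 "heidelberg", "stuttgart", "munich"],
    },
}
PERPLEXITY_THRESHOLD = 4.0  # hypothetical cut-off between low and high perplexity


def class_perplexity(members):
    # With a uniform distribution over N members, perplexity = 2**H = N.
    return float(len(members))


def to_units(word):
    # Stand-in for a phonetic/phonemic/syllabic transcription:
    # letters serve as sub-word units so the sketch stays self-contained.
    return list(word.replace(" ", ""))


def compile_subword_grammar(members):
    # Step (4): compile a word class into a prefix tree over sub-word units.
    # Each path from the root to an "<end>" marker spells one class member.
    trie = {}
    for word in members:
        node = trie
        for unit in to_units(word):
            node = node.setdefault(unit, {})
        node["<end>"] = word
    return trie


def max_branching(trie):
    # Largest number of alternative units offered at any node of the sub-word grammar.
    children = [k for k in trie if k != "<end>"]
    return max([len(children)] + [max_branching(trie[k]) for k in children])


def build_low_perplexity_grammar(grammar, threshold=PERPLEXITY_THRESHOLD):
    merged = {"template": list(grammar["template"]), "classes": {}}
    for name, members in grammar["classes"].items():
        if class_perplexity(members) > threshold:
            # Steps (3) and (4): extract the high-perplexity class and compile it.
            merged["classes"][name] = ("subword", compile_subword_grammar(members))
        else:
            # The low-perplexity part of the conventional grammar is kept as words.
            merged["classes"][name] = ("words", list(members))
    # Step (5): both parts are merged under the original sentence template.
    return merged
```

In this toy example the CITY class (perplexity 8) becomes a prefix tree in which no node offers more than 4 alternatives, while the two-member COMMAND class stays a plain word list.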
Abstract
A method for recognizing speech includes receiving a speech phrase, generating a signal representative of the speech phrase, pre-processing and storing the signal according to a determined set of rules, generating at least one series of hypothesis speech elements from the pre-processed signal, and determining at least one series of words most likely to correspond to the speech phrase by applying a predefined language model to at least that series of hypothesis speech elements. Determining the series of words includes identifying at least one sub-word, word, or combination of words that is most probably contained in the received speech phrase as a seed sub-phrase. The determination then continues by finding words or combinations of words that are consistent with the seed sub-phrase and form at least a first successive sub-phrase contained in the received speech phrase, using and evaluating additional, paired and/or higher-order information between the sub-phrases, thereby reducing the search burden.
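The seed-and-extend idea can be sketched under simplifying assumptions: each speech segment already carries competing word hypotheses with confidence scores, the seed is the single most confident hypothesis above a hypothetical reliability threshold of 0.8, and the semantic/pragmatic information is reduced to a hypothetical table scoring how well each candidate word fits the seed. Hypotheses that the table rules out are pruned, which is where the search burden shrinks. All data and names here are illustrative, not from the patent.

```python
# Hypothetical per-segment word hypotheses with confidence scores in [0, 1].
SEGMENTS = [
    {"play": 0.40, "pay": 0.42},
    {"the": 0.55, "a": 0.45},
    {"beatles": 0.92, "needles": 0.05},
]
# Hypothetical semantic/pragmatic information: fit of (candidate, seed) pairs.
PAIR_INFO = {
    ("play", "beatles"): 1.0, ("pay", "beatles"): 0.1,
    ("the", "beatles"): 0.9, ("a", "beatles"): 0.3,
}
RELIABILITY_THRESHOLD = 0.8  # hypothetical minimum confidence for a seed


def find_seed(segments, threshold=RELIABILITY_THRESHOLD):
    # Step (1): the single most confident hypothesis, if it is reliable enough.
    best = max(
        ((i, word, conf) for i, seg in enumerate(segments) for word, conf in seg.items()),
        key=lambda t: t[2],
    )
    return best if best[2] >= threshold else None


def extend_from_seed(segments, pair_info, threshold=RELIABILITY_THRESHOLD):
    seed = find_seed(segments, threshold)
    if seed is None:
        return None  # no reliable anchor; an unconstrained search would be needed
    seed_pos, seed_word, _ = seed
    result = [None] * len(segments)
    result[seed_pos] = seed_word
    # Step (2): visit the remaining segments, nearest to the seed first.
    for pos in sorted(range(len(segments)), key=lambda p: abs(p - seed_pos)):
        if pos == seed_pos:
            continue
        scored = {
            word: conf * pair_info.get((word, seed_word), 0.0)
            for word, conf in segments[pos].items()
        }
        # Hypotheses inconsistent with the seed (score 0) are pruned; if nothing
        # survives, fall back to the raw confidences for this segment.
        survivors = {w: s for w, s in scored.items() if s > 0} or segments[pos]
        result[pos] = max(survivors, key=survivors.get)
    return result
```

Acoustically, "pay" (0.42) would beat "play" (0.40); anchoring on the reliable seed "beatles" flips the decision, and the sketch returns `["play", "the", "beatles"]`.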
16 Claims
11. An apparatus for recognizing speech, comprising:
(a) means for receiving a speech phrase;
(b) means for generating a signal being representative of said speech phrase;
(c) means for pre-processing and storing said signal with respect to a determined set of rules;
(d) means for generating from said pre-processed signal at least one series of hypothesis speech elements;
(e) means for determining at least one series of words being most probable to correspond to said speech phrase by applying a predefined language model to said at least one series of hypothesis speech elements, wherein said means for determining said at least one series of words further comprises:
(1) means for determining at least one sub-word, word, or a combination of words most probably being contained as a seed sub-phrase in said received speech phrase, wherein said seed sub-phrase is recognized with an appropriately high degree of reliability, such that segments of speech which can be recognized with high reliability are used to constrain the search in other areas of the speech signal where the language model employed cannot adequately restrict the search; and
(2) means for continuing determining words or combinations of words, which are consistent with said seed sub-phrase as at least a first successive sub-phrase which is contained in said received speech phrase, by inserting additional, paired and/or higher order information, including semantic and/or pragmatic information, between the sub-phrases, thereby decreasing the burden of searching, wherein said semantic information includes description of said sub-phrases and said pragmatic information includes connecting information connecting said sub-phrases to actual situation, application, and/or action, and wherein the predefined language model includes a low-perplexity recognition grammar obtained from a conventional recognition grammar by using:
(3) means for identifying and extracting word classes of high-perplexity from the conventional grammar;
(4) means for generating a phonetic, phonemic and/or syllabic description of the high-perplexity word classes, in particular by applying a sub-word-unit grammar compiler to them, to produce a sub-word-unit grammar for each high-perplexity word class;
(5) means for merging the sub-word-unit grammars with the remaining low-perplexity part of the conventional grammar to yield said low-perplexity recognition grammar; and
wherein a language model is used containing at least a recognition grammar built up by at least a low-perplexity part and a high-perplexity part, each of which being representative for distinct low- and high-perplexity classes of speech elements; and
wherein word classes are used as classes for speech elements or fragments.
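Means (c) pre-processes and stores the signal "with respect to a determined set of rules", but the claims do not say which rules. The sketch below assumes two common speech front-end steps, pre-emphasis and overlapping fixed-length framing, and keeps the result in a store so that a later pass (for example, the search outward from a seed sub-phrase) can revisit it. The class name, frame sizes and the energy feature are hypothetical choices for illustration.

```python
# Hypothetical front end for means (c): pre-emphasis plus fixed-length framing
# is an assumed rule set, not one stated in the claims.

def pre_emphasize(samples, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample passes through unchanged."""
    if not samples:
        return []
    return [samples[0]] + [samples[n] - alpha * samples[n - 1] for n in range(1, len(samples))]


def split_frames(samples, frame_len, hop):
    """Cut the signal into overlapping frames; a trailing partial frame is dropped."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


class PreprocessedSignalStore:
    """Keeps the pre-processed frames so later search passes can revisit them."""

    def __init__(self, frame_len=400, hop=160, alpha=0.97):
        # 400 and 160 samples correspond to 25 ms and 10 ms at 16 kHz (assumed values).
        self.frame_len = frame_len
        self.hop = hop
        self.alpha = alpha
        self.frames = []
        self.energies = []

    def process(self, samples):
        emphasized = pre_emphasize(samples, self.alpha)
        self.frames = split_frames(emphasized, self.frame_len, self.hop)
        # Frame energy is one simple per-frame feature a later stage could use.
        self.energies = [sum(s * s for s in frame) for frame in self.frames]
        return self.frames
```

For instance, `PreprocessedSignalStore(frame_len=4, hop=2, alpha=1.0).process([0, 1, 2, 3, 4, 5, 6, 7])` yields three frames whose stored energies are 3.0, 4.0 and 4.0.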
Specification