Method for recognizing speech

US 7,225,127 B2
Filed: 12/11/2000
Issued: 05/29/2007
Est. Priority Date: 12/13/1999
Status: Expired due to Fees

Abstract

A method for recognizing speech includes receiving a speech phrase, generating a signal representative of the speech phrase, pre-processing and storing the signal according to a determined set of rules, generating at least one series of hypothesis speech elements from the pre-processed signal, and determining at least one series of words most probable to correspond to the speech phrase by applying a predefined language model to at least said series of hypothesis speech elements. Determining the series of words includes determining at least one sub-word, word, or combination of words most probably contained as a seed sub-phrase in the received speech phrase. The determination then continues with words or combinations of words that are consistent with the seed sub-phrase, as at least a first successive sub-phrase contained in the received speech phrase, by using and evaluating additional, paired and/or higher-order information between the sub-phrases, thereby decreasing the burden of searching.
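The seed-driven search summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Hypothesis` class, the confidence scale, the `compatible` table, and the street and city candidates are all hypothetical, chosen only to show how a reliably recognized seed can restrict the candidates considered for the rest of the phrase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    text: str
    score: float  # hypothetical recognizer confidence in [0, 1]


def pick_seed(slots):
    """Return (slot_index, hypothesis) for the most reliable candidate overall."""
    return max(
        ((i, h) for i, cands in enumerate(slots) for h in cands),
        key=lambda pair: pair[1].score,
    )


def constrain(slots, seed_index, seed, compatible):
    """Keep the seed, and in every other slot keep only seed-consistent candidates."""
    allowed = compatible.get(seed.text, set())
    return [
        [seed] if i == seed_index else [h for h in cands if h.text in allowed]
        for i, cands in enumerate(slots)
    ]


def recognize(slots, compatible):
    """Pick a seed, restrict the remaining slots, then take the best survivor."""
    seed_index, seed = pick_seed(slots)
    pruned = constrain(slots, seed_index, seed, compatible)
    return [max(c, key=lambda h: h.score).text if c else None for c in pruned]


# Hypothetical candidates: slot 0 is a poorly constrained street name,
# slot 1 is a city name recognized with high reliability.
slots = [
    [Hypothesis("Hauptplatz", 0.45), Hypothesis("Hauptstrasse", 0.40)],
    [Hypothesis("Berlin", 0.95), Hypothesis("Bern", 0.30)],
]
# Hypothetical pairwise information linking a seed to permitted neighbours.
compatible = {"Berlin": {"Hauptstrasse"}, "Bern": {"Hauptplatz"}}

print(recognize(slots, compatible))
```

In this toy input the street slot alone would favour "Hauptplatz", but the seed "Berlin" removes that candidate before the slot is scored, which is the sense in which the seed decreases the burden of searching.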

16 Claims

1. A method for recognizing speech, comprising:
- (a) receiving a speech phrase;
  
  (b) generating a signal being representative to said speech phrase;
  
  (c) pre-processing and storing said signal with respect to a determined set of rules;
  
  (d) generating from said pre-processed signal at least one series of hypothesis speech elements;
  
  (e) determining at least one series of words being most probable to correspond to said speech phrase by applying a predefined language model to said at least one series of hypothesis speech elements, wherein determining said at least one series of words further comprises;
  
  (1) determining at least one sub-word, word, or a combination of words most probably being contained as a seed sub-phrase in said received speech phrase, wherein said seed sub-phrase is recognized with an appropriate high degree of reliability, such that segments of speech that are recognized with high reliability are used to constrain the search in other areas of the speech signal where the language model employed cannot adequately restrict the search; and
  
  (2) continuing determining words or combinations of words, which are consistent with said seed sub-phrase as at least a first successive sub-phrase which is contained in said received speech phrase, by inserting additional, paired and/or higher order information, including semantic and/or pragmatic information, between the sub-phrases, thereby decreasing the burden of searching, wherein said semantic information includes description of said sub-phrases and said pragmatic information includes connecting information connecting said sub-phrases to actual situation, application, and/or action, wherein the predefined language model contains a low-perplexity recognition grammar obtained from a conventional recognition grammar by;
  
  (3) identifying and extracting word classes of high-perplexity from the conventional grammar;
  
  (4) generating a phonetic, phonemic and/or syllabic description of the high-perplexity word classes, in particular by applying a sub-word-unit grammar compiler to them, to produce a sub-word-unit grammar for each high-perplexity word class; and
  
  (5) merging the sub-word-unit grammars with the remaining low-perplexity part of the conventional grammar to yield said low-perplexity recognition grammar; and
  
  wherein a language model is used containing at least a recognition grammar built up by at least a low-perplexity part and a high-perplexity part, each of which being representative for distinct low- and high-perplexity classes of speech elements; and
  
  wherein word classes are used as classes for speech elements or fragments.

2. Method according to claim 1, characterized in that a predefined language model is applied to at least said series of hypothesis speech elements to obtain said seed sub-phrase and said additional and paired and/or higher order information is obtained from said language model.

3. Method according to claim 1, characterized in that additional information within said language model is used being descriptive for the prepositional relationship of the sub-phrases.

4. Method according to claim 1, characterized in that additional information within that language model is used being descriptive for pairs, triples and/or higher order n-tuples of sub-phrases.

5. Method according to claim 1, characterized in that a hypothetic graph is generated for the received speech phrase including the generated sub-phrases and/or their combinations as candidates for the recognized speech phrase and that additional information between the sub-phrases is used to constrain and to restrict the search for the most probable candidate within the graph.

6. Method according to claim 5, characterized in that during the search candidate sub-phrases or sub-words from the high-perplexity word classes are inserted into the hypothesis graph, whereby the sub-word unit grammars for the high-perplexity word classes are used as constraints as well as the respective additional semantic and/or pragmatic information.

7. Method according to claim 6, characterized in that according to the constraints candidates are deleted from the hypothesis graph until an unbranched resulting graph is generated, corresponding to the most probable phrase.

8. Method according to claim 1, characterized in that the vocabulary, in particular of said language model, applicable for the remaining parts of the speech phrase besides the seed sub-phrase is restricted at least for one remaining part so as to decrease the burden of search.

9. The method of claim 1, wherein said seed sub-phrase recognized with an appropriate high degree of reliability is defined as a low perplexity part of said received speech phrase.

10. The method of claim 9, wherein perplexity is defined as the complexity of the depth of search which has to be accomplished in conventional search graphs or search trees.
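
As an illustration only, steps (3) to (5) of claim 1 can be sketched in Python. The sketch makes several assumptions that the claims do not state: a grammar is modelled as a mapping from word-class names to word lists, the number of alternatives in a class stands in for its perplexity, and the "sub-word-unit grammar compiler" is reduced to a lookup in a hand-written, hypothetical syllable table.

```python
def split_classes(grammar, max_size):
    """Step (3): separate word classes by a size threshold (a stand-in for perplexity)."""
    low = {name: words for name, words in grammar.items() if len(words) <= max_size}
    high = {name: words for name, words in grammar.items() if len(words) > max_size}
    return low, high


def compile_subword_grammar(words, syllabify):
    """Step (4): describe a word class by the sorted set of its sub-word units."""
    return sorted({unit for word in words for unit in syllabify(word)})


def build_low_perplexity_grammar(grammar, max_size, syllabify):
    """Step (5): merge sub-word-unit grammars with the untouched low-perplexity part."""
    low, high = split_classes(grammar, max_size)
    merged = dict(low)
    for name, words in high.items():
        merged[name + "_SUBWORD"] = compile_subword_grammar(words, syllabify)
    return merged


# Hypothetical syllable table standing in for a real sub-word-unit compiler.
SYLLABLES = {
    "anna": ["an", "na"],
    "hanna": ["han", "na"],
    "nana": ["na", "na"],
    "hanan": ["han", "an"],
}

# Hypothetical conventional grammar: a small command class and a larger name class.
grammar = {
    "COMMAND": ["call", "dial"],
    "NAME": ["anna", "hanna", "nana", "hanan"],
}

result = build_low_perplexity_grammar(grammar, max_size=2, syllabify=SYLLABLES.__getitem__)
print(result)
```

With these toy inputs the four-word `NAME` class is replaced by three syllable units while `COMMAND` is carried over unchanged. A real system would also need sequencing rules for the sub-word units, which this sketch omits.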

11. An apparatus for recognizing speech, comprising:
- (a) means for receiving a speech phrase;
  
  (b) means for generating a signal being representative to said speech phrase;
  
  (c) means for pre-processing and storing said signal with respect to a determined set of rules;
  
  (d) means for generating from said pre-processed signal at least one series of hypothesis speech elements;
  
  (e) means for determining at least one series of words being most probable to correspond to said speech phrase by applying a predefined language model to said at least one series of hypothesis speech elements, wherein said means for determining said at least one series of words further comprises;
  
  (1) means for determining at least one sub-word, word, or a combination of words most probably being contained as a seed sub-phrase in said received speech phrase, wherein said seed sub-phrase is recognized with an appropriate high degree of reliability, such that segments of speech which can be recognized with high reliability are used to constrain the search in other areas of the speech signal where the language model employed cannot adequately restrict the search; and
  
  (2) means for continuing determining words or combinations of words, which are consistent with said seed sub-phrase as at least a first successive sub-phrase which is contained in said received speech phrase, by inserting additional, paired and/or higher order information, including semantic and/or pragmatic information, between the sub-phrases, thereby decreasing the burden of searching, wherein said semantic information includes description of said sub-phrases and said pragmatic information includes connecting information connecting said sub-phrases to actual situation, application, and/or action, and wherein the predefined language model includes a low-perplexity recognition grammar obtained from a conventional recognition grammar by using;
  
  (3) means for identifying and extracting word classes of high-perplexity from the conventional grammar;
  
  (4) means for generating a phonetic, phonemic and/or syllabic description of the high-perplexity word classes, in particular by applying a sub-word-unit grammar compiler to them, to produce a sub-word-unit grammar for each high-perplexity word class;
  
  (5) means for merging the sub-word-unit grammars with the remaining low-perplexity part of the conventional grammar to yield said low-perplexity recognition grammar; and
  
  wherein a language model is used containing at least a recognition grammar built up by at least a low-perplexity part and a high-perplexity part, each of which being representative for distinct low- and high-perplexity classes of speech elements; and
  
  wherein word classes are used as classes for speech elements or fragments.

12. The apparatus of claim 11, wherein said semantic information includes information relating to grammatical constraints among said sub-phrases.

13. The apparatus of claim 12, wherein said information relating to grammatical constraints include grammatical constraints for a name of a city.

14. The apparatus of claim 13, wherein said pragmatic information includes a 5-digit postal code for the city.

15. The apparatus of claim 11, wherein said seed sub-phrase recognized with an appropriate high degree of reliability is defined as a low perplexity part of said received speech phrase.

16. The apparatus of claim 15, wherein perplexity is defined as the complexity of the depth of search which has to be accomplished in conventional search graphs or search trees.
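
The city name and 5-digit postal code of claims 13 and 14, combined with the candidate deletion of claims 5 to 7, suggest the following Python sketch. It is an assumption-laden illustration rather than the claimed apparatus: the hypothesis graph is reduced to a mapping from slot names to candidate sets, the city names and codes in `CITY_CODES` are invented, and the postal code is assumed to be the reliably recognized seed.

```python
import re

POSTAL_CODE = re.compile(r"\d{5}")

# Invented city and postal-code pairs; not a real directory.
CITY_CODES = {"Northtown": {"10001"}, "Northton": {"20002"}}


def consistent(slot_a, value_a, slot_b, value_b):
    """Pragmatic constraint: the code must have 5 digits and belong to the city."""
    pair = {slot_a: value_a, slot_b: value_b}
    code = pair["code"]
    return POSTAL_CODE.fullmatch(code) is not None and code in CITY_CODES.get(pair["city"], set())


def prune(graph, seed_slot, seed_value, is_consistent):
    """Fix the seed, then delete candidates that no remaining neighbour supports."""
    graph = {slot: set(cands) for slot, cands in graph.items()}
    graph[seed_slot] = {seed_value}
    changed = True
    while changed:
        changed = False
        for slot, cands in graph.items():
            for cand in list(cands):
                supported = all(
                    any(is_consistent(slot, cand, other, o) for o in others)
                    for other, others in graph.items()
                    if other != slot
                )
                if not supported:
                    cands.discard(cand)
                    changed = True
    return graph


def is_unbranched(graph):
    """True when every slot has exactly one surviving candidate."""
    return all(len(cands) == 1 for cands in graph.values())


# Hypothetical hypothesis graph: two confusable city names, two candidate codes.
graph = {"city": {"Northtown", "Northton"}, "code": {"10001", "20002"}}
pruned = prune(graph, "code", "20002", consistent)
print(pruned, is_unbranched(pruned))
```

Fixing the code as the seed removes the inconsistent city candidate, leaving one candidate per slot. The input `graph` is copied, so the original candidate sets are left unchanged.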


Current Assignee: Sony International Europe Gmbh (Sony Group Corp.)
Original Assignee: Sony International Europe Gmbh (Sony Group Corp.)
Inventors: Lucke, Helmut
Primary Examiner(s): Hudspeth; David
Assistant Examiner(s): Jackson; Jakieda R.
Application Number: US09/734,228
Publication Number: US 20010016816A1
Time in Patent Office: 2,360 Days
Field of Search: 704/254, 704/256, 704/257, 704/275, 704/248, 704/221, 704/1, 704/252, 704/10, 701/10, 379/88.03
US Class Current: 704/257
CPC Class Codes: G10L 15/1815 Semantic context, e.g. disa...
