METHODS AND APPARATUS FOR NATURAL SPOKEN LANGUAGE SPEECH RECOGNITION

US 20080221873A1
Filed: 03/10/2008
Published: 09/11/2008
Est. Priority Date: 07/11/2000
Status: Active Grant

Abstract

A word prediction apparatus and method that improve prediction accuracy, and a speech recognition method and apparatus that use them, are provided. To predict a sixth word “?”, the method first predicts which partial analysis trees have a modification relationship with that word. The preceding sequence “sara-ni sho-senkyoku no” has two partial analysis trees, “sara-ni” and “sho-senkyoku no”. It is predicted that “sara-ni” does not have a modification relationship with the sixth word, and that “sho-senkyoku no” does. The sixth word, “donyu”, is then predicted from “sho-senkyoku no”. In this example, since “sara-ni” carries no useful information for predicting “donyu”, it is preferable that “donyu” be predicted from “sho-senkyoku no” alone.
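The selection step in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `train_counts` and `predict_next` are hypothetical, the modification flags are supplied by hand (the abstract says they are themselves predicted), and every word and count other than “sara-ni”, “sho-senkyoku no”, and “donyu” is invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical training pairs: (partial analysis tree, word it modifies).
# Only "sara-ni", "sho-senkyoku no" and "donyu" come from the abstract;
# the other words and all counts are invented for illustration.
TOY_PAIRS = [
    ("sho-senkyoku no", "donyu"),
    ("sho-senkyoku no", "donyu"),
    ("sho-senkyoku no", "seido"),
    ("sara-ni", "susumeru"),
    ("sara-ni", "susumeru"),
    ("sara-ni", "susumeru"),
]


def train_counts(pairs):
    """Count how often each word follows each modifying partial tree."""
    counts = defaultdict(Counter)
    for tree, word in pairs:
        counts[tree][word] += 1
    return counts


def predict_next(counts, partial_trees, modifies):
    """Predict the next word from only the trees flagged as modifying it."""
    scores = Counter()
    for tree, is_modifier in zip(partial_trees, modifies):
        if not is_modifier:
            continue  # skip trees with no modification relationship
        seen = counts.get(tree, Counter())
        total = sum(seen.values())
        for word, n in seen.items():
            scores[word] += n / total  # relative appearance frequency
    return max(scores, key=scores.get) if scores else None


counts = train_counts(TOY_PAIRS)
trees = ["sara-ni", "sho-senkyoku no"]
print(predict_next(counts, trees, [False, True]))  # prints "donyu"
print(predict_next(counts, trees, [True, True]))   # prints "susumeru"
```

With these toy counts, including “sara-ni” as context changes the prediction, which is the kind of interference the abstract describes avoiding.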

19 Claims

1. A speech recognition apparatus comprising:
- an acoustic processor which converts an input analog speech signal into a digital signal;
  
  a first storer which stores an acoustic model that has learned a feature of speech;
  
  a second storer which stores a dictionary wherein an appearance frequency of a predetermined word relative to another predetermined word and/or word sequence is written; and
  
  a recognizer which uses said acoustic model and said dictionary to calculate a probability value for said digital signal, and which recognizes a word having the maximum probability value as input speech, wherein said recognizer predicts a word to be predicted based on a structure of a sentence including said word, and employs said appearance frequency to calculate said probability value for said sentence, including said word that is predicted.
  - 2. The speech recognition apparatus according to claim 1 further comprising:
    - an arrangement which returns at least said recognized word to a user as a recognition result.
  - 3. The speech recognition apparatus according to claim 1 wherein said sentence structure comprises:
    - a word and/or word sequence; and
      
      said word to be predicted;
      
      wherein, when said recognizer predicts said word to be predicted based on said structure of a sentence including said word, said recognizer employs only a word and/or word sequence that has a modification relationship with said word to be predicted.
  - 4. The speech recognition apparatus according to claim 1 wherein said sentence structure constitutes a partial analysis tree.
  - 5. The speech recognition apparatus according to claim 3 wherein said modification relationship includes a modification direction.
  - 6. The speech recognition apparatus according to claim 3 wherein, when multiple modifications are established between said word to be predicted and said word and/or word sequence, a word is predicted for each of said modifications.
  - 7. The speech recognition apparatus according to claim 5 wherein, when said modification direction is not constant, said recognizer specifies a modification direction.
  - 8. The speech recognition apparatus according to claim 2 wherein, when said recognizer predicts the last word of a sentence, said arrangement which returns at least said recognized word to a user as a recognition result returns the entire sentence as recognition results to said user.
  - 9. The speech recognition apparatus according to claim 2 further comprising:
    - a storage medium which stores said recognition results.
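The recognizer of claim 1 combines an acoustic model with appearance frequencies from the dictionary and picks the word with the maximum probability value. A minimal sketch of that combination follows. The function name `recognize`, the add-one smoothing, the acoustic probabilities, and the candidate word “doryo” are all assumptions made for illustration; the claims do not specify how the two scores are combined.

```python
import math


def recognize(candidates, acoustic_prob, appearance_freq, context):
    """Return the candidate with the highest combined log probability.

    acoustic_prob maps word -> P(signal | word), assumed to be > 0.
    appearance_freq maps (context, word) -> count, where context is the
    word or word sequence that modifies the word being predicted.
    """
    total = sum(appearance_freq.get((context, w), 0) for w in candidates)
    best_word, best_score = None, -math.inf
    for word in candidates:
        freq = appearance_freq.get((context, word), 0)
        # Add-one smoothing (an assumption) keeps unseen pairs above zero.
        lm_prob = (freq + 1) / (total + len(candidates))
        score = math.log(acoustic_prob[word]) + math.log(lm_prob)
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# Hypothetical numbers: the acoustic model slightly prefers "doryo",
# but the appearance frequency after "sho-senkyoku no" favors "donyu".
acoustic = {"donyu": 0.4, "doryo": 0.6}
freq = {("sho-senkyoku no", "donyu"): 8}
print(recognize(["donyu", "doryo"], acoustic, freq, "sho-senkyoku no"))  # prints "donyu"
```

Working in log space avoids underflow when many word probabilities are multiplied across a sentence.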

10. A speech recognition method comprising:
- converting an input analog speech signal into a digital signal;
  
  storing an acoustic model that has learned a feature of speech;
  
  storing a dictionary wherein an appearance frequency of a predetermined word relative to another predetermined word and/or word sequence is written; and
  
  recognizing, using said acoustic model and said dictionary to calculate a probability value for said digital signal, a word having the maximum probability value as input speech, wherein said recognizing further comprises:
  
  predicting a word to be predicted based on a structure of a sentence including said word; and
  
  employing said appearance frequency to calculate said probability value for said sentence, including said word that is predicted.
  - 11. The method according to claim 10 further comprising:
    - returning at least said recognized word to a user as a recognition result.
  - 12. The method according to claim 10 wherein said sentence structure comprises:
    - a word and/or word sequence; and
      
      said word to be predicted; and
      
      wherein, said predicting step further comprises employing only a word and/or word sequence that has a modification relationship with said word to be predicted.
  - 13. The method according to claim 10 wherein said sentence structure constitutes a partial analysis tree.
  - 14. The method according to claim 12 wherein said modification relationship includes a modification direction.
  - 15. The method according to claim 12 wherein, when multiple modifications are established between said word to be predicted and said word and/or word sequence, a word is predicted for each of said modifications.
  - 16. The method according to claim 15 wherein, when said modification direction is not constant, said recognizer specifies a modification direction.
  - 17. The method according to claim 11 wherein, when said recognizer predicts the last word of a sentence, said arrangement which returns at least said recognized word to a user as a recognition result returns the entire sentence as recognition results to said user.
  - 18. The method according to claim 11 further comprising:
    - storing said recognition results in a memory.

19. A program storage device readable by computer, tangibly embodying a program of instructions executable by the computer to perform method steps for speech recognition, said method comprising the steps of:
- converting an input analog speech signal into a digital signal;
  
  storing an acoustic model that has learned a feature of speech;
  
  storing a dictionary wherein an appearance frequency of a predetermined word relative to another predetermined word and/or word sequence is written; and
  
  recognizing, using said acoustic model and said dictionary to calculate a probability value for said digital signal, a word having the maximum probability value as input speech, wherein said recognizing further comprises:
  
  predicting a word to be predicted based on a structure of a sentence including said word; and
  
  employing said appearance frequency to calculate said probability value for said sentence, including said word that is predicted.
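
One common way to write the probability calculation recited in the independent claims is sketched below. The notation is an assumption and does not appear in the claims: $X$ is the digital signal, $W = w_1 \dots w_n$ is a candidate sentence, and $T_i$ is the set of words and/or word sequences that have a modification relationship with $w_i$. The factor $P(X \mid W)$ would come from the acoustic model, and each $P(w_i \mid T_i)$ would be estimated from the appearance frequencies stored in the dictionary.

```latex
\hat{W} = \arg\max_{W} \; P(X \mid W)\, P(W),
\qquad
P(W) \approx \prod_{i=1}^{n} P\bigl(w_i \mid T_i\bigr)
```

Conditioning each word on $T_i$ rather than on a fixed window of preceding words is what distinguishes this form from an ordinary n-gram model.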

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
International Business Machines Corporation
Inventors
Mori, Shinsuke; Nishimura, Masafumi; Itoh, Nobuyasu

Granted Patent

US 8,150,693 B2
Field of Search
US Class Current

704/9
CPC Class Codes

G10L 15/19 Grammatical context, e.g. d...
