Apparatus, method and computer program product for recognizing speech

US 7,974,844 B2
Filed: 03/01/2007
Issued: 07/05/2011
Est. Priority Date: 03/24/2006
Status: Expired due to Fees

4 Assignments

0 Petitions

Abstract

A speech recognition apparatus includes a first-candidate selecting unit that selects a recognition result of a first speech from first recognition candidates based on the likelihood of the first recognition candidates; a second-candidate selecting unit that extracts recognition candidates of an object word contained in the first speech and recognition candidates of a clue word from second recognition candidates, acquires a relevance ratio associated with the semantic relation between the extracted recognition candidates of the object word and the extracted recognition candidates of the clue word, and selects a recognition result of the second speech based on the acquired relevance ratio; a correction-portion identifying unit that identifies a portion corresponding to the object word in the first speech; and a correcting unit that corrects the word in the identified portion.
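The relevance-ratio selection described above can be illustrated with a minimal Python sketch. It assumes the semantic-relation storage unit is a dictionary keyed by unordered word pairs and simply returns the object-word/clue-word pair with the maximum relevance ratio (the behavior recited in claim 2). The names `SemanticStore`, `relevance` and `select_by_relevance`, and the example words, are hypothetical; this is an illustration, not the patentee's implementation.

```python
from __future__ import annotations

from itertools import product

# Hypothetical semantic-relation store: unordered word pair -> relevance ratio.
# frozenset keys make the lookup symmetric: (a, b) and (b, a) hit the same entry.
SemanticStore = dict[frozenset[str], float]


def relevance(store: SemanticStore, a: str, b: str) -> float:
    """Return the stored relevance ratio for a word pair, or 0.0 if none is stored."""
    return store.get(frozenset((a, b)), 0.0)


def select_by_relevance(
    object_candidates: list[str],
    clue_candidates: list[str],
    store: SemanticStore,
) -> tuple[tuple[str, str] | None, float]:
    """Return the (object word, clue word) pair with the maximum relevance ratio.

    Ties keep the first pair encountered; empty inputs return (None, -1.0).
    """
    best_pair = None
    best_ratio = -1.0
    for obj, clue in product(object_candidates, clue_candidates):
        ratio = relevance(store, obj, clue)
        if ratio > best_ratio:
            best_pair, best_ratio = (obj, clue), ratio
    return best_pair, best_ratio


store: SemanticStore = {
    frozenset(("bass", "fish")): 0.9,
    frozenset(("base", "fish")): 0.1,
}
print(select_by_relevance(["base", "bass"], ["fish", "dish"], store))  # (('bass', 'fish'), 0.9)
```

Here the acoustically confusable object-word candidates "base" and "bass" are disambiguated by the clue word "fish", because only the pair ("bass", "fish") has a high stored relevance ratio.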

449 Citations

19 Claims

1. A speech recognition apparatus comprising:
- a semantic-relation storage unit that stores semantic relation among words and relevance ratio indicating degree of the semantic relation in association with each other;
  
  a first input accepting unit that accepts an input of a first speech;
  
  a first candidate producing unit that recognizes the first speech and produces first recognition candidates and first likelihood of the first recognition candidates, the first recognition candidates containing a phoneme-string candidate and a word candidate;
  
  a first-candidate selecting unit that selects one of the first recognition candidates as a recognition result of the first speech based on the first likelihood of the first recognition candidates;
  
  a second input accepting unit that accepts an input of a second speech including an object word and a clue word, wherein the first speech includes the object word, the first speech does not include the clue word, and the recognition result of the first speech does not include the object word, and wherein the clue word provides the clue for recognizing the object word and for correcting a portion of the recognition result of the first speech which corresponds to the object word;
  
  a second candidate producing unit that recognizes the second speech and produces second recognition candidates and second likelihood of the second recognition candidates;
  
  a word extracting unit that extracts recognition candidates of the object word and recognition candidates of the clue word from the second recognition candidates;
  
  a second-candidate selecting unit that acquires the relevance ratio associated with the semantic relation between the extracted recognition candidates of the object word and the extracted recognition candidates of the clue word, from the semantic-relation storage unit, and selects one of the second recognition candidates as a recognition result of the second speech based on the acquired relevance ratio;
  
  a correction-portion identifying unit that compares a phoneme-string contained in the recognition result of the first speech with a phoneme-string contained in the recognition candidates of the object word extracted by the word extracting unit, and identifies a portion corresponding to the object word; and
  
  a correcting unit that corrects the identified portion corresponding to the object word with a portion that contains the object word and that is contained in the recognition result of the second speech.
  - 2. The speech recognition apparatus according to claim 1, wherein the recognition candidates of the object word include first words, the recognition candidates of the clue word include second words, and the second-candidate selecting unit selects, from the first words and the second words respectively, a first word and a second word for which the relevance ratio associated with the semantic relation between the first word and the second word is maximum, and selects the recognition result of the second speech that includes the selected first word and the selected second word.
  - 3. The speech recognition apparatus according to claim 1, further comprising:
    - a language model storage unit that stores therein language models that associate a connection relation among words with degree of the connection relation, wherein the second-candidate selecting unit further acquires the degree of the connection relation associated with the connection relation between the extracted recognition candidates of the object word and the extracted recognition candidates of the clue word, and selects the recognition result of the second speech based on the acquired degree of the connection relation and the relevance ratio.
  - 4. The speech recognition apparatus according to claim 1, wherein the second-candidate selecting unit selects the recognition result of the second speech based on the second likelihood of the second recognition candidates and the relevance ratio.
  - 5. The speech recognition apparatus according to claim 1, further comprising:
    - a word-dictionary storage unit that stores words and an appearance probability of the words associated with each other, wherein the second-candidate selecting unit further acquires the appearance probability associated with the recognition candidates of the object word, and selects the recognition result of the second speech based on the acquired appearance probability and the relevance ratio.
  - 6. The speech recognition apparatus according to claim 1, wherein the semantic-relation storage unit stores a hierarchical relation of semantic contents among the words and the relevance ratio associated with each other, and the second-candidate selecting unit acquires from the semantic-relation storage unit the relevance ratio associated with the hierarchical relation of semantic contents between the extracted recognition candidates of the object word and the extracted recognition candidates of the clue word, and selects the recognition result of the second speech based on the acquired relevance ratio.
  - 7. The speech recognition apparatus according to claim 1, wherein the semantic-relation storage unit stores at least one of synonym relation and quasi-synonym relation among words as the semantic relation associated with the relevance ratio.
  - 8. The speech recognition apparatus according to claim 1, wherein the semantic-relation storage unit stores a co-occurrence relation indicating that a plurality of words appear together and a co-occurrence probability indicating the probability that the co-occurrence relation occurs, in association with each other, and the second-candidate selecting unit acquires from the semantic-relation storage unit the co-occurrence probability associated with the co-occurrence relation between the extracted recognition candidates of the object word and the extracted recognition candidates of the clue word, and selects the recognition result of the second speech based on the acquired co-occurrence probability.
  - 9. The speech recognition apparatus according to claim 1, wherein the correcting unit corrects the identified portion corresponding to the object word with the word selected by the second-candidate selecting unit from the recognition candidates of the object word.
  - 10. The speech recognition apparatus according to claim 1, wherein the correcting unit corrects the identified portion corresponding to the object word with the recognition result of the second speech selected by the second-candidate selecting unit.
  - 11. The speech recognition apparatus according to claim 1, further comprising:
    - a display unit that displays the recognition result of the first speech; and
      
      a correction-portion specifying unit that specifies a correction portion in the recognition result of the first speech displayed on the display unit, wherein the correction-portion identifying unit identifies a portion corresponding to the object word in the first speech from a predetermined range at least one of before and after the specified correction portion.
  - 12. The speech recognition apparatus according to claim 11, wherein the second input accepting unit accepts a speech input after the correction portion is specified as an input of the second speech.
  - 13. The speech recognition apparatus according to claim 1, wherein the first input accepting unit accepts, as the first speech, a speech input when a first button is pressed, and the second input accepting unit accepts, as the second speech, a speech input when a second button is pressed.
  - 14. The speech recognition apparatus of claim 1 further comprising a speech receiving device for receiving one of the first speech and the second speech.
  - 15. The speech recognition apparatus of claim 1 further comprising an output device for outputting the recognition result.
  - 16. The speech recognition apparatus of claim 15 wherein the output device is one of a visual output device and an audio output device.
  - 17. The speech recognition apparatus of claim 1 further comprising a trigger device for one of triggering the first input accepting unit to accept the input of the first speech and triggering the second input accepting unit to accept the input of the second speech.
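Claims 3 to 5 and 8 let further factors (the second likelihood, a language-model connection degree, a word appearance probability, a co-occurrence probability) take part in selecting the recognition result of the second speech, but they do not specify how the factors are combined. One plausible choice, assumed here and not taken from the patent, is a weighted log-linear score. The `Candidate` fields, the weights and the probability floor below are hypothetical, and every factor is assumed to be a probability-like value in (0, 1].

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One hypothetical second-speech recognition candidate and its factors."""
    object_word: str
    clue_word: str
    likelihood: float         # recognizer likelihood of the candidate (claim 4)
    relevance_ratio: float    # from the semantic-relation store (claim 1)
    connection_degree: float  # language-model connection degree (claim 3)
    appearance_prob: float    # word-dictionary appearance probability (claim 5)


# Hypothetical weights: the claims do not say how the factors are combined.
WEIGHTS = {
    "likelihood": 1.0,
    "relevance_ratio": 1.0,
    "connection_degree": 0.5,
    "appearance_prob": 0.5,
}
FLOOR = 1e-9  # keeps log() finite when a factor is zero


def score(c: Candidate, weights: dict[str, float] = WEIGHTS) -> float:
    """Log-linear score: weighted sum of the logs of the selected factors."""
    return sum(w * math.log(max(getattr(c, name), FLOOR)) for name, w in weights.items())


def select_second_result(candidates: list[Candidate], weights: dict[str, float] = WEIGHTS) -> Candidate:
    """Return the candidate with the highest combined score."""
    return max(candidates, key=lambda c: score(c, weights))


base = Candidate("base", "fish", likelihood=0.6, relevance_ratio=0.1, connection_degree=0.2, appearance_prob=0.3)
bass = Candidate("bass", "fish", likelihood=0.4, relevance_ratio=0.9, connection_degree=0.2, appearance_prob=0.3)
print(select_second_result([base, bass]).object_word)  # bass
print(select_second_result([base, bass], {"likelihood": 1.0}).object_word)  # base
```

Passing `{"likelihood": 1.0}` as the weights reduces the choice to the likelihood-only selection used for the first speech, which makes the effect of the semantic factors easy to see: the acoustically preferred "base" loses once the relevance ratio to the clue word "fish" is taken into account.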

18. A speech recognition method executed by a processor, the method comprising:
- accepting a first speech;
  
  recognizing, by the processor, the accepted first speech to produce first recognition candidates and first likelihood of the first recognition candidates, the first recognition candidates containing a phoneme-string candidate and a word candidate;
  
  selecting, by the processor, one of the first recognition candidates produced for the first speech as the recognition result of the first speech based on the first likelihood of the first recognition candidates;
  
  accepting, by the processor, a second speech that includes an object word and a clue word, wherein the first speech includes the object word, the first speech does not include the clue word, and the recognition result of the first speech does not include the object word, and wherein the clue word provides the clue for recognizing the object word and for correcting a portion of the recognition result of the first speech which corresponds to the object word;
  
  recognizing, by the processor, the accepted second speech to produce second recognition candidates and second likelihood of the second recognition candidates;
  
  extracting, by the processor, recognition candidates of the object word and recognition candidates of the clue word from the produced second recognition candidates;
  
  acquiring, by the processor, a relevance ratio associated with the semantic relation between the extracted recognition candidates of the object word and the extracted recognition candidates of the clue word from a semantic-relation storage unit that stores therein semantic relation among words and relevance ratio indicating degree of the semantic relation in association with each other;
  
  selecting, by the processor, one of the second recognition candidates as the recognition result of the second speech based on the acquired relevance ratio;
  
  comparing, by the processor, a phoneme-string contained in the recognition result of the first speech with a phoneme-string contained in the extracted recognition candidates of the object word;
  
  identifying, by the processor, a portion corresponding to the object word in the first speech; and
  
  correcting, by the processor, the identified portion corresponding to the object word with a portion that contains the object word and that is contained in the recognition result of the second speech.
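The method does not state how the phoneme strings are compared. A minimal sketch, assuming Levenshtein (edit) distance over phoneme symbols and a first recognition result represented as a list of (word, phonemes) pairs, searches every run of up to `max_words` consecutive words for the one whose concatenated phonemes are closest to the object word, then splices the corrected word into that position. The function names, the ARPAbet-like phoneme symbols and `max_words` are illustrative assumptions.

```python
from __future__ import annotations

Phonemes = list[str]
Result = list[tuple[str, Phonemes]]  # (word, phoneme string) pairs


def edit_distance(a: Phonemes, b: Phonemes) -> int:
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete pa
                           cur[j - 1] + 1,             # insert pb
                           prev[j - 1] + (pa != pb)))  # substitute or match
        prev = cur
    return prev[-1]


def find_correction_span(first: Result, target: Phonemes, max_words: int = 3) -> tuple[int, int]:
    """Return the (start, end) word slice whose phonemes are closest to target."""
    if not first:
        raise ValueError("empty recognition result")
    best, best_dist = (0, 1), float("inf")
    for start in range(len(first)):
        joined: Phonemes = []
        for end in range(start, min(start + max_words, len(first))):
            joined += first[end][1]  # extend the span by one word
            d = edit_distance(joined, target)
            if d < best_dist:        # ties keep the first span visited
                best, best_dist = (start, end + 1), d
    return best


def correct(first: Result, word: str, target: Phonemes, max_words: int = 3) -> Result:
    """Replace the best-matching span of the first result with the corrected word."""
    start, end = find_correction_span(first, target, max_words)
    return first[:start] + [(word, target)] + first[end:]


first = [("I", ["ay"]), ("caught", ["k", "ao", "t"]), ("a", ["ah"]), ("base", ["b", "ey", "s"])]
print(" ".join(w for w, _ in correct(first, "bass", ["b", "ae", "s"])))  # I caught a bass
```

For the first result "I caught a base" and the object word "bass" with phonemes `["b", "ae", "s"]`, the closest span is the single word "base" (distance 1), so the corrected result reads "I caught a bass". A real system would more likely use phoneme-confusion-weighted costs rather than unit costs.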

19. A computer program product having a non-transitory computer readable medium storing therein programmed instructions for recognizing speech, wherein the instructions, when executed by a computer, cause the computer to perform:
- accepting a first speech;
  
  recognizing the accepted first speech to produce first recognition candidates and first likelihood of the first recognition candidates, the first recognition candidates containing a phoneme-string candidate and a word candidate;
  
  selecting one of the first recognition candidates produced for the first speech as the recognition result of the first speech based on the first likelihood of the first recognition candidates;
  
  accepting a second speech that includes an object word and a clue word, wherein the first speech includes the object word, the first speech does not include the clue word, and the recognition result of the first speech does not include the object word, and wherein the clue word provides the clue for recognizing the object word and for correcting a portion of the recognition result of the first speech which corresponds to the object word;
  
  recognizing the accepted second speech to produce second recognition candidates and second likelihood of the second recognition candidates;
  
  extracting recognition candidates of the object word and recognition candidates of the clue word from the produced second recognition candidates;
  
  acquiring a relevance ratio associated with the semantic relation between the extracted recognition candidates of the object word and the extracted recognition candidates of the clue word from a semantic-relation storage unit that stores therein semantic relation among words and relevance ratio indicating degree of the semantic relation in association with each other;
  
  selecting one of the second recognition candidates as the recognition result of the second speech based on the acquired relevance ratio;
  
  comparing a phoneme-string contained in the recognition result of the first speech with a phoneme-string contained in the extracted recognition candidates of the object word;
  
  identifying a portion corresponding to the object word in the first speech; and
  
  correcting the identified portion corresponding to the object word with a portion that contains the object word and that is contained in the recognition result of the second speech.

Current Assignee
Kabushiki Kaisha Toshiba (Toshiba Corporation), Toshiba Digital Solutions Corporation (Toshiba Corporation)
Original Assignee
Kabushiki Kaisha Toshiba (Toshiba Corporation)
Inventors
Sumita, Kazuo
Primary Examiner(s)
Dorvil; Richemond
Assistant Examiner(s)
Borsetti; Greg

Application Number
US11/712,412
Publication Number
US 20070225980A1
Time in Patent Office
1,587 Days
Field of Search
704/4, 704/9, 704/237, 704/257
US Class Current
704/257
CPC Class Codes
G10L 15/1815 Semantic context, e.g. disa...
G10L 15/22 Procedures used during a sp...
