Speech recognition device and speech recognition method and recording medium utilizing preliminary word selection

US 7,881,935 B2
Filed: 02/16/2001
Issued: 02/01/2011
Est. Priority Date: 02/28/2000
Status: Expired due to Fees

Abstract

A speech recognition apparatus that improves recognition accuracy while keeping the required resources from increasing. Words that are probable results of the speech recognition are selected on the basis of an acoustic score and a linguistic score. Word selection is also performed on the basis of a measure other than the acoustic score, such as the word having a small number of phonemes, belonging to a pre-set part of speech, being included in past results of speech recognition, or having a linguistic score not less than a pre-set value. The words so selected are subjected to matching processing.
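
The two selection routes described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Word`, `preselect`, `top_n`, `max_phonemes`, `forced_pos`, `history` and `min_language`, the log-domain scores, and the sample vocabulary are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    phonemes: int          # number of phonemes in the pronunciation
    part_of_speech: str


def preselect(words, acoustic, language, *, top_n=2, max_phonemes=2,
              forced_pos=frozenset({"particle"}), history=frozenset(),
              min_language=-1.0):
    """Return the words to hand to matching processing, in dictionary order."""
    # Route 1: words with the best combined acoustic + linguistic score.
    ranked = sorted(words,
                    key=lambda w: acoustic[w.text] + language[w.text],
                    reverse=True)
    first = set(ranked[:top_n])
    # Route 2: words chosen by measures that ignore the acoustic score.
    second = {
        w for w in words
        if w.phonemes <= max_phonemes          # small number of phonemes
        or w.part_of_speech in forced_pos      # pre-set part of speech
        or w.text in history                   # seen in past results
        or language[w.text] >= min_language    # linguistic score high enough
    }
    return [w for w in words if w in first or w in second]


# Hypothetical vocabulary and log-domain scores (illustrative values only).
vocab = [
    Word("recognition", 10, "noun"),
    Word("wreck", 3, "noun"),
    Word("a", 1, "article"),
    Word("nation", 5, "noun"),
    Word("ignition", 7, "noun"),
]
acoustic = {"recognition": -10.0, "wreck": -12.0, "a": -30.0,
            "nation": -14.0, "ignition": -11.0}
language = {"recognition": -2.0, "wreck": -6.0, "a": -1.5,
            "nation": -5.0, "ignition": -6.5}

selected = [w.text for w in preselect(vocab, acoustic, language)]
print(selected)
```

In this sketch the short word "a" has a poor acoustic score and would be dropped by the score-based route alone. It survives only because of the phoneme-count rule, which is the situation the abstract describes.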

44 Citations

6 Claims

1. A speech recognition apparatus in which a score reflecting an acoustic likelihood of the results of speech recognition of an input speech is calculated and in which the speech is recognized based on the score, comprising:
- input means to receive the input speech;
  
  extraction means for extracting characteristic values of said input speech, the input speech comprising a plurality of input words;
  
  characteristic value storage means for storing characteristic values and an extraction time point associated with each characteristic value;
  
  word concatenation information storage means for storing word concatenation information that is the relation of input words of a word sequence representing the results of speech recognition and includes acoustic scores, linguistic scores, time points of the beginning end and terminal end of speech portions associated with respective input words;
  
  selection means for selecting one or more candidate first words from the plurality of input words to be processed by speech recognition processing, based on the concatenation information, and a word score that represents an evaluation of acoustic scores and language scores calculated using said characteristic values, and for selecting one or more candidate second words from the plurality of input words not based on the acoustic score, the candidate second words having unstable acoustic characteristic values with a number of phonemes and syllables less than a preset value, wherein the selection means is configured to select said candidate second words likely to be concatenated linguistically to a directly previously recognized word and configured to subject the concatenated candidate first and candidate second words to the matching processing;
  
  score calculation means for calculating said score of said concatenated candidate first and candidate second words selected by said selection means referencing concatenation information of said first and second words; and
  
  finalizing means for finalizing a word string, as the recognition result of said speech, based on said score, wherein the selected one or more candidate first words based on the word score have a number of phonemes and syllables above the number of phonemes and syllables of the candidate second words and more stable acoustic characteristic values than the selected one or more candidate second words, wherein the word concatenation information is sequentially updated based on the score.

2. The speech recognition apparatus according to claim 1 further comprising:
- storage means for memorizing the results of speech recognition;
  
  wherein said selection means selects, as said candidate second words, the input words included in the results of speech recognition memorized in said storage means, with a stored state in said storage means as a measure not based on the non-acoustic score.

3. The speech recognition apparatus according to claim 2 further comprising:
- inputting means for providing an input for correcting the results of speech recognition;
  
  wherein said storage means stores the results of the speech recognition corrected by the input from said inputting means.

4. The speech recognition apparatus according to claim 1 wherein said selection means calculates said score using characteristic values of the speech to select said candidate first word based on said score.
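
The word concatenation information of claim 1 (words linked by their beginning and terminal time points, each carrying an acoustic and a linguistic score) and the finalizing of a word string can be pictured with the sketch below. It is an illustration under assumptions, not the claimed apparatus: the `Arc` record, the `finalize` function, the frame-index time points, the log-domain scores and the sample arcs are all hypothetical, and the sketch scores a finished set of arcs in one pass rather than updating the information sequentially as the claim requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    word: str
    start: int        # time point (frame index) where the word begins
    end: int          # time point where the word ends
    acoustic: float   # acoustic score (log domain)
    language: float   # linguistic score (log domain)


def finalize(arcs, final_time):
    """Return (score, word string) of the best path from time 0 to final_time."""
    best = {0: (0.0, [])}   # time point -> (cumulative score, words so far)
    # Arcs are visited by start time. Every arc has end > start, so all arcs
    # that end at time t are processed before any arc that starts at t.
    for arc in sorted(arcs, key=lambda a: a.start):
        if arc.start not in best:
            continue  # no partial hypothesis reaches this arc
        score, words = best[arc.start]
        cand = (score + arc.acoustic + arc.language, words + [arc.word])
        if arc.end not in best or cand[0] > best[arc.end][0]:
            best[arc.end] = cand
    return best.get(final_time)


# Hypothetical concatenation information for a 100-frame utterance.
arcs = [
    Arc("speech", 0, 40, -20.0, -3.0),
    Arc("peach", 0, 40, -24.0, -4.0),
    Arc("recognition", 40, 100, -35.0, -2.0),
    Arc("wreck", 40, 60, -15.0, -6.0),
    Arc("a", 60, 65, -9.0, -1.0),
    Arc("nation", 65, 100, -18.0, -5.0),
]

result = finalize(arcs, final_time=100)
print(result)
```

Storing the time points with each arc is what allows a newly matched word to be attached to the hypothesis whose terminal end coincides with its beginning end.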

5. A speech recognition method in which a score reflecting an acoustic likelihood of the results of speech recognition of an input speech is calculated and in which the speech is recognized based on the score, comprising:
- an extraction step of extracting characteristic values of said input speech, said input speech comprising a plurality of input words;
  
  a first storing step of storing characteristic values and an extraction time point associated with each characteristic value;
  
  a second storing step of storing word concatenation information that is the relation of input words of a word sequence representing the results of speech recognition and includes acoustic scores, linguistic scores, time points of the beginning end and terminal end of speech portions associated with respective input words;
  
  a selection step of selecting one or more candidate first words from the plurality of input words to be processed by speech recognition processing, based on the concatenation information, and a word score that represents an evaluation of acoustic scores and language scores calculated using said characteristic values, and for selecting one or more candidate second words from the plurality of input words not based on the acoustic score, the candidate second words having unstable acoustic characteristic values with a number of phonemes and syllables less than a preset value, wherein the selection step selects said candidate second words likely to be concatenated linguistically to a directly previously recognized word and to subject the concatenated candidate first and candidate second words to the matching processing;
  
  a score calculation step of calculating said score of said candidate first and candidate second words selected by said selection step referencing concatenation information of said first and second words; and
  
  a finalizing step of finalizing a word string, as the recognition result of said speech, based on said score, wherein the selected one or more candidate first words based on the word score have a number of phonemes and syllables above the number of phonemes and syllables of the candidate second words and more stable acoustic characteristic values than the selected one or more candidate second words, wherein the word concatenation information is sequentially updated based on the score.
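
The first storing step of claim 5 pairs each characteristic value with its extraction time point. A minimal sketch of such a store is given below. The class name `FeatureStore`, its methods, and the idea of reading back all values from a given time point onward (for example, from the terminal end of a previously recognized word) are assumptions made for illustration and are not taken from the claim text.

```python
from bisect import bisect_left


class FeatureStore:
    """Keeps characteristic values together with their extraction time points."""

    def __init__(self):
        self._times = []      # extraction time points, strictly increasing
        self._features = []   # characteristic value for each time point

    def append(self, time_point, feature):
        # Values are assumed to arrive in increasing time order.
        if self._times and time_point <= self._times[-1]:
            raise ValueError("time points must be strictly increasing")
        self._times.append(time_point)
        self._features.append(feature)

    def since(self, start_time):
        """Return the characteristic values extracted at or after start_time."""
        i = bisect_left(self._times, start_time)
        return self._features[i:]


# Hypothetical two-dimensional feature vectors extracted every 10 time units.
store = FeatureStore()
for t, vec in [(0, (0.1, 0.2)), (10, (0.3, 0.1)), (20, (0.0, 0.4)), (30, (0.2, 0.2))]:
    store.append(t, vec)

tail = store.since(15)
print(tail)
```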

6. A non-transitory computer-readable medium having recorded thereon a program for causing a computer to perform speech recognition processing in which a score reflecting an acoustic likelihood of the results of speech recognition of an input speech is calculated and in which the speech is recognized based on the score, comprising:
- an extraction step of extracting characteristic values of said input speech, said input speech comprising a plurality of input words;
  
  a first storing step of storing characteristic values and an extraction time point associated with each characteristic value;
  
  a second storing step of storing word concatenation information that is the relation of input words of a word sequence representing the results of speech recognition and includes acoustic scores, linguistic scores, time points of the beginning end and terminal end of speech portions associated with respective input words;
  
  a selection step of selecting one or more candidate first words from the plurality of input words to be processed by speech recognition processing, based on the concatenation information, and a word score that represents an evaluation of acoustic scores and language scores calculated using said characteristic values, and for selecting one or more candidate second words from the plurality of input words not based on the acoustic score, the candidate second words having unstable acoustic characteristic values with a number of phonemes and syllables less than a preset value, wherein the selection step selects said candidate second words likely to be concatenated linguistically to a directly previously recognized word and to subject the concatenated candidate first and candidate second words to the matching processing;
  
  a score calculation step of calculating said score of said candidate first and candidate second words selected by said selection step referencing concatenation information of said first and second words; and
  
  a finalizing step of finalizing a word string, as the recognition result of said speech, based on said score, wherein the selected one or more candidate first words based on the word score have a number of phonemes and syllables above the number of phonemes and syllables of the candidate second words and more stable acoustic characteristic values than the selected one or more candidate second words, wherein the word concatenation information is sequentially updated based on the score.

Current Assignee: Sony Corporation (Sony Group Corp.)
Original Assignee: Sony Corporation (Sony Group Corp.)
Inventors: Asano, Yasuharu; Minamino, Katsuki; Ogawa, Hiroaki; Lucke, Helmut
Primary Examiner(s): Wozniak, James S
Assistant Examiner(s): He, Jialong
Application Number: US10/019,125
Publication Number: US 20020173958A1
Time in Patent Office: 3,637 Days
Field of Search: 704/251-252, 704/254-257
US Class Current: 704/252
CPC Class Codes:
G10L 15/08   Speech classification or se...
G10L 15/18   using natural language mode...
G10L 2015/085   Methods for reducing search...
