Speech recognition dividing words into two portions for preliminary selection

US 5,018,201 A
Filed: 11/30/1988
Issued: 05/21/1991
Est. Priority Date: 12/04/1987
Status: Expired due to Fees

Abstract

A speech recognition apparatus makes a preliminary selection of a number of candidate words from a vocabulary of words, one of which candidate words is most likely the spoken word to be recognized. For the preliminary selection, each candidate word is divided into first and second portions. For each portion of a word, there are stored probabilities of producing each label of a label alphabet during the utterance of that portion of the word. The speech to be recognized is also divided into first and second portions. A label string representing the speech to be recognized is generated, such that labels occur during the first or the second portion of the speech (or during a transition between the first and second portions). To determine the likelihood that the spoken word represents a word from the vocabulary, each label occurring during the first portion is assigned its "first portion" probability. Each label occurring during the second portion is assigned its "second portion" probability.
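
The preliminary selection described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function name `preselect`, the parameters `first_len`, `top_n` and `floor`, and the toy probability tables are all hypothetical. Summing log-probabilities (with a small floor for unseen labels) is also an assumption; the source only says that the probabilities are accumulated for each word.

```python
import math

def preselect(labels, p1, p2, first_len, top_n=2, floor=1e-6):
    """Rank vocabulary words by an accumulated per-label score.

    labels    : label string generated for the input speech, one label per frame
    p1, p2    : word -> {label: probability} tables for the first / second portion
    first_len : number of frames treated as the fixed-length first portion
    floor     : assumed minimum probability, so unseen labels do not give log(0)
    """
    scores = {}
    for word in p1:
        total = 0.0
        for t, label in enumerate(labels):
            # Labels in the first portion use the first table, the rest the second.
            table = p1[word] if t < first_len else p2[word]
            total += math.log(max(table.get(label, 0.0), floor))
        scores[word] = total
    # Candidates are the words with the largest accumulated values.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n], scores

# Toy tables (invented for illustration only).
p1 = {"seven": {"s": 0.7, "e": 0.2, "n": 0.1},
      "nine":  {"n": 0.7, "i": 0.2, "s": 0.1}}
p2 = {"seven": {"e": 0.3, "v": 0.3, "n": 0.4},
      "nine":  {"i": 0.5, "n": 0.5}}

labels = ["s", "s", "e", "v", "n"]
candidates, scores = preselect(labels, p1, p2, first_len=2, top_n=1)
```

Only the words returned in `candidates` would then be passed to the more expensive detailed recognition stage.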

8 Claims

1. A speech recognition apparatus which converts inputted speech into a label for each predetermined time interval and performs speech recognition using label strings, said apparatus comprising:
- a first memory means for storing, for each word in a vocabulary, a probability of producing each label in a label set at an arbitrary time interval in a fixed length first portion of an utterance of said word;
  
  a second memory means for storing, for each word in said vocabulary, a probability of producing each label in said label set at an arbitrary time interval in a second portion following said first portion of the utterance of said word;
  
  means for determining, upon the generation of a label for an inputted speech to be recognized, whether the label belongs to said first portion or said second portion;
  
  means for outputting, when the generated label for said inputted speech belongs to said first portion, the probability of producing the label concerned at an arbitrary time interval in the first portion of the utterance of each word in said vocabulary with reference to said first memory means;
  
  means for outputting, when the generated label for said inputted speech belongs to said second portion, the probability of producing the label concerned at an arbitrary time interval in the second portion of the utterance of each word in said vocabulary with reference to said second memory means;
  
  means for accumulating the probabilities outputted for each word;
  
  means for specifying at least one candidate word in accordance with the magnitude of the accumulated value; and
  
  means for performing detailed recognition for each of the specified candidate words.

2. A speech recognition apparatus which converts inputted speech into a label for each predetermined time interval and performs speech recognition using label strings, said apparatus comprising:
- means for accumulating, upon the generation of a label for a training utterance of each word in a vocabulary, a first and a second weight to determine the first and second statistical values of the label concerned, said first and second weights being functions of a time interval from a front edge of the utterance to the generation of the label concerned;
  
  means for normalizing the first and second statistical values of each label in a label set for each word in said vocabulary;
  
  a first memory means for storing the normalized first statistical value of each label in said label set for each word in said vocabulary as the probability of producing the label concerned in said label set at an arbitrary time interval in a fixed length first portion of the utterance of the word;
  
  a second memory means for storing the normalized second statistical value of each label in said label set for each word in said vocabulary as the probability of producing the label concerned in said label set at an arbitrary time interval in a second portion following said first portion of the utterance of the word;
  
  means for determining whether a label generated for an inputted speech to be recognized belongs to said first portion or said second portion;
  
  means for outputting, when the generated label for said inputted speech belongs to said first portion, the probability of producing the label concerned at an arbitrary time interval in the first portion of the utterance of each word in said vocabulary with reference to said first memory means;
  
  means for outputting, when the generated label for said inputted speech belongs to said second portion, the probability of producing the label concerned at an arbitrary time interval in the second portion of the utterance of each word in said vocabulary with reference to said second memory means;
  
  means for accumulating the probabilities outputted for each word;
  
  means for specifying at least one candidate word in accordance with the magnitude of the accumulated value; and
  
  means for performing detailed recognition processing for each of the specified candidate words.

3. A speech recognition apparatus according to claim 2, wherein said first weight becomes gradually smaller and said second weight becomes gradually larger as the time interval between the front edge of the utterance to the generation of the label increases at least as long as said label generation time point is around the boundary between said first and second half portions.

4. A speech recognition apparatus according to claim 3, wherein, upon the generation of a label for said training utterance, said first weight and said second weight are accumulated for each label in said label set in response to the probability of the label concerned being confused with the generated label.
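
The training procedure of claims 2 to 4 can be sketched roughly as below. This is an illustrative reading, not the patent's own implementation: the names `weights`, `train_word`, `normalize`, `boundary`, `ramp` and `confusion` are hypothetical, and the linear crossfade around the boundary is just one possible choice of weights that "gradually" shrink and grow. The `confusion` table is assumed to map a generated label to the probabilities of other labels being confused with it.

```python
def weights(t, boundary, ramp):
    """Return (w1, w2) for a label generated at frame t.

    Assumed linear crossfade of width `ramp` centred on `boundary`:
    w1 falls from 1 to 0 and w2 rises from 0 to 1 across the ramp.
    """
    if ramp <= 0:
        w1 = 1.0 if t < boundary else 0.0
    else:
        w1 = min(1.0, max(0.0, 0.5 - (t - boundary) / ramp))
    return w1, 1.0 - w1

def normalize(counts):
    """Scale accumulated weights so they sum to 1 (all zeros stay zero)."""
    total = sum(counts.values())
    return {l: (v / total if total else 0.0) for l, v in counts.items()}

def train_word(utterances, label_set, boundary, ramp, confusion=None):
    """Accumulate first/second weights per label for one word, then normalize."""
    c1 = {l: 0.0 for l in label_set}
    c2 = {l: 0.0 for l in label_set}
    for labels in utterances:
        for t, observed in enumerate(labels):
            w1, w2 = weights(t, boundary, ramp)
            for l in label_set:
                # Without a confusion table, only the observed label is credited.
                q = confusion[observed].get(l, 0.0) if confusion else float(l == observed)
                c1[l] += w1 * q
                c2[l] += w2 * q
    return normalize(c1), normalize(c2)

# Toy training utterance (invented): two frames of "a" followed by two of "b".
p1_word, p2_word = train_word([["a", "a", "b", "b"]], ["a", "b"], boundary=2, ramp=2)
```

With this toy input, the frame at the boundary contributes half of its weight to each table, so the label "b" receives a small first-portion probability even though it mostly occurs in the second portion.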

5. A speech recognition apparatus comprising:
- acoustic means for receiving an utterance and producing label signals in response to the utterance, said label signals being selected from a set of label signals;
  
  first memory means for storing, for each word k in a vocabulary and for each label signal i in the set of label signals, a signal P₁ (k, i) representing the probability of producing the label signal i in a first portion of an utterance of the word k;
  
  second memory means for storing, for each word k in the vocabulary and for each label signal i in the set of label signals, a signal P₂ (k, i) representing the probability of producing the label signal i in a second portion of an utterance of the word k following the first portion of the utterance of the word;
  
  means for selecting, from the label signals produced by the acoustic means, a series of label signals representing the utterance of an inputted speech to be recognized, said inputted speech having a first portion and a second portion following the first portion, each label signal corresponding to the first portion or the second portion of the inputted speech;
  
  means for outputting probability signals P₁ (k, i) from the first memory means for label signals corresponding to the first portion of the utterance of the inputted speech to be recognized for each word k in the vocabulary;
  
  means for outputting probability signals P₂ (k, i) from the second memory means for label signals corresponding to the second portion of the utterance of the inputted speech to be recognized for each word k in the vocabulary;
  
  means for accumulating the output probability signals for each word k to produce a likelihood signal for each word, each likelihood signal having a magnitude; and
  
  means for selecting a candidate word in accordance with the magnitude of the likelihood signals and producing a word output signal representing the candidate word.

6. A speech recognition apparatus as claimed in claim 5, characterized in that:
- the means for selecting a candidate word comprises means for selecting at least two candidate words in accordance with the magnitude of the likelihood signals and producing a candidate word output signal representing each candidate word; and
  
  the apparatus further comprises means for matching the inputted speech to be recognized against each of the specified candidate word output signals to produce a recognition word output signal representing the inputted speech to be recognized.

7. A speech recognition apparatus as claimed in claim 6, characterized in that:
- the second portion of the utterance of each word is the remainder of the word following the first portion of the utterance of the word; and
  
  the second portion of the utterance of the inputted speech is the remainder of the inputted speech following the first portion of the utterance of the inputted speech.

8. A speech recognition apparatus as claimed in claim 7, characterized in that:
- the first and second portions of each word in the vocabulary overlap during an overlap time interval at the end of the first portion and at the beginning of the second portion; and
  
  both means for outputting probability signals apply weighting factors to output probability signals P₁ (k, i) and P₂ (k, i) for label signals i corresponding to overlap time intervals.
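
The overlap weighting of claim 8 could be sketched as follows. The claim does not say how the weighting factors are computed or combined, so everything here is an assumption made for illustration: the names `overlap_weights` and `likelihood`, the linear ramp across the overlap interval, the weighted sum of log-probabilities, and the toy tables `P1` and `P2`.

```python
import math

def overlap_weights(t, start, end):
    """Return (w1, w2) for frame t, with the overlap interval being [start, end).

    Before the overlap only P1 counts, after it only P2; inside it the
    weights ramp linearly (an assumed choice).
    """
    if t < start:
        return 1.0, 0.0
    if t >= end:
        return 0.0, 1.0
    w2 = (t - start + 1) / (end - start + 1)
    return 1.0 - w2, w2

def likelihood(labels, p1_k, p2_k, start, end, floor=1e-6):
    """Accumulate weighted log P1(k, i) and log P2(k, i) over the label string for one word k."""
    total = 0.0
    for t, i in enumerate(labels):
        w1, w2 = overlap_weights(t, start, end)
        total += w1 * math.log(max(p1_k.get(i, 0.0), floor))
        total += w2 * math.log(max(p2_k.get(i, 0.0), floor))
    return total

# Toy tables (invented for illustration only).
P1 = {"seven": {"s": 0.8, "e": 0.2}, "nine": {"n": 0.8, "i": 0.2}}
P2 = {"seven": {"e": 0.4, "v": 0.3, "n": 0.3}, "nine": {"i": 0.5, "n": 0.5}}

labels = ["s", "s", "e", "v", "n"]
scores = {k: likelihood(labels, P1[k], P2[k], start=1, end=3) for k in P1}
# As in claim 6, keep at least two candidates for the later matching step.
ranked = sorted(scores, key=scores.get, reverse=True)[:2]
```

Blending the two tables inside the overlap avoids an abrupt switch from P₁ to P₂ when the input is spoken slightly faster or slower than the training utterances.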

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Sugawara, Kazuhide
Primary Examiner(s): Harkcom, Gary V.
Assistant Examiner(s): Knepper, David D.
Application Number: US07/278,055
Time in Patent Office: 902 Days
Field of Search: 381/41-50, 364/513.5
US Class Current: 704/252
CPC Class Codes: G10L 15/00 Speech recognition G10L17/0...
