Syntactic continuous speech recognizer

US 4,277,644 A
Filed: 07/16/1979
Issued: 07/07/1981
Est. Priority Date: 07/16/1979
Status: Expired due to Term


Abstract

The accuracy of segmenting an utterance into words is improved by the use of sequence-defining signals. An utterance is recognized as one of a plurality of reference word sequences in an arrangement wherein a set of signals is generated to define the syntax, i.e., word arrangements, of the sequences. Each sequence corresponds to a selected series of reference words. A signal is generated to identify the successive word positions of the sequences. Responsive to the sequence defining signals, the utterance, and the reference words, a set of signals is produced, each representing the correspondence between said utterance and one of the sequences. The sequence having the closest correspondence to the utterance is selected. The sequence correspondence signal generation includes selecting, in each identified word position, the word position reference word for each sequence and the portion of the utterance corresponding thereto responsive to said sequence defining signals, and generating a signal representative of the acoustic correspondence between each sequence word position reference word and its selected utterance portion.
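
As an illustration of the arrangement the abstract describes, the following is a minimal Python sketch, not the patented circuit. It assumes one-dimensional feature frames, an absolute-difference frame distance, and a toy dynamic time warping matcher with a free endpoint. The names `match_word` and `recognize` are hypothetical. For each sequence in each word position, the matcher starts at the endpoint stored for the preceding word position and returns both a new endpoint and a correspondence value.

```python
import math
from typing import Dict, List, Sequence, Tuple


def match_word(utt: Sequence[float], start: int, ref: Sequence[float]) -> Tuple[float, int]:
    """Align `ref` against utt[start:] with a free endpoint (toy DTW, an assumption).

    Returns (distance, endpoint): the segment endpoint and the correspondence
    value come out of the same computation.
    """
    n = len(utt) - start
    if n <= 0 or not ref:
        return math.inf, start  # nothing left to match
    prev: List[float] = []
    for i, r in enumerate(ref):
        cur: List[float] = []
        for j in range(n):
            d = abs(r - utt[start + j])  # local frame distance (assumed metric)
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(
                    prev[j] if i > 0 else math.inf,                # reference advances, utterance frame repeats
                    cur[j - 1] if j > 0 else math.inf,             # utterance advances, reference frame repeats
                    prev[j - 1] if i > 0 and j > 0 else math.inf,  # both advance
                )
            cur.append(d + best)
        prev = cur
    end = min(range(n), key=lambda j: prev[j])  # earliest best endpoint in the last row
    return prev[end], start + end + 1


def recognize(
    utt: Sequence[float],
    templates: Dict[str, Sequence[float]],
    sequences: List[List[str]],
) -> Tuple[List[str], float, int]:
    """Score every allowed word sequence, one word position at a time."""
    endpoints = [0] * len(sequences)   # stored endpoint per sequence
    totals = [0.0] * len(sequences)    # accumulated correspondence per sequence
    for pos in range(max(len(s) for s in sequences)):
        for k, seq in enumerate(sequences):
            if pos >= len(seq):
                continue  # this sequence has no word in this position
            dist, endpoints[k] = match_word(utt, endpoints[k], templates[seq[pos]])
            totals[k] += dist
    best = min(range(len(sequences)), key=lambda k: totals[k])
    return sequences[best], totals[best], endpoints[best]


templates = {"a": [1, 2, 3], "b": [5, 5, 6], "c": [9, 9, 9]}
result = recognize([1, 2, 3, 5, 5, 6], templates, [["a", "c"], ["a", "b"]])
print(result)  # (['a', 'b'], 0.0, 6)
```

Because each word match begins where the same sequence's previous word ended, the word boundaries are decided per sequence by the syntax rather than by a separate segmentation pass.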

57 Citations


15 Claims

1. Apparatus for recognizing an utterance as one of a plurality of reference word sequences comprising:
- means (205) for storing a set of signals each representative of the acoustic features of a reference word;
  
  means (203) responsive to said utterance for generating a signal representative of the acoustic features of the utterance;
  
  means (411) for generating a set of first signals defining the plurality of sequences, each sequence being a series of reference words;
  
  means (427) for generating a succession of second signals identifying the successive word positions of the plurality of sequences;
  
  means (209, 425, 427) responsive to the set of first signals, the utterance feature signals and the reference word feature signals for producing a set of third signals each representative of the correspondence between said utterance and one of said sequences; and
  
  means (430) responsive to said third signals for selecting the sequence having the closest correspondence to said utterance;
  
  characterized in that said third signal producing means comprises;
  
  means (361) operative for each sequence in each word position identified by said second signals for storing a fourth signal representative of the word position utterance segment endpoint of said sequence;
  
  means (351,383,411) operative for each sequence in each word position jointly responsive to said first signals and stored fourth signal of the preceding word position of the sequence for selecting the utterance feature signals beginning at the preceding word position utterance segment endpoint of the sequence and the current word position reference word feature signals of the sequence; and
  
  means (209) jointly responsive to said sequence selected utterance feature signals and the sequence selected reference word feature signals for concurrently generating a fourth signal representative of the endpoint of the current word position utterance segment corresponding to the selected reference word, and a signal representative of the correspondence between the selected utterance segment feature signals from the preceding word position utterance segment endpoint for the sequence and the current word position selected reference word feature signals for the sequence.
- - 2. Apparatus for recognizing an utterance as one of a plurality of reference word sequences according to claim 1;
    - further characterized in that said closest corresponding sequence selecting means (430) includes;
      
      means (203) responsive to said utterance for generating a signal representative of the range of endpoints of said utterance;
      
      means (211,213,215) operative in each identified word position responsive to said first signals, said utterance endpoint range signal and the current word position utterance segment endpoint signal for each sequence, for detecting sequences which terminate in said word position and for identifying each terminating sequence having a current word position utterance segment endpoint within said utterance endpoint range as a candidate sequence for said utterance; and
      
      means (425,427,430) operative in each identified word position responsive to the correspondence signals of said candidate sequences for selecting and storing the candidate sequence having the closest correspondence to said utterance.
  - 3. Apparatus for recognizing an utterance as one of a plurality of reference word sequences according to claim 2 further characterized in that said closest corresponding sequence selecting means (103,117,130,150) further comprises means (117) responsive to said second signals, said utterance endpoint range signal and said sequence current word position utterance segment endpoint signals for generating a signal corresponding to the occurrence of the final word position of said sequences;
    - and means (150) responsive to said final word position occurrence signal for identifying the utterance as the last stored candidate sequence.

4. In a continuous speech recognizer that includes means for storing a set of first signals defining a plurality of predetermined sequences, each sequence being a series of reference words;
- means for storing a set of acoustic feature signals for each reference word;
  
  means for generating a succession of second signals identifying the successive word positions of said sequences;
  
  means for receiving an utterance; and
  
  means for generating a set of acoustic feature signals representative of said received utterance;
  
  a method for recognizing the received utterance as one of the plurality of reference word sequences comprising the steps of;
  
  producing a set of third signals each representative of the acoustic correspondence between said utterance and one of said sequences responsive to the set of first signals, the utterance acoustic feature signals and the reference word acoustic feature signals; and
  
  selecting the sequence having the closest acoustic correspondence to said utterance responsive to said third signals;
  
  characterized in that said third signal producing step comprises;
  
  storing a fourth signal representing the utterance segment endpoint for each sequence of reference words in each word position identified by said second signals;
  
  for each sequence in each successive word position identified by said second signals, selecting the utterance acoustic feature signals beginning at the preceding word position utterance segment endpoint for the sequence, and the acoustic feature signals of the current identified word position reference word for the sequence jointly responsive to the first signals and the stored fourth signal of the preceding word position for the sequence;
  
  and concurrently generating a fourth signal representative of the endpoint of the current word position utterance segment corresponding to the selected reference word, and a signal representative of the acoustic correspondence between the current word position selected reference word for the sequence and the utterance segment from the preceding word position endpoint, said concurrent acoustic correspondence and fourth signal generation being jointly responsive to the current word position selected reference word acoustic feature signals for the sequence and the acoustic feature signals of the current word position selected utterance segment for the sequence.
- - 5. A method for recognizing an utterance as one of a plurality of reference word sequences according to claim 4 characterized in that said closest corresponding sequence selecting step includes generating a signal representative of the range of endpoints of said utterance;
    - responsive to the first signals in each identified word position, detecting sequences which terminate in said identified word position;
      
      jointly responsive to the utterance endpoint range signal and the current word position utterance portion endpoint signal for each terminating sequence in each identified word position, identifying each terminating sequence having its selected utterance portion endpoint within said utterance endpoint range as a candidate sequence for said utterance; and
      
      responsive to the correspondence signals of said candidate sequences in each identified word position, selecting and storing the candidate sequence having the closest correspondence to said utterance.
  - 6. A method for recognizing an utterance as one of a plurality of reference word sequences according to claim 5 further characterized in that said closest corresponding sequence selecting step includes generating a signal corresponding to the occurrence of the final word position of said sequences responsive to said second signals, said utterance endpoint range signal, and said current word position utterance segment endpoint signals;
    - and identifying the utterance as the last stored candidate sequence responsive to said final word position occurrence signal.
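
The candidate selection steps of claims 5 and 6 can be sketched as follows. This is one possible reading, not the claimed implementation: the record type `SeqRecord`, the function `update_candidate`, and the convention that a smaller correspondence value means a closer match are all assumptions made for illustration. In each word position, terminating sequences whose endpoint lies within the utterance endpoint range become candidates, and the closest candidate seen so far stays stored.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class SeqRecord(NamedTuple):
    name: str              # hypothetical label for a reference word sequence
    terminates: bool       # True if the sequence ends in this word position
    endpoint: int          # current word position utterance segment endpoint
    correspondence: float  # accumulated distance; smaller is closer (assumed)


def update_candidate(
    stored: Optional[SeqRecord],
    records: Iterable[SeqRecord],
    endpoint_range: Tuple[int, int],
) -> Optional[SeqRecord]:
    """Process one word position and return the candidate to keep stored."""
    lo, hi = endpoint_range
    for r in records:
        # Only terminating sequences that end inside the endpoint range qualify.
        if r.terminates and lo <= r.endpoint <= hi:
            if stored is None or r.correspondence < stored.correspondence:
                stored = r
    return stored


stored = None
word_positions = [
    [SeqRecord("s1", True, 40, 3.5), SeqRecord("s2", True, 70, 1.0), SeqRecord("s3", False, 42, 0.5)],
    [SeqRecord("s3", True, 44, 2.0)],
]
for records in word_positions:
    stored = update_candidate(stored, records, endpoint_range=(38, 45))
print(stored.name)  # s3
```

When the final word position is reached, whatever is stored at that point is reported as the recognized sequence.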

7. Apparatus for recognizing an utterance as one of a plurality of predetermined reference word sequences comprising:
- means for storing a set of signals each representative of the acoustic features of a reference word;
  
  means responsive to said utterance for generating a signal representative of the acoustic features thereof;
  
  means for generating a set of first signals defining the plurality of reference word sequences, each first signal including a first state code, a second state code, and a reference word code connected from the first state code to the second state code and each sequence being a selected plurality of state connected reference word codes;
  
  means for generating a succession of second signals identifying the successive word positions of the plurality of predetermined sequences;
  
  means for generating a set of final state coded signals each identifying the final word position of one of said predetermined sequences;
  
  means jointly responsive to said first signals, said utterance feature signals and said reference word feature signals, for producing a set of third signals each representative of the acoustic correspondence between said utterance and one of said predetermined reference word sequences; and
  
  means responsive to said third signals for selecting the sequence having the closest acoustic correspondence to said utterance;
  
  said third signal producing means including;
  
  means operative in each successive identified word position for storing a fourth signal representative of an utterance segment endpoint for each predetermined reference word sequence in said word position;
  
  means operative in each successive identified word position responsive to said first signals and to the stored fourth signal of the sequence in the preceding word position for selecting the current identified word position reference word for each sequence and for selecting the utterance feature signals beginning at the utterance segment endpoint of the sequence in the preceding word position;
  
  means jointly responsive to the feature signals of the selected reference word and the feature signals of the utterance segment beginning at the stored endpoint for the immediately preceding word position of the sequence for concurrently generating a fourth signal representative of the endpoint of the current identified word position utterance segment corresponding to the selected reference word, and a signal representative of the correspondence between the feature signals of the sequence selected reference word and the feature signals of utterance segment from the immediately preceding word position endpoint;
  
  and means for combining the current word position sequence reference word correspondence signal with the correspondence signal of the preceding word positions of the same sequence to form a cumulative correspondence signal for each sequence in each word position.
- - 8. Apparatus for recognizing an utterance as one of a plurality of predetermined reference word sequences according to claim 7 further comprising means responsive to said utterance feature signals for generating a signal corresponding to the endpoint of said utterance;
    - and wherein said means for selecting the sequence having the closest acoustic correspondence with said utterance comprises means operative in each word position responsive to said final state coded signals for detecting the sequences ending in said word position;
      
      means jointly responsive to said utterance endpoint signal and said detected sequence current word position utterance segment endpoint signals for selecting detected sequences having current word position utterance segment endpoints within a predetermined range of said utterance endpoint as candidate sequences for said utterance; and
      
      means responsive to the cumulative correspondence signals of said candidate sequences for selecting and storing the candidate sequence with the closest correspondence to said utterance in said current word position and said preceding word positions.
  - 9. Apparatus for recognizing an utterance as one of a plurality of predetermined reference word sequences according to claim 8 further comprising means responsive to said second signals for generating a signal corresponding to the occurrence of the final word position of said sequences;
    - and means responsive to said final word position occurrence signal for identifying the utterance as the last stored candidate sequence.
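
Claims 7 to 9 describe each first signal as a first state code, a second state code, and a reference word code connecting them, with final state codes marking where a sequence may end. A minimal Python sketch of that representation follows. It is an illustration under stated assumptions: `Transition`, `Hyp`, `fixed_match`, and `recognize` are hypothetical names, and `fixed_match` is a deliberately simple fixed-length stand-in for the matcher that would produce the endpoint and correspondence signals.

```python
import math
from typing import Dict, List, NamedTuple, Optional, Sequence, Set, Tuple


class Transition(NamedTuple):
    src: int   # first state code
    dst: int   # second state code
    word: str  # reference word code linking src to dst


class Hyp(NamedTuple):
    state: int
    endpoint: int            # stored utterance segment endpoint
    cum: float               # cumulative correspondence; smaller is closer (assumed)
    words: Tuple[str, ...]


def fixed_match(utt: Sequence[float], start: int, ref: Sequence[float]) -> Tuple[float, int]:
    """Stand-in matcher (assumption): compare frame by frame, no time warping."""
    end = start + len(ref)
    if end > len(utt):
        return math.inf, start
    return float(sum(abs(r - u) for r, u in zip(ref, utt[start:end]))), end


def recognize(
    utt: Sequence[float],
    templates: Dict[str, Sequence[float]],
    transitions: List[Transition],
    final_states: Set[int],
    start_state: int = 1,
    tol: int = 1,
    max_positions: int = 10,
) -> Optional[Hyp]:
    hyps = [Hyp(start_state, 0, 0.0, ())]
    best: Optional[Hyp] = None
    for _pos in range(max_positions):  # successive word positions
        nxt: List[Hyp] = []
        for h in hyps:
            for t in transitions:
                if t.src != h.state:
                    continue
                d, end = fixed_match(utt, h.endpoint, templates[t.word])
                if not math.isinf(d):
                    # Combine with the preceding positions of the same sequence.
                    nxt.append(Hyp(t.dst, end, h.cum + d, h.words + (t.word,)))
        if not nxt:
            break  # no sequence continues: final word position reached
        for h in nxt:
            # Candidate: ends in a final state, close enough to the utterance end.
            if h.state in final_states and abs(len(utt) - h.endpoint) <= tol:
                if best is None or h.cum < best.cum:
                    best = h
        hyps = nxt
    return best


grammar = [Transition(1, 2, "call"), Transition(2, 3, "home"), Transition(2, 3, "work")]
templates = {"call": [1, 1], "home": [4, 5], "work": [7, 7]}
print(recognize([1, 1, 4, 6], templates, grammar, final_states={3}))
```

The sketch keeps every partial sequence separately instead of merging those that reach the same state, which matches the per-sequence wording of the claims at the cost of more bookkeeping.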

10. Apparatus for recognizing a continuous speech pattern as one of a plurality of predetermined reference word sequences comprising:
- means for generating a set of first signals defining the plurality of predetermined sequences, each sequence defining signal including a first state signal, a second state signal and a reference word code linked from said first state to said second state and each sequence being a series of selected state connected reference word codes terminating in an end state;
  
  means for generating a succession of second signals identifying the successive reference word positions of said sequences;
  
  means for generating a signal representative of each sequence end state;
  
  means responsive to each reference word for generating and storing a set of signals representative of the acoustic features of said reference word;
  
  means responsive to said continuous speech pattern for generating signals representative of the acoustic features of said speech pattern;
  
  means jointly responsive to said first signals, said reference word feature signal sets, and the feature signals of said continuous speech pattern for producing a set of third signals each representative of the correspondence between said continuous speech pattern and one of the reference word sequences; and
  
  means responsive to said third signals for identifying the continuous speech pattern as the closest corresponding reference word sequence;
  
  said third signal producing means comprising;
  
  means operative in each word position for storing a fourth signal representing the word position speech pattern segment endpoint for each reference word sequence;
  
  means operative for each sequence in each successive word position identified by said second signals jointly responsive to said first signals and the stored fourth signal of the preceding word position of the sequence for selecting the speech pattern feature signals beginning at the preceding word position speech pattern segment endpoint of the sequence, and for selecting the current identified word position reference word feature signals of each sequence;
  
  means jointly responsive to the feature signal set of the sequence selected reference word and the feature signals of the continuous speech pattern segment beginning at the stored speech pattern endpoint of the immediately preceding word position of the sequence for concurrently generating a fourth signal representative of the endpoint of the current identified word position continuous speech pattern segment corresponding to the selected reference word and a signal representative of the correspondence between the feature signals of the sequence selected reference word and the feature signals of the continuous speech pattern segment from the immediately preceding word position speech pattern endpoint of the sequence, said segment generated fourth signal in the current word position being stored in said endpoint storing means as the preceding word position endpoint signal for the next occurring word position.
- - 11. Apparatus for recognizing a continuous speech pattern as one of a plurality of predetermined reference word sequences according to claim 10 further comprising means responsive to said continuous speech pattern for generating a signal representative of the range of endpoints of said speech pattern;
    - means operative in each word position responsive to said sequence end state signals for detecting sequences which terminate in said word position;
      
      means jointly responsive to said utterance endpoint range signal and the current word position selected speech pattern segment endpoint signal of each terminating sequence for identifying each terminating sequence having its endpoint within said utterance endpoint range as a candidate sequence for said utterance; and
      
      means operative in each identified word position responsive to the correspondence signals of said candidate sequences for detecting and storing the candidate sequence having the closest correspondence to said continuous speech pattern in said identified word position and said preceding identified word positions.
  - 12. Apparatus for recognizing a continuous speech pattern as one of a plurality of predetermined reference word sequences according to claim 11 wherein said continuous speech pattern identifying means comprises means operative in each identified word position responsive to said speech pattern endpoint range signal and said selected speech pattern segment endpoint signals in said current word position for detecting the final word position of said sequences;
    - and means responsive to the operation of said final word position detecting means for selecting the last stored candidate sequence as the continuous speech pattern.

13. In a continuous speech pattern recognition circuit that includes means for storing a set of first signals syntactically defining a plurality of predetermined reference word sequences, each sequence being a selected series of reference words and each first signal comprising a reference word code and a pair of state codes identifying the position of the reference word in said sequences;
- means for generating a succession of second signals identifying the successive word positions of said sequences;
  
  means for storing a set of acoustic feature signals for each reference word;
  
  means for receiving a continuous speech pattern; and
  
  means for generating a set of acoustic feature signals corresponding to said speech pattern;
  
  a method for recognizing the speech pattern as one of said plurality of predetermined reference word sequences comprising the steps of;
  
  producing a set of third signals each representative of the acoustic correspondence between said speech pattern and one of said reference word sequences jointly responsive to said first signals, the acoustic feature signal sets of said reference word and the acoustic feature signal set for said speech pattern;
  
  and responsive to the third signals for the sequences, identifying the speech pattern as the reference word sequence having the closest acoustic correspondence to said utterance;
  
  said third signal producing step comprising;
  
  for every sequence in each word position, storing a fourth signal representative of the speech pattern segment endpoint corresponding to the sequence in said word position;
  
  for each sequence in each successive word position identified by said second signals, selecting the speech pattern acoustic feature signals beginning at the preceding word position speech pattern endpoint for the sequence and selecting the acoustic feature signals of the current word position reference word of the sequence, jointly responsive to the first signals and the stored fourth signal of the preceding word position for the sequence;
  
  and concurrently generating a fourth signal representative of the endpoint of the current word position speech segment corresponding to the selected reference word and a signal representative of the acoustic correspondence between the current word position selected reference word for the sequence and the speech pattern segment beginning at the preceding word position endpoint jointly responsive to the current word position selected reference word acoustic feature signals for the sequence and the acoustic feature signals of the selected speech pattern segment for the sequence.
- - 14. A method for recognizing a speech pattern as one of the plurality of the predetermined reference word sequences according to claim 13 wherein said speech pattern identifying step includes generating a signal representative of the range of endpoints of said speech pattern;
    - responsive to said first signals in each identified word position, detecting the sequences which terminate in said word position;
      
      jointly responsive to the speech pattern endpoint range signal and said selected speech pattern segment endpoint signal for each terminating sequence in said identified word position, identifying each terminating sequence having its word position selected speech pattern segment endpoint within said speech pattern endpoint range as a candidate sequence for said speech pattern; and
      
      responsive to the correspondence signals of said candidate sequences in each identified word position, selecting and storing the candidate sequence having the closest correspondence to said speech pattern in said current identified word position and said preceding identified word positions.
  - 15. A method for recognizing a speech pattern as one of the plurality of predetermined reference word sequences according to claim 14 wherein said speech pattern identifying step includes generating a signal corresponding to the occurrence of the word position in which said speech pattern endpoint occurs jointly responsive to the selected speech pattern segment endpoint signals in each identified word position and said speech pattern endpoint range signal;
    - and identifying the speech pattern as the last stored candidate sequence responsive to said speech pattern endpoint occurrence word position signal.

Current Assignee: Bell Telephone Laboratories, Inc. (Nokia Corporation)
Original Assignee: Bell Telephone Laboratories, Inc. (Nokia Corporation)
Inventors: Levinson, Stephen E.; Pirz, Frank C.
Primary Examiner(s): Nusbaum, Mark E.
Assistant Examiner(s): Kemeny, Emanuel
Application Number: US06/057,749
Time in Patent Office: 722 Days
Field of Search: 179/1 SD, 179/1 SB, 179/1 VC, 340/146.3 WD
US Class Current: 704/241
CPC Class Codes: G10L 15/00 Speech recognition G10L17/0...
