Speech recognition arrangement
Abstract
A speech recognition arrangement where nonspectral features of the input signal, e.g., energy and voicing parameters, are used to effectively remove non-speech events from consideration but only after a time warping procedure based solely on the input and reference pattern spectral parameters has been completed. The time warping procedure is not unduly complex because there is no need to weight spectral and nonspectral parameters in matching input and reference patterns. For each reference pattern, the time warping procedure defines a scan region of the input signal to be used in evaluating the nonspectral input signal characteristics. Energy and voicing parameters are useful in distinguishing non-speech events since speech patterns typically have few very low-energy frames (other than frames that are part of a gap within a vocabulary item) and more than a minimum number of voiced frames, e.g., frames corresponding to vowel sounds.
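As a concrete illustration of the two-stage procedure the abstract describes (a sketch of the general technique, not the patent's actual implementation), the following Python code time-warps a reference template against the input using only spectral frames, takes the input frames covered by the warp path as the scan region, and then adjusts the spectral distance score with energy and voicing checks. All threshold and penalty values here are illustrative assumptions.

```python
def dtw_align(ref, inp):
    """Dynamic time warping over spectral frames only (no nonspectral
    parameters enter the alignment). ref and inp are lists of feature
    vectors; returns (total_distance, path of (ref_idx, inp_idx) pairs)."""
    def frame_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    n_r, n_i = len(ref), len(inp)
    INF = float("inf")
    cost = [[INF] * (n_i + 1) for _ in range(n_r + 1)]
    cost[0][0] = 0.0
    for i in range(1, n_r + 1):
        for j in range(1, n_i + 1):
            d = frame_dist(ref[i - 1], inp[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])
    # Backtrack to recover the frame pairing (the warp path).
    path, i, j = [], n_r, n_i
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n_r][n_i], path


def adjust_distance(distance, path, energy, voiced,
                    low_energy_thresh=0.1, max_low_energy=1,
                    min_voiced=2, penalty=100.0):
    """Penalize a candidate whose scan region looks like a non-speech
    event: too many very low-energy frames, or too few voiced frames.
    Thresholds and penalty are illustrative, not from the patent."""
    scan = {j for _, j in path}  # input frames covered by the warp path
    n_low = sum(1 for j in scan if energy[j] < low_energy_thresh)
    n_voiced = sum(1 for j in scan if voiced[j])
    adjusted = distance
    if n_low > max_low_energy:
        adjusted += penalty
    if n_voiced < min_voiced:
        adjusted += penalty
    return adjusted
```

Recognition would then select the reference template with the smallest adjusted distance; a smaller distance here plays the role of a larger similarity measure. Because the nonspectral checks run only over each template's own scan region, there is no need to weight spectral against nonspectral parameters during the warp itself.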
11 Claims
1. In a speech recognizer having a plurality of stored reference pattern templates each comprising a time frame sequence of acoustic spectral parameters of a prescribed reference pattern, a method for processing an input signal to recognize a speech pattern comprising
generating a time frame sequence of acoustic spectral parameters from said input signal,
generating a time frame sequence of acoustic nonspectral parameters from said input signal,
time aligning each of said reference pattern templates with said input signal based on reference pattern and input signal spectral parameters but independent of said nonspectral parameters,
determining a set of similarity measures each representative of the similarity between spectral parameters of said input signal and spectral parameters of one of the time aligned reference pattern templates and
selectively identifying said speech pattern in said input signal as one of said reference patterns based both on said similarity measures and on said nonspectral parameters,
wherein said time aligning comprises
for each of said reference patterns, pairing time frames of that reference pattern template with time frames of said input signal to maximize the similarity measure determined for that reference pattern, said pairing defining a scan region of input signal time frames for that reference pattern,
wherein said selectively identifying comprises
for each of said reference patterns, adjusting the determined similarity measure based on said nonspectral parameters over the scan region of input signal time frames for that reference pattern and
selectively identifying said speech pattern in said input signal as one of said reference patterns based on said adjusted similarity measures.
9. In a speech recognizer having a plurality of stored reference pattern templates each comprising a time frame sequence of acoustic spectral parameters of a prescribed reference pattern, a method for processing an input signal to recognize a speech pattern comprising
generating a time frame sequence of acoustic spectral parameters from said input signal,
generating a time frame sequence of voicing parameters from said input signal, each of said voicing parameters defining the presence or absence of a vowel sound,
determining a set of similarity measures each representative of the similarity between spectral parameters of said input signal and spectral parameters of one of the reference pattern templates and
selectively identifying said speech pattern in said input signal as one of said reference patterns based both on said similarity measures and on said voicing parameters.
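The claim leaves open how per-frame voicing parameters are derived from the raw signal. One common heuristic (an illustrative sketch, not the patent's method, with assumed threshold values) classifies a frame as voiced when it combines relatively high energy with a low zero-crossing rate, as vowel sounds do:

```python
import math

def frame_is_voiced(frame, energy_thresh=0.01, zcr_thresh=0.25):
    """Crude voiced/unvoiced decision for one time frame of samples:
    vowel-like (voiced) frames have high energy and few zero crossings.
    Thresholds are illustrative and would be tuned in practice."""
    n = len(frame)
    energy = sum(s * s for s in frame) / n
    zcr = sum(1 for a, b in zip(frame, frame[1:])
              if (a < 0.0) != (b < 0.0)) / (n - 1)
    return energy > energy_thresh and zcr < zcr_thresh

# A 100 Hz tone sampled at 8 kHz looks voiced; rapidly alternating
# low-level noise does not.
tone = [math.sin(2 * math.pi * 100 * t / 8000) for t in range(160)]
noise = [0.05 if t % 2 else -0.05 for t in range(160)]
```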
10. A speech recognizer for processing an input signal to recognize a speech pattern comprising
memory means for storing a plurality of reference pattern templates each comprising a time frame sequence of acoustic spectral parameters of a prescribed reference pattern and
digital signal processor means comprising
means responsive to said input signal for generating a time frame sequence of acoustic spectral parameters,
means responsive to said input signal for generating a time frame sequence of acoustic nonspectral parameters,
means for time aligning each of said reference pattern templates with said input signal based on reference pattern and input signal spectral parameters but independent of said nonspectral parameters,
means for determining a set of similarity measures each representative of the similarity between spectral parameters of said input signal and spectral parameters of one of the time aligned reference pattern templates and
means for selectively identifying said speech pattern in said input signal as one of said reference patterns based both on said similarity measures and on said nonspectral parameters,
wherein said time aligning means comprises
means for pairing, for each of said reference patterns, time frames of that reference pattern template with time frames of said input signal to maximize the similarity measure determined by said determining means for that reference pattern, said pairing defining a scan region of input signal time frames for that reference pattern,
wherein said selectively identifying means comprises
means for adjusting, for each of said reference patterns, the determined similarity measure based on said at least one nonspectral parameter and
means for selectively identifying said speech pattern in said input signal as one of said reference patterns based on said adjusted similarity measures.
11. A speech recognizer for processing an input signal to recognize a speech pattern comprising
memory means for storing a plurality of reference pattern templates each comprising a time frame sequence of acoustic spectral parameters of a prescribed reference pattern and
digital signal processor means comprising
means responsive to said input signal for generating a time frame sequence of acoustic spectral parameters,
means responsive to said input signal for generating a time frame sequence of voicing parameters, each of said voicing parameters defining the presence or absence of a vowel sound,
means for determining a set of similarity measures each representative of the similarity between spectral parameters of said input signal and spectral parameters of one of the reference pattern templates and
means for selectively identifying said speech pattern in said input signal as one of said reference patterns based both on said similarity measures and on said voicing parameters.
Specification