Method and system for identifying spoken sounds in continuous speech by comparing classifier outputs

US 5,596,679 A
Filed: 10/26/1994
Issued: 01/21/1997
Est. Priority Date: 10/26/1994
Status: Expired due to Term

Abstract

In a speech-recognition system having a plurality of classifiers, a voting window includes a sequence of outputs from each of the classifiers. At each interval in the voting window, the outputs are compared to determine a winning output. A spoken sound is identified by determining which classifier generates the greatest number of winning outputs in the voting window.
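The voting scheme summarized in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `identify_sound`, the list-of-lists layout, and the tie-breaking behavior (first classifier wins a tie) are assumptions, and "magnitude" is interpreted here as absolute value.

```python
from collections import Counter


def identify_sound(window_outputs, labels):
    """Pick the label whose classifier wins the most intervals in one voting window.

    window_outputs[k][t] is the output of classifier k at interval t
    (hypothetical layout); labels[k] is the class label of classifier k.
    """
    num_classifiers = len(window_outputs)
    num_intervals = len(window_outputs[0])
    wins = Counter()
    for t in range(num_intervals):
        # Winner at interval t: the classifier whose output has the largest magnitude.
        winner = max(range(num_classifiers), key=lambda k: abs(window_outputs[k][t]))
        wins[winner] += 1
    # The classifier with the most per-interval wins identifies the spoken sound.
    best_classifier, _ = wins.most_common(1)[0]
    return labels[best_classifier]


window = [
    [0.9, 0.2, 0.1],  # classifier for "a"
    [0.1, 0.8, 0.7],  # classifier for "b"
    [0.3, 0.1, 0.2],  # classifier for "c"
]
print(identify_sound(window, ["a", "b", "c"]))
```

In this example the first classifier wins one interval and the second wins two, so the second classifier's label is returned.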

22 Claims

1. In a speech-recognition system having a plurality of classifiers, a method of identifying a spoken sound, the method comprising the following steps:
- (a) receiving a plurality of classifier output signal sequences from the classifiers, each of the classifier output signal sequences having been generated according to a polynomial discriminant function;
  
  (b) defining a voting window to include portions of the classifier output signal sequences occurring within a finite period of time;
  
  (c) selecting a winning classifier output corresponding to an interval within the finite period of the voting window, the winning classifier output signal having a magnitude larger than other classifier output signals that correspond to the same interval;
  
  (d) repeating step (c) for a plurality of intervals occurring within the finite period to generate a plurality of winning classifier output signals; and
  
  (e) identifying the spoken sound by determining which of the classifier output signal sequences includes the most winning classifier output signals within the voting window.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1 further comprising the step of:
    - generating a system output which includes a class label representing the spoken sound.
  - 3. The method of claim 1 wherein the polynomial discriminant function has a form ##EQU3## wherein x_j represents a plurality of features;
    - wherein i, j, m and n are integers, y represents a classifier output signal;
      
      wherein w_i represents a coefficient;
      
      wherein g_ji represents an exponent.
  - 4. The method of claim 1, wherein the classifier output signal sequences have a duration which is greater than the finite period of time of the voting window and step (b) includes the following sub-step:
    - defining the voting window so that the finite period of time overlaps with the finite period of time of a second voting window.
  - 5. The method of claim 4 wherein the speech-recognition system identifies a plurality of spoken sounds from continuous speech.
  - 6. The method of claim 1 wherein the plurality of intervals consists of three successive intervals.
  - 7. The method of claim 1 wherein the spoken sound is selected from a group consisting of word, syllable, and phoneme.
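Claim 3 describes a polynomial discriminant function in terms of features x_j, coefficients w_i, and exponents g_ji. Assuming the function is a weighted sum of monomials, that is, y equals the sum over i of w_i times the product over j of x_j raised to g_ji, one classifier output might be computed as in the sketch below. The index ranges, the function name `polynomial_discriminant`, and the term-first layout `g[i][j]` are assumptions made for illustration.

```python
import math


def polynomial_discriminant(x, w, g):
    """Evaluate y = sum_i w[i] * prod_j x[j] ** g[i][j].

    x: feature values x_j for one feature frame
    w: one coefficient w_i per polynomial term
    g: g[i][j] is the exponent applied to feature j in term i
       (corresponds to g_ji in the claim's notation)
    """
    return sum(
        w_i * math.prod(x_j ** g_ij for x_j, g_ij in zip(x, g_i))
        for w_i, g_i in zip(w, g)
    )


# Three terms over two features: 1.0 + 0.5 * x1 + 2.0 * x1 * x2
features = [2.0, 3.0]
coefficients = [1.0, 0.5, 2.0]
exponents = [[0, 0], [1, 0], [1, 1]]
print(polynomial_discriminant(features, coefficients, exponents))  # 14.0
```

Each classifier would hold its own coefficients, so the same feature frame yields a different output signal per classifier.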

8. A method for recognizing a spoken sound from continuous speech, comprising the following steps:
- (a) receiving the continuous speech;
  
  (b) sampling the continuous speech, over time, to form a sequence of sample datum which represents the continuous speech;
  
  (c) partitioning the sequence of sample datum into a sequence of data frames, each of the sequence of data frames including at least two of the sequence of sample datum;
  
  (d) extracting a plurality of features from the sequence of data frames;
  
  (e) forming a sequence of feature frames from the plurality of features;
  
  (f) distributing the sequence of feature frames to a plurality of classifiers, each of the classifiers generating a classifier output signal sequence in response thereto according to a polynomial discriminant function, thereby producing a plurality of classifier output signal sequences;
  
  (g) defining a voting window to include portions of the classifier output signal sequences occurring within a finite period of time;
  
  (h) selecting a winning classifier output signal corresponding to an interval within the finite period of the voting window, the winning classifier output signal having a magnitude larger than other classifier output signals that correspond to the same interval;
  
  (i) repeating step (h) for a plurality of intervals occurring within the finite period to generate a plurality of winning classifier output signals; and
  
  (j) identifying the spoken sound by determining which of the classifier output signal sequences includes the most winning classifier output signals within the voting window.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The method of claim 8 further comprising the step of:
    - generating a system output which includes a winning class label representing the spoken sound.
  - 10. The method of claim 8 wherein the polynomial discriminant function has a form ##EQU4## wherein x_j represents the plurality of features included in the feature frames;
    - wherein i, j, m and n are integers, y represents a classifier output signal, wherein w_i represents a coefficient;
      
      wherein g_ji represents an exponent.
  - 11. The method of claim 8 wherein classifier output signal sequences have a duration which is greater than the finite period of time of the voting window and step (g) includes the following sub-step:
    - defining the voting window so that the finite period of time overlaps with the finite period of time of a second voting window.
  - 12. The method of claim 8 wherein the speech-recognition system recognizes a plurality of spoken sounds from the continuous speech.
  - 13. The method of claim 8 wherein the plurality of intervals consists of three successive intervals.
  - 14. The method of claim 8 wherein the spoken sound is selected from a group consisting of word, syllable, and phoneme.
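Steps (c) through (e) of claim 8 (partitioning samples into data frames, extracting features, and forming feature frames) can be sketched as below. The claims do not say which features are extracted, so the two used here (mean energy and zero-crossing count) are placeholders chosen purely for illustration, as are the function names, the frame length, and the hop size.

```python
def partition_into_frames(samples, frame_len, hop):
    """Split a sample sequence into data frames of frame_len samples each."""
    if frame_len < 2:
        # Claim 8(c) requires each data frame to include at least two samples.
        raise ValueError("frame_len must be at least 2")
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def extract_features(frame):
    """Placeholder features (an assumption): mean energy and zero-crossing count."""
    energy = sum(v * v for v in frame) / len(frame)
    zero_crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [energy, float(zero_crossings)]


def form_feature_frames(samples, frame_len=4, hop=2):
    """Produce one feature frame per data frame."""
    return [extract_features(f) for f in partition_into_frames(samples, frame_len, hop)]


samples = [1.0, -1.0, 1.0, -1.0, 2.0, 2.0]
print(form_feature_frames(samples))  # [[1.0, 3.0], [2.5, 2.0]]
```

Each resulting feature frame would then be distributed to every classifier, as in step (f).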

15. A speech-recognition system for identifying a spoken sound, the speech-recognition system comprising:
- a plurality of classifiers for generating a plurality of classifier output signal sequences, each of the classifier output signal sequences being generated according to a polynomial discriminant function;
  
  defining means for defining a voting window to include portions of the classifier output signal sequences occurring within a finite period of time;
  
  determining means, associatively coupled to the defining means and the plurality of classifiers, for comparing classifier output signals corresponding to an interval occurring within the voting window to select a winning classifier output signal corresponding to the interval, the winning classifier output signal having a largest magnitude, the determining means repeating the comparison for a plurality of intervals occurring within the voting window to generate a plurality of winning classifier output signals; and
  
  identifying means, associatively coupled to the defining means and the determining means, for identifying the spoken sound by determining which of the classifier output signal sequences includes the most winning classifier output signals within the voting window.
- Dependent claims (16, 17, 18, 19, 20, 21, 22):
  - 16. The speech-recognition system of claim 15 wherein the identifying means generates a system output which includes a class label representing the spoken sound.
  - 17. The speech-recognition system of claim 15 wherein polynomial discriminant function has a form ##EQU5## wherein x_j represents a plurality of features;
    - wherein i, j, m and n are integers, y represents a classifier output signal;
      
      wherein w_i represents a coefficient;
      
      wherein g_ji represents an exponent.
  - 18. The speech-recognition system of claim 15 wherein the plurality of intervals consists of three successive intervals.
  - 19. The speech-recognition system of claim 15 wherein the spoken sound is selected from the group consisting of word, syllable, and phoneme.
  - 20. The speech-recognition system of claim 15, wherein the defining means defines a plurality of voting windows, each of the plurality of voting windows having a different starting time.
  - 21. The speech-recognition system of claim 20, wherein the classifier output signal sequences have a duration which is greater than the finite period of time of the voting window and the defining means defines the voting window so that the finite period of time overlaps with the finite period of time of a second voting window.
  - 22. The speech-recognition system of claim 20 wherein the speech-recognition system identifies a plurality of spoken sounds from continuous speech.
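Claims 20 through 22 describe several voting windows with different starting times that overlap one another, applied to classifier output sequences longer than a single window. A minimal sketch, assuming a window of three intervals (as in claim 18) that advances one interval at a time, is shown below. The function name, the step size, the tie-breaking rule, and the absolute-value reading of "magnitude" are assumptions.

```python
def sliding_window_labels(outputs, labels, window_len=3, step=1):
    """Identify one sound per voting window over long classifier output sequences.

    outputs[k][t] is the output of classifier k at interval t (hypothetical layout).
    Returns a list of (window_start, label) pairs.
    """
    num_classifiers = len(outputs)
    num_intervals = len(outputs[0])
    results = []
    # step < window_len makes consecutive windows overlap.
    for start in range(0, num_intervals - window_len + 1, step):
        counts = [0] * num_classifiers
        for t in range(start, start + window_len):
            # Per-interval winner: the output with the largest magnitude.
            winner = max(range(num_classifiers), key=lambda k: abs(outputs[k][t]))
            counts[winner] += 1
        # Ties go to the lowest-indexed classifier (an arbitrary choice).
        results.append((start, labels[counts.index(max(counts))]))
    return results


outputs = [
    [0.9, 0.8, 0.1, 0.2, 0.1],  # classifier for "ah"
    [0.1, 0.2, 0.7, 0.9, 0.8],  # classifier for "oh"
]
print(sliding_window_labels(outputs, ["ah", "oh"]))
# [(0, 'ah'), (1, 'oh'), (2, 'oh')]
```

Because each window yields its own decision, a sequence of windows produces a sequence of identified sounds from continuous speech.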

Current Assignee: Google Technology Holdings LLC (Alphabet Inc.)
Original Assignee: Motorola, Inc. (Motorola Solutions, Inc.)
Inventors: Wang, Shay-Ping T.
Primary Examiner(s): Knepper, David D.
Application Number: US08/329,395
Time in Patent Office: 818 Days
Field of Search: 395/2, 395/2.12, 395/2.13, 395/2.26, 395/2.45, 395/2.57, 395/2.62, 395/2.13, 395/2.41, 381/41-43
US Class Current: 704/236
CPC Class Codes:
G10L 15/08 Speech classification or se...
G10L 15/10 using distance or distortio...
