Method and apparatus for extracting information-bearing portions of a signal for recognizing varying instances of similar patterns

US 5,025,471 A
Filed: 08/04/1989
Issued: 06/18/1991
Est. Priority Date: 08/04/1989
Status: Expired due to Fees

Abstract

Speech signals are analyzed by correlating a sequence of samples to derive a sliding average magnitude difference function (SAMDF) whereby histograms are formed which are compressed and normalized to form histogram sequences representing the speech signal for comparison and recognition.
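
The abstract names the SAMDF but this record does not give its formula. As a minimal sketch, assuming it resembles the textbook average magnitude difference function (the mean of |x[n] - x[n+k]| for each lag k), one frame of samples could be turned into a histogram of lag measurements as follows. The function name, the frame handling, and the number of lags are illustrative assumptions, not taken from the patent.

```python
def amdf_histogram(samples, num_lags):
    """Return one measurement per lag k = 1..num_lags: mean of |x[n] - x[n+k]|.

    Assumption: this is the textbook average magnitude difference function.
    The patent's exact SAMDF definition is not given in this record.
    """
    if num_lags < 1 or len(samples) <= num_lags:
        raise ValueError("need at least one lag and more samples than lags")
    span = len(samples) - num_lags  # use the same number of terms for every lag
    histogram = []
    for lag in range(1, num_lags + 1):
        total = sum(abs(samples[n] - samples[n + lag]) for n in range(span))
        histogram.append(total / span)
    return histogram
```

For a signal that repeats every two samples, such as `[0, 1, 0, 1, 0, 1, 0, 1]` with `num_lags=4`, the measurements dip to zero at lags 2 and 4, which is the periodicity cue this kind of function is meant to expose.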

38 Claims

1. A method for processing an acoustic input speech signal for extraction of individual utterances comprising the steps of:
- (a) converting said speech signal into a first and second sequence of speech related samples;
  
  (b) correlating the first sequence of speech related samples to derive a first histogram representing the input speech signal;
  
  (c) correlating the second sequence of speech related samples to derive a second histogram representing the input speech signal;
  
  (d) compressing the first and second histograms to derive a plurality of spaced channels;
  
  (e) generating a compression histogram representing at least a part of the input speech signal from the spaced channels;
  
  (f) repeating steps (a)-(e) to generate a sequence of compression histograms, said sequence of compression histograms representing a transformation of the input speech signal;
  
  (g) identifying end points for each utterance in the sequence of compression histograms; and
  
  (h) extracting individual utterances from the sequence of compression histograms between the identified utterance end points.
- - 2. A method for processing an acoustic speech signal as in claim 1 wherein the step of converting the input speech signal comprises the steps of:
    - filtering and digitizing the input speech signal to generate the first sequence of speech related samples; and
      
      filtering, digitizing, differentiating and peak clipping the input speech signal to generate the second sequence of speech related samples.
  - 3. A method for processing an acoustic input speech signal as in claim 1 wherein the step of correlating the first sequence of speech related samples comprises the step of calculating the sliding average magnitude difference function (SAMDF) from the first sequence of speech related samples to derive measurements for the first histogram representing the input speech signal.
  - 4. A method for processing an acoustic input speech signal as in claim 3 wherein the first histogram comprises at least four measurements.
  - 5. A method for processing an acoustic input speech signal as in claim 4 wherein the step of correlating the second sequence of speech related samples comprises the step of calculating the sliding average magnitude difference function (SAMDF) from the second sequence of speech related samples to derive measurements for the second histogram representing the input speech signal.
  - 6. A method for processing an acoustic input speech signal as in claim 5 wherein the second histogram comprises at least sixteen measurements.
  - 7. A method for processing an acoustic input speech signal as in claim 6 wherein the step of compressing comprises the steps of:
    - selecting the first four measurements from the first histogram comprising the first spaced channel;
      
      compressing the first eight measurements from the second histogram into four measurements by averaging adjacent measurements across the first eight measurements, the four measurements comprising the second spaced channel; and
      
      compressing the first sixteen measurements from the second histogram into four measurements by averaging four adjacent measurements at a time across the first sixteen measurements, the four measurements comprising the third spaced channel.
  - 8. A method for processing an acoustic input speech signal as in claim 7 wherein the step of compressing further comprises the steps of:
    - averaging the amplitude of selected measurements in each of the three spaced channels to generate three amplitude averaged measurements; and
      
      averaging the amplitude of selected measurements across all three spaced channels to generate a fourth amplitude averaged measurement, the four amplitude averaged measurements comprising the fourth spaced channel.
  - 9. A method for processing an acoustic input speech signal as in claim 8 wherein the step of identifying end points of an utterance in the sequence of compression histograms comprises the step of comparing on a histogram by histogram basis across the sequence of compression histograms the measurements in each spaced channel with a fixed threshold measurement to identify compression histograms indicative of unvoiced onset and offset.
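
The compression in claim 7 can be sketched as below. This is an illustration under stated assumptions: "averaging adjacent measurements" is read as averaging non-overlapping pairs, and "four adjacent measurements at a time" as non-overlapping groups of four. The function and variable names are hypothetical. The fourth channel of claim 8 is left out because the claim does not say which measurements are "selected".

```python
def compress_channels(first_hist, second_hist):
    """Build the three spaced channels of claim 7 from two histograms.

    Assumption: averaging is over non-overlapping groups of adjacent values.
    """
    if len(first_hist) < 4 or len(second_hist) < 16:
        raise ValueError("need at least 4 and 16 measurements respectively")

    def block_means(values, width):
        # Mean of each consecutive, non-overlapping block of `width` values.
        return [sum(values[i:i + width]) / width
                for i in range(0, len(values), width)]

    channel_1 = list(first_hist[:4])              # first four, unchanged
    channel_2 = block_means(second_hist[:8], 2)   # 8 values -> 4 pair means
    channel_3 = block_means(second_hist[:16], 4)  # 16 values -> 4 quad means
    return channel_1, channel_2, channel_3
```

Each channel ends up with four measurements, so the channels cover progressively wider lag ranges of the second histogram at progressively coarser resolution.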

10. A method for processing a sequence of histograms representing a transformation of an extracted portion of a time varying input signal for recognition of certain signal patterns within said extracted portion, each histogram in the sequence of histograms having a plurality of channels, each channel comprising a plurality of measurements, said method for processing comprising the steps of:
- generating and storing a plurality of identification templates, each identification template representing a signal pattern to be identified;
  
  time normalizing the sequence of histograms;
  
  amplitude normalizing the sequence of histograms;
  
  generating and storing a test template from the time and amplitude normalized histogram sequence; and
  
  comparing the identification templates with the test template for a match to identify the signal pattern.
- - 11. A method for processing as in claim 10 wherein each identification template comprises a sequence of histograms, each histogram having at least four channels with at least four measurements per channel.
  - 12. A method for processing as in claim 10 wherein the test template comprises a sequence of histograms, each histogram having at least four channels with at least four measurements per channel.
  - 13. A method as in claim 10 wherein the step of generating the identification template representing a signal pattern further comprises the step of merging a plurality of the same signal pattern generated by a plurality of sources.
  - 14. A method as in claim 10 wherein the step of time normalizing the sequence of histograms representing the transformation of the extracted portion of the time varying signal comprises the steps of:
    - detecting signal pattern beginning and end points for each extracted portion in the sequence of histograms; and
      
      calculating a center reference point in the sequence of histograms for each extracted portion.
  - 15. A method as in claim 14 wherein the step of calculating a center reference point further comprises the steps of:
    - starting at the detected beginning point and ending at the detected end point, summing selected measurements in each channel across the sequence of histograms representing the signal pattern to obtain a first sum;
      
      starting at the detected beginning point and ending at the detected end point, summing the selected measurements in each channel across the sequence of histograms representing the signal pattern to obtain a second sum until the second sum equals or exceeds one-half of the first sum to determine the position of the center reference point in the histogram sequence; and
      
      storing the location of the center reference point of the sequence of histograms representing the signal pattern.
  - 16. A method as in claim 15 wherein the step of time normalizing the sequence of histograms representing the signal pattern further comprises the steps of:
    - time normalizing the sequence of histograms from the beginning point to said center reference point; and
      
      time normalizing the sequence of histograms from the center reference point to the end point.
  - 17. A method as in claim 10 wherein the step of amplitude normalizing the sequence of histograms representing the signal pattern comprises the steps of:
    - amplitude normalizing each measurement within each histogram in the sequence of histograms and within each channel with a first algorithm;
      
      amplitude normalizing each measurement in all channels across the sequence of histograms with a second algorithm; and
      
      amplitude normalizing selected measurements within each histogram in the sequence of histograms with a third algorithm.
  - 18. A method for processing as in claim 10 wherein the step of amplitude normalizing further comprises the steps of:
    - identifying an upper and lower value for selected measurements within each channel for each histogram in the sequence of histograms; and
      
      scaling each measurement within each channel for each histogram in the sequence of histograms between a fixed minimum and maximum value relative to the identified upper and lower values.
  - 19. A method as in claim 18 wherein the step of amplitude normalizing further comprises the steps of:
    - scaling selected measurements in the first three channels in each histogram by each of the three algorithms;
      
      rescaling the measurements in the first three channels in each histogram by each of the three algorithms; and
      
      scaling the measurements in the fourth channel by the first algorithm and rescaling the measurements in the fourth channel by the second algorithm.
  - 20. A method for processing as in claim 10 wherein the step of comparing the identification and test templates for a match comprises the steps of:
    - (a) comparing the measurements for the test template to the measurements for the identification template on a channel by channel, histogram by histogram basis;
      
      (b) generating a comparison score representing the value difference between the measurements for the test template and the identification template for each histogram in the sequence of histograms;
      
      (c) adding each value difference for each histogram compared in the sequence of histograms to calculate a total difference score;
      
      (d) repeating steps (a)-(c) for each identification template; and
      
      (e) outputting as a match the identification template that produces the lowest total difference score below a threshold level and no other comparisons produce scores close to the lowest score.
  - 21. A method as in claim 10 further comprising the step of adapting the identification template when no match occurs between the test template and the identification template.
  - 22. A method as in claim 21 wherein the step of adapting the identification template further comprises the steps of:
    - identifying intruding identification templates close to the identification template;
      
      subtracting the test template from any identified intruding identification templates to minimize the influence of each intruding identification template;
      
      updating the identification template with the test template; and
      
      updating the template for all intruding identification templates.
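
The comparison steps of claim 20 can be sketched as follows. The claim does not define the "value difference" or how near a score must be to count as "close", so this sketch assumes a sum of absolute differences and a hypothetical `margin` parameter; the nested-list template layout and all names are likewise illustrative.

```python
def match_template(test, templates, threshold, margin):
    """Return the name of the best-matching template, or None.

    A template is a list of histograms; a histogram is a list of channels;
    a channel is a list of measurements.
    Assumptions: absolute difference as the score, `margin` as "close".
    """
    def total_difference(a, b):
        if len(a) != len(b):
            raise ValueError("templates must have the same number of histograms")
        return sum(abs(x - y)
                   for hist_a, hist_b in zip(a, b)
                   for chan_a, chan_b in zip(hist_a, hist_b)
                   for x, y in zip(chan_a, chan_b))

    scores = sorted((total_difference(test, template), name)
                    for name, template in templates.items())
    if not scores:
        return None
    best_score, best_name = scores[0]
    if best_score >= threshold:
        return None  # lowest score is not below the threshold
    if len(scores) > 1 and scores[1][0] - best_score < margin:
        return None  # runner-up is too close to call
    return best_name
```

Returning `None` covers both rejection cases in step (e): no template scores below the threshold, or a second template scores nearly as well as the best one.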

23. A method for extracting the information bearing portions of an acoustic speech signal comprising the steps of:
- (a) digitizing the acoustic speech signal to produce a plurality of sequences of speech samples;
  
  (b) correlating each sequence of speech samples to derive a histogram comprising a plurality of measurements;
  
  (c) compressing the plurality of measurements for the histogram to generate a compression histogram representing at least a part of the acoustic speech signal, said step of compressing comprising the step of averaging selected measurements for the histogram to generate the measurements that comprise the compression histogram; and
  
  (d) repeating steps (a)-(c) to output a sequence of compression histograms representing a transformation of the acoustic speech signal.
- - 24. A method for extracting the information bearing portions of an acoustic speech signal as in claim 23 further comprising the steps of:
    - identifying end points in the sequence of compression histograms to identify the information bearing portions of the acoustic speech signal; and
      
      extracting the information bearing portions from the sequence of compression histograms between detected end points.
  - 25. A method for extracting as in claim 24 wherein the step of correlating comprises the step of calculating the sliding average magnitude difference function (SAMDF) for each sequence of speech samples to generate a histogram.
  - 26. A method for extracting as in claim 24 wherein the step of detecting comprises the step of comparing on a histogram by histogram basis the value of each histogram in the sequence of compression histograms to a threshold value to determine instances of unvoiced onset and offset.
  - 27. A method for extracting as in claim 23 wherein the step of digitizing comprises the steps of:
    - generating a first sequence of speech samples from a broadband digitized version of the acoustic speech signal; and
      
      generating a second sequence of data samples from a digitized, differentiated and infinitely clipped version of the acoustic speech signal.
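
The threshold comparison of claim 26 can be illustrated with a simple run detector. This is only a sketch: it assumes the "value" of a histogram is the sum of its measurements and treats every run of histograms above the threshold as one information-bearing portion, with no smoothing. Both choices and all names are assumptions, not details from the patent.

```python
def find_utterances(histograms, threshold):
    """Return (start, end) index pairs, end exclusive, of runs above threshold.

    Assumption: a histogram's value is the sum of its measurements.
    """
    segments = []
    start = None
    for index, histogram in enumerate(histograms):
        active = sum(histogram) > threshold
        if active and start is None:
            start = index                    # onset
        elif not active and start is not None:
            segments.append((start, index))  # offset
            start = None
    if start is not None:                    # run continues to the last histogram
        segments.append((start, len(histograms)))
    return segments
```

The extraction step of claim 24 would then correspond to slicing the sequence, for example `histograms[start:end]` for each returned pair.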

28. A method for processing a sequence of histograms representing information bearing portions of an acoustic speech signal for recognition of individual utterances comprising the steps of:
- storing a plurality of identification templates representing the individual utterances to be recognized;
  
  time and amplitude normalizing the sequence of histograms;
  
  generating a test template from the time and amplitude normalized sequence of histograms; and
  
  comparing the test template to the identification templates for matching and recognition.
- - 29. A method for processing as in claim 28 wherein the step of time normalizing comprises the steps of:
    - identifying beginning and end points for each individual utterance in the sequence of histograms; and
      
      generating a center reference point for each individual utterance in the sequence of histograms.
  - 30. A method for processing for recognition as in claim 29 wherein each histogram comprises a plurality of measurements and the step of generating a center reference point further comprises the steps of:
    - starting at the beginning point, combining selected measurements across the sequence of histograms to the end point to obtain a first sum; and
      
      starting at the beginning point, combining selected measurements across the sequence of histograms until a second sum equals or exceeds one-half of the first sum to generate the center reference point.
  - 31. A method for processing as in claim 29 further comprising the steps of:
    - time normalizing each histogram in the sequence of histograms from the beginning point to the center reference point; and
      
      time normalizing each histogram in the sequence of histograms from the center reference point to the end point.
  - 32. A method for processing as in claim 29 wherein each histogram comprises a plurality of measurements and amplitude normalization comprises the steps of:
    - identifying a maximum and a minimum value across selected measurements for the histograms in the sequence of histograms for each utterance with each of three algorithms; and
      
      scaling each selected measurement for the histograms in the sequence to a value between zero and fifteen relative to the located maximum and minimum values with each of three algorithms.
  - 33. A method for processing as in claim 28 wherein the step of comparing comprises the steps of:
    - comparing on a histogram by histogram basis the normalized histogram sequence representing the test template with each of the stored histogram sequences representing the identification templates;
      
      generating a value difference score between each histogram in the normalized and stored sequence of histograms;
      
      combining each value difference score for each histogram in the sequence compared to generate a total difference score for each template comparison; and
      
      identifying as a match the vocabulary template that produces the lowest total difference score below a threshold level when no other comparisons produce scores below the threshold level.
  - 34. A method as in claim 33 further comprising the step of adapting the identification template when no match occurs between the test template and the identification template.
  - 35. A method as in claim 34 wherein the step of adapting the identification template further comprises the steps of:
    - identifying identification templates having a match as against the test template for the unknown information-bearing portion of the input signal within a defined difference to the score of the correct identification template;
      
      subtracting the test template from the identified identification templates to reduce the influence of the identified identification templates; and
      
      updating the identification template having an identified match with the test template.
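
The center reference point of claim 30 (and of claim 15) amounts to finding where a running sum first reaches half of the total. A minimal sketch, assuming each histogram has already been reduced to a single non-negative value by combining its "selected measurements" (the claims do not say which ones); the names and the inclusive end index are illustrative choices.

```python
def center_reference_point(frame_values, begin, end):
    """Return the index where the running sum first reaches half the total.

    frame_values: one combined, non-negative value per histogram (assumption).
    begin, end: inclusive indices of the utterance end points.
    """
    if not 0 <= begin <= end < len(frame_values):
        raise ValueError("begin/end must be valid, ordered indices")
    first_sum = sum(frame_values[begin:end + 1])
    second_sum = 0
    for index in range(begin, end + 1):
        second_sum += frame_values[index]
        if second_sum * 2 >= first_sum:  # equals or exceeds one-half
            return index
    return end
```

With this point in hand, the two halves of the utterance (beginning to center, center to end) can be time normalized separately, as claims 16 and 31 describe.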

36. Apparatus for processing an acoustic speech signal for recognition of individual utterances comprising:
- (a) means for converting said speech signal into a first and second sequence of data samples;
  
  (b) means for correlating the first sequence of data samples into a first histogram representing the input speech signal, said first histogram comprising a plurality of data measurements;
  
  (c) means for correlating the second sequence of data samples into a second histogram representing the input speech signal, said second histogram comprising a plurality of data measurements;
  
  (d) means for selectively compressing the plurality of data measurements in the first and second histograms into a plurality of data channels, each data channel comprised of a plurality of data measurements, the total number of measurements in all channels being less than the total measurements in said first and second histograms; and
  
  (e) means for repeating steps (a)-(d) to produce a sequence of histograms within each data channel, said sequence of histograms representing a transformation of the speech signal.
- - 37. The apparatus as in claim 36 further comprising:
    - means for processing each histogram in the sequence of histograms to identify end points of individual utterances; and
      
      means for storing the individual utterances in the sequence of histograms between the detected end points.

38. Apparatus for processing a sequence of histograms representing a transformation of an utterance extracted from an acoustic speech signal, comprising:
- means for storing a vocabulary template representing the utterance to be processed;
  
  means for time and amplitude normalizing the sequence of histograms to generate a test template representing the extracted utterance; and
  
  means for comparing the vocabulary template to the test template for matching and recognition.
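
Claim 38 does not detail its amplitude normalizing means, but claims 18 and 32 describe scaling measurements between a fixed minimum and maximum (zero and fifteen in claim 32) relative to the identified lower and upper values. A minimal sketch of one such scaling, assuming a linear min-max mapping rounded to integers; the three separate "algorithms" the claims mention are not specified in this record and are not modelled here.

```python
def scale_to_range(values, low=0, high=15):
    """Linearly map values so that min -> low and max -> high, as integers.

    Assumptions: linear min-max scaling with rounding; a flat input maps to low.
    """
    if not values:
        raise ValueError("values must not be empty")
    smallest, largest = min(values), max(values)
    if largest == smallest:
        return [low for _ in values]
    spread = largest - smallest
    return [low + round((v - smallest) * (high - low) / spread) for v in values]
```

Scaling to a 0 to 15 range means each normalized measurement fits in four bits, which may be the reason for the fixed range, though the record does not say so.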

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Scott Instruments Company, Denton, TX
Inventors: Newell, J. Mark; Smith, Lloyd A.; Lin, Lisan S.; Balentine, Bruce E.; Scott, Brian L.
Primary Examiner(s): Kemeny, Emanuel S.
Application Number: US07/389,682
Time in Patent Office: 683 Days
Field of Search: 381/29-43, 364/513.5
US Class Current: 704/237
CPC Class Codes: G10L 15/02 Feature extraction for spee...
