Method and apparatus for real time speech recognition with and without speaker dependency
First Claim
1. A speaker dependent and independent speech recognition method, based on a comparison between speech characteristic parameter frames and a plurality of reference samples which are generated from a speaker in dependent recognition or from a plurality of persons having representative sounds in independent recognition, comprising the steps of:
a) converting speech signals into a series of primitive sound spectrum parameter frames;
b) determining beginning and ending points of sounds of speech according to said primitive sound parameter frames, for determination of a sound spectrum parameter frame series;
c) performing non-linear time domain normalization on said sound spectrum parameter frame series into a speech characteristic parameter frame series of predefined length on the time domain, including:
forming a sound stimulus series corresponding to said sound spectrum parameter frame series, selecting or deleting each sound spectrum parameter frame, respectively, according to whether a sound stimulus value of said sound spectrum parameter frame is greater or less than an average sound stimulus value, wherein said sound stimulus value represents the difference between two adjacent sound spectrum parameter frames, and wherein said average sound stimulus value is the average of all of said sound stimulus values;
d) performing amplitude quantizing normalization on said speech characteristic parameter frames obtained from step c);
e) comparing said speech characteristic parameter frame series of step d) with each of a plurality of reference samples, said plurality of reference samples having previously been subjected to amplitude quantizing normalization, for determining a reference sample having a closest match with said speech characteristic parameter frame series; and
f) determining a result of recognition according to said reference sample having said closest match.
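The time-domain normalization of step c) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the stimulus value of a frame is the Euclidean distance to the preceding frame (one plausible reading of "the difference between two adjacent sound spectrum parameter frames"), and it realizes the select/delete rule by keeping the highest-stimulus frames until the series reaches the predefined length. The function and parameter names (`time_normalize`, `target_len`) are hypothetical.

```python
import numpy as np

def time_normalize(frames, target_len):
    """Sketch of non-linear time-domain normalization (claim 1, step c).

    frames: (N, D) array of sound spectrum parameter frames.
    Returns a (target_len, D) speech characteristic parameter frame
    series, with the selected frames kept in original time order.
    """
    frames = np.asarray(frames, dtype=float)
    if len(frames) <= target_len:
        # Short utterance: pad by repeating the last frame
        # (one possible choice; the claim does not specify padding).
        pad = np.repeat(frames[-1:], target_len - len(frames), axis=0)
        return np.vstack([frames, pad])
    # Stimulus of frame i: difference between adjacent frames,
    # taken here as the Euclidean distance to frame i-1.
    diffs = np.linalg.norm(np.diff(frames, axis=0), axis=1)
    stimulus = np.concatenate(([diffs[0]], diffs))
    # Frames with above-average stimulus are selected, the rest
    # deleted; equivalently, keep the target_len highest-stimulus
    # frames so the output has the predefined length.
    keep = np.sort(np.argsort(stimulus)[-target_len:])
    return frames[keep]
```

Selecting by rank rather than by a fixed average threshold guarantees the output series has exactly the predefined length, which step e) requires for frame-by-frame comparison with the reference samples.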
Abstract
A method and apparatus for real time speech recognition with and without speaker dependency, comprising the following steps: converting the speech signals into a series of primitive sound spectrum parameter frames; detecting the beginning and ending of speech according to the primitive sound spectrum parameter frames, to determine the sound spectrum parameter frame series; performing non-linear time domain normalization on the sound spectrum parameter frame series using sound stimuli, to obtain a speech characteristic parameter frame series of predefined length on the time domain; performing amplitude quantizing normalization on the speech characteristic parameter frames; comparing the speech characteristic parameter frame series with the reference samples, to determine the reference sample which most closely matches the speech characteristic parameter frame series; and determining the recognition result according to the most closely matched reference sample.
18 Claims
1. A speaker dependent and independent speech recognition method, based on a comparison between speech characteristic parameter frames and a plurality of reference samples which are generated from a speaker in dependent recognition or from a plurality of persons having representative sounds in independent recognition, comprising the steps of:
a) converting speech signals into a series of primitive sound spectrum parameter frames;
b) determining beginning and ending points of sounds of speech according to said primitive sound parameter frames, for determination of a sound spectrum parameter frame series;
c) performing non-linear time domain normalization on said sound spectrum parameter frame series into a speech characteristic parameter frame series of predefined length on the time domain, including:
forming a sound stimulus series corresponding to said sound spectrum parameter frame series, selecting or deleting each sound spectrum parameter frame, respectively, according to whether a sound stimulus value of said sound spectrum parameter frame is greater or less than an average sound stimulus value, wherein said sound stimulus value represents the difference between two adjacent sound spectrum parameter frames, and wherein said average sound stimulus value is the average of all of said sound stimulus values;
d) performing amplitude quantizing normalization on said speech characteristic parameter frames obtained from step c);
e) comparing said speech characteristic parameter frame series of step d) with each of a plurality of reference samples, said plurality of reference samples having previously been subjected to amplitude quantizing normalization, for determining a reference sample having a closest match with said speech characteristic parameter frame series; and
f) determining a result of recognition according to said reference sample having said closest match.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A speaker dependent and independent speech recognition apparatus based on a comparison between speech characteristic parameter frames and reference samples which are generated from one of a speaker in dependent recognition and from persons producing representative sounds in independent recognition, comprising:
a) a speech parameter extracting means for converting speech signals into a series of primitive sound spectrum parameter frames;
b) determining means for determining beginning and ending points of sounds of speech based on said series of primitive sound spectrum parameter frames, for obtaining a sound spectrum parameter frame series;
c) time domain normalization means for normalizing said sound spectrum parameter frame series into a speech characteristic parameter frame series of predefined length on the time domain, including:
forming a sound stimulus series corresponding to said sound spectrum parameter frame series, selecting or deleting each sound spectrum parameter frame, respectively, according to whether a sound stimulus value of said sound spectrum parameter frame is greater or less than an average sound stimulus value, wherein said sound stimulus value represents the difference between two adjacent sound spectrum parameter frames, and wherein said average sound stimulus value is the average of all of said sound stimulus values;
d) quantizing normalization means for performing amplitude quantizing normalization on each frame of said speech characteristic parameter frame series;
e) difference evaluation means for comparing said speech characteristic parameter frame series with said plurality of reference samples, for determining a reference sample having a closest match with said speech characteristic parameter frame series of equal length on the time domain; and
f) judgement means for determining a result of recognition according to said reference sample having said closest match.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
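The quantizing normalization, difference evaluation, and judgement means of claim 10 (steps d through f) can be sketched together. This is a hedged illustration under stated assumptions: the amplitude quantizing normalization is read as scaling the whole frame series into a fixed integer range (`levels`), and the difference measure is a city-block distance between equal-length series; the claims fix neither choice, and all names here are hypothetical.

```python
import numpy as np

def amplitude_normalize(frames, levels=16):
    """Sketch of amplitude quantizing normalization (claim 10, step d):
    scale the frame series into integer levels 0..levels-1."""
    f = np.asarray(frames, dtype=float)
    lo, hi = f.min(), f.max()
    if hi == lo:  # silent/constant input maps to all zeros
        return np.zeros(f.shape, dtype=int)
    return np.round((f - lo) / (hi - lo) * (levels - 1)).astype(int)

def recognize(sample, references):
    """Sketch of difference evaluation and judgement (claim 10, e and f):
    return the label of the reference sample with the smallest total
    frame difference. `references` maps labels to normalized series of
    the same shape as `sample` (equal length on the time domain)."""
    best_label, best_dist = None, float("inf")
    for label, ref in references.items():
        dist = np.abs(sample - ref).sum()  # city-block distance (assumption)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

Because both the input series and every reference sample have been normalized to the same length and amplitude range, the comparison reduces to a direct frame-by-frame distance, which is what makes the claimed real-time matching inexpensive.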
Specification