Speaker independent speech recognition process

US 4,975,959 A
Filed: 03/08/1989
Issued: 12/04/1990
Est. Priority Date: 11/08/1983
Status: Expired due to Term

First Claim

1. A speaker independent speech recognition method comprising:

analyzing an input analog speech signal;

dividing the analyzed speech signal into phonetic units;

comparing said phonetic units of the analyzed speech signal with a plurality of reference templates as stored in a phoneme dictionary, wherein each reference template is representative of at least a portion of a phoneme and is prepared in a training phase by dividing an acoustical space representing phonetic units spoken during training into domains, each of the domains of the acoustical space representing a plurality of phonetic units;

providing phonetic distribution tables associated with each of said reference templates stored in said phoneme dictionary as frequency tables, the probability of a particular phonetic unit being included in a domain being defined according to said frequency tables;

comparing a sequence of phonetic units of the analyzed speech signal with a plurality of words stored in a word lexicon in a phonetic form in accordance with said frequency tables; and

recognizing a particular word of the speech to be recognized as corresponding to a word stored in said word lexicon and having the maximum probability of its constituent phonetic units according to said frequency tables.
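Claim 1 does not say how the frequency tables are computed. One minimal approach assumes the training data has already been reduced to (domain index, hand-marked phoneme label) pairs, one per analysis frame. It counts co-occurrences and normalizes within each domain, so each table gives the relative frequency of every phonetic unit among the frames that fall in that domain. The function name `build_frequency_tables` and the input format are hypothetical and do not come from the patent.

```python
from collections import Counter, defaultdict


def build_frequency_tables(labeled_frames):
    """Build one phonetic distribution table per domain (reference template).

    labeled_frames: iterable of (domain_index, phoneme) pairs, one per
    training frame (hypothetical input format). Returns
    {domain_index: {phoneme: relative_frequency}}.
    """
    counts = defaultdict(Counter)
    for domain, phoneme in labeled_frames:
        counts[domain][phoneme] += 1

    tables = {}
    for domain, counter in counts.items():
        total = sum(counter.values())
        # Relative frequency estimates P(phoneme | domain).
        tables[domain] = {ph: n / total for ph, n in counter.items()}
    return tables


# Toy training data: domain 0 mostly covers /a/, domain 1 only /s/.
tables = build_frequency_tables([(0, "a"), (0, "a"), (0, "e"), (1, "s")])
print(tables)
```

Claim 1 says each domain represents several phonetic units. A table entry therefore spreads probability across units instead of assigning a hard label: in the toy data, domain 0 gives /a/ a probability of 2/3 and /e/ a probability of 1/3.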

Abstract

According to this process, a speech signal is analyzed in a vector quantizer (1) in which the acoustic parameters are calculated for each interval of time of a predetermined value and are compared with each spectral reference template contained in a reference template dictionary (2) utilizing a distance calculation. The sequence obtained at the output of the vector quantizer (1) is then compared with each of the words stored in a word lexicon (5) in a phonetic form utilizing phonetic distribution tables (3) associated with each template. A particular word of the speech to be recognized is then recognized as corresponding to a word stored in the lexicon having the maximum probability of its constituent phonetic units according to the phonetic distribution tables.
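The abstract, together with claims 4 and 5, names two computational steps. The first is nearest-template vector quantization using Euclidean distance. The second is dynamic-programming matching of the resulting label sequence against each lexicon word. The patent does not give the recursion, so what follows is one plausible reading that rests on several assumptions:

- each frame is quantized to exactly one domain;
- each phoneme of a word covers at least one consecutive frame, in left-to-right order;
- frame scores are combined as log probabilities taken from the frequency tables;
- (domain, phoneme) pairs never seen in training receive a small floor probability.

All function names, the toy templates, and the floor value are hypothetical.

```python
import math

FLOOR = 1e-6  # hypothetical probability for (domain, phoneme) pairs never seen in training


def quantize(frames, templates):
    """Map each acoustic parameter vector to the index of its nearest template (Euclidean)."""
    return [
        min(range(len(templates)), key=lambda k: math.dist(frame, templates[k]))
        for frame in frames
    ]


def word_log_prob(labels, phonemes, tables):
    """Best log probability of aligning the label sequence to the word's phonemes.

    Alignment is monotonic: each frame is assigned to one phoneme, phonemes are
    visited in order, and every phoneme covers at least one frame.
    """
    T, M = len(labels), len(phonemes)
    if T < M:
        return -math.inf

    def lp(domain, phoneme):
        return math.log(tables.get(domain, {}).get(phoneme, FLOOR))

    # score[j]: best log probability with the current frame assigned to phoneme j.
    score = [-math.inf] * M
    score[0] = lp(labels[0], phonemes[0])
    for t in range(1, T):
        new = [-math.inf] * M
        for j in range(M):
            stay = score[j]                                 # same phoneme continues
            advance = score[j - 1] if j > 0 else -math.inf  # move on to the next phoneme
            best = max(stay, advance)
            if best > -math.inf:
                new[j] = best + lp(labels[t], phonemes[j])
        score = new
    return score[M - 1]


def recognize(labels, lexicon, tables):
    """Return the lexicon word whose phonetic form has maximum probability."""
    return max(lexicon, key=lambda w: word_log_prob(labels, lexicon[w], tables))


# Toy example: two 2-D templates, three frames, a two-word lexicon.
templates = [(0.0, 0.0), (10.0, 10.0)]
frames = [(0.1, 0.2), (0.3, -0.1), (9.8, 10.1)]
tables = {0: {"a": 0.9, "s": 0.1}, 1: {"s": 0.8, "a": 0.2}}
lexicon = {"as": ["a", "s"], "sa": ["s", "a"]}

labels = quantize(frames, templates)
print(labels, recognize(labels, lexicon, tables))
```

Working in the log domain turns the product of per-frame probabilities into a sum, which avoids underflow on long utterances. For the toy input, the best alignment of "as" is /a a s/, with probability 0.9 × 0.9 × 0.8 = 0.648.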

57 Citations

10 Claims

2. A speaker independent speech recognition method as set forth in claim 1, wherein the maximum distance utilized for associating a point with a domain included in an acoustical space and consequently to a certain number of phonetic units is restricted to as short a distance as necessary by the choice of the number of sufficiently large domains.

3. A speaker independent speech recognition method as set forth in claim 1 wherein the input analog speech signal is analyzed in a vector quantizer to provide acoustic parameters, and the acoustic parameters are calculated for each interval of time of a predetermined value and are compared with each reference template stored in said phoneme dictionary utilizing a distance calculation.

4. A speaker independent speech recognition method as set forth in claim 3, wherein said distance calculation is a calculation of Euclidean distance.

5. A speaker independent speech recognition method as set forth in claim 3, wherein the calculation of the probability according to said frequency tables of correspondence between the sequence of acoustic parameters obtained at the output of the vector quantizer and each word stored in said word lexicon is accomplished by dynamic programming.

6. A speaker independent speech recognition method as set forth in claim 1, wherein said phoneme dictionary is derived by selecting spectral templates in a training set of spectral templates such that the distance from their closest neighbor is of larger magnitude than a threshold value;

   grouping the spectral templates of the training set into classes as a function of their nearest neighbor in the training set of spectral templates; and

   providing said phoneme dictionary by inserting into a proposed dictionary of spectral templates the center of gravity of each class of spectral templates from the training set of spectral templates as a respective reference template; and

   repeating the sequence of steps beginning with the selection of spectral templates in a training set, grouping the spectral templates of the training set into classes, and inserting into the proposed dictionary of spectral templates the center of gravity of each class of spectral templates until the average distance between respective spectral templates closest to each other is less than a certain threshold distance or until the variation in the average distance becomes lower than a reference value of low magnitude.

7. A speaker independent speech recognition method as set forth in claim 1, wherein the training phase includes creating an analysis index and a marking index from the speech of a training set of words as spoken by a predetermined number of speakers having different accents and tones as converted into digital speech signals; and

   creating frequency tables from the analysis index and the marking index.

8. A speaker independent speech recognition method as set forth in claim 7, further including coding the digital speech signals obtained from the conversion of the speech of the training set; and

   analyzing the digital signals by linear prediction analysis to provide speech parameters comprising the contents of the analysis index; and

   submitting the speech parameters stored in said analysis index to a phonetic marking operation to form the speech data stored in said marking index.

9. A speaker independent speech recognition method as set forth in claim 8, wherein said phonetic marking operation is achieved by utilizing a sound emission monitor and a spectral and temporal graphic representation of the digital speech signal from the training set simultaneously so as to determine the limits of the stable portions of the phonemes.

10. A speaker independent speech recognition method as set forth in claim 7, wherein the frequency tables are provided by performing optimal selection of the spectral templates from the contents of the analysis and marking indexes;

   placing the optimally selected spectral templates in a dictionary index; and

   calculating the frequencies to be included in the frequency tables from the contents of the dictionary index.
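Claim 6 describes an iterative construction of the phoneme dictionary. The code below is one interpretation, not a verbatim implementation of the claim:

- Initial selection keeps a training vector only if it lies farther than `min_dist` from every vector already kept.
- Each iteration groups all training vectors by their nearest dictionary template and replaces each template with the center of gravity of its class.
- Iteration stops when the mean distance from the training vectors to their nearest template falls below `stop_dist`, or changes by less than `stop_delta`.

Two choices here are assumptions. "Average distance" is read as this quantization distortion, and the selection step is treated as an initialization rather than repeated each round. Under these readings the loop is essentially a k-means (Lloyd) iteration. Function and parameter names are hypothetical.

```python
import math


def select_initial(training, min_dist):
    """Greedy selection: keep a vector only if it is farther than min_dist
    from every vector already kept (hypothetical reading of claim 6)."""
    chosen = []
    for v in training:
        if all(math.dist(v, c) > min_dist for c in chosen):
            chosen.append(v)
    return chosen


def centroid(vectors):
    """Center of gravity of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def build_dictionary(training, min_dist, stop_dist, stop_delta, max_iter=50):
    dictionary = select_initial(training, min_dist)
    prev_avg = math.inf
    for _ in range(max_iter):
        # Group every training vector with its nearest dictionary template.
        classes = [[] for _ in dictionary]
        total = 0.0
        for v in training:
            k = min(range(len(dictionary)), key=lambda i: math.dist(v, dictionary[i]))
            classes[k].append(v)
            total += math.dist(v, dictionary[k])
        avg = total / len(training)
        # Replace each template by its class centroid; empty classes are dropped.
        dictionary = [centroid(c) for c in classes if c]
        # Stop on small distortion or when distortion stops changing.
        if avg < stop_dist or abs(prev_avg - avg) < stop_delta:
            break
        prev_avg = avg
    return dictionary


training = [(0, 0), (0.2, 0), (0, 0.2), (5, 5), (5.2, 5), (5, 5.2)]
dictionary = build_dictionary(training, min_dist=1.0, stop_dist=0.01, stop_delta=1e-9)
print(dictionary)
```

On the toy data, the initial selection keeps (0, 0) and (5, 5), and the loop converges to the two cluster centroids.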
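Claim 8 says only that the training speech is analyzed "by linear prediction analysis". A common textbook way to do this is the autocorrelation method: compute the frame's autocorrelation up to the prediction order, then solve for the predictor coefficients with the Levinson-Durbin recursion. This is an assumed method, not one the patent states. Framing, windowing, and the choice of order are omitted, and the function names are hypothetical.

```python
def autocorrelation(frame, order):
    """Autocorrelation r[0..order] of one (already windowed) speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] with x[n] ~ sum_k a[k] * x[n - k].

    Returns (coefficients, residual prediction error energy).
    """
    if r[0] == 0:
        return [0.0] * order, 0.0  # silent frame: nothing to predict
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # signal already perfectly predicted; higher coefficients stay 0
        # Reflection coefficient for this stage of the recursion.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc(frame, order):
    """LPC analysis of one frame via the autocorrelation method."""
    return levinson_durbin(autocorrelation(frame, order), order)


coeffs, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(coeffs, err)
```

The autocorrelation [1, 0.5, 0.25] belongs to a first-order process with coefficient 0.5, so the recursion returns a second coefficient of zero.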

Current Assignee: Texas Instruments, Inc.
Original Assignee: Texas Instruments, Inc.
Inventors: Benbassat, Gerard V.
Primary Examiner(s): Harkcom, Gary V.
Application Number: US07/320,841
Time in Patent Office: 636 Days
Field of Search: 381/41-43, 381/39, 381/48
Current US Class: 704/240
CPC Class Codes:
G10L 15/07 Adaptation to the speaker
G10L 19/038 Vector quantisation, e.g. T...
