Speaker independent speech recognition system and method

US 5,946,653 A
Filed: 10/01/1997
Issued: 08/31/1999
Est. Priority Date: 10/01/1997
Status: Expired due to Term

Abstract

An improved method of training a speaker independent speech recognition system (SISRS) uses less processing and memory resources by operating on vectors instead of matrices which represent spoken commands. Memory requirements are linearly proportional to the number of spoken commands for storing each command model. A spoken command is identified from the set of spoken commands by a command recognition procedure (200). The command recognition procedure (200) includes sampling the speaker's speech, deriving cepstral coefficients and delta-cepstral coefficients, and performing a polynomial expansion on cepstral coefficients. The identified spoken command is selected using the dot product of the command model data and the average command structure representing the unidentified spoken command.
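
The polynomial expansion mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the expansion consists of all monomials of the input coefficients up to a chosen order, with the constant term first. The basis ordering, the inclusion of a constant term, and the function name `polynomial_expansion` are assumptions made for illustration and are not taken from the patent.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_expansion(features, order=3):
    """Return all monomials of the feature components up to `order`.

    Assumption: the basis is every product of `degree` components
    (with repetition) for degree 0..order, constant term first.
    """
    expanded = []
    for degree in range(order + 1):
        # Each combination with replacement is one monomial of this degree;
        # the empty combination (degree 0) yields the constant term 1.
        for combo in combinations_with_replacement(features, degree):
            expanded.append(prod(combo))
    return expanded
```

Under these assumptions, a two-element feature vector expanded to third order yields 10 terms, so the expanded vectors grow quickly with the number of input coefficients.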

49 Citations

16 Claims

1. A method of generating command models from a set of spoken commands, each spoken command being represented by a set of feature vectors determined from speech signals, the method comprising the steps of:
- vectorially summing each feature vector associated with each spoken command to create a single command vector for each spoken command;
- summing each single command vector associated with each spoken command to create a command set vector;
- scaling each single command vector inversely proportional to a number of said feature vectors of said set representing each spoken command; and
- adding the command set vector to each single command vector to create a scaled single command vector for each spoken command to create an individual command model for each command, the individual command model being a single vector.
- Dependent claims (2, 3, 4, 5):
  - 2. A method as claimed in claim 1 further comprising the step of performing at least a third order polynomial expansion on the set of feature vectors to create a set of expanded feature vectors for each spoken command, and wherein the vectorially summing step includes the step of combining each expanded feature vector for each spoken command to create the single command vector for each spoken command.
  - 3. A method as claimed in claim 2 further comprising the steps of:
    - mapping elements of the scaled single command vector for each spoken command to a command matrix, wherein the command matrix is determined by multiplying the scaled single command vector and a transpose of the scaled single command vector;
    - decomposing the command matrix for each spoken command to determine a decomposed matrix and a transpose of the decomposed matrix; and
    - solving for a command model for each spoken command based on the decomposed matrix, the feature vectors, and at least some of the elements of the scaled single command vector associated therewith, each command model representing one of the set of spoken commands.
  - 4. A method as claimed in claim 1 further comprising the step of selecting commands spoken by a plurality of different individuals.
  - 5. A method as claimed in claim 1 further comprising the steps of:
    - sampling speech to create a speech sample representing each of said set of spoken commands;
    - removing silence from each speech sample;
    - creating a plurality of overlapping time-windows for said speech sample;
    - extracting a feature vector for each overlapping time window; and
    - vector quantizing each feature vector for each overlapping time window to produce said set of feature vectors for each spoken command.
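
The four steps of claim 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the claim only says the scaling is inversely proportional to the number of feature vectors, so the proportionality constant `scale_constant` is hypothetical, as are the function names. Following the wording of claim 6, the command set vector is added to the scaled version of each single command vector.

```python
def vector_sum(vectors):
    """Element-wise sum of a list of equal-length vectors."""
    return [sum(components) for components in zip(*vectors)]


def build_scaled_command_vectors(features_by_command, scale_constant=1.0):
    """Map each command name to its scaled single command vector.

    `features_by_command` maps a command name to its list of feature vectors.
    `scale_constant` is a hypothetical proportionality constant.
    """
    # Step 1: sum the feature vectors of each command into one vector.
    command_vectors = {
        name: vector_sum(vectors)
        for name, vectors in features_by_command.items()
    }
    # Step 2: sum the single command vectors into the command set vector.
    command_set_vector = vector_sum(list(command_vectors.values()))

    scaled = {}
    for name, vector in command_vectors.items():
        # Step 3: scale inversely to the number of feature vectors.
        factor = scale_constant / len(features_by_command[name])
        # Step 4: add the command set vector to the scaled command vector.
        scaled[name] = [
            total + factor * value
            for total, value in zip(command_set_vector, vector)
        ]
    return scaled
```

Each command is represented by one vector of fixed length, which is consistent with the abstract's statement that memory grows linearly with the number of commands.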

6. A method of generating command models for a set of commands, each command being represented by a set of feature vectors, the method comprising the steps of:
- combining the set of feature vectors for each command to create a high order command structure vector for each command;
- summing each high order command structure vector to create a total command structure vector;
- adding the total command structure vector to a scaled version of each high order command structure vector to create a scaled individual command structure vector for each command;
- computing an individual command model for each command using the scaled individual command structure vector for each command and the set of feature vectors for each command; and
- identifying an unidentified spoken command, said unidentified spoken command being represented by a plurality of spoken feature vectors, the identifying step further comprising the steps of:
  - averaging the plurality of spoken feature vectors to produce an average command structure for the unidentified spoken command;
  - performing a dot product with said average command structure and each individual command model to create a set of score values, each score value being associated with one command of the set of commands; and
  - selecting a command from said set of commands based on a score value.
- Dependent claims (7, 8, 9):
  - 7. A method as claimed in claim 6 further comprising the step of providing an instruction to perform an operation based on the command.
  - 8. A method as claimed in claim 6 wherein the step of identifying said unidentified spoken command further includes the step of performing a non-linear transform on each individual command model to produce non-linear transformed individual command models, and wherein the performing a dot product step includes the step of performing a dot product with said average command structure and each non-linear transformed individual command model to create said set of score values, each score value being associated with one command from said set of commands.
  - 9. A method as claimed in claim 8 further comprising the step of determining said plurality of spoken feature vectors, the step of determining said plurality of spoken feature vectors comprising the steps of:
    - sampling said unidentified spoken command to create a speech sample representing said unidentified spoken command;
    - removing silence from the speech sample of the unidentified spoken command;
    - creating a plurality of overlapping time-windows for said speech sample of the unidentified spoken command;
    - extracting a feature vector for each overlapping time window of the unidentified spoken command; and
    - vector quantizing each feature vector for each overlapping time window to produce said set of feature vectors for the unidentified spoken command.
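
The silence removal and overlapping time-window steps recited in claims 5 and 9 can be sketched as below. The claims do not say how silence is detected or how much the windows overlap, so the amplitude threshold, the `window_size` and `hop_size` parameters, and both function names are assumptions. Feature extraction and vector quantization are left out of this sketch.

```python
def remove_silence(samples, threshold=0.01):
    """Drop samples whose magnitude is below `threshold`.

    A crude stand-in for silence removal; the threshold is hypothetical.
    """
    return [sample for sample in samples if abs(sample) >= threshold]


def overlapping_windows(samples, window_size, hop_size):
    """Split `samples` into windows that overlap by window_size - hop_size."""
    if not 0 < hop_size < window_size:
        raise ValueError("hop_size must be positive and smaller than window_size")
    last_start = len(samples) - window_size
    # Windows start every `hop_size` samples; a trailing partial window is dropped.
    return [
        samples[start:start + window_size]
        for start in range(0, last_start + 1, hop_size)
    ]
```

A feature vector would then be extracted from each window, for example the cepstral and delta-cepstral coefficients named in the abstract.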

10. A method of identifying an unidentified spoken command from a set of individual command models, said unidentified spoken command being represented by a plurality of spoken feature vectors, the method comprising the steps of:
- averaging the plurality of spoken feature vectors to produce an average command structure for the unidentified spoken command;
- performing a dot product with said average command structure and each individual command model to create a set of score values, each score value being associated with one command of a set of commands; and
- selecting a command from said set of commands based on a score value from said set of score values.
- Dependent claims (11, 12, 13):
  - 11. A method as claimed in claim 10 wherein the method of identifying an unidentified spoken command further includes the step of performing a non-linear transform on each individual command model to produce non-linear transformed individual command models, and wherein the performing a dot product step includes the step of performing a dot product with said average command structure and each non-linear transformed individual command model to create said set of score values, each score value of said set of score values being associated with one command from said set of commands.
  - 12. A method as claimed in claim 11 further comprising the step of generating each individual command model for each command of said set of commands, each command of said set of commands being represented by a set of feature vectors, the method comprising the steps of:
    - combining the set of feature vectors for each command to create a high order command structure vector for each command;
    - summing each high order command structure vector to create a total command structure vector;
    - adding the total command structure vector to a scaled version of each high order command structure vector to create a scaled individual command structure vector for each command; and
    - computing each individual command model for each command using the scaled individual command structure vector for each command and the set of feature vectors for each command.
  - 13. A method as claimed in claim 12 wherein the generating step further comprises the step of mapping each scaled individual command structure vector to a matrix, and wherein the computing step includes the step of computing each individual command model for each command using a decomposed version of the matrix, a scaling factor and the set of feature vectors for the command.
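
Claims 3 and 13 compute each command model from a decomposed version of a matrix but do not name the decomposition. The sketch below assumes a Cholesky factorization of a symmetric positive-definite matrix into a lower-triangular factor and its transpose, followed by forward and back substitution. The choice of Cholesky, the right-hand side `rhs`, and the function names are assumptions for illustration only.

```python
from math import sqrt


def cholesky(matrix):
    """Return lower-triangular L with L * L^T == matrix (assumed SPD)."""
    size = len(matrix)
    lower = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def solve_with_cholesky(matrix, rhs):
    """Solve matrix * w = rhs using the factor and its transpose."""
    lower = cholesky(matrix)
    size = len(rhs)
    # Forward substitution: L * y = rhs.
    y = [0.0] * size
    for i in range(size):
        partial = sum(lower[i][k] * y[k] for k in range(i))
        y[i] = (rhs[i] - partial) / lower[i][i]
    # Back substitution: L^T * w = y.
    w = [0.0] * size
    for i in reversed(range(size)):
        partial = sum(lower[k][i] * w[k] for k in range(i + 1, size))
        w[i] = (y[i] - partial) / lower[i][i]
    return w
```

Here `w` plays the role of a command model; in the claimed method the matrix and right-hand side would be derived from the scaled command structure vector and the feature vectors.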

14. A speech recognition system for identifying an unidentified spoken command from a set of individual command models, said unidentified spoken command being represented by a plurality of spoken feature vectors, the speech recognition system comprising:
- a command model memory for storing individual command models for a set of commands;
- a pattern classifier for averaging the plurality of spoken feature vectors to produce an average command structure for the unidentified spoken command, performing a dot product with said average command structure and each individual command model to create a set of score values, each score value being associated with a command of the set of commands; and
- a command selector for selecting one command from said set of commands based on a score value.
- Dependent claims (15, 16):
  - 15. A speech recognition system as claimed in claim 14 wherein the pattern classifier includes means for performing a non-linear transform on each individual command model to produce non-linear transformed individual command models, and means for performing said dot product with said average command structure and each non-linear transformed individual command model to create said set of score values, each score value being associated with one command of the set of commands.
  - 16. A speech recognition system as claimed in claim 15 further comprising a training processor for generating said individual command models for said set of commands, each command of said set of commands being represented by a set of feature vectors, the training processor including:
    - means for combining the set of feature vectors for each command to create a high order command structure vector for each command;
    - means for summing each high order command structure vector to create a total command structure vector;
    - means for adding the total command structure vector to a scaled version of each high order command structure vector to create a scaled individual command structure vector for each command; and
    - means for computing an individual command model for each command using the scaled individual command structure vector for each command and the set of feature vectors for the command.
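
The three components of claim 14 can be sketched as small Python classes. The class and method names are hypothetical, and selecting the command with the highest score is an assumption: the claim only says the selection is based on a score value. The non-linear transform of claim 15 is not shown.

```python
class CommandModelMemory:
    """Stores one model vector per command."""

    def __init__(self):
        self._models = {}

    def store(self, command, model):
        self._models[command] = list(model)

    def items(self):
        return self._models.items()


class PatternClassifier:
    """Averages the spoken feature vectors and scores them against each model."""

    def score(self, spoken_feature_vectors, memory):
        count = len(spoken_feature_vectors)
        # Average command structure: element-wise mean of the feature vectors.
        average = [sum(column) / count for column in zip(*spoken_feature_vectors)]
        # One dot product per stored command model.
        return {
            command: sum(a * m for a, m in zip(average, model))
            for command, model in memory.items()
        }


class CommandSelector:
    """Picks the command with the highest score (an assumed selection rule)."""

    def select(self, scores):
        return max(scores, key=scores.get)
```

Recognition then costs one averaging pass over the spoken feature vectors plus one dot product per stored command.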

Current Assignee: Google Technology Holdings LLC (Alphabet Inc.)
Original Assignee: Motorola, Inc. (Motorola Solutions, Inc.)
Inventors: Kleider, John Eric; Assaleh, Khaled; Gifford, Carl Steven; Campbell, William Michael; Broun, Charles Conway
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Chawan, Vijay B.
Application Number: US08/942,211
Time in Patent Office: 699 Days
Field of Search: 704/254, 704/256, 704/243, 704/252, 704/222, 704/245, 704/234
US Class Current: 704/243
CPC Class Codes:
G10L 15/02   Feature extraction for spee...
G10L 15/10   using distance or distortio...
G10L 2015/0631   Creating reference template...
