Speech recognition using nonparametric speech models

US 6,224,636 B1
Filed: 02/28/1997
Issued: 05/01/2001
Est. Priority Date: 02/28/1997
Status: Expired due to Term

Abstract

The content of a speech sample is recognized using a computer system by evaluating the speech sample against a nonparametric set of training observations, for example, utterances from one or more human speakers. The content of the speech sample is recognized based on the evaluation results. The speech recognition process also may rely on a comparison between the speech sample and a parametric model of the training observations.

24 Claims

1. A method of evaluating a speech sample using a computer, the method comprising:
- collecting training observations, each training observation representing a single utterance by a single speaker;
  
  partitioning the training observations into groups of related training observations;
  
  receiving a speech sample; and
  
  assessing a degree to which the speech sample resembles a group of training observations by evaluating the speech sample relative to particular training observations in the group of training observations.
  - 2. The method of claim 1 further comprising collecting utterances from a speaker, wherein the step of collecting training observations comprises collecting training observations from the collected utterances.
  - 3. The method of claim 2 in which the step of collecting utterances comprises sampling utterances from multiple speakers.
  - 4. The method of claim 1 in which evaluating the speech sample comprises measuring distances between a data point representing the speech sample and data points representing particular training observations in the group of training observations.
  - 5. The method of claim 1 in which evaluating the speech sample comprises identifying a degree to which the group of training observations resembles the speech sample based on a proximity between particular training observations in the group of training observations and the speech sample.
  - 6. The method of claim 1 in which evaluating the speech sample comprises applying to the speech sample a variable bandwidth kernel density estimator function derived from the group of training observations.
  - 7. The method of claim 6 in which evaluating the speech sample comprises applying to the speech sample a k-th nearest neighbor density function derived from the training observations.
  - 8. The method of claim 1 further comprising establishing a speech model from the training observations and comparing the speech sample with the speech model.
  - 9. The method of claim 8 in which establishing a speech model comprises generating a statistical representation of the training observations in the form of a parametric model.
  - 10. The method of claim 8 in which assessing comprises assessing a degree to which the speech sample resembles a group of training observations based on the evaluation relative to the training observations and on the comparison to the speech model.
  - 11. The method of claim 10 in which the step of assessing comprises applying a weighting factor to a result of the evaluation relative to the training observations and to a result of the comparison to the speech model.
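
Claims 4 through 11 can be illustrated with a small sketch. It is not the patent's implementation: it assumes feature vectors are plain tuples, uses Euclidean distance for the measurement in claim 4, uses a textbook k-th nearest neighbor density estimate for claims 6 and 7, fits a diagonal Gaussian as the parametric model of claim 9, and mixes the two scores linearly with a hypothetical `weight` for claim 11. All function names, the sample data, and the default parameter values are illustrative assumptions.

```python
import math

def euclidean(a, b):
    """Distance between two feature vectors (claim 4, assumed Euclidean)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def unit_ball_volume(d):
    """Volume of the unit ball in d dimensions."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def knn_density(sample, observations, k):
    """k-th nearest neighbor density estimate: k / (n * volume of the ball
    whose radius is the distance to the k-th closest training observation)."""
    distances = sorted(euclidean(sample, obs) for obs in observations)
    r_k = max(distances[k - 1], 1e-12)  # guard against a zero radius
    d = len(sample)
    return k / (len(observations) * unit_ball_volume(d) * r_k ** d)

def fit_gaussian(observations):
    """Assumed parametric model: per-dimension mean and variance."""
    n, d = len(observations), len(observations[0])
    mean = [sum(o[i] for o in observations) / n for i in range(d)]
    var = [max(sum((o[i] - mean[i]) ** 2 for o in observations) / n, 1e-6)
           for i in range(d)]
    return mean, var

def gaussian_density(sample, mean, var):
    """Density of a diagonal-covariance Gaussian at the sample."""
    log_p = sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
                for x, m, v in zip(sample, mean, var))
    return math.exp(log_p)

def combined_score(sample, observations, k=2, weight=0.5):
    """Weighted mix of the nonparametric and parametric scores."""
    mean, var = fit_gaussian(observations)
    nonparametric = knn_density(sample, observations, k)
    parametric = gaussian_density(sample, mean, var)
    return weight * nonparametric + (1 - weight) * parametric

# Hypothetical groups of related training observations (2-D features).
groups = {
    "ah": [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2)],
    "ee": [(4.0, 4.0), (4.2, 3.9), (3.8, 4.1), (4.1, 4.2)],
}
sample = (1.05, 1.0)
scores = {label: combined_score(sample, obs) for label, obs in groups.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Because the k-th nearest neighbor radius shrinks where training observations are dense and grows where they are sparse, the estimate behaves like a kernel estimator whose bandwidth varies with the data, which is one way to read the relationship between claims 6 and 7.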

12. A computer-implemented method of recognizing content in a speech sample based on a multi-dimensional speech model derived from training observations, the method comprising:
- receiving a speech sample;
  
  identifying a portion of the speech model based on a comparison between the speech sample and the speech model;
  
  evaluating the speech sample against particular training observations on a subset of the training observations that corresponds to the identified portion of the speech model; and
  
  recognizing a content of the speech sample based on the evaluating.
  - 13. The method of claim 12, further comprising deriving the multi-dimensional speech model by generating a statistical representation of the training observations.
  - 14. The method of claim 13 in which the generating comprises constructing a parametric model of the training observations.
  - 15. The method of claim 12 in which each portion of the speech model comprises a phoneme element.
  - 16. The method of claim 12 in which the identifying comprises:
17. The method of claim 16, wherein identifying a portion of the speech sample comprises designating at least one frame as corresponding to the identified portion, and in which the recognizing comprises for each identified portion of the speech model:
- evaluating the at least one designated frame relative to each training observation for the identified portion of the speech model;
  
  modifying the score for the identified portion based on a result of the evaluation relative to training observations; and
  
  identifying the content of the speech sample as corresponding to the identified portion based on the modified score.
18. The method of claim 17 in which the modifying comprises smoothing the score using a weighting factor.
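
The two-stage flow of claims 12 through 18 can be sketched as follows. This is a hedged illustration under several assumptions, not the patented method: the listing above does not spell out the identifying steps of claim 16, so the sketch simply shortlists the portions with the highest parametric log score; the parametric model is assumed to be a diagonal Gaussian per phoneme element; the rescoring uses a k-th nearest neighbor log density; and the smoothing of claim 18 is assumed to be a linear interpolation in the log domain. Every name, data value, and default (`shortlist_size`, `k`, `weight`) is hypothetical.

```python
import math

def log_gaussian(frame, mean, var):
    """Log density of a diagonal Gaussian (assumed parametric model)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(frame, mean, var))

def knn_log_density(frame, observations, k):
    """Log of the k-th nearest neighbor density estimate for one frame."""
    distances = sorted(math.dist(frame, obs) for obs in observations)
    r_k = max(distances[k - 1], 1e-12)  # guard against a zero radius
    d = len(frame)
    log_volume = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    return math.log(k) - math.log(len(observations)) - log_volume - d * math.log(r_k)

def recognize(frames, models, observations, shortlist_size=2, k=2, weight=0.5):
    # Pass 1: score every portion of the parametric speech model.
    parametric = {label: sum(log_gaussian(f, mean, var) for f in frames)
                  for label, (mean, var) in models.items()}
    # Assumption: the identified portions are the top-scoring ones.
    shortlist = sorted(parametric, key=parametric.get, reverse=True)[:shortlist_size]
    # Pass 2: rescore only the shortlisted portions against their own
    # training observations, then smooth the two scores with a weight.
    smoothed = {}
    for label in shortlist:
        nonparametric = sum(knn_log_density(f, observations[label], k) for f in frames)
        smoothed[label] = weight * nonparametric + (1 - weight) * parametric[label]
    return max(smoothed, key=smoothed.get), smoothed

# Hypothetical phoneme elements: (mean, variance) per element.
models = {
    "ah": ((1.0, 1.0), (0.25, 0.25)),
    "eh": ((2.0, 2.0), (0.25, 0.25)),
    "ee": ((5.0, 5.0), (0.25, 0.25)),
}
# Hypothetical training observations kept alongside the models.
observations = {
    "ah": [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)],
    "eh": [(2.0, 2.0), (2.1, 1.9), (1.9, 2.2)],
    "ee": [(5.0, 5.0), (5.1, 4.9), (4.9, 5.2)],
}
frames = [(1.1, 1.0), (0.9, 1.05)]
best, smoothed = recognize(frames, models, observations)
print(best, smoothed)
```

Restricting the second pass to a shortlist keeps the comparison against individual training observations affordable, since only the observations belonging to the shortlisted portions are visited.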

19. A speech recognition system comprising:
- an input device configured to receive a speech sample to be recognized;
  
  a stored nonparametric vocabulary representing utterances from one or more human speakers, the vocabulary including discrete training observations, each of which represents a single utterance by a single speaker; and
  
  a processor coupled to the input device and to the nonparametric vocabulary and configured to evaluate the speech sample against the nonparametric vocabulary.
  - 20. The speech recognition system of claim 19 further comprising parametric acoustic models which comprise statistical representations of the utterances, the speech sample also being evaluated by the processor against the parametric acoustic models.

21. A computer program, residing on a computer readable medium, for a speech recognition system comprising a processor and an input device, the computer program comprising instructions to perform the following operations:
- evaluate a speech sample against a nonparametric speech model, the speech model including discrete training observations, each of which represents a single utterance by a single speaker; and
  
  recognize a speech content of the speech sample based on a result of the evaluation.
  - 22. The computer program of claim 21 further comprising instructions to evaluate the input speech sample against a parametric speech model and to recognize the content of the input speech model also based on a result of the parametric evaluation.
  - 23. The computer program of claim 22 in which the parametric evaluation is performed prior to the nonparametric evaluation.
  - 24. The computer program of claim 23 in which the nonparametric evaluation comprises instructions to compare the input speech sample against a portion of the nonparametric speech model based on the result of the parametric evaluation.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Dragon Systems, Inc. (Microsoft Corporation)
Inventors: Wegmann, Steven A.; Gillick, Laurence S.
Primary Examiner(s): Hudspeth, David
Assistant Examiner(s): Wieland, Susan
Application Number: US08/807,430
Time in Patent Office: 1,523 Days
Field of Search: 704/243, 704/244, 704/245, 704/246, 704/251, 704/255, 704/256, 704/241
US Class Current: 704/246
CPC Class Codes: G10L 15/08 Speech classification or se...
