Method and apparatus for transitioning from one voice recognition system to another

US 6,014,624 A
Filed: 04/18/1997
Issued: 01/11/2000
Est. Priority Date: 04/18/1997
Status: Expired due to Term

First Claim

1. A method of using a first set of speech characteristic information including one of a speech recognition template and a speech recognition model which was previously generated for use by a first speech recognition system, to generate a second set of speech characteristic information for use by a second speech recognition system, the method comprising the steps of:

generating, from the one of the speech recognition template and model included in the first set of speech characteristic information, additional speech characteristic information not included in the first set of speech characteristic information; and

combining the generated additional speech characteristic information, with at least some information obtained from the first set of speech characteristic information, to generate the second set of speech characteristic information.

6 Assignments

0 Petitions

Abstract

Converting speech recognition templates or models from a first format to a second format improves the recognition rate achieved using the converted templates or models. Storing source and/or scoring information for templates or models so that converted models or templates can be scored differently than original models or templates reflects the effect the conversion process has on recognition scores. In one embodiment, to enhance recognition results, an available compressed voice recording is used in the conversion process. The conversion process of the present invention is described using the conversion of dynamic time warping templates into Hidden Markov Models. The generation of garbage models is also described. In one embodiment, a garbage model is generated dynamically at recognition time using a period of silence in the utterance upon which the recognition operation is to be performed as the source of the data required to generate the garbage model.
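
The last sentence of the abstract describes building a garbage model at recognition time from a period of silence in the utterance. The following is a minimal sketch of that idea, assuming the garbage model is a single diagonal Gaussian over feature frames and that silence is found with a simple energy threshold. Neither assumption comes from this record, and the names `silence_frames` and `garbage_model` are hypothetical.

```python
from statistics import fmean, pvariance
from typing import Dict, List, Sequence


def silence_frames(frames: Sequence[Sequence[float]],
                   energies: Sequence[float],
                   threshold: float) -> List[Sequence[float]]:
    """Keep the frames whose energy falls below the threshold (assumed silence)."""
    return [f for f, e in zip(frames, energies) if e < threshold]


def garbage_model(frames: Sequence[Sequence[float]]) -> Dict[str, List[float]]:
    """Summarize silence frames as a per-dimension mean and variance."""
    if not frames:
        raise ValueError("no silence frames available")
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    return {
        "mean": [fmean(d) for d in dims],
        "var": [pvariance(d) for d in dims],
    }
```

A recognizer could call `garbage_model(silence_frames(frames, energies, threshold))` once per utterance, before scoring the utterance against the stored models.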

63 Citations

30 Claims

1. A method of using a first set of speech characteristic information including one of a speech recognition template and a speech recognition model which was previously generated for use by a first speech recognition system, to generate a second set of speech characteristic information for use by a second speech recognition system, the method comprising the steps of:
- generating, from the one of the speech recognition template and model included in the first set of speech characteristic information, additional speech characteristic information not included in the first set of speech characteristic information; and
  
  combining the generated additional speech characteristic information, with at least some information obtained from the first set of speech characteristic information, to generate the second set of speech characteristic information.
- - 2. The method of claim 1, further comprising the steps of:
    - generating an indicator of the source of the second set of speech characteristic information; and
      
      storing the second set of speech characteristic information and the source indicator in a database.
  - 3. The method of claim 2, further comprising the steps of:
    - generating an additional set of speech characteristic information and second weighting factor information which is different from the first weighting factor information; and
      
      storing the additional set of speech characteristic information and second weighting factor information in the database.
  - 4. The method of claim 1, further comprising the steps of:
    - generating first weighting factor information to be used when performing scoring as part of a speech recognition operation using the second set of speech characteristic information; and
      
      storing the generated weighting factor information and the second set of speech characteristic information in a database.
  - 5. The method of claim 1, wherein the first and second sets of speech characteristic information are sets of speaker dependent speech recognition information;
    - wherein at least a portion of the first set's speech characteristic information represents speech characteristic values corresponding to a period of time; and
      
      wherein the step of generating additional speech characteristic information includes the steps of:
      
      i. calculating, from the first set's speech characteristic information, changes over preselected speech time intervals for at least a portion of a first plurality of speech characteristic values included in the first set of speech characteristic information;
      
      ii. generating a second plurality of speech characteristic values, representing the calculated changes in the portion of the first set's speech characteristic values.
  - 6. The method of claim 5, wherein the first set of speech characteristic information includes quantized data, the method further comprising the steps of:
    - performing an inverse quantization operation on said quantized data using a first set of quantization values; and
      
      performing a quantization operation on the second set of speech characteristic information using a second set of quantization values which is different than the first set of quantization values.
  - 7. The method of claim 6, wherein the first set of quantization values is different than a set of quantization values originally used to quantize the quantized data included in the first set of speech characteristic information.
  - 8. The method of claim 6, further comprising the step of:
    - generating a speaker dependent garbage model from the second set of speech characteristic information.
  - 9. The method of claim 1, wherein the first set of speech characteristic information includes energy coefficients;
    - and wherein the step of generating additional speech characteristic information includes the step of:
      
      calculating delta energy coefficients.
  - 10. The method of claim 1, wherein the first set of speech characteristic information includes cepstra coefficients;
    - and wherein the step of generating additional speech characteristic information includes the step of:
      
      calculating delta cepstra coefficients.
  - 11. The method of claim 10, wherein the first set of speech characteristic information further includes energy coefficients;
    - and wherein the step of generating additional speech characteristic information further includes the step of:
      
      calculating delta energy coefficients.
  - 12. The method of claim 11, further comprising the step of:
    - combining the generated delta cepstra, delta energy and cepstra coefficients to form a speech recognition model without including in said model the energy coefficients from the first set of speech characteristic information.
  - 13. The method of claim 11, wherein the first set of speech characteristic information is a dynamic time warping template and wherein the second set of speech characteristic information is a Hidden Markov Model.
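
Claims 5 and 9 through 12 describe deriving delta cepstra and delta energy coefficients from the stored template, then combining them with the original cepstra while leaving out the original energy coefficients. The sketch below illustrates that flow under stated assumptions: the delta is a symmetric two-frame difference with clamping at the template edges, and frames are plain Python lists. The record does not give the delta formula, and the names `delta` and `convert_template` are hypothetical.

```python
from typing import List, Sequence

Frame = List[float]


def delta(frames: Sequence[Sequence[float]], span: int = 1) -> List[Frame]:
    """Per-frame change over a window of +/- span frames.

    Indices are clamped at the ends, so edge frames use a one-sided
    difference while keeping the same 2 * span divisor (an assumption).
    """
    n = len(frames)
    out: List[Frame] = []
    for t in range(n):
        earlier = frames[max(t - span, 0)]
        later = frames[min(t + span, n - 1)]
        out.append([(b - a) / (2 * span) for a, b in zip(earlier, later)])
    return out


def convert_template(cepstra: Sequence[Sequence[float]],
                     energy: Sequence[float]) -> List[Frame]:
    """Build frames of [cepstra, delta cepstra, delta energy].

    The raw energy values are used only to compute delta energy and are
    not copied into the output, mirroring claim 12.
    """
    if len(cepstra) != len(energy):
        raise ValueError("cepstra and energy must have one entry per frame")
    d_cep = delta(cepstra)
    d_energy = delta([[e] for e in energy])
    return [list(c) + dc + de for c, dc, de in zip(cepstra, d_cep, d_energy)]
```

Each output frame is longer than the input cepstra frame by the number of cepstra coefficients plus one, which is the "additional speech characteristic information" of claim 1 in this sketch.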

14. A method of using a first set of speech characteristic information which was previously generated for use by a first speech recognition system, to generate a second set of speech characteristic information for use by a second speech recognition system, the first set of speech characteristic information representing a segment of audible speech, the method comprising the steps of:
- generating, from the first set of speech characteristic information, additional speech characteristic information not included in the first set of speech characteristic information;
  
  combining the generated additional speech characteristic information, with at least some information obtained from the first set of speech characteristic information, to generate the second set of speech characteristic information;
  
  decompressing a compressed voice recording;
  
  generating a third set of speech recognition characteristic information from the decompressed voice recording; and
  
  combining the second and third sets of speech characteristic information to generate a fourth set of speech characteristic information.

15. A method of using a first set of speech characteristic information which was previously generated for use by a first speech recognition system, to generate a second set of speech characteristic information and for using the second set of speech characteristic information, the first set of speech characteristic information representing a segment of audible speech, the method comprising the steps of:
- generating, from the first set of speech characteristic information, additional speech characteristic information not included in the first set of speech characteristic information;
  
  combining the generated additional speech characteristic information, with at least some information obtained from the first set of speech characteristic information, to generate the second set of speech characteristic information;
  
  generating an indicator of the source of the second set of speech characteristic information;
  
  storing the second set of speech characteristic information and the source indicator in a database;
  
  using the second set of speech characteristic information and generated speaker dependent garbage model to perform a speech recognition operation on speech provided by a user against the seed.
- - 16. The method of claim 15, further comprising the step of:
    - monitoring to detect indicia from the user that the outcome of the speech recognition operation was correct; and
      
      when indicia that the outcome of the speech recognition operation was correct is detected, updating the second set of speech characteristic information as a function of the speech provided by the user.
  - 17. The method of claim 16, wherein the method further comprises the step of:
    - receiving, via a telephone line, the speech provided by the user;
      
      dialing a telephone number and playing a recording to the user associated in a database with the recognized speech; and
      
      wherein the indicia of correct speech recognition is the user of the system allowing completion of the dialed call.
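
Claims 16 and 17 tie model updating to evidence that recognition was correct, such as the user allowing a dialed call to complete. A minimal sketch of that control flow follows, assuming the model is reduced to a mean feature vector that is updated as a running average. The update rule, the `AdaptiveModel` class, and the `handle_call` function are all hypothetical; the record does not state how the update is computed.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AdaptiveModel:
    mean: List[float]  # current mean feature vector (assumed representation)
    count: int = 1     # number of utterances folded into the mean so far

    def update(self, observed: Sequence[float]) -> None:
        """Fold one confirmed utterance into the running mean."""
        self.count += 1
        self.mean = [m + (o - m) / self.count
                     for m, o in zip(self.mean, observed)]


def handle_call(model: AdaptiveModel,
                observed: Sequence[float],
                call_completed: bool) -> bool:
    """Update the model only when the user let the dialed call complete."""
    if call_completed:
        model.update(observed)
        return True
    return False
```

If the user hangs up before the call completes, `handle_call` leaves the model untouched, so a misrecognized utterance does not contaminate the stored model.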

18. A method of converting a speech recognition template including a first set of speech characteristic data into a second speech recognition template including a second different set of speech characteristic data, the method comprising the steps of:
- processing the first set of speech characteristic data included in the first template to produce therefrom a first generated set of speech characteristic data;
  
  decompressing a compressed speech recording to generate decompressed speech;
  
  processing the decompressed speech to generate a third set of speech characteristic data;
  
  generating the second set of speech characteristic data included in the second template, from the first set of speech characteristic data, the generated first set of speech characteristic data, and the third set of speech characteristic data.
- - 19. The method of claim 18, wherein the step of processing the first set of speech characteristic data includes the steps of:
    - performing an inverse quantization operation on the first set of speech characteristic data to generate a plurality of coefficient values therefrom; and
      
      generating delta coefficient values from said plurality of coefficient values.
  - 20. The method of claim 18, wherein the step of generating the second set of speech characteristic data includes the steps of:
    - generating a seed from the third set of speech characteristic data;
      
      generating a first feature set representation from a portion of the first set of data and the generated first set of data; and
      
      aligning the first feature set representation against the seed.
  - 21. The method of claim 20, wherein the quantization operation performed on the first set of speech characteristic data is not a direct inverse of a quantization operation used to initially create the first set of speech characteristic data.
  - 22. The method of claim 20, further comprising the step of:
    - generating a speaker dependent garbage template from the second template.
  - 23. The method of claim 22, wherein the first template is a dynamic time warping template corresponding to a single utterance of speech;
    - and wherein the second template is a Hidden Markov Model corresponding to a plurality of speech utterances.
  - 24. The method of claim 23, further comprising the step of:
    - utilizing the second template to perform a speech recognition operation on an utterance provided by the user; and
      
      providing a service associated with the second template in a database, when it is determined that there is a match between the utterance provided by the user and the second template.
  - 25. The method of claim 24, further comprising the step of:
    - performing a model updating operation using the utterance provided by the user when it is determined that there is a match between the provided utterance and the second template.
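
Claims 19 and 21 (and claims 6 and 7 above) involve inverse quantizing stored template data and later quantizing with a different set of quantization values. The sketch below shows one simple reading, assuming scalar quantization where each stored value is an index into a table of levels and requantization picks the nearest level in a second table. The tables and the names `dequantize` and `quantize` are hypothetical; the record does not specify the quantizer.

```python
from typing import List, Sequence


def dequantize(indices: Sequence[int], levels: Sequence[float]) -> List[float]:
    """Map stored indices back to coefficient values using a level table."""
    return [levels[i] for i in indices]


def quantize(values: Sequence[float], levels: Sequence[float]) -> List[int]:
    """Map each value to the index of the nearest level (first wins on ties)."""
    return [
        min(range(len(levels)), key=lambda i: abs(levels[i] - v))
        for v in values
    ]


def requantize(indices: Sequence[int],
               old_levels: Sequence[float],
               new_levels: Sequence[float]) -> List[int]:
    """Inverse quantize with one table, then quantize with a different one."""
    return quantize(dequantize(indices, old_levels), new_levels)
```

In a fuller conversion, the delta coefficients would be computed on the dequantized values before the final quantization step, since differences of table indices are not meaningful.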

26. An apparatus comprising:
- means for generating a second speaker dependent speech recognition template having a second format and second data content from a first speaker dependent speech template having a first format and a first data content, the first and second formats being different; and
  
  means for storing the second speaker dependent speech recognition template in a database.
- - 27. The apparatus of claim 26, wherein said means for generating a second speaker dependent speech recognition template further includes:
    - means for processing the first speaker dependent speech recognition template to generate a first set of speech characteristic data;
      
      means for processing a third speaker dependent speech recognition template having the first format to generate a second set of speech characteristic data; and
      
      means for processing the first and second sets of speech characteristic data to generate the second speech template.
  - 28. The apparatus of claim 27, further comprising:
    - means for generating a speaker dependent garbage model from the second speaker dependent speech recognition template.
  - 29. The apparatus of claim 28, further comprising:
    - means for generating an indicator of the source of the second speaker dependent speech recognition template.
  - 30. The apparatus of claim 29, further comprising:
    - means for generating weighting factor information to be used during speech recognition operations associated with the second speaker dependent speech recognition template; and
      
      wherein the storing means includes means for storing the generated weighting factor information with the second speaker dependent speech recognition template in the database.
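
Claims 29 and 30 (like claims 2 through 4) describe storing a source indicator and weighting factor information alongside the converted template, so that converted templates can be scored differently from original ones. A minimal sketch follows, assuming an in-memory dictionary stands in for the database and that the weighting factor is a simple multiplier on the raw recognition score. Both are assumptions, and `StoredModel`, `ModelDatabase`, and the source strings are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StoredModel:
    label: str                   # e.g. the name the user speaks
    features: List[List[float]]  # template or model data
    source: str                  # hypothetical values: "converted", "native"
    weight: float                # weighting factor applied at scoring time


class ModelDatabase:
    """In-memory stand-in for the database named in the claims."""

    def __init__(self) -> None:
        self._records: Dict[str, StoredModel] = {}

    def store(self, record: StoredModel) -> None:
        self._records[record.label] = record

    def source_of(self, label: str) -> str:
        return self._records[label].source

    def weighted_score(self, label: str, raw_score: float) -> float:
        """Scale a raw score by the stored weighting factor (assumed rule)."""
        return raw_score * self._records[label].weight
```

Keeping the weight in the record rather than hard-coding it per source lets a converted model's weighting change later, for example after it has been updated with new utterances.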

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Nynex Science And Technology, Inc.
Inventors: Raman, Vijay R.
Primary Examiner(s): Knepper, David D.
Application Number: US08/844,534
Time in Patent Office: 998 Days
Field of Search: 704/243-245, 704/246, 704/251, 704/241, 704/256
US Class Current: 704/243
CPC Class Codes:
G10L 15/08 Speech classification or se...
G10L 15/20 Speech recognition techniqu...
