Voice recognition of proper names using text-derived recognition models

US 5,212,730 A
Filed: 07/01/1991
Issued: 05/18/1993
Est. Priority Date: 07/01/1991
Status: Expired due to Term

Abstract

A name recognition system (FIG. 1) is used to provide access to a database based on the voice recognition of a proper name spoken by a person who may not know the correct pronunciation of the name. During an enrollment phase (10), for each name-text entered (11) into a text database (12), text-derived recognition models (22) are created for each of a selected number of pronunciations of the name-text, with each recognition model being constructed from a respective sequence of phonetic features (15) generated by a Boltzmann machine (13). During a name recognition phase (20), the spoken input (24, 25) of a name (by a person who may not know the correct pronunciation) is compared (26) with the recognition models (22) in search of a pattern match; a corresponding name-text is then selected based on a decision rule (28).
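
The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the pronunciation table stands in for the Boltzmann machine, the token comparison stands in for acoustic pattern matching, and every name here (`TOY_PRONUNCIATIONS`, `enroll`, `similarity`, `recognize`, the `threshold` value) is a hypothetical chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical, hand-written phone strings. In the patent this step is done by
# a Boltzmann machine; the table below is only a stand-in.
TOY_PRONUNCIATIONS = {
    "Koch": ["k aa k", "k ow k", "k aa ch"],
    "Leigh": ["l iy", "l ey"],
}


def enroll(name_texts):
    """Enrollment phase: keep one recognition model per pronunciation."""
    return {name: list(TOY_PRONUNCIATIONS[name]) for name in name_texts}


def similarity(spoken, model):
    """Toy pattern match on phone tokens (not an acoustic comparison)."""
    return SequenceMatcher(None, spoken.split(), model.split()).ratio()


def recognize(spoken, models, threshold=0.8):
    """Recognition phase: return the best-matching name-text, or None."""
    best_name, best_score = None, 0.0
    for name, variants in models.items():
        # A name scores as well as its best-matching pronunciation.
        score = max(similarity(spoken, v) for v in variants)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


models = enroll(["Koch", "Leigh"])
```

With these assumptions, any of the listed pronunciations of a name maps back to the same name-text, which is the point of storing several models per entry.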

128 Citations

16 Claims

1. A method of proper name recognition using text-derived recognition models to recognize spoken rendition of name-texts (i.e., names in textual form) that are susceptible to multiple pronunciations, where spoken name input (i.e., spoken rendition of a name-text) is from a person who does not necessarily know how to properly pronounce the name-text, comprising the steps:
- entering name-text into a text database in which the database is accessed by designating name-text;
  
  for each name-text in the text database, constructing a selected number of text-derived recognition models from the name-text, each text-derived recognition model representing at least one pronunciation of the name;
  
  for each attempted access to the text database by a spoken name input, comparing the spoken name input with the text-derived recognition models; and
  
  if such comparison yields a sufficiently close pattern match to one of the text-derived recognition models based on a decision rule, providing a name recognition response designating the name-text associated with such text-derived recognition model.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
  - 2. The name recognition method of claim 1, wherein the step of constructing a selected number of text-derived recognition models is accomplished using a neural network.
  - 3. The name recognition method of claim 1, wherein the step of constructing a selected number of recognition models comprises the substeps:
    - for each name in the text database, inputting the name-text into an appropriately trained Boltzmann machine for a selected number of input cycles, with the machine being placed in a random state prior to each input cycle;
      
      for each input cycle, generating a corresponding pronunciation representation sequence of at least one pronunciation for the name-text;
      
      when the input cycles are complete, constructing from the pronunciation representation sequences that are different at least one text-derived recognition model representing at least one pronunciation of the name-text.
  - 4. The method of proper name recognition using text-derived recognition models of claim 3, wherein the pronunciation representations are phonetic features.
  - 5. The method of proper name recognition using text-derived recognition models of claim 3, wherein said Boltzmann machine comprises:
    - small and large sliding-window subnetworks, each including a respective set of input units and a respective set of internal units; and
      
      a set of output units;
      
      said small sliding window subnetwork being composed of a smaller number of input units than said large sliding window subnetwork;
      
      such that, for each input cycle, the step of generating a corresponding pronunciation representation sequence is accomplished by moving the name-text through both windows simultaneously, with each letter in turn placed in a central position in the respective sets of input units.
  - 6. The method of proper name recognition using text-derived recognition models of claim 1, wherein the step of constructing a selected number of text-derived recognition models is accomplished using HMM modeling.
  - 7. The method of proper name recognition using text-derived recognition models of claim 6, wherein the step of constructing text-derived recognition models comprises the substeps of:
    - creating a phonetic model library of phonetic unit models representing phonetic units, where each phonetic unit model represents sets of expected acoustic features and durations for such features;
      
      for each pronunciation representation sequence generated by the Boltzmann machine, searching the phonetic model library for corresponding phonetic unit models; and
      
      if at least one corresponding phonetic unit model is found, selecting such phonetic unit model;
      
      otherwise, discarding such pronunciation representation sequence; and
      
      after all pronunciation representation sequences have been used to search the phonetic model library, constructing a corresponding text-derived recognition model using the selected phonetic unit models.
  - 8. The method of proper name recognition using text-derived recognition models of claim 7, wherein the substep of selecting such phonetic unit model comprises the substep:
    - if at least one phonetic unit model is found that corresponds to the pronunciation representation sequence with a predetermined degree of consistency, selecting such phonetic unit model.
  - 9. The method of proper name recognition using text-derived recognition models of claim 1, wherein the step of providing a name recognition response designating the name-text associated with such text-derived recognition model is accomplished according to the decision-rule substeps of:
    - for each spoken name input, assigning first scores to the text-derived recognition models representing a likelihood that the spoken name input is an instance of that recognition model; and
      
      evaluating the first scores using a decision rule that selects as a name recognition response either:
      
      (a) a single name-text, (b) a set of N rank-ordered name-texts, or (c) no name.
  - 10. The method of proper name recognition using text-derived recognition models of claim 9, wherein the step of assigning first scores comprises the substeps of:
    - for each spoken name input, assigning name scores to all text-derived recognition models for all name-texts, each representing the likelihood that the spoken name input is an instance of that recognition model; and
      
      for names with multiple text-derived recognition models representing alternative pronunciations, assigning a single name score derived by selecting a single best name score associated with a text-derived recognition model which is most likely.
  - 11. The method of proper name recognition using text-derived recognition models of claim 10, wherein the step of evaluating the first scores using a decision rule is accomplished by:
    - comparing the best name score to the name score for a text-derived recognition model which is second most likely; and
      
      if the best name score exceeds a predetermined absolute score threshold, and if the difference between the best and second best name scores also exceeds a predetermined difference score threshold, then the name-text associated with the text-derived recognition model having the best score is output as the name recognition response;
      
      otherwise, the name recognition response indicates no name-text.
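
The decision rule of claims 9 to 11 can be sketched in Python as follows. This is an illustrative reading, not the patentee's code: it assumes higher scores mean a more likely match, that the runner-up is the second-best name-text (after each name has been reduced to its best pronunciation per claim 10), and the function and parameter names (`best_score_per_name`, `decide`, `abs_threshold`, `diff_threshold`) are hypothetical.

```python
def best_score_per_name(model_scores):
    """Reduce per-model scores to a single best score per name-text (claim 10).

    model_scores maps (name_text, pronunciation_index) -> score.
    """
    name_scores = {}
    for (name, _variant), score in model_scores.items():
        name_scores[name] = max(score, name_scores.get(name, float("-inf")))
    return name_scores


def decide(model_scores, abs_threshold, diff_threshold):
    """Return the winning name-text, or None for "no name-text" (claim 11)."""
    ranked = sorted(
        best_score_per_name(model_scores).items(),
        key=lambda item: item[1],
        reverse=True,
    )
    if not ranked:
        return None
    best_name, best = ranked[0]
    # Assumption: with a single enrolled name there is no runner-up, so only
    # the absolute threshold can reject the match.
    second = ranked[1][1] if len(ranked) > 1 else float("-inf")
    if best > abs_threshold and (best - second) > diff_threshold:
        return best_name
    return None
```

The second test is what rejects confusable names: a high-scoring match is still refused when another name-text scores almost as well.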

12. A method of proper name recognition using text-derived recognition models to recognize spoken rendition of name-texts (i.e., names in textual form) that are susceptible to multiple pronunciations, where spoken name input (i.e., the spoken rendition of a name-text) is from a person who does not necessarily know how to properly pronounce the name-text, comprising the steps:
- entering name-text into a text database in which the database is accessed by designating name-text;
  
  for each name-text in the text database, inputting the name-text into an appropriately trained Boltzmann machine for a selected number of input cycles, with the machine being placed in a random state prior to each input cycle;
  
  for each input cycle, generating a corresponding phonetic feature sequence of at least one pronunciation for the name-text;
  
  when the input cycles are complete, constructing from the phonetic feature sequences that are different at least one text-derived recognition model representing at least one pronunciation of the name-text;
  
  for each attempted access to the text database by a spoken name input, comparing the spoken name input with the stored text-derived recognition models; and
  
  if such comparison yields a sufficiently close pattern match to one of the text-derived recognition models based on a decision rule, providing a name recognition response designating the name-text associated with such text-derived recognition model.
- Dependent claims (13, 14)
  - 13. The method of proper name recognition using text-derived recognition models of claim 12, wherein the step of constructing from the phonetic feature sequences that are different at least one text-derived recognition model comprises the substeps of:
    - creating a phonetic model library of phonetic unit models representing phonetic units, where each phonetic unit model represents sets of expected acoustic features and durations for such features;
      
      for each phonetic feature sequence generated by the Boltzmann machine, searching the phonetic model library for corresponding phonetic unit models; and
      
      if at least one corresponding phonetic unit model is found, selecting such phonetic unit model;
      
      otherwise, discarding such phonetic feature sequences; and
      
      after all phonetic feature sequences have been used to search the phonetic model library, constructing a corresponding text-derived recognition model using the selected phonetic unit models.
  - 14. The method of proper name recognition using text-derived recognition models of claim 12, wherein the step of providing a name recognition response designating the name-text associated with such text-derived recognition model is accomplished according to the decision-rule substeps of:
    - for each spoken name input, assigning first scores to the text-derived recognition models representing a likelihood that the spoken name input is an instance of that recognition model; and
      
      evaluating the first scores using a decision rule that selects as a name recognition response either:
      
      (a) a single name-text, (b) a set of N rank-ordered name-texts, or (c) no name.
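
The repeated input cycles of claim 12 can be illustrated with a small Python sketch. A seeded random sampler over a hypothetical letter-to-phone table (`ALTERNATIVES`) stands in for the trained Boltzmann machine, whose internals this page does not reproduce; the only property the sketch aims to show is that several stochastic passes over the same name-text yield a set of different sequences, each kept once.

```python
import random

# Hypothetical letter-to-phone alternatives ("" means the letter is silent).
# This table is a stand-in for the trained Boltzmann machine.
ALTERNATIVES = {
    "k": ["k"],
    "o": ["aa", "ow"],
    "c": ["k", "ch"],
    "h": [""],
}


def sample_pronunciation(name_text, rng):
    """One input cycle: draw one alternative per letter, dropping silences."""
    phones = [rng.choice(ALTERNATIVES[ch]) for ch in name_text.lower()]
    return tuple(p for p in phones if p)


def distinct_pronunciations(name_text, cycles, seed=0):
    """Run the cycles and keep each different sequence once, in first-seen order."""
    rng = random.Random(seed)
    seen = []
    for _ in range(cycles):
        seq = sample_pronunciation(name_text, rng)
        if seq not in seen:
            seen.append(seq)
    return seen
```

In the claim the machine is placed in a random state before each cycle; in this sketch that role is played by the fresh random draws made on every call to `sample_pronunciation`.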

15. A proper name recognition system using text-derived recognition models to recognize spoken rendition of name-texts (i.e., names in textual form) that are susceptible to multiple pronunciations, where spoken name input (i.e., the spoken rendition of a name-text) is from a person who does not necessarily know how to properly pronounce the name-text, comprising:
- a text database into which are entered name-texts, where the database is accessed by designating name-text;
  
  an appropriately trained Boltzmann machine responsive to the input of a name-text for generating a corresponding phonetic feature sequence of at least one pronunciation for the name-text;
  
  each name-text being input to said Boltzmann machine a selected number of input cycles, with the machine being placed in a random state prior to each input cycle;
  
  a text-derived recognition model generator for constructing, after the selected number of input cycles for a name-text is complete, from the phonetic feature sequences that are different at least one text-derived recognition model representing at least one pronunciation of the name-text;
  
  a name-text recognition engine for comparing, for each attempted access to the text database by a spoken name input, such spoken name input with the generated text-derived recognition models, and if such comparison yields a sufficiently close pattern match to one of the text-derived recognition models based on a decision rule, providing a name recognition response designating the name-text associated with such text-derived recognition model.
- Dependent claims (16)
  - 16. The proper name recognition system using text-derived recognition models of claim 15, further comprising:
    - a phonetic model library of phonetic unit models representing phonetic units, where each phonetic unit model represents sets of expected acoustic features and durations for such features;
      
      such that, for each phonetic feature sequence generated by the Boltzmann machine, said text-derived recognition model generator searches the phonetic model library for corresponding phonetic unit models, and if at least one corresponding phonetic unit model is found, selects such phonetic unit model, otherwise, it discards such phonetic feature sequence; and
      
      after said text-derived recognition model generator has so processed all phonetic feature sequences, it constructs a corresponding text-derived recognition model using the selected phonetic unit models.
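
The library search of claim 16 (and of claims 7 and 13) can be sketched in Python under stated assumptions. The library contents, the feature labels, and the durations are invented for illustration, and `build_models` is a hypothetical name. The sketch follows the claim wording literally: a sequence is kept when at least one of its units has a library entry, and discarded otherwise. A stricter implementation might require every unit to be found.

```python
# Hypothetical library: phonetic unit -> (acoustic feature label, duration in frames).
PHONETIC_MODEL_LIBRARY = {
    "k": ("stop-velar", 6),
    "aa": ("vowel-low-back", 12),
    "ow": ("vowel-mid-back", 14),
}


def build_models(sequences, library=PHONETIC_MODEL_LIBRARY):
    """Build one recognition model per sequence with a library match."""
    models = []
    for seq in sequences:
        # Select the phonetic unit models that correspond to this sequence.
        selected = [(unit, library[unit]) for unit in seq if unit in library]
        if selected:
            models.append(selected)
        # A sequence with no corresponding unit model is discarded.
    return models
```

For example, `build_models([("k", "aa", "k"), ("zz", "qq")])` keeps a model for the first sequence and drops the second, since neither of its units appears in the library.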

Current Assignee: Texas Instruments, Inc.
Original Assignee: Texas Instruments, Inc.
Inventors: Picone, Joseph W.; Wheatley, Barbara J.
Primary Examiner(s): Shaw, Dale M.
Assistant Examiner(s): Tung, Kee M.
Application Number: US07/724,299
Time in Patent Office: 687 Days
Field of Search: 381/41-43, 381/52
US Class Current: 704/243
CPC Class Codes:
G06N 3/044   Recurrent networks, e.g. Ho...
G06N 3/047   Probabilistic or stochastic...
G06N 3/08   Learning methods
G06N 7/01   Probabilistic graphical mod...
G10L 15/063   Training
