Method and apparatus for obtaining transcriptions from multiple training utterances

US 5,983,177 A
Filed: 12/18/1997
Issued: 11/09/1999
Est. Priority Date: 12/18/1997
Status: Expired due to Term

Abstract

The invention relates to a method and an apparatus for adding a new entry to a speech recognition dictionary, and more particularly to a system and method for generating transcriptions from multiple utterances of a given word. The novel method and apparatus automatically transcribe several training utterances without knowledge of the orthography of the word being added. They also provide a way to transcribe multiple utterances into a single transcription that can be added to a speech recognition dictionary. In a first step, each utterance is analyzed individually to obtain its acoustic characteristics. These characteristics are then combined to generate a set of the most likely transcriptions, using the acoustic information obtained from each of the training utterances.

42 Claims

1. An apparatus for creating an entry associated with a certain word in a speech recognition dictionary, said apparatus comprising:
- an input for receiving audio information derived from at least two utterances of the certain word;
  
  processing means for processing said audio information to provide a certain transcription of the certain word, said certain transcription being derived from acoustic information contained in each one of said two utterances, said processing means including;
  
  a) means for generating a plurality of transcriptions for each utterance, the transcriptions associated with each utterance forming a set of transcriptions;
  
  b) means for searching the sets of transcriptions for at least one transcription common to the sets of transcriptions;
  
  c) means for creating a subset of transcriptions on the basis of said at least one transcription common to the sets of transcriptions;
  
  d) means for selecting said certain transcription from said subset of transcriptions;
  
  means for creating an entry in the speech recognition dictionary on a basis of said certain transcription.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The apparatus as defined in claim 1, wherein said processing means includes means for computing a score for each common transcription.
  - 3. The apparatus as defined in claim 2, wherein said processing means computes a score for each transcription in a given set of transcriptions.
  - 4. The apparatus as defined in claim 3, wherein each score for a common transcription is computed at least in part on the basis of scores of the transcriptions in respective sets of transcriptions.
  - 5. The apparatus as defined in claim 4, wherein the score for a transcription in the subset of transcriptions is the sum of the scores of the transcription in respective sets of transcriptions.
  - 6. The apparatus as defined in claim 5, wherein said processing means includes means for classifying the common transcriptions on a basis of the scores of the common transcriptions.
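
The selection described in claims 1 to 6 (per-utterance transcription sets, a search for common transcriptions, summed scores, and a ranking) can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the dictionary-based N-best representation, the phoneme symbols, and the use of log-likelihood scores (higher is better) are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: one N-best list per utterance, mapping a
# transcription (tuple of phoneme symbols) to its score. Scores are assumed
# to be log-likelihoods, so higher is better and summing them is meaningful.
Transcription = Tuple[str, ...]
NBest = Dict[Transcription, float]


def rank_common_transcriptions(
    nbest_lists: List[NBest],
) -> List[Tuple[Transcription, float]]:
    """Return transcriptions found in every set, ranked by summed score."""
    if len(nbest_lists) < 2:
        raise ValueError("at least two utterances are required")
    # Search the sets for transcriptions common to all of them.
    common = set(nbest_lists[0])
    for nbest in nbest_lists[1:]:
        common &= set(nbest)
    # Score each common transcription as the sum of its per-set scores.
    scored = [(t, sum(nbest[t] for nbest in nbest_lists)) for t in common]
    # Classify (rank) the subset by score, best first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


def select_transcription(nbest_lists: List[NBest]) -> Optional[Transcription]:
    """Pick the top-ranked common transcription, or None if none is shared."""
    ranked = rank_common_transcriptions(nbest_lists)
    return ranked[0][0] if ranked else None


# Illustrative (made-up) data for two utterances of the same word.
A = ("n", "ao", "r", "t", "eh", "l")
B = ("n", "ow", "r", "t", "eh", "l")
C = ("m", "ao", "r", "t", "eh", "l")
D = ("n", "ao", "r", "d", "eh", "l")
utt1: NBest = {A: -9.0, B: -10.0, C: -12.0}
utt2: NBest = {A: -11.0, B: -8.5, D: -9.5}

# A and B are common; B wins with -18.5 against -20.0 for A.
chosen = select_transcription([utt1, utt2])
```

In this made-up data, `A` is the best candidate for the first utterance alone, yet `B` is selected once both utterances are considered. The sketch returns `None` when the sets share no transcription; how such a case should be handled is not stated in the claims shown here.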

7. An apparatus for creating an entry associated with a certain word in a speech recognition dictionary, said apparatus comprising:
- an input for receiving audio information derived from at least two utterances of the certain word;
  
  processing means for processing said audio information to provide a certain transcription of the certain word, said certain transcription being derived from acoustic information contained in each one of said two utterances, said processing means including;
  
  a) means for establishing a compound scoring data structure established on the basis of acoustic characteristics from each of said at least two utterances;
  
  b) means for searching the compound scoring data structure to generate the certain transcription;
  
  means for creating an entry in the speech recognition dictionary on a basis of said certain transcription.
- Dependent claims (8, 9, 10, 11, 12, 13, 14):
  - 8. An apparatus as defined in claim 7, wherein said processing means includes means for establishing for each utterance a phonemic lattice, said compound scoring data structure being derived at least in part from the phonemic lattice.
  - 9. An apparatus as defined in claim 8, wherein said processing means includes means for merging the phonemic lattices of several utterances together to obtain a merged phonemic lattice, said merged phonemic lattice defining said compound scoring data structure.
  - 10. An apparatus as defined in claim 9, wherein said processing means includes means for searching the merged phonemic lattice to select the certain transcription.
  - 11. An apparatus as defined in claim 7, wherein said processing means includes means to establish a graph for each utterance, said compound scoring data structure being derived at least in part from the graph.
  - 12. An apparatus as defined in claim 11, wherein said processing means includes means for merging the graphs of several utterances together to obtain a merged graph, said merged graph defining said compound scoring data structure.
  - 13. An apparatus as defined in claim 12, wherein said processing means includes means for searching the merged graph for a certain probability path and selecting a transcription corresponding to said certain probability path as the certain transcription on the basis of which the entry in the speech recognition dictionary is created.
  - 14. An apparatus as defined in claim 13, wherein said certain probability path is the highest probability path.
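
Claims 7 to 14 describe merging per-utterance lattices or graphs into a compound scoring structure and searching it for the highest probability path. The sketch below shows one simplified way this could look. It rests on strong assumptions that the claims do not state: all lattices share the same node numbering, every arc runs from a lower to a higher node number, arc scores are log-probabilities, and merging keeps only arcs present in every lattice while summing their scores. The type and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Arc = Tuple[int, int, str]      # (source node, destination node, phoneme)
Lattice = Dict[Arc, float]      # arc -> log-probability score (assumed)


def merge_lattices(lattices: List[Lattice]) -> Lattice:
    """Keep arcs shared by all lattices and sum their scores (assumption)."""
    shared = set(lattices[0])
    for lat in lattices[1:]:
        shared &= set(lat)
    return {arc: sum(lat[arc] for lat in lattices) for arc in shared}


def best_path(lattice: Lattice, start: int, end: int) -> Tuple[List[str], float]:
    """Return the phoneme sequence and score of the highest-scoring path."""
    # best[node] holds the best (score, phonemes) found for reaching node.
    best: Dict[int, Tuple[float, List[str]]] = {start: (0.0, [])}
    # Arcs sorted by source node form a topological order because every
    # arc is assumed to go from a lower to a higher node number.
    for src, dst, phone in sorted(lattice):
        if src not in best:
            continue  # source node is unreachable from start
        score = best[src][0] + lattice[(src, dst, phone)]
        if dst not in best or score > best[dst][0]:
            best[dst] = (score, best[src][1] + [phone])
    if end not in best:
        raise ValueError("no path from start to end")
    total, phones = best[end]
    return phones, total


# Illustrative (made-up) lattices for two utterances of the same word.
lat1: Lattice = {(0, 1, "n"): -1.0, (1, 2, "ao"): -2.0,
                 (1, 2, "ow"): -1.0, (2, 3, "r"): -0.5}
lat2: Lattice = {(0, 1, "n"): -1.0, (0, 1, "m"): -3.0, (1, 2, "ao"): -0.5,
                 (1, 2, "ow"): -2.0, (2, 3, "r"): -0.5}

merged = merge_lattices([lat1, lat2])
phones, score = best_path(merged, start=0, end=3)  # (["n", "ao", "r"], -5.5)
```

With this data, the first lattice alone prefers "ow" in the middle position, but the merged structure prefers "ao". Real lattices from different utterances generally do not share node alignments, so a practical merge would need an alignment step that this sketch leaves out.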

15. A method for creating an entry associated with a certain word in a speech recognition dictionary, said method comprising the steps of:
- receiving audio information derived from at least two utterances of the certain word;
  
  processing said audio information to provide a certain transcription of the certain word, said certain transcription being derived from acoustic information contained in each one of said two utterances, said processing step including the steps of;
  
  a) generating a plurality of transcriptions for each utterance, the transcriptions associated with each utterance forming a set of transcriptions;
  
  b) searching the sets of transcriptions for at least one transcription common to the sets of transcriptions;
  
  c) creating a subset of transcriptions on the basis of said at least one transcription common to the sets of transcriptions;
  
  d) selecting said certain transcription from said subset of transcriptions;
  
  creating an entry in the speech recognition dictionary on a basis of said certain transcription.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The method as defined in claim 15, comprising the step of computing a score for each common transcription.
  - 17. The method as defined in claim 16, comprising the step of computing a score for each transcription in a given set of transcriptions.
  - 18. The method as defined in claim 17, wherein each score for a common transcription is computed at least in part on the basis of scores of the transcriptions in respective sets of transcriptions.
  - 19. The method as defined in claim 18, wherein the score for a common transcription is the sum of the scores of the transcription in respective sets of transcriptions.
  - 20. The method as defined in claim 19, comprising the step of classifying the common transcriptions on a basis of the scores of the common transcriptions.

21. A method for creating an entry associated with a certain word in a speech recognition dictionary, said method comprising the steps of:
- receiving audio information derived from at least two utterances of the certain word;
  
  processing said audio information to provide a certain transcription of the certain word, said certain transcription being derived from acoustic information contained in each one of said two utterances, said processing step including the steps of;
  
  a) establishing a compound scoring data structure established on the basis of acoustic characteristics from each of said at least two utterances;
  
  b) searching the compound scoring data structure to generate the certain transcription;
  
  creating an entry in the speech recognition dictionary on a basis of said certain transcription.
- Dependent claims (22, 23, 24, 25, 26, 27, 28):
  - 22. The method as defined in claim 21, comprising the step of establishing for each utterance a phonemic lattice, said compound scoring data structure being derived at least in part from the phonemic lattice.
  - 23. The method as defined in claim 22, comprising the step of merging the phonemic lattices of several utterances together to form a merged phonemic lattice, said merged phonemic lattice defining said compound scoring data structure.
  - 24. The method as defined in claim 23, comprising the step of searching the merged phonemic lattice to generate a transcription representative of said at least two utterances.
  - 25. The method as defined in claim 21, comprising the step of establishing a graph for each utterance, said compound scoring data structure being derived at least in part from the graph.
  - 26. The method as defined in claim 25, comprising the step of merging the graphs of several utterances together to form a merged graph, said merged graph defining said compound scoring data structure.
  - 27. The method as defined in claim 26, comprising the step of searching the merged graph for a certain probability path and selecting a transcription corresponding to said certain probability path as the certain transcription on the basis of which the entry in the speech recognition dictionary is created.
  - 28. The method as defined in claim 27, wherein said certain probability path is the highest probability path.

29. A machine-readable storage medium containing a program element to direct a computer to create an entry associated with a certain word in a speech recognition dictionary, said program element implementing functional blocks, comprising:
- an input for receiving audio information derived from at least two utterances of the certain word;
  
  processing means for processing said audio information to provide a certain transcription of the certain word, said certain transcription being derived from acoustic information contained in each one of said two utterances, said processing means including;
  
  a) means for generating a plurality of transcriptions for each utterance, the transcriptions associated with each utterance forming a set of transcriptions;
  
  b) means for searching the sets of transcriptions for at least one transcription common to the sets of transcriptions;
  
  c) means for creating a subset of transcriptions on the basis of said at least one transcription common to the sets of transcriptions;
  
  d) means for selecting said certain transcription from said subset of transcriptions;
  
  means for creating an entry in the speech recognition dictionary on a basis of said certain transcription.
- Dependent claims (30, 31, 32, 33, 34):
  - 30. The machine-readable storage medium as defined in claim 29, wherein said processing means includes means for computing a score for each common transcription.
  - 31. The machine-readable storage medium as defined in claim 30, wherein said processing means computes a score for each transcription in a given set of transcriptions.
  - 32. The machine-readable storage medium as defined in claim 31, wherein each score for a common transcription is computed at least in part on the basis of scores of the transcriptions in respective sets of transcriptions.
  - 33. The machine-readable storage medium as defined in claim 32, wherein the score for a common transcription is the sum of the scores of the transcription in respective sets of transcriptions.
  - 34. The machine-readable storage medium as defined in claim 33, wherein said processing means includes means for classifying the common transcriptions on a basis of the scores of the common transcriptions.

35. A machine-readable storage medium containing a program element to direct a computer to create an entry associated with a certain word in a speech recognition dictionary, said program element implementing functional blocks, comprising:
- an input for receiving audio information derived from at least two utterances of the certain word;
  
  processing means for processing said audio information to provide a certain transcription of the certain word, said certain transcription being derived from acoustic information contained in each one of said two utterances, said processing means including;
  
  a) means for establishing a compound scoring data structure established on the basis of acoustic characteristics from each of said at least two utterances;
  
  b) means for searching the compound scoring data structure to generate the certain transcription;
  
  means for creating an entry in the speech recognition dictionary on a basis of said certain transcription.
- Dependent claims (36, 37, 38, 39, 40, 41, 42):
  - 36. The machine-readable storage medium as defined in claim 35, wherein said processing means includes means for establishing for each utterance a phonemic lattice, said compound scoring data structure being derived at least in part from the phonemic lattice.
  - 37. The machine-readable storage medium as defined in claim 36, wherein said processing means includes means for merging the phonemic lattices of several utterances together to form a merged phonemic lattice, said merged phonemic lattice defining said compound scoring data structure.
  - 38. The machine-readable storage medium as defined in claim 37, wherein said processing means includes means for searching the merged phonemic lattice to generate a transcription representative of said at least two utterances.
  - 39. The machine-readable storage medium as defined in claim 35, wherein said processing means includes means to establish a graph for each utterance, said compound scoring data structure being derived at least in part from the graph.
  - 40. The machine-readable storage medium as defined in claim 39, wherein said processing means includes means for merging the graphs of several utterances together to form a merged graph, said merged graph defining said compound scoring data structure.
  - 41. The machine-readable storage medium as defined in claim 40, wherein said processing means includes means for searching the merged graph for a certain probability path and selecting a transcription corresponding to said certain probability path as the certain transcription on the basis of which the entry in the speech recognition dictionary is created.
  - 42. The machine-readable storage medium as defined in claim 41, wherein said certain probability path is the highest probability path.

Current Assignee: RPX Clearinghouse LLC (RPX Corporation)
Original Assignee: Nortel Networks Corporation
Inventors: Stubley, Peter; Wu, Jianxiong; Dahan, Jean-Guy; Gupta, Vishwa
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Chawan, Vijay B.
Application Number: US08/994,007
Time in Patent Office: 691 Days
Field of Search: 704/252, 704/245, 704/256, 704/244, 704/251, 704/243, 704/237, 704/255, 704/249
US Class Current: 704/244
CPC Class Codes:
G10L 15/06 Creation of reference templ...
G10L 15/187 Phonemic context, e.g. pron...
