Method and apparatus for obtaining transcriptions from multiple training utterances
First Claim
1. An apparatus for creating an entry associated with a certain word in a speech recognition dictionary, said apparatus comprising:
an input for receiving audio information derived from at least two utterances of the certain word;
processing means for processing said audio information to provide a certain transcription of the certain word, said certain transcription being derived from acoustic information contained in each one of said two utterances, said processing means including;
a) means for generating a plurality of transcriptions for each utterance, the transcriptions associated with each utterance forming a set of transcriptions;
b) means for searching the sets of transcriptions for at least one transcription common to the sets of transcriptions;
c) means for creating a subset of transcriptions on the basis of said at least one transcription common to the sets of transcriptions;
d) means for selecting said certain transcription from said subset of transcriptions;
means for creating an entry in the speech recognition dictionary on a basis of said certain transcription.
Abstract
The invention relates to a method and an apparatus for adding a new entry to a speech recognition dictionary, and more particularly to a system and method for generating transcriptions from multiple utterances of a given word. The novel method and apparatus automatically transcribe several training utterances without knowledge of the orthography of the word being added. The invention also provides a method and apparatus for transcribing multiple utterances into a single transcription that can be added to a speech recognition dictionary. In a first step, each utterance is analyzed individually to obtain its acoustic characteristics. These characteristics are then combined to generate a set of the most likely transcriptions using the acoustic information obtained from each of the training utterances.
42 Claims
1. An apparatus for creating an entry associated with a certain word in a speech recognition dictionary, said apparatus comprising:
an input for receiving audio information derived from at least two utterances of the certain word;
processing means for processing said audio information to provide a certain transcription of the certain word, said certain transcription being derived from acoustic information contained in each one of said two utterances, said processing means including;
a) means for generating a plurality of transcriptions for each utterance, the transcriptions associated with each utterance forming a set of transcriptions;
b) means for searching the sets of transcriptions for at least one transcription common to the sets of transcriptions;
c) means for creating a subset of transcriptions on the basis of said at least one transcription common to the sets of transcriptions;
d) means for selecting said certain transcription from said subset of transcriptions;
means for creating an entry in the speech recognition dictionary on a basis of said certain transcription.
Dependent claims: 2, 3, 4, 5, 6
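Steps (a) through (d) of claim 1 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the function `select_common_transcription`, the representation of a transcription as a tuple of phoneme symbols, the use of log-likelihood scores, and in particular the selection rule in step (d), which simply sums the per-utterance scores because the claim does not state how the final transcription is chosen.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical types: a transcription is a tuple of phoneme symbols, and each
# N-best entry pairs a transcription with an assumed log-likelihood score.
Transcription = Tuple[str, ...]
NBest = List[Tuple[Transcription, float]]


def select_common_transcription(nbest_sets: List[NBest]) -> Optional[Transcription]:
    """Pick one transcription that appears in every utterance's N-best set.

    Step (a), generating the N-best sets, is assumed to have been done by a
    recognizer; the sets are passed in as `nbest_sets`.
    """
    if len(nbest_sets) < 2:
        raise ValueError("need the N-best sets of at least two utterances")

    # Step (b): search the sets for transcriptions common to all of them.
    score_maps: List[Dict[Transcription, float]] = [dict(s) for s in nbest_sets]
    common = set(score_maps[0])
    for scores in score_maps[1:]:
        common &= set(scores)
    if not common:
        return None  # no transcription is shared by every set

    # Step (c): build the subset, here with a combined score per transcription
    # (summing log-scores is an assumption, not something the claim specifies).
    subset = {t: sum(scores[t] for scores in score_maps) for t in common}

    # Step (d): select one transcription; sorting first makes ties deterministic.
    return max(sorted(subset), key=lambda t: subset[t])


# Two utterances of the same word, each with a three-entry N-best list.
utt1: NBest = [
    (("n", "ow", "r", "t", "eh", "l"), -10.0),
    (("n", "ao", "r", "t", "eh", "l"), -12.0),
    (("m", "ao", "r", "t", "eh", "l"), -15.0),
]
utt2: NBest = [
    (("n", "ao", "r", "t", "eh", "l"), -9.0),
    (("n", "ow", "r", "t", "eh", "l"), -14.0),
    (("n", "ao", "r", "d", "eh", "l"), -16.0),
]
chosen = select_common_transcription([utt1, utt2])
```

In this toy input, two transcriptions are common to both sets, and the one with the higher summed score is selected even though it was not the top candidate for the first utterance.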
7. An apparatus for creating an entry associated with a certain word in a speech recognition dictionary, said apparatus comprising:
an input for receiving audio information derived from at least two utterances of the certain word;
processing means for processing said audio information to provide a certain transcription of the certain word, said certain transcription being derived from acoustic information contained in each one of said two utterances, said processing means including;
a) means for establishing a compound scoring data structure established on the basis of acoustic characteristics from each of said at least two utterances;
b) means for searching the compound scoring data structure to generate the certain transcription;
means for creating an entry in the speech recognition dictionary on a basis of said certain transcription.
Dependent claims: 8, 9, 10, 11, 12, 13, 14
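Claim 7 does not say what form the compound scoring data structure takes, so the following Python sketch is only one possible reading, under strong simplifying assumptions. It assumes that every utterance has been scored against the same phoneme graph, that graph edges are `(from_node, to_node, phoneme)` triples with `from_node < to_node`, and that scores are log-likelihoods that can be added. The names `build_compound_scores` and `best_transcription` are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int, str]     # (from_node, to_node, phoneme), from_node < to_node
EdgeScores = Dict[Edge, float]  # assumed log-score of each edge for one utterance


def build_compound_scores(per_utterance: List[EdgeScores]) -> EdgeScores:
    """Step (a): combine per-utterance scores into one structure.

    Keeps only edges scored by every utterance and sums their log-scores.
    """
    shared = set(per_utterance[0])
    for scores in per_utterance[1:]:
        shared &= set(scores)
    return {e: sum(scores[e] for scores in per_utterance) for e in shared}


def best_transcription(compound: EdgeScores, start: int, end: int) -> Tuple[List[str], float]:
    """Step (b): search the compound structure for the best-scoring path."""
    # best[node] = (score of the best path from start to node, its phonemes)
    best: Dict[int, Tuple[float, List[str]]] = {start: (0.0, [])}
    # Sorting by from_node is a valid processing order because every edge
    # points from a lower-numbered node to a higher-numbered one.
    for edge in sorted(compound):
        u, v, phoneme = edge
        if u not in best:
            continue  # node u is not reachable from start
        candidate = best[u][0] + compound[edge]
        if v not in best or candidate > best[v][0]:
            best[v] = (candidate, best[u][1] + [phoneme])
    if end not in best:
        raise ValueError("no path from start to end in the compound structure")
    score, phonemes = best[end]
    return phonemes, score


# Two utterances scored on the same two-edge graph (nodes 0 -> 1 -> 2).
utt1: EdgeScores = {(0, 1, "n"): -1.0, (0, 1, "m"): -0.5, (1, 2, "ow"): -1.0, (1, 2, "ao"): -2.0}
utt2: EdgeScores = {(0, 1, "n"): -0.5, (0, 1, "m"): -3.0, (1, 2, "ow"): -2.5, (1, 2, "ao"): -1.0}
compound = build_compound_scores([utt1, utt2])
phonemes, score = best_transcription(compound, start=0, end=2)
```

The point of the sketch is that the search runs once over scores that already reflect both utterances, rather than comparing separately decoded results afterwards. In this toy input the first utterance alone would favor "m ow", while the combined scores favor "n ao".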
15. A method for creating an entry associated with a certain word in a speech recognition dictionary, said method comprising the steps of:
receiving audio information derived from at least two utterances of the certain word;
processing said audio information to provide a certain transcription of the certain word, said certain transcription being derived from acoustic information contained in each one of said two utterances, said processing step including the steps of;
a) generating a plurality of transcriptions for each utterance, the transcriptions associated with each utterance forming a set of transcriptions;
b) searching the sets of transcriptions for at least one transcription common to the sets of transcriptions;
c) creating a subset of transcriptions on the basis of said at least one transcription common to the sets of transcriptions;
d) selecting said certain transcription from said subset of transcriptions;
creating an entry in the speech recognition dictionary on a basis of said certain transcription.
Dependent claims: 16, 17, 18, 19, 20
21. A method for creating an entry associated with a certain word in a speech recognition dictionary, said method comprising the steps of:
receiving audio information derived from at least two utterances of the certain word;
processing said audio information to provide a certain transcription of the certain word, said certain transcription being derived from acoustic information contained in each one of said two utterances, said processing step including the steps of;
a) establishing a compound scoring data structure established on the basis of acoustic characteristics from each of said at least two utterances;
b) searching the compound scoring data structure to generate the certain transcription;
creating an entry in the speech recognition dictionary on a basis of said certain transcription.
Dependent claims: 22, 23, 24, 25, 26, 27, 28
29. A machine-readable storage medium containing a program element to direct a computer to create an entry associated with a certain word in a speech recognition dictionary, said program element implementing functional blocks, comprising:
an input for receiving audio information derived from at least two utterances of the certain word;
processing means for processing said audio information to provide a certain transcription of the certain word, said certain transcription being derived from acoustic information contained in each one of said two utterances, said processing means including;
a) means for generating a plurality of transcriptions for each utterance, the transcriptions associated with each utterance forming a set of transcriptions;
b) means for searching the sets of transcriptions for at least one transcription common to the sets of transcriptions;
c) means for creating a subset of transcriptions on the basis of said at least one transcription common to the sets of transcriptions;
d) means for selecting said certain transcription from said subset of transcriptions;
means for creating an entry in the speech recognition dictionary on a basis of said certain transcription.
Dependent claims: 30, 31, 32, 33, 34
35. A machine-readable storage medium containing a program element to direct a computer to create an entry associated with a certain word in a speech recognition dictionary, said program element implementing functional blocks, comprising:
an input for receiving audio information derived from at least two utterances of the certain word;
processing means for processing said audio information to provide a certain transcription of the certain word, said certain transcription being derived from acoustic information contained in each one of said two utterances, said processing means including;
a) means for establishing a compound scoring data structure established on the basis of acoustic characteristics from each of said at least two utterances;
b) means for searching the compound scoring data structure to generate the certain transcription;
means for creating an entry in the speech recognition dictionary on a basis of said certain transcription.
Dependent claims: 36, 37, 38, 39, 40, 41, 42
Specification