LEARNING PERSONALIZED ENTITY PRONUNCIATIONS
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for implementing a pronunciation dictionary that stores entity name pronunciations. In one aspect, a method includes actions of receiving audio data corresponding to an utterance that includes a command and an entity name. Additional actions may include generating, by an automated speech recognizer, an initial transcription for a portion of the audio data that is associated with the entity name, receiving a corrected transcription for the portion of the utterance that is associated with the entity name, obtaining a phonetic pronunciation that is associated with the portion of the audio data that is associated with the entity name, updating a pronunciation dictionary to associate the phonetic pronunciation with the entity name, receiving a subsequent utterance that includes the entity name, and transcribing the subsequent utterance based at least in part on the updated pronunciation dictionary.
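The dictionary update and the later lookup described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical: the class `PronunciationDictionary`, the function `transcribe_entity`, the phoneme symbols, and the example name are assumptions made for illustration, and the exact-match lookup stands in for whatever matching a real recognizer would perform.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PronunciationDictionary:
    """Hypothetical per-user store mapping phoneme sequences to entity names."""

    entries: Dict[Tuple[str, ...], str] = field(default_factory=dict)

    def update(self, phonemes: List[str], entity_name: str) -> None:
        # Associate the phonetic pronunciation obtained from the audio
        # with the entity name taken from the corrected transcription.
        self.entries[tuple(phonemes)] = entity_name

    def lookup(self, phonemes: List[str]) -> Optional[str]:
        # Exact match only; a simplification assumed for this sketch.
        return self.entries.get(tuple(phonemes))


def transcribe_entity(
    phonemes: List[str],
    dictionary: PronunciationDictionary,
    recognizer_guess: str,
) -> str:
    """Prefer a learned pronunciation; otherwise keep the recognizer's guess."""
    learned = dictionary.lookup(phonemes)
    return learned if learned is not None else recognizer_guess


dictionary = PronunciationDictionary()
heard = ["SH", "AW"]  # illustrative phoneme symbols, not from the source

before = transcribe_entity(heard, dictionary, "shower")
dictionary.update(heard, "Xiao")  # the user corrected "shower" to "Xiao"
after = transcribe_entity(heard, dictionary, "shower")
```

In this sketch the first utterance falls back to the recognizer's guess, while the subsequent utterance with the same phonemes resolves to the corrected entity name.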
20 Claims
1. A method comprising:
receiving audio data corresponding to an utterance that includes a voice command trigger term and an entity name that is a proper noun;
generating, by an automated speech recognizer, an initial transcription that (i) corresponds to a portion of the audio data that is associated with the entity name that is a proper noun, and (ii) includes a transcription of a mispronounced term that is associated with a pronunciation of a term that is not a proper noun;
in response to the generation of the initial transcription that includes a transcription of a mispronounced term that is associated with a pronunciation of a term that is not a proper noun, prompting a user for feedback, wherein prompting the user for feedback comprises:
providing, for output, a representation of the initial transcription that (i) corresponds to the portion of the audio data that is associated with the entity name that is a proper noun, and (ii) includes the transcription of the mispronounced term that is associated with a pronunciation of a term that is not a proper noun;
receiving a corrected transcription in which a manually selected term that is a proper noun is substituted for the transcription of the mispronounced term that is associated with a pronunciation of a term that is not a proper noun;
in response to receiving the corrected transcription in which a manually selected term that is a proper noun is substituted for the transcription of the mispronounced term that is associated with a pronunciation of a term that is not a proper noun, obtaining a phonetic representation that is associated with the portion of the received audio data that is associated with the entity name that is a proper noun;
updating a pronunciation dictionary to associate (i) the obtained phonetic representation that is associated with the portion of the received audio data that is associated with the entity name that is a proper noun with (ii) the entity name from the utterance that is a proper noun;
receiving a subsequent utterance that includes the entity name; and
transcribing the subsequent utterance based at least in part on the updated pronunciation dictionary.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving audio data corresponding to an utterance that includes a voice command trigger term and an entity name that is a proper noun;
generating, by an automated speech recognizer, an initial transcription that (i) corresponds to a portion of the audio data that is associated with the entity name that is a proper noun, and (ii) includes a transcription of a mispronounced term that is associated with a pronunciation of a term that is not a proper noun;
in response to the generation of the initial transcription that includes a transcription of a mispronounced term that is associated with a pronunciation of a term that is not a proper noun, prompting a user for feedback, wherein prompting the user for feedback comprises:
providing, for output, a representation of the initial transcription that (i) corresponds to the portion of the audio data that is associated with the entity name that is a proper noun, and (ii) includes the transcription of the mispronounced term that is associated with a pronunciation of a term that is not a proper noun;
receiving a corrected transcription in which a manually selected term that is a proper noun is substituted for the transcription of the mispronounced term that is associated with a pronunciation of a term that is not a proper noun;
in response to receiving the corrected transcription in which a manually selected term that is a proper noun is substituted for the transcription of the mispronounced term that is associated with a pronunciation of a term that is not a proper noun, obtaining a phonetic representation that is associated with the portion of the received audio data that is associated with the entity name that is a proper noun;
updating a pronunciation dictionary to associate (i) the obtained phonetic representation that is associated with the portion of the received audio data that is associated with the entity name that is a proper noun with (ii) the entity name from the utterance that is a proper noun;
receiving a subsequent utterance that includes the entity name; and
transcribing the subsequent utterance based at least in part on the updated pronunciation dictionary.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
receiving audio data corresponding to an utterance that includes a voice command trigger term and an entity name that is a proper noun;
generating, by an automated speech recognizer, an initial transcription that (i) corresponds to a portion of the audio data that is associated with the entity name that is a proper noun, and (ii) includes a transcription of a mispronounced term that is associated with a pronunciation of a term that is not a proper noun;
in response to the generation of the initial transcription that includes a transcription of a mispronounced term that is associated with a pronunciation of a term that is not a proper noun, prompting a user for feedback, wherein prompting the user for feedback comprises:
providing, for output, a representation of the initial transcription that (i) corresponds to the portion of the audio data that is associated with the entity name that is a proper noun, and (ii) includes the transcription of the mispronounced term that is associated with a pronunciation of a term that is not a proper noun;
receiving a corrected transcription in which a manually selected term that is a proper noun is substituted for the transcription of the mispronounced term that is associated with a pronunciation of a term that is not a proper noun;
in response to receiving the corrected transcription in which a manually selected term that is a proper noun is substituted for the transcription of the mispronounced term that is associated with a pronunciation of a term that is not a proper noun, obtaining a phonetic representation that is associated with the portion of the received audio data that is associated with the entity name that is a proper noun;
updating a pronunciation dictionary to associate (i) the obtained phonetic representation that is associated with the portion of the received audio data that is associated with the entity name that is a proper noun with (ii) the entity name from the utterance that is a proper noun;
receiving a subsequent utterance that includes the entity name; and
transcribing the subsequent utterance based at least in part on the updated pronunciation dictionary.
Dependent claims: 16, 17, 18, 19, 20.
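The feedback step recited in the independent claims (output the initial transcription, receive a manually selected proper noun, then record the pronunciation) might be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the function `handle_entity_portion`, the `ask_user` callback, the use of a `known_names` set as a stand-in for "is a proper noun", and the phoneme symbols are all hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple


def handle_entity_portion(
    phonemes: List[str],
    initial_text: str,
    known_names: Set[str],
    ask_user: Callable[[str], str],
    learned: Dict[Tuple[str, ...], str],
) -> str:
    """Return the entity name to use, learning from a user correction."""
    # Assumption: membership in known_names approximates "is a proper noun".
    if initial_text in known_names:
        return initial_text  # already a proper noun; no prompt needed

    # Output the initial transcription and receive the user's selection.
    corrected = ask_user(initial_text)

    if corrected in known_names and corrected != initial_text:
        # Associate the phonetic representation of the audio portion
        # with the manually selected proper noun.
        learned[tuple(phonemes)] = corrected
        return corrected
    return initial_text


learned: Dict[Tuple[str, ...], str] = {}
heard = ["F", "AO", "R", "CH", "AH", "N"]  # illustrative symbols only
result = handle_entity_portion(
    heard, "fortune", {"Fuchun"}, lambda shown: "Fuchun", learned
)
```

Passing the prompt as a callback keeps the sketch independent of any particular user interface; a real system could substitute a spoken or on-screen prompt.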
Specification