LEARNING PERSONALIZED ENTITY PRONUNCIATIONS

US 20170221475A1
Filed: 02/03/2016
Published: 08/03/2017
Est. Priority Date: 02/03/2016
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for implementing a pronunciation dictionary that stores entity name pronunciations. In one aspect, a method includes actions of receiving audio data corresponding to an utterance that includes a command and an entity name. Additional actions may include generating, by an automated speech recognizer, an initial transcription for a portion of the audio data that is associated with the entity name, receiving a corrected transcription for the portion of the utterance that is associated with the entity name, obtaining a phonetic pronunciation that is associated with the portion of the audio data that is associated with the entity name, updating a pronunciation dictionary to associate the phonetic pronunciation with the entity name, receiving a subsequent utterance that includes the entity name, and transcribing the subsequent utterance based at least in part on the updated pronunciation dictionary.
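
The flow summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class `PronunciationDictionary`, the function `transcribe_entity`, and the phoneme symbols are all hypothetical names chosen for illustration, and the recognizer is reduced to a fallback string.

```python
class PronunciationDictionary:
    """Hypothetical store mapping an entity name to learned phoneme sequences."""

    def __init__(self):
        self.entries = {}  # entity name -> list of phoneme tuples

    def learn(self, entity_name, phonemes):
        # Associate the phonetic pronunciation taken from the audio
        # with the entity name supplied by the corrected transcription.
        pronunciations = self.entries.setdefault(entity_name, [])
        key = tuple(phonemes)
        if key not in pronunciations:
            pronunciations.append(key)

    def lookup(self, phonemes):
        # Return the entity whose learned pronunciation matches, if any.
        key = tuple(phonemes)
        for entity_name, pronunciations in self.entries.items():
            if key in pronunciations:
                return entity_name
        return None


def transcribe_entity(phonemes, dictionary, fallback):
    """Prefer a learned pronunciation; otherwise keep the recognizer's guess."""
    return dictionary.lookup(phonemes) or fallback
```

Under these assumptions, a first utterance is transcribed with the recognizer's guess, the user's correction is passed to `learn`, and a later utterance with the same phonemes resolves to the corrected entity name.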

58 Citations

20 Claims

1. A method comprising:
- receiving audio data corresponding to an utterance that includes a voice command trigger term and an entity name that is a proper noun;
  
  generating, by an automated speech recognizer, an initial transcription that (i) corresponds to a portion of the audio data that is associated with the entity name that is a proper noun, and (ii) includes a transcription of a mispronounced term that is associated with a pronunciation of a term that is not a proper noun;
  
  in response to the generation of the initial transcription that includes a transcription of a mispronounced term that is associated with a pronunciation of a term that is not a proper noun, prompting a user for feedback, wherein prompting the user for feedback comprises:
  
  providing, for output, a representation of the initial transcription that (i) corresponds to the portion of the audio data that is associated with the entity name that is a proper noun, and (ii) includes the transcription of the mispronounced term that is associated with a pronunciation of a term that is not a proper noun;
  
  receiving a corrected transcription in which a manually selected term that is a proper noun is substituted for the transcription of the mispronounced term that is associated with a pronunciation of a term that is not a proper noun;
  
  in response to receiving the corrected transcription in which a manually selected term that is a proper noun is substituted for the transcription of the mispronounced term that is associated with a pronunciation of a term that is not a proper noun, obtaining a phonetic representation that is associated with the portion of the received audio data that is associated with the entity name that is a proper noun;
  
  updating a pronunciation dictionary to associate (i) the obtained phonetic representation that is associated with the portion of the received audio data that is associated with the entity name that is a proper noun with (ii) the entity name from the utterance that is a proper noun;
  
  receiving a subsequent utterance that includes the entity name; and
  
  transcribing the subsequent utterance based at least in part on the updated pronunciation dictionary.
- - 2. The method of claim 1, wherein receiving the corrected transcription comprises:
    - receiving data indicative of a selection of an entity name from a display of one or multiple entity names in response to a prompt;
      
      or receiving data indicative of one or multiple characters that were input via a keypad and are indicative of an entity name in response to the prompt.
  - 3. The method of claim 1, wherein updating a pronunciation dictionary further comprises:
    - identifying a pronunciation dictionary entry that is associated with the entity name;
      
      deleting the portion of the entry that corresponds to a phonetic representation of the initial transcription; and
      
      storing, in the pronunciation dictionary entry that is associated with the entity name, the phonetic representation that is associated with the obtained phonetic representation.
  - 4. The method of claim 1, further comprising:
    - associating a time stamp with at least a portion of the received audio data; and
      
      caching one or more portions of the received audio data until a correct transcription of the utterance is identified and the command associated with the received utterance is completed.
  - 5. The method of claim 4, wherein obtaining a phonetic representation that is associated with the manually selected term comprises:
    - obtaining a portion of the most recently received audio data based on the timestamp associated with at least a portion of the received audio data; and
      
      generating a phonetic representation of the obtained portion of the most recently received audio data based on a set of phonemes obtained using an acoustic model.
  - 6. The method of claim 1, further comprising:
    - in response to updating a pronunciation dictionary to include the obtained phonetic representation, increasing a global counter associated with the phonetic representation.
  - 7. The method of claim 6, further comprising:
    - determining that the global counter associated with the phonetic representation satisfies a predetermined threshold; and
      
      in response to determining that the global counter associated with the phonetic pronunciation has exceeded a predetermined threshold, updating a pronunciation dictionary entry in a global pronunciation dictionary that is associated with entity name to include the phonetic representation associated with the correct transcription.
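
The dictionary update of claim 3 and the counter-based promotion of claims 6 and 7 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: `PronunciationStore`, `apply_correction`, the per-user keying, and the threshold value are hypothetical, and "satisfies a predetermined threshold" is read here as "greater than or equal to".

```python
from collections import Counter

PROMOTION_THRESHOLD = 3  # hypothetical; the claims only say "predetermined"


class PronunciationStore:
    """Hypothetical personal dictionaries plus a shared global dictionary."""

    def __init__(self, threshold=PROMOTION_THRESHOLD):
        self.personal = {}               # (user_id, entity) -> phoneme tuple
        self.global_dictionary = {}      # entity -> set of phoneme tuples
        self.global_counter = Counter()  # (entity, phonemes) -> count
        self.threshold = threshold

    def apply_correction(self, user_id, entity_name, phonemes):
        phonemes = tuple(phonemes)
        # Claim 3: overwriting the entry drops the old pronunciation
        # and stores the one obtained from the audio.
        self.personal[(user_id, entity_name)] = phonemes
        # Claim 6: each personal update increases the global counter.
        key = (entity_name, phonemes)
        self.global_counter[key] += 1
        # Claim 7: once the counter reaches the threshold, the
        # pronunciation is added to the global dictionary entry.
        if self.global_counter[key] >= self.threshold:
            self.global_dictionary.setdefault(entity_name, set()).add(phonemes)
```

With a threshold of 2, the first user's correction changes only that user's entry; a second user's matching correction also promotes the pronunciation to the global dictionary.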

8. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving audio data corresponding to an utterance that includes a voice command trigger term and an entity name that is a proper noun;
  
  generating, by an automated speech recognizer, an initial transcription that (i) corresponds to a portion of the audio data that is associated with the entity name that is a proper noun, and (ii) includes a transcription of a mispronounced term that is associated with a pronunciation of a term that is not a proper noun;
  
  in response to the generation of the initial transcription that includes a transcription of a mispronounced term that is associated with a pronunciation of a term that is not a proper noun, prompting a user for feedback, wherein prompting the user for feedback comprises:
  
  providing, for output, a representation of the initial transcription that (i) corresponds to the portion of the audio data that is associated with the entity name that is a proper noun, and (ii) includes the transcription of the mispronounced term that is associated with a pronunciation of a term that is not a proper noun;
  
  receiving a corrected transcription in which a manually selected term that is a proper noun is substituted for the transcription of the mispronounced term that is associated with a pronunciation of a term that is not a proper noun;
  
  in response to receiving the corrected transcription in which a manually selected term that is a proper noun is substituted for the transcription of the mispronounced term that is associated with a pronunciation of a term that is not a proper noun, obtaining a phonetic representation that is associated with the portion of the received audio data that is associated with the entity name that is a proper noun;
  
  updating a pronunciation dictionary to associate (i) the obtained phonetic representation that is associated with the portion of the received audio data that is associated with the entity name that is a proper noun with (ii) the entity name from the utterance that is a proper noun;
  
  receiving a subsequent utterance that includes the entity name; and
  
  transcribing the subsequent utterance based at least in part on the updated pronunciation dictionary.
- - 9. The system of claim 8, wherein receiving the corrected transcription comprises:
    - receiving data indicative of a selection of an entity name from a display of one or multiple entity names in response to a prompt;
      
      or receiving data indicative of one or multiple characters that were input via a keypad and are indicative of an entity name in response to the prompt.
  - 10. The system of claim 8, wherein updating a pronunciation dictionary further comprises:
    - identifying a pronunciation dictionary entry that is associated with the entity name;
      
      deleting the portion of the entry that corresponds to a phonetic representation of the initial transcription; and
      
      storing, in the pronunciation dictionary entry that is associated with the entity name, the phonetic representation that is associated with the obtained phonetic representation.
  - 11. The system of claim 8, wherein the operations further comprise:
    - associating a time stamp with at least a portion of the received audio data; and
      
      caching one or more portions of the received audio data until a correct transcription of the utterance is identified and the command associated with the received utterance is completed.
  - 12. The system of claim 8, wherein obtaining a phonetic representation that is associated with the manually selected term comprises:
    - obtaining a portion of the most recently received audio data based on the timestamp associated with at least a portion of the received audio data; and
      
      generating a phonetic representation of the obtained portion of the most recently received audio data based on a set of phonemes obtained using an acoustic model.
  - 13. The system of claim 8, wherein the operations further comprise:
    - in response to updating a pronunciation dictionary to include the obtained phonetic representation, increasing a global counter associated with the phonetic representation.
  - 14. The system of claim 13, wherein the operations further comprise:
    - determining that the global counter associated with the phonetic representation satisfies a predetermined threshold; and
      
      in response to determining that the global counter associated with the phonetic pronunciation has exceeded a predetermined threshold, updating a pronunciation dictionary entry in a global pronunciation dictionary that is associated with entity name to include the phonetic representation associated with the correct transcription.
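
The timestamped caching of claim 11 and the phoneme extraction of claim 12 might look like the sketch below. All names (`AudioCache`, `phonetic_representation`) are hypothetical, and the acoustic model is assumed to be any callable that maps audio bytes to a sequence of phonemes; no real speech library is used.

```python
import time
from collections import deque


class AudioCache:
    """Hypothetical cache of audio portions, each tagged with a timestamp."""

    def __init__(self):
        self._chunks = deque()  # (timestamp, audio bytes)

    def add(self, audio, timestamp=None):
        # Associate a timestamp with the received portion of audio.
        stamp = time.monotonic() if timestamp is None else timestamp
        self._chunks.append((stamp, audio))
        return stamp

    def most_recent(self):
        # Pick the portion with the latest timestamp, if any is cached.
        if not self._chunks:
            return None
        return max(self._chunks, key=lambda chunk: chunk[0])[1]

    def clear(self):
        # Called once the correct transcription is identified and
        # the associated command has completed.
        self._chunks.clear()


def phonetic_representation(audio, acoustic_model):
    """Run the (assumed) acoustic model and return its phonemes as a tuple."""
    return tuple(acoustic_model(audio))
```

Keeping the audio until the command completes means the phonetic representation can be computed from the user's actual utterance after the correction arrives, rather than from the corrected text.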

15. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving audio data corresponding to an utterance that includes a voice command trigger term and an entity name that is a proper noun;
  
  generating, by an automated speech recognizer, an initial transcription that (i) corresponds to a portion of the audio data that is associated with the entity name that is a proper noun, and (ii) includes a transcription of a mispronounced term that is associated with a pronunciation of a term that is not a proper noun;
  
  in response to the generation of the initial transcription that includes a transcription of a mispronounced term that is associated with a pronunciation of a term that is not a proper noun, prompting a user for feedback, wherein prompting the user for feedback comprises:
  
  providing, for output, a representation of the initial transcription that (i) corresponds to the portion of the audio data that is associated with the entity name that is a proper noun, and (ii) includes the transcription of the mispronounced term that is associated with a pronunciation of a term that is not a proper noun;
  
  receiving a corrected transcription in which a manually selected term that is a proper noun is substituted for the transcription of the mispronounced term that is associated with a pronunciation of a term that is not a proper noun;
  
  in response to receiving the corrected transcription in which a manually selected term that is a proper noun is substituted for the transcription of the mispronounced term that is associated with a pronunciation of a term that is not a proper noun, obtaining a phonetic representation that is associated with the portion of the received audio data that is associated with the entity name that is a proper noun;
  
  updating a pronunciation dictionary to associate (i) the obtained phonetic representation that is associated with the portion of the received audio data that is associated with the entity name that is a proper noun with (ii) the entity name from the utterance that is a proper noun;
  
  receiving a subsequent utterance that includes the entity name; and
  
  transcribing the subsequent utterance based at least in part on the updated pronunciation dictionary.
- - 16. The computer-readable medium of claim 15, wherein updating a pronunciation dictionary further comprises:
    - identifying a pronunciation dictionary entry that is associated with the entity name;
      
      deleting the portion of the entry that corresponds to a phonetic representation of the initial transcription; and
      
      storing, in the pronunciation dictionary entry that is associated with the entity name, the phonetic representation that is associated with the obtained phonetic representation.
  - 17. The computer-readable medium of claim 15, wherein the operations further comprise:
    - associating a time stamp with at least a portion of the received audio data; and
      
      caching one or more portions of the received audio data until a correct transcription of the utterance is identified and the command associated with the received utterance is completed.
  - 18. The computer-readable medium of claim 15, wherein obtaining a phonetic representation that is associated with the manually selected term comprises:
    - obtaining a portion of the most recently received audio data based on the timestamp associated with at least a portion of the received audio data; and
      
      generating a phonetic representation of the obtained portion of the most recently received audio data based on a set of phonemes obtained using an acoustic model.
  - 19. The computer-readable medium of claim 15, wherein the operations further comprise:
    - in response to updating a pronunciation dictionary to include the obtained phonetic representation, increasing a global counter associated with the phonetic representation.
  - 20. The computer-readable medium of claim 15, wherein the operations further comprise:
    - determining that the global counter associated with the phonetic representation satisfies a predetermined threshold; and
      
      in response to determining that the global counter associated with the phonetic pronunciation has exceeded a predetermined threshold, updating a pronunciation dictionary entry in a global pronunciation dictionary that is associated with entity name to include the phonetic representation associated with the correct transcription.
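
The feedback step shared by the independent claims, with the two input routes described in claims 2 and 9 (selection from a displayed list, or characters typed on a keypad), could be sketched as below. The function name `corrected_transcription` and its parameters are hypothetical, and the example strings are invented for illustration.

```python
def corrected_transcription(initial, misrecognized, candidates,
                            selected_index=None, typed_text=None):
    """Substitute a manually chosen entity name for the misrecognized term.

    Returns the corrected transcription and the chosen entity name.
    """
    if selected_index is not None:
        # Route 1: the user picked an entry from the displayed entity names.
        entity_name = candidates[selected_index]
    elif typed_text:
        # Route 2: the user typed the entity name on a keypad.
        entity_name = typed_text.strip()
    else:
        raise ValueError("no correction supplied")
    # Replace only the first occurrence of the misrecognized term.
    return initial.replace(misrecognized, entity_name, 1), entity_name
```

For example, an initial transcription of "call fortune" with the first candidate selected from `["Fuchun", "Fiona"]` yields "call Fuchun" together with the entity name "Fuchun".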

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Peng, Fuchun; Beaufays, Francoise; Bruguier, Antoine Jean

Granted Patent

US 10,152,965 B2
CPC Class Codes

G10L 15/063   Training

G10L 15/065   Adaptation

G10L 15/26   Speech to text systems G10L...

G10L 2015/0635   updating or merging of old ...

G10L 2015/0636   Threshold criteria for the ...
