Personalized Speech Synthesis for Voice Actions

US 20160307569A1
Filed: 04/14/2015
Published: 10/20/2016
Est. Priority Date: 04/14/2015
Status: Active Grant

First Claim

1. A computer-implemented method comprising:

receiving audio data encoding a voice query from a user;

obtaining, by an automated speech recognizer, a transcription of the voice query from the audio data, wherein the transcription includes a particular term;

determining, from the audio data, custom pronunciation data that reflects a user's pronunciation for the particular term;

generating a spoken acknowledgment of the voice query, wherein the spoken acknowledgment includes the particular term, and wherein, when output, the particular term is spoken in accordance with the user's pronunciation for the particular term based at least on the custom pronunciation data that was determined from the audio data;

providing the spoken acknowledgment for output; and

providing the voice query for execution.
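
The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The `Recognition` record, the `recognize`, `speak` and `execute` callbacks, and the "Calling {term}" acknowledgment template are all hypothetical. The user's pronunciation is handed to the synthesizer through an SSML `<phoneme>` element, which is one plausible mechanism; the claim itself does not specify how the custom pronunciation reaches the speech synthesizer.

```python
from dataclasses import dataclass
from typing import Callable
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Recognition:
    """Hypothetical recognizer output for one voice query."""
    transcription: str  # e.g. "call Siobhan"
    term: str           # the particular term, e.g. a contact name
    user_ipa: str       # the user's own pronunciation of the term, in IPA


def acknowledgment_ssml(term: str, user_ipa: str, template: str = "Calling {term}") -> str:
    """Builds an SSML acknowledgment in which only `term` carries the user's pronunciation."""
    before, found, after = template.partition("{term}")
    if not found:
        raise ValueError("template must contain {term}")
    spoken_term = f'<phoneme alphabet="ipa" ph={quoteattr(user_ipa)}>{escape(term)}</phoneme>'
    return f"<speak>{escape(before)}{spoken_term}{escape(after)}</speak>"


def handle_voice_query(
    audio: bytes,
    recognize: Callable[[bytes], Recognition],
    speak: Callable[[str], None],
    execute: Callable[[str], None],
) -> None:
    """Claim 1 flow: recognize, acknowledge in the user's pronunciation, then execute."""
    rec = recognize(audio)                              # transcription + custom pronunciation data
    speak(acknowledgment_ssml(rec.term, rec.user_ipa))  # provide the spoken acknowledgment
    execute(rec.transcription)                          # provide the voice query for execution
```

In a test, `recognize` can be a stub that returns a fixed `Recognition`, and `speak`/`execute` can simply append to lists, so the flow can be checked without a real recognizer or synthesizer.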

2 Assignments

0 Petitions

Accused Products

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for presenting notifications in an enterprise system. In one aspect, a method includes actions of obtaining a template that defines (i) trigger criteria for presenting a notification type and (ii) content rules for determining content to include in a notification of the notification type. Additional actions include accessing enterprise resources of an enterprise, the enterprise resources including data describing entities related to the enterprise and relationships among the entities. Further actions include accessing user information specific to a user and determining that the trigger criteria are satisfied by the enterprise resources and the user information. Additional actions include generating a particular notification of the notification type based at least on the content rules and providing the particular notification to the user.

29 Citations

22 Claims

1. A computer-implemented method comprising:
- receiving audio data encoding a voice query from a user;
  
  obtaining, by an automated speech recognizer, a transcription of the voice query from the audio data, wherein the transcription includes a particular term;
  
  determining, from the audio data, custom pronunciation data that reflects a user's pronunciation for the particular term;
  
  generating a spoken acknowledgment of the voice query, wherein the spoken acknowledgment includes the particular term, and wherein, when output, the particular term is spoken in accordance with the user's pronunciation for the particular term based at least on the custom pronunciation data that was determined from the audio data;
  
  providing the spoken acknowledgment for output; and
  
  providing the voice query for execution.
- Dependent claims: 2, 3, 4, 5, 6, 7, 12, 21, 22
- - 2. The method of claim 1, wherein determining, from the audio data, custom pronunciation data that reflects a user's pronunciation for the particular term comprises:
    - identifying a portion of the audio corresponding to the particular term; and
      
      determining a sequence of phones from the portion of the audio corresponding to the particular term.
  - 3. The method of claim 1, wherein determining, from the audio data, custom pronunciation data that reflects a user's pronunciation for the particular term comprises:
    - obtaining custom pronunciation data from the audio data before obtaining the transcription of the voice query from the audio data; and
      
      mapping the custom pronunciation data from the audio data corresponding to the particular term to the particular term in the transcription.
  - 4. The method of claim 1, comprising:
    - determining that the transcription includes a proper name, wherein providing the spoken acknowledgment of the voice query for output is in response to determining that the transcription includes a proper name.
  - 5. The method of claim 4, wherein determining that the transcription includes a proper name comprises:
    - determining that the transcription includes one or more terms that indicate that the transcription includes a proper name.
  - 6. The method of claim 1, comprising:
    - determining, from the audio data, a confidence score for the custom pronunciation data that reflects a user's pronunciation for the particular term; and
      
      determining that the confidence score for the custom pronunciation data satisfies a confidence threshold, wherein providing the spoken acknowledgment of the voice query for output is in response to determining that the confidence score for the custom pronunciation data satisfies the confidence threshold.
  - 7. The method of claim 1, wherein obtaining a transcription of the voice query from the audio data comprises:
    - obtaining the transcription of the voice query from the audio data based at least on canonical pronunciation data associated with the particular term, where the canonical pronunciation data is stored in a pronunciation dictionary and different from the custom pronunciation data determined from the audio data encoding the voice query from the user.
  - 12. The system of claim 1, wherein determining that the transcription includes a proper name comprises:
    - determining that the transcription includes one or more terms that indicate that the transcription includes a proper name.
  - 21. The method of claim 1, wherein determining, from the audio data, custom pronunciation data that reflects a user's pronunciation for the particular term comprises:
    - obtaining pronunciations for phones within the voice query;
      
      after obtaining the transcription of the voice query by the automated speech recognizer, aligning at least a portion of the obtained pronunciations for the phones within the voice query with the term in the transcription; and
      
      generating the custom pronunciation data that reflects the user's pronunciation for the particular term from the portion of the obtained pronunciations for the phones within the voice query aligned with the term in the transcription.
  - 22. The method of claim 1, wherein generating a spoken acknowledgment of the voice query comprises:
    - obtaining text that includes the particular term for the spoken acknowledgement; and
      
      synthesizing the spoken acknowledgement from the text based at least on (i) the custom pronunciation data for the particular term and (ii) canonical pronunciation data for one or more other terms in the text for the spoken acknowledgement.
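
Claims 7 and 22 distinguish canonical pronunciation data, stored in a pronunciation dictionary, from the custom pronunciation derived from the user's audio. The following is a minimal sketch of how a synthesizer front end might combine the two when turning acknowledgment text into phones. It assumes word-level lexicons keyed by lowercase spelling and ARPAbet-style phone labels; both are hypothetical choices for illustration. The custom entry wins for the particular term, and every other word falls back to the canonical dictionary.

```python
from typing import Dict, List

Lexicon = Dict[str, List[str]]  # lowercase word -> phone sequence


def pronounce_text(text: str, canonical: Lexicon, custom: Lexicon) -> List[List[str]]:
    """Returns one phone sequence per word of `text`.

    Words with a custom (user-derived) entry use it; all other words use the
    canonical pronunciation dictionary.
    """
    result: List[List[str]] = []
    for word in text.split():
        key = word.lower().strip(".,!?")
        if key in custom:
            result.append(custom[key])
        elif key in canonical:
            result.append(canonical[key])
        else:
            # A real TTS front end would fall back to letter-to-sound rules here.
            raise KeyError(f"no pronunciation for {word!r}")
    return result
```

With a canonical entry for "siobhan" such as `S AY AA B AH N` and a custom entry `SH IH V AO N`, the text "Calling Siobhan." yields the canonical phones for "Calling" followed by the user's phones for the name. Raising on unknown words only keeps the sketch short.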

8. A system comprising:
- one or more computers; and
  
  one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving audio data encoding a voice query from a user;
  
  obtaining, by an automated speech recognizer, a transcription of the voice query from the audio data, wherein the transcription includes a particular term;
  
  determining, from the audio data, custom pronunciation data that reflects a user's pronunciation for the particular term;
  
  generating a spoken acknowledgment of the voice query, wherein the spoken acknowledgment includes the particular term, and wherein, when output, the particular term is spoken in accordance with the user's pronunciation for the particular term based at least on the custom pronunciation data that was determined from the audio data;
  
  providing the spoken acknowledgment for output; and
  
  providing the voice query for execution.
- Dependent claims: 9, 10, 11, 13, 14
- - 9. The system of claim 8, wherein determining, from the audio data, custom pronunciation data that reflects a user's pronunciation for the particular term comprises:
    - identifying a portion of the audio corresponding to the particular term; and
      
      determining a sequence of phones from the portion of the audio corresponding to the particular term.
  - 10. The system of claim 8, wherein determining, from the audio data, custom pronunciation data that reflects a user's pronunciation for the particular term comprises:
    - obtaining custom pronunciation data from the audio data before obtaining the transcription of the voice query from the audio data; and
      
      mapping the custom pronunciation data from the audio data corresponding to the particular term to the particular term in the transcription.
  - 11. The system of claim 8, the instructions further comprising:
    - determining that the transcription includes a proper name, wherein providing the spoken acknowledgment of the voice query for output is in response to determining that the transcription includes a proper name.
  - 13. The system of claim 8, the instructions further comprising:
    - determining, from the audio data, a confidence score for the custom pronunciation data that reflects a user's pronunciation for the particular term; and
      
      determining that the confidence score for the custom pronunciation data satisfies a confidence threshold, wherein providing the spoken acknowledgment of the voice query for output is in response to determining that the confidence score for the custom pronunciation data satisfies the confidence threshold.
  - 14. The system of claim 8, wherein obtaining a transcription of the voice query from the audio data comprises:
    - obtaining the transcription of the voice query from the audio data based at least on canonical pronunciation data associated with the particular term, where the canonical pronunciation data is stored in a pronunciation dictionary and different from the custom pronunciation data determined from the audio data encoding the voice query from the user.
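
Claims 4 to 6 and their system counterparts 11 and 13 condition the spoken acknowledgment on the transcription containing a proper name (detected through indicator terms) and on the custom pronunciation's confidence score satisfying a threshold. The sketch below combines both gates in one check. The cue-word list, the 0.8 threshold, and reading "satisfies" as greater-than-or-equal are all assumptions; the claims also present the two checks as separate dependent claims rather than requiring both together.

```python
from typing import FrozenSet

# Hypothetical cue terms indicating that a proper name follows in the query.
NAME_CUES: FrozenSet[str] = frozenset({"call", "text", "message", "email"})


def mentions_proper_name(transcription: str, cues: FrozenSet[str] = NAME_CUES) -> bool:
    """True if a cue term appears and is followed by at least one more word."""
    words = transcription.lower().split()
    return any(word in cues for word in words[:-1])


def should_speak_acknowledgment(
    transcription: str,
    confidence: float,
    threshold: float = 0.8,
    cues: FrozenSet[str] = NAME_CUES,
) -> bool:
    """Gates the personalized acknowledgment on a proper-name cue and pronunciation confidence."""
    # "Satisfies a confidence threshold" is read here as confidence >= threshold (an assumption).
    return mentions_proper_name(transcription, cues) and confidence >= threshold
```

Requiring both conditions is a conservative design choice: the system only repeats a name back in the user's pronunciation when it is fairly sure a name was spoken and that the extracted phones are reliable.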

15. A non-transitory computer-readable medium storing instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving audio data encoding a voice query from a user;
  
  obtaining, by an automated speech recognizer, a transcription of the voice query from the audio data, wherein the transcription includes a particular term;
  
  determining, from the audio data, custom pronunciation data that reflects a user's pronunciation for the particular term;
  
  generating a spoken acknowledgment of the voice query, wherein the spoken acknowledgment includes the particular term, and wherein, when output, the particular term is spoken in accordance with the user's pronunciation for the particular term based at least on the custom pronunciation data that was determined from the audio data;
  
  providing the spoken acknowledgment for output; and
  
  providing the voice query for execution.
- Dependent claims: 16, 17, 20
- - 16. The medium of claim 15, wherein determining, from the audio data, custom pronunciation data that reflects a user's pronunciation for the particular term comprises:
    - identifying a portion of the audio corresponding to the particular term; and
      
      determining a sequence of phones from the portion of the audio corresponding to the particular term.
  - 17. The medium of claim 15, wherein determining, from the audio data, custom pronunciation data that reflects a user's pronunciation for the particular term comprises:
    - obtaining custom pronunciation data from the audio data before obtaining the transcription of the voice query from the audio data; and
      
      mapping the custom pronunciation data from the audio data corresponding to the particular term to the particular term in the transcription.
  - 20. The medium of claim 15, the instructions further comprising:
    - determining, from the audio data, a confidence score for the custom pronunciation data associated with the particular term; and
      
      determining that the confidence score for the custom pronunciation data satisfies a confidence threshold, wherein providing the spoken acknowledgment of the voice query for output is in response to determining that the confidence score for the custom pronunciation data satisfies the confidence threshold.
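
Claims 16 and 17 (like claims 2, 3, 9 and 10) describe identifying the portion of the audio that corresponds to the particular term and mapping a phone sequence decoded from the audio onto that term in the transcription. One simple way to do this, assumed here rather than taken from the patent, is by time stamps: a decoded phone belongs to the term if its midpoint falls inside the term's time span in the recognizer output. The `Segment` type and the timing values used with it are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical time-stamped unit: a word from the ASR output or a decoded phone."""
    label: str
    start: float  # seconds from the start of the audio
    end: float


def phones_for_term(term: str, words: List[Segment], phones: List[Segment]) -> List[str]:
    """Maps phones decoded from the audio onto `term` by time overlap (midpoint rule)."""
    for word in words:
        if word.label.lower() == term.lower():
            # The term's portion of the audio is [word.start, word.end);
            # keep every phone whose midpoint lies inside it.
            return [p.label for p in phones if word.start <= (p.start + p.end) / 2 < word.end]
    raise ValueError(f"{term!r} does not appear in the transcription")
```

Because the phone segments can be decoded independently of the word transcription, this mapping works whether the phones are obtained before the transcription (claim 17) or aligned after it (claim 21). A production system would more likely use forced alignment, but the midpoint rule keeps the idea visible.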

18. (canceled)

19. (canceled)

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Huang, Fei, Peng, Fuchun, Foerster, Jakob Nicolaus, Casado, Diego Melendo, Beaufays, Francoise

Granted Patent

US 10,102,852 B2
US Class Current

1/1
CPC Class Codes

G10L 13/033   Voice editing, e.g. manipul...

G10L 15/07   to the speaker

G10L 15/187   Phonemic context, e.g. pron...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 2015/221   Announcement of recognition...

G10L 2015/225   Feedback of the input speech
