Adaptive text-to-speech outputs

US 9,886,942 B2
Filed: 04/03/2017
Issued: 02/06/2018
Est. Priority Date: 01/28/2016
Status: Active Grant

Abstract

In some implementations, a language proficiency of a user of a client device is determined by one or more computers. The one or more computers then determine a text segment for output by a text-to-speech module based on the determined language proficiency of the user. After determining the text segment for output, the one or more computers generate audio data including a synthesized utterance of the text segment. The audio data including the synthesized utterance of the text segment is then provided to the client device for output.

24 Claims

1. A method performed by one or more processing devices, the method comprising:
- determining, by the one or more processing devices, a complexity level of a voice input to a device;
  
  determining, by the one or more processing devices, a message for output in response to the voice input, the message being determined based on the determined complexity level of the voice input;
  
  generating, by the one or more processing devices, audio data comprising a synthesized utterance of the message; and
  
  providing, by the one or more processing devices, the audio data comprising the synthesized utterance for output in a response to the voice input.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method of claim 1, wherein the complexity level of the voice input comprises a language complexity of the voice input.
  - 3. The method of claim 1, wherein:
    - the method further comprises determining a language proficiency of a user that submitted the voice input to the device; and
      
      the complexity level of the voice input to the device is determined based on the language proficiency of the user that submitted the voice input to the device.
  - 4. The method of claim 1, wherein determining the message for output comprises:
    - obtaining a baseline message for output in response to the voice input; and
      
      generating an adjusted message by increasing a complexity level of the baseline message based on the determined complexity level for the voice input to the device.
  - 5. The method of claim 1, wherein determining the message for output comprises:
    - obtaining a baseline message for output in response to the voice input; and
      
      generating an adjusted message by decreasing a complexity level of the baseline message based on the determined complexity level for the voice input to the device.
  - 6. The method of claim 1, wherein the device displays a mobile application that uses a text-to-speech interface.
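
The steps of claim 1, together with the baseline adjustment of claims 4 and 5, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the complexity measure (mean word length of the transcript), the `low` and `high` thresholds, the word substitution tables, and the `synthesize` stub are all hypothetical and do not come from the claims.

```python
# Illustrative sketch only. The scoring heuristic, thresholds, and word
# tables below are hypothetical and are not taken from the patent.

SIMPLER = {"precipitation": "rain", "anticipated": "expected"}
RICHER = {simple: rich for rich, simple in SIMPLER.items()}


def complexity_level(transcript: str) -> float:
    """Crude proxy for input complexity: mean word length of the transcript."""
    words = transcript.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


def adjust_message(baseline: str, level: float,
                   low: float = 4.0, high: float = 6.0) -> str:
    """Decrease (claim 5) or increase (claim 4) the baseline's complexity."""
    if level < low:
        table = SIMPLER
    elif level > high:
        table = RICHER
    else:
        return baseline  # input complexity is in the middle band: keep as-is
    return " ".join(table.get(word, word) for word in baseline.split())


def synthesize(message: str) -> bytes:
    """Stand-in for a text-to-speech engine; returns placeholder bytes."""
    return message.encode("utf-8")


def respond(transcript: str, baseline: str) -> bytes:
    level = complexity_level(transcript)        # determine the complexity level
    message = adjust_message(baseline, level)   # determine the message
    return synthesize(message)                  # audio data to provide as output
```

With these assumed tables, a short input such as "will it rain" pulls the baseline toward simpler wording, while an input with long words pushes it toward richer wording.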

7. A system comprising:
- one or more processing devices; and
  
  one or more storage devices storing instructions that, when executed by the one or more processing devices, cause the one or more processing devices to perform operations comprising:
  
  determining, by the one or more processing devices, a complexity level of a voice input to a device;
  
  determining, by the one or more processing devices, a message for output in response to the voice input, the message being determined based on the determined complexity level of the voice input;
  
  generating, by the one or more processing devices, audio data comprising a synthesized utterance of the message; and
  
  providing, by the one or more processing devices, the audio data comprising the synthesized utterance for output in a response to the voice input.
- Dependent claims (8, 9, 10, 11, 12):
  - 8. The system of claim 7, wherein the complexity level of the voice input comprises a language complexity of the voice input.
  - 9. The system of claim 7, wherein:
    - the operations further comprise determining a language proficiency of a user that submitted the voice input to the device; and
      
      the complexity level of the voice input to the device is determined based on the language proficiency of the user that submitted the voice input to the device.
  - 10. The system of claim 7, wherein determining the message for output comprises:
    - obtaining a baseline message for output in response to the voice input; and
      
      generating an adjusted message by increasing a complexity level of the baseline message based on the determined complexity level for the voice input to the device.
  - 11. The system of claim 7, wherein determining the message for output comprises:
    - obtaining a baseline message for output in response to the voice input; and
      
      generating an adjusted message by decreasing a complexity level of the baseline message based on the determined complexity level for the voice input to the device.
  - 12. The system of claim 7, wherein the device displays a mobile application that uses a text-to-speech interface.

13. A method performed by one or more computers, the method comprising:
- obtaining, by the one or more computers, data indicating that (i) a particular voice input was provided by a first user, and (ii) the particular voice input was provided by a second user that is different from the first user;
  
  determining, by the one or more computers, (i) a first language proficiency score for the first user and (ii) a second language proficiency score for the second user, wherein the first language proficiency score indicates a higher level of language proficiency than the second language proficiency score;
  
  generating, by the one or more computers, (i) first audio data comprising a synthesized utterance of a first message based on the first language proficiency score, and (ii) second audio data comprising a synthesized utterance of a second message based on the second language proficiency score, wherein the first message has a higher language complexity than the second message; and
  
  providing, by the one or more computers, (i) the first audio data to a client device of the first user in response to the particular voice input and (ii) the second audio data to a client device of the second user in response to the particular voice input.
- Dependent claims (14, 15, 16, 17, 18):
  - 14. The method of claim 13, wherein the one or more computers display a mobile application that uses a text-to-speech interface.
  - 15. The method of claim 13, wherein determining the first language proficiency score for the first user and the second language proficiency score for the second user comprises inferring a respective language proficiency of the first user and the second user based at least on respective previous queries submitted by the first user and the second user.
  - 16. The method of claim 13, wherein generating the first audio data comprises:
    - identifying a text segment for a text-to-speech output to the first user;
      
      computing a complexity score of the text segment; and
      
      modifying the text segment for the text-to-speech output to the first user based at least on the first language proficiency score of the first user and the complexity score of the text segment for the text-to-speech output.
  - 17. The method of claim 16, wherein modifying the text segment for the text-to-speech output to the first user comprises:
    - determining an overall complexity score for the first user based at least on the first language proficiency score of the first user;
      
      determining a complexity score for individual portions within the text segment for the text-to-speech output to the first user;
      
      identifying one or more individual portions within the text segment with complexity scores greater than the overall complexity score for the first user; and
      
      modifying the one or more individual portions within the text segment to reduce complexity scores below the overall complexity score.
  - 18. The method of claim 16, wherein modifying the text segment for the text-to-speech output to the first user comprises:
    - receiving data indicating a context associated with the first user;
      
      determining an overall complexity score for the context associated with the first user;
      
      determining that the complexity score of the text segment exceeds the overall complexity score for the context associated with the first user; and
      
      modifying the text segment to reduce the complexity score below the overall complexity score for the context associated with the first user.
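
The portion-level modification described in claims 16 and 17 can be sketched as below. Everything concrete here is an assumption made for illustration: the per-portion complexity measure (length of the longest word), the mapping from a proficiency score in [0, 1] to an allowed complexity, and the `SIMPLIFICATIONS` table are hypothetical and are not specified by the claims.

```python
# Hypothetical sketch of the portion-level rewrite in claims 16 and 17.
# Scores, the proficiency-to-limit mapping, and the simplification table
# are assumptions made for illustration.

SIMPLIFICATIONS = {
    "precipitation is anticipated": "rain is expected",
    "temperatures will plummet": "it will get cold",
}


def portion_complexity(portion: str) -> float:
    """Assumed measure: length of the longest word in the portion."""
    return float(max((len(word) for word in portion.split()), default=0))


def allowed_complexity(proficiency_score: float) -> float:
    """Map a proficiency score in [0, 1] to an overall complexity score."""
    return 8.0 + 6.0 * proficiency_score


def modify_text_segment(portions: list[str], proficiency_score: float) -> list[str]:
    """Rewrite only the portions whose complexity exceeds the user's limit."""
    limit = allowed_complexity(proficiency_score)
    result = []
    for portion in portions:
        if portion_complexity(portion) > limit and portion in SIMPLIFICATIONS:
            portion = SIMPLIFICATIONS[portion]  # swap in a simpler phrasing
        result.append(portion)
    return result
```

Under these assumptions a low-proficiency user (score 0.1) gets the first portion of `["precipitation is anticipated", "bring a coat"]` rewritten, while a high-proficiency user (score 1.0) receives the segment unchanged. A context-based limit as in claim 18 could replace `allowed_complexity` without changing the rest of the loop.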

19. A system comprising:
- one or more computers; and
  
  one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
  
  obtaining, by the one or more computers, data indicating that (i) a particular voice input was provided by a first user, and (ii) the particular voice input was provided by a second user that is different from the first user;
  
  determining, by the one or more computers, (i) a first language proficiency score for the first user and (ii) a second language proficiency score for the second user, wherein the first language proficiency score indicates a higher level of language proficiency than the second language proficiency score;
  
  generating, by the one or more computers, (i) first audio data comprising a synthesized utterance of a first message based on the first language proficiency score, and (ii) second audio data comprising a synthesized utterance of a second message based on the second language proficiency score, wherein the first message has a higher language complexity than the second message; and
  
  providing, by the one or more computers, (i) the first audio data to a client device of the first user in response to the particular voice input and (ii) the second audio data to a client device of the second user in response to the particular voice input.
- Dependent claims (20, 21, 22):
  - 20. The system of claim 19, wherein the one or more computers are configured to display a mobile application that uses a text-to-speech interface.
  - 21. The system of claim 19, wherein determining the first language proficiency score for the first user and the second language proficiency score for the second user comprises inferring a respective language proficiency of the first user and the second user based at least on respective previous queries submitted by the first user and the second user.
  - 22. The system of claim 19, wherein generating the first audio data comprises:
    - identifying a text segment for a text-to-speech output to the first user;
      
      computing a complexity score of the text segment; and
      
      modifying the text segment for the text-to-speech output to the first user based at least on the first language proficiency score of the first user and the complexity score of the text segment for the text-to-speech output.
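
Claims 15 and 21 recite inferring language proficiency from a user's previous queries without fixing how that inference is made. One possible heuristic is sketched below purely as an assumption: it combines vocabulary diversity with mean word length into a score in [0, 1]. The function name `infer_proficiency`, the weighting, and the neutral default for users with no history are all hypothetical.

```python
# Hypothetical heuristic for inferring a proficiency score from previous
# queries. The diversity and word-length measures are assumptions and are
# not described in the claims.

def infer_proficiency(previous_queries: list[str]) -> float:
    """Return a score in [0, 1]; higher means more proficient (assumed scale)."""
    words = [w.lower() for query in previous_queries for w in query.split()]
    if not words:
        return 0.5  # no history available: fall back to a neutral default
    diversity = len(set(words)) / len(words)           # share of distinct words
    mean_length = sum(len(w) for w in words) / len(words)
    length_term = min(mean_length / 10.0, 1.0)          # cap the contribution at 1
    return round(0.5 * diversity + 0.5 * length_term, 3)
```

A user who repeats short queries such as "weather today" would score lower under this heuristic than one who asks about "barometric pressure trends this afternoon".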

23. One or more non-transitory computer-readable media storing instructions that, when executed by one or more processing devices, cause the one or more processing devices to perform operations comprising:
- determining, by the one or more processing devices, a complexity level of a voice input to a device;
  
  determining, by the one or more processing devices, a message for output in response to the voice input, the message being determined based on the determined complexity level of the voice input;
  
  generating, by the one or more processing devices, audio data comprising a synthesized utterance of the message; and
  
  providing, by the one or more processing devices, the audio data comprising the synthesized utterance for output in a response to the voice input.

24. One or more non-transitory computer-readable media storing instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising:
- obtaining, by the one or more computers, data indicating that (i) a particular voice input was provided by a first user, and (ii) the particular voice input was provided by a second user that is different from the first user;
  
  determining, by the one or more computers, (i) a first language proficiency score for the first user and (ii) a second language proficiency score for the second user, wherein the first language proficiency score indicates a higher level of language proficiency than the second language proficiency score;
  
  generating, by the one or more computers, (i) first audio data comprising a synthesized utterance of a first message based on the first language proficiency score, and (ii) second audio data comprising a synthesized utterance of a second message based on the second language proficiency score, wherein the first message has a higher language complexity than the second message; and
  
  providing, by the one or more computers, (i) the first audio data to a client device of the first user in response to the particular voice input and (ii) the second audio data to a client device of the second user in response to the particular voice input.
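
The two-user scenario of claims 13, 19, and 24 (the same voice input answered with messages of different language complexity) can be illustrated with the sketch below. The message variants, the 0.5 threshold, the `Response` type, and the `synthesize` stub are invented for illustration; the claims do not prescribe any of them.

```python
# Sketch: one voice input, two users, two messages. The variants, the
# threshold, and the TTS stub are hypothetical.

from dataclasses import dataclass


@dataclass
class Response:
    user_id: str
    message: str
    audio: bytes


MESSAGE_VARIANTS = {
    "weather": {
        "complex": "Expect intermittent showers with gusty winds this evening.",
        "simple": "It will rain tonight. It will be windy.",
    }
}


def synthesize(message: str) -> bytes:
    """Placeholder for a text-to-speech engine."""
    return message.encode("utf-8")


def respond_to_users(intent: str, proficiency: dict[str, float],
                     threshold: float = 0.5) -> list[Response]:
    """Build one response per user for the same recognized voice input."""
    variants = MESSAGE_VARIANTS[intent]
    responses = []
    for user_id, score in proficiency.items():
        # Higher proficiency score -> higher language complexity.
        message = variants["complex"] if score >= threshold else variants["simple"]
        responses.append(Response(user_id, message, synthesize(message)))
    return responses
```

Calling `respond_to_users("weather", {"alice": 0.9, "bob": 0.2})` yields the more complex message for the first user and the simpler one for the second, each with its own audio payload to send to that user's client device.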

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google LLC (Alphabet Inc.)
Inventors
Sharifi, Matthew; Foerster, Jakob Nicolaus
Primary Examiner(s)
Pullias, Jesse Scott

Application Number

US15/477,360
Publication Number

US 20170221472A1
Time in Patent Office

309 Days
Field of Search

704/257-275
US Class Current
CPC Class Codes

G06F 40/253   Grammatical analysis; Style...

G06F 40/289   Phrasal analysis, e.g. fini...

G10L 13/00   Speech synthesis; Text to s...

G10L 13/08   Text analysis or generation...
