ADAPTIVE TEXT-TO-SPEECH OUTPUTS

US 20170316774A1
Filed: 07/19/2017
Published: 11/02/2017
Est. Priority Date: 01/28/2016
Status: Active Grant

2 Assignments

0 Petitions

Abstract

In some implementations, a language proficiency of a user of a client device is determined by one or more computers. The one or more computers then determine a text segment for output by a text-to-speech module based on the determined language proficiency of the user. After determining the text segment for output, the one or more computers generate audio data including a synthesized utterance of the text segment. The audio data including the synthesized utterance of the text segment is then provided to the client device for output.
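The abstract does not say how proficiency is measured or how it steers the choice of text. The following is a minimal Python sketch of one way it could work. It assumes a hypothetical vocabulary-based proficiency proxy and hypothetical complexity tags on the candidate segments, then picks the candidate whose complexity best matches the estimate. The speech-synthesis step itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate text segment with a hypothetical complexity tag in [0, 1]."""
    text: str
    complexity: float


def estimate_proficiency(query_words: list[str], advanced_vocab: set[str]) -> float:
    """Hypothetical proxy: share of the user's words found in an advanced vocabulary list."""
    if not query_words:
        return 0.0
    hits = sum(1 for word in query_words if word.lower() in advanced_vocab)
    return hits / len(query_words)


def choose_segment(candidates: list[Candidate], proficiency: float) -> Candidate:
    """Return the candidate whose complexity is closest to the estimated proficiency."""
    return min(candidates, key=lambda c: abs(c.complexity - proficiency))


if __name__ == "__main__":
    candidates = [
        Candidate("It will rain today.", 0.2),
        Candidate("Expect intermittent showers throughout the afternoon.", 0.8),
    ]
    advanced = {"precipitation", "forecast", "intermittent"}
    level = estimate_proficiency(["what", "is", "the", "precipitation", "forecast"], advanced)
    print(level, choose_segment(candidates, level).text)
```

Matching on distance rather than a hard threshold means the same code works for any number of candidate phrasings.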

20 Claims

1. A method performed by one or more computers, the method comprising:
- receiving, by the one or more computers, context data from a client device of a user;
- selecting, by the one or more computers, a user context corresponding to the context data from the client device, the user context being selected from among a plurality of user contexts;
- determining, by the one or more computers, a text segment for text-to-speech synthesis by a text-to-speech module based on the selected user context;
- generating, by the one or more computers, audio data comprising a synthesized utterance of the text segment using the text-to-speech module; and
- providing, by the one or more computers and to the client device, the audio data comprising the synthesized utterance of the text segment.

2. The method of claim 1, further comprising receiving, by the one or more computers, data indicating a voice query detected by the client device;
- wherein receiving the context data comprises receiving context data indicating a current context when the voice query was detected by the client device;
- wherein determining the text segment for text-to-speech synthesis comprises determining a text segment to provide as a response to the voice query, the text segment being determined based on the voice query and one or more scores associated with the selected user context; and
- wherein providing the audio data comprises providing the audio data to the client device for output as a response to the voice query.

3. The method of claim 1, wherein determining the text segment comprises determining the text segment from one or more search results identified by a search engine in response to a query from the user.

4. The method of claim 1, wherein receiving the context data comprises receiving data indicating a location, speed, or movement pattern of the client device;
- wherein selecting the user context comprises selecting the user context based on the location, speed, or movement pattern of the client device indicated by the context data.

5. The method of claim 1, wherein the context data indicates a GPS data indicating a current location associated with the user; and
- wherein selecting the user context comprises selecting the user context based on the GPS data indicating the current location associated with the user.

6. The method of claim 1, wherein the context data includes sensor data from a mobile device of the user; and
- wherein selecting the user context comprises selecting the user context based on the sensor data from the mobile device of the user.

7. The method of claim 1, wherein determining the text segment comprises selecting a text segment from among multiple text segments based on one or more scores associated with the selected user context.

8. The method of claim 1, wherein determining the text segment comprises modifying a particular text segment for text-to-speech synthesis based on one or more scores associated with the selected user context.

9. The method of claim 1, wherein the client device displays a mobile application that uses a text-to-speech interface.

10. The method of claim 1, wherein determining the text segment for output by the text-to-speech module comprises:
- identifying multiple text segments as candidates for a text-to-speech output to the user, the multiple text segments being associated with different contexts; and
- selecting from among the multiple text segments based at least on the selected user context.

11. The method of claim 1, wherein the user context is selected based on one or more queries that were previously submitted by the user, data indicating a current task of the user, or an indication that the user failed to complete a task.
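Claims 1, 4, 7 and 10 describe a two-step process. First, device context data is mapped onto one of several user contexts. Second, the system chooses among candidate text segments using scores tied to that context.

The Python sketch below illustrates that process under several assumptions, none of which come from the patent:
- the speed thresholds are hypothetical;
- the context names are hypothetical;
- the score is a per-context word budget (`max_words`).

The text-to-speech step is left out.

```python
from dataclasses import dataclass


@dataclass
class ContextData:
    """Hypothetical context data reported by the client device."""
    speed_mps: float  # device speed in metres per second (e.g. from GPS or motion sensors)
    at_home: bool     # hypothetical flag derived from the device location


# Hypothetical score per user context: the longest answer, in words, suited to it.
CONTEXT_SCORES = {
    "driving": {"max_words": 8},
    "walking": {"max_words": 15},
    "at_home": {"max_words": 40},
    "default": {"max_words": 20},
}


def select_user_context(data: ContextData) -> str:
    """Select one user context from the fixed set above; the thresholds are assumptions."""
    if data.speed_mps > 7.0:  # roughly 25 km/h or faster: assume the user is in a vehicle
        return "driving"
    if data.speed_mps > 0.5:  # slower movement: assume the user is on foot
        return "walking"
    if data.at_home:
        return "at_home"
    return "default"


def select_text_segment(candidates: list[str], context: str) -> str:
    """Pick the longest candidate within the context's word budget, else the shortest one."""
    budget = CONTEXT_SCORES[context]["max_words"]
    fitting = [c for c in candidates if len(c.split()) <= budget]
    if fitting:
        return max(fitting, key=lambda c: len(c.split()))
    return min(candidates, key=lambda c: len(c.split()))


if __name__ == "__main__":
    answers = [
        "Turn left in 200 metres.",
        "In about 200 metres, turn left onto Main Street, then continue for two kilometres.",
    ]
    context = select_user_context(ContextData(speed_mps=20.0, at_home=False))
    print(context, "->", select_text_segment(answers, context))
```

Keeping the context scores in a table means contexts can be added or budgets tuned without touching the selection logic.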

12. A system comprising:
- one or more computers; and
- a non-transitory computer-readable medium coupled to the one or more computers having instructions stored thereon, which, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
  - receiving, by the one or more computers, context data from a client device of a user;
  - selecting, by the one or more computers, a user context corresponding to the context data from the client device, the user context being selected from among a plurality of user contexts;
  - determining, by the one or more computers, a text segment for text-to-speech synthesis by a text-to-speech module based on the selected user context;
  - generating, by the one or more computers, audio data comprising a synthesized utterance of the text segment using the text-to-speech module; and
  - providing, by the one or more computers and to the client device, the audio data comprising the synthesized utterance of the text segment.

13. The system of claim 12, wherein:
- the operations further comprise receiving, by the one or more computers, data indicating a voice query detected by the client device;
- receiving the context data comprises receiving context data indicating a current context when the voice query was detected by the client device;
- determining the text segment for text-to-speech synthesis comprises determining a text segment to provide as a response to the voice query, the text segment being determined based on the voice query and one or more scores associated with the selected user context; and
- providing the audio data comprises providing the audio data to the client device for output as a response to the voice query.

14. The system of claim 12, wherein determining the text segment comprises determining the text segment from one or more search results identified by a search engine in response to a query from the user.

15. The system of claim 12, wherein:
- receiving the context data comprises receiving data indicating a location, speed, or movement pattern of the client device; and
- selecting the user context comprises selecting the user context based on the location, speed, or movement pattern of the client device indicated by the context data.
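Claim 13 (like claims 2 and 17) makes the response depend on both the voice query and scores associated with the selected user context. One hedged way to combine the two is a weighted sum of a term-overlap relevance score and a context-fit score, sketched in Python below.

The following are assumptions, not the patent's method:
- the tokenizer;
- the linear overshoot penalty;
- the weight.

```python
import re


def tokens(text: str) -> list[str]:
    """Lower-case word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def relevance(query: str, segment: str) -> float:
    """Hypothetical relevance: fraction of distinct query terms that appear in the segment."""
    q, s = set(tokens(query)), set(tokens(segment))
    return len(q & s) / len(q) if q else 0.0


def context_fit(segment: str, max_words: int) -> float:
    """1.0 within the context's word budget, falling linearly to 0.0 as the segment overshoots."""
    n = len(tokens(segment))
    if n <= max_words:
        return 1.0
    return max(0.0, 1.0 - (n - max_words) / max_words)


def respond(query: str, candidates: list[str], max_words: int, weight: float = 1.0) -> str:
    """Return the candidate maximizing relevance + weight * context fit."""
    return max(candidates, key=lambda c: relevance(query, c) + weight * context_fit(c, max_words))


if __name__ == "__main__":
    options = [
        "Yes, rain is likely today.",
        "Yes, it will rain today, with heavy showers expected from noon until late evening across the whole region.",
    ]
    print(respond("Will it rain today?", options, max_words=8))   # small budget, e.g. driving
    print(respond("Will it rain today?", options, max_words=40))  # large budget, e.g. at home
```

With a small budget the short answer wins even though it matches fewer query terms. With a large budget the more complete answer wins.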

16. A non-transitory computer-readable storage device encoded with computer program instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising:
- receiving, by the one or more computers, context data from a client device of a user;
- selecting, by the one or more computers, a user context corresponding to the context data from the client device, the user context being selected from among a plurality of user contexts;
- determining, by the one or more computers, a text segment for text-to-speech synthesis by a text-to-speech module based on the selected user context;
- generating, by the one or more computers, audio data comprising a synthesized utterance of the text segment using the text-to-speech module; and
- providing, by the one or more computers and to the client device, the audio data comprising the synthesized utterance of the text segment.

17. The non-transitory computer-readable storage device of claim 16, wherein:
- the operations further comprise receiving, by the one or more computers, data indicating a voice query detected by the client device;
- receiving the context data comprises receiving context data indicating a current context when the voice query was detected by the client device;
- determining the text segment for text-to-speech synthesis comprises determining a text segment to provide as a response to the voice query, the text segment being determined based on the voice query and one or more scores associated with the selected user context; and
- providing the audio data comprises providing the audio data to the client device for output as a response to the voice query.

18. The non-transitory computer-readable storage device of claim 16, wherein determining the text segment comprises determining the text segment from one or more search results identified by a search engine in response to a query from the user.

19. The non-transitory computer-readable storage device of claim 16, wherein:
- receiving the context data comprises receiving data indicating a location, speed, or movement pattern of the client device; and
- selecting the user context comprises selecting the user context based on the location, speed, or movement pattern of the client device indicated by the context data.

20. The non-transitory computer-readable storage device of claim 16, wherein the user context is selected based on one or more queries that were previously submitted by the user, data indicating a current task of the user, or an indication that the user failed to complete a task.
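Claims 11 and 20 let the user context come from any of three signals:
- earlier queries;
- the user's current task;
- an indication that the user failed to complete a task.

A minimal rule-based Python sketch of that selection follows. The signal fields, the rule order, and the context names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserSignals:
    """Hypothetical server-side signals about the user."""
    previous_queries: list[str] = field(default_factory=list)
    current_task: Optional[str] = None
    failed_task: bool = False


def select_context_from_history(signals: UserSignals, query: str) -> str:
    """Rule-based context selection; the rule order and context names are assumptions."""
    if signals.failed_task:
        return "needs_guidance"  # e.g. favour step-by-step phrasing
    normalized = query.strip().lower()
    if any(q.strip().lower() == normalized for q in signals.previous_queries):
        return "repeated_query"  # the earlier answer may not have helped
    if signals.current_task is not None:
        return f"task:{signals.current_task}"
    return "default"


if __name__ == "__main__":
    history = UserSignals(previous_queries=["How do I reset my router?"])
    print(select_context_from_history(history, "how do I reset my router?"))
```

The selected context could then key into per-context scores, as in claims 7 and 8, to choose or modify the spoken text.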

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Sharifi, Matthew; Foerster, Jakob Nicolaus

Granted Patent

US 10,109,270 B2
CPC Class Codes

G06F 40/253   Grammatical analysis; Style...

G06F 40/289   Phrasal analysis, e.g. fini...

G10L 13/00   Speech synthesis; Text to s...

G10L 13/08   Text analysis or generation...