ADAPTIVE TEXT-TO-SPEECH OUTPUTS

US 20190019501A1
Filed: 09/19/2018
Published: 01/17/2019
Est. Priority Date: 01/28/2016
Status: Active Grant

Abstract

In some implementations, one or more computers determine a language proficiency of a user of a client device. The one or more computers then determine a text segment for output by a text-to-speech module based on the determined language proficiency of the user. After determining the text segment for output, the one or more computers generate audio data including a synthesized utterance of the text segment. The audio data is then provided to the client device for output.
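
As an illustration of choosing a text segment from a determined language proficiency, the following is a minimal sketch and not the patented implementation. The `Candidate` type, the integer `level` scale, and the `choose_segment` function are hypothetical names introduced here; the abstract does not say how proficiency or segment difficulty is represented.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One possible wording of a response (hypothetical structure)."""
    text: str
    level: int  # assumed difficulty level, 1 = simplest


def choose_segment(candidates: List[Candidate], proficiency: int) -> Candidate:
    """Pick the most detailed wording that does not exceed the proficiency.

    If every candidate is too difficult, fall back to the simplest one.
    """
    eligible = [c for c in candidates if c.level <= proficiency]
    if eligible:
        return max(eligible, key=lambda c: c.level)
    return min(candidates, key=lambda c: c.level)


candidates = [
    Candidate("Rain tomorrow.", 1),
    Candidate("It will probably rain tomorrow afternoon.", 2),
    Candidate("Intermittent precipitation is anticipated tomorrow afternoon.", 4),
]
print(choose_segment(candidates, proficiency=3).text)
```

Under these assumptions, a user with proficiency 3 would hear the second wording, since the third exceeds the assumed level.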

22 Claims

1. A method comprising:
- determining, by data processing hardware, a user context of a user of a client device, the user context indicating a level of complexity of speech that the user is likely able to comprehend;
  
  determining, by the data processing hardware, a particular text segment for text-to-speech output to the user, the particular text segment having a complexity score indicating a corresponding level of complexity associated with the particular text segment;
  
  modifying, by the data processing hardware, the particular text segment for the text-to-speech output to the user based on the complexity score of the particular text segment and the selected user context;
  
  generating, by the data processing hardware, audio data comprising a synthesized utterance of the modified particular text segment; and
  
  providing, by the data processing hardware, the audio data comprising the synthesized utterance of the modified particular text segment to the client device.
  - 2. The method of claim 1, further comprising, prior to determining the particular text segment for text-to-speech output to the user, receiving, at the data processing hardware, data indicating a voice query detected by the client device, wherein determining the particular text segment for text-to-speech output comprises generating the particular text segment as a response to the voice query, and wherein providing the audio data comprises providing the audio data to the client device for output as the response to the voice query.
  - 3. The method of claim 1, wherein determining the particular text segment for text-to-speech output to the user comprises generating the particular text segment from one or more search results identified by a search engine in response to a query from the user.
  - 4. The method of claim 1, wherein modifying the particular text segment for the text-to-speech output to the user comprises:
    - parsing the particular text segment into individual portions;
      
      determining a corresponding level of complexity associated with each individual portion within the particular text segment;
      
      identifying one or more individual portions within the particular text segment having levels of complexity that exceed the level of complexity of speech indicated by the selected user context; and
      
      modifying the identified one or more individual portions within the particular text segment to reduce the corresponding levels of complexity associated with the individual portions within the particular text segment below the level of complexity of speech indicated by the selected user context.
  - 5. The method of claim 1, wherein modifying the particular text segment for the text-to-speech output to the user comprises:
    - determining that the corresponding level of complexity associated with the particular text segment exceeds the level of complexity of speech indicated by the selected user context; and
      
      modifying the particular text segment to reduce the corresponding level of complexity associated with the particular text segment below the level of complexity of speech indicated by the selected user context.
  - 6. The method of claim 1, wherein:
    - receiving the context data comprises receiving data indicating a location, speed, or movement pattern of the client device; and
      
      selecting the user context comprises selecting the user context based on the location, speed, or movement pattern of the client device indicated by the context data.
  - 7. The method of claim 1, wherein:
    - the context data indicates GPS data indicating a current location associated with the user; and
      
      selecting the user context comprises selecting the user context based on the GPS data indicating the current location associated with the user.
  - 8. The method of claim 1, wherein:
    - the context data includes sensor data from the client device of the user; and
      
      selecting the user context comprises selecting the user context based on the sensor data from the mobile device of the user.
  - 9. The method of claim 1, wherein the client device displays a mobile application that uses a text-to-speech interface.
  - 10. The method of claim 1, further comprising:
    - receiving, at the data processing hardware, context data from the client device of the user; and
      
      selecting, by the data processing hardware, the user context corresponding to the context data from the client device, the user context selected from among a plurality of user contexts and indicating the level of complexity of speech that the user is likely able to comprehend at a given time when the context data was received.
  - 11. The method of claim 1, wherein determining the user context comprises inferring a language proficiency of the user based at least on previous queries submitted by the user.
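
The steps recited in claims 1, 4, and 5 (score a text segment, compare the score with the level indicated by the user context, and modify the portions that exceed it) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method as implemented by the assignee: the mean-word-length score, the `SIMPLER` substitution table, the `UserContext` class, and the `synthesize` stub are all hypothetical. Word substitution also does not guarantee that the result falls below the limit, as the claims require; a fuller system would re-score and iterate.

```python
import re
from dataclasses import dataclass

# Hypothetical substitution table: complex word -> simpler alternative.
SIMPLER = {
    "precipitation": "rain",
    "anticipated": "expected",
    "approximately": "about",
}


@dataclass
class UserContext:
    name: str
    max_complexity: float  # highest score the user is assumed to follow


def complexity_score(text: str) -> float:
    """Toy complexity score: mean word length (an assumption, not from the patent)."""
    words = re.findall(r"[A-Za-z']+", text)
    return sum(len(w) for w in words) / len(words) if words else 0.0


def _swap(match: re.Match) -> str:
    """Replace one word if the table has a simpler form, keeping capitalization."""
    word = match.group(0)
    simpler = SIMPLER.get(word.lower())
    if simpler is None:
        return word
    return simpler.capitalize() if word[0].isupper() else simpler


def modify_segment(segment: str, context: UserContext) -> str:
    """Split into sentences and simplify those scoring above the context limit."""
    portions = re.split(r"(?<=[.!?])\s+", segment.strip())
    result = []
    for portion in portions:
        if complexity_score(portion) > context.max_complexity:
            portion = re.sub(r"[A-Za-z']+", _swap, portion)
        result.append(portion)
    return " ".join(result)


def synthesize(text: str) -> bytes:
    """Stand-in for a text-to-speech engine: returns the text as UTF-8 bytes."""
    return text.encode("utf-8")


text = "Precipitation is anticipated tomorrow. It will be cold."
audio = synthesize(modify_segment(text, UserContext("driving", 5.0)))
print(audio.decode("utf-8"))  # prints "Rain is expected tomorrow. It will be cold."
```

Only the first sentence is rewritten here: its mean word length (8.5) exceeds the assumed limit of 5.0, while the second sentence (3.0) does not.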

12. A system comprising:
- data processing hardware; and
  
  memory hardware in communication with the data processing hardware and storing instructions that, when executed by the data processing hardware, cause the data processing hardware to perform operations comprising:
  
  determining a user context of a user of a client device, the user context indicating a level of complexity of speech that the user is likely able to comprehend;
  
  determining a particular text segment for text-to-speech output to the user, the particular text segment having a complexity score indicating a corresponding level of complexity associated with the particular text segment;
  
  modifying the particular text segment for the text-to-speech output to the user based on the complexity score of the particular text segment and the selected user context;
  
  generating audio data comprising a synthesized utterance of the modified particular text segment; and
  
  providing the audio data comprising the synthesized utterance of the modified particular text segment to the client device.
  - 13. The system of claim 12, wherein the operations further comprise, prior to determining the particular text segment for text-to-speech output to the user, receiving data indicating a voice query detected by the client device, wherein determining the particular text segment for text-to-speech output comprises generating the particular text segment as a response to the voice query, and wherein providing the audio data comprises providing the audio data to the client device for output as the response to the voice query.
  - 14. The system of claim 12, wherein determining the particular text segment for text-to-speech output to the user comprises generating the particular text segment from one or more search results identified by a search engine in response to a query from the user.
  - 15. The system of claim 12, wherein modifying the particular text segment for the text-to-speech output to the user comprises:
    - parsing the particular text segment into individual portions;
      
      determining a corresponding level of complexity associated with each individual portion within the particular text segment;
      
      identifying one or more individual portions within the particular text segment having levels of complexity that exceed the level of complexity of speech indicated by the selected user context; and
      
      modifying the identified one or more individual portions within the particular text segment to reduce the corresponding levels of complexity associated with the individual portions within the particular text segment below the level of complexity of speech indicated by the selected user context.
  - 16. The system of claim 12, wherein modifying the particular text segment for the text-to-speech output to the user comprises:
    - determining that the corresponding level of complexity associated with the particular text segment exceeds the level of complexity of speech indicated by the selected user context; and
      
      modifying the particular text segment to reduce the corresponding level of complexity associated with the particular text segment below the level of complexity of speech indicated by the selected user context.
  - 17. The system of claim 12, wherein:
    - receiving the context data comprises receiving data indicating a location, speed, or movement pattern of the client device; and
      
      selecting the user context comprises selecting the user context based on the location, speed, or movement pattern of the client device indicated by the context data.
  - 18. The system of claim 12, wherein:
    - the context data indicates GPS data indicating a current location associated with the user; and
      
      selecting the user context comprises selecting the user context based on the GPS data indicating the current location associated with the user.
  - 19. The system of claim 12, wherein:
    - the context data includes sensor data from the client device of the user; and
      
      selecting the user context comprises selecting the user context based on the sensor data from the mobile device of the user.
  - 20. The system of claim 12, wherein the client device displays a mobile application that uses a text-to-speech interface.
  - 21. The system of claim 12, wherein the operations further comprise:
    - receiving context data from the client device of the user; and
      
      selecting the user context corresponding to the context data from the client device, the user context selected from among a plurality of user contexts and indicating the level of complexity of speech that the user is likely able to comprehend at a given time when the context data was received.
  - 22. The system of claim 12, wherein determining the user context comprises inferring a language proficiency of the user based at least on previous queries submitted by the user.
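
Claims 17 to 21 describe selecting one user context, from among a plurality of user contexts, based on context data such as location, speed, or movement pattern. One way this selection could look is sketched below. The context names, the speed thresholds, the `ContextData` fields, and the complexity limits are all assumptions made for illustration; the claims do not specify any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextData:
    """Context data reported by the client device (hypothetical fields)."""
    speed_mps: float      # device speed in metres per second
    at_saved_place: bool  # location fix matches a saved place such as home


# Hypothetical user contexts, each mapped to a complexity limit on an
# arbitrary scale. Lower limits mean simpler speech output.
USER_CONTEXTS = {
    "driving": 3.0,
    "walking": 5.0,
    "out_and_about": 6.5,
    "settled": 8.0,
}


def select_user_context(data: ContextData) -> str:
    """Select one of the predefined contexts from speed and location."""
    if data.speed_mps >= 7.0:  # assumed threshold for vehicle travel
        return "driving"
    if data.speed_mps >= 0.5:  # assumed threshold for walking
        return "walking"
    return "settled" if data.at_saved_place else "out_and_about"


def complexity_limit(data: ContextData) -> float:
    """Level of speech complexity assumed comprehensible in the selected context."""
    return USER_CONTEXTS[select_user_context(data)]


print(select_user_context(ContextData(speed_mps=20.0, at_saved_place=False)))  # prints "driving"
```

The limit returned by `complexity_limit` would then be compared against the complexity score of the text segment before any modification takes place.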

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Sharifi, Matthew; Foerster, Jakob

Granted Patent: US 10,453,441 B2

CPC Class Codes

G06F 40/253   Grammatical analysis; Style...

G06F 40/289   Phrasal analysis, e.g. fini...

G10L 13/00   Speech synthesis; Text to s...

G10L 13/08   Text analysis or generation...
