Adaptive text-to-speech outputs
Abstract
In some implementations, one or more computers determine the language proficiency of a user of a client device. The one or more computers then determine a text segment for output by a text-to-speech module based on the determined language proficiency of the user. After determining the text segment, the one or more computers generate audio data including a synthesized utterance of the text segment. The audio data including the synthesized utterance of the text segment is then provided to the client device for output.
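The pipeline in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several pieces are hypothetical: the proficiency heuristic (an equal blend of type-token ratio and mean utterance length), the 0-to-1 complexity scores attached to each candidate text segment, and the `synthesize` stub that stands in for a real text-to-speech module.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    complexity: float  # hypothetical score in [0, 1]; higher means harder to understand


def estimate_proficiency(past_utterances: list[str]) -> float:
    """Toy proficiency estimate in [0, 1] from a user's earlier utterances (hypothetical heuristic)."""
    words = [w.lower() for u in past_utterances for w in u.split()]
    if not words:
        return 0.0  # no evidence: assume the lowest proficiency
    vocab_richness = len(set(words)) / len(words)  # type-token ratio, in (0, 1]
    mean_len = len(words) / len(past_utterances)  # average words per utterance
    length_score = min(mean_len / 15.0, 1.0)  # assumption: 15+ words per utterance saturates
    return 0.5 * vocab_richness + 0.5 * length_score


def select_text_segment(proficiency: float, candidates: list[Candidate]) -> str:
    """Pick the most complex candidate the user can handle, else fall back to the simplest."""
    fitting = [c for c in candidates if c.complexity <= proficiency]
    if fitting:
        return max(fitting, key=lambda c: c.complexity).text
    return min(candidates, key=lambda c: c.complexity).text


def synthesize(text: str) -> bytes:
    """Placeholder for a TTS module; a real system would return encoded audio samples."""
    return text.encode("utf-8")


def respond(past_utterances: list[str], candidates: list[Candidate]) -> bytes:
    """Determine proficiency, choose a text segment, and produce audio data for the client."""
    proficiency = estimate_proficiency(past_utterances)
    segment = select_text_segment(proficiency, candidates)
    return synthesize(segment)
```

Falling back to the simplest candidate when none is judged to fit is one reasonable reading of the abstract, not something it specifies. Type-token ratio is also a crude signal that shrinks as text gets longer, so a real proficiency model would likely be more elaborate.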
24 Claims
1. A method performed by one or more processing devices, the method comprising:
determining, by the one or more processing devices, a complexity level of a voice input to a device;
determining, by the one or more processing devices, a message for output in response to the voice input, the message being determined based on the determined complexity level of the voice input;
generating, by the one or more processing devices, audio data comprising a synthesized utterance of the message; and
providing, by the one or more processing devices, the audio data comprising the synthesized utterance for output in a response to the voice input.
Dependent claims: 2, 3, 4, 5, 6.

7. A system comprising:
one or more processing devices; and
one or more storage devices storing instructions that, when executed by the one or more processing devices, cause the one or more processing devices to perform operations comprising:
determining, by the one or more processing devices, a complexity level of a voice input to a device;
determining, by the one or more processing devices, a message for output in response to the voice input, the message being determined based on the determined complexity level of the voice input;
generating, by the one or more processing devices, audio data comprising a synthesized utterance of the message; and
providing, by the one or more processing devices, the audio data comprising the synthesized utterance for output in a response to the voice input.
Dependent claims: 8, 9, 10, 11, 12.

13. A method performed by one or more computers, the method comprising:
obtaining, by the one or more computers, data indicating that (i) a particular voice input was provided by a first user, and (ii) the particular voice input was provided by a second user that is different from the first user;
determining, by the one or more computers, (i) a first language proficiency score for the first user and (ii) a second language proficiency score for the second user, wherein the first language proficiency score indicates a higher level of language proficiency than the second language proficiency score;
generating, by the one or more computers, (i) first audio data comprising a synthesized utterance of a first message based on the first language proficiency score, and (ii) second audio data comprising a synthesized utterance of a second message based on the second language proficiency score, wherein the first message has a higher language complexity than the second message; and
providing, by the one or more computers, (i) the first audio data to a client device of the first user in response to the particular voice input and (ii) the second audio data to a client device of the second user in response to the particular voice input.
Dependent claims: 14, 15, 16, 17, 18.

19. A system comprising:
one or more computers; and
one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
obtaining, by the one or more computers, data indicating that (i) a particular voice input was provided by a first user, and (ii) the particular voice input was provided by a second user that is different from the first user;
determining, by the one or more computers, (i) a first language proficiency score for the first user and (ii) a second language proficiency score for the second user, wherein the first language proficiency score indicates a higher level of language proficiency than the second language proficiency score;
generating, by the one or more computers, (i) first audio data comprising a synthesized utterance of a first message based on the first language proficiency score, and (ii) second audio data comprising a synthesized utterance of a second message based on the second language proficiency score, wherein the first message has a higher language complexity than the second message; and
providing, by the one or more computers, (i) the first audio data to a client device of the first user in response to the particular voice input and (ii) the second audio data to a client device of the second user in response to the particular voice input.
Dependent claims: 20, 21, 22.

23. One or more non-transitory computer-readable media storing instructions that, when executed by one or more processing devices, cause the one or more processing devices to perform operations comprising:
determining, by the one or more processing devices, a complexity level of a voice input to a device;
determining, by the one or more processing devices, a message for output in response to the voice input, the message being determined based on the determined complexity level of the voice input;
generating, by the one or more processing devices, audio data comprising a synthesized utterance of the message; and
providing, by the one or more processing devices, the audio data comprising the synthesized utterance for output in a response to the voice input.

24. One or more non-transitory computer-readable media storing instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising:
obtaining, by the one or more computers, data indicating that (i) a particular voice input was provided by a first user, and (ii) the particular voice input was provided by a second user that is different from the first user;
determining, by the one or more computers, (i) a first language proficiency score for the first user and (ii) a second language proficiency score for the second user, wherein the first language proficiency score indicates a higher level of language proficiency than the second language proficiency score;
generating, by the one or more computers, (i) first audio data comprising a synthesized utterance of a first message based on the first language proficiency score, and (ii) second audio data comprising a synthesized utterance of a second message based on the second language proficiency score, wherein the first message has a higher language complexity than the second message; and
providing, by the one or more computers, (i) the first audio data to a client device of the first user in response to the particular voice input and (ii) the second audio data to a client device of the second user in response to the particular voice input.

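Independent claims 1 and 13 can be sketched together in Python. None of the specifics below come from the patent. The complexity of a voice input is estimated from its transcript's word count and mean word length, using made-up weights and thresholds. The response variants in `RESPONSES` are invented, and per-user proficiency scores are assumed to be already known and to lie in [0, 1]. `synthesize` is a stand-in for a real TTS module.

```python
from dataclasses import dataclass

# Hypothetical answers to one shared weather query, ordered from least to most complex.
RESPONSES = [
    "It will rain today.",
    "Expect rain this afternoon, clearing by evening.",
    "Expect intermittent showers this afternoon, with skies clearing by early evening.",
]


def complexity_level(transcript: str) -> int:
    """Claim 1, first step (toy version): rate a voice input's transcript from 0 (low) to 2 (high)."""
    words = transcript.split()
    if not words:
        return 0
    mean_word_len = sum(len(w) for w in words) / len(words)
    score = len(words) / 10 + mean_word_len / 6  # hypothetical weights
    if score < 1.5:
        return 0
    if score < 2.5:
        return 1
    return 2


def message_for_level(level: int) -> str:
    """Claim 1, second step: choose the response variant for a complexity level (clamped)."""
    return RESPONSES[max(0, min(level, len(RESPONSES) - 1))]


def synthesize(text: str) -> bytes:
    """Placeholder for a TTS module; a real system would return encoded audio samples."""
    return text.encode("utf-8")


def respond_to_input(transcript: str) -> bytes:
    """Claim 1 end to end: complexity level -> message -> audio data."""
    return synthesize(message_for_level(complexity_level(transcript)))


@dataclass
class Reply:
    user_id: str
    message: str
    audio: bytes


def fan_out(proficiency_by_user: dict[str, float]) -> list[Reply]:
    """Claim 13 (toy version): several users gave the same voice input, and each gets a
    variant of RESPONSES whose complexity tracks that user's proficiency score in [0, 1]."""
    replies = []
    for user_id, score in proficiency_by_user.items():
        level = round(score * (len(RESPONSES) - 1))  # map the score onto a variant index
        message = message_for_level(level)
        replies.append(Reply(user_id, message, synthesize(message)))
    return replies
```

For example, `fan_out({"alice": 0.9, "bob": 0.2})` gives the higher-proficiency user the longest variant and the other user the shortest. That matches claim 13's requirement that the first message be more complex than the second. Mapping input complexity directly onto response complexity is only one reading of claim 1, which requires just that the message be determined based on the complexity level.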
Specification