Automated text to speech voice development
First Claim
1. A system comprising:
one or more processors;
a computer-readable memory; and
a module comprising executable instructions stored in the computer-readable memory, the module, when executed by the one or more processors, configured to:
generate an audio representation of a text, wherein the audio representation comprises a sequence of speech segments selected from a plurality of speech segments, wherein the selection of the sequence of speech segments is based at least in part on a plurality of conversion rules, and wherein each speech segment of the sequence of speech segments corresponds to a subword unit of the text;
transmit, to a plurality of client devices, the text and the audio representation;
receive, from a first client device of the plurality of client devices, first feedback data associated with the audio representation;
receive, from a second client device of the plurality of client devices, second feedback data associated with the audio representation; and
use the first feedback data and the second feedback data to modify, at least in part, the plurality of speech segments or the plurality of conversion rules.
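As an illustration only, the "generate" step of the claim can be sketched as a lookup from text to subword units to stored segments. Everything named here is an assumption made for the example and is not taken from the patent: the `SpeechSegment` type, the toy `CONVERSION_RULES` and `SEGMENTS` tables, and the greedy longest-match strategy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechSegment:
    unit: str      # subword unit label (here, a phoneme-like symbol)
    audio_id: str  # hypothetical handle to a recorded audio snippet


# Hypothetical conversion rules: letter sequence -> subword unit label.
CONVERSION_RULES = {"sh": "SH", "ee": "IY", "p": "P", "s": "S", "e": "EH"}

# Hypothetical inventory (the "plurality of speech segments").
SEGMENTS = {
    "SH": SpeechSegment("SH", "seg_001"),
    "IY": SpeechSegment("IY", "seg_002"),
    "P": SpeechSegment("P", "seg_003"),
    "S": SpeechSegment("S", "seg_004"),
    "EH": SpeechSegment("EH", "seg_005"),
}


def to_units(text, rules):
    """Greedy longest-match conversion of text into subword unit labels."""
    text = text.lower()
    max_len = max(len(key) for key in rules)
    units = []
    i = 0
    while i < len(text):
        for n in range(max_len, 0, -1):
            chunk = text[i:i + n]
            if chunk in rules:
                units.append(rules[chunk])
                i += len(chunk)
                break
        else:
            i += 1  # no rule matches this character (e.g. a space); skip it
    return units


def generate_audio_representation(text, rules, segments):
    """Return one stored segment per subword unit of the text."""
    return [segments[unit] for unit in to_units(text, rules)]


sequence = generate_audio_representation("sheep", CONVERSION_RULES, SEGMENTS)
print([seg.unit for seg in sequence])
```

In this sketch, modifying the conversion rules changes which units a word maps to, while modifying the segment inventory changes which recording is played for a unit. These are the two things the final limitation of the claim says feedback may alter.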
Abstract
A group of users may be presented with text and a synthesized speech recording of the text. The users can listen to the synthesized speech recording and submit feedback regarding errors or other issues with the synthesized speech. A system of one or more computing devices can analyze the feedback, modify the voice or language rules, and recursively test the modifications. The modifications may be determined through the use of machine learning algorithms or other automated processes.
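One way the feedback-analysis step might look is sketched below. It assumes that each feedback item points at a position in the segment sequence, and that a position is only treated as a candidate for modification once a minimum number of distinct listeners report it. The `Feedback` type, the `min_reports` threshold, and the issue labels are hypothetical. The abstract does not specify how feedback is aggregated.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    client_id: str   # which client device submitted the report
    unit_index: int  # position of the flagged segment in the sequence
    issue: str       # free-form label, e.g. "mispronounced"


def flagged_units(feedback, min_reports=2):
    """Return segment positions reported by at least min_reports distinct clients."""
    # Count each (client, position) pair once so that a single listener
    # cannot push a position over the threshold by reporting it repeatedly.
    distinct = {(f.client_id, f.unit_index) for f in feedback}
    counts = Counter(index for _, index in distinct)
    return sorted(i for i, c in counts.items() if c >= min_reports)


reports = [
    Feedback("client_a", 1, "mispronounced"),
    Feedback("client_b", 1, "mispronounced"),
    Feedback("client_a", 1, "unnatural"),
    Feedback("client_b", 3, "unnatural"),
]
print(flagged_units(reports))
```

A fuller system would then change the rule or segment behind each flagged position, regenerate the audio, and send it out again, which corresponds to the recursive testing mentioned in the abstract.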
31 Claims
1. A system comprising:
one or more processors;
a computer-readable memory; and
a module comprising executable instructions stored in the computer-readable memory, the module, when executed by the one or more processors, configured to:
generate an audio representation of a text, wherein the audio representation comprises a sequence of speech segments selected from a plurality of speech segments, wherein the selection of the sequence of speech segments is based at least in part on a plurality of conversion rules, and wherein each speech segment of the sequence of speech segments corresponds to a subword unit of the text;
transmit, to a plurality of client devices, the text and the audio representation;
receive, from a first client device of the plurality of client devices, first feedback data associated with the audio representation;
receive, from a second client device of the plurality of client devices, second feedback data associated with the audio representation; and
use the first feedback data and the second feedback data to modify, at least in part, the plurality of speech segments or the plurality of conversion rules.
Dependent claims: 2, 3, 4, 5
6. A computer-implemented method comprising:
under control of one or more computing devices configured with specific computer-executable instructions,
generating an audio representation of a text, wherein the text comprises a word, wherein the audio representation comprises a sequence of speech segments of a plurality of speech segments, and wherein selection of the sequence of speech segments is based at least in part on a plurality of conversion rules;
transmitting the audio representation and the text to a first client device and a second client device of a plurality of client devices;
receiving first feedback data from the first client device, the first feedback data relating to the audio representation;
receiving second feedback data from the second client device, the second feedback data relating to the audio representation; and
determining, based at least in part on the first feedback data and the second feedback data, whether to modify at least one of (i) the plurality of speech segments or (ii) the plurality of conversion rules.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A system comprising:
one or more processors;
a computer-readable memory; and
a module comprising executable instructions stored in the computer-readable memory, the module, when executed by the one or more processors, configured to:
generate an audio representation of a text, wherein the audio representation comprises a sequence of speech segments of a plurality of speech segments, and wherein the sequence is based at least in part on a plurality of conversion rules;
transmit the audio representation to a first client device and a second client device of a plurality of client devices;
receive first feedback data from the first client device, wherein the first feedback data relates to the audio representation;
receive second feedback data from the second client device, wherein the second feedback data relates to the audio representation; and
determine whether to modify at least one of (i) the plurality of conversion rules or (ii) the plurality of speech segments based at least in part on at least one of the first feedback data and the second feedback data.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
Specification