Synthesizing an aggregate voice
First Claim
1. A computer implemented method for synthesizing multi-person speech into an aggregate voice, the method comprising:
crowd-sourcing a data message configured to include a textual passage;
collecting, from a plurality of speakers, a set of vocal data for the textual passage, wherein the set of vocal data includes a first set of enunciation data corresponding to a first portion of the textual passage, a second set of enunciation data corresponding to a second portion of the textual passage, and a third set of enunciation data corresponding to both the first and second portions of the textual passage;
mapping a source voice profile to a subset of the set of vocal data to synthesize the aggregate voice;
wherein mapping the source voice profile includes;
extracting phonological data from the set of vocal data, wherein the phonological data includes pronunciation tags, intonation tags, and syllable rates;
converting, based on the phonological data including pronunciation tags, intonation tags and syllable rates, the set of vocal data into a set of phoneme strings; and
applying, to the set of phoneme strings, the source voice profile;
assigning, based on evaluating the phonological data from the set of vocal data, a first quality score to the first set of enunciation data; and
transmitting, in response to determining that the first quality score is greater than a first quality threshold, bonus credits to a first speaker of the first set of enunciation data.
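The scoring and reward steps that close the claim can be illustrated with a minimal Python sketch. The claim does not say how the quality score is computed, so the formula below (pronunciation-tag accuracy scaled by closeness to a target syllable rate) is an assumption chosen only for illustration. The names `PhonologicalData`, `EnunciationData`, `quality_score`, and `award_bonus`, the target rate of 4.0 syllables per second, and the bonus of 10 credits are likewise hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PhonologicalData:
    """Phonological data named in the claim (field layout is hypothetical)."""
    pronunciation_tags: list[str]
    intonation_tags: list[str]
    syllable_rate: float  # syllables per second


@dataclass
class EnunciationData:
    """One speaker's recording of one portion of the textual passage."""
    speaker_id: str
    portion: str
    phonology: PhonologicalData


def quality_score(data: EnunciationData, expected_tags: list[str],
                  target_rate: float = 4.0) -> float:
    """Assumed metric: tag accuracy scaled by closeness to a target rate."""
    if not expected_tags:
        return 0.0
    matched = sum(
        1 for got, want in zip(data.phonology.pronunciation_tags, expected_tags)
        if got == want
    )
    accuracy = matched / len(expected_tags)
    # Penalty grows with the relative distance from the target rate, capped at 1.
    rate_penalty = min(1.0, abs(data.phonology.syllable_rate - target_rate) / target_rate)
    return accuracy * (1.0 - rate_penalty)


def award_bonus(data: EnunciationData, expected_tags: list[str], threshold: float,
                ledger: dict[str, int], bonus: int = 10) -> float:
    """Credit the speaker only when the score is greater than the threshold."""
    score = quality_score(data, expected_tags)
    if score > threshold:
        ledger[data.speaker_id] = ledger.get(data.speaker_id, 0) + bonus
    return score
```

The strict `>` comparison mirrors the claim wording "greater than a first quality threshold"; a score exactly equal to the threshold earns no credits in this sketch.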
Abstract
A system and computer-implemented method for synthesizing multi-person speech into an aggregate voice is disclosed. The method may include crowd-sourcing a data message configured to include a textual passage. The method may include collecting, from a plurality of speakers, a set of vocal data for the textual passage. Additionally, the method may also include mapping a source voice profile to a subset of the set of vocal data to synthesize the aggregate voice.
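As a rough illustration of "mapping a source voice profile to a subset of the set of vocal data", the sketch below picks a subset of recordings that together cover every portion of the passage and then attaches profile parameters to each phoneme. The abstract does not describe how the subset is chosen or what a voice profile contains, so the greedy selection, the `Recording` and `VoiceProfile` types, and the pitch and rate fields are all assumptions made for this example, not the patented method.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Recording:
    """Hypothetical stand-in for one item of vocal data."""
    speaker_id: str
    portions: frozenset[int]   # indices of passage portions this recording covers
    phonemes: tuple[str, ...]  # stand-in for a phoneme string


@dataclass(frozen=True)
class VoiceProfile:
    """Hypothetical source voice profile."""
    name: str
    pitch_hz: float
    syllables_per_second: float


def select_subset(recordings: list[Recording], n_portions: int) -> list[Recording]:
    """Greedily pick recordings until every portion of the passage is covered."""
    uncovered = set(range(n_portions))
    chosen: list[Recording] = []
    while uncovered:
        # Prefer the recording that covers the most still-uncovered portions.
        best = max(recordings, key=lambda r: len(r.portions & uncovered))
        if not best.portions & uncovered:
            raise ValueError("recordings do not cover the whole passage")
        chosen.append(best)
        uncovered -= best.portions
    return chosen


def apply_profile(subset: list[Recording],
                  profile: VoiceProfile) -> list[tuple[str, float, float]]:
    """Pair each phoneme with the profile's pitch and rate parameters."""
    return [
        (phoneme, profile.pitch_hz, profile.syllables_per_second)
        for rec in subset
        for phoneme in rec.phonemes
    ]
```

With one recording per portion plus a third recording spanning both portions (the arrangement recited in the claims), the greedy step selects the spanning recording alone, since it covers the most uncovered portions.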
15 Claims
1. A computer implemented method for synthesizing multi-person speech into an aggregate voice, the method comprising:
crowd-sourcing a data message configured to include a textual passage;
collecting, from a plurality of speakers, a set of vocal data for the textual passage, wherein the set of vocal data includes a first set of enunciation data corresponding to a first portion of the textual passage, a second set of enunciation data corresponding to a second portion of the textual passage, and a third set of enunciation data corresponding to both the first and second portions of the textual passage;
mapping a source voice profile to a subset of the set of vocal data to synthesize the aggregate voice;
wherein mapping the source voice profile includes;
extracting phonological data from the set of vocal data, wherein the phonological data includes pronunciation tags, intonation tags, and syllable rates;
converting, based on the phonological data including pronunciation tags, intonation tags and syllable rates, the set of vocal data into a set of phoneme strings; and
applying, to the set of phoneme strings, the source voice profile;
assigning, based on evaluating the phonological data from the set of vocal data, a first quality score to the first set of enunciation data; and
transmitting, in response to determining that the first quality score is greater than a first quality threshold, bonus credits to a first speaker of the first set of enunciation data.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system for synthesizing multi-person speech into an aggregate voice, the system comprising:
a crowd-sourcing module configured to crowd-source a data message including a textual passage;
a collecting module configured to collect, from a plurality of speakers, a set of vocal data for the textual passage, wherein the set of vocal data includes a first set of enunciation data corresponding to a first portion of the textual passage, a second set of enunciation data corresponding to a second portion of the textual passage, and a third set of enunciation data corresponding to both the first and second portions of the textual passage;
a mapping module configured to map a source voice profile to a subset of the set of vocal data to synthesize the aggregate voice, wherein mapping the source voice profile to a subset of the set of vocal data to synthesize the aggregate voice includes;
an extracting module configured to extract phonological data from the set of vocal data, wherein the phonological data includes pronunciation tags, intonation tags, and syllable rates;
a converting module configured to convert, based on the phonological data including pronunciation tags, intonation tags and syllable rates, the set of vocal data into a set of phoneme strings; and
an applying module configured to apply, to the set of phoneme strings, the source voice profile;
an assigning module configured to assign, based on evaluating the phonological data from the set of vocal data, a first quality score to the first set of enunciation data; and
a transmitting module configured to transmit, in response to determining that the first quality score is greater than a first quality threshold, bonus credits to a first speaker of the first set of enunciation data.
Dependent claims: 9, 10, 11
12. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable storage medium does not comprise a transitory signal per se, wherein the computer readable program, when executed on a first computing device, causes the first computing device to:
crowd-source a data message configured to include a textual passage;
collect, from a plurality of speakers, a set of vocal data for the textual passage, wherein the set of vocal data includes a first set of enunciation data corresponding to a first portion of the textual passage, a second set of enunciation data corresponding to a second portion of the textual passage, and a third set of enunciation data corresponding to both the first and second portions of the textual passage;
map a source voice profile to a subset of the set of vocal data to synthesize the aggregate voice;
extract phonological data from the set of vocal data, wherein the phonological data includes pronunciation tags, intonation tags, and syllable rates;
convert, based on the phonological data including pronunciation tags, intonation tags and syllable rates, the set of vocal data into a set of phoneme strings;
apply, to the set of phoneme strings, the source voice profile;
assign, based on evaluating the phonological data from the set of vocal data, a first quality score to the first set of enunciation data; and
transmit, in response to determining that the first quality score is greater than a first quality threshold, bonus credits to a first speaker of the first set of enunciation data.
Dependent claims: 13, 14, 15
Specification