Synthesizing an aggregate voice

US 9,384,728 B2
Filed: 09/30/2014
Issued: 07/05/2016
Est. Priority Date: 09/30/2014
Status: Expired due to Fees

Abstract

A system and computer-implemented method for synthesizing multi-person speech into an aggregate voice is disclosed. The method may include crowd-sourcing a data message configured to include a textual passage. The method may include collecting, from a plurality of speakers, a set of vocal data for the textual passage. Additionally, the method may also include mapping a source voice profile to a subset of the set of vocal data to synthesize the aggregate voice.

14 Claims

1. A computer implemented method for synthesizing multi-person speech into an aggregate voice, the method comprising:
- crowd-sourcing a data message configured to include a textual passage;
  
  collecting, from a plurality of speakers, a set of vocal data for the textual passage, wherein the set of vocal data includes a first set of enunciation data corresponding to a first portion of the textual passage, a second set of enunciation data corresponding to a second portion of the textual passage, and a third set of enunciation data corresponding to both the first and second portions of the textual passage;
  
  mapping a source voice profile to a subset of the set of vocal data to synthesize the aggregate voice;
  
  calculating, using a natural language processing technique configured to analyze the set of vocal data, a spoken word count for the first set of enunciation data;
  
  computing, based on the spoken word count and a predetermined word quantity, reward credits;
  
  transmitting, to a first speaker of the first set of enunciation data, the reward credits; and
  
  transmitting, in response to synthesizing the aggregate voice, the aggregate voice to a remote device.
  - 2. The method of claim 1, wherein mapping the source voice profile to a subset of the set of vocal data to synthesize the aggregate voice includes:
    - extracting phonological data from the set of vocal data, wherein the phonological data includes pronunciation tags, intonation tags, and syllable rates;
      
      converting, based on the phonological data including pronunciation tags, intonation tags and syllable rates, the set of vocal data into a set of phoneme strings; and
      
      applying, to the set of phoneme strings, the source voice profile.
  - 3. The method of claim 1, wherein the source voice profile includes a predetermined set of phonological and prosodic characteristics corresponding to a voice of a first individual.
  - 4. The method of claim 3, wherein the phonological and prosodic characteristics include rhythm, stress, tone, and intonation.
  - 5. The method of claim 1, further comprising:
    - assigning, based on evaluating the phonological data from the set of vocal data, a first quality score to the first set of enunciation data; and
      
      transmitting, in response to determining that the first quality score is greater than a first quality threshold, bonus credits to the first speaker.
  - 6. The method of claim 1, further comprising:
    - detecting, by an incentive system, a transition phase of an entertainment content sequence;
      
      presenting, during the transition phase of the entertainment content sequence, a speech sample collection module configured to record enunciation data for the textual passage; and
      
      advancing, in response to recording enunciation data for the textual passage, the entertainment content sequence.
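
As an illustration only, the reward steps of claim 1 and the bonus step of claim 5 could be sketched as follows. The claims do not state a formula, so the sketch assumes that credits are granted per complete block of `words_per_credit` spoken words, and that the spoken word count comes from whitespace-tokenizing a transcript (a simple stand-in for the claimed natural language processing technique). All function names, parameters, and default values are hypothetical.

```python
def spoken_word_count(transcript: str) -> int:
    """Count words in a transcript of the enunciation data.

    Assumption: whitespace tokenization stands in for the claimed
    natural language processing technique.
    """
    return len(transcript.split())


def reward_credits(word_count: int, words_per_credit: int,
                   credits_per_block: int = 1) -> int:
    """Compute reward credits from a word count and a predetermined word quantity.

    Assumption: one award of `credits_per_block` per complete block of
    `words_per_credit` words; the claims leave the formula open.
    """
    if words_per_credit <= 0:
        raise ValueError("words_per_credit must be positive")
    return (word_count // words_per_credit) * credits_per_block


def bonus_credits(quality_score: float, quality_threshold: float,
                  bonus: int = 5) -> int:
    """Grant a bonus only when the score is strictly greater than the threshold."""
    return bonus if quality_score > quality_threshold else 0
```

With these assumptions, a speaker who records 250 words against a predetermined quantity of 100 words would earn two blocks of credits, and a quality score exactly equal to the threshold would earn no bonus, since claim 5 requires the score to be greater than the threshold.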

7. A system for synthesizing multi-person speech into an aggregate voice, the system comprising:
- a crowd-sourcing module configured to crowd-source a data message including a textual passage;
  
  a collecting module configured to collect, from a plurality of speakers, a set of vocal data for the textual passage, wherein the set of vocal data includes a first set of enunciation data corresponding to a first portion of the textual passage, a second set of enunciation data corresponding to a second portion of the textual passage, and a third set of enunciation data corresponding to both the first and second portions of the textual passage;
  
  a mapping module configured to map a source voice profile to a subset of the set of vocal data to synthesize the aggregate voice, the mapping module further comprising:
  
  an extracting module configured to extract phonological data from the set of vocal data, wherein the phonological data includes pronunciation tags, intonation tags, and syllable rates;
  
  a converting module configured to convert, based on the phonological data including pronunciation tags, intonation tags and syllable rates, the set of vocal data into a set of phoneme strings; and
  
  an applying module configured to apply, to the set of phoneme strings, the source voice profile;
  
  a calculating module configured to calculate, using a natural language processing technique to analyze the set of vocal data, a spoken word count for the first set of enunciation data;
  
  a computing module configured to compute, based on the spoken word count and a predetermined word quantity, reward credits; and
  
  a transmitting module configured to transmit, to a first speaker of the first set of enunciation data, the reward credits, wherein the transmitting module is further configured to transmit the aggregate voice to a remote device.
  - 8. The system of claim 7, wherein the source voice profile includes a predetermined set of phonological and prosodic characteristics corresponding to a voice of a first individual.
  - 9. The system of claim 8, wherein the phonological and prosodic characteristics include rhythm, stress, tone, and intonation.
  - 10. The system of claim 7, further comprising:
    - an assigning module configured to assign, based on evaluating the phonological data from the set of vocal data, a first quality score to the first set of enunciation data; and
      
      wherein the transmitting module is configured to transmit, in response to determining that the first quality score is greater than a first quality threshold, bonus credits to the first speaker.
  - 11. The system of claim 7, further comprising:
    - a detecting module configured to detect, using an incentive system, a transition phase of an entertainment content sequence;
      
      a presenting module configured to present, during the transition phase of the entertainment content sequence, a speech sample collection module configured to record enunciation data for the textual passage; and
      
      an advancing module configured to advance, in response to recording enunciation data for the textual passage, the entertainment content sequence.
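
The extracting, converting, and applying modules of claim 7 describe a three-stage data flow, which might be pictured with the toy sketch below. It is not speech synthesis and is not drawn from the specification: the two-word lexicon, the vowel set, the punctuation-based intonation rule, and every class and function name are assumptions made purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon mapping words to ARPAbet-style pronunciation tags.
TOY_LEXICON = {"hello": "HH AH L OW", "world": "W ER L D"}
# Vowel phonemes in the toy lexicon; each is counted as one syllable.
TOY_VOWELS = {"AH", "OW", "ER"}


@dataclass
class PhonologicalData:
    pronunciation_tags: list
    intonation_tag: str
    syllable_rate: float  # syllables per second


@dataclass
class VoiceProfile:
    speaker: str
    tone: str
    syllable_rate: float


def extract(transcript: str, duration_s: float) -> PhonologicalData:
    """Extracting stage: derive tags and a syllable rate from one sample."""
    words = transcript.lower().rstrip("?.!").split()
    tags = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    syllables = sum(p in TOY_VOWELS for tag in tags for p in tag.split())
    # Assumption: a trailing question mark implies rising intonation.
    intonation = "rising" if transcript.rstrip().endswith("?") else "falling"
    return PhonologicalData(tags, intonation, syllables / duration_s)


def convert(data: PhonologicalData) -> list:
    """Converting stage: one phoneme string per recognized word."""
    return list(data.pronunciation_tags)


def apply_profile(phoneme_strings: list, profile: VoiceProfile) -> list:
    """Applying stage: pair each phoneme string with the target voice settings."""
    return [
        {"phonemes": s, "speaker": profile.speaker,
         "tone": profile.tone, "syllable_rate": profile.syllable_rate}
        for s in phoneme_strings
    ]
```

In this sketch the speaker-specific timing and intonation measured in `extract` are discarded after conversion and replaced by the values in the `VoiceProfile`, which is one plausible reading of how samples from many speakers could end up sounding like a single aggregate voice.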

12. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable storage medium is not a transitory signal per se, wherein the computer readable program, when executed on a first computing device, causes the first computing device to:
- crowd-source a data message configured to include a textual passage;
  
  collect, from a plurality of speakers, a set of vocal data for the textual passage;
  
  map a source voice profile to a subset of the set of vocal data to synthesize the aggregate voice;
  
  calculating, using a natural language processing technique configured to analyze the set of vocal data, a spoken word count for a first set of enunciation data;
  
  assigning, based on evaluating phonological data from the set of vocal data, a first quality score to the first set of enunciation data;
  
  computing, based on the first quality score, the spoken word count, and a predetermined word quantity, reward credits;
  
  transmitting, in response to determining that the first quality score is greater than a first quality threshold, the reward credits to the first speaker; and
  
  transmitting, in response to synthesizing the aggregate voice, the aggregate voice to a remote device.
  - 13. The computer program product of claim 12, further comprising computer readable program code configured to:
    - extract phonological data from the set of vocal data, wherein the phonological data includes pronunciation tags, intonation tags, and syllable rates;
      
      convert, based on the phonological data including pronunciation tags, intonation tags and syllable rates, the set of vocal data into a set of phoneme strings; and
      
      apply, to the set of phoneme strings, the source voice profile.
  - 14. The computer program product of claim 12, further comprising computer readable program code configured to:
    - detect, by an incentive system, a transition phase of an entertainment content sequence;
      
      present, during the transition phase of the entertainment content sequence, a speech sample collection module configured to record enunciation data for the textual passage; and
      
      advance, in response to recording enunciation data for the textual passage, the entertainment content sequence.
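
The detect, present, and advance steps recited in claims 6, 11, and 14 behave like a small state machine, sketched below under stated assumptions. The sketch treats the end of any non-final segment (for example, a break between game levels) as the transition phase, and treats any non-empty byte string as a valid recording. The class, its methods, and these rules are hypothetical and are not taken from the specification.

```python
from enum import Enum, auto


class Phase(Enum):
    PLAYING = auto()
    TRANSITION = auto()


class ContentSequence:
    """Toy entertainment content sequence that gates progress on a recording."""

    def __init__(self, segments, passage: str):
        self.segments = list(segments)
        self.passage = passage
        self.index = 0
        self.phase = Phase.PLAYING
        self.recordings = []

    def finish_segment(self) -> None:
        """Detect step: entering the gap after a non-final segment."""
        if self.index < len(self.segments) - 1:
            self.phase = Phase.TRANSITION

    def prompt(self):
        """Present step: show the textual passage only during a transition."""
        return self.passage if self.phase is Phase.TRANSITION else None

    def submit_recording(self, audio: bytes) -> bool:
        """Advance step: move to the next segment once a recording is stored."""
        if self.phase is not Phase.TRANSITION or not audio:
            return False
        self.recordings.append(audio)
        self.index += 1
        self.phase = Phase.PLAYING
        return True
```

A caller would invoke `finish_segment()` when a segment ends, display the result of `prompt()` to the user, and call `submit_recording()` with the captured audio; the sequence stays in the transition phase until a recording is accepted.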

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: de Freitas, Jose A. G.; Hindle, Guy P.; Taylor, James S.
Primary Examiner(s): SINGH, SATWANT K
Application Number: US14/501,230
Publication Number: US 20160093286A1
Time in Patent Office: 644 Days
Field of Search: None
US Class Current: 1/1
CPC Class Codes:
G10L 13/00   Speech synthesis; Text to s...
G10L 13/027   Concept to speech synthesis...
G10L 13/033   Voice editing, e.g. manipul...
G10L 13/10   Prosody rules derived from ...
