Sender-responsive text-to-speech processing

US 9,570,066 B2
Filed: 07/16/2012
Issued: 02/14/2017
Est. Priority Date: 07/16/2012
Status: Active Grant

Abstract

A method of speech synthesis including receiving a text input sent by a sender, processing the text input responsive to at least one distinguishing characteristic of the sender to produce synthesized speech that is representative of a voice of the sender, and communicating the synthesized speech to a recipient user of the system.

16 Claims

1. A method of speech synthesis, comprising the steps of:
- (a) receiving speech input from a sender;
  
  (b) obtaining at least one distinguishing characteristic of the sender from the speech input, wherein the at least one distinguishing characteristic includes conversational information or textual information of the speech input;
  
  (c) obtaining baseline characteristics, wherein the baseline characteristics include articulation rate, courteousness, formants, or pitch frequency that a recipient user of the system is accustomed to hearing;
  
  (d) selecting a default text-to-speech model based on the at least one distinguishing characteristic of the sender;
  
  (e) modifying the selected default text-to-speech model using the received speech input;
  
  (f) receiving, at a text-to-speech system, a text input sent by the sender;
  
  (g) processing, via a processor of the system and the text-to-speech model, the text input responsive to the at least one distinguishing characteristic of the sender to produce synthesized speech that is representative of a voice of the sender;
  
  (h) identifying baseline characteristics of the synthesized speech;
  
  (i) applying an acoustic feature filter to the synthesized speech, wherein the acoustic feature filter is adjusted using the baseline characteristics obtained from the received speech; and
  
  (j) communicating the synthesized speech to the recipient user of the system.
- Dependent claims 2-12:
  - 2. The method of claim 1 wherein the at least one distinguishing characteristic is obtained from a former communication between the sender and the recipient.
  - 3. The method of claim 2 wherein the at least one distinguishing characteristic includes at least one of acoustic information or conversational demographic information extracted from a previous voice communication session with the sender.
  - 4. The method of claim 2 wherein the at least one distinguishing characteristic includes textual demographic information extracted from a previous text communication session with the sender.
  - 5. The method of claim 2 wherein the at least one distinguishing characteristic includes behavioral demographic information extracted from a previous voice or text communication with the sender.
  - 6. The method of claim 5 wherein the at least one distinguishing characteristic also includes textual demographic information and at least one of acoustic information or conversational demographic information extracted from a previous voice communication session with the sender.
  - 7. The method of claim 1 wherein the processing step includes using a TTS model that was selected from a plurality of TTS models in response to the at least one distinguishing characteristic, and was thereafter adapted in response to the at least one distinguishing characteristic.
  - 8. The method of claim 1 wherein the at least one distinguishing characteristic includes at least one collective attribute representative of a group to which the sender belongs.
  - 9. The method of claim 8 wherein the at least one collective attribute includes at least one of gender, age, ethnicity, dialect, or accent.
  - 10. The method of claim 1 wherein the at least one distinguishing characteristic includes at least one individual attribute that is personal to the sender that created the text input.
  - 11. The method of claim 10 wherein the at least one individual attribute is prosodic and includes at least one of pitch, intonation, pronunciation, stress, articulation rate, tone, loudness, or formant frequencies.
  - 12. A computer program product embodied in a non-transitory computer readable medium and including instructions usable by a computer processor of a TTS system to cause the system to implement steps of a method according to claim 1.
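
Steps (d), (h), and (i) of claim 1 can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Characteristics` container, the `DEFAULT_MODELS` catalog and its model names, and the `strength` parameter that controls how far the filter moves the synthesized speech toward the baseline. The claim itself does not say how the acoustic feature filter is computed; linear interpolation is used only as a simple stand-in.

```python
from dataclasses import dataclass


@dataclass
class Characteristics:
    """Hypothetical container for two traits named in step (c)."""
    articulation_rate: float  # e.g. syllables per second
    pitch_hz: float           # mean pitch frequency


# Hypothetical catalog of default TTS models keyed by collective attributes
# (claims 8 and 9 mention gender and age among such attributes).
DEFAULT_MODELS = {
    ("female", "adult"): "tts_female_adult",
    ("male", "adult"): "tts_male_adult",
}
FALLBACK_MODEL = "tts_generic"


def select_default_model(gender: str, age_group: str) -> str:
    """Step (d): pick a default model from the sender's characteristics."""
    return DEFAULT_MODELS.get((gender, age_group), FALLBACK_MODEL)


def filter_settings(baseline: Characteristics,
                    synthesized: Characteristics,
                    strength: float = 0.5) -> dict:
    """Steps (h)-(i): derive scale factors that move the synthesized
    speech part of the way toward the baseline characteristics.

    strength=0 leaves the speech unchanged; strength=1 matches the baseline.
    """
    if synthesized.articulation_rate <= 0 or synthesized.pitch_hz <= 0:
        raise ValueError("synthesized characteristics must be positive")

    def scale(target: float, current: float) -> float:
        # Interpolate toward the target, then express the result as a
        # multiplicative factor on the current value.
        return (current + strength * (target - current)) / current

    return {
        "rate_scale": scale(baseline.articulation_rate,
                            synthesized.articulation_rate),
        "pitch_scale": scale(baseline.pitch_hz, synthesized.pitch_hz),
    }


if __name__ == "__main__":
    model = select_default_model("female", "adult")
    settings = filter_settings(Characteristics(4.0, 200.0),
                               Characteristics(5.0, 100.0))
    print(model, settings)
```

Returning scale factors rather than modified audio keeps the sketch independent of any particular synthesizer; a real system would pass such factors to whatever signal-processing stage implements the filter.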

13. A method of speech synthesis, comprising the steps of:
- (a) obtaining at least one distinguishing characteristic of a sender from received speech input obtained during a communication session with the sender, wherein the at least one distinguishing characteristic includes conversational information or textual information of the speech input, and further obtaining baseline characteristics including articulation rate, courteousness, formants, or pitch frequency that a recipient is accustomed to hearing;
  
  (b) selecting a text-to-speech model based on the at least one distinguishing characteristic of the sender;
  
  (c) modifying the selected text-to-speech model using the at least one distinguishing characteristic of the sender;
  
  (d) receiving, at a text-to-speech (TTS) system, a text input sent by the sender in a subsequent communication session with the sender;
  
  (e) processing, via a processor of the system, the text input responsive to the modified text-to-speech model to produce synthesized speech that is representative of a voice of the sender of the text input;
  
  (f) identifying baseline characteristics of the synthesized speech;
  
  (g) applying an acoustic feature filter to the synthesized speech, wherein the acoustic feature filter is adjusted using the baseline characteristics obtained from the received speech; and
  
  (h) communicating the synthesized speech to a user of the system, the user being the recipient of the communication session.
- Dependent claims 14-16:
  - 14. The method of claim 13, wherein the obtaining step includes:
    - (a1) receiving, at an automatic speech recognition system, audio from the sender;
      
      (a2) pre-processing the received audio to generate acoustic feature vectors;
      
      (a3) decoding the generated acoustic feature vectors to produce a plurality of speech hypotheses;
      
      (a4) post-processing the speech hypotheses to identify speech in the audio from the sender and to create a transcript of the identified speech; and
      
      (a5) storing the identified speech.
  - 15. The method of claim 14, wherein the modifying of the text-to-speech model in step (c) comprises:
    - estimating a model transformation; and
      
      applying the model transformation to the TTS model selected in step (b) to produce an adapted TTS model, wherein the processing step (e) includes using the adapted TTS model to produce the synthesized speech.
  - 16. The method of claim 15, wherein the step of adapting the TTS model is carried out on speech in a voice mail message from the sender and in response to receiving the voice mail message.
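
Claim 15 recites estimating a model transformation and applying it to the selected TTS model, without stating in the claim what form the transformation takes. The sketch below assumes the simplest possible case, a per-dimension shift of the model's feature means toward the average of the sender's acoustic feature vectors. The function names, the list-of-floats model representation, and the `weight` parameter are all hypothetical; practical adaptation schemes are typically richer than a bias-only shift.

```python
from statistics import fmean


def estimate_mean_shift(model_means, sender_frames):
    """Estimate a bias-only transformation: the per-dimension offset
    between the sender's average feature vector and the model means."""
    if not sender_frames:
        raise ValueError("need at least one sender feature vector")
    dims = len(model_means)
    sender_avg = [fmean(frame[d] for frame in sender_frames)
                  for d in range(dims)]
    return [s - m for s, m in zip(sender_avg, model_means)]


def apply_mean_shift(model_means, shift, weight=1.0):
    """Apply the estimated shift to produce adapted model means.

    weight < 1 adapts only partially, which can be useful when little
    sender speech is available (an assumption, not a claim limitation).
    """
    return [m + weight * s for m, s in zip(model_means, shift)]


if __name__ == "__main__":
    means = [1.0, 2.0]
    frames = [[2.0, 2.0], [4.0, 4.0]]
    shift = estimate_mean_shift(means, frames)
    print(apply_mean_shift(means, shift, weight=0.5))
```

Separating estimation from application mirrors the two sub-steps of claim 15 and lets the same estimated shift be reused or re-weighted without recomputing it.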

Current Assignee: General Motors LLC (General Motors Company)
Original Assignee: General Motors LLC (General Motors Company)
Inventors: Talwar, Gaurav; Zhao, Xufang; Hecht, Ron M.
Primary Examiner(s): Sirjani, Fariba
Application Number: US13/550,009
Publication Number: US 20140019135A1
Time in Patent Office: 1,674 Days
Field of Search: 704/258
US Class Current: 1/1
CPC Class Codes:
G10L 13/033 Voice editing, e.g. manipul...
G10L 13/08 Text analysis or generation...
