Method and apparatus for combining text to speech and recorded prompts

US 8,600,753 B1
Filed: 12/30/2005
Issued: 12/03/2013
Est. Priority Date: 12/30/2005
Status: Active Grant

Abstract

An arrangement provides for improved synthesis of speech arising from a message text. The arrangement stores prerecorded prompts and speech-related characteristics for those prompts. A message is parsed to determine whether any message portions have been recorded previously. If so, the speech-related characteristics for those portions are retrieved. The arrangement generates speech-related characteristics for those portions not previously stored. The retrieved and generated characteristics are combined, and the combination is then used as the input to a speech synthesizer.
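The flow described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Unit` record, the `PROMPT_STORE` contents, the word-level longest-match parse, and the placeholder `generate_characteristics` front end (one default-valued unit per letter).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One speech-related characteristic record (hypothetical layout)."""
    phoneme: str
    duration_ms: int
    pitch_hz: float


# Hypothetical store: prompt text -> characteristics kept for that prompt.
PROMPT_STORE = {
    "your balance is": [Unit("y", 60, 118.0), Unit("ao", 90, 121.0)],
}


def generate_characteristics(text):
    """Placeholder front end: one default-valued unit per letter."""
    return [Unit(ch, 80, 100.0) for ch in text if ch.isalpha()]


def parse(message, store):
    """Split a message into (text, is_stored) segments.

    Uses a greedy, word-level, longest-match search against the store.
    """
    words = message.lower().split()
    segments, pending, i = [], [], 0
    while i < len(words):
        end = None
        for j in range(len(words), i, -1):  # try the longest candidate first
            if " ".join(words[i:j]) in store:
                end = j
                break
        if end is None:
            pending.append(words[i])
            i += 1
            continue
        if pending:  # flush the words that matched no stored prompt
            segments.append((" ".join(pending), False))
            pending = []
        segments.append((" ".join(words[i:end]), True))
        i = end
    if pending:
        segments.append((" ".join(pending), False))
    return segments


def build_synthesizer_input(message, store=PROMPT_STORE):
    """Combine retrieved and generated characteristics in message order."""
    units = []
    for text, stored in parse(message, store):
        units.extend(store[text] if stored else generate_characteristics(text))
    return units
```

For example, `parse("Your balance is forty dollars", PROMPT_STORE)` splits the message into the stored prompt and the remaining text, and `build_synthesizer_input` returns a single list of units that a synthesizer could consume.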

9 Claims

1. A method comprising:
- receiving a text message for conversion to speech, the text message having a tagged portion and a non-tagged portion;
- identifying a topic domain associated with the text message;
- selecting, via a text-to-speech device, first phonemes from a phoneme database for the non-tagged portion based on first speech-related characteristics, wherein the phoneme database is specific to the topic domain and comprises phonemes labeled by database tags;
- generating first speech synthesis rules for the non-tagged portion based on the first speech-related characteristics;
- selecting second phonemes from the phoneme database based on second speech-related characteristics as indicated by message tags in the tagged portion of the text message, wherein the selecting is based on a matching of the message tags and the database tags, wherein the first phonemes and the second phonemes do not represent pre-recorded speech;
- retrieving second speech synthesis rules for the tagged portion based on the second speech-related characteristics; and
- synthesizing, via the text-to-speech device, speech by combining the first phonemes and the second phonemes using the first speech synthesis rules and the second speech synthesis rules.

2. The method of claim 1, wherein synthesizing speech further comprises executing a unit selection synthesis operation.

3. The method of claim 1, wherein the first speech-related characteristics and the second speech-related characteristics comprise phonemes, durations and pitches associated with parsed portions of the text message.
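As a reading aid, the steps of claim 1 can be sketched in Python. This is an illustration under stated assumptions, not the patented implementation: the claim does not specify a tag syntax, so the `[tag]...[/tag]` markup is hypothetical, as are the keyword-based `identify_domain`, the toy `LEXICON`, the `"neutral"` database tag used for non-tagged text, and the rule dictionaries.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DbPhoneme:
    symbol: str
    tag: str  # database tag labelling this phoneme


def make_db(symbols, tags=("neutral", "emphasis")):
    """Build a toy phoneme database keyed by (symbol, database tag)."""
    return {(s, t): DbPhoneme(s, t) for s in symbols for t in tags}


# All tables below are hypothetical illustration data.
PHONEME_DBS = {  # one phoneme database per topic domain
    "banking": make_db(["p", "ey", "n", "aw"]),
    "weather": make_db(["r", "ey", "n"]),
}
DOMAIN_KEYWORDS = {"banking": {"pay"}, "weather": {"rain"}}
LEXICON = {"pay": ["p", "ey"], "now": ["n", "aw"], "rain": ["r", "ey", "n"]}
STORED_RULES = {"emphasis": {"rate": 0.8, "pitch_scale": 1.2}}

TAGGED = re.compile(r"\[(\w+)\](.*?)\[/\1\]")  # assumed tag syntax


def identify_domain(message):
    """Pick the domain whose keywords overlap the message most."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return max(DOMAIN_KEYWORDS, key=lambda d: len(DOMAIN_KEYWORDS[d] & words))


def split_portions(message):
    """Yield (text, tag) pairs; tag is None for non-tagged portions."""
    pos = 0
    for m in TAGGED.finditer(message):
        if m.start() > pos:
            yield message[pos:m.start()], None
        yield m.group(2), m.group(1)
        pos = m.end()
    if pos < len(message):
        yield message[pos:], None


def select_phonemes(text, tag, db):
    """Look up phonemes whose database tag matches the message tag."""
    wanted = tag if tag is not None else "neutral"
    words = re.findall(r"[a-z]+", text.lower())
    return [db[(sym, wanted)] for w in words for sym in LEXICON[w]]


def generate_rules(phonemes):
    """Placeholder for rules generated for a non-tagged portion."""
    return {"rate": 1.0, "pitch_scale": 1.0}


def plan_synthesis(message):
    """Return a list of (phonemes, rules) pairs in message order."""
    db = PHONEME_DBS[identify_domain(message)]
    plan = []
    for text, tag in split_portions(message):
        phonemes = select_phonemes(text, tag, db)
        if not phonemes:
            continue  # skip whitespace-only portions
        rules = STORED_RULES[tag] if tag is not None else generate_rules(phonemes)
        plan.append((phonemes, rules))
    return plan
```

Calling `plan_synthesis("Pay [emphasis]now[/emphasis]")` yields two entries: phonemes with the `"neutral"` tag paired with generated rules, then phonemes with the `"emphasis"` tag paired with retrieved rules. A synthesizer would consume the combined plan.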

4. A text-to-speech device having instructions stored which, when executed, cause the text-to-speech device to perform operations comprising:
- receiving a text message for conversion to speech, the text message having a tagged portion comprising message tags and a non-tagged portion;
- identifying a topic domain associated with the text message;
- generating first speech synthesis rules for the non-tagged portion;
- retrieving second speech synthesis rules for the tagged portion;
- retrieving first phonemes from a phoneme database for the non-tagged portion of the text message;
- retrieving second phonemes from the phoneme database for the tagged portion of the text message, wherein the phoneme database is specific to the topic domain and comprises phonemes labeled by database tags, wherein the retrieving of the first phonemes and the second phonemes is based on a matching of the message tags and the database tags, and wherein the first phonemes and the second phonemes do not represent pre-recorded speech; and
- combining the first phonemes and the second phonemes to output an audible version of the text message using the first speech synthesis rules and the second speech synthesis rules.

5. The text-to-speech device of claim 4, wherein the first phonemes and the second phonemes are retrieved by executing a unit selection synthesis operation.

6. The text-to-speech device of claim 4, wherein the first phonemes and the second phonemes are retrieved based on speech related characteristics that comprise durations and pitches associated with respective portions of the text message.

7. A method comprising:
- receiving text to be converted to speech, the text having a tagged portion and a non-tagged portion;
- identifying, via a text-to-speech device, a topic domain associated with the text;
- for the non-tagged portion of the text, retrieving first phonemes from a phoneme database having first speech related characteristics, wherein the phoneme database is specific to the topic domain and comprises phonemes labeled by database tags;
- generating first speech synthesis rules for the non-tagged portion based on the first speech-related characteristics;
- for the tagged portion of the text, retrieving second phonemes from the database, the second phonemes having second speech related characteristics as indicated by message tags associated with the tagged portion, and wherein the retrieving is based on a matching of the message tags and the database tags, wherein the first and the second phonemes do not represent pre-recorded speech;
- retrieving second speech synthesis rules for the tagged portion based on the second speech-related characteristics; and
- synthesizing, via the text-to-speech device, speech based on the text by combining the first phonemes and the second phonemes using the first speech synthesis rules and the second speech synthesis rules.

8. The method of claim 7, wherein synthesizing speech further comprises executing a unit selection synthesis operation.

9. The method of claim 7, wherein the first and the second speech related characteristics comprise durations and pitches associated with the text.
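Claims 2, 5 and 8 refer to a unit selection synthesis operation, and claims 3, 6 and 9 name durations and pitches as speech-related characteristics. The claims do not define cost functions or a search procedure, so the following Python sketch shows only the generic unit selection technique: a Viterbi search that minimizes a target cost plus a join cost. The `Unit` record, both cost functions and their scaling constants are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phoneme: str
    duration_ms: float
    pitch_hz: float


def target_cost(target, cand):
    """Mismatch between the requested and candidate duration and pitch."""
    return (abs(target.duration_ms - cand.duration_ms) / 100.0
            + abs(target.pitch_hz - cand.pitch_hz) / 50.0)


def join_cost(prev, cur):
    """Pitch discontinuity at the boundary between adjacent units."""
    return abs(prev.pitch_hz - cur.pitch_hz) / 50.0


def unit_selection(targets, inventory):
    """Return the lowest-cost sequence of inventory units for the targets."""
    lattice = [[u for u in inventory if u.phoneme == t.phoneme] for t in targets]
    if any(not column for column in lattice):
        raise ValueError("inventory lacks a candidate for some target phoneme")

    # best[i][k] = (cumulative cost, index of best predecessor in column i-1)
    best = [[(target_cost(targets[0], c), None) for c in lattice[0]]]
    for i in range(1, len(targets)):
        column = []
        for cand in lattice[i]:
            prev_k, prev_cost = min(
                ((k, best[i - 1][k][0] + join_cost(p, cand))
                 for k, p in enumerate(lattice[i - 1])),
                key=lambda pair: pair[1],
            )
            column.append((prev_cost + target_cost(targets[i], cand), prev_k))
        best.append(column)

    # Trace back from the cheapest unit in the final column.
    k = min(range(len(best[-1])), key=lambda idx: best[-1][idx][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(lattice[i][k])
        k = best[i][k][1]
    return path[::-1]
```

Given targets for two phonemes and an inventory holding two candidates for each, the search returns the pair whose durations and pitches sit closest to the targets while keeping the pitch jump at the join small.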

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: Conkie, Alistair
Primary Examiner(s): Lerner, Martin
Application Number: US11/321,638
Time in Patent Office: 2,895 Days
Field of Search: 704/258, 704/260, 704/267, 704/261, 704/266, 704/269
US Class Current: 704/260
CPC Class Codes:
G10L 13/00   Speech synthesis; Text to s...
G10L 13/08   Text analysis or generation...
G10L 13/10   Prosody rules derived from ...
