Generation of voice messages

US 6,175,821 B1
Filed: 04/26/1999
Issued: 01/16/2001
Est. Priority Date: 07/31/1997
Status: Expired due to Term

Abstract

A voice message is generated having an invariable portion and a variable portion. Most of the invariable portion is provided in the form of recorded speech whereas the variable portion is provided in the form of synthesized speech. The synthesized speech also extends by half a phoneme into the invariable portion of the message. The synthesized speech and the recorded speech are then concatenated, with a transition signal being formed on the basis of a boundary portion of each of the recorded and synthesized signals about any join. In forming the transition signal, a set of transition signal pitchmarks is created and an overlap-add technique is used to copy the waveform within the boundary portions of the speech signals around the transition signal pitchmarks. The signal around the penultimate pitchmark in the leading boundary portion is copied to the trailing half of the transition signal and the signal around the second pitchmark in the trailing boundary portion is copied to the leading half of the transition signal. In this way, the characteristics of the generated message around the join change gradually between the characteristics of the recorded speech and the characteristics of the synthesized speech.
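
As a rough illustration of the overlap-add step described in the abstract, the Python sketch below windows a short stretch of speech around a pitchmark and adds copies of it at new pitchmark positions. The Hann window, the fixed half-width and the function names (`hann`, `short_term_signal`, `overlap_add`) are assumptions made for this example; the abstract does not specify them.

```python
import math

def hann(n):
    """Symmetric Hann window of n samples (n >= 2). Window shape is an assumption."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def short_term_signal(signal, mark, half_width):
    """Return a windowed excerpt of `signal` centred on sample index `mark`."""
    start, stop = mark - half_width, mark + half_width + 1
    if start < 0 or stop > len(signal):
        raise ValueError("window extends past the ends of the signal")
    window = hann(stop - start)
    return [s * w for s, w in zip(signal[start:stop], window)]

def overlap_add(length, placements):
    """Sum windowed excerpts into an output buffer of `length` samples.

    `placements` is a list of (target_mark, excerpt) pairs; each excerpt is
    centred on its target pitchmark, and overlapping samples are added.
    """
    out = [0.0] * length
    for mark, excerpt in placements:
        half = len(excerpt) // 2
        for k, value in enumerate(excerpt):
            idx = mark - half + k
            if 0 <= idx < length:
                out[idx] += value
    return out

# Hypothetical usage: copy one excerpt to two pitchmarks half a window apart.
source = [1.0] * 21
excerpt = short_term_signal(source, mark=10, half_width=4)
transition = overlap_add(20, [(5, excerpt), (9, excerpt)])
```

With Hann windows spaced half a window apart, the overlapping copies sum back to the original amplitude between the two marks, which is why this kind of window is a common choice for overlap-add.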

32 Claims

1. A method of generating a voice message signal representing all or part of a message comprising a variable portion and an invariable portion, said method comprising:
- obtaining a recorded carrier speech signal representing at least a major part of the invariable portion;
  
  obtaining a message-specific speech signal representing at least the variable portion;
  
  generating a transition signal on the basis of the carrier and message-specific speech signals;
  
  forming the voice message signal by concatenating all or part of one of the carrier speech signal and the message-specific speech signal, said transition signal and all or part of the other of said carrier speech signal and the message-specific speech signal.
- Dependent Claims (2, 3, 4, 5, 6, 7)
  - 2. A method according to claim 1, further comprising the step of truncating one or both of the carrier speech signal and the message-specific speech signal to the extent that the total length removed is substantially equal to the length of the transition signal.
  - 3. A method according to claim 1 wherein said transition signal generating step involves the generation of a transition signal which represents a transition audio portion whose pitch varies from having an initial pitch similar to the end of the leading one of said carrier speech signal and said message-specific speech signal to having a final pitch similar to the beginning of the trailing one of the carrier speech signal and the message-specific speech signal.
  - 4. A method according to claim 1 wherein said transition signal generating step comprises:
  - 5. A method according to claim 4 wherein said transition pitchmark providing step involves a linear interpolation between the pitch of the voice message on either side of the transition audio portion.
  - 6. A method according to claim 4 wherein said mapping comprises mapping a combination of a carrier speech short-term signal and a message-specific speech short-term signal to one or more of said plurality of transition pitchmarks.
  - 7. A method according to claim 1 wherein the transition audio portion is located around the centre of a phoneme of the invariable portion, which phoneme is closest to the boundary between the invariable portion and the variable portion of the voice message.
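
The concatenation of claim 1 combined with the truncation of claim 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: signals are plain lists of samples, the function name `form_voice_message` is hypothetical, and splitting the truncation evenly between the two signals is a choice made here, since claim 2 only requires the total length removed to be about the length of the transition signal.

```python
def form_voice_message(leading, transition, trailing, trim_leading=None):
    """Concatenate leading + transition + trailing sample lists.

    A total of len(transition) samples is removed from the end of `leading`
    and the start of `trailing`, so the message keeps its original duration.
    `trim_leading` sets how many of those samples come off the leading
    signal; by default the removal is split roughly evenly (an assumption).
    """
    n = len(transition)
    if trim_leading is None:
        trim_leading = n // 2
    if not 0 <= trim_leading <= n:
        raise ValueError("trim_leading must lie between 0 and len(transition)")
    trim_trailing = n - trim_leading
    if trim_leading > len(leading) or trim_trailing > len(trailing):
        raise ValueError("signals are too short to truncate")
    kept_leading = leading[:len(leading) - trim_leading]
    kept_trailing = trailing[trim_trailing:]
    return kept_leading + transition + kept_trailing

# Hypothetical usage: either signal may lead, as claim 1 allows both orders.
carrier = [1] * 10           # stands in for recorded carrier speech
message_specific = [3] * 10  # stands in for the message-specific speech
message = form_voice_message(carrier, [2] * 4, message_specific)
```

Because the removed length equals the inserted length, `message` has the same number of samples as the two input signals placed end to end.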

8. A method of generating a voice message-representing signal from a text-representing signal, said method comprising:
- obtaining a leading signal corresponding to a leading portion of said text-representing signal;
  
  obtaining a trailing signal corresponding to a trailing portion of said text-representing signal;
  
  wherein;
  
  said leading signal represents a first voice and said trailing signal represents a second voice, said first voice and said second voice each including one of a variable speech signal and an invariable speech signal which are discernibly differing in respect of at least one quality;
  
  at least a major portion of said leading signal represents a first voice message portion; and
  
  at least a major portion of said trailing signal represents a second voice message portion;
  
  said method further comprising the steps of;
  
  generating, on the basis of said leading signal and said trailing signal, a transition signal representing a transition audio portion, which audio portion varies from having an initial pitch similar to that of the end of said first voice message portion to having a final pitch similar to that of the beginning of the second voice message portion; and
  
  concatenating said at least major portion of said leading signal, said transition signal and said at least major portion of said trailing signal in providing said voice message-representing signal.

9. Apparatus for generating a voice message signal representing a message comprising a variable portion and an invariable portion, said apparatus comprising:
- means arranged in operation to receive a carrier speech signal representing at least a major part of the invariable portion;
  
  means arranged in operation to receive a message-specific speech signal representing at least the variable portion;
  
  means arranged in operation to generate a transition signal on the basis of said carrier and message-specific signals;
  
  means arranged in operation to form said voice message signal by concatenating one of said carrier signal and said message-specific signal, said transition signal and the other of said carrier and said message-specific signal.
- Dependent Claims (10, 11)
  - 10. A text to speech conversion apparatus including a voice message signal generator according to claim 9.
  - 11. A voice operated database enquiry apparatus including a text to speech conversion apparatus according to claim 10.

12. Apparatus for generating voice message data representing a voice message having a variable portion and an invariable portion, said apparatus including:
- a storage medium having recorded therein processor readable code processable to generate said voice message data, said code comprising;
  
  message-specific speech procurement code processable to procure message-specific data representing said variable portion;
  
  carrier speech retrieval code processable to retrieve carrier speech data from a carrier speech store;
  
  transition data generating code processable to generate, on the basis of said carrier speech data and said message-specific speech data, transition data representing a transition audio portion;
  
  concatenation code processable to form said voice message data by concatenating one of said carrier speech data and said message-specific speech data, said transition data, and the other of said carrier speech data and said message-specific speech data to form said voice message-representing data.

13. A program storage device readable by a processing apparatus, said device tangibly embodying a program of instructions executable by the processor to perform method steps for:
- obtaining a carrier speech signal representing at least a major part of the invariable portion;
  
  obtaining a message-specific speech signal representing at least the variable portion;
  
  generating a transition signal on the basis of said carrier and message-specific signals;
  
  forming said voice message signal by concatenating one of said carrier signal and said message-specific signal, said transition signal and the other of said carrier and said message-specific signal.

14. A method of generating a voice message sample sequence representing all or part of a message comprising a variable portion and an invariable portion, said method comprising:
- obtaining a recorded carrier speech sample sequence representing at least a major part of the invariable portion;
  
  obtaining a message-specific speech sample sequence representing at least the variable portion;
  
  generating a transition sample sequence on the basis of the carrier speech sample sequence and the message-specific speech sample sequence;
  
  forming the voice message sample sequence by concatenating all or part of either the carrier speech sample sequence or the message-specific speech sample sequence, said transition sample sequence and all or part of the other of said carrier speech sample sequence and the message-specific speech sample sequence.
- Dependent Claims (15, 16, 17, 18, 19, 20)
  - 15. A method as in claim 14 further comprising the step of:
  - 16. A method as in claim 14 wherein:
    - said transition sample sequence generating step involves the generation of a transition sample sequence which represents a transition audio portion whose pitch varies from having an initial pitch similar to the end of the leading one of said carrier speech sample sequence and said message-specific speech sample sequence to having a final pitch similar to the beginning of the trailing one of the carrier speech sample sequence and the message-specific speech sample sequence.
  - 17. A method as in claim 14 wherein said transition sample sequence generating step comprises:
    - generating a plurality of transition pitchmarks, the spacing of which represents the pitch of a transition audio portion represented by said transition sample sequence;
      
      windowing the carrier speech sample sequence to provide carrier speech short-term sample sequence;
      
      windowing the message-specific speech sample sequence to provide message-specific speech short-term sample sequences; and
      
      mapping the carrier speech short-term sample sequences and the message-specific short-term sample sequences onto said transition pitchmarks to generate the transition sample sequence.
  - 18. A method as in claim 17 wherein said transition pitchmark providing step involves a linear interpolation between the pitch of the voice message on either side of the transition audio portion.
  - 19. A method as in claim 17 wherein said mapping comprises mapping a combination of a carrier speech short-term sample sequence and a message-specific speech short-term sample sequence to one or more of said plurality of transition pitchmarks.
  - 20. A method as in claim 14 wherein the transition audio portion is located around the center of a phoneme of the invariable portion, which phoneme is closest to the boundary between the invariable portion and the variable portion of the voice message.
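
One way to read the transition pitchmark generation of claims 17 and 18 is sketched below in Python. It assumes that pitch is expressed as a pitch period in samples and that the period (rather than the frequency) is what gets linearly interpolated; neither detail is stated in the claims, and the function name `transition_pitchmarks` is hypothetical.

```python
def transition_pitchmarks(first_mark, period_lead, period_trail, n_marks):
    """Place `n_marks` pitchmarks whose spacing drifts linearly between periods.

    `period_lead` is the pitch period (in samples) just before the transition
    and `period_trail` the period just after it. Each successive spacing
    moves a further 1/n_marks of the way from the first towards the second.
    """
    if n_marks < 1:
        raise ValueError("need at least one pitchmark")
    marks = [first_mark]
    for i in range(1, n_marks):
        frac = i / n_marks
        period = period_lead + (period_trail - period_lead) * frac
        marks.append(marks[-1] + round(period))
    return marks

# Hypothetical usage: glide from an 80-sample period towards a 100-sample period.
marks = transition_pitchmarks(0, 80, 100, 4)
# marks is [0, 85, 175, 270], i.e. spacings of 85, 90 and 95 samples
```

The short-term sample sequences of claim 17 would then be mapped onto these marks, so the transition audio portion starts near the leading pitch and ends near the trailing pitch.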

21. A method of generating a voice message-representing sample sequence from a text-representing signal, said method comprising:
- obtaining a leading sample sequence corresponding to a leading portion of said text-representing signal;
  
  obtaining a trailing sample sequence corresponding to a trailing portion of said text-representing signal;
  
  wherein;
  
  said leading sample sequence represents a first voice and said trailing sample sequence represents a second voice, said first voice and said second voice each representing one of variable and invariable speech which are discernibly differing in respect of at least one quality;
  
  at least a major portion of said leading sample sequence represents a first voice message portion; and
  
  at least a major portion of said trailing sample sequence represents a second voice message portion;
  
  said method further comprising the steps of;
  
  generating, on the basis of said leading sample sequence and said trailing sample sequence, a transition sample sequence representing a transition audio portion, which audio portion varies from having an initial pitch similar to that of the end of said first voice message portion to having a final pitch similar to that of the beginning of the second voice message portion; and
  
  concatenating said at least major portion of said leading sample sequence, said transition sample sequence and said at least major portion of said trailing sample sequence in providing said voice message-representing sample sequence.

22. Apparatus for generating a voice message sample sequence representing a message comprising a variable portion and an invariable portion, said apparatus comprising:
- means arranged in operation to receive a carrier speech sample sequence representing at least a major part of the invariable portion;
  
  means arranged in operation to receive a message-specific speech sample sequence representing at least the variable portion;
  
  means arranged in operation to generate a transition sample sequence on the basis of said carrier and message-specific sample sequences;
  
  means arranged in operation to form said voice message sample sequence by concatenating one of said carrier sample sequence and said message-specific sample sequence, said transition sample sequence and the other of said carrier and said message-specific sample sequence.
- Dependent Claims (23, 24)
  - 23. A text to speech conversion apparatus including a voice message sample sequence generator as in claim 22.
  - 24. A voice operated database inquiry apparatus including a text to speech conversion apparatus as in claim 23.

25. Apparatus for generating voice message data representing a voice message having a variable portion and an invariable portion, said apparatus including:
- a storage medium having recorded therein processor readable code processable to generate said voice message data, said code comprising;
  
  message-specific speech procurement code processable to procure message-specific data representing said variable portion;
  
  carrier speech retrieval code processable to retrieve carrier speech data from a carrier speech store;
  
  transition data generating code processable to generate, on the basis of said carrier speech data and said message-specific speech data, transition data representing a transition audio portion;
  
  concatenation code processable to form said voice message data by concatenating one of said carrier speech data and said message-specific speech data, said transition data, and the other of said carrier speech data and said message-specific speech data to form said voice message-representing data.

26. A program storage device readable by a processing apparatus, said device tangibly embodying a program of instructions executable by the processor to perform method steps for:
- obtaining a carrier speech sample sequence representing at least a major part of the invariable portion;
  
  obtaining a message-specific speech sample sequence representing at least the variable portion;
  
  generating a transition sample sequence on the basis of said carrier and message-specific sample sequence;
  
  forming said voice message sample sequence by concatenating one of said carrier sample sequence and said message-specific sample sequence, said transition sample sequence and the other of said carrier and said message-specific sample sequence.

27. A method for generating a voice message having an invariable recorded carrier portion and a message-specific variable synthesized portion, said method comprising:
- concatenating adjacent leading and trailing segments of phoneme-representing speech signal samples for said portions without a discernible pause therebetween in the time domain; and
  
  including in said concatenation transition phoneme-representing speech signal samples distributed across the joined segments to effect a smoothed change in at least one speech quality signal component that would otherwise be discernibly different between said portions to a listener.
- Dependent Claims (28, 29, 30, 31, 32)
  - 28. A method as in claim 27 wherein said transition signals are distributed between points representing the approximate centers of adjacent phonemes of said concatenated invariable and variable portions.
  - 29. A method as in claim 28 wherein said transition signals are generated by mapping phoneme-representing speech signal samples of at least some of both the variable and invariable portions onto each other as modified by a pre-defined transition window function.
  - 30. A method as in claim 27 wherein:
  - 31. A method as in claim 30 wherein said invariable portion has been selected as part of a larger phrase which includes one possible choice for the variable portion.
  - 32. A method as in claim 27 wherein said invariable portion has been selected as part of a larger phrase which includes one possible choice for the variable portion.
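
The mapping "as modified by a pre-defined transition window function" in claim 29 could, under one reading, be a weighted blend of time-aligned samples from the two portions. The Python sketch below assumes a raised-cosine window and equal-length, already aligned segments; both are assumptions for illustration, and `transition_window` and `blend` are hypothetical names.

```python
import math

def transition_window(n):
    """Raised-cosine weights rising from 0 to 1 over n samples (assumed shape)."""
    if n < 2:
        raise ValueError("window needs at least two samples")
    return [0.5 - 0.5 * math.cos(math.pi * i / (n - 1)) for i in range(n)]

def blend(leading_seg, trailing_seg):
    """Blend two aligned segments, fading the leading one out and the trailing one in."""
    if len(leading_seg) != len(trailing_seg):
        raise ValueError("segments must have the same length")
    fade_in = transition_window(len(leading_seg))
    return [(1.0 - w) * a + w * b
            for a, b, w in zip(leading_seg, trailing_seg, fade_in)]

# Hypothetical usage: the output starts as the leading segment and ends as the trailing one.
smoothed = blend([1.0] * 5, [0.0] * 5)
```

The weights change gradually across the joined segments, which matches the smoothed change in a speech quality component that claim 27 calls for.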

Current Assignee: Cisco Technology, Inc. (Cisco Systems, Inc.)
Original Assignee: British Telecommunications PLC (BT Group PLC)
Inventors: Page, Julian H.; Murrin, Paul
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Abebe, Daniel
Application Number: US09/125,707
Time in Patent Office: 631 Days
Field of Search: 704/270, 704/260, 704/265, 704/258, 704/207
US Class Current: 704/258
CPC Class Codes:
G10L 13/00   Speech synthesis; Text to s...
H04M 2201/60   Medium conversion
H04M 3/4931   Directory assistance systems
