Generation of voice messages
Abstract
A voice message is generated having an invariable portion and a variable portion. Most of the invariable portion is provided as recorded speech, whereas the variable portion is provided as synthesized speech. The synthesized speech also extends half a phoneme into the invariable portion of the message. The synthesized speech and the recorded speech are then concatenated, with a transition signal formed from a boundary portion of each of the recorded and synthesized signals on either side of each join. In forming the transition signal, a set of transition signal pitchmarks is created and an overlap-add technique is used to copy the waveform within the boundary portions of the speech signals around the transition signal pitchmarks. The signal around the penultimate pitchmark in the leading boundary portion is copied to the trailing half of the transition signal, and the signal around the second pitchmark in the trailing boundary portion is copied to the leading half of the transition signal. In this way, the characteristics of the generated message change gradually around the join between those of the recorded speech and those of the synthesized speech.
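The overall flow in the abstract can be illustrated with a minimal Python sketch. It assumes mono float samples held in lists at a common sample rate, a carrier signal that leads the synthesized one, and join positions that are already known; `carrier_cut`, `synth_cut` and `n` (the boundary-portion length) are hypothetical parameters, not taken from the patent. The transition builder is a plain sample-by-sample linear crossfade, a deliberate simplification standing in for the pitch-synchronous overlap-add described above.

```python
def crossfade(lead, trail):
    """Stand-in transition (hypothetical, not the patented overlap-add):
    a sample-by-sample linear blend of two equal-length boundary portions."""
    n = len(lead)
    if n == 1:
        return [0.5 * (lead[0] + trail[0])]
    return [(1 - k / (n - 1)) * a + (k / (n - 1)) * b
            for k, (a, b) in enumerate(zip(lead, trail))]


def generate_voice_message(carrier, synthesized, carrier_cut, synth_cut, n,
                           make_transition=crossfade):
    """Concatenate carrier speech, a transition, and synthesized speech.

    Assumes the carrier leads. carrier[:carrier_cut] is kept unchanged;
    carrier[carrier_cut:carrier_cut + n] and synthesized[synth_cut:synth_cut + n]
    are the boundary portions from which the transition is built;
    synthesized[synth_cut + n:] follows the transition.
    """
    lead_boundary = carrier[carrier_cut:carrier_cut + n]
    trail_boundary = synthesized[synth_cut:synth_cut + n]
    if len(lead_boundary) != n or len(trail_boundary) != n:
        raise ValueError("both boundary portions must contain n samples")
    transition = make_transition(lead_boundary, trail_boundary)
    return carrier[:carrier_cut] + transition + synthesized[synth_cut + n:]
```

Because `make_transition` is a parameter, a pitch-synchronous builder could be swapped in without changing the concatenation logic.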
32 Claims
1. A method of generating a voice message signal representing all or part of a message comprising a variable portion and an invariable portion, said method comprising:
obtaining a recorded carrier speech signal representing at least a major part of the invariable portion;
obtaining a message-specific speech signal representing at least the variable portion;
generating a transition signal on the basis of the carrier and message-specific speech signals;
forming the voice message signal by concatenating all or part of one of the carrier speech signal and the message-specific speech signal, said transition signal and all or part of the other of said carrier speech signal and the message-specific speech signal.
generating a plurality of transition pitchmarks, the spacing of which represents the pitch of a transition audio portion represented by said transition signal;
windowing the carrier speech signal to provide carrier speech short-term signals;
windowing the message-specific speech signal to provide message-specific speech short-term signals; and
mapping the carrier speech short-term signals and the message-specific speech short-term signals onto said transition pitchmarks to generate the transition signal.
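The three steps listed above (transition pitchmarks, windowing, mapping) can be sketched in Python as follows. The sketch assumes integer pitchmark positions are already known for both signals and for the transition, a fixed window half-width, and a carrier that leads. The mapping rule is chosen purely for illustration: the carrier short-term signal nearest the join is placed on the first half of the transition pitchmarks and the first message-specific short-term signal on the second half. The claim does not specify a mapping rule, and the function names are hypothetical.

```python
import math


def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def window_signal(signal, marks, half_width):
    """Windowing step (hypothetical helper): one Hann-weighted short-term
    signal centred on each pitchmark. Samples outside the signal count as zero."""
    w = hann(2 * half_width + 1)
    short_terms = []
    for m in marks:
        seg = []
        for i, wi in enumerate(w):
            idx = m - half_width + i
            seg.append(wi * signal[idx] if 0 <= idx < len(signal) else 0.0)
        short_terms.append(seg)
    return short_terms


def map_onto_pitchmarks(carrier_st, message_st, transition_marks, half_width, length):
    """Mapping step (hypothetical helper): overlap-add one short-term signal
    at each transition pitchmark.

    Assumes the carrier leads: its last short-term signal fills the first half
    of the transition pitchmarks, the first message-specific one fills the rest.
    """
    out = [0.0] * length
    n = len(transition_marks)
    for k, mark in enumerate(transition_marks):
        grain = carrier_st[-1] if k < n // 2 else message_st[0]
        for i, v in enumerate(grain):
            idx = mark - half_width + i
            if 0 <= idx < length:
                out[idx] += v
    return out
```

With Hann windows of length 2H + 1 placed H samples apart, adjacent windows sum to one, so copying grains of a steady signal onto evenly spaced pitchmarks reproduces its level between the marks.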
5. A method according to claim 4 wherein said transition pitchmark providing step involves a linear interpolation between the pitch of the voice message on either side of the transition audio portion.
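One way to read the linear interpolation in this claim: the pitch period is interpolated linearly from its value just before the transition to its value just after it, and each new pitchmark is placed one locally interpolated period after the previous one. This Python sketch works in samples; its function name and parameters (`start`, `length`, `p_start`, `p_end`) are hypothetical.

```python
def transition_pitchmarks(start, length, p_start, p_end):
    """Place pitchmarks across [start, start + length) with a linearly varying period.

    Hypothetical helper. p_start is the pitch period (in samples) of the voice
    message just before the transition, p_end the period just after it.
    """
    if p_start <= 0 or p_end <= 0 or length <= 0:
        raise ValueError("periods and length must be positive")
    marks = []
    t = float(start)
    while t < start + length:
        marks.append(round(t))
        frac = (t - start) / length               # 0.0 at the start, approaching 1.0 at the end
        t += p_start + frac * (p_end - p_start)   # linearly interpolated period
    return marks
```

Interpolating the period rather than the frequency keeps the step between marks simple; interpolating in frequency would be an equally valid reading of the claim.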
6. A method according to claim 4 wherein said mapping comprises mapping a combination of a carrier speech short-term signal and a message-specific speech short-term signal to one or more of said plurality of transition pitchmarks.
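A combined mapping might, for example, weight the two short-term signals by position within the transition, so that pitchmarks near the start are dominated by the carrier speech and those near the end by the message-specific speech. The linear weighting and the function name below are assumptions for illustration: the claim only requires that a combination be mapped. The sketch also assumes both short-term signals have the same length.

```python
def combine_short_terms(carrier_st, message_st, k, n):
    """Blend two equal-length short-term signals for transition pitchmark k of n.

    Hypothetical helper. The message-specific weight rises linearly from
    0 (k = 0) to 1 (k = n - 1); a single pitchmark gets an even mix.
    """
    if len(carrier_st) != len(message_st):
        raise ValueError("short-term signals must have equal length")
    alpha = k / (n - 1) if n > 1 else 0.5
    return [(1.0 - alpha) * c + alpha * m for c, m in zip(carrier_st, message_st)]
```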
7. A method according to claim 1 wherein the transition audio portion is located around the centre of a phoneme of the invariable portion, which phoneme is closest to the boundary between the invariable portion and the variable portion of the voice message.
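Assuming the phonemes of the invariable portion are available as (start, end) sample pairs, for example from a forced alignment of the recording, the transition centre described in this claim could be located as follows. The data layout, the distance measure (boundary to nearest phoneme edge) and the function name are hypothetical.

```python
def transition_centre(phonemes, boundary):
    """Return the midpoint of the invariable-portion phoneme closest to `boundary`.

    Hypothetical helper. phonemes: list of (start, end) sample indices;
    boundary: sample index of the invariable/variable boundary.
    """
    if not phonemes:
        raise ValueError("no phonemes given")

    def distance(seg):
        start, end = seg
        if start <= boundary <= end:
            return 0
        return min(abs(boundary - start), abs(boundary - end))

    start, end = min(phonemes, key=distance)  # ties go to the earlier phoneme
    return (start + end) // 2
```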
8. A method of generating a voice message-representing signal from a text-representing signal, said method comprising:
obtaining a leading signal corresponding to a leading portion of said text-representing signal;
obtaining a trailing signal corresponding to a trailing portion of said text-representing signal;
wherein:
said leading signal represents a first voice and said trailing signal represents a second voice, said first voice and said second voice each including one of a variable speech signal and an invariable speech signal, which differ discernibly in respect of at least one quality;
at least a major portion of said leading signal represents a first voice message portion; and
at least a major portion of said trailing signal represents a second voice message portion;
said method further comprising the steps of:
generating, on the basis of said leading signal and said trailing signal, a transition signal representing a transition audio portion, which audio portion varies from having an initial pitch similar to that of the end of said first voice message portion to having a final pitch similar to that of the beginning of the second voice message portion; and
concatenating said at least major portion of said leading signal, said transition signal and said at least major portion of said trailing signal in providing said voice message-representing signal.
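The initial and final pitch of the transition audio portion in this claim can be estimated from pitchmark spacing at the two edges, assuming pitchmarks (in samples) are known for both the leading and trailing signals. This Python sketch takes the last pitchmark interval of the leading signal and the first interval of the trailing signal; the function name and the `sample_rate` parameter are hypothetical.

```python
def edge_pitches(leading_marks, trailing_marks, sample_rate):
    """Return (initial_hz, final_hz) for a transition between two voiced signals.

    Hypothetical helper. initial_hz is implied by the last pitchmark interval
    of the leading signal, final_hz by the first interval of the trailing signal.
    """
    if len(leading_marks) < 2 or len(trailing_marks) < 2:
        raise ValueError("need at least two pitchmarks on each side")
    lead_period = leading_marks[-1] - leading_marks[-2]
    trail_period = trailing_marks[1] - trailing_marks[0]
    return sample_rate / lead_period, sample_rate / trail_period
```

For example, pitchmarks 80 samples apart at 16 kHz imply 200 Hz; the transition pitch would then move from that value toward the trailing signal's value.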
9. Apparatus for generating a voice message signal representing a message comprising a variable portion and an invariable portion, said apparatus comprising:
means arranged in operation to receive a carrier speech signal representing at least a major part of the invariable portion;
means arranged in operation to receive a message-specific speech signal representing at least the variable portion;
means arranged in operation to generate a transition signal on the basis of said carrier and message-specific signals;
means arranged in operation to form said voice message signal by concatenating one of said carrier signal and said message-specific signal, said transition signal and the other of said carrier and said message-specific signal.
12. Apparatus for generating voice message data representing a voice message having a variable portion and an invariable portion, said apparatus including:
a storage medium having recorded therein processor-readable code processable to generate said voice message data, said code comprising:
message-specific speech procurement code processable to procure message-specific data representing said variable portion;
carrier speech retrieval code processable to retrieve carrier speech data from a carrier speech store;
transition data generating code processable to generate, on the basis of said carrier speech data and said message-specific speech data, transition data representing a transition audio portion;
concatenation code processable to form said voice message data by concatenating one of said carrier speech data and said message-specific speech data, said transition data, and the other of said carrier speech data and said message-specific speech data to form said voice message-representing data.
13. A program storage device readable by a processing apparatus, said device tangibly embodying a program of instructions executable by the processor to perform method steps for:
obtaining a carrier speech signal representing at least a major part of the invariable portion;
obtaining a message-specific speech signal representing at least the variable portion;
generating a transition signal on the basis of said carrier and message-specific signals;
forming said voice message signal by concatenating one of said carrier signal and said message-specific signal, said transition signal and the other of said carrier and said message-specific signal.
14. A method of generating a voice message sample sequence representing all or part of a message comprising a variable portion and an invariable portion, said method comprising:
obtaining a recorded carrier speech sample sequence representing at least a major part of the invariable portion;
obtaining a message-specific speech sample sequence representing at least the variable portion;
generating a transition sample sequence on the basis of the carrier speech sample sequence and the message-specific speech sample sequence;
forming the voice message sample sequence by concatenating all or part of either the carrier speech sample sequence or the message-specific speech sample sequence, said transition sample sequence and all or part of the other of said carrier speech sample sequence and the message-specific speech sample sequence.
truncating one or both of the carrier speech sample sequence and the message-specific speech sample sequence to the extent that the total length removed is substantially equal to the length of the transition.
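The truncation above keeps the overall message duration unchanged when a transition is inserted. A minimal Python sketch, assuming the transition length is split as evenly as possible between the end of the leading sequence and the start of the trailing sequence (the claim allows either or both to be truncated; the even split and the function name are assumptions):

```python
def truncate_for_transition(leading, trailing, transition_len):
    """Trim leading/trailing sequences so that the total removed equals transition_len.

    Hypothetical helper; splits the cut evenly where possible.
    """
    if transition_len < 0 or transition_len > len(leading) + len(trailing):
        raise ValueError("transition length out of range")
    cut_lead = min(len(leading), transition_len // 2)
    cut_trail = transition_len - cut_lead
    if cut_trail > len(trailing):   # trailing too short: take the remainder from leading
        cut_lead += cut_trail - len(trailing)
        cut_trail = len(trailing)
    # Slice with an explicit end index so that cut_lead == 0 keeps the whole sequence.
    return leading[:len(leading) - cut_lead], trailing[cut_trail:]
```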
16. A method as in claim 14 wherein:
said transition sample sequence generating step involves the generation of a transition sample sequence which represents a transition audio portion whose pitch varies from having an initial pitch similar to the end of the leading one of said carrier speech sample sequence and said message-specific speech sample sequence to having a final pitch similar to the beginning of the trailing one of the carrier speech sample sequence and the message-specific speech sample sequence.
17. A method as in claim 14 wherein said transition sample sequence generating step comprises:
generating a plurality of transition pitchmarks, the spacing of which represents the pitch of a transition audio portion represented by said transition sample sequence;
windowing the carrier speech sample sequence to provide carrier speech short-term sample sequences;
windowing the message-specific speech sample sequence to provide message-specific speech short-term sample sequences; and
mapping the carrier speech short-term sample sequences and the message-specific speech short-term sample sequences onto said transition pitchmarks to generate the transition sample sequence.
18. A method as in claim 17 wherein said transition pitchmark providing step involves a linear interpolation between the pitch of the voice message on either side of the transition audio portion.
19. A method as in claim 17 wherein said mapping comprises mapping a combination of a carrier speech short-term sample sequence and a message-specific speech short-term sample sequence to one or more of said plurality of transition pitchmarks.
20. A method as in claim 14 wherein the transition audio portion is located around the center of a phoneme of the invariable portion, which phoneme is closest to the boundary between the invariable portion and the variable portion of the voice message.
21. A method of generating a voice message-representing sample sequence from a text-representing signal, said method comprising:
obtaining a leading sample sequence corresponding to a leading portion of said text-representing signal;
obtaining a trailing sample sequence corresponding to a trailing portion of said text-representing signal;
wherein:
said leading sample sequence represents a first voice and said trailing sample sequence represents a second voice, said first voice and said second voice each representing one of variable and invariable speech, which differ discernibly in respect of at least one quality;
at least a major portion of said leading sample sequence represents a first voice message portion; and
at least a major portion of said trailing sample sequence represents a second voice message portion;
said method further comprising the steps of:
generating, on the basis of said leading sample sequence and said trailing sample sequence, a transition sample sequence representing a transition audio portion, which audio portion varies from having an initial pitch similar to that of the end of said first voice message portion to having a final pitch similar to that of the beginning of the second voice message portion; and
concatenating said at least major portion of said leading sample sequence, said transition sample sequence and said at least major portion of said trailing sample sequence in providing said voice message-representing sample sequence.
22. Apparatus for generating a voice message sample sequence representing a message comprising a variable portion and an invariable portion, said apparatus comprising:
means arranged in operation to receive a carrier speech sample sequence representing at least a major part of the invariable portion;
means arranged in operation to receive a message-specific speech sample sequence representing at least the variable portion;
means arranged in operation to generate a transition sample sequence on the basis of said carrier and message-specific sample sequences;
means arranged in operation to form said voice message sample sequence by concatenating one of said carrier sample sequence and said message-specific sample sequence, said transition sample sequence and the other of said carrier and said message-specific sample sequence.
25. Apparatus for generating voice message data representing a voice message having a variable portion and an invariable portion, said apparatus including:
a storage medium having recorded therein processor-readable code processable to generate said voice message data, said code comprising:
message-specific speech procurement code processable to procure message-specific data representing said variable portion;
carrier speech retrieval code processable to retrieve carrier speech data from a carrier speech store;
transition data generating code processable to generate, on the basis of said carrier speech data and said message-specific speech data, transition data representing a transition audio portion;
concatenation code processable to form said voice message data by concatenating one of said carrier speech data and said message-specific speech data, said transition data, and the other of said carrier speech data and said message-specific speech data to form said voice message-representing data.
26. A program storage device readable by a processing apparatus, said device tangibly embodying a program of instructions executable by the processor to perform method steps for:
obtaining a carrier speech sample sequence representing at least a major part of the invariable portion;
obtaining a message-specific speech sample sequence representing at least the variable portion;
generating a transition sample sequence on the basis of said carrier and message-specific sample sequences;
forming said voice message sample sequence by concatenating one of said carrier sample sequence and said message-specific sample sequence, said transition sample sequence and the other of said carrier and said message-specific sample sequence.
27. A method for generating a voice message having an invariable recorded carrier portion and a message-specific variable synthesized portion, said method comprising:
concatenating adjacent leading and trailing segments of phoneme-representing speech signal samples for said portions without a discernible pause therebetween in the time domain; and
including in said concatenation transition phoneme-representing speech signal samples distributed across the joined segments to effect a smoothed change in at least one speech quality signal component that would otherwise be discernibly different between said portions to a listener.
said variable synthesized portion has been selected as part of a larger synthesized phrase.
31. A method as in claim 30 wherein said invariable portion has been selected as part of a larger phrase which includes one possible choice for the variable portion.
32. A method as in claim 27 wherein said invariable portion has been selected as part of a larger phrase which includes one possible choice for the variable portion.
Specification