HYBRID COMPRESSION OF TEXT-TO-SPEECH VOICE DATA
Abstract
Recorded or synthesized speech segments of text-to-speech (TTS) systems may be compressed through the use of both time domain compression and perceptual compression techniques. The twice-compressed recording may be separated into speech segments corresponding to words or subword units for use in a TTS system. The compression rate of time domain compression, and the ratio of time domain compression to perceptual compression, may be modified for any speech segment. The compression amount or ratio may be determined based on linguistic or acoustic features of the word or subword unit that the speech segment represents. Differing compression amounts and ratios may be applied to portions of a single speech segment.
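The abstract says the compression amount and the time-domain/perceptual ratio may depend on linguistic or acoustic features of the unit, and may differ across portions of one segment. The sketch below illustrates that idea only. It assumes a hypothetical `SpeechUnit` with two features (lexical stress and voicing), a hypothetical `CompressionPlan`, and made-up rules (stressed units are shortened less, unvoiced units are quantized more coarsely, and the edge portions of a segment get no time-domain compression). None of these names, thresholds or rules come from the patent.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class SpeechUnit:
    label: str      # e.g. a phoneme or diphone name (illustrative only)
    stressed: bool  # linguistic feature: lexical stress
    voiced: bool    # acoustic feature: voicing


@dataclass(frozen=True)
class CompressionPlan:
    time_domain_rate: float  # fraction of the unit's duration to remove (0.0 to 1.0)
    perceptual_bits: int     # resolution of a stand-in perceptual-stage quantizer


def plan_compression(unit: SpeechUnit) -> CompressionPlan:
    """Hypothetical rules: stressed units are shortened less, and
    unvoiced (noise-like) units are quantized more coarsely."""
    rate = 0.5
    bits = 8
    if unit.stressed:
        rate -= 0.25
    if not unit.voiced:
        bits -= 2
    return CompressionPlan(time_domain_rate=rate, perceptual_bits=bits)


def plan_portions(unit: SpeechUnit, n_portions: int) -> List[CompressionPlan]:
    """Assign different plans to portions of one segment. Hypothetical rule:
    the first and last portions (where segments are joined) receive no
    time-domain compression; inner portions use the unit's base plan."""
    if n_portions < 1:
        raise ValueError("n_portions must be at least 1")
    base = plan_compression(unit)
    plans = []
    for i in range(n_portions):
        if i in (0, n_portions - 1):
            plans.append(replace(base, time_domain_rate=0.0))
        else:
            plans.append(base)
    return plans
```

For example, `plan_compression(SpeechUnit("AA1", stressed=True, voiced=True))` returns `CompressionPlan(time_domain_rate=0.25, perceptual_bits=8)`. A real system would presumably tune such rules by listening tests rather than fixing them by hand.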
26 Claims
1. A system comprising:
one or more processors;
a computer-readable memory; and
a module comprising executable instructions stored in the computer-readable memory, the module, when executed by the one or more processors, configured to:
obtain a voice recording and a corresponding sequence of speech units;
select a first speech segment, wherein the first speech segment corresponds to a portion of the voice recording and wherein the first speech segment corresponds to a first speech unit;
apply a first compression technique to the first speech segment to create a first compressed speech segment, wherein the first compression technique comprises one of time domain compression or perceptual compression;
apply a second compression technique to the first compressed speech segment to create a second compressed speech segment, wherein the second compression technique comprises one of time domain compression or perceptual compression, and wherein the second compression technique is different from the first compression technique; and
distribute the second compressed speech segment to a client computing device for use in a text-to-speech system.
Dependent claims: 2, 3, 4, 5.
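The steps of claim 1 (select a segment aligned to a speech unit, compress it with one technique, compress the result with a different technique, then distribute it) can be sketched as below. This is a toy under stated assumptions, not the patented implementation: `select_segment`, `time_domain_compress`, `quantize` and `hybrid_compress` are hypothetical names; the time-domain stage is modeled as dropping every n-th fixed-length frame to shorten the segment; and the "perceptual" stage is a plain uniform quantizer standing in for a real perceptual codec such as MP3 or Opus.

```python
from typing import List, Sequence, Tuple


def select_segment(recording: Sequence[float], bounds: Tuple[int, int]) -> List[float]:
    """Cut out the portion of the recording aligned to one speech unit."""
    start, end = bounds
    return list(recording[start:end])


def time_domain_compress(samples: Sequence[float], frame_len: int, drop_every: int) -> List[float]:
    """Stand-in time-domain compression: split into fixed-length frames and
    drop every `drop_every`-th frame, which shortens the segment."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    kept = [f for idx, f in enumerate(frames) if (idx + 1) % drop_every != 0]
    return [s for frame in kept for s in frame]


def quantize(samples: Sequence[float], bits: int) -> List[int]:
    """Stand-in for perceptual compression: clip each sample to [-1, 1] and
    map it to a signed integer code with `bits` bits of resolution."""
    levels = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, s)) * levels) for s in samples]


def hybrid_compress(recording: Sequence[float], bounds: Tuple[int, int],
                    frame_len: int = 4, drop_every: int = 2, bits: int = 8) -> List[int]:
    segment = select_segment(recording, bounds)                   # first speech segment
    first = time_domain_compress(segment, frame_len, drop_every)  # first technique
    second = quantize(first, bits)                                # second, different technique
    return second  # the payload that would be distributed to the client
```

Either order of the two stages would satisfy the claim's wording; the sketch shortens first so that the second stage has fewer samples to encode.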
6. A computer-implemented method comprising:
applying, by a text-to-speech voice development system comprising one or more computing devices, a first compression technique to a portion of a voice recording to create a first compressed portion; and
applying, by the voice development system, a second compression technique to the first compressed portion to create a second compressed portion;
wherein the second compression technique is different from the first compression technique, and wherein at least one of the first compression technique or the second compression technique comprises time-domain compression.
Dependent claims: 7 through 19.
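Claim 6 is worded more broadly than claim 1: it requires only that the two techniques differ and that at least one of them is time-domain compression, whereas claim 1 limits both techniques to time domain or perceptual compression. The sketch below encodes that reading of the claim language as two predicates. It is an illustration of the wording, not a legal construction; the `Technique` enum, its `OTHER` placeholder (standing for any non-time-domain, non-perceptual technique) and both function names are hypothetical.

```python
from enum import Enum


class Technique(Enum):
    TIME_DOMAIN = "time-domain"
    PERCEPTUAL = "perceptual"
    OTHER = "other"  # hypothetical placeholder for any other compression technique


def satisfies_claim_6_wording(first: Technique, second: Technique) -> bool:
    """The two stages differ and at least one is time-domain compression."""
    return first != second and Technique.TIME_DOMAIN in (first, second)


def satisfies_claim_1_wording(first: Technique, second: Technique) -> bool:
    """Each stage is time-domain or perceptual, and the stages differ."""
    allowed = {Technique.TIME_DOMAIN, Technique.PERCEPTUAL}
    return first in allowed and second in allowed and first != second
```

Under this reading, a time-domain stage followed by some other technique fits the wording of claim 6 but not of claim 1, while two perceptual stages, or a perceptual stage plus another non-time-domain technique, fit neither.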
20. A non-transitory computer-readable medium which stores a text-to-speech component comprising executable code that directs a client computing device to perform a process comprising:
receiving text comprising a sequence of words; and
assembling an audio presentation corresponding to the text, the audio presentation comprising a sequence of speech segments, wherein the sequence of speech segments is based at least in part on the sequence of words, and wherein assembling the audio presentation comprises:
retrieving a first compressed speech segment;
applying two decompression techniques to the first compressed speech segment to obtain a first speech segment;
retrieving a second compressed speech segment;
applying two decompression techniques to the second compressed speech segment to obtain a second speech segment; and
concatenating the first speech segment and the second speech segment.
Dependent claims: 21 through 26.
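On the client side, claim 20 applies two decompression techniques to each retrieved segment and then concatenates the results. The sketch below assumes the segments were encoded by dropping every second fixed-length frame (time-domain stage) and then uniformly quantizing (a stand-in for a perceptual codec), so decoding undoes the stages in reverse order: dequantize first, then restore duration by repeating each frame. The function names and default parameters are hypothetical, and frame repetition is only a crude stand-in; a production TTS client would more likely use a pitch-preserving overlap-add time-stretch.

```python
from typing import List, Sequence


def dequantize(codes: Sequence[int], bits: int) -> List[float]:
    """Undo the stand-in perceptual stage: map integer codes back to [-1, 1]."""
    levels = 2 ** (bits - 1) - 1
    return [c / levels for c in codes]


def time_domain_decompress(samples: Sequence[float], frame_len: int, repeat: int) -> List[float]:
    """Undo the stand-in time-domain stage: repeat each frame `repeat` times
    to restore the original duration (lossy; dropped frames are not recovered)."""
    out: List[float] = []
    for i in range(0, len(samples), frame_len):
        frame = list(samples[i:i + frame_len])
        for _ in range(repeat):
            out.extend(frame)
    return out


def decode_segment(codes: Sequence[int], bits: int = 8,
                   frame_len: int = 4, repeat: int = 2) -> List[float]:
    """Apply the two decompression techniques in reverse order of encoding."""
    return time_domain_decompress(dequantize(codes, bits), frame_len, repeat)


def assemble(compressed_segments: Sequence[Sequence[int]]) -> List[float]:
    """Decode each retrieved segment and concatenate them in order."""
    audio: List[float] = []
    for codes in compressed_segments:
        audio.extend(decode_segment(codes))
    return audio
```

For instance, `decode_segment([0, 127, 0, 127], bits=8, frame_len=2, repeat=2)` yields `[0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]`: the four codes are dequantized and each two-sample frame is played twice.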
Specification