USING NON-SPEECH SOUNDS DURING TEXT-TO-SPEECH SYNTHESIS
Abstract
Systems, apparatus, methods and computer program products are described for producing text-to-speech synthesis with non-speech sounds. In general, some of the pauses or silences that would otherwise be generated in synthesized speech are instead synthesized as non-speech sounds such as breaths. Non-speech sounds can be identified from pre-recorded speech that can include meta-data such as the grammatical and phrasal structure of words and sounds that precede and succeed non-speech sounds. A non-speech sound can be selected for use in synthesized speech based on the words, punctuation, grammatical and phrasal structure of text from which the speech is being synthesized, or other characteristics.
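The core idea in the abstract, turning some pauses into breaths based on properties of the surrounding speech, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the `Unit` type, the `augment_pauses` function, and both thresholds (`min_pause_ms`, `min_words_before`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    kind: str          # "speech", "pause", or "breath"
    duration_ms: int
    text: str = ""     # words spoken, only meaningful for speech units


def augment_pauses(units, min_pause_ms=250, min_words_before=4):
    """Replace a pause with a breath when the pause is long enough and
    enough words have been spoken since the last breath.
    Both thresholds are illustrative assumptions."""
    out = []
    words_since_breath = 0
    for u in units:
        if u.kind == "speech":
            words_since_breath += len(u.text.split())
            out.append(u)
        elif (u.kind == "pause"
              and u.duration_ms >= min_pause_ms
              and words_since_breath >= min_words_before):
            # Keep the original duration so overall timing is unchanged.
            out.append(Unit("breath", u.duration_ms))
            words_since_breath = 0
        else:
            out.append(u)
    return out


units = [
    Unit("speech", 1200, "When the rain finally stopped"),
    Unit("pause", 300),
    Unit("speech", 400, "we left"),
    Unit("pause", 300),
    Unit("speech", 900, "and nobody looked back"),
]
result = augment_pauses(units)
print([u.kind for u in result])
# ['speech', 'breath', 'speech', 'pause', 'speech']
```

The second pause stays silent because only two words precede it, which reflects the abstract's statement that only some pauses become non-speech sounds.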
27 Claims
1. A method, including:
augmenting a portion of synthesized speech with a non-speech sound other than silence, the augmentation based on characteristics of the synthesized speech.
Dependent claims: 2, 3, 4

5. A method, including:
identifying a non-speech unit in a received input string, the non-speech unit not having an associated specific textual reference in the input string;
matching the non-speech unit to an audio segment, the audio segment a voice sample of a non-speech sound; and
synthesizing the input string, including combining the audio segments matched with the non-speech unit.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17

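The three steps of claim 5 (identify, match, synthesize) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the `BREATH_SEGMENTS` table, its file names, the context labels, and the `speech_to_segment` stand-in for a real synthesizer are all hypothetical. Non-speech units are inferred from punctuation here, since the input text never spells out a breath.

```python
import re

# Hypothetical inventory: context label -> recorded non-speech voice sample.
BREATH_SEGMENTS = {
    "sentence_end": "breath_long.wav",
    "clause_break": "breath_short.wav",
}

SENTENCE_MARKS = {".", "!", "?"}
CLAUSE_MARKS = {",", ";", ":"}


def identify_units(text):
    """Split text into ("speech", words) and ("non_speech", context) units."""
    units = []
    # The capturing group keeps each punctuation mark as its own item.
    for part in re.split(r"([.!?,;:])", text):
        part = part.strip()
        if not part:
            continue
        if part in SENTENCE_MARKS:
            units.append(("non_speech", "sentence_end"))
        elif part in CLAUSE_MARKS:
            units.append(("non_speech", "clause_break"))
        else:
            units.append(("speech", part))
    # Assumption: no breath is needed after the final sentence.
    if units and units[-1][0] == "non_speech":
        units.pop()
    return units


def synthesize(text, speech_to_segment=lambda words: f"tts({words})"):
    """Return the ordered list of segments to concatenate."""
    sequence = []
    for kind, value in identify_units(text):
        if kind == "speech":
            sequence.append(speech_to_segment(value))
        else:
            # Match the non-speech unit to a recorded voice sample.
            sequence.append(BREATH_SEGMENTS[value])
    return sequence


print(synthesize("Well, I tried. It did not work."))
# ['tts(Well)', 'breath_short.wav', 'tts(I tried)', 'breath_long.wav', 'tts(It did not work)']
```

A fuller matcher would presumably compare richer context (grammatical and phrasal structure) rather than a single punctuation label.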
18. A method, including:
receiving audio segments;
parsing the audio segments into speech units and non-speech units;
defining properties of or between speech units and non-speech units; and
storing the units and the properties.
Dependent claims: 19, 20, 21

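The inventory-building steps of claim 18 might look like the sketch below. It assumes, hypothetically, that the received audio has already been aligned into `(label, start_ms, end_ms)` tuples and that breaths carry a `"<breath>"` label; the property names (`preceded_by`, `followed_by`, `duration_ms`) and the dictionary layout are likewise illustrative.

```python
BREATH = "<breath>"  # hypothetical label for a non-speech segment


def build_inventory(segments):
    """Parse labelled segments into speech and non-speech units and
    record properties of each unit and of its neighbours."""
    inventory = {"speech": [], "non_speech": []}
    for i, (label, start_ms, end_ms) in enumerate(segments):
        duration = end_ms - start_ms
        if label != BREATH:
            inventory["speech"].append({"word": label, "duration_ms": duration})
            continue
        # Properties "between" units: nearest spoken word on each side.
        prev_word = next(
            (lbl for lbl, _, _ in reversed(segments[:i]) if lbl != BREATH), None
        )
        next_word = next(
            (lbl for lbl, _, _ in segments[i + 1:] if lbl != BREATH), None
        )
        inventory["non_speech"].append({
            "duration_ms": duration,
            "preceded_by": prev_word,
            "followed_by": next_word,
        })
    return inventory


segments = [
    ("stopped,", 0, 480),
    (BREATH, 480, 760),
    ("we", 760, 900),
    ("left.", 900, 1300),
]
inventory = build_inventory(segments)
print(inventory["non_speech"])
# [{'duration_ms': 280, 'preceded_by': 'stopped,', 'followed_by': 'we'}]
```

Storing the neighbouring words alongside each breath is what would later allow a synthesizer to pick a breath recorded in a similar context.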
22. A computer program product, encoded on a computer-readable medium, operable to cause a data processing apparatus to:
augment a portion of synthesized speech with a non-speech sound other than silence, the augmentation based on characteristics of the synthesized speech.

23. A computer program product, encoded on a computer-readable medium, operable to cause a data processing apparatus to:
identify a non-speech unit in a received input string, the non-speech unit not having an associated specific textual reference in the input string;
match the non-speech unit to an audio segment, the audio segment a voice sample of a non-speech sound; and
synthesize the input string, including combining the audio segments matched with the non-speech unit.

24. A computer program product, encoded on a computer-readable medium, operable to cause a data processing apparatus to:
receive audio segments;
parse the audio segments into speech units and non-speech units;
define properties of or between speech units and non-speech units; and
store the units and the properties.

25. A system comprising:
means for augmenting a portion of synthesized speech with a non-speech sound other than silence, the augmentation based on characteristics of the synthesized speech.

26. A system comprising:
means for identifying a non-speech unit in a received input string, the non-speech unit not having an associated specific textual reference in the input string;
means for matching the non-speech unit to an audio segment, the audio segment a voice sample of a non-speech sound; and
means for synthesizing the input string, including combining the audio segments matched with the non-speech unit.

27. A system comprising:
means for receiving audio segments;
means for parsing the audio segments into speech units and non-speech units;
means for defining properties of or between speech units and non-speech units; and
means for storing the units and the properties.

Specification