Speech samples library for text-to-speech and methods and apparatus for generating and using same
Abstract
A method of recording speech for use in a speech samples library. In an exemplary embodiment, the method comprises recording a speaker pronouncing a phoneme with musical parameters characterizing pronunciation of another phoneme by the same or another speaker. For example, in one embodiment the method comprises providing a recording of a first speaker pronouncing a first phoneme in a phonemic context. The pronunciation is characterized by some musical parameters. A second speaker, who may be the same as the first speaker, is then recorded pronouncing a second phoneme (different from the first phoneme) with the musical parameters that characterize pronunciation of the first phoneme by the first speaker. The recordings made by the second speaker are used for compiling a speech samples library.
21 Claims
1. A method for generation of an expressive speech library, comprising:
recording a first speaker reading a text by a recording device including a non-transitory computer readable medium, wherein the recorded reading is saved in the non-transitory computer readable medium;
analyzing the recorded reading based on a set of predefined musical vectors by identifying at least one physical range of at least one musical parameter used by the first speaker when reading the text;
dividing the at least one identified physical range into a plurality of sub ranges; and
associating each sub range of the plurality of sub ranges with a different value of at least one of the musical vectors of the set of predefined musical vectors;
determining based on the analysis whether at least one segment of text corresponding to at least a portion of the recorded text is to be reread by the first speaker;
providing an indication to the first speaker to reread each of the at least one segment of the text;
recording the first speaker reading each of the at least one segment of text; and
including in the expressive speech library at least a recording of the first speaker reading each of the at least one segment of text.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A system for generation of an expressive speech library, comprising:
an input device for capturing a voice of a first speaker reading a text and at least one segment of text;
an analyzer for analyzing the recorded reading of the text based on a set of predefined musical vectors, wherein the analyzer is further configured to determine based on the analysis whether the at least one segment of text corresponding to at least a portion of the recorded text is to be reread by the first speaker, wherein the analyzer is further configured to identify at least one physical range of at least one musical parameter used by the first speaker when reading the text;
divide the at least one identified physical range into a plurality of sub ranges; and
associate each sub range of the plurality of sub ranges with a different value of at least one of the musical vectors of the set of predefined musical vectors; and
an output device for notifying the first speaker to reread each of the at least one segment of text.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21
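The identify, divide, and associate steps recited in claims 1 and 12 can be illustrated with a minimal sketch. It is not taken from the patent: the equal-width division, the function names build_sub_ranges and level_for, and the use of pitch in Hz with the labels "low", "mid", and "high" are all assumptions made for illustration only.

```python
from bisect import bisect_right


def build_sub_ranges(samples, levels):
    """Identify the physical range of one parameter and split it into sub ranges.

    samples: measured values of one musical parameter (assumed here: pitch in Hz).
    levels:  distinct values of a musical vector, one per sub range (assumed labels).
    Returns a list of (low, high, level) tuples spanning min(samples)..max(samples).
    Equal-width division is an assumption; the claims do not prescribe a scheme.
    """
    if not samples or not levels:
        raise ValueError("need at least one sample and one level")
    low, high = min(samples), max(samples)
    width = (high - low) / len(levels)
    return [
        (low + i * width, low + (i + 1) * width, level)
        for i, level in enumerate(levels)
    ]


def level_for(value, sub_ranges):
    """Return the level associated with the sub range that contains value.

    Values outside the identified range are clamped to the first or last level.
    """
    # Upper bounds of all but the last sub range serve as cut points.
    cuts = [high for _, high, _ in sub_ranges[:-1]]
    return sub_ranges[bisect_right(cuts, value)][2]


if __name__ == "__main__":
    pitches_hz = [100.0, 130.0, 180.0, 220.0]  # hypothetical measurements
    ranges = build_sub_ranges(pitches_hz, ["low", "mid", "high"])
    for pitch in pitches_hz:
        print(pitch, level_for(pitch, ranges))
```

Each sub range is bound to a different level, matching the requirement that every sub range be associated with a different value. A value that falls exactly on a boundary is assigned to the higher sub range, which is a design choice of this sketch rather than something the claims specify.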
Specification