Speech samples library for text-to-speech and methods and apparatus for generating and using same

US 8,340,967 B2
Filed: 03/19/2008
Issued: 12/25/2012
Est. Priority Date: 03/21/2007
Status: Active Grant


Abstract

A method of recording speech for use in a speech samples library. In an exemplary embodiment, the method comprises recording a speaker pronouncing a phoneme with musical parameters characterizing pronunciation of another phoneme by the same or another speaker. For example, in one embodiment the method comprises providing a recording of a first speaker pronouncing a first phoneme in a phonemic context, the pronunciation being characterized by certain musical parameters. A second speaker, who may be the same as the first speaker, is then recorded pronouncing a second phoneme (different from the first phoneme) with the musical parameters that characterize the pronunciation of the first phoneme by the first speaker. The recordings made by the second speaker are used for compiling a speech samples library.

21 Claims

1. A method for generation of an expressive speech library, comprising:
- recording a first speaker reading a text by a recording device including a non-transitory computer readable medium, wherein the recorded reading is saved in the non-transitory computer readable medium;
  
  analyzing the recorded reading based on a set of predefined musical vectors by identifying at least one physical range of at least one musical parameter used by the first speaker when reading the text;
  
  dividing the at least one identified physical range into a plurality of sub ranges; and
  
  associating each sub range of the plurality of sub ranges with a different value of at least one of the musical vectors of the set of predefined musical vectors;
  
  determining based on the analysis whether at least one segment of text corresponding to at least a portion of the recorded text is to be reread by the first speaker;
  
  providing an indication to the first speaker to reread each of the at least one segment of the text;
  
  recording the first speaker reading each of the at least one segment of text; and
  
  including in the expressive speech library at least a recording of the first speaker reading each of the at least one segment of text.
  - 2. The method of claim 1, further comprising:
    - determining whether a rerecorded segment of the text is to be rerecorded.
  - 3. The method of claim 1, wherein at least a portion of the set of predefined musical vectors is generated responsive of a prerecording of the text.
  - 4. The method of claim 1, wherein the text is prerecorded by at least one of:
    - the first speaker, and a second speaker.
  - 5. The method of claim 1, wherein the at least one musical parameter is any one of:
    - a pitch curve, a pitch perception, duration, and a volume.
  - 6. The method of claim 5, wherein a value of a musical vector is an index indicative of a sub range in which its respective at least one musical parameter lies.
  - 7. The method of claim 1, wherein determining based on the analysis whether the at least one segment of text is to be reread by the first speaker further comprising:
    - determining the usefulness of each of the at least one segment of text.
  - 8. The method of claim 7, wherein the usefulness of the at least one segment of text is determined based, in part, on a number of vowels included in the at least one segment of text.
  - 9. The method of claim 7, further comprising:
    - selecting at least one word from the at least portion of the recorded text based on at least one musical vector that appears in the word; and
      
      providing the at least one segment of text for the first speaker to record the at least one musical vector that appears in the at least one selected word.
  - 10. The method of claim 9, wherein the at least one segment of text comprises at least any one of a word, a string of words, and a sentence with at least one of phonemes and phonemic context not contained in the selected at least one word.
  - 11. A computer software product embedded in a non-transient computer readable medium containing instructions that when executed on the computer perform the method of claim 1.
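
The range-division steps of claim 1, together with the index-valued musical vector of claim 6, can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the choice of equal-width sub ranges, and the sample pitch values in hertz are all assumptions made for illustration, and the claims do not say how the sub ranges are sized.

```python
from bisect import bisect_right


def divide_range(low, high, n_sub):
    """Split [low, high] into n_sub equal-width sub ranges.

    Returns the n_sub - 1 interior boundaries. Equal widths are an
    assumption; the claims only require "a plurality of sub ranges".
    """
    if n_sub < 1 or high <= low:
        raise ValueError("need n_sub >= 1 and high > low")
    step = (high - low) / n_sub
    return [low + step * i for i in range(1, n_sub)]


def sub_range_index(value, low, high, boundaries):
    """Return the index of the sub range in which value lies."""
    clamped = min(max(value, low), high)
    return bisect_right(boundaries, clamped)


def musical_vector_values(samples, n_sub):
    """Hypothetical helper: quantize one musical parameter.

    The physical range is taken as the min and max the speaker
    actually used, and each sample is mapped to a sub range index.
    """
    low, high = min(samples), max(samples)
    boundaries = divide_range(low, high, n_sub)
    return [sub_range_index(s, low, high, boundaries) for s in samples]


# Illustrative pitch samples in Hz (made up, not from the patent).
pitches = [100.0, 140.0, 175.0, 260.0, 300.0]
print(musical_vector_values(pitches, 4))  # [0, 0, 1, 3, 3]
```

Here the identified range 100 to 300 Hz is cut at 150, 200, and 250 Hz, so each sample is represented by one of four index values.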

12. A system for generation of an expressive speech library, comprising:
- an input device for capturing a voice of a first speaker reading a text and at least one segment of text;
  
  an analyzer for analyzing the recorded reading of the text based on a set of predefined musical vectors, wherein the analyzer is further configured to determine based on the analysis whether the at least one segment of text corresponding to at least a portion of the recorded text is to be reread by the first speaker, wherein the analyzer is further configured to identify at least one physical range of at least one musical parameter used by the first speaker when reading the text;
  
  divide the at least one identified physical range into a plurality of sub ranges; and
  
  associate each sub range of the plurality of sub ranges with a different value of at least one of the musical vectors of the set of predefined musical vectors; and
  
  an output device for notifying the first speaker to reread each of the at least one segment of text.
  - 13. The system of claim 12, wherein the analyzer is further configured to determine whether a rerecorded segment of the text is to be rerecorded.
  - 14. The system of claim 12, wherein at least a portion of the set of predefined musical vectors is generated responsive of a prerecording of the text.
  - 15. The system of claim 12, wherein the text is prerecorded by at least one of:
    - the first speaker, and a second speaker.
  - 16. The system of claim 12, wherein the at least one musical parameter is any one of:
    - a pitch curve, a pitch perception, duration, and a volume.
  - 17. The system of claim 16, wherein a value of a musical vector is an index indicative of a sub range in which its respective at least one musical parameter lies.
  - 18. The system of claim 12, wherein the analyzer is further configured to determine the usefulness of each of the at least one segment of text.
  - 19. The system of claim 18, wherein the usefulness of the at least one segment of text is determined based, in part, on a number of vowels included in the at least one segment of text.
  - 20. The system of claim 18, wherein the analyzer is further configured to:
    - select at least one word from the at least portion of the recorded text based on at least one musical vector that appears in the word; and
      
      provide the at least one segment of text for the first speaker to record the at least one musical vector that appears in the at least one selected word.
  - 21. The system of claim 20, wherein the at least one segment of text comprises at least any one of a word, a string of words, and a sentence with at least one of phonemes and phonemic context not contained in the selected at least one word.
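
Claims 8 and 19 state that the usefulness of a segment depends, in part, on the number of vowels it contains. The Python sketch below shows one way such a score could be used to pick segments for rereading. It is only an illustration under stated assumptions: the function names, the `min_vowels` threshold, and the use of vowel letters as a crude stand-in for vowel phonemes are hypothetical and do not come from the patent.

```python
VOWEL_LETTERS = set("aeiou")  # assumption: letters approximate vowel phonemes


def count_vowels(segment):
    """Count vowel letters in a text segment, ignoring case."""
    return sum(1 for ch in segment.lower() if ch in VOWEL_LETTERS)


def rank_segments(segments, min_vowels=3):
    """Hypothetical usefulness filter.

    Keeps segments with at least min_vowels vowels and orders them
    from most to fewest vowels. The threshold is an assumed value.
    """
    scored = [(count_vowels(s), s) for s in segments]
    kept = [(n, s) for n, s in scored if n >= min_vowels]
    return [s for n, s in sorted(kept, key=lambda pair: -pair[0])]


candidates = ["rhythm", "a quiet evening", "hello there"]
print(rank_segments(candidates))  # ['a quiet evening', 'hello there']
```

A real analyzer would presumably combine this count with the musical vector analysis, since the claims say the vowel count is only part of the determination.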

Current Assignee: OSR Enterprises AG
Original Assignee: VivoText Ltd.
Inventors: Silbert, Gershon; Hakim, Andres
Primary Examiner(s): Godbold, Douglas
Application Number: US12/532,170
Publication Number: US 20100131267A1
Time in Patent Office: 1,742 Days
Field of Search: 704/258-269
US Class Current: 704/267
CPC Class Codes:
G10L 13/06   Elementary speech units use...
G10L 13/07   Concatenation rules
G10L 13/08   Text analysis or generation...
