Speech samples library for text-to-speech and methods and apparatus for generating and using same

US 8,775,185 B2
Filed: 11/27/2012
Issued: 07/08/2014
Est. Priority Date: 03/21/2007
Status: Active Grant

Abstract

A method for converting text into speech with a speech sample library is provided. The method comprises converting an input text to a sequence of triphones; determining musical parameters of each phoneme in the sequence of triphones; detecting, in the speech sample library, speech segments having at least the determined musical parameters; and concatenating the detected speech segments.
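
The first step named in the abstract, turning a phoneme sequence into a sequence of triphones, can be illustrated with a minimal Python sketch. It assumes the input text has already been transcribed into phonemes, and it assumes a boundary marker `sil` for the edges of the utterance; neither the phoneme symbols nor the marker come from the patent. Each triphone pairs a central phoneme with its left and right neighbours, so adjacent triphones overlap.

```python
from typing import List, Tuple

# (left context, central phoneme, right context)
Triphone = Tuple[str, str, str]

# Hypothetical utterance-boundary marker; not taken from the patent.
SILENCE = "sil"


def to_triphones(phonemes: List[str]) -> List[Triphone]:
    """Slide a three-phoneme window over the sequence, one triphone per phoneme."""
    padded = [SILENCE] + phonemes + [SILENCE]
    return [
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


# Illustrative phoneme symbols for a short word.
print(to_triphones(["k", "ae", "t"]))
# [('sil', 'k', 'ae'), ('k', 'ae', 't'), ('ae', 't', 'sil')]
```

Because every phoneme becomes the centre of exactly one triphone, the output has the same length as the input, and the left and right elements of each triphone give the phonemic context used in the later search step.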

14 Claims

1. A method for converting text into speech with a speech sample library, comprising:
- providing an input text;
- converting the input text to a sequence of triphones;
- retrieving phonemic contexts of the sequence of triphones;
- determining musical parameters characterizing each phoneme in the sequence of triphones;
- predicting a set of numerical targets for the determined musical parameters, wherein the set of numerical targets is provided for each of the musical parameters;
- detecting, in the speech sample library, pre-stored speech segments having at least the determined musical parameters of each phoneme in the sequence of triphones based on the phonemic contexts and the predicted set of numerical targets for the determined musical parameters which lie within a range of musical parameters of the pre-stored speech segments, wherein the detection of the pre-stored speech segments further includes searching the speech sample library for at least one of a central phoneme, phonemic context, and a musical index indicating at least one range of at least one of the musical parameters within which at least one of the numerical targets lies; and
- concatenating the detected speech segments.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, further comprising:
    - adjusting the musical parameters of detected speech segments prior to concatenating the detected speech segments.
  - 3. The method of claim 1, wherein the at least one musical parameter is any one of:
    - a pitch curve, a pitch perception, duration, and a volume.
  - 4. The method of claim 3, wherein a value of a musical vector is an index indicative of a sub range in which its respective at least one musical parameter lies.
  - 5. The method of claim 1, wherein the sequence of triphones includes overlapping triphones.
  - 6. The method of claim 1, wherein each of the detected speech segments comprises at least any one of:
    - a word, a string of words, and a sentence.
  - 7. A computer software product embedded in a non-transient computer readable medium containing instructions that when executed on the computer perform the method of claim 1.
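
The detecting step of claim 1, together with the index described in claim 4, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the sub-range boundaries (`PITCH_EDGES_HZ`, `DURATION_EDGES_MS`), the choice of pitch and duration as the two indexed parameters, the `Segment` record, and the dictionary-based library are all hypothetical. A predicted numerical target is mapped to the index of the sub-range it falls in, and the library is then searched by central phoneme, phonemic context, and those indices.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical sub-range boundaries; the patent gives no numeric values.
PITCH_EDGES_HZ = [100.0, 150.0, 200.0, 250.0]
DURATION_EDGES_MS = [50.0, 100.0, 200.0]


def musical_index(value: float, edges: Sequence[float]) -> int:
    """Return the index of the sub-range (bounded by sorted edges) containing value."""
    return bisect_right(edges, value)


@dataclass(frozen=True)
class Segment:
    central: str               # central phoneme
    context: Tuple[str, str]   # (left, right) phonemic context
    audio_id: str              # placeholder for the pre-stored recording


# Key: (central phoneme, context, pitch index, duration index)
LibraryKey = Tuple[str, Tuple[str, str], int, int]


def find_segment(
    library: Dict[LibraryKey, Segment],
    central: str,
    context: Tuple[str, str],
    pitch_hz: float,
    duration_ms: float,
) -> Optional[Segment]:
    """Look up a pre-stored segment whose musical ranges contain the targets."""
    key = (
        central,
        context,
        musical_index(pitch_hz, PITCH_EDGES_HZ),
        musical_index(duration_ms, DURATION_EDGES_MS),
    )
    return library.get(key)


# A one-entry library: "ae" between "k" and "t", pitch in [150, 200) Hz,
# duration in [100, 200) ms.
library = {("ae", ("k", "t"), 2, 2): Segment("ae", ("k", "t"), "rec_0001")}

print(find_segment(library, "ae", ("k", "t"), pitch_hz=180.0, duration_ms=120.0))
print(find_segment(library, "ae", ("k", "t"), pitch_hz=90.0, duration_ms=120.0))
```

The first lookup finds the stored segment because both targets fall inside its sub-ranges; the second returns `None` because 90 Hz lies in a different pitch sub-range. An exact-match dictionary is the simplest possible design; a practical system would likely fall back to neighbouring sub-ranges and adjust the retrieved segment, as claim 2 allows.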

8. An apparatus for converting text into speech with a speech sample library, comprising:
- an input unit for providing an input text;
- a parser for converting the text into a sequence of speech segments;
- a prosody predictor for predicting musical parameters of each phoneme in the sequence of triphones and a set of numerical targets for each of the predicted musical parameters of each phoneme in the sequence of triphones based on phonemic contexts and the set of numerical targets for the determined musical parameters which lie within a range of musical parameters of the pre-stored speech segments, wherein the set of numerical targets is provided for each of the musical parameters; and
- a search module for detecting, in the speech sample library, pre-stored speech segments having at least the determined musical parameter, wherein the search module is further configured to search in the speech sample library for at least one of a central phoneme, phonemic context, and a musical index indicating at least one range of at least one of the musical parameters within which at least one of the numerical targets lies.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The apparatus of claim 8, further comprises:
    - a processing unit for adjusting the musical parameters of the detected speech segments prior to concatenating the detected speech segments.
  - 10. The apparatus of claim 8, wherein the at least one musical parameter is any one of:
    - a pitch curve, a pitch perception, duration, and a volume.
  - 11. The apparatus of claim 10, wherein a value of a musical vector is an index indicative of a sub range in which its respective at least one musical parameter lies.
  - 12. The apparatus of claim 8, wherein the sequence of triphones includes overlapping triphones.
  - 13. The apparatus of claim 8, wherein each of the detected speech segments comprises at least any one of:
    - a word, a string of words, and a sentence.
  - 14. The apparatus of claim 8, wherein the speech sample library includes a plurality of recordings, each of the recordings includes a central phoneme pronounced with at least one musical parameter and in a phonemic context.
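
Claims 9 and 14 together suggest a simple data model: each library recording carries a central phoneme, a phonemic context, and at least one musical parameter, and detected segments may be adjusted before they are joined. The sketch below assumes volume is the adjusted parameter (one of the options listed in claim 10), represents it as a hypothetical peak-amplitude value, and stores audio as a plain list of floats. All names here are illustrative and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Recording:
    central: str               # central phoneme
    context: Tuple[str, str]   # (left, right) phonemic context
    volume: float              # hypothetical: peak amplitude of the samples
    samples: List[float]       # hypothetical: raw audio samples


def adjust_volume(rec: Recording, target: float) -> List[float]:
    """Scale the samples so the recording's peak matches the target volume."""
    if rec.volume == 0:
        return list(rec.samples)  # silent recording: nothing to scale
    gain = target / rec.volume
    return [s * gain for s in rec.samples]


def concatenate(recs: Sequence[Recording], targets: Sequence[float]) -> List[float]:
    """Adjust each detected recording, then join them in order."""
    out: List[float] = []
    for rec, target in zip(recs, targets):
        out.extend(adjust_volume(rec, target))
    return out


detected = [
    Recording("k", ("sil", "ae"), volume=0.5, samples=[0.5, -0.25]),
    Recording("ae", ("k", "t"), volume=1.0, samples=[1.0]),
]
print(concatenate(detected, targets=[1.0, 0.5]))
# [1.0, -0.5, 0.5]
```

Plain end-to-end joining is used here only to keep the example short; it ignores smoothing at segment boundaries, which a real concatenative synthesizer would need.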

Current Assignee: OSR Enterprises AG
Original Assignee: VivoText Ltd.
Inventors: Hakim, Andres; Silbert, Gershon
Primary Examiner(s): Godbold, Douglas
Application Number: US13/686,140
Publication Number: US 20130085759A1
Time in Patent Office: 588 Days
Field of Search: 704/258-260
US Class Current: 704/267
CPC Class Codes:
G10L 13/06   Elementary speech units use...
G10L 13/07   Concatenation rules
G10L 13/08   Text analysis or generation...
