System and method for supporting text-to-speech

US 7,921,014 B2
Filed: 07/09/2007
Issued: 04/05/2011
Est. Priority Date: 08/21/2006
Status: Active Grant

First Claim

1. A method of supporting text-to-speech synthesis, the method comprising:

acquiring first frequency data set in a language processing unit, the first frequency data indicating appearance frequencies of readings corresponding to text wordings;

recognizing speech produced by a user reading a learning text;

generating first learning data by associating recognized readings from the speech with portions of the learning text, or by recognizing both wordings and readings of phrases from the speech;

generating, based on the first learning data, second frequency data indicating appearance frequencies of readings corresponding to wordings of phrases from the speech;

generating a plurality of frequency data candidates, each frequency data candidate indicating, for at least one combination of a plurality of continuously-written phrases, an appearance frequency of at least one combination of readings, the appearance frequency of the at least one combination of readings comprising a weighted average of an appearance frequency of the at least one combination of readings from the first frequency data with an appearance frequency of the at least one combination of readings from the second frequency data, wherein each of the plurality of frequency data candidates uses different weights for the weighted average;

for each one of the plurality of frequency data candidates using different weights for the weighted average, using the language processing unit to generate a set of readings corresponding to the learning text using the one of the plurality of frequency data candidates, wherein the set of readings comprises a subset of readings that match readings of the first learning data, and calculating a ratio of the subset of readings to the set of readings, wherein a first frequency data candidate of the plurality of frequency data candidates has a highest calculated ratio;

updating frequency data in the language processing unit using the first frequency data candidate with the highest calculated ratio; and

setting the updated frequency data in the language processing unit.
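The candidate-selection procedure in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and every modeling choice in it is an assumption made for illustration:

- Phrases are modeled as hypothetical (wording, reading) tuples.
- Frequency data (including the first, default frequency data) is a dict mapping a pair of adjacent phrases to a relative frequency.
- The speech recognizer's output is taken as already-aligned learning data.
- The "language processing unit" is a toy decoder, read, that exhaustively picks the reading sequence whose adjacent combinations have the highest summed frequency.

Each candidate is the weighted average w * second + (1 - w) * first for one weight w. The candidate whose generated readings best match the learning data is kept. The weights and the 日本銀行 ("nihon" vs. "nippon") example are arbitrary.

```python
from collections import Counter
from itertools import product

# Assumed data model (hypothetical, for illustration only): a phrase is a
# (wording, reading) tuple, and frequency data maps a pair of adjacent
# ("continuously-written") phrases to a relative appearance frequency.
Phrase = tuple[str, str]
FreqData = dict[tuple[Phrase, Phrase], float]


def second_frequency_data(learning_data: list[list[Phrase]]) -> FreqData:
    """Relative frequencies of adjacent phrase combinations in the learning data."""
    counts: Counter = Counter()
    for sentence in learning_data:
        for left, right in zip(sentence, sentence[1:]):
            counts[(left, right)] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


def blend(first: FreqData, second: FreqData, w: float) -> FreqData:
    """One frequency data candidate: the weighted average w*second + (1-w)*first."""
    keys = first.keys() | second.keys()
    return {k: (1 - w) * first.get(k, 0.0) + w * second.get(k, 0.0) for k in keys}


def read(wordings: list[str], readings: dict[str, list[str]], freq: FreqData) -> list[str]:
    """Toy language processing unit: exhaustively choose the reading sequence
    whose adjacent combinations have the highest summed frequency."""
    best: list[str] = []
    best_score = -1.0
    for combo in product(*(readings[w] for w in wordings)):
        phrases = list(zip(wordings, combo))
        score = sum(freq.get(pair, 0.0) for pair in zip(phrases, phrases[1:]))
        if score > best_score:  # strict '>' keeps the first maximum on ties
            best, best_score = list(combo), score
    return best


def select_candidate(first: FreqData, learning_data: list[list[Phrase]],
                     readings: dict[str, list[str]], weights: list[float]):
    """Return (weight, match ratio, candidate) for the best-matching candidate."""
    second = second_frequency_data(learning_data)
    best = (None, -1.0, {})
    for w in weights:
        candidate = blend(first, second, w)
        matched = total = 0
        for sentence in learning_data:
            predicted = read([wd for wd, _ in sentence], readings, candidate)
            total += len(predicted)
            # Count generated readings that match the recognized readings.
            matched += sum(p == r for p, (_, r) in zip(predicted, sentence))
        ratio = matched / total if total else 0.0
        if ratio > best[1]:
            best = (w, ratio, candidate)
    return best


# Usage: the default data prefers "nihon", but the user read "nippon ginkou".
readings = {"日本": ["nihon", "nippon"], "銀行": ["ginkou"]}
first = {(("日本", "nihon"), ("銀行", "ginkou")): 0.6,
         (("日本", "nippon"), ("銀行", "ginkou")): 0.4}
learning = [[("日本", "nippon"), ("銀行", "ginkou")]]
w, ratio, _ = select_candidate(first, learning, readings, [0.0, 0.5, 1.0])
print(w, ratio)  # 0.5 1.0
```

With w = 0.0, the unchanged default data yields "nihon ginkou", so only one of the two readings matches. With w = 0.5 and w = 1.0, both readings match. The strict comparison keeps the first weight that reaches the highest ratio.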

8 Assignments

0 Petitions

Abstract

A system for generating high-quality synthesized speech from text includes a learning data generating unit, a frequency data generating unit, and a setting unit. The learning data generating unit recognizes input speech and generates first learning data in which the wordings of phrases are associated with their readings. Based on the first learning data, the frequency data generating unit generates frequency data indicating the appearance frequencies of both wordings and readings of phrases. The setting unit sets the generated frequency data in a language processing unit so that the speech output by text-to-speech synthesis approximates the input speech. The language processing unit, in turn, generates the reading corresponding to the wording of a text on the basis of these appearance frequencies.
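The division of labor among the units named in the abstract can be sketched in Python. This is an illustrative assumption, not the patented system, and the simplifications are as follows:

- The class and function names are hypothetical.
- Speech recognition is replaced by a list of already-recognized (wording, reading) pairs.
- The language processing unit uses only single-phrase counts, without the combinations of adjacent phrases described in claim 1.

```python
from collections import Counter


class LanguageProcessingUnit:
    """Hypothetical stand-in: reads each wording with its most frequent reading."""

    def __init__(self) -> None:
        self.frequency: Counter = Counter()

    def set_frequency_data(self, frequency: Counter) -> None:
        self.frequency = Counter(frequency)

    def reading_for(self, wording: str, default: str) -> str:
        # Collect the counts of every reading seen for this wording.
        counts = {r: n for (w, r), n in self.frequency.items() if w == wording}
        return max(counts, key=counts.get) if counts else default


def generate_learning_data(recognized: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Learning data generating unit. A real system would run speech recognition
    here; this sketch assumes the recognizer already returned (wording, reading)
    pairs and only drops incomplete ones."""
    return [(w, r) for w, r in recognized if w and r]


def generate_frequency_data(learning_data: list[tuple[str, str]]) -> Counter:
    """Frequency data generating unit: counts each (wording, reading) pair."""
    return Counter(learning_data)


def set_frequency_data(unit: LanguageProcessingUnit, frequency: Counter) -> None:
    """Setting unit: hands the generated frequency data to the language processor."""
    unit.set_frequency_data(frequency)


recognized = [("日本", "nippon"), ("日本", "nippon"), ("日本", "nihon"), ("銀行", "ginkou")]
lpu = LanguageProcessingUnit()
set_frequency_data(lpu, generate_frequency_data(generate_learning_data(recognized)))
print(lpu.reading_for("日本", default="nihon"))  # nippon
print(lpu.reading_for("東京", default="toukyou"))  # toukyou (no data, so default)
```

Keeping the setting step as a separate function mirrors the abstract's separation of the setting unit from the language processing unit. The frequency data is produced once and then handed over, rather than being computed inside the reader.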

1 Claim

Current Assignee
Cerence Operating Company (Cerence Inc.)
Original Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Inventors
Kurata, Gakuto; Tachibana, Ryuki; Nagano, Toru; Nishimura, Masafumi
Primary Examiner(s)
Armstrong, Angela A

Application Number
US11/774,798
Publication Number
US 20080046247A1
Time in Patent Office
1,366 Days
Field of Search
704/260, 704/E13.001, 704/E13.005
US Class Current
704/260
CPC Class Codes
G10L 13/04 Details of speech synthesis...
G10L 15/26 Speech to text systems G10L...
