Speech information processing apparatus and method

US 8,160,882 B2
Filed: 01/23/2009
Issued: 04/17/2012
Est. Priority Date: 01/23/2008
Status: Expired due to Fees

Abstract

A temporary child set is generated. An elastic ratio of an elastic section of a model pattern is calculated. A temporary typical pattern of the set is generated by combining the pattern belonging to the set with the model pattern having the elastic pattern expanded or contracted. A distortion between the temporary typical pattern of the set and the pattern belonging to the set is calculated, and a child set is determined as the set when the distortion is below a threshold. A typical pattern as the temporary typical pattern of the child set is stored with a classification rule as the classification item of the context of the pattern belonging to the child set.

14 Claims

1. An apparatus for processing speech information, comprising:
- a first generation unit configured to generate a temporary child set to which at least one fundamental frequency pattern belongs by classifying a plurality of fundamental frequency patterns of inputted speech data based on a classification item of a context of the inputted speech data;
  
  a first decision unit configured to decide a length of a temporary typical pattern of the temporary child set;
  
  a model pattern setting unit configured to set a model pattern having an elastic section along a temporal direction;
  
  a calculation unit configured to calculate an elastic ratio of the elastic section so that the length of the temporary typical pattern coincides with a length of the model pattern;
  
  an elastic unit configured to expand or contract the elastic section of the model pattern based on the elastic ratio;
  
  a second generation unit configured to generate the temporary typical pattern of the temporary child set by combining the fundamental frequency pattern belonging to the temporary child set with the model pattern having an elastic pattern expanded or contracted;
  
  a second decision unit configured to calculate a distortion between the temporary typical pattern of the temporary child set and the fundamental frequency pattern belonging to the temporary child set, and to decide a child set as the temporary child set when the distortion is below a threshold;
  
  a pattern storage unit configured to store a typical pattern as the temporary typical pattern of the child set; and
  
  a rule storage unit configured to store a classification rule of the typical pattern as the classification item of the context of the fundamental frequency pattern belonging to the child set.
- 2. The apparatus according to claim 1, wherein the model pattern setting unit sets the model pattern using a fundamental frequency pattern of a parent set or an ancestor set of the temporary child set.
- 3. The apparatus according to claim 1, wherein the calculation unit calculates a series of elastic ratios by monotonously increasing the elastic section after monotonously decreasing or by monotonously decreasing the elastic section after monotonously increasing.
- 4. The apparatus according to claim 1, wherein a start point of the elastic section is a phoneme of accent nucleus, a phoneme of succeeding accent nucleus, or a phoneme of second succeeding accent nucleus, and an end point of the elastic section is a phoneme of end point of a prosodic control unit, a phoneme of preceding end point of the prosodic control unit, or a phoneme of second preceding end point of the prosodic control unit.
- 5. The apparatus according to claim 1, further comprising a selection unit configured to select the typical pattern having the classification rule matched with a content of each prosodic control unit inputted.
- 6. The apparatus according to claim 1, wherein the second generation unit generates the temporary typical pattern by calculating at least one of an average value, a variance value, and a standard deviation value at each time series point of the fundamental frequency patterns belonging to the temporary child set.
- 7. The apparatus according to claim 1, wherein the second generation unit generates the temporary typical pattern by
    (1) averaging the fundamental frequency patterns based on the model pattern as a bias,
    (2) calculating a variance value of the fundamental frequency patterns based on the model pattern as a bias,
    (3) maximizing or minimizing the distortion of the fundamental frequency patterns based on the model pattern as a bias, or
    (4) quasi-maximizing the distortion of the fundamental frequency patterns based on the model pattern as a bias.
- 8. The apparatus according to claim 1, wherein the second decision unit calculates the distortion as
    (1) a sum of squared error value at each time series point of the fundamental frequency pattern and the temporary typical pattern,
    (2) a sum of weighted squared error value at each time series point of the fundamental frequency pattern and the temporary typical pattern,
    (3) a maximum of squared error value at each time series point of the fundamental frequency pattern and the temporary typical pattern,
    (4) a sum of variance value at each time series point of the fundamental frequency pattern and the temporary typical pattern,
    (5) a sum of weighted variance value at each time series point of the fundamental frequency pattern and the temporary typical pattern,
    (6) a maximum of variance value at each time series point of the fundamental frequency pattern and the temporary typical pattern,
    (7) a sum of logarithm likelihood at each time series point of the fundamental frequency pattern and the temporary typical pattern,
    (8) a sum of weighted logarithm likelihood at each time series point of the fundamental frequency pattern and the temporary typical pattern, or
    (9) a maximum of logarithm likelihood at each time series point of the fundamental frequency pattern and the temporary typical pattern.
- 9. The apparatus according to claim 1, further comprising a prosodic control unit using at least one of a sentence, a breath group, an accent phrase, a morpheme, a word, a mora, a syllable, a phoneme, a half-mora, and a unit divided from one phoneme by HMM.
- 10. The apparatus according to claim 1, wherein the classification item of the context is at least one of linguistic information of a prosodic control unit by analyzing a text, and an arbitrary attribute.
- 11. The apparatus according to claim 10, wherein the attribute is at least one of prominence information, utterance style information, intention information of a query, a conclusion, or an emphasis, and mental attitude information of a suspicion, an interest, a discouragement, or an admiration.
- 12. The apparatus according to claim 4, wherein the phoneme is at least one of a mora, a syllable, a phoneme, a half-mora, and a unit divided from one phoneme by HMM.
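
The claims do not state a formula for the elastic ratio, so the Python sketch below is only one plausible reading of the calculation unit and elastic unit of claim 1. It assumes a fundamental frequency pattern is a list of F0 values at fixed frame intervals, that the elastic section is a half-open index range, and that stretching is done by linear interpolation. The names `elastic_ratio`, `resample`, and `stretch_model` are hypothetical and do not appear in the patent.

```python
from typing import List


def elastic_ratio(model_len: int, start: int, end: int, target_len: int) -> float:
    """Scale factor for the elastic section [start, end) so that the whole
    model pattern reaches target_len points (assumed definition)."""
    elastic_len = end - start
    fixed_len = model_len - elastic_len  # points outside the elastic section
    if elastic_len <= 0:
        raise ValueError("elastic section must contain at least one point")
    if target_len <= fixed_len:
        raise ValueError("target length leaves no room for the elastic section")
    return (target_len - fixed_len) / elastic_len


def resample(section: List[float], new_len: int) -> List[float]:
    """Linearly interpolate section onto new_len evenly spaced points."""
    if new_len == 1:
        return [section[0]]
    step = (len(section) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(section) - 1)
        frac = pos - lo
        out.append(section[lo] * (1 - frac) + section[hi] * frac)
    return out


def stretch_model(model: List[float], start: int, end: int, target_len: int) -> List[float]:
    """Expand or contract only the elastic section; other points are kept."""
    ratio = elastic_ratio(len(model), start, end, target_len)
    new_elastic_len = round((end - start) * ratio)
    return model[:start] + resample(model[start:end], new_elastic_len) + model[end:]
```

For example, with the 7-point model `[100, 120, 140, 130, 120, 110, 100]` and elastic section `[2, 5)`, a target length of 9 gives a ratio of 5/3, and the three elastic points 140, 130, 120 become the five points 140, 135, 130, 125, 120 while the two points on each side are unchanged.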

13. A method for processing speech information, comprising:
- generating a temporary child set to which at least one fundamental frequency pattern belongs by classifying a plurality of fundamental frequency patterns of inputted speech data based on a classification item of a context of the inputted speech data;
  
  deciding a length of a temporary typical pattern of the temporary child set;
  
  setting a model pattern having an elastic section along a temporal direction;
  
  calculating an elastic ratio of the elastic section so that the length of the temporary typical pattern coincides with a length of the model pattern;
  
  expanding or contracting the elastic section of the model pattern based on the elastic ratio;
  
  generating the temporary typical pattern of the temporary child set by combining the fundamental frequency pattern belonging to the temporary child set with the model pattern having an elastic pattern expanded or contracted;
  
  calculating a distortion between the temporary typical pattern of the temporary child set and the fundamental frequency pattern belonging to the temporary child set;
  
  deciding a child set as the temporary child set when the distortion is below a threshold;
  
  storing a typical pattern as the temporary typical pattern of the child set; and
  
  storing a classification rule of the typical pattern as the classification item of the context of the fundamental frequency pattern belonging to the child set.
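
The steps of claim 13 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: every pattern in a set is assumed to be already normalized to the length of the (stretched) model pattern, "combining" is read as averaging with the model pattern as a bias (one option of claim 7), and the distortion is the sum of squared errors (one option of claim 8). The function names, the `weight` parameter, and the dictionary-based storage are hypothetical. The claims do not say what happens to a temporary child set whose distortion is not below the threshold, so the sketch simply does not store it.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Pattern = List[float]
Sample = Tuple[Dict[str, Hashable], Pattern]  # (context, F0 pattern)


def classify(samples: List[Sample], item: str) -> Dict[Hashable, List[Pattern]]:
    """Group F0 patterns into temporary child sets by one classification item."""
    groups = defaultdict(list)
    for context, pattern in samples:
        groups[context[item]].append(pattern)
    return dict(groups)


def typical_pattern(patterns: List[Pattern], model: Pattern, weight: float = 1.0) -> Pattern:
    """Pointwise average of the set, with the model pattern counted as a
    pseudo-observation of the given weight (the assumed 'bias')."""
    n = len(patterns)
    return [
        (weight * model[t] + sum(p[t] for p in patterns)) / (weight + n)
        for t in range(len(model))
    ]


def distortion(patterns: List[Pattern], typical: Pattern) -> float:
    """Sum of squared errors between each pattern and the typical pattern."""
    return sum((p[t] - typical[t]) ** 2 for p in patterns for t in range(len(typical)))


def train(samples: List[Sample], item: str, model: Pattern, threshold: float):
    """Return (pattern store, rule store) for the child sets that pass."""
    pattern_store: Dict[Hashable, Pattern] = {}
    rule_store: Dict[Hashable, Tuple[str, Hashable]] = {}
    for value, members in classify(samples, item).items():
        candidate = typical_pattern(members, model)
        if distortion(members, candidate) < threshold:
            pattern_store[value] = candidate    # typical pattern of the child set
            rule_store[value] = (item, value)   # classification rule for it
    return pattern_store, rule_store
```

A call such as `train(samples, "accent", model, threshold=50.0)` would keep only those groups whose members lie close to their biased average. The context key `"accent"` is an illustrative placeholder for a classification item.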

14. A non-transitory computer readable medium that stores computer executable instructions for causing a computer to perform a method for processing speech information, the method comprising:
- generating a temporary child set to which at least one fundamental frequency pattern belongs by classifying a plurality of fundamental frequency patterns of inputted speech data based on a classification item of a context of the inputted speech data;
  
  deciding a length of a temporary typical pattern of the temporary child set;
  
  setting a model pattern having an elastic section along a temporal direction;
  
  calculating an elastic ratio of the elastic section so that the length of the temporary typical pattern coincides with a length of the model pattern;
  
  expanding or contracting the elastic section of the model pattern based on the elastic ratio;
  
  generating the temporary typical pattern of the temporary child set by combining the fundamental frequency pattern belonging to the temporary child set with the model pattern having an elastic pattern expanded or contracted;
  
  calculating a distortion between the temporary typical pattern of the temporary child set and the fundamental frequency pattern belonging to the temporary child set;
  
  deciding a child set as the temporary child set when the distortion is below a threshold;
  
  storing a typical pattern as the temporary typical pattern of the child set; and
  
  storing a classification rule of the typical pattern as the classification item of the context of the fundamental frequency pattern belonging to the child set.

Current Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Original Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Inventors: Mizutani, Nobuaki
Primary Examiner(s): Opsasnick, Michael N
Application Number: US12/358,660
Publication Number: US 20090187408A1
Time in Patent Office: 1,180 Days
Field of Search: 704/266
US Class Current: 704/266
CPC Class Codes:
G10L 15/08 Speech classification or se...
G10L 25/15 the extracted parameters be...
