Speech processing method and apparatus and program therefor
Abstract
A scheme for judging emphasized speech portions, in which the judgment is made by statistical processing of a set of speech parameters including a fundamental frequency, power, and a temporal variation of a dynamic measure, and/or their derivatives. The emphasized speech portions serve as clues for summarizing audio content, or video content that contains speech.
42 Citations
26 Claims
1. A speech processing method for deciding emphasized portion based on a set of speech parameters for each frame, comprising the steps of:
(a) obtaining an emphasized-state appearance probability for a speech parameter vector, which is a quantized set of speech parameters for a current frame by using a codebook which stores, for each code, a speech parameter vector and an emphasized-state appearance probability, each of said speech parameter vectors including at least one of a fundamental frequency, power and a temporal variation of dynamic-measure and/or an inter-frame difference in at least one of those parameters;
(b) calculating an emphasized-state likelihood based on said emphasized-state appearance probability; and
(c) deciding whether a portion including said current frame is emphasized or not based on said calculated emphasized-state likelihood.
Dependent claims: 2 to 21.
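Steps (a) to (c) of claim 1 can be illustrated with a minimal sketch. This is not the patented implementation: the codebook entries, the nearest-neighbour quantizer, the use of a mean log-probability as the likelihood, and the decision threshold are all assumptions made for illustration, and every name below is hypothetical.

```python
import math

# Hypothetical codebook: each code stores a representative speech parameter
# vector and an emphasized-state appearance probability (values are invented).
CODEBOOK = [
    {"vector": (0.0, 0.0, 0.0), "p_emph": 0.10},
    {"vector": (1.0, 1.0, 0.5), "p_emph": 0.60},
    {"vector": (2.0, 1.5, 1.0), "p_emph": 0.90},
]


def quantize(frame):
    """Return the index of the codebook vector nearest to the frame.

    Squared Euclidean distance is an assumed distance measure.
    """
    def distance(i):
        return sum((a - b) ** 2 for a, b in zip(frame, CODEBOOK[i]["vector"]))
    return min(range(len(CODEBOOK)), key=distance)


def emphasized_log_likelihood(frames):
    """Steps (a) and (b): look up each frame's emphasized-state appearance
    probability and combine them as a sum of log-probabilities (assumed form)."""
    return sum(math.log(CODEBOOK[quantize(f)]["p_emph"]) for f in frames)


def is_emphasized(frames, threshold=math.log(0.5)):
    """Step (c): decide by comparing the mean per-frame log-likelihood with a
    threshold. The threshold value is an arbitrary illustrative choice."""
    return emphasized_log_likelihood(frames) / len(frames) > threshold


loud_portion = [(2.1, 1.4, 0.9), (1.9, 1.6, 1.1)]
quiet_portion = [(0.1, 0.0, 0.0), (-0.1, 0.1, 0.0)]
print(is_emphasized(loud_portion), is_emphasized(quiet_portion))
```

Working in the log domain keeps the product of many small per-frame probabilities from underflowing; dividing by the number of frames makes portions of different lengths comparable under one threshold.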
22. A speech processing apparatus for deciding whether input speech is emphasized or not based on a set of speech parameters for each frame of said input speech, said apparatus comprising:
a codebook which stores, for each code, a speech parameter vector and an emphasized-state appearance probability, each of said speech parameter vectors including at least a fundamental frequency, a power and a temporal variation of a dynamic-measure or an inter-frame difference in each of the parameters;
an emphasized-state likelihood calculating part for calculating an emphasized-state likelihood of a portion including a current frame based on said emphasized-state appearance probability; and
an emphasized state deciding part for deciding whether said portion including said current frame is emphasized or not based on said calculated emphasized-state likelihood.
Dependent claims: 23 to 26.
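The codebook in claim 22 stores vectors built from per-frame parameters and, optionally, their inter-frame differences. The sketch below shows one way such vectors might be assembled before quantization. It assumes the three base parameters (fundamental frequency, power, temporal variation of the dynamic measure) have already been computed for each frame; the function name and the zero-difference convention for the first frame are hypothetical choices, not taken from the patent.

```python
def with_inter_frame_differences(frames):
    """Append to each frame's parameters their difference from the previous frame.

    The first frame has no predecessor, so its differences are set to 0.0
    (an assumed convention).
    """
    extended = []
    previous = None
    for current in frames:
        if previous is None:
            diffs = tuple(0.0 for _ in current)
        else:
            diffs = tuple(c - p for c, p in zip(current, previous))
        extended.append(tuple(current) + diffs)
        previous = current
    return extended


# Each tuple: (fundamental frequency, power, dynamic-measure variation); invented values.
base_frames = [(100.0, 60.0, 3.0), (110.0, 63.0, 2.0)]
vectors = with_inter_frame_differences(base_frames)
print(vectors)
```

Each resulting vector has twice as many components as the base frame, so a codebook used with it would need representative vectors of the same length.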
Specification