High quality speech reconstruction for a dialog method and system
Abstract
An electronic device (400) for speech dialog includes functions that receive (405, 205) a speech phrase that includes an instantiated variable (315), generate pitch and voicing characteristics (330) of the instantiated variable, and perform voice recognition (410, 220) of the instantiated variable to determine a most likely set of recognition acoustic states (335). A trained map (358) is established (115) that maps recognition feature vectors derived from training speech (105) to synthesis feature vectors derived from the same training speech (110). Recognition feature vectors that represent the most likely set of recognition acoustic states for the recognized instantiated variable are converted to a most likely set of synthesis acoustic states (420) in accordance with the map. The electronic device may generate (421, 440, 445) a synthesized value of the instantiated variable using the most likely set of synthesis acoustic states and the pitch and voicing characteristics extracted from the instantiated variable.
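As an illustration only, the trained map described in the abstract could be realized as a lookup from quantized recognition feature vectors to averaged synthesis feature vectors. The sketch below is one possible reading, not the method of the specification: the codebook, the squared Euclidean distance, the per-codeword averaging, and the names `nearest`, `train_map`, and `convert` are all assumptions introduced here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest(codebook: List[Vector], vec: Vector) -> int:
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], vec)),
    )


def train_map(
    codebook: List[Vector],
    rec_frames: List[Vector],
    syn_frames: List[Vector],
) -> Dict[int, List[float]]:
    """Build a hypothetical trained map from time-aligned training frames.

    rec_frames[t] and syn_frames[t] are assumed to come from the same frame t
    of the same training speech. Each recognition codeword is mapped to the
    mean of the synthesis vectors that were paired with it.
    """
    buckets: Dict[int, List[Vector]] = defaultdict(list)
    for rec, syn in zip(rec_frames, syn_frames):
        buckets[nearest(codebook, rec)].append(syn)
    return {
        idx: [sum(col) / len(col) for col in zip(*vecs)]
        for idx, vecs in buckets.items()
    }


def convert(
    codebook: List[Vector],
    trained_map: Dict[int, List[float]],
    rec_frames: List[Vector],
) -> List[List[float]]:
    """Convert recognition feature vectors to synthesis feature vectors via the map."""
    return [trained_map[nearest(codebook, rec)] for rec in rec_frames]
```

In this sketch a codeword that never occurred in training has no entry in the map, so `convert` raises `KeyError` for it; a practical system would need a fallback for that case.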
20 Claims
1. A method for speech dialog, comprising:
receiving an input speech phrase that includes an instantiated variable;
extracting pitch and voicing characteristics for the instantiated variable;
performing voice recognition of the instantiated variable to determine a most likely set of recognition acoustic states;
converting the most likely set of recognition acoustic states to a most likely set of synthesis acoustic states; and
generating a synthesized value of the instantiated variable using the most likely set of synthesis acoustic states and the extracted pitch and voicing characteristics.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. An electronic device for speech dialog, comprising:
means for receiving an input speech phrase that includes an instantiated variable;
means for extracting pitch and voicing characteristics for the instantiated variable;
means for performing voice recognition of the instantiated variable to determine a most likely set of recognition acoustic states;
means for converting the most likely set of recognition acoustic states to a most likely set of synthesis acoustic states; and
means for generating a synthesized value of the instantiated variable using the most likely set of synthesis acoustic states and the extracted pitch and voicing characteristics.
Dependent claims: 12, 13, 14, 15, 16, 17
18. A media that includes a set of stored program instructions, comprising:
a function for receiving an input speech phrase that includes an instantiated variable;
a function for extracting pitch and voicing characteristics for the instantiated variable;
a function for performing voice recognition of the instantiated variable to determine a most likely set of recognition acoustic states;
a function for converting the most likely set of recognition acoustic states to a most likely set of synthesis acoustic states; and
a function for generating a synthesized value of the instantiated variable using the most likely set of synthesis acoustic states and the extracted pitch and voicing characteristics.
Dependent claims: 19, 20
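The five steps shared by independent claims 1, 11, and 18 can be pictured as a simple data flow. The following is a minimal structural sketch, not an implementation of the claimed subject matter: the class `DialogReconstructor`, its field names, and the frame and prosody types are hypothetical, and each stage is supplied by the caller as a function. It is meant to show that pitch and voicing are taken from the input speech itself, while the acoustic states pass through recognition and conversion before synthesis.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]        # one frame of audio samples (assumed representation)
Prosody = Tuple[float, bool]   # (pitch in Hz, voiced flag) for one frame


@dataclass
class DialogReconstructor:
    """Hypothetical wiring of the claimed steps; every stage is pluggable."""

    extract_prosody: Callable[[List[Frame]], List[Prosody]]
    recognize: Callable[[List[Frame]], List[int]]   # most likely recognition states
    convert: Callable[[List[int]], List[int]]       # recognition -> synthesis states
    synthesize: Callable[[List[int], List[Prosody]], List[float]]

    def run(self, variable_frames: List[Frame]) -> List[float]:
        """Process the frames of the instantiated variable in a received phrase."""
        # Extract pitch and voicing characteristics from the input speech.
        prosody = self.extract_prosody(variable_frames)
        # Voice recognition yields the most likely recognition acoustic states.
        rec_states = self.recognize(variable_frames)
        # Convert recognition states to synthesis acoustic states.
        syn_states = self.convert(rec_states)
        # Generate the synthesized value from the states plus the extracted prosody.
        return self.synthesize(syn_states, prosody)
```

Keeping the stages as injected functions is a design choice made for this sketch, so that a recognizer, a state map, and a synthesizer can be swapped without changing the order of the steps.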
Specification