Dynamic prosody adjustment for voice-rendering synthesized data
First Claim
1. A computer-implemented method for voice-rendering synthesized data comprising:
receiving speech from a user including a user instruction associated with a task;
retrieving synthesized data to be voice rendered for responding to said user instruction;
identifying, for the synthesized data to be voice rendered, a particular prosody setting including determining current voice characteristics of the user from said user instruction and selecting the particular prosody setting in dependence upon the current voice characteristics of the user;
retrieving context information including historical context data associated with historical user prosody settings;
determining, in dependence upon the synthesized data to be voice rendered and the context information for the context in which the synthesized data is to be voice rendered, a section of the synthesized data to be rendered including determining the context information for the context in which the synthesized data is to be voice rendered, identifying in dependence upon the context information a section length, and selecting a section of the synthesized data to be rendered in dependence upon the identified section length;
wherein identifying in dependence upon the context information a section length further comprises:
identifying in dependence upon the context information a rendering time; and
determining a section length to be rendered in dependence upon the prosody settings and the rendering time;
rendering the section of the synthesized data in dependence upon the identified particular prosody setting to provide a response to said user instruction.
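The claimed method can be read as three steps: derive a prosody setting from the user's current voice characteristics, convert a context-supplied rendering time into a section length at that prosody rate, and render only that section. A minimal sketch of those steps; all names, thresholds, and the rate-clamping rule are illustrative assumptions, not taken from the patent:

```python
# Hedged sketch of the claimed method. select_prosody, section_to_render,
# and the numeric thresholds are assumptions for illustration only.

def select_prosody(pitch_hz, rate_wpm):
    """Pick a prosody setting from the user's current voice characteristics."""
    # Assumption: mirror the user's speaking rate, clamped to an intelligible range,
    # and match the user's pitch register.
    return {
        "rate_wpm": max(120, min(rate_wpm, 220)),
        "pitch": "high" if pitch_hz > 180 else "low",
    }

def section_to_render(text, prosody, rendering_time_s):
    """Determine the section length from the prosody settings and the
    rendering time identified from context information, then select it."""
    words = text.split()
    # Number of words that fit in the rendering time at the chosen rate.
    section_len = int(prosody["rate_wpm"] * rendering_time_s / 60)
    return " ".join(words[:section_len])

synthesized = ("Your meeting with the design team has been moved to three pm "
               "and the conference room has changed to room four on level two")
prosody = select_prosody(pitch_hz=200, rate_wpm=260)
section = section_to_render(synthesized, prosody, rendering_time_s=3)
```

At a 220 words-per-minute rate and a three-second rendering window, only the first eleven words of the synthesized data are selected for rendering; the rest would be deferred or dropped, which is the point of sizing the section by rendering time.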
Abstract
Methods, systems, and products are disclosed for dynamic prosody adjustment for voice-rendering synthesized data that include retrieving synthesized data to be voice-rendered; identifying, for the synthesized data to be voice-rendered, a particular prosody setting; determining, in dependence upon the synthesized data to be voice-rendered and the context information for the context in which the synthesized data is to be voice-rendered, a section of the synthesized data to be rendered; and rendering the section of the synthesized data in dependence upon the identified particular prosody setting.
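The claims additionally recite retrieving historical context data associated with historical user prosody settings. A minimal sketch of how such history might seed the current prosody setting; the history store, the user key, and the equal-weight averaging rule are assumptions, not taken from the patent:

```python
from statistics import mean

# Hypothetical history store: past speaking rates (words per minute) per user.
history = {"user42": [150, 170, 160]}

def prosody_from_history(user_id, current_rate_wpm, history):
    """Blend the user's current speaking rate with the historical average."""
    past = history.get(user_id, [])
    if not past:
        # No historical context: fall back to the current utterance alone.
        return current_rate_wpm
    # Assumption: equal weight to the historical average and the current rate.
    return (mean(past) + current_rate_wpm) / 2

rate = prosody_from_history("user42", 180, history)
```

Blending rather than replacing keeps the rendered voice stable across a session while still tracking the user's current delivery, which is one plausible use of the recited historical prosody settings.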
21 Claims
1. A computer-implemented method for voice-rendering synthesized data, as set out in the First Claim above. Dependent claims: 2-7.
8. A system for voice-rendering synthesized data comprising a computer processor and a computer memory operatively coupled to the computer processor, the computer memory having disposed within it computer program instructions capable of:
receiving speech from a user including a user instruction associated with a task;
retrieving synthesized data to be voice rendered for responding to said user instruction;
identifying, for the synthesized data to be voice rendered, a particular prosody setting including determining current voice characteristics of the user from said user instruction and selecting the particular prosody setting in dependence upon the current voice characteristics of the user;
retrieving context information including historical context data associated with historical user prosody settings;
determining, in dependence upon the synthesized data to be voice rendered and the context information for the context in which the synthesized data is to be voice rendered, a section of the synthesized data to be rendered including determining the context information for the context in which the synthesized data is to be voice rendered, identifying in dependence upon the context information a section length, and selecting a section of the synthesized data to be rendered in dependence upon the identified section length;
wherein identifying in dependence upon the context information a section length further comprises:
identifying in dependence upon the context information a rendering time; and
determining a section length to be rendered in dependence upon the prosody settings and the rendering time;
rendering the section of the synthesized data in dependence upon the identified particular prosody setting to provide a response to said user instruction.
Dependent claims: 9-14.
15. A computer program product for voice-rendering synthesized data, the computer program product disposed on a computer-readable recording medium, the computer program product comprising computer program instructions capable of:
receiving speech from a user including a user instruction associated with a task;
retrieving synthesized data to be voice rendered for responding to said user instruction;
identifying, for the synthesized data to be voice rendered, a particular prosody setting including determining current voice characteristics of the user from said user instruction and selecting the particular prosody setting in dependence upon the current voice characteristics of the user;
retrieving context information including historical context data associated with historical user prosody settings;
determining, in dependence upon the synthesized data to be voice rendered and the context information for the context in which the synthesized data is to be voice rendered, a section of the synthesized data to be rendered including determining the context information for the context in which the synthesized data is to be voice rendered, identifying in dependence upon the context information a section length, and selecting a section of the synthesized data to be rendered in dependence upon the identified section length;
wherein identifying in dependence upon the context information a section length further comprises:
identifying in dependence upon the context information a rendering time; and
determining a section length to be rendered in dependence upon the prosody settings and the rendering time;
rendering the section of the synthesized data in dependence upon the identified particular prosody setting to provide a response to said user instruction.
Dependent claims: 16-21.