Capture and application of sender voice dynamics to enhance communication in a speech-to-text environment
Abstract
A method for providing voice dynamics of human utterances converted to and represented by text within a data processing system. A plurality of predetermined parameters for recognition and representation of dynamics in human utterances are selected. An enhanced human speech recognition software program that implements the predetermined parameters is created on a data processing system. The enhanced software program includes an ability to monitor and record human voice dynamics and provide speech-to-text recognition. The dynamics in a human utterance are captured utilizing the enhanced human speech recognition software. The human utterance is converted into a textual representation utilizing the speech-to-text ability of the software. Finally, the dynamics are merged with the textual representation of the human utterance to produce a marked-up text document on the data processing system.
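As a rough illustration of the capture and merge steps summarized above, the following Python sketch attaches measured dynamics to each recognized word. It assumes, hypothetically, that a speech-to-text engine already reports each word with start and end times in seconds, and that the audio is available as mono float samples in the range -1.0 to 1.0. Volume is approximated by RMS level in dBFS and pitch (one ingredient of tone and inflection) by a plain autocorrelation estimate; these measures, and names such as `capture_dynamics` and `WordDynamics`, are illustrative choices and are not taken from the patent.

```python
import math
from dataclasses import dataclass


@dataclass
class WordDynamics:
    word: str
    start: float      # seconds, as reported by the (hypothetical) recognizer
    end: float        # seconds
    volume_db: float  # RMS level over the word's span, in dBFS
    pitch_hz: float   # rough autocorrelation pitch estimate; 0.0 if none found


def rms_dbfs(samples):
    """RMS level relative to full scale; samples are assumed to lie in [-1.0, 1.0]."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def autocorr_pitch(samples, rate, fmin=60.0, fmax=400.0):
    """Return rate / lag for the lag with the largest autocorrelation in [fmin, fmax]."""
    min_lag = int(rate / fmax)
    max_lag = min(int(rate / fmin), len(samples) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        val = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return rate / best_lag if best_lag else 0.0


def capture_dynamics(samples, rate, words):
    """Measure dynamics over each recognized word's time span and keep them with the word."""
    merged = []
    for word, start, end in words:
        span = samples[int(start * rate):int(end * rate)]
        merged.append(WordDynamics(word, start, end,
                                   rms_dbfs(span), autocorr_pitch(span, rate)))
    return merged


if __name__ == "__main__":
    rate = 8000
    # Synthetic stand-in for recorded speech: a quiet 200 Hz tone, then a loud 100 Hz tone.
    samples = [0.1 * math.sin(2 * math.pi * 200 * i / rate) for i in range(1600)]
    samples += [0.5 * math.sin(2 * math.pi * 100 * i / rate) for i in range(1600)]
    for w in capture_dynamics(samples, rate, [("hello", 0.0, 0.2), ("world", 0.2, 0.4)]):
        print(f"{w.word}: {w.volume_db:.1f} dBFS, {w.pitch_hz:.0f} Hz")
```

Because the measurements are taken over the same time spans the recognizer assigned to each word, the dynamics are recorded alongside, and aligned with, the textual representation. A production system would more likely use a dedicated pitch tracker than this quadratic-time loop.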
24 Claims
1. A method for providing voice dynamics of human utterances converted to and represented by text within a data processing system, said method comprising:
selecting predetermined parameters for recognition and representation of dynamics in human utterances;
creating an enhanced human speech recognition software program implementing said predetermined parameters on a data processing system, wherein said enhanced software program includes an ability to monitor and record human voice dynamics and provide speech-to-text recognition;
capturing said dynamics in a human utterance utilizing said enhanced human speech recognition software;
converting said human utterance into a textual representation utilizing said speech-to-text ability of said software; and
merging said dynamics along with said textual representation of the human utterance to produce a marked-up text document on said data processing system.
2. The method of claim 1, further comprising the step of:
transmitting said marked-up document along with said textual representation to an output device of said data processing system.
3. The method of claim 1, further comprising the steps of:
converting said marked-up text file into a voice file utilizing a text-to-speech application; and
providing said voice file with said voice dynamics utilizing an output voice synthesizer to an output device of said data processing system.
4. The method of claim 1, wherein said selecting step further includes the steps of:
determining levels for said speech parameters to represent normal speech patterns; and
creating a range of possible parameter values based on said levels to represent a plurality of voice dynamics, wherein each point within said range corresponds to a specific representation of a given voice dynamic.
5. The method of claim 1, wherein said capturing step captures a plurality of human voice dynamics including tone, emphasis, inflection and volume.
6. The method of claim 1, wherein said capturing step further records the voice dynamics concurrently with the textual representation of said human utterance.
7. The method of claim 1, whereby said merging step further overlays said textual representation of said human utterance with its corresponding dynamics to provide a visual representation of the dynamics, wherein said visual representation is composed of predefined characteristics which include bolding, italic, hyphenation, and strictly textual cues to the dynamics associated with said human utterance.
8. The method of claim 1, wherein said merging step utilizes a set of tagging information, and further wherein said tagging information includes a set of Extensible Markup Language (XML) tags which utilize a document type definition (DTD) for speech.
9. A system for providing voice dynamics of human utterances converted to and represented by text within a data processing system, said system comprising:
means for selecting predetermined parameters for recognition and representation of dynamics in human utterances;
means for creating an enhanced human speech recognition software program implementing said predetermined parameters on a data processing system, wherein said enhanced software program includes an ability to monitor and record human voice dynamics and provide speech-to-text recognition;
means for capturing said dynamics in a human utterance utilizing said enhanced human speech recognition software;
means for converting said human utterance into a textual representation utilizing said speech-to-text ability of said software; and
means for merging said dynamics along with said textual representation of the human utterance to produce a marked-up text document on said data processing system.
11. The system of claim 9, further comprising:
means for converting said marked-up text file into a voice file utilizing a text-to-speech application; and
means for providing said voice file with said voice dynamics utilizing an output voice synthesizer to an output device of said data processing system.
12. The system of claim 9, wherein said selecting means further includes:
means for determining levels for said speech parameters to represent normal speech patterns; and
means for creating a range of possible parameter values based on said levels to represent a plurality of voice dynamics, wherein each point within said range corresponds to a specific representation of a given voice dynamic.
13. The system of claim 9, wherein said capturing means captures a plurality of human voice dynamics including tone, emphasis, inflection and volume.
14. The system of claim 9, wherein said capturing means further records the voice dynamics concurrently with the textual representation of said human utterance.
15. The system of claim 9, whereby said merging means further overlays said textual representation of said human utterance with its corresponding dynamics to provide a visual representation of the dynamics, wherein said visual representation is composed of predefined characteristics which include bolding, italic, hyphenation, and strictly textual cues to the dynamics associated with said human utterance.
16. The system of claim 9, wherein said merging means utilizes a set of tagging information, and further wherein said tagging information includes a set of Extensible Markup Language (XML) tags which utilize a document type definition (DTD) for speech.
17. A computer program product for providing voice dynamics of human utterances converted to and represented by text within a data processing system, said program product comprising:
a storage medium;
program instructions stored on said storage medium for:
selecting predetermined parameters for recognition and representation of dynamics in human utterances;
creating an enhanced human speech recognition software program implementing said predetermined parameters on a data processing system, wherein said enhanced software program includes an ability to monitor and record human voice dynamics and provide speech-to-text recognition;
capturing said dynamics in a human utterance utilizing said enhanced human speech recognition software;
converting said human utterance into a textual representation utilizing said speech-to-text ability of said software; and
merging said dynamics along with said textual representation of the human utterance to produce a marked-up text document on said data processing system.
19. The computer program product of claim 17, further comprising program instructions for:
converting said marked-up text file into a voice file utilizing a text-to-speech application; and
providing said voice file with said voice dynamics utilizing an output voice synthesizer to an output device of said data processing system.
20. The computer program product of claim 17, wherein said program instructions for said selecting step further include program instructions for:
determining levels for said speech parameters to represent normal speech patterns; and
creating a range of possible parameter values based on said levels to represent a plurality of voice dynamics, wherein each point within said range corresponds to a specific representation of a given voice dynamic.
21. The computer program product of claim 17, wherein said program instructions for said capturing step include program instructions for capturing a plurality of human voice dynamics including tone, emphasis, inflection and volume.
22. The computer program product of claim 17, wherein said program instructions for said capturing step further permit recording the voice dynamics concurrently with the textual representation of said human utterance.
23. The computer program product of claim 17, whereby said program instructions for said merging step further include program instructions to overlay said textual representation of said human utterance with its corresponding dynamics to provide a visual representation of the dynamics, wherein said visual representation is composed of predefined characteristics which include bolding, italic, hyphenation, and strictly textual cues to the dynamics associated with said human utterance.
24. The computer program product of claim 17, wherein said program instructions for said merging step utilize a set of tagging information, and further wherein said tagging information includes a set of Extensible Markup Language (XML) tags which utilize a document type definition (DTD) for speech.
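The following Python sketch illustrates, under stated assumptions, how the baseline-and-range idea of claims 4, 12 and 20 could feed the tagging of claims 8, 16 and 24 and the textual cues of claims 7, 15 and 23. It takes hypothetical per-word (word, volume in dB, pitch in Hz) measurements, treats the speaker's mean as the "normal" level and one population standard deviation as the width of the normal band, and emits tags modeled on the W3C Speech Synthesis Markup Language (SSML) `<speak>` and `<prosody>` elements. SSML is a stand-in chosen for illustration; the claims only call for XML tags tied to a type definition for speech, and this sketch neither generates nor validates one. The function names and the bold/italic mapping are likewise hypothetical.

```python
import statistics
import xml.etree.ElementTree as ET

# Category names borrowed from SSML <prosody> values; an illustrative assumption.
VOLUME_LABELS = ("soft", "medium", "loud")
PITCH_LABELS = ("low", "medium", "high")


def baseline(values):
    """Speaker's 'normal' level (mean) and spread (population std. dev.) for one parameter."""
    return statistics.mean(values), statistics.pstdev(values)


def classify(value, mean, spread, labels, k=1.0):
    """Map a value to the band below, within, or above mean +/- k * spread."""
    if spread == 0 or abs(value - mean) <= k * spread:
        return labels[1]
    return labels[2] if value > mean else labels[0]


def label_words(words):
    """words: list of (word, volume_db, pitch_hz) -> list of (word, volume_label, pitch_label)."""
    v_mean, v_sd = baseline([w[1] for w in words])
    p_mean, p_sd = baseline([w[2] for w in words])
    return [(word,
             classify(vol, v_mean, v_sd, VOLUME_LABELS),
             classify(pitch, p_mean, p_sd, PITCH_LABELS))
            for word, vol, pitch in words]


def _append_text(parent, text):
    """Append plain text after the last child element, or to parent.text if there is none."""
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def to_xml(labeled):
    """Merge words and their dynamics into SSML-style markup; normal words stay untagged."""
    speak = ET.Element("speak")
    for i, (word, volume, pitch) in enumerate(labeled):
        if i:
            _append_text(speak, " ")
        attrs = {}
        if volume != "medium":
            attrs["volume"] = volume
        if pitch != "medium":
            attrs["pitch"] = pitch
        if attrs:
            ET.SubElement(speak, "prosody", attrs).text = word
        else:
            _append_text(speak, word)
    return ET.tostring(speak, encoding="unicode")


def to_visual(labeled):
    """Purely textual cues: **bold** for loud words, _italics_ for high-pitched words."""
    out = []
    for word, volume, pitch in labeled:
        if volume == "loud":
            word = f"**{word}**"
        if pitch == "high":
            word = f"_{word}_"
        out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    # Hypothetical measurements for "I said NO today" with "NO" spoken louder and higher.
    words = [("I", -20.0, 120.0), ("said", -21.0, 118.0),
             ("NO", -8.0, 190.0), ("today", -20.5, 119.0)]
    labeled = label_words(words)
    print(to_xml(labeled))     # <speak>I said <prosody volume="loud" pitch="high">NO</prosody> today</speak>
    print(to_visual(labeled))  # I said _**NO**_ today
```

With one standard deviation as the band width, only "NO" falls outside the normal range on both parameters, so it alone is tagged and decorated. A different `k`, more than three bands, or a per-speaker baseline learned from earlier utterances would be equally plausible readings of "a range of possible parameter values"; the claims do not fix one.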
Specification