Capture and application of sender voice dynamics to enhance communication in a speech-to-text environment

US 6,175,820 B1
Filed: 01/28/1999
Issued: 01/16/2001
Est. Priority Date: 01/28/1999
Status: Expired due to Fees

First Claim


1. A method for providing voice dynamics of human utterances converted to and represented by text within a data processing system, said method comprising:

selecting predetermined parameters for recognition and representation of dynamics in human utterances;

creating an enhanced human speech recognition software program implementing said predetermined parameters on a data processing system, wherein said enhanced software program includes an ability to monitor and record human voice dynamics and provide speech-to-text recognition;

capturing said dynamics in a human utterance utilizing said enhanced human speech recognition software;

converting said human utterance into a textual representation utilizing said speech-to-text ability of said software; and

merging said dynamics along with said textual representation of the human utterance to produce a marked-up text document on said data processing system.
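The claim does not say how voice dynamics are measured. Below is a minimal sketch of the capturing step. It assumes volume can be approximated by each audio frame's RMS level and tone by an autocorrelation pitch estimate. The 30 ms frame, the 75 to 400 Hz search range, the 0.3 voicing threshold, and all names are illustrative assumptions, not details taken from the patent.

```python
# Illustrative sketch: the patent does not specify these measurements.
import math
from dataclasses import dataclass


@dataclass
class FrameDynamics:
    start_s: float    # frame start time in seconds
    volume_db: float  # RMS level in dB relative to full scale (1.0)
    pitch_hz: float   # crude fundamental-frequency estimate; 0.0 if unvoiced


def rms_db(frame):
    """Root-mean-square level of one frame, in dBFS, floored at -120 dB."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20 * math.log10(rms) if rms > 0 else -120.0


def autocorr_pitch(frame, rate, fmin=75.0, fmax=400.0, threshold=0.3):
    """Return rate/lag for the lag in [rate/fmax, rate/fmin] with the highest
    energy-normalized autocorrelation, or 0.0 if none reaches `threshold`."""
    energy = sum(x * x for x in frame)
    if energy == 0:
        return 0.0
    best_lag, best_r = 0, 0.0
    for lag in range(int(rate / fmax), min(int(rate / fmin), len(frame) - 1) + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return rate / best_lag if best_lag and best_r >= threshold else 0.0


def analyze(samples, rate, frame_ms=30):
    """Split mono samples (floats in [-1, 1]) into non-overlapping frames and
    measure volume and pitch for each one."""
    n = rate * frame_ms // 1000
    return [
        FrameDynamics(start / rate, rms_db(samples[start:start + n]),
                      autocorr_pitch(samples[start:start + n], rate))
        for start in range(0, len(samples) - n + 1, n)
    ]
```

Production systems would use a more robust pitch tracker. This plain autocorrelation is chosen only because it is short and easy to check against a pure tone.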


2 Assignments


0 Petitions


Abstract

A method is provided for providing the voice dynamics of human utterances that are converted to and represented by text within a data processing system. A plurality of predetermined parameters for recognizing and representing dynamics in human utterances are selected. An enhanced human speech recognition software program that implements the predetermined parameters is created on a data processing system. The enhanced software program can monitor and record human voice dynamics and provide speech-to-text recognition. The dynamics in a human utterance are captured using the enhanced speech recognition software, and the utterance is converted into a textual representation using the software's speech-to-text ability. Finally, the dynamics are merged with the textual representation of the utterance to produce a marked-up text document on the data processing system.
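The abstract leaves open how captured dynamics are lined up with the recognized text. One plausible approach is sketched below. It assumes a hypothetical recognizer interface that reports a start and end time for every word, and it averages the per-frame measurements falling inside each word's time span. The names and the (time, volume, pitch) frame layout are assumptions for illustration.

```python
# Hypothetical alignment step; the word timings and frame format are assumed.
from dataclasses import dataclass
from statistics import mean


@dataclass
class Word:
    text: str
    start_s: float  # word start time reported by the recognizer
    end_s: float    # word end time (exclusive)


@dataclass
class AnnotatedWord:
    text: str
    volume_db: float  # mean frame level over the word; -120.0 if no frames
    pitch_hz: float   # mean pitch over voiced frames; 0.0 if none were voiced


def merge(words, frames):
    """Attach dynamics to each recognized word.

    `frames` is a list of (time_s, volume_db, pitch_hz) tuples recorded
    concurrently with recognition; a frame belongs to a word when its
    timestamp falls in [start_s, end_s).
    """
    merged = []
    for w in words:
        inside = [f for f in frames if w.start_s <= f[0] < w.end_s]
        voiced = [f[2] for f in inside if f[2] > 0]  # skip unvoiced frames
        merged.append(AnnotatedWord(
            w.text,
            mean(f[1] for f in inside) if inside else -120.0,
            mean(voiced) if voiced else 0.0,
        ))
    return merged
```

Averaging per word is the simplest choice. Finer cues such as a pitch rise on the last syllable would need the frame sequence kept rather than collapsed to one value.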


24 Claims

1. A method for providing voice dynamics of human utterances converted to and represented by text within a data processing system, said method comprising:
- selecting predetermined parameters for recognition and representation of dynamics in human utterances;
  
  creating an enhanced human speech recognition software program implementing said predetermined parameters on a data processing system, wherein said enhanced software program includes an ability to monitor and record human voice dynamics and provide speech-to-text recognition;
  
  capturing said dynamics in a human utterance utilizing said enhanced human speech recognition software;
  
  converting said human utterance into a textual representation utilizing said speech-to-text ability of said software; and
  
  merging said dynamics along with said textual representation of the human utterance to produce a marked-up text document on said data processing system.
2. The method of claim 1, further comprising the steps of:
3. The method of claim 1, further comprising the steps of:
- converting said marked-up text file into a voice file utilizing a text-to-speech application; and
  
  providing said voice file with said voice dynamics utilizing an output voice synthesizer to an output device of said data processing system.
4. The method of claim 1, wherein said selecting step further includes the steps of:
- determining levels for said speech parameters to represent normal speech patterns; and
  
  creating a range of possible parameter values based on said levels to represent a plurality of voice dynamics wherein each point within said range corresponds to a specific representation of a given voice dynamic.
5. The method of claim 1, wherein said capturing step captures a plurality of human voice dynamics including tone, emphasis, inflection and volume.
6. The method of claim 1, wherein said capturing step further records the voice dynamics concurrently with the textual representation of said human utterance.
7. The method of claim 1, whereby said merging step further overlays said textual representation of said human utterance with its corresponding dynamics to provide a visual representation of the dynamics, wherein said visual representation is composed of predefined characteristics which include bolding, italic, hyphenation, and strictly textual cues to the dynamics associated with said human utterance.
8. The method of claim 1, wherein said merging step utilizes a set of tagging information, and further wherein said tagging information includes a set of extended markup language (XML) tags which utilize a data type definition for speech.
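Claim 8 calls for XML tags that use a definition for speech. XML is normally expanded as Extensible Markup Language, and a DTD as a document type definition. The claims do not define the tag set, so the element names `utterance` and `w`, the attribute names, and the DTD file name below are hypothetical. The sketch only shows one way per-word dynamics could be carried as attributes, using Python's standard `xml.etree.ElementTree`.

```python
# Hypothetical tag set: <utterance> holds one <w> per word, dynamics as attributes.
import xml.etree.ElementTree as ET

# Hypothetical DTD reference; ElementTree writes it but does not validate it.
DOCTYPE = '<!DOCTYPE utterance SYSTEM "speech-dynamics.dtd">'


def to_marked_up_xml(words):
    """Serialize (text, {dynamic: level}) pairs as one <utterance> element
    whose <w> children carry the dynamics as attributes."""
    root = ET.Element("utterance")
    for text, dynamics in words:
        # Sort attributes so the output is stable regardless of dict order.
        w = ET.SubElement(root, "w", dict(sorted(dynamics.items())))
        w.text = text
    return DOCTYPE + "\n" + ET.tostring(root, encoding="unicode")
```

For example, a word spoken loudly with strong emphasis serializes as `<w emphasis="strong" volume="loud">NO</w>`, while a word with no notable dynamics is a bare `<w>` element, so the plain transcript remains recoverable from the element text alone.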

9. A system for providing voice dynamics of human utterances converted to and represented by text within a data processing system, said system comprising:
- means for selecting predetermined parameters for recognition and representation of dynamics in human utterances;
  
  means for creating an enhanced human speech recognition software program implementing said predetermined parameters on a data processing system, wherein said enhanced software program includes an ability to monitor and record human voice dynamics and provide speech-to-text recognition;
  
  means for capturing said dynamics in a human utterance utilizing said enhanced human speech recognition software;
  
  means for converting said human utterance into a textual representation utilizing said speech-to-text ability of said software; and
  
  means for merging said dynamics along with said textual representation of the human utterance to produce a marked-up text document on said data processing system.
10. The system of claim 9, further comprising means for transmitting said marked-up document along with said textual representation to an output device of said data processing system.
11. The system of claim 9, further comprising:
12. The system of claim 9, wherein said selecting means further includes:
- means for determining levels for said speech parameters to represent normal speech patterns; and
  
  means for creating a range of possible parameter values based on said levels to represent a plurality of voice dynamics wherein each point within said range corresponds to a specific representation of a given voice dynamic.
13. The system of claim 9, wherein said capturing means captures a plurality of human voice dynamics including tone, emphasis, inflection and volume.
14. The system of claim 9, wherein said capturing means further records the voice dynamics concurrently with the textual representation of said human utterance.
15. The system of claim 9, whereby said merging means further overlays said textual representation of said human utterance with its corresponding dynamics to provide a visual representation of the dynamics, wherein said visual representation is composed of predefined characteristics which include bolding, italic, hyphenation, and strictly textual cues to the dynamics associated with said human utterance.
16. The system of claim 9, wherein said merging means utilizes a set of tagging information, and further wherein said tagging information includes a set of extended markup language (XML) tags which utilize a data type definition for speech.
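Claims 4, 12 and 20 describe determining levels that represent normal speech and then a range of values in which each point corresponds to a particular dynamic. Below is a minimal sketch under two assumptions. First, the "levels" are taken to be the mean and population standard deviation of a calibration sample of the speaker's ordinary speech. Second, the range is divided into z-score bands. The cut points, labels, and class name are hypothetical.

```python
# Hypothetical banding of one parameter (e.g. volume in dB) around a baseline.
from bisect import bisect_right
from statistics import mean, pstdev


class DynamicRange:
    """Baseline 'normal speech' level for one parameter plus a banded range
    around it; each band maps to one label."""

    def __init__(self, baseline, cuts=(-2.0, -1.0, 1.0, 2.0),
                 labels=("very low", "low", "normal", "high", "very high")):
        if len(labels) != len(cuts) + 1:
            raise ValueError("need exactly one more label than cut points")
        self.mu = mean(baseline)              # the speaker's normal level
        self.sigma = pstdev(baseline) or 1.0  # guard against a flat baseline
        self.cuts, self.labels = cuts, labels

    def z(self, value):
        """Position of `value` in the range, in standard deviations from normal."""
        return (value - self.mu) / self.sigma

    def classify(self, value):
        """Label of the band that `value` falls into."""
        return self.labels[bisect_right(self.cuts, self.z(value))]
```

Measuring relative to each speaker's own baseline means a naturally loud voice is not marked as shouting throughout. One `DynamicRange` would be kept per parameter (volume, pitch, and so on).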

17. A computer program product for providing voice dynamics of human utterances converted to and represented by text within a data processing system, said program product comprising:
- a storage medium;
  
  program instructions stored on said storage medium for:
  
  selecting predetermined parameters for recognition and representation of dynamics in human utterances;
  
  creating an enhanced human speech recognition software program implementing said predetermined parameters on a data processing system, wherein said enhanced software program includes an ability to monitor and record human voice dynamics and provide speech-to-text recognition;
  
  capturing said dynamics in a human utterance utilizing said enhanced human speech recognition software;
  
  converting said human utterance into a textual representation utilizing said speech-to-text ability of said software; and
  
  merging said dynamics along with said textual representation of the human utterance to produce a marked-up text document on said data processing system.
18. The computer program product of claim 17, further comprising program instructions for transmitting said marked-up document along with said textual representation to an output device of said data processing system.
19. The computer program product of claim 17, further comprising program instructions for:
20. The computer program product of claim 17, wherein said program instructions for said selecting step further include program instructions for:
- determining levels for said speech parameters to represent normal speech patterns; and
  
  creating a range of possible parameter values based on said levels to represent a plurality of voice dynamics wherein each point within said range corresponds to a specific representation of a given voice dynamic.
21. The computer program product of claim 17, wherein said program instructions for said capturing step include program instructions for capturing a plurality of human voice dynamics including tone, emphasis, inflection and volume.
22. The computer program product of claim 17, wherein said program instructions for said capturing step further permit recording the voice dynamics concurrently with the textual representation of said human utterance.
23. The computer program product of claim 17, whereby said program instructions for said merging step further include program instructions to overlay said textual representation of said human utterance with its corresponding dynamics to provide a visual representation of the dynamics, wherein said visual representation is composed of predefined characteristics which include bolding, italic, hyphenation, and strictly textual cues to the dynamics associated with said human utterance.
24. The computer program product of claim 17, wherein said program instructions for said merging step utilize a set of tagging information, and further wherein said tagging information includes a set of extended markup language (XML) tags which utilize a data type definition for speech.
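Claims 7, 15 and 23 list bolding, italics, hyphenation, and strictly textual cues as the visual overlay. The claims do not fix which dynamic maps to which cue, so the mapping below is a hypothetical one, and Markdown is assumed as the output format. Slow, drawn-out words are hyphenated letter by letter, rising pitch becomes italic, loud words become bold, and soft words get a bracketed cue.

```python
# Hypothetical mapping from dynamic labels to Markdown-style visual cues.
def overlay(word, dynamics):
    """Apply textual cues for one word according to its dynamic labels."""
    text = word
    if dynamics.get("rate") == "slow":
        text = "-".join(text)          # "no" -> "n-o", a drawn-out word
    if dynamics.get("pitch") == "rising":
        text = f"*{text}*"             # Markdown italic
    if dynamics.get("volume") == "loud":
        text = f"**{text}**"           # Markdown bold
    elif dynamics.get("volume") == "soft":
        text = f"{text} [softly]"      # strictly textual cue
    return text


def render(words):
    """Join (text, dynamics) pairs into one Markdown line."""
    return " ".join(overlay(w, d) for w, d in words)
```

For instance, `render([("said", {"pitch": "rising"}), ("no", {"volume": "loud", "rate": "slow"})])` returns `"*said* **n-o**"`. Cues are applied innermost first, so a word that is both loud and rising renders as bold italic.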


Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
International Business Machines Corporation
Inventors
Dietz, Timothy Alan
Primary Examiner(s)
Hudspeth, David R.
Assistant Examiner(s)
Azad, Abul K.

Application Number

US09/238,809
Time in Patent Office

719 Days
Field of Search

704/235, 704/260, 704/278, 704/276
US Class Current

704/235
CPC Class Codes

G10L 15/063   Training

G10L 15/1807   using prosody or stress

G10L 25/15   the extracted parameters be...
