Simultaneous multi-user real-time speech recognition system

US 7,047,192 B2
Filed: 06/27/2001
Issued: 05/16/2006
Est. Priority Date: 06/28/2000
Status: Expired due to Term



Abstract

This invention is a combination of software and hardware components and methodologies that enable speech recognition for multiple users simultaneously. It introduces the concept of a “conversational voice log” and describes how voice logs are combined to represent the spoken words of a meeting or group conversation. It defines the components needed, the command set for control, the text output features, and the usage of such a system.
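The abstract does not say how individual voice logs are combined. As a minimal sketch only, assuming each speaker's log is a list of time-stamped utterances already sorted by start time, the logs could be merged chronologically into one conversation log. The `Utterance` type, the `merge_voice_logs` function, and the sample data are hypothetical names chosen for illustration; none of them come from the patent.

```python
from dataclasses import dataclass
import heapq


@dataclass(frozen=True)
class Utterance:
    start: float   # seconds from the start of the meeting (assumed time base)
    speaker: str
    text: str


def merge_voice_logs(*logs):
    """Merge per-speaker logs, each sorted by start time, into one chronological log."""
    return list(heapq.merge(*logs, key=lambda u: u.start))


# Hypothetical sample data: one voice log per participant.
alice = [Utterance(0.0, "Alice", "Shall we begin?"),
         Utterance(5.2, "Alice", "Great.")]
bob = [Utterance(2.1, "Bob", "Yes, I am ready.")]

for u in merge_voice_logs(alice, bob):
    print(f"[{u.start:5.1f}s] {u.speaker}: {u.text}")
```

A streaming merge is used here because each input is assumed to be sorted already; if that assumption does not hold, sorting the concatenated logs by `start` would serve the same purpose.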


2 Claims

1. A system for creating and enhancing a transcript of a telephone conversation in a telephone call between two separate persons, the conversation consisting of a series of audio statements each of which is spoken by one of the persons, comprising:
- (a) a first telephone receiver adapted to receive a series of audio statements from the first person and convert them into a first analog audio signal,
- (b) a second telephone receiver adapted to receive a series of audio statements from the second person and convert them into a second analog audio signal,
- (c) an analog-to-digital converter adapted to convert the first analog audio signal to a first digital audio signal, and the second analog audio signal to a second digital audio signal,
- (d) a splitter that duplicates the first digital audio signal into two identical digital audio signals, a third digital audio signal and a fourth digital audio signal, respectively, and that duplicates the second digital audio signal into two identical digital audio signals, a fifth digital audio signal and a sixth digital audio signal, respectively,
- (e) a first divider for dividing the third digital audio signal into audio segments to form a third segmented digital audio signal wherein each audio segment is time indexed, and the audio segments of the third segmented digital audio signal are bounded by two ascertainable events, said events being selected from the group of events comprising when one of the persons makes a telephone call, when a second person answers the phone call, when a person starts or stops speaking during the telephone call, when a second person speaks while the first person is speaking (considered as three separate events) during the telephone call, when audio volume of one of the audio signals increases (either mechanically or by a person raising the loudness of their voice), when audio volume of one of the audio signals decreases, when a button on a phone keypad is pressed, when a phone line is muted or unmuted, when a collect call is accepted, when a specific word or phrase is spoken, when a playback of an automatic recorded message occurs, when a phone number is verified, and when actions are taken based on a recorded message,
- (f) a second divider for dividing the fifth digital audio signal into audio segments to form a fifth segmented digital audio signal, wherein each audio segment is time indexed, and the audio segments of the fifth segmented digital audio signal are bounded by two ascertainable events, said events being selected from the group of events comprising when one of the persons makes a telephone call, when a second person answers the phone call, when a person starts or stops speaking during the telephone call, when a second person speaks while the first person is speaking (considered as three separate events) during the telephone call, when audio volume of one of the audio signals increases (either mechanically or by a person raising the loudness of their voice), when audio volume of one of the audio signals decreases, when a button on a phone keypad is pressed, when a phone line is muted or unmuted, when a collect call is accepted, when a specific word or phrase is spoken, when a playback of an automatic recorded message occurs, when a phone number is verified, and when actions are taken based on a recorded message,
- (g) a first audio storage device adapted to store the third segmented digital audio signal,
- (h) a second audio storage device adapted to store the fifth segmented digital audio signal,
- (i) a first audio-to-text converter adapted to transcribe the fourth digital audio signal to a first raw transcript of the conversation, the first raw transcript including a plurality of text words, each text word in the fourth digital audio signal being indexed to the audio segment in the third segmented audio signal to which it relates,
- (j) a first text storage device adapted to store the first raw transcript produced by the first audio-to-text converter,
- (k) a first text-to-audio associator adapted to associate each text word in the first raw transcript directly with the audio segment from which the text word was transcribed,
- (l) a viewer adapted to display the text located in the first text storage device, in the form of the first raw transcript of the conversation,
- (m) a highlighter associated with the viewer and adapted to specify a first specific text word in the first raw transcript displayed in the viewer,
- (n) an audio player associated with the viewer and adapted to employ the text-to-audio associator to audibility play back the audio segment associated with the first specific text word,
- (o) a manual editor associated with the viewer and adapted to correct the first specific text word, based on the use of the audio player, to enhance the first raw transcript to a first enhanced transcript, and
- (p) a second text storage device adapted to store the first enhanced transcript produced by the manual editor.
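Elements (e), (i) and (k) of claim 1 describe time-indexed audio segments bounded by pairs of events, with each transcribed word indexed to the segment it came from. The sketch below is one possible reading, not the patented implementation: it assumes events arrive as sorted `(time, name)` pairs, that consecutive events bound one segment, and that the recognizer supplies a time stamp per word. `Segment`, `divide`, `associate`, and the sample values are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    index: int
    start: float          # time index of the opening event, in seconds
    end: float            # time index of the closing event, in seconds
    opening_event: str
    closing_event: str


def divide(events):
    """Turn sorted (time, name) events into segments bounded by consecutive events."""
    return [Segment(i, t0, t1, e0, e1)
            for i, ((t0, e0), (t1, e1)) in enumerate(zip(events, events[1:]))]


def associate(words, segments):
    """Map each (time, text) word to the index of the segment covering [start, end)."""
    starts = [s.start for s in segments]
    pairs = []
    for t, text in words:
        i = bisect_right(starts, t) - 1
        inside = i >= 0 and t < segments[i].end
        pairs.append((text, segments[i].index if inside else None))
    return pairs


# Hypothetical events and recognizer output for one side of a call.
events = [(0.0, "call answered"),
          (1.0, "first person starts speaking"),
          (3.5, "first person stops speaking"),
          (6.0, "keypad button pressed")]
words = [(1.2, "hello"), (2.0, "this"), (4.0, "okay")]

segments = divide(events)
print(associate(words, segments))
```

Words whose time stamp falls outside every segment are mapped to `None` rather than guessed, so a later playback step can tell that no audio is associated with them.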

2. A method for creating and enhancing a transcript of a telephone conversation in a telephone call between two separate persons, the conversation consisting of a series of audio statements each of which is spoken by one of the persons, comprising:
- (a) using a first telephone receiver to receive a series of audio statements from the first person and convert them into a first analog audio signal,
- (b) using a second telephone receiver to receive a series of audio statements from the second person and convert them into a second analog audio signal,
- (c) using an analog-to-digital converter to convert the first analog audio signal to a first digital audio signal, and the second analog audio signal to a second digital audio signal,
- (d) using a splitter to duplicate the first digital audio signal into two identical digital audio signals, a third digital audio signal and a fourth digital audio signal, respectively, and to duplicate the second digital audio signal into two identical digital audio signals, a fifth digital audio signal and a sixth digital audio signal, respectively,
- (e) using a first divider to divide the third digital audio signal into audio segments to form a third segmented digital audio signal wherein each audio segment is time indexed, and the audio segments of the third segmented digital audio signal are bounded by two ascertainable events, said events being selected from the group of events comprising when one of the persons makes a telephone call, when a second person answers the phone call, when a person starts or stops speaking during the telephone call, when a second person speaks while the first person is speaking (considered as three separate events) during the telephone call, when audio volume of one of the audio signals increases (either mechanically or by a person raising the loudness of their voice), when audio volume of one of the audio signals decreases, when a button on a phone keypad is pressed, when a phone line is muted or unmuted, when a collect call is accepted, when a specific word or phrase is spoken, when a playback of an automatic recorded message occurs, when a phone number is verified, and when actions are taken based on a recorded message,
- (f) using a second divider to divide the fifth digital audio signal into audio segments to form a fifth segmented digital audio signal, wherein each audio segment is time indexed, and the audio segments of the fifth segmented digital audio signal are bounded by two ascertainable events, said events being selected from the group of events comprising when one of the persons makes a telephone call, when a second person answers the phone call, when a person starts or stops speaking during the telephone call, when a second person speaks while the first person is speaking (considered as three separate events) during the telephone call, when audio volume of one of the audio signals increases (either mechanically or by a person raising the loudness of their voice), when audio volume of one of the audio signals decreases, when a button on a phone keypad is pressed, when a phone line is muted or unmuted, when a collect call is accepted, when a specific word or phrase is spoken, when a playback of an automatic recorded message occurs, when a phone number is verified, and when actions are taken based on a recorded message,
- (g) using a first audio storage device to store the third segmented digital audio signal,
- (h) using a second audio storage device to store the fifth segmented digital audio signal,
- (i) using a first audio-to-text converter to transcribe the fourth digital audio signal to a first raw transcript of the conversation, the first raw transcript including a plurality of text words, each text word in the fourth digital audio signal being indexed to the audio segment in the third segmented audio signal to which it relates,
- (j) using a first text storage device to store the first raw transcript produced by the first audio-to-text converter,
- (k) using a first text-to-audio associator to associate each text word in the first raw transcript directly with the audio segment from which the text word was transcribed,
- (l) using a viewer to display the text located in the first text storage device, in the form of the first raw transcript of the conversation,
- (m) using a highlighter associated with the viewer to specify a first specific text word in the first raw transcript displayed in the viewer,
- (n) using an audio player associated with the viewer to employ the text-to-audio associator to audibility play back the audio segment associated with the first specific text word,
- (o) using a manual editor associated with the viewer to correct the first specific text word, based on the use of the audio player, to enhance the first raw transcript to a first enhanced transcript, and
- (p) using a second text storage device to store the first enhanced transcript produced by the manual editor.
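Steps (m) through (p) of claim 2 describe a review loop: select a word in the raw transcript, replay the audio segment tied to it, correct the word, and store the result as a separate enhanced transcript. The following is a rough sketch under stated assumptions, not the method as the patent implements it: a transcript is modeled as a tuple of words that each carry a segment index, and audio playback is reduced to a lookup of that index. `Word`, `segment_for`, `correct_word`, and the sample transcript are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Word:
    text: str
    segment_index: int   # audio segment the word was transcribed from


def segment_for(transcript, position):
    """Return the segment index a player would replay for the highlighted word."""
    return transcript[position].segment_index


def correct_word(raw, position, corrected_text):
    """Return an enhanced transcript with one word corrected; the raw one is untouched."""
    enhanced = list(raw)
    enhanced[position] = replace(raw[position], text=corrected_text)
    return tuple(enhanced)


# Hypothetical raw transcript containing one recognition error ("too").
raw = (Word("send", 0), Word("the", 0), Word("invoice", 1),
       Word("too", 1), Word("Bob", 1))

highlighted = 3                                  # reviewer selects the fourth word
print("replay segment", segment_for(raw, highlighted))
enhanced = correct_word(raw, highlighted, "to")
print(" ".join(w.text for w in enhanced))
```

Keeping the raw and enhanced transcripts as separate immutable values mirrors the claim's distinction between the first and second text storage devices, and the corrected word retains its link to the original audio segment.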


Current Assignee: SynKloud Technologies, LLC (SY Ventures)
Original Assignee: Darrell A. Poirier
Inventors: Poirier, Darrell A.
Primary Examiner(s): CHAWAN, VIJAY B
Application Number: US09/893,171
Publication Number: US 20020049589A1
Time in Patent Office: 1,784 Days
Field of Search: 704/231, 704/270, 704/235, 704/270.1, 704/278, 704/251, 704/276, 714/44, 714/47, 361/685, 361/709
US Class Current: 704/235
CPC Class Codes:
G10L 15/26 Speech to text systems G10L...
G10L 15/34 Adaptation of a single reco...
