Transcription system for multiple speakers, using and establishing identification

US 6,332,122 B1
Filed: 06/23/1999
Issued: 12/18/2001
Est. Priority Date: 06/23/1999
Status: Expired due to Term

2 Assignments

0 Petitions

Abstract

A method and apparatus for transcribing text from multiple speakers in a computer system having a speech recognition application. The system receives speech from one of a plurality of speakers through a single channel, assigns a speaker ID to the speaker, transcribes the speech into text, and associates the speaker ID with the speech and text. To detect a speaker change, the system monitors the speech input through the channel.
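The receive, tag, and monitor loop in the abstract can be illustrated with a minimal sketch. It assumes that some upstream front end already splits the single-channel input into segments and supplies, for each one, a speaker feature vector and the recognized text. The speaker-change test used here (cosine distance to the current speaker's reference vector against a fixed threshold) is a simple stand-in chosen for illustration, not the identification technique the patent discloses. The names `Segment`, `tag_speakers`, the `speaker-N` ID format, and the 0.3 threshold are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """One stretch of the single-channel input (hypothetical representation)."""
    features: list[float]  # speaker feature vector from an assumed front end
    text: str              # recognizer output for the same stretch


@dataclass
class TaggedSegment:
    speaker_id: str
    text: str


def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def tag_speakers(segments: list[Segment], threshold: float = 0.3) -> list[TaggedSegment]:
    """Assign a speaker ID to each segment while monitoring for speaker changes."""
    references: dict[str, list[float]] = {}  # speaker ID -> reference features
    current: Optional[str] = None
    tagged: list[TaggedSegment] = []
    for seg in segments:
        # Monitor: does this segment still sound like the current speaker?
        changed = current is None or cosine_distance(seg.features, references[current]) > threshold
        if changed:
            # Speaker change: reuse the closest known speaker if close enough...
            best = min(references, key=lambda sid: cosine_distance(seg.features, references[sid]), default=None)
            if best is not None and cosine_distance(seg.features, references[best]) <= threshold:
                current = best
            else:
                # ...otherwise treat it as a new speaker with a new unique ID.
                current = f"speaker-{len(references) + 1}"
                references[current] = seg.features
        tagged.append(TaggedSegment(current, seg.text))
    return tagged


if __name__ == "__main__":
    segments = [
        Segment([1.0, 0.0], "Good morning."),
        Segment([0.9, 0.1], "Let's begin."),
        Segment([0.0, 1.0], "Thanks."),
        Segment([1.0, 0.05], "First item."),
    ]
    print([t.speaker_id for t in tag_speakers(segments)])
    # ['speaker-1', 'speaker-1', 'speaker-2', 'speaker-1']
```

Matching a changed segment against every known speaker, rather than always minting a new ID, lets a speaker who returns after an interruption keep the same ID.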

232 Citations

36 Claims

1. In a computer system having a text independent speech recognition application, a method of transcribing text from multiple speakers comprising the steps of:
- receiving a speech signal from one of a plurality of speakers through a single channel;
- assigning a unique speaker ID to said speaker providing said speech signal through said channel;
- processing said speech signal into text using a speech recognition model;
- creating a document containing said text;
- associating said processed speech signal and said text in said document with said unique speaker ID assigned to said speaker; and
- monitoring said speech signal for a speaker change to a different one of said plurality of speakers.

Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 22, 23):

2. In a method of transcribing text from multiple speakers as claimed in claim 1, further comprising the step:
3. In a method of transcribing text from multiple speakers as claimed in claim 2, further comprising the step:
   - associating said processed speech signal and said text in said document from said different speaker with said different unique speaker ID.
4. In a method of transcribing text from multiple speakers as claimed in claim 1, wherein said speech signal is buffered.
5. In a method of transcribing text from multiple speakers as claimed in claim 1, wherein at least one of said speakers is an enrolled speaker, and a preassigned unique speaker ID is assigned to said enrolled speaker.
6. In a method of transcribing text from multiple speakers as claimed in claim 1, wherein text in said document which has been processed from portions of said speech signal which can be attributed to one speaker is distinguished from text in said document which has been processed from other portions of said speech signal which can be attributed to different speakers.
7. In a method of transcribing text from multiple speakers as claimed in claim 6, wherein text in said document which can be attributed to different speakers is distinguished by starting a new paragraph in said document for every speaker change.
8. In a method of transcribing text from multiple speakers as claimed in claim 1, wherein at least one of said speakers is an unenrolled speaker.
9. In a method of transcribing text from multiple speakers as claimed in claim 8, wherein a speech signal and corresponding processed text from said unenrolled speaker is used to enroll said speaker.
10. In a method of transcribing text from multiple speakers as claimed in claim 8, wherein at least a portion of said speech signal and corresponding processed text from said unenrolled speaker is used to develop a speaker dependent speech recognition model.
11. In a method of transcribing text from multiple speakers as claimed in claim 10, wherein said speaker dependent model is used to reprocess the text in said document for said unenrolled speaker.
12. In a method of transcribing text from multiple speakers as claimed in claim 1, wherein a different speech recognition model is used to reprocess said text in said document.
22. In a system as claimed in claim 8, wherein at least a portion of said speech signal and corresponding processed text from said unenrolled speaker is used to develop a speaker dependent speech recognition model.
23. In a system as claimed in claim 22, wherein said speaker dependent model is used to reprocess the text in said document for said unenrolled speaker.
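Claims 5 to 7 concern how attributed text appears in the document: enrolled speakers carry a preassigned ID, and text from different speakers is distinguished by starting a new paragraph at every speaker change. A minimal sketch, assuming the transcript is already available as `(speaker_id, text)` pairs in spoken order; the function name `render_document` and the `ID:` prefix on each paragraph are illustrative additions, since claim 7 only requires the paragraph break.

```python
def render_document(tagged: list[tuple[str, str]]) -> str:
    """Build document text from (speaker_id, text) pairs in spoken order.

    A new paragraph starts at every speaker change; each paragraph is
    prefixed with its speaker ID so the attribution stays visible.
    """
    paragraphs: list[list[str]] = []
    previous = None
    for speaker_id, text in tagged:
        if speaker_id != previous:
            paragraphs.append([f"{speaker_id}:"])  # speaker change -> new paragraph
            previous = speaker_id
        paragraphs[-1].append(text)
    return "\n\n".join(" ".join(parts) for parts in paragraphs)


if __name__ == "__main__":
    # "alice" stands in for a preassigned ID of an enrolled speaker (hypothetical);
    # "speaker-2" is an ID assigned on the fly to an unenrolled speaker.
    print(render_document([
        ("alice", "Good morning."),
        ("alice", "Let's begin."),
        ("speaker-2", "Thanks."),
        ("alice", "First item."),
    ]))
```

Consecutive segments from the same speaker are merged into one paragraph, so only a genuine change of speaker produces a break.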

13. In a computer system having a text independent speech recognition application adapted for transcribing text from multiple speakers comprising:
- means for receiving a speech signal from one of a plurality of speakers through a single channel;
- means for assigning a unique speaker ID to said speaker providing said speech signal through said channel;
- means for processing said speech signal into text using a speech recognition model;
- means for creating a document containing said text;
- means for associating said processed speech signal and said text in said document with said unique speaker ID assigned to said speaker; and
- means for monitoring said speech signal for a speaker change to a different one of said plurality of speakers.

Dependent claims (14, 15, 16, 17, 18, 19, 20, 21, 24):

14. In a system as claimed in claim 13, further comprising:
15. In a system as claimed in claim 14, further comprising:
   - means for associating said processed speech signal and said text in said document from said different speaker with said different unique speaker ID.
16. In a system as claimed in claim 13, wherein said speech signal is buffered.
17. In a system as claimed in claim 13, wherein at least one of said speakers is an enrolled speaker, and a preassigned unique speaker ID is assigned to said enrolled speaker.
18. In a system as claimed in claim 13, wherein text in said document which has been processed from portions of said speech signal which can be attributed to one speaker is distinguished from text in said document which has been processed from other portions of said speech signal which can be attributed to different speakers.
19. In a system as claimed in claim 18, wherein text in said document which can be attributed to different speakers is distinguished by starting a new paragraph in said document for every speaker change.
20. In a system as claimed in claim 13, wherein at least one of said speakers is an unenrolled speaker.
21. In a system as claimed in claim 20, wherein a speech signal and corresponding processed text from said unenrolled speaker is used to enroll said speaker.
24. In a system as claimed in claim 13, wherein a different speech recognition model is used to reprocess said text in said document.
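Claims 20 and 21 cover an unenrolled speaker whose speech signal and corresponding processed text are used to enroll that speaker. The bookkeeping side of this can be sketched as follows, assuming audio arrives as lists of samples at a known rate. `Utterance`, `EnrollmentCollector`, and the `min_seconds` data requirement are hypothetical; the actual enrollment step (training a speaker model from the pairs) is outside this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker_id: str     # ID assigned when the speaker was first detected
    audio: list[float]  # raw samples for this segment (assumed format)
    text: str           # recognizer output for the same segment


class EnrollmentCollector:
    """Accumulate paired speech and text per unenrolled speaker ID."""

    def __init__(self, sample_rate: int = 16000, min_seconds: float = 60.0):
        self.sample_rate = sample_rate
        self.min_seconds = min_seconds  # hypothetical data requirement
        self.data: dict[str, list[Utterance]] = defaultdict(list)

    def add(self, utt: Utterance) -> None:
        self.data[utt.speaker_id].append(utt)

    def seconds(self, speaker_id: str) -> float:
        # .get avoids creating an empty entry for unknown IDs
        utts = self.data.get(speaker_id, [])
        return sum(len(u.audio) for u in utts) / self.sample_rate

    def ready_to_enroll(self) -> list[str]:
        """Speaker IDs with enough paired audio and text to attempt enrollment."""
        return [sid for sid in self.data if self.seconds(sid) >= self.min_seconds]

    def training_pairs(self, speaker_id: str) -> list[tuple[list[float], str]]:
        """(audio, text) pairs that an enrollment routine could consume."""
        return [(u.audio, u.text) for u in self.data.get(speaker_id, [])]
```

Keeping the recognized text next to each audio segment is what makes the pairs usable for enrollment without asking the speaker to read a prepared script.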

25. A machine readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to perform the steps of:
- receiving a speech signal from one of a plurality of speakers through a single channel;
- assigning a unique speaker ID to said speaker providing said speech signal through said channel;
- processing said speech signal into text using a speech recognition model;
- creating a document containing said text;
- associating said processed speech signal and said text in said document with said unique speaker ID assigned to said speaker; and
- monitoring said speech signal for a speaker change to a different one of said plurality of speakers.

Dependent claims (26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36):

26. The machine readable storage as claimed in claim 25, further including a plurality of code sections executable by a machine for causing the machine to perform the step of:
27. The machine readable storage as claimed in claim 26, further including a plurality of code sections executable by a machine for causing the machine to perform the step of:
   - associating said processed speech signal and said text in said document from said different speaker with said different unique speaker ID.
28. The machine readable storage as claimed in claim 25, wherein said speech signal is buffered.
29. The machine readable storage as claimed in claim 25, wherein at least one of said speakers is an enrolled speaker, and a preassigned unique speaker ID is assigned to said enrolled speaker.
30. The machine readable storage as claimed in claim 25, wherein text in said document which has been processed from portions of said speech signal which can be attributed to one speaker is distinguished from text in said document which has been processed from other portions of said speech signal which can be attributed to different speakers.
31. The machine readable storage as claimed in claim 30, wherein text in said document which can be attributed to different speakers is distinguished by starting a new paragraph in said document for every speaker change.
32. The machine readable storage as claimed in claim 25, wherein at least one of said speakers is an unenrolled speaker.
33. The machine readable storage as claimed in claim 32, wherein a speech signal and corresponding processed text from said unenrolled speaker is used to enroll said speaker.
34. The machine readable storage as claimed in claim 32, wherein at least a portion of said speech signal and corresponding processed text from said unenrolled speaker is used to develop a speaker dependent speech recognition model.
35. The machine readable storage as claimed in claim 34, wherein said speaker dependent model is used to reprocess the text in said document for said unenrolled speaker.
36. The machine readable storage as claimed in claim 25, wherein a different speech recognition model is used to reprocess said text in said document.
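Claims 35 and 36 describe reprocessing text already in the document, either for one unenrolled speaker with a newly developed speaker dependent model or with a different speech recognition model overall. Because each segment keeps its speech signal and speaker ID, reprocessing can be sketched as re-running a recognizer over the stored audio. Here a recognizer is modeled as any callable from audio to text, an assumption standing in for a real engine; `DocSegment` and `reprocess` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Sequence

# Any callable mapping a segment's audio to text; a real recognition
# engine would sit behind this (assumption).
Recognizer = Callable[[Sequence[float]], str]


@dataclass(frozen=True)
class DocSegment:
    speaker_id: str
    audio: tuple[float, ...]  # stored speech signal for this segment
    text: str                 # text currently shown in the document


def reprocess(document: list[DocSegment], model: Recognizer,
              speaker_id: Optional[str] = None) -> list[DocSegment]:
    """Re-run recognition with `model`.

    If speaker_id is given, only that speaker's segments are reprocessed
    (e.g. with a speaker dependent model); otherwise every segment is.
    The input document is left unchanged; a new list is returned.
    """
    return [
        replace(seg, text=model(seg.audio))
        if speaker_id is None or seg.speaker_id == speaker_id
        else seg
        for seg in document
    ]
```

Returning a new list instead of editing in place keeps the first-pass transcript available for comparison with the reprocessed text.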

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
International Business Machines Corporation
Inventors
Lewis, James R., Maes, Stephane H., Ortega, Kerry A., Wang, Huifang, Vanbuskirk, Ronald E.
Primary Examiner(s)
Dorvil, Richemond
Assistant Examiner(s)
Nolan, Daniel A.

Application Number
US09/337,392
Time in Patent Office
909 Days
Field of Search
704/246, 704/271, 704/270, 704/275, 704/250.56
US Class Current
704/270
CPC Class Codes
G10L 15/26 Speech to text systems
G10L 17/00 Speaker identification or v...
