Acoustic model training

US 10,096,315 B2
Filed: 04/05/2017
Issued: 10/09/2018
Est. Priority Date: 03/31/2016
Status: Active Grant

Abstract

A method, executed by a computer, includes receiving a channel recording corresponding to a conversation, receiving a transcription for the conversation, generating a conversation-specific language model for the conversation using the transcription, and conducting speech recognition on the channel recording using the conversation-specific language model to provide time boundaries and written language corresponding to utterances within the channel recording. The method further includes determining sentence or phrase boundaries for the transcription, aligning written language within the one or more transcriptions with the written language corresponding to the utterances with the channel recording to provide sentence or phrase boundaries for the channel recording, and training a speech recognizer according to the sentence or phrase boundaries for the transcription and the sentence or phrase boundaries for the channel recording. A computer system and computer program product corresponding to the method are also disclosed herein.

19 Citations

20 Claims

1. A method, executed by one or more processors, the method comprising:
- conducting speech recognition on a channel recording of a conversation to provide time boundaries and written language corresponding to utterances within the channel recording;
  
  determining sentence or phrase boundaries for a transcription of the conversation;
  
  aligning written language within the one or more transcriptions with the written language corresponding to the utterances with the channel recording to provide sentence or phrase boundaries for the channel recording; and
  
  training a speech recognizer according to the sentence or phrase boundaries for the transcription and the sentence or phrase boundaries for the channel recording.
- Dependent claims:
  - 2. The method of claim 1, conducting speech recognition on the channel recording comprises using a conversation-specific language model.
  - 3. The method of claim 2, wherein conducting speech recognition on the channel recording comprises interpolating between the conversation-specific language model and a general language model.
  - 4. The method of claim 2, wherein the conversation-specific language model comprises an n-gram model where n is greater than or equal to 6.
  - 5. The method of claim 1, wherein training the speech recognizer comprises training an acoustic model using a channel recording and a corresponding transcript.
  - 6. The method of claim 1, wherein aligning written language comprises conducting a dynamic programming procedure.
  - 7. The method of claim 1, wherein the sentence or phrase boundaries for the channel recording are determined without manual segmentation or silence-based automatic segmentation of the channel recording.
  - 8. The method of claim 1, further comprising trimming the channel recording according to the sentence or phrase boundaries for the channel recording.
  - 9. The method of claim 1, wherein trimming the channel recording eliminates extraneous segments.
  - 10. The method of claim 9, wherein the extraneous segments correspond to placing a customer on hold.
  - 11. The method of claim 1, further comprising pruning the transcript according to the sentence or phrase boundaries for the transcript.
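
Claim 6 names a dynamic programming procedure for the alignment step but does not specify which one. The following is a minimal sketch, assuming a plain word-level edit-distance (Levenshtein) alignment between the transcription and the recognizer output. The `Word` type and the function names `align` and `sentence_time_boundaries` are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A recognized word with time boundaries in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


def align(ref, hyp):
    """Edit-distance alignment of two word lists.

    Returns a list that maps each index in ref to the index of the
    identical word it is aligned with in hyp, or None if unmatched.
    """
    n, m = len(ref), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover the matches.
    mapping = [None] * n
    i, j = n, m
    while i > 0 and j > 0:
        if cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                mapping[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + 1:
            i -= 1  # transcript word missing from the recognizer output
        else:
            j -= 1  # extra recognizer word, e.g. a filler
    return mapping


def sentence_time_boundaries(sentences, hyp_words):
    """Project transcript sentence boundaries onto the recording.

    sentences is a list of word lists from the transcription; hyp_words
    is the recognizer output. Returns one (start, end) pair per sentence,
    or None when no word of the sentence could be matched.
    """
    ref = [w for sentence in sentences for w in sentence]
    mapping = align(ref, [w.text for w in hyp_words])
    boundaries, pos = [], 0
    for sentence in sentences:
        matched = [k for k in mapping[pos:pos + len(sentence)] if k is not None]
        pos += len(sentence)
        if matched:
            boundaries.append((hyp_words[matched[0]].start, hyp_words[matched[-1]].end))
        else:
            boundaries.append(None)
    return boundaries
```

In this sketch, the time boundaries come only from the recognizer output and the sentence boundaries come only from the transcription, which is consistent with claim 7's statement that no manual or silence-based segmentation of the recording is needed.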

12. A computer system comprising:
- one or more computer processors;
  
  one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions comprising instructions to perform;
  
  conducting speech recognition on a channel recording of a conversation to provide time boundaries and written language corresponding to utterances within the channel recording;
  
  determining sentence or phrase boundaries for a transcription of the conversation;
  
  aligning written language within the one or more transcriptions with the written language corresponding to the utterances with the channel recording to provide sentence or phrase boundaries for the channel recording; and
  
  training a speech recognizer according to the sentence or phrase boundaries for the transcription and the sentence or phrase boundaries for the channel recording.
- Dependent claims:
  - 13. The computer system of claim 12, wherein conducting speech recognition on the channel recording comprises using a conversation-specific language model.
  - 14. The computer system of claim 13, wherein conducting speech recognition on the channel recording comprises interpolating between the conversation-specific language model and a general language model.
  - 15. The computer system of claim 13, wherein the conversation-specific language model comprises an n-gram model where n is greater than or equal to 6.
  - 16. The computer system of claim 12, wherein training the speech recognizer comprises training an acoustic model using a channel recording and a corresponding transcript.
  - 17. The computer system of claim 12, wherein aligning written language comprises conducting a dynamic programming procedure.
  - 18. The computer system of claim 12, wherein the sentence or phrase boundaries for the channel recording are determined without manual segmentation or silence-based automatic segmentation of the channel recording.
  - 19. The computer system of claim 12, further comprising trimming the channel recording according to the sentence or phrase boundaries for the channel recording.
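
Claims 8 and 19 describe trimming the channel recording according to the derived sentence or phrase boundaries, and claims 9 and 10 note that this can eliminate extraneous segments such as hold periods. The sketch below illustrates one way this might look, assuming the recording is a flat list of samples and the boundaries are `(start, end)` pairs in seconds. The function names and the `padding` parameter are illustrative assumptions, not details from the patent.

```python
def merge_segments(segments, padding=0.0):
    """Sort segments, widen each by padding seconds, and merge overlaps."""
    merged = []
    for start, end in sorted(segments):
        start, end = max(0.0, start - padding), end + padding
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def trim_recording(samples, sample_rate, segments, padding=0.0):
    """Keep only the samples that fall inside the given segments.

    Anything outside the segments (for example hold music) is dropped.
    """
    kept = []
    for start, end in merge_segments(segments, padding):
        lo = int(round(start * sample_rate))
        hi = min(len(samples), int(round(end * sample_rate)))
        kept.extend(samples[lo:hi])
    return kept
```

Merging before slicing avoids duplicating samples when padded segments overlap.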

20. A method, executed by one or more processors, the method comprising:
- conducting speech recognition on concurrent channel recordings of a conversation using at least one conversation specific language model to provide time boundaries and written language corresponding to utterances within the concurrent channel recordings;
  
  determining sentence or phrase boundaries for a transcription of the conversation;
  
  aligning written language within one or more transcriptions corresponding to the concurrent channel recordings with the written language corresponding to the utterances with the concurrent channel recording to provide sentence or phrase boundaries for the concurrent channel recordings; and
  
  training a speech recognizer according to the sentence or phrase boundaries for the transcription and the sentence or phrase boundaries for the concurrent channel recordings.
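
Claim 20, like claims 2 and 13, relies on a conversation-specific language model, and claims 3 and 14 add interpolation with a general language model. The claims do not state the interpolation formula. The sketch below assumes simple linear interpolation, P(w | h) = λ · P_conv(w | h) + (1 − λ) · P_gen(w | h), over unsmoothed maximum-likelihood n-gram estimates. The function names and the weight `lam` are hypothetical, and the usage example uses n = 2 only for brevity, whereas claims 4 and 15 call for n greater than or equal to 6.

```python
from collections import Counter


def ngram_model(tokens, n):
    """Unsmoothed maximum-likelihood n-gram model (requires n >= 2).

    Returns prob(context, word), which looks at the last n - 1 words
    of context.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    grams, contexts = Counter(), Counter()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        grams[gram] += 1
        contexts[gram[:-1]] += 1

    def prob(context, word):
        ctx = tuple(context)[-(n - 1):]
        total = contexts[ctx]
        return grams[ctx + (word,)] / total if total else 0.0

    return prob


def interpolate(p_conv, p_gen, lam):
    """Linearly interpolate two models; lam weights the conversation model."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lambda context, word: lam * p_conv(context, word) + (1.0 - lam) * p_gen(context, word)


# Usage: a model built from this conversation's transcription, mixed with
# a model built from other text.
p_conv = ngram_model("please hold the line".split(), 2)
p_gen = ngram_model("hold the phone hold the line".split(), 2)
p_mix = interpolate(p_conv, p_gen, lam=0.8)
```

A weight close to 1 biases recognition strongly toward the words of the known transcription, while the general model retains some probability for word sequences the transcription does not contain.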

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Kuo, Hong-Kwang J.; Mangu, Lidia L.; Thomas, Samuel
Primary Examiner(s): Albertalli, Brian Louis
Application Number: US15/479,304
Publication Number: US 20170287469A1
Time in Patent Office: 552 Days
CPC Class Codes

G10L 15/02   Feature extraction for spee...

G10L 15/04   Segmentation; Word boundary...

G10L 15/05   Word boundary detection

G10L 15/063   Training

G10L 15/197   Probabilistic grammars, e.g...

G10L 15/26   Speech to text systems G10L...
