SPEECH PROCESSING SYSTEM AND METHOD

US 20120253811A1
Filed: 08/23/2011
Published: 10/04/2012
Est. Priority Date: 03/30/2011
Status: Active Grant

First Claim

1. A method for identifying a plurality of speakers in audio data and for decoding the speech spoken by said speakers, the method comprising:

receiving speech;

dividing the speech into segments as it is received;

processing the received speech segment by segment in the order received to identify the speaker and to decode the speech, processing comprising:

performing primary decoding of the segment using an acoustic model and a language model;

obtaining segment parameters indicating the differences between the speaker of the segment and a base speaker during the primary decoding;

comparing the segment parameters with a plurality of stored speaker profiles to determine the identity of the speaker, and selecting a speaker profile for said speaker;

updating the selected speaker profile;

performing a further decoding of the segment using a speaker independent acoustic model, adapted using the updated speaker profile;

outputting the decoded speech for the identified speaker, wherein the speaker profiles are updated as further segments of speech relating to a speaker profile are processed.

Abstract

A method for identifying a plurality of speakers in audio data and for decoding the speech spoken by said speakers, the method comprising:

- receiving speech;
- dividing the speech into segments as it is received;
- processing the received speech segment by segment in the order received to identify the speaker and to decode the speech, processing comprising:
  - performing primary decoding of the segment using an acoustic model and a language model;
  - obtaining segment parameters indicating the differences between the speaker of the segment and a base speaker during the primary decoding;
  - comparing the segment parameters with a plurality of stored speaker profiles to determine the identity of the speaker, and selecting a speaker profile for said speaker;
  - updating the selected speaker profile;
  - performing a further decoding of the segment using a speaker independent acoustic model, adapted using the updated speaker profile;
- outputting the decoded speech for the identified speaker,
  wherein the speaker profiles are updated as further segments of speech relating to a speaker profile are processed.
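
The per-segment loop described above can be illustrated with a minimal sketch. It is not the patented implementation: the decoders are hypothetical stand-ins, segment parameters are assumed to be plain numeric vectors, profiles are assumed to be running means matched by Euclidean distance, and the `new_speaker_threshold` fallback that creates a profile when nothing matches is an assumption added so the example runs from an empty profile store.

```python
import math
from dataclasses import dataclass


@dataclass
class SpeakerProfile:
    """Toy profile (assumption): running mean of segment parameter vectors."""
    name: str
    mean: list
    count: int = 1

    def update(self, params):
        # Incremental mean, so the profile improves as more segments arrive.
        self.count += 1
        self.mean = [m + (p - m) / self.count for m, p in zip(self.mean, params)]


def primary_decode(segment):
    """Hypothetical stand-in: returns (first-pass text, segment parameters)."""
    return segment["text"], segment["params"]


def further_decode(first_pass_text, profile):
    """Hypothetical stand-in for decoding with the profile-adapted model."""
    return first_pass_text


def process_stream(segments, profiles, new_speaker_threshold=1.0):
    """Process segments in arrival order; return (speaker name, text) pairs."""
    results = []
    for segment in segments:
        text, params = primary_decode(segment)
        # Compare the segment parameters with every stored profile.
        best = min(profiles, key=lambda p: math.dist(p.mean, params), default=None)
        if best is None or math.dist(best.mean, params) > new_speaker_threshold:
            # Assumption: no close match means a previously unseen speaker.
            best = SpeakerProfile(f"speaker_{len(profiles) + 1}", list(params))
            profiles.append(best)
        else:
            best.update(params)
        results.append((best.name, further_decode(text, best)))
    return results
```

A call such as `process_stream(segments, [])` with three segments, two of which have nearby parameter vectors, would yield two profiles, with the shared profile updated by the later segment.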

85 Citations

14 Claims

1. A method for identifying a plurality of speakers in audio data and for decoding the speech spoken by said speakers, the method comprising:
- receiving speech;
- dividing the speech into segments as it is received;
- processing the received speech segment by segment in the order received to identify the speaker and to decode the speech, processing comprising:
  - performing primary decoding of the segment using an acoustic model and a language model;
  - obtaining segment parameters indicating the differences between the speaker of the segment and a base speaker during the primary decoding;
  - comparing the segment parameters with a plurality of stored speaker profiles to determine the identity of the speaker, and selecting a speaker profile for said speaker;
  - updating the selected speaker profile;
  - performing a further decoding of the segment using a speaker independent acoustic model, adapted using the updated speaker profile;
- outputting the decoded speech for the identified speaker,
  wherein the speaker profiles are updated as further segments of speech relating to a speaker profile are processed.
- Dependent claims (2 to 13):
  - 2. A method according to claim 1, wherein if the segment parameters do not match closely enough with a stored speaker profile, a new speaker profile is initialised based on the parameters obtained for the segment.
  - 3. A method according to claim 2, wherein a new speaker profile is initialised if the likelihood or value of auxiliary function is greater for a unity transform or generic transform than for one of the stored speaker profiles.
  - 4. A method according to claim 2, wherein a new speaker profile is initialised if the likelihood or value of auxiliary function is less than a predetermined threshold.
  - 5. A method according to claim 1, wherein obtaining segment parameters comprises obtaining the parameters which allow a speaker transform to be estimated, said speaker transform adapting the speech of the new speaker to that of the independent speaker of the acoustic model.
  - 6. A method according to claim 5, wherein the speaker transform is an MLLR or CMLLR transform.
  - 7. A method according to claim 5, wherein the speaker profile comprises both adaptive and prior statistics.
  - 8. A method according to claim 1, wherein the primary decoding uses a language model.
  - 9. A method according to claim 1, wherein secondary decoding comprises rescoring a lattice of possible text corresponding to the segment.
  - 10. A method according to claim 1, wherein dividing the input speech into segments comprises detecting where there is silence in the input speech.
  - 11. A method according to claim 1, wherein the base speaker is a canonical speaker.
  - 12. A method according to claim 1, wherein the base speaker is the speaker of a previous segment.
  - 13. A carrier medium carrying computer readable instructions for controlling the computer to carry out the method of claim 1.
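
Claim 10 describes dividing the input speech by detecting silence. One simple way to picture this is an energy threshold over audio frames, sketched below. The function name, the per-frame energy input, and both parameters (`silence_threshold`, `min_silence_frames`) are illustrative assumptions; the claims do not specify how silence is detected.

```python
def split_on_silence(frame_energies, silence_threshold=0.1, min_silence_frames=3):
    """Return (start, end) frame index pairs, end exclusive, for speech segments.

    Assumption: a frame is silent when its energy is below silence_threshold,
    and a segment ends once min_silence_frames silent frames occur in a row.
    """
    segments = []
    start = None      # index where the current speech segment began
    silent_run = 0    # number of consecutive silent frames seen so far
    for i, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Close the segment at the first frame of the silent run.
                segments.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Flush a segment still open at the end, trimming trailing silence.
        segments.append((start, len(frame_energies) - silent_run))
    return segments
```

Short pauses below `min_silence_frames` stay inside a segment, so a brief hesitation does not split one utterance into two.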

14. A system for identifying a plurality of speakers in audio data and for decoding the speech spoken by said speakers, the system comprising:
- a receiver for audio containing speech; and
- a processor, said processor being adapted to:
  - divide the speech into segments as it is received;
  - process the received speech segment by segment in the order received to identify the speaker and to decode the speech, processing comprising:
    - perform primary decoding of the segment using an acoustic model and a language model;
    - obtain segment parameters indicating the differences between the speaker of the segment and a base speaker during the primary decoding;
    - compare the segment parameters with a plurality of stored speaker profiles to determine the identity of the speaker, and selecting a speaker profile for said speaker;
    - update the selected speaker profile; and
    - perform a further decoding of the segment using a speaker independent acoustic model, adapted using the updated speaker profile,
- the system further comprising an output for outputting the decoded speech for the identified speaker,
  wherein the speaker profiles are updated as further segments of speech relating to a speaker profile are processed.

Current Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Original Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Inventors: Gales, Mark John Francis; Chin, Kean Kheong; Breslin, Catherine; Knill, Katherine Mary

Granted Patent: US 8,612,224 B2
US Class Current: 704/249
CPC Class Codes:
G10L 15/26 Speech to text systems G10L...
G10L 17/00 Speaker identification or v...
