Method and apparatus for speaker recognition via comparing an unknown input to reference data

US 6,389,392 B1
Filed: 12/08/1998
Issued: 05/14/2002
Est. Priority Date: 10/15/1997
Status: Expired due to Fees

Abstract

A method and apparatus for pattern recognition comprising comparing an input signal representing an unknown pattern with reference data representing each of a plurality of pre-defined patterns, at least one of the pre-defined patterns being represented by at least two instances of reference data. Successive segments of the input signal are compared with successive segments of the reference data and comparison results for each successive segment are generated. For each pre-defined pattern having at least two instances of reference data, the comparison results for the closest matching segment of reference data for each segment of the input signal are recorded to produce a composite comparison result for the said pre-defined pattern. The unknown pattern is then identified on the basis of the comparison results. Thus the effect of a mismatch between the input signal and each instance of the reference data is reduced by selecting the best segments from the instances of reference data for each pre-defined pattern.
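
The segment-wise selection described in the abstract can be illustrated with a short Python sketch. This is not the patented implementation: it assumes that every reference instance has already been aligned to the same number of segments as the input, that each segment is a plain feature vector, and that squared Euclidean distance stands in for the unspecified comparison measure. The names `segment_distance`, `composite_results` and `identify` are hypothetical.

```python
from typing import Dict, List, Sequence

Segment = Sequence[float]  # assumed representation: one feature vector per segment


def segment_distance(a: Segment, b: Segment) -> float:
    """Squared Euclidean distance, an assumed stand-in for the comparison measure."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def composite_results(input_segs: List[Segment],
                      instances: List[List[Segment]]) -> List[float]:
    """For each input segment n, keep the smallest distance over all reference instances."""
    return [
        min(segment_distance(seg, inst[n]) for inst in instances)
        for n, seg in enumerate(input_segs)
    ]


def identify(input_segs: List[Segment],
             references: Dict[str, List[List[Segment]]]) -> str:
    """Return the pattern whose mean composite distance is lowest (assumed decision rule)."""
    def score(name: str) -> float:
        comp = composite_results(input_segs, references[name])
        return sum(comp) / len(comp)
    return min(references, key=score)
```

In this sketch a pattern with two imperfect reference instances can still score well, because each input segment is judged only against whichever instance matches it best.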

29 Citations

18 Claims

1. A method of speaker recognition comprising comparing an input signal representing speech from an unknown speaker with reference data representing speech from each of a plurality of pre-defined speakers, at least one of the pre-defined speakers being represented by at least two instances of reference data, the method comprising:
- comparing successive segments of the input signal with successive segments of the reference data and generating a comparison result for each successive segment, and, for each pre-defined speaker having at least two instances of reference data, the comparison result for the closest matching segment of reference data for each segment of the input signal are recorded to produce a composite comparison result for each successive segment for the said pre-defined speaker, and identifying the unknown speaker on the basis of the composite comparison results.
  - 2. A method according to claim 1 wherein the length of each instance of reference data is made equal to the mean length of the instances of reference data.
  - 3. A method according to claim 2 wherein the length of the input signal is made equal to the mean length of the reference data before the comparison step is carried out.
  - 4. A method according to claim 1 wherein each comparison result for each segment is weighted, prior to the calculation of the comparison result, in accordance with an estimated level of mismatch associated with the segment and the reference data.
  - 5. A method according to claim 4 wherein the comparison score D is the average of the weighted score for each closest matching segment, i.e. $D = \frac{1}{N} \sum_{n=1}^{N} w(n)\, d(n)$ (1)
  - 6. A method according to claim 5 wherein $w(n) = \left[ \frac{1}{J} \sum_{j=1}^{J} d'_j(n) \right]^{-1}$ (2) where J is the number of allowed speakers used to determine the weighting factor and $d'_j(n)$ is the comparison result for the nth segment of the input signal and the nth segment of the jth model.
  - 7. A method according to claim 6 for verifying the identity of an unknown speaker, wherein the unknown speaker provides information relating to a claimed identity and the comparison result for the reference data associated with the information is compared to the comparison results for the other reference data and, if a criterion is met, the unknown speaker is verified as the speaker associated with the information.
  - 8. A method according to claim 7 wherein the weighting factor is dependent on the comparison results for J+1 allowed speakers, where J+1 speakers represent the identified speaker and the J speakers having comparison scores closest to that of the identified speaker.
  - 9. A method according to claim 1 wherein the input signal is divided into frames and the segments are one frame in length.
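
Equations (1) and (2) in claims 5 and 6 can be worked through with a small Python sketch. It assumes that the per-segment results d(n) and the cohort results d′_j(n) are already available as lists of strictly positive numbers; the function names `weights` and `weighted_score` are hypothetical and are not taken from the patent.

```python
from typing import List


def weights(cohort: List[List[float]]) -> List[float]:
    """w(n) = [ (1/J) * sum_j d'_j(n) ]^-1, following Equation (2).

    cohort[j][n] holds d'_j(n) for the j-th of J models; values are assumed positive.
    """
    J = len(cohort)
    N = len(cohort[0])
    return [J / sum(cohort[j][n] for j in range(J)) for n in range(N)]


def weighted_score(d: List[float], w: List[float]) -> float:
    """D = (1/N) * sum_n w(n) * d(n), following Equation (1)."""
    return sum(wn * dn for wn, dn in zip(w, d)) / len(d)
```

A segment on which all J models score badly receives a small weight, so it contributes less to D than a segment on which the models score well.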

10. Speaker recognition apparatus comprising:
- an input for receiving an input signal representing speech from an unknown speaker;
  
  reference data representing speech from each of a plurality of pre-defined speakers, at least one of the pre-defined speakers being represented by at least two instances of reference data, the method comprising;
  
  comparing means for comparing successive segments of the input signal with successive segments of the reference data and generating a comparison result for each successive segment, decision means for generating, for each pre-defined pattern having at least two instances of reference data, a composite comparison result for the said pre-defined speaker from the comparison result for the closest matching segment of reference data for each segment of the input signal, and for identifying the unknown speaker on the basis of the composite comparison results.
  - 11. Apparatus according to claim 10 further comprising linear adjustment means for adjusting the length of each instance of reference data to be equal to the mean length of the instances of reference data.
  - 12. Apparatus according to claim 11 wherein linear adjustment means is arranged also to adjust the length of the input signal to be equal to the mean length of the instances of reference data before the comparison step is carried out.
  - 13. Apparatus according to claim 10 wherein each comparison result for each segment is weighted, prior to the calculation of the comparison result, in accordance with an estimated level of mismatch associated with the segment and the reference data.
  - 18. Apparatus according to claim 10 wherein the input signal is divided into frames, each frame representing a portion of an unknown utterance, and the segments are one frame in length.
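
The length adjustment in claims 2, 3, 11 and 12 can be sketched in Python as follows. The claims do not say how the adjustment is performed, so this example assumes linear interpolation between neighbouring frames and rounds the mean length to the nearest integer; both choices, and the names `resample_linear` and `normalise_lengths`, are assumptions made for illustration.

```python
from typing import List, Tuple

Frames = List[List[float]]  # assumed representation: one feature vector per frame


def resample_linear(frames: Frames, target_len: int) -> Frames:
    """Stretch or compress a frame sequence to target_len frames by linear interpolation."""
    if target_len < 1 or not frames:
        raise ValueError("need at least one frame and a positive target length")
    if len(frames) == 1 or target_len == 1:
        return [list(frames[0]) for _ in range(target_len)]
    step = (len(frames) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * step
        lo = min(int(pos), len(frames) - 2)  # index of the left neighbour
        frac = pos - lo
        out.append([(1 - frac) * a + frac * b
                    for a, b in zip(frames[lo], frames[lo + 1])])
    return out


def normalise_lengths(instances: List[Frames],
                      input_frames: Frames) -> Tuple[List[Frames], Frames]:
    """Bring every reference instance and the input to the rounded mean reference length."""
    mean_len = max(1, round(sum(len(inst) for inst in instances) / len(instances)))
    refs = [resample_linear(inst, mean_len) for inst in instances]
    return refs, resample_linear(input_frames, mean_len)
```

Once all sequences share one length, the nth input segment can be compared directly with the nth segment of every reference instance.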

14. Apparatus according to claim 13 wherein the comparison result D is the average of the weighted result for each closest matching segment, i.e. $D = \frac{1}{N} \sum_{n=1}^{N} w(n)\, d(n)$ (1) where N is the number of segments, w(n) is the weighting factor for the nth segment and d(n) is the comparison result for the nth segment.
  - 15. Apparatus according to claim 14 wherein $w(n) = \left[ \frac{1}{J} \sum_{j=1}^{J} d'_j(n) \right]^{-1}$ (2)
  - 16. Apparatus according to claim 15 for verifying the identity of an unknown speaker, wherein the apparatus includes an input for receiving information relating to a claimed identity of the unknown speaker and the comparison result for the speaker corresponding to the claimed identity is compared to the other comparison scores and, if a criterion is met, the unknown speaker is verified as the corresponding speaker.
  - 17. Apparatus method according to claim 16 wherein the weighting factor is dependent on the comparison results for J+1 allowed speakers, where J+1 speakers represent the identified speaker and the J speakers having comparison scores closest to that of the identified speaker.
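
The verification step in claims 7 and 16 leaves the acceptance criterion open. The Python sketch below shows one possible criterion, chosen purely for illustration: the claim is accepted only if the score for the claimed identity is lower than every other score by at least a margin. The function `verify`, the `margin` parameter and the convention that a lower score means a closer match are all assumptions.

```python
from typing import Dict


def verify(claimed_id: str, scores: Dict[str, float], margin: float = 0.0) -> bool:
    """Accept the claimed identity if its score beats all other scores by `margin`.

    scores maps each allowed speaker to a comparison score (lower = closer match).
    The criterion is an assumed example; the claims only require that some criterion is met.
    """
    claimed = scores[claimed_id]
    others = [s for name, s in scores.items() if name != claimed_id]
    if not others:
        raise ValueError("at least one other speaker is needed for comparison")
    return all(claimed + margin < s for s in others)
```

For example, with scores of 0.4 for the claimed speaker and 0.9 and 1.2 for two others, the claim is accepted at a margin of 0.0 but rejected at a margin of 0.6.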

Current Assignee
British Telecommunications PLC (BT Group PLC)
Original Assignee
British Telecommunications PLC (BT Group PLC)
Inventors
Sivakumaran, Perasiriyan; Ariyaeeinia, Aladdin Mohammad; Pawlewski, Mark
Primary Examiner(s)
Dorvil, Richemond

Application Number

US09/202,026
Time in Patent Office

1,253 Days
Field of Search

704/231, 704/240, 704/239, 704/241, 704/246, 704/248, 704/250, 704/200, 704/251, 704/252, 704/247, 704/254, 704/255
US Class Current

704/241
CPC Class Codes

G06F 18/28   Determining representative ...

G10L 17/02   Preprocessing operations, e...

G10L 17/06   Decision making techniques;...
