Cut and paste spoofing detection using dynamic time warping

US 9,002,706 B2
Filed: 12/10/2009
Issued: 04/07/2015
Est. Priority Date: 12/10/2008
Status: Active Grant

Abstract

The invention refers to a method for comparing voice utterances, the method comprising the steps of: extracting a plurality of features (201) from a first voice utterance of a given text sample and extracting a plurality of features (201) from a second voice utterance of said given text sample, wherein each feature is extracted as a function of time, and wherein each feature of the second voice utterance corresponds to a feature of the first voice utterance; applying dynamic time warping (202) to one or more time dependent characteristics of the first and/or second voice utterance, e.g. by minimizing one or more distance measures, wherein a distance measure is a measure of the difference between a time dependent characteristic of the first voice utterance and a corresponding time dependent characteristic of the second voice utterance, and wherein a time dependent characteristic of a voice utterance is a time dependent characteristic of either a single feature or a combination of two or more features; calculating a total distance measure (203), wherein the total distance measure is a measure of the difference between the first voice utterance of the given text sample and the second voice utterance of said given text sample, wherein the total distance measure is calculated based on one or more pairs of said time dependent characteristics, and wherein a pair of time dependent characteristics is composed of a time dependent characteristic of the first or second voice utterance and of a dynamically time warped (202) time dependent characteristic of the respectively second or first voice utterance, or wherein a pair of time dependent characteristics is composed of a dynamically time warped (202) time dependent characteristic of the first voice utterance and of a dynamically time warped (202) time dependent characteristic of the second voice utterance.
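The warping step (202) and the total distance step (203) can be illustrated with a minimal sketch. It is not the patented implementation: the function names `dtw_align` and `total_distance`, the step pattern (match, insertion, deletion), and the averaging of frame distances along the warping path are all assumptions chosen for illustration. Each utterance is represented as a list of per-frame feature values, and the caller supplies the frame distance.

```python
def dtw_align(a, b, dist):
    """Align sequences a and b by dynamic time warping.

    Returns (path, cost): path is a list of (i, j) index pairs and cost is
    the minimal accumulated distance. Illustrative helper, not from the patent.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            acc[i][j] = d + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path, acc[n][m]


def total_distance(a, b, dist):
    """Average frame distance along the warping path (an assumed normalization)."""
    path, _ = dtw_align(a, b, dist)
    return sum(dist(a[i], b[j]) for i, j in path) / len(path)


def abs_diff(x, y):
    """Frame distance for a single scalar feature."""
    return abs(x - y)


# Hypothetical feature tracks: the second is a time-stretched copy of the first.
first = [0.0, 1.0, 2.0, 3.0]
second = [0.0, 0.0, 1.0, 2.0, 2.0, 3.0]
path, cost = dtw_align(first, second, abs_diff)
score = total_distance(first, second, abs_diff)
```

Because the second track differs from the first only in timing, the warping absorbs the difference and the resulting score is zero. A score that stays large after warping would indicate a difference in the feature values themselves.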

13 Claims

1. A method for comparing voice utterances, the method comprising the steps of:
- receiving, at a computer, a plurality of voice utterances of a given text sample;
  
  extracting a plurality of features from a first voice utterance of the given text sample and extracting a plurality of features from a second voice utterance of said given text sample, wherein each feature is extracted as a function of time, and wherein each feature of the second voice utterance corresponds to a feature of the first voice utterance;
  
  applying dynamic time warping to one or more time dependent characteristics of the first and/or second voice utterance by minimizing one or more distance measures, wherein a distance measure is a measure of a difference between a time dependent characteristic of the first voice utterance and a corresponding time dependent characteristic of the second voice utterance, and wherein a time dependent characteristic of a voice utterance is a time dependent characteristic of either a single feature or a combination of two or more features; and
  
  calculating a total distance measure, wherein the total distance measure is a measure of a difference between the first voice utterance of the given text sample and the second voice utterance of the given text sample, wherein the total distance measure is calculated based at least on one or more pairs of time dependent characteristics, and wherein a pair of time dependent characteristics is composed of a time dependent characteristic of the first or second voice utterance and of a dynamically time warped time dependent characteristic of the respectively second or first voice utterance, or wherein a pair of time dependent characteristics is composed of a dynamically time warped time dependent characteristic of the first voice utterance and of a dynamically time warped time dependent characteristic of the second voice utterance;
  
  wherein the total distance measure is used to detect that the second voice utterance is a result of cut and paste spoofing;
  
  wherein the detection of cut and paste spoofing of a second voice utterance is accomplished by measuring abrupt temporal changes of feature values.
  - 2. The method according to claim 1, wherein the first voice utterance has been recorded previously, and wherein the second voice utterance is received from a speaker upon request.
  - 3. The method according to claim 2, wherein the total distance measure is used to authenticate the speaker of the second voice utterance.
  - 4. The method according to claim 1, wherein the plurality of features comprises one or more of the following features:
    - the logPitch of a pitch or a function thereof, wherein logPitch is the logarithm of the pitch;
      
      the logF1 of a first formant or a function thereof, wherein logF1 is the logarithm of the first formant;
      
      the logF2 of a second formant or a function thereof, wherein logF2 is the logarithm of the second formant;
      
      the logE of energy or a function thereof, wherein logE is the logarithm of the energy;
      
      C1 or a function thereof, wherein C1 is the low frequency energy divided by the high frequency energy;
      
      and temporal derivatives of any of the above features such as the temporal derivative of logPitch, logF1, logF2, logE and C1.
  - 5. The method according to claim 4, wherein a distance measure of dynamic time warping is defined as one of a Euclidean distance, a Mahalanobis distance, and a Cosine distance.
  - 6. The method according to claim 5, wherein the total distance measure is defined as a Euclidean distance, a Mahalanobis distance or a Cosine distance.
  - 7. The method according to claim 6, wherein the distance measure is calculated based at least on a single pair of time dependent characteristics, wherein each time dependent characteristic is a characteristic of a single feature.
  - 8. The method according to claim 6, wherein the distance measure is calculated based on a single pair of time dependent characteristics, wherein each time dependent characteristic is a characteristic of a combination of a plurality of features.
  - 9. The method of claim 6, wherein the total distance measure is calculated based at least on a plurality of pairs of time dependent characteristics, wherein each time dependent characteristic is a characteristic of a single feature.
  - 10. The method of claim 6, wherein the total distance measure is calculated based on a plurality of pairs of time dependent characteristics, wherein at least one time dependent characteristic is a characteristic of a single feature and at least one characteristic of a combination of a plurality of features.
  - 11. The method of claim 6, wherein the total distance measure is calculated based on a plurality of pairs of time dependent characteristics, wherein each time dependent characteristic is a characteristic of a combination of a plurality of features.
  - 12. The method of claim 11, wherein a plurality of total distance measures is calculated (203), and wherein the comparison of the first voice utterance with the second voice utterance is based on the plurality of total distance measures by selecting one or more total distance measures from the plurality of total distance measures and/or by combining at least two total distance measures.
  - 13. A computer-readable medium comprising computer-executable instructions for performing the method of claim 1.
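
The distance measures named in claims 5 and 6 and the "abrupt temporal changes of feature values" of claim 1 can be sketched as follows. This is a hedged illustration only: the claims do not say how abrupt changes are measured, so the frame-to-frame difference, the threshold, and every function name below are assumptions. The Mahalanobis variant assumes a diagonal covariance (one variance per feature), which is a simplification of the general definition.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


def cosine_distance(x, y):
    """One minus the cosine similarity; undefined for all-zero vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    norm_x = math.sqrt(sum(p * p for p in x))
    norm_y = math.sqrt(sum(q * q for q in y))
    return 1.0 - dot / (norm_x * norm_y)


def mahalanobis_diag(x, y, variances):
    """Mahalanobis distance under an assumed diagonal covariance matrix."""
    return math.sqrt(sum((p - q) ** 2 / v for p, q, v in zip(x, y, variances)))


def delta(track):
    """First-order difference, a simple stand-in for a temporal derivative."""
    return [later - earlier for earlier, later in zip(track, track[1:])]


def abrupt_changes(track, threshold):
    """Indices of frames whose value jumps by more than threshold from the previous frame."""
    return [t + 1 for t, d in enumerate(delta(track)) if abs(d) > threshold]


# Hypothetical logPitch track with a splice between frames 2 and 3.
log_pitch = [4.60, 4.62, 4.61, 5.10, 5.11, 5.09]
jumps = abrupt_changes(log_pitch, threshold=0.3)
```

In this made-up track the only frame-to-frame jump above the threshold falls at index 3, which is where a spliced recording would be expected to show a discontinuity. The threshold value is arbitrary and would need to be tuned on real data.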

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Agnitio s.l. (Microsoft Corporation)
Inventors
Lopez, Jesus Antonio Villalba; Gimenez, Alfonso Ortega; Solano, Eduardo Lleida; Gomar, Marta Garcia; Redondo, Sara Varela
Primary Examiner(s)
ALBERTALLI, BRIAN LOUIS

Application Number
US13/515,281
Publication Number
US 20140081638A1
Time in Patent Office
1,944 Days
Field of Search
None
US Class Current
704/246
CPC Class Codes
B66B 13/26   between closing doors
G10L 17/00   Speaker identification or v...
G10L 17/02   Preprocessing operations, e...
G10L 17/06   Decision making techniques;...
G10L 17/24   the user being prompted to ...
G10L 17/26   Recognition of special voic...
