Criteria for usable repetitions of an utterance during speech reference enrollment

US 6,012,027 A
Filed: 09/17/1997
Issued: 01/04/2000
Est. Priority Date: 05/27/1997
Status: Expired due to Term

Abstract

A speech reference enrollment method involves the following steps: (a) requesting a user speak a vocabulary word; (b) detecting a first utterance (354); (c) requesting the user speak the vocabulary word; (d) detecting a second utterance (358); (e) determining a first similarity between the first utterance and the second utterance (362); (f) when the first similarity is less than a predetermined similarity, requesting the user speak the vocabulary word; (g) detecting a third utterance (366); (h) determining a second similarity between the first utterance and the third utterance (370); and (i) when the second similarity is greater than or equal to the predetermined similarity, creating a reference (364).
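
The flow in the abstract can be sketched in Python as follows. This is an illustrative sketch only: the names `enroll`, `get_utterance`, and `cosine_similarity` are hypothetical, the cosine measure merely stands in for whatever similarity the method uses, and returning the accepted pair stands in for "creating a reference". It also assumes that a first pair already meeting the predetermined similarity is accepted directly, which the abstract implies but does not state.

```python
from typing import Callable, Optional, Sequence, Tuple

Features = Sequence[float]


def cosine_similarity(a: Features, b: Features) -> float:
    """Stand-in similarity measure (an assumption, not taken from the patent)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def enroll(
    get_utterance: Callable[[], Features],
    threshold: float,
    similarity: Callable[[Features, Features], float] = cosine_similarity,
) -> Optional[Tuple[Features, Features]]:
    """Return the pair of utterances accepted for the reference, or None."""
    first = get_utterance()   # abstract steps (a)-(b)
    second = get_utterance()  # abstract steps (c)-(d)
    if similarity(first, second) >= threshold:  # step (e); direct acceptance is assumed
        return first, second
    third = get_utterance()   # steps (f)-(g): first similarity was too low
    if similarity(first, third) >= threshold:   # steps (h)-(i)
        return first, third
    return None               # the abstract does not say what happens here
```

A caller would pass a function that prompts the user and returns a feature vector, for example `enroll(record_and_extract, threshold=0.9)`, where `record_and_extract` is likewise a hypothetical name.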

20 Claims

1. A speech reference enrollment method, comprising the steps of:
- (a) receiving a first utterance of a vocabulary word;
  
  (b) extracting a plurality of features from the first utterance;
  
  (c) receiving a second utterance of the vocabulary word;
  
  (d) determining a duration of the second utterance;
  
  (e) when the duration is less than a minimum duration, requesting a user speak a third utterance of the vocabulary word and proceeding to step (i);
  
  (f) extracting the plurality of features from the second utterance;
  
  (g) determining a first similarity between the plurality of features from the first utterance and the plurality of features from the second utterance;
  
  (h) when the first similarity is less than a predetermined similarity, requesting a user to speak a third utterance of the vocabulary word;
  
  (i) extracting the plurality of features from the third utterance;
  
  (j) determining a second similarity between the plurality of features from the first utterance and the plurality of features from the third utterance; and
  
  (k) when the second similarity is greater than or equal to the predetermined similarity, forming a reference for the vocabulary word.
  - 2. The method of claim 1, further including the steps of:
    - (l) when the second similarity is less than the predetermined similarity, determining a third similarity between the plurality of features from the second utterance and the plurality of features from the third utterance;
      
      (m) when the third similarity is greater than or equal to the predetermined similarity, forming the reference for the vocabulary word.
  - 3. The method of claim 2, further including the steps of:
    - (n) when the third similarity is less than the predetermined similarity, returning to step (a).
  - 4. The method of claim 1, wherein step (c) further includes the steps of:
    - (c1) determining a duration of the second utterance;
      
      (c2) when the duration is greater than a maximum duration, disregarding the second utterance.
  - 5. The method of claim 4, wherein step (c1) further includes the steps of:
    - (i) setting an amplitude threshold;
      
      (ii) determining a start time when an input signal exceeds the amplitude threshold;
      
      (iii) determining an end time, after the start time, when the input signal is less than the amplitude threshold;
      
      (iv) calculating the duration as a difference between the end time and the start time.
  - 6. The method of claim 1, wherein step (f) further includes the steps of:
    - (f1) determining an estimate of a number of voiced speech frames;
      
      (f2) when the estimate of the number of voiced speech frames is less than a threshold requesting the user repeat the vocabulary word;
      
      (f3) returning to step (c).
  - 7. The method of claim 1, wherein step (a) further includes the steps of:
    - (a1) determining a signal to noise ratio of the first utterance;
      
      (a2) when the signal to noise ratio is less than a predetermined signal to noise ratio, increasing a gain of a voice amplifier.
  - 8. The method of claim 7, further including the step of:
    - (a3) requesting the user repeat the vocabulary word.
  - 9. The method of claim 1, wherein step (b) further includes the step of:
    - (b1) determining an amplitude histogram of the first utterance.
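
The duration checks in claim 1 steps (d)-(e), claim 4, and claim 5 can be sketched as below. This is a minimal sketch under stated assumptions: the input is taken to be a non-negative amplitude envelope with one value per frame rather than raw samples, and the names `utterance_duration`, `check_duration`, `envelope`, and `frame_seconds` are hypothetical. Treating the end of the buffer as the end time when the signal never falls below the threshold, and the "no_speech" result, are also assumptions that the claims do not cover.

```python
from typing import Optional, Sequence


def utterance_duration(
    envelope: Sequence[float], frame_seconds: float, amplitude_threshold: float
) -> Optional[float]:
    """Duration in seconds per claim 5, or None if the threshold is never exceeded."""
    start = None
    for i, level in enumerate(envelope):
        if start is None:
            if level > amplitude_threshold:    # claim 5 step (ii): start time
                start = i
        elif level < amplitude_threshold:      # step (iii): end time after the start
            return (i - start) * frame_seconds  # step (iv): end minus start
    if start is None:
        return None
    # Assumption: the input ended before the level dropped below the threshold.
    return (len(envelope) - start) * frame_seconds


def check_duration(
    duration: Optional[float], min_duration: float, max_duration: float
) -> str:
    """Classify a duration against the minimum and maximum limits."""
    if duration is None:
        return "no_speech"   # assumption: not covered by the claims
    if duration < min_duration:
        return "too_short"   # claim 1 step (e): request another utterance
    if duration > max_duration:
        return "too_long"    # claim 4 step (c2): disregard the utterance
    return "ok"
```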

10. A speech reference enrollment method, comprising the steps of:
- (a) requesting a user speak a vocabulary word;
  
  (b) detecting a first utterance;
  
  (c) determining if the first utterance exceeds an amplitude threshold;
  
  (d) when the first utterance does not exceed the amplitude threshold, return to step (a);
  
  (e) requesting the user speak the vocabulary word;
  
  (f) detecting a second utterance;
  
  (g) determining a first similarity between the first utterance and the second utterance;
  
  (h) when the first similarity is less than a predetermined similarity, requesting the user speak the vocabulary word;
  
  (i) detecting a third utterance;
  
  (j) determining a second similarity between the first utterance and the third utterance; and
  
  (k) when the second similarity is greater than or equal to the predetermined similarity, creating a reference.
  - 11. The method of claim 10, further including the steps of:
    - (l) determining a third similarity between the second utterance and the third utterance;
      
      (m) when the third similarity is greater than or equal to the predetermined similarity, creating the reference.
  - 12. The method of claim 11, further including the steps of:
    - (n) when the third similarity is less than the predetermined similarity, returning to step (a).
  - 13. The method of claim 10, wherein step (b) further includes the steps of:
    - (b1) determining an estimate of a number of voiced speech frames;
      
      (b2) when the number of voiced speech frames is less than a predetermined number of voiced speech frames, returning to step (a).
  - 14. The method of claim 10, wherein step (b) further includes the steps of:
    - (b1) determining a duration of the first utterance;
      
      (b2) when the duration is less than a minimum duration, returning to step (a);
      
      (b3) when the duration is greater than a maximum duration, returning to step (a).
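
Claim 13 calls for an estimate of the number of voiced speech frames but does not say how the estimate is obtained. The sketch below uses a common heuristic, frame energy combined with zero-crossing rate, purely as an assumed stand-in. All function names, the frame length, and the default threshold values are hypothetical.

```python
from typing import List, Sequence


def frame_signal(samples: Sequence[float], frame_len: int) -> List[Sequence[float]]:
    """Split samples into non-overlapping frames, dropping any short tail."""
    return [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def is_voiced(
    frame: Sequence[float], energy_floor: float, max_zero_crossing_rate: float
) -> bool:
    """Heuristic (assumed): voiced frames have high energy and few sign changes."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    zero_crossing_rate = crossings / (len(frame) - 1)
    return energy >= energy_floor and zero_crossing_rate <= max_zero_crossing_rate


def count_voiced_frames(
    samples: Sequence[float],
    frame_len: int,
    energy_floor: float = 0.01,
    max_zero_crossing_rate: float = 0.25,
) -> int:
    """Claim 13 step (b1): estimate the number of voiced speech frames."""
    return sum(
        is_voiced(frame, energy_floor, max_zero_crossing_rate)
        for frame in frame_signal(samples, frame_len)
    )


def needs_retry(samples: Sequence[float], frame_len: int, min_voiced_frames: int) -> bool:
    """Claim 13 step (b2): True means return to step (a) and prompt again."""
    return count_voiced_frames(samples, frame_len) < min_voiced_frames
```

`frame_len` must be at least 2 for the zero-crossing rate to be defined.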

15. A computer readable storage medium containing computer readable instructions that when executed by a computer performs the following steps:
- (a) requesting a user speak a vocabulary word;
  
  (b) receiving a first digitized utterance;
  
  (c) extracting a plurality of features from the first digitized utterance;
  
  (d) determining a signal to noise ratio;
  
  (e) when the signal to noise ratio is less than a predetermined signal to noise ratio, returning to step (a);
  
  (f) requesting the user speak the vocabulary word;
  
  (g) receiving a second digitized utterance of the vocabulary word;
  
  (h) extracting the plurality of features from the second digitized utterance;
  
  (i) determining a first similarity between the plurality of features from the first digitized utterance and the plurality of features from the second digitized utterance;
  
  (j) when the first similarity is less than a predetermined similarity, requesting the user to speak a third utterance of the vocabulary word;
  
  (k) extracting the plurality of features from a third digitized utterance;
  
  (l) determining a second similarity between the plurality of features from the first digitized utterance and the plurality of features from the third digitized utterance; and
  
  (m) when the second similarity is greater than or equal to the predetermined similarity, forming a reference for the vocabulary word.
  - 16. The computer readable storage medium of claim 15, further executing the steps of:
    - (n) when the second similarity is less than the predetermined similarity, determining a third similarity between the plurality of features from the second digitized utterance and the plurality of features from the third digitized utterance;
      
      (o) when the third similarity is greater than or equal to the predetermined similarity, forming the reference for the vocabulary word.
  - 17. The computer readable storage medium of claim 16, further executing the steps of:
    - (p) when the third similarity is less than the predetermined similarity, returning to step (a).
  - 18. The computer readable storage medium of claim 15, wherein step (e) further includes the steps of:
    - (e1) determining if an amplifier gain is saturated;
      
      (e2) when the amplifier gain is saturated, going to step (a).
  - 19. The computer readable storage medium of claim 15, wherein step (e) further includes the step of increasing a gain of an amplifier.
  - 20. The computer readable storage medium of claim 18, wherein step (e2) further includes the step of decreasing a gain of an amplifier.
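
The signal-to-noise handling in claim 15 steps (d)-(e) and claims 18 to 20 might look roughly like the sketch below. Several points are assumptions rather than claim language: the signal to noise ratio is computed from the mean power of a speech segment and a noise-only segment, "saturated" is read as the amplified signal clipping at full scale, and claims 19 and 20 are combined into one function even though they are separate dependent claims. The names `snr_db`, `is_saturated`, `adjust_gain`, and the `gain_step` parameter are hypothetical.

```python
import math
from typing import Sequence, Tuple


def snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """Signal to noise ratio in dB from mean power (assumed definition)."""
    signal_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if signal_power == 0.0:
        return float("-inf")
    if noise_power == 0.0:
        return float("inf")
    return 10.0 * math.log10(signal_power / noise_power)


def is_saturated(samples: Sequence[float], full_scale: float = 1.0) -> bool:
    """Assumed reading of claim 18 step (e1): any sample reaches full scale."""
    return any(abs(s) >= full_scale for s in samples)


def adjust_gain(
    snr: float,
    min_snr: float,
    samples: Sequence[float],
    gain: float,
    gain_step: float = 2.0,
) -> Tuple[str, float]:
    """Return (action, new_gain); 'reprompt' means return to step (a)."""
    if snr >= min_snr:
        return "accept", gain
    if is_saturated(samples):
        new_gain = gain / gain_step  # claim 20: decrease the gain
    else:
        new_gain = gain * gain_step  # claim 19: increase the gain
    return "reprompt", new_gain      # claim 15 step (e) and claim 18 step (e2)
```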

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Ameritech Corp. (AT&T, Inc.)
Inventors: Bossemeyer, Jr., Robert Wesley
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Storm, Donald L.
Application Number: US08/932,078
Time in Patent Office: 839 Days
Field of Search: 704/243, 704/238, 704/239, 704/248, 704/253, 704/251, 704/231
US Class Current: 704/243
CPC Class Codes:
G10L 15/07   to the speaker
G10L 17/04   Training, enrolment or mode...
G10L 2015/0631   Creating reference template...
G10L 2015/0636   Threshold criteria for the ...
G10L 2015/0638   Interactive procedures
H04M 2201/40   using speech recognition sp...
H04M 3/382   using authorisation codes o...
H04M 3/493   Interactive information ser...
