Voice log-in using spoken name input

US 5,293,452 A
Filed: 07/01/1991
Issued: 03/08/1994
Est. Priority Date: 07/01/1991
Status: Expired due to Term

Abstract

A voice log-in system is based on a person's spoken name input only, using speaker-dependent acoustic name recognition models in performing speaker-independent name recognition. In an enrollment phase, a dual-pass endpointing procedure defines both the person's full name (broad endpoints) and the component names separated by pauses (precise endpoints). An HMM (Hidden Markov Model) recognition model generator generates a corresponding HMM name recognition model modified by the insertion of additional skip transitions for the pauses between component names. In a recognition/update phase, a spoken-name speech signal is input to an HMM name recognition engine, which performs speaker-independent name recognition. The modified HMM name recognition model permits the name recognition operation to accommodate pauses of variable duration between component names.

322 Citations

17 Claims

1. A voice log-in method for logging in to a system based on computerized recognition of a spoken name input comprising the steps:
- creating an augmented name recognition model from the spoken name input for each person to be enrolled, wherein said augmented name recognition model includes constituent name-part utterances, and also includes any pause in the spoken name input, and wherein said augmented name recognition model represents a portion of said constituent name-part utterances as optional, thereby accommodating elimination of optional name-part utterances, and wherein said pause is also represented as optional, thereby accommodating unpredictable variations in said pause, and wherein said creating name recognition models is accomplished using HMM (Hidden Markov Modeling) to create HMM name recognition models;
  
  storing said name recognition model for each person enrolled in a name recognition model database;
  
  comparing the spoken name input with the stored name recognition models each time a person seeks access to the system by voice log-in; and
  
  logging a person in to the system if a pattern match is found, during said comparing, between the spoken name input and one of the stored name recognition models.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The voice log-in method of claim 1, wherein the step of creating augmented name recognition models comprises the substeps:
    - receiving the spoken name input for each person to be enrolled;
      
      performing an endpointing procedure on the spoken name input to delimit the beginning and end of a corresponding full-name utterance, and to locate the precise endpoints within such delimited full-name utterance that define said constituent name-part utterances and associated pause; and
      
      creating an augmented name recognition model based on broad and precise endpoints and said full-name and name-part utterances in which any said constituent name-part utterance and any pause in the spoken name input are represented as optional.
  - 3. The voice log-in method of claim 2, wherein the step of performing an endpointing procedure comprises the substeps:
    - making a first endpointing pass using broad endpoint criteria to delimit the beginning and end of a corresponding full-name utterance; and
      
      then making a second endpointing pass using precise endpoint criteria to locate the precise endpoints within such delimited full-name utterance that define said constituent name-part utterances and said associated pause.
  - 4. The voice log-in method of claim 3, wherein said endpointing procedure is dual-pass and energy-based, such that the energy of the spoken name input is computed prior to endpointing.
  - 5. The voice log-in method of claim 3, wherein the first and second endpointing passes are accomplished by the following substeps:
    - converting the spoken name input into speech energy computed every frame of a predetermined duration;
      
      estimating a speech utterance level parameter, using fast upward adaptation and slow downward adaptation, and a noise level parameter, using slow upward adaptation and fast downward adaptation;
      
      determining an utterance detection threshold using the speech utterance level parameter, the noise level parameter, and a predetermined minimum RMS speech energy level;
      
      declaring the beginning of an utterance to be when the speech energy remains above the utterance detection threshold for a predetermined duration;
      
      declaring the end of an utterance to be when (a) the speech energy remains below the utterance detection threshold for a specified duration, and (b) no new utterance beginning is detected for a specified utterance separation duration;
      
      endpointing the speech energy, for the first endpoint pass, with a relatively large value for said specified utterance separation duration to obtain the broad endpoints that delimit the full-name utterance, and endpointing said delimited full-name utterance, for the second pass, again with a small value for the utterance separation duration to obtain the precise endpoints.
  - 6. The voice log-in method of claim 3, wherein the step of creating augmented name recognition models using the broad and precise endpoints and the full-name and name-part utterances comprises the step:
    - creating an augmented name recognition model for internal utterance endings and utterance beginnings by inserting an additional skip transition into the name recognition model, such that any name-part utterance, or any associated pause, or any combination of both, is made optional, thereby accommodating elimination of said any name-part utterance and unpredictable variations in said any associated pause.
  - 7. The voice log-in method of claim 6, wherein the full-name model is a finite state automaton.
  - 8. The voice log-in method of claim 3, wherein the step of creating augmented name recognition models using the broad and precise endpoints and the full-name and name-part utterances comprises the substeps:
    - creating a name-part model for each name-part utterance;
      
      creating a full-name utterance model characterizing a sequence of constituent name-part models; and
      
      augmenting the full-name model by including in the full-name utterance model between each constituent name-part model a nonspeech model representing nonspeech associated with a pause, such that the pause is made optional, thereby accommodating unpredictable variations in said any pause.
  - 9. The voice log-in method of claim 1, further comprising the step of:
    - updating a person's name recognition model each time said person is logging in to the system by averaging the spoken name input and the name recognition model, and storing the updated name recognition model in the name recognition model database.
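
The update step of claim 9 can be illustrated with a running average. The following is a minimal sketch under stated assumptions, not the patented implementation: `NameModel`, `update_by_averaging`, the representation of a model as one mean feature vector per state, and the equal weighting of every utterance are all hypothetical. The spoken input is assumed to be already aligned to the model's states, which in an HMM system would come from the recognizer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NameModel:
    """Hypothetical stand-in for a stored name model: one mean vector per state."""
    state_means: list[list[float]]
    n_utterances: int = 1  # how many utterances the means summarize


def update_by_averaging(model: NameModel, aligned_input: list[list[float]]) -> NameModel:
    """Fold one new, state-aligned utterance into the model as a running average."""
    if len(aligned_input) != len(model.state_means):
        raise ValueError("input must provide one feature vector per model state")
    n = model.n_utterances
    new_means = [
        [(n * m + x) / (n + 1) for m, x in zip(mean, obs)]
        for mean, obs in zip(model.state_means, aligned_input)
    ]
    return NameModel(new_means, n + 1)


# Assumed in-memory "name recognition model database" keyed by user id.
database = {"alice": NameModel([[1.0, 2.0], [3.0, 4.0]])}

# After a successful log-in, average the new input into the stored model.
database["alice"] = update_by_averaging(
    database["alice"], [[3.0, 2.0], [5.0, 0.0]]
)
```

Weighting by `n_utterances` makes every enrollment and log-in utterance count equally. The claim says only "averaging", so a fixed blend factor would be an equally plausible reading.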

10. A voice log-in method for logging in to a system based on computerized recognition of a spoken name input comprising the steps:
- creating an augmented name recognition model from the spoken name input for each person to be enrolled, wherein said augmented name recognition model includes constituent name-part utterances, and also includes any pause in the spoken name input, and wherein said augmented name recognition model represents a portion of said constituent name-part utterances as optional, thereby accommodating elimination of optional name-part utterances, and wherein said pause is also represented as optional, thereby accommodating unpredictable variations in said pause;
  
  storing a name recognition model for each person enrolled in a name recognition model database;
  
  comparing the spoken name input for each person seeking access to the system by voice log-in with the stored name recognition models; and
  
  logging a person in to the system if during said comparing, a pattern match is found between the spoken name input and one of the stored name recognition models.
- Dependent claims (11, 12, 13, 14)
  - 11. The voice log-in method of claim 10, wherein the step of creating augmented name recognition models comprises the substeps:
    - receiving the spoken name input for each person to be enrolled;
      
      performing an endpointing procedure on the spoken name input to delimit the beginning and end of a corresponding full-name utterance, and to locate the precise endpoints within such delimited full-name utterance that define said constituent name-part utterances and any associated pause; and
      
      creating an augmented name recognition model, based on broad and precise endpoints and said full-name and name-part utterances, in which any said name-part utterance and any pause in the spoken name input are represented as optional.
  - 12. The voice log-in method of claim 11, wherein the step of performing an endpointing procedure comprises the substeps:
    - making a first endpointing pass using broad endpoint criteria to delimit the beginning and end of a corresponding full-name utterance; and
      
      then making a second endpointing pass using precise endpoint criteria to locate the precise endpoints within such delimited full-name utterance that define constituent name-part utterances and any associated pause.
  - 13. The voice log-in method of claim 10, wherein the step of creating augmented name recognition models using broad and precise endpoints and full-name and name-part utterances comprises the step:
    - creating an augmented name recognition model, for internal utterance endings and utterance beginnings that delimit constituent name-part utterances, by inserting an additional skip transition into the name recognition model, such that any name-part utterance, or any associated pause, or any combination of both, is made optional, thereby accommodating elimination of said any name-part utterance and unpredictable variations in said any pause.
  - 14. The voice log-in method of claim 10, wherein the step of creating augmented name recognition models using broad and precise endpoints and full-name and name-part utterances comprises the substeps:
    - creating a name-part model for each name-part utterance;
      
      creating a full-name utterance model characterizing a sequence of constituent name-part models; and
      
      augmenting the full-name utterance model by including in the full-name utterance model between constituent name-part models a nonspeech model representing nonspeech associated with a pause, such that any name-part utterance, or the pause, or any combination of both, is made optional, thereby accommodating elimination of said any name-part utterance and unpredictable variations in said any pause.
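
The augmentation described in claims 13 and 14 can be pictured as a small finite-state automaton over coarse tokens. The sketch below is illustrative only and assumes a great deal: it works on symbolic labels rather than acoustic HMM states, and `PAUSE`, `build_augmented_model`, `accepts`, and the `optional_parts` parameter are hypothetical names. A label of `None` plays the role of a skip transition, and a self-loop on the pause label stands in for a nonspeech model that can absorb a pause of any length.

```python
PAUSE = "<pause>"  # hypothetical token standing in for the nonspeech model


def build_augmented_model(name_parts, optional_parts=frozenset()):
    """Return (arcs, final_state); each arc is (src, label, dst), label None = skip."""
    elements = []  # (label, is_optional) in left-to-right order
    for i, part in enumerate(name_parts):
        if i > 0:
            elements.append((PAUSE, True))  # pauses are always optional
        elements.append((part, i in optional_parts))
    arcs = set()
    for state, (label, optional) in enumerate(elements):
        arcs.add((state, label, state + 1))
        if label == PAUSE:
            arcs.add((state + 1, PAUSE, state + 1))  # pause may last any length
        if optional:
            arcs.add((state, None, state + 1))  # the added skip transition
    return arcs, len(elements)


def _closure(states, arcs):
    """All states reachable from `states` through skip transitions alone."""
    seen, stack = set(states), list(states)
    while stack:
        current = stack.pop()
        for src, label, dst in arcs:
            if src == current and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(model, tokens):
    """True if the token sequence is a complete path through the model."""
    arcs, final = model
    current = _closure({0}, arcs)
    for token in tokens:
        stepped = {dst for src, label, dst in arcs if src in current and label == token}
        current = _closure(stepped, arcs)
        if not current:
            return False
    return final in current


# Middle name (index 1) is optional; pauses between parts are optional too.
model = build_augmented_model(["mary", "ann", "smith"], optional_parts={1})
```

With this model, `accepts(model, ["mary", PAUSE, PAUSE, "smith"])` and `accepts(model, ["mary", "ann", "smith"])` both succeed, while `accepts(model, ["ann", "smith"])` fails because the first name part was not marked optional.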

15. A voice log-in method for logging in to a system based on computerized recognition of a spoken name input comprising the steps:
- receiving the spoken name input for each person to be enrolled;
  
  making a first endpointing pass using broad endpoint criteria to delimit the beginning and end of a corresponding full-name utterance; and
  
  then making a second endpointing pass using precise endpoint criteria to locate the precise endpoints within such delimited full-name utterance that define constituent name-part utterances and any associated pause;
  
  creating an augmented name recognition model, based on broad and precise endpoints and the full-name and name-part utterances, wherein said augmented name recognition model represents a portion of said constituent name-part utterances as optional, thereby accommodating elimination of optional name-part utterances, and wherein said pause is also represented as optional, thereby accommodating unpredictable variations in said pause;
  
  storing said augmented name recognition model for each person enrolled in a name recognition model database;
  
  comparing the spoken name input each time a person seeks access to the system by voice log-in with the stored augmented name recognition models; and
  
  logging a person in to the system if, during said comparing, a pattern match is found between the spoken name input and one of the stored name recognition models.
- Dependent claims (16, 17)
  - 16. The voice log-in method of claim 15, wherein the step of creating augmented name recognition models using broad and precise endpoints and the full-name and name-part utterances comprises the step:
    - creating an augmented name recognition model, for internal utterance endings and beginnings that delimit constituent name-part utterances, by inserting an additional skip transition into the name recognition model, such that any name-part utterance, or any associated pause, or any combination of both, is made optional, thereby accommodating elimination of said any name-part utterance and unpredictable variations in said any pause.
  - 17. The voice log-in method of claim 15, wherein the step of creating augmented name recognition models using the broad and precise endpoints and the full-name and name-part utterances comprises the substeps:
    - creating a name-part model for each name-part utterance;
      
      creating a full-name utterance model characterizing a sequence of constituent name-part models; and
      
      augmenting the full-name model by including in the full-name utterance model between constituent name-part models a nonspeech model representing nonspeech associated with a pause, such that any name-part utterance, or the pause, or any combination of both, is made optional, thereby accommodating elimination of said any name-part utterance and unpredictable variations in said any pause.
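
The two endpointing passes recited in claim 15, combined with the energy-based details of claim 5, might look roughly like the sketch below. It is a simplified illustration, not the patent's procedure. The claims give no threshold formula, so `noise + fraction * (speech - noise)` floored at `min_energy` is an assumption, as are all function names, parameter names, and default values. The sketch also computes the above-threshold mask once and applies the large and small separation values to the same candidate runs, rather than re-running detection on the delimited utterance.

```python
import math


def frame_rms(samples, frame_len):
    """RMS energy of each complete frame of `frame_len` samples."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def speech_mask(energies, fast=0.5, slow=0.02, fraction=0.25, min_energy=0.05):
    """Mark frames whose energy exceeds an adaptive detection threshold."""
    if not energies:
        return []
    speech = noise = energies[0]
    mask = []
    for e in energies:
        # Speech level: fast upward, slow downward adaptation.
        speech += (fast if e > speech else slow) * (e - speech)
        # Noise level: slow upward, fast downward adaptation.
        noise += (slow if e > noise else fast) * (e - noise)
        threshold = max(min_energy, noise + fraction * (speech - noise))
        mask.append(e > threshold)
    return mask


def find_runs(mask, begin_frames=3):
    """Runs of above-threshold frames lasting at least `begin_frames`."""
    runs, start = [], None
    for i, above in enumerate(list(mask) + [False]):  # sentinel closes a final run
        if above and start is None:
            start = i
        elif not above and start is not None:
            if i - start >= begin_frames:
                runs.append((start, i))
            start = None
    return runs


def merge_runs(runs, gap_limit):
    """Join runs separated by fewer than `gap_limit` below-threshold frames."""
    merged = []
    for start, stop in runs:
        if merged and start - merged[-1][1] < gap_limit:
            merged[-1] = (merged[-1][0], stop)
        else:
            merged.append((start, stop))
    return merged


def dual_pass_endpoints(energies, broad_separation=20, precise_separation=2,
                        begin_frames=3, end_frames=2):
    """Return [(broad_segment, [precise_segments inside it]), ...] in frame indices."""
    runs = find_runs(speech_mask(energies), begin_frames)
    broad = merge_runs(runs, max(end_frames, broad_separation))
    precise = merge_runs(runs, max(end_frames, precise_separation))
    return [
        (b, [p for p in precise if b[0] <= p[0] and p[1] <= b[1]])
        for b in broad
    ]


# Synthetic frame energies: two name parts separated by a 4-frame pause.
energies = [0.0] * 5 + [1.0] * 10 + [0.0] * 4 + [1.0] * 10 + [0.0] * 30
result = dual_pass_endpoints(energies)
```

On this synthetic input the broad pass yields one full-name segment covering frames 5 up to 29 (end exclusive), and the precise pass splits it into the two name parts (5, 15) and (19, 29). Real speech would need tuned adaptation rates and durations.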

Current Assignee: Texas Instruments, Inc.
Original Assignee: Texas Instruments, Inc.
Inventors: Picone, Joseph; Wheatley, Barbara J.
Primary Examiner(s): Fleming, Michael R.
Assistant Examiner(s): Doerrler, Michelle
Application Number: US07/724,298
Time in Patent Office: 981 Days
Field of Search: 381/41-45, 395/2.59, 395/2.65
US Class Current: 704/250
CPC Class Codes:
G10L 15/142   Hidden Markov Models [HMMs]
G10L 17/04   Training, enrolment or mode...
G10L 2015/0631   Creating reference template...
G10L 2025/783   based on threshold decision
