Interactive computer system recognizing spoken commands

US 5,664,061 A
Filed: 06/05/1995
Issued: 09/02/1997
Est. Priority Date: 04/21/1993
Status: Expired due to Term

1 Assignment

0 Petitions

Abstract

An interactive computer system having a processor executing a target computer program, and having a speech recognizer for converting an utterance into a command signal for the target computer program. The target computer program has a series of active program states occurring over a series of time periods. At least a first active-state image is displayed for a first active state occurring during a first time period. At least one object displayed in the first active-state image is identified, and a list of one or more first active-state commands identifying functions which can be performed in the first active state of the target computer program is generated from the identified object. A first active-state vocabulary of acoustic command models for the first active state comprises the acoustic command models from a system vocabulary representing the first active-state commands. A speech recognizer measures the value of at least one feature of an utterance during each of a series of successive time intervals within the first time period to produce a series of feature signals. The measured feature signals are compared to each of the acoustic command models in the first active-state vocabulary to generate a match score for the utterance and each acoustic command model. The speech recognizer outputs a command signal corresponding to the command model from the first active-state vocabulary having the best match score.
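
The abstract describes a two-stage flow: restrict recognition to the commands that are valid in the current active state, then pick the acoustic command model with the best match score. The Python sketch below illustrates that flow under stated assumptions. Each acoustic command model is represented as a template series of feature vectors, and the match score is the negated dynamic time warping (DTW) distance. The patent does not specify DTW; it is a stand-in chosen for brevity, and the names `CommandModel`, `active_state_vocabulary` and `recognize` are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandModel:
    """Hypothetical acoustic command model: a command name plus a template
    series of feature vectors (one vector per time interval)."""
    command: str
    template: tuple[tuple[float, ...], ...]


def dtw_distance(a, b):
    """DTW distance between two feature series, using Euclidean local cost.
    Stand-in for whatever acoustic matching a real recognizer would use."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def active_state_vocabulary(system_vocab, active_commands):
    """Keep only the models whose command is valid in the current active state."""
    return [model for model in system_vocab if model.command in active_commands]


def recognize(features, vocabulary):
    """Score the utterance against every model in the active-state vocabulary
    and return the command whose model has the best (highest) match score."""
    scores = {m.command: -dtw_distance(features, m.template) for m in vocabulary}
    return max(scores, key=scores.get)


# Toy system vocabulary with one-dimensional features (illustrative values only).
system_vocab = [
    CommandModel("open", ((1.0,), (2.0,), (3.0,))),
    CommandModel("close", ((3.0,), (2.0,), (1.0,))),
    CommandModel("scroll down", ((0.0,), (0.0,), (5.0,))),
]
# Suppose the objects in the first active-state image only support "open" and "close".
vocab = active_state_vocabulary(system_vocab, {"open", "close"})
print(recognize(((1.1,), (2.0,), (2.9,)), vocab))  # open
```

Because "scroll down" is not in the active-state vocabulary, it can never be output during this time period, even for an utterance that matches its template exactly.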

131 Citations

25 Claims

1. An interactive computer system comprising:
- a processor executing a target computer program having a series of active program states occurring over a succession of time periods, said target computer program generating active state image data signals representing an active state image for an active state of the target computer program occurring during each time period, each active state image containing one or more objects;
  
  means for displaying at least a first active-state image for a first active state occurring during a first time period;
  
  means for identifying an object displayed in the first active-state image, and for generating from an identified object displayed in the first active-state image a list of one or more first active-state commands identifying a first active-state function which can be performed in the first active state of the target computer program;
  
  means for storing a system vocabulary of acoustic command models, each acoustic command model representing one or more series of acoustic feature values representing an utterance of one or more words associated with the acoustic command model;
  
  means for identifying a first active-state vocabulary of acoustic command models for the first active state, the first active-state vocabulary comprising the acoustic command models from the system vocabulary representing the first active-state commands, wherein the first active-state vocabulary changes dynamically as a function of both the identity of the target computer program and the active state image data signals which identify an active state of the target computer program; and
  
  a speech recognizer for measuring a value of at least one feature of an utterance during each of a first sequence of time intervals within the first time period to produce a first series of feature signals, said speech recognizer comparing the first series of feature signals to each of the acoustic command models in the first active-state vocabulary to generate a match score for the utterance and each acoustic command model, and said speech recognizer outputting a command signal corresponding to the acoustic command model from the first active-state vocabulary having a best match score.
- Dependent claims 2 to 13:
  - 2. An interactive computer system as claimed in claim 1, characterized in that:
    - the first active-state vocabulary comprises substantially less than all of the acoustic command models from the system vocabulary; and
      
      the speech recognizer does not compare the measured feature signals for the first time period to any acoustic command model which is not in the first active-state vocabulary.
  - 3. An interactive computer system as claimed in claim 2, characterized in that:
    - the means for displaying displays at least a second active-state image different from the first active-state image for a second active state occurring during a second time period different from the first time period;
      
      the means for identifying an object identifies an object displayed in the second active-state image, and generates from an identified object displayed in the second active-state image a catalog of one or more second active-state commands identifying a second active-state function which can be performed in the second active state of the target computer program;
      
      the means for identifying a first active-state vocabulary identifies a second active-state vocabulary of acoustic command models for the second active state, the second active-state vocabulary comprising the acoustic command models from the system vocabulary representing the second active-state commands, the second active-state vocabulary being at least partly different from the first active-state vocabulary; and
      
      the speech recognizer measures the value of at least one feature of the utterance during each of a second sequence of time intervals within the second time period to produce a second series of feature signals, said speech recognizer comparing the second series of feature signals for the second time period to each of the acoustic command models in the second active-state vocabulary to generate the match score for the utterance and each acoustic command model, and said speech recognizer outputting the command signal corresponding to the acoustic command model from the second active-state vocabulary having the best match score.
  - 4. An interactive computer system as claimed in claim 3, characterized in that the target computer program has only one active state occurring during each time period.
  - 5. An interactive computer system as claimed in claim 4, characterized in that the target computer program comprises an operating system program.
  - 6. An interactive computer system as claimed in claim 5, characterized in that the target computer program comprises an application program and an operating system program.
  - 7. An interactive computer system as claimed in claim 6, characterized in that the target computer program comprises two or more application programs and an operating system program.
  - 8. An interactive computer system as claimed in claim 6, characterized in that at least some commands for the active state identify functions which can be performed on the identified object in the active-state image for the active state.
  - 9. An interactive computer system as claimed in claim 8, characterized in that the identified object in the active-state image comprises one or more of a character, a word, an icon, a button, a scroll bar, a slider, a list box, a menu, a check box, a container, or a notebook.
  - 10. An interactive computer system as claimed in claim 9, characterized in that the speech recognizer outputs two or more command signals corresponding to the command models from the active-state vocabulary having best match scores for a given time period.
  - 11. An interactive computer system as claimed in claim 10, characterized in that the active-state vocabulary of acoustic command models for each active state further comprises a set of global acoustic command models representing global commands identifying functions which can be performed in each active state of the target computer program.
  - 12. An interactive computer system as claimed in claim 11, characterized in that the means for displaying comprises a display.
  - 13. An interactive computer system as claimed in claim 11, characterized in that the means for displaying displays both the active-state image for the active state occurring during a time period, and at least a portion of one or more images for program states not occurring during the time period.
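
Claims 2 to 13 add several refinements: the recognizer compares feature signals only against the active-state vocabulary (claim 2), the vocabulary changes from one active state to the next (claim 3), the identified objects may be buttons, scroll bars, menus and similar controls (claim 9), two or more best-scoring commands may be output (claim 10), and a set of global commands is present in every active state (claim 11). The Python sketch below illustrates these points under stated assumptions: the object-to-command table `OBJECT_COMMANDS`, the global command names and the `score_fn` callback are all hypothetical and do not come from the patent.

```python
# Hypothetical table mapping object types named in claim 9 to spoken commands
# that act on them; the patent does not publish such a table.
OBJECT_COMMANDS = {
    "button": {"press"},
    "scroll bar": {"scroll up", "scroll down"},
    "menu": {"open menu", "close menu"},
    "check box": {"check", "uncheck"},
}

# Hypothetical global commands, valid in every active state (claim 11).
GLOBAL_COMMANDS = {"help", "switch window"}


def active_state_commands(visible_objects):
    """Build the command set for one active state from the objects it displays,
    always including the global commands."""
    commands = set(GLOBAL_COMMANDS)
    for obj in visible_objects:
        commands |= OBJECT_COMMANDS.get(obj, set())
    return commands


def n_best(score_fn, vocabulary, n=2):
    """Return up to n commands, best match score first (claim 10).

    score_fn is only called for commands in the active-state vocabulary, so
    models outside it are never compared (claim 2)."""
    scored = [(command, score_fn(command)) for command in vocabulary]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [command for command, _ in scored[:n]]


state1 = active_state_commands(["button", "scroll bar"])
state2 = active_state_commands(["menu", "check box"])
# The vocabularies differ between states but share the global commands (claims 3, 11).
print(sorted(state1 & state2))  # ['help', 'switch window']

# Toy scores; "open menu" scores best overall but is not valid in state 1.
toy_scores = {"press": -1.0, "scroll down": -0.4, "help": -2.5, "open menu": -0.1}
print(n_best(lambda c: toy_scores.get(c, float("-inf")), state1))  # ['scroll down', 'press']
```

Passing a scoring callback rather than a precomputed score table keeps the "compare only against the active-state vocabulary" property explicit: no score is ever computed for a command outside the current state.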

14. A method of computer interaction comprising:
- executing, on a processor, a target computer program having a series of active program states occurring over a succession of time periods, said target computer program generating active state image data signals representing an active state image for an active state of the target computer program occurring during each time period, each active state image containing one or more objects;
  
  displaying at least a first active-state image for a first active state occurring during a first time period;
  
  identifying an object displayed in the first active-state image, and generating from an identified object displayed in the first active-state image a list of one or more first active-state commands identifying a first active-state function which can be performed in the first active state of the target computer program;
  
  storing a system vocabulary of acoustic command models, each acoustic command model representing one or more series of acoustic feature values representing an utterance of one or more words associated with the acoustic command model;
  
  identifying a first active-state vocabulary of acoustic command models for the first active state, the first active-state vocabulary comprising the acoustic command models from the system vocabulary representing the first active-state commands wherein the first active-state vocabulary changes dynamically as a function of both the identity of the target computer program and the active state image data signals which identify an active state of the target computer program; and
  
  measuring a value of at least one feature of an utterance during each of a first sequence of time intervals within the first time period to produce a first series of feature signals;
  
  comparing the first series of feature signals to each of the acoustic command models in the first active-state vocabulary to generate a match score for the utterance and each acoustic command model; and
  
  outputting a command signal corresponding to the acoustic command model from the first active-state vocabulary having a best match score.
- Dependent claims 15 to 25:
  - 15. A method of computer interaction as claimed in claim 14, characterized in that:
    - the first active-state vocabulary comprises substantially less than all of the acoustic command models from the system vocabulary; and
      
      the step of comparing does not compare the measured feature signals for the first time period to any acoustic command model which is not in the first active-state vocabulary.
  - 16. A method of computer interaction as claimed in claim 15, further comprising the steps of:
    - displaying at least a second active-state image different from the first active-state image for a second active state occurring during a second time period different from the first time period;
      
      identifying an object displayed in the second active-state image, and generating from an identified object displayed in the second active-state image a catalog of one or more second active-state commands identifying a second active-state function which can be performed in the second active state of the target computer program;
      
      identifying a second active-state vocabulary of acoustic command models for the second active state, the second active-state vocabulary comprising the acoustic command models from the system vocabulary representing the second active-state commands, the second active-state vocabulary being at least partly different from the first active-state vocabulary;
      
      measuring the value of at least one feature of the utterance during each of a second sequence of time intervals within the second time period to produce a second series of feature signals;
      
      comparing the second series of feature signals for the second time period to each of the acoustic command models in the second active-state vocabulary to generate the match score for the utterance and each acoustic command model; and
      
      outputting the command signal corresponding to the acoustic command model from the second active-state vocabulary having the best match score.
  - 17. A method of computer interaction as claimed in claim 16, characterized in that the target computer program has only one active state occurring during each time period.
  - 18. A method of computer interaction as claimed in claim 17, characterized in that the target computer program comprises an operating system program.
  - 19. A method of computer interaction as claimed in claim 18, characterized in that the target computer program comprises an application program and an operating system program.
  - 20. A method of computer interaction as claimed in claim 19, characterized in that the target computer program comprises two or more application programs and an operating system program.
  - 21. A method of computer interaction as claimed in claim 19, characterized in that at least some commands for the active state identify functions which can be performed on the identified object in the active-state image for the active state.
  - 22. A method of computer interaction as claimed in claim 21, characterized in that the identified object in the active-state image comprises one or more of a character, a word, an icon, a button, a scroll bar, a slider, a list box, a menu, a check box, a container, or a notebook.
  - 23. A method of computer interaction as claimed in claim 22, characterized in that the step of outputting a command signal comprises outputting two or more command signals corresponding to the command models from the active-state vocabulary having best match scores for a given time period.
  - 24. A method of computer interaction as claimed in claim 23, characterized in that the active-state vocabulary of acoustic command models for each active state further comprises a set of global acoustic command models representing global commands identifying functions which can be performed in each active state of the target computer program.
  - 25. A method of computer interaction as claimed in claim 24, further comprising the step of displaying both the active-state image for the active state occurring during a time period, and at least a portion of one or more images for program states not occurring during the time period.
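
Claims 14 and 16 recite measuring a value of at least one feature of the utterance during each of a sequence of time intervals to produce a series of feature signals. The Python sketch below shows that step under stated assumptions: fixed-length, overlapping frames serve as the time intervals, and a single feature (log mean energy) is measured per frame. The function name `frame_features`, the sample rate, frame length and hop are illustrative assumptions, not values from the patent.

```python
import math


def frame_features(samples, frame_len, hop):
    """Measure one feature (log mean energy) for each successive time interval.

    Each frame of frame_len samples is one time interval; frames start every
    hop samples. A practical recognizer measures many features per interval
    (for example spectral coefficients); log energy keeps the sketch short."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        features.append(math.log(energy + 1e-10))  # floor avoids log(0) on silence
    return features


# Assumed framing: 16 kHz audio, 25 ms intervals (400 samples), 10 ms hop (160 samples).
silence_then_tone = [0.0] * 1600 + [0.5] * 1600
series = frame_features(silence_then_tone, frame_len=400, hop=160)
print(len(series))  # 18
```

The resulting series is what would be compared against the acoustic command models of whichever active-state vocabulary applies to the time period in which the utterance occurs.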

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
International Business Machines Corporation
Inventors
Karat, John; Levy, Stephen Eric; Andreshak, Joseph Charles; Lucassen, John; Daggett, Gregg H.; Mack, Robert Lawrence
Primary Examiner(s)
MacDonald, Allen R.
Assistant Examiner(s)
Collins, Alphonso A.

Application Number

US08/462,735
Time in Patent Office

820 Days
Field of Search

395/2.44, 395/2.4, 395/2.6, 395/2.79, 395/2.84
US Class Current

704/275
CPC Class Codes

G06F 3/16   Sound input; Sound output s...

G10L 15/22   Procedures used during a sp...

G10L 2015/228   of application context
