Speech interface system and method for control and interaction with applications on a computing system

US 8,165,886 B1
Filed: 09/29/2008
Issued: 04/24/2012
Est. Priority Date: 10/04/2007
Status: Active Grant

Abstract

A speech processing system which exploits statistical modeling and formal logic to receive and process speech input, which may represent data to be received, such as dictation, or commands to be processed by an operating system, application or process. A command dictionary and dynamic grammars are used in processing speech input to identify, disambiguate and extract commands. The logical processing scheme ensures that putative commands are complete and unambiguous before processing. Context sensitivity may be employed to differentiate data and commands. A multifaceted graphical user interface may be provided for interaction with a user, to speech-enable interaction with applications and processes that do not necessarily have native support for speech input.

600 Citations

24 Claims

1. A speech processing method, comprising:
- receiving a speech input representing at least one of a command and a stream of data;
  
  analyzing the speech for characteristics of a command structure, and if so, entering a command mode;
  
  in a command mode, analyzing the speech input with respect to a set of at least one grammar representation, to determine an ambiguity and a completeness;
  
  based on the determined ambiguity and completeness, prompting the user in a contextually appropriate manner for further speech input, to at least one of reduce ambiguity and increase completeness; and
  
  if the speech input is sufficiently unambiguous and sufficiently complete, generating an output representing the command; and
  
  in an absence of a characteristic of a command structure:
  
  treating the speech input as one representative of data; and
  
  generating an output as a symbolic representation of the speech input, wherein the output generated representing the command is targeted to one of a plurality of respective applications while preserving a respective prior system state, wherein at least one of:
  
  after command execution, and in dependence on a result thereof, a system state is selectively restored or processing assumed by another application without restoring the prior system state; and
  
  a command restores one of a previously preserved system state.
- - 2. The method according to claim 1, further comprising the steps of:
    - entering a data input mode if the step of analyzing the speech for characteristics of a command structure does not result in entering a command mode or if the speech input represents a command to enter a data input mode;
      
      in a data input mode:
      
      treating the speech input as one representative of data, unless a context of the speech input indicates that the data input mode has terminated, and thereafter entering the command mode.
  - 3. The speech processing method of claim 1, wherein, if the speech input represents a command to enter a data input mode, entering a data input mode wherein subsequent speech input is analyzed for a command, if a command is found determining a context, if a command is in the context of data input, treating the speech input as one representative of data, otherwise generating an output as a symbolic representation of the speech input.
  - 4. The method according to claim 1, further comprising the step of maintaining at least one data structure representing at least a status of a grammar, wherein the data structure is updated based on the speech input and a context;
    - and the speech input, wherein the set of at least one grammar representation is generated dynamically based at least in part on available ones of a set of temporally varying available functions within the command structure.
  - 5. The method according to claim 1, wherein said analyzing determines if a single string of speech input comprises at least one of a single command impacting at least two software constructs, at least two commands, and a combination of at least one command and data, and processing the speech input in accordance with the determination.
  - 6. The method according to claim 1, wherein said analyzing step is performed by a plurality of analyzers in parallel, each analyzer analyzing according to a different set of criteria, and wherein the outputs of the plurality of analyzers are directed to a plurality of respective applications.
  - 7. The method according to claim 1, wherein at least one of a non-linguistic implicit input is employed as a cue to determine at least one of a context, and a target software construct for analyzing said input;
    - and at least one of a temporal analysis, natural language analysis, and syntactic analysis are used to determine a context of the speech input.
  - 8. The method according to claim 1, wherein the output generated representing the command is targeted to one of a plurality of respective applications while preserving a respective prior system state, wherein after command execution, and in dependence on a result thereof, a system state is selectively restored or processing assumed by another application without restoring the prior system state.
  - 9. The method according to claim 1, wherein a plurality of applications are concurrently available, and said steps of analyzing and generating an output are performed with respect to, and directed at, a particular one of the available applications.
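
The routing recited in claim 1 can be illustrated with a minimal sketch. Everything named here (`TEMPLATES`, `Output`, `process`, and the token-template grammar itself) is a hypothetical stand-in chosen for illustration and is not taken from the patent. The sketch assumes speech has already been recognized into words, treats "ambiguity" as more than one grammar template remaining consistent with the input, and treats "completeness" as the single remaining template being fully matched.

```python
from dataclasses import dataclass

# Hypothetical grammar (an assumption, not from the patent): each template
# is a tuple of tokens, and a token written "<...>" is a slot for any word.
TEMPLATES = [
    ("open", "file", "<name>"),
    ("open", "mail"),
    ("send", "mail", "<recipient>"),
]


@dataclass
class Output:
    kind: str      # "command", "prompt", or "data"
    value: object


def _consistent(words, template):
    """True if the words spoken so far could be a prefix of the template."""
    if len(words) > len(template):
        return False
    return all(t.startswith("<") or t == w for w, t in zip(words, template))


def process(words, templates=TEMPLATES):
    """Route recognized words to a command, a clarifying prompt, or data."""
    lowered = [w.lower() for w in words]
    candidates = [t for t in templates if lowered and _consistent(lowered, t)]
    if not candidates:
        # No characteristic of a command structure: emit the words as data.
        return Output("data", " ".join(words))
    if len(candidates) > 1:
        # Ambiguous: more than one template still fits, so ask the user.
        options = sorted(" ".join(t) for t in candidates)
        return Output("prompt", "Which did you mean: " + "; ".join(options) + "?")
    template = candidates[0]
    if len(lowered) < len(template):
        # Unambiguous but incomplete: ask for the next expected token.
        return Output("prompt", "Incomplete, expected next: " + template[len(lowered)])
    # Sufficiently unambiguous and complete: emit the command.
    return Output("command", tuple(words))
```

Under these assumptions, `process(["open"])` yields a prompt because two templates still fit, `process(["open", "file"])` yields a prompt for the missing `<name>` slot, and `process(["Dear", "John"])` falls through to data. A real system would work from recognizer hypotheses and dynamic grammars rather than exact word matches.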

10. A speech processing method, comprising:
- analyzing a set of contexts to determine available commands;
  
  formulating command structures corresponding to the determined available commands;
  
  statistically modeling at least portions of the command structures;
  
  receiving a natural language speech input representing at least one command;
  
  processing the speech input with respect to the statistically modeled portions of the command structures;
  
  determining, with respect to the statistically modeled portions of the command structures, if the speech input likely represents a command;
  
  if the speech input likely represents at least one command, determining a completeness and an ambiguity of the likely at least one command;
  
  if the likely at least one command is too ambiguous or incomplete for execution, prompting the speaker for further input to decrease an ambiguity or increase the completeness;
  
  if the likely at least one command is sufficiently unambiguous and complete for execution, executing the command, wherein the command is targeted to one of a plurality of respective applications, while preserving a respective prior system state, wherein at least one of:
  
  after command execution, and in dependence on a result thereof, a system state is selectively restored or processing assumed by another application without restoring the prior system state; and
  
  a command restores one of a previously preserved system state.
- - 11. The method according to claim 10, further comprising the step of maintaining at least one data structure representing at least a status of a grammar, wherein the data structure is updated based on the speech input and a context;
    - and the speech input, wherein the set of at least one grammar representation is generated dynamically based at least in part on available ones of a set of temporally varying available functions within the command structures.
  - 12. The method according to claim 10, wherein said analyzing determines if a single string of speech input comprises at least one of a single command impacting at least two software constructs, at least two commands, and a combination of at least one command and data, and processing the speech input in accordance with the determination.
  - 13. The method according to claim 10, wherein said analyzing step is performed by a plurality of analyzers in parallel, each analyzer analyzing according to a different set of criteria, and wherein the outputs of the plurality of analyzers are directed to a plurality of respective applications.
  - 14. The method according to claim 10, wherein at least one of a non-linguistic implicit input is employed as a cue to determine at least one of a context, and a target software construct for analyzing said input;
    - and at least one of a temporal analysis, natural language analysis, and syntactic analysis are used to determine a context of the speech input.
  - 15. The method according to claim 10, wherein the command is targeted to one of a plurality of respective applications, while preserving a respective prior system state, wherein a command restores one of a previously preserved system state.
  - 16. The method according to claim 10, wherein a plurality of applications are concurrently available, and said step of analyzing is performed with respect to a particular one of the available applications and the command is executed by that respective application.
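
The first steps of claim 10 (deriving available commands from contexts, modeling them statistically, and testing whether an input likely represents a command) might be sketched as follows. This is a deliberately crude illustration: the context table, the function names, the vocabulary-coverage score, and the 0.6 threshold are all assumptions made here, and the patent does not specify this particular model.

```python
from collections import Counter

# Hypothetical table of contexts and the commands available in each
# (an assumption for illustration only).
CONTEXT_COMMANDS = {
    "editor": ["save file", "close file", "find text"],
    "mail": ["send mail", "open mail"],
}


def build_model(active_contexts, table=CONTEXT_COMMANDS):
    """Count words over the commands available in the active contexts."""
    counts = Counter()
    for ctx in active_contexts:
        for phrase in table.get(ctx, []):
            counts.update(phrase.split())
    return counts


def command_likelihood(words, model):
    """Fraction of spoken words found in the command vocabulary."""
    if not words:
        return 0.0
    hits = sum(1 for w in words if model[w.lower()] > 0)
    return hits / len(words)


def likely_command(words, model, threshold=0.6):
    """Decide whether the input likely represents a command."""
    return command_likelihood(words, model) >= threshold
```

Because the model is rebuilt from the active contexts, the same utterance can score as a likely command in one context and as ordinary speech in another, which is the point of analyzing contexts before modeling. A production recognizer would use a trained language model rather than word coverage.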

17. A speech processing method, comprising:
- receiving a natural language speech input representing commands and data in the form of spoken words;
  
  analyzing the speech for contextual indicia to distinguish between spoken commands instructing a device to take automated action, and spoken words intended as data;
  
  determining whether speech analyzed to comprise commands, represents a sufficiently complete command capable of at least partial execution, or whether additional command input is required;
  
  if required additional command input is not received within a contextually appropriate period, prompting the speaker for additional input to complete the command sufficient for at least partial execution;
  
  at least partially executing commands; and
  
  passing speech containing words intended as data to a data sink, wherein a command is targeted to one of a plurality of respective applications while preserving a respective prior system state, wherein at least one of:
  
  after command execution, and in dependence on a result thereof, a system state is selectively restored or processing assumed by another application without restoring the prior system state; and
  
  a command restores one of a previously preserved system state.
- - 18. The method according to claim 17, further comprising the step of maintaining at least one data structure representing at least a status of a grammar, wherein the data structure is updated based on the speech input and a context;
    - and the speech input, wherein the set of at least one grammar representation is generated dynamically based at least in part on available ones of a set of temporally varying available functions within the command structure.
  - 19. The method according to claim 17, wherein at least one of a non-linguistic implicit input is employed as a cue to determine at least one of a context, and a target software construct for analyzing said input;
    - and at least one of a temporal analysis, natural language analysis, and syntactic analysis are used to determine a context of the speech input.
  - 20. The method according to claim 17, wherein a command is targeted to one of a plurality of respective applications while preserving a respective prior system state, wherein after command execution, and in dependence on a result thereof, a system state is selectively restored or processing assumed by another application without restoring the prior system state.
  - 21. The method according to claim 17, wherein a plurality of applications are concurrently available, and said analyzing step is performed with respect to, and directed at, a particular one of the available applications and the command is at least partially executed by that respective application.
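
Claim 17 prompts the speaker only when required additional input has not arrived within a contextually appropriate period. A minimal sketch of that decision is given below; the context names, the waiting periods, and the function `next_action` are hypothetical values chosen for illustration, not figures from the patent.

```python
# Hypothetical per-context waiting periods in seconds (assumed values).
WAIT_SECONDS = {"dictation": 5.0, "navigation": 1.5}
DEFAULT_WAIT = 3.0


def next_action(command_complete, seconds_since_last_word, context):
    """Return "execute", "wait", or "prompt" for a pending command."""
    if command_complete:
        return "execute"
    limit = WAIT_SECONDS.get(context, DEFAULT_WAIT)
    # Prompt only once the context-dependent period has elapsed.
    return "prompt" if seconds_since_last_word >= limit else "wait"
```

With these assumed values, a two-second pause triggers a prompt in the "navigation" context but not in "dictation", where longer pauses are expected.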

22. A method for recursive processing of speech, comprising:
- receiving a natural language speech input representing commands and data in the form of spoken words to be processed, the speech input comprising a command structure wherein a command is targeted to one of a plurality of respective applications while preserving a respective prior system state and in which a processing result for a first portion of the speech input is necessary for determining a processing result for a second portion of the speech input;
  
  analyzing the speech for contextual indicia to distinguish between spoken commands instructing a device to take automated action, and spoken words intended as data;
  
  determining whether speech analyzed to comprise commands, represents a sufficiently complete command capable of at least partial execution, or whether additional command input is required;
  
  at least partially executing commands;
  
  passing speech containing words intended as data to a data sink;
  
  assigning control of processing of the speech input to a first processing unit, for generating the processing result for the first portion of the speech input; and
  
  delegating, from the first processing unit, to a second processing unit, control of processing the second portion of the speech input, the determining of the processing result for the second portion of the speech input by the second processing unit being deferred until the processing result for the first portion is available, and after the processing result for the second portion is available, deferring control back to the first processing unit, wherein at least one of:
  
  after execution of the command structure, and in dependence on a result thereof, a system state is selectively restored or processing assumed by another application without restoring the prior system state; and
  
  execution of command structure restores one of a previously preserved system state.
- - 23. The method according to claim 22, wherein the second portion of the speech input comprises a command structure in which a processing result for a first subportion of the second portion input is necessary for determining a processing result for a second subportion of the second portion, further comprising:
    - delegating, from the second processing unit, to a third processing unit, control of processing the second subportion, the determining of the processing result for the second subportion by the third processing unit being deferred until the processing result for the first subportion is available, and after the processing result for the second subportion is available, deferring control back to the second processing unit.
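
The delegation pattern of claims 22 and 23 can be sketched as a recursive function in which each call plays the role of one processing unit. The names `run_unit`, `handlers`, and the example portions are hypothetical, and modeling processing units as nested function calls is an assumption made for illustration; the claims do not require this form.

```python
def run_unit(portions, handlers, prior=None, depth=0, trace=None):
    """Process the first portion here, delegate the rest, then regain control.

    `handlers` maps a portion name to a function of the prior result.
    Returns (final_result, trace), where trace records the order of control.
    """
    trace = [] if trace is None else trace
    head, rest = portions[0], portions[1:]
    result = handlers[head](prior)          # this unit's own portion
    trace.append(f"unit{depth}:{head}")
    if rest:
        # The delegate starts only after `result` is available, because the
        # later portion depends on it; control then returns to this unit.
        result, _ = run_unit(rest, handlers, result, depth + 1, trace)
        trace.append(f"unit{depth}:resumed")
    return result, trace


# Hypothetical handlers: the second portion needs the first portion's result.
HANDLERS = {
    "find_attachment": lambda prior: "report.txt",
    "open": lambda prior: f"opened {prior}",
    "print": lambda prior: f"printed after {prior}",
}
```

Calling `run_unit(["find_attachment", "open"], HANDLERS)` processes the first portion in unit 0, delegates the second to unit 1, and records unit 0 resuming afterwards. Adding a third portion extends the chain to a third unit in the manner of claim 23.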

24. A speech processing method, comprising:
- receiving a natural language speech input representing commands and data in the form of spoken words targeted to one of a plurality of respective applications, an execution of a second command interrupting an execution of a first command, wherein a respective prior system state representing a system state at the time of interruption is preserved, and wherein a plurality of system states may be preserved concurrently;
  
  analyzing the speech for contextual indicia to distinguish between spoken commands instructing a device to take automated action, and spoken words intended as data;
  
  determining whether speech analyzed to comprise commands, represents a sufficiently complete command capable of at least partial execution, or whether additional command input is required;
  
  at least partially executing commands;
  
  passing speech containing words intended as data to a data sink; and
  
  after execution of the second command, in dependence on at least one of:
  
  (i) a predefined condition, (ii) the second command, and (iii) a result of an execution of the second command, one of:
  
  (i) the preserved system state prior to interruption of the first command is restored, (ii) another preserved system state is restored, or (iii) the processing is assumed by an application without restoring the prior system state.
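
The state handling in claim 24 (several preserved states held at once, with three possible outcomes after the interrupting command finishes) might be sketched as below. The class `StateManager`, its method names, and the use of strings as system states are hypothetical choices for illustration and do not come from the patent.

```python
class StateManager:
    """Keeps the current state plus any number of preserved states."""

    def __init__(self, initial):
        self.current = initial
        self.preserved = []   # several states may be preserved concurrently

    def interrupt(self, new_state):
        """A second command interrupts: preserve the current state first."""
        self.preserved.append(self.current)
        self.current = new_state

    def finish(self, policy, index=-1):
        """Resolve the interruption.

        policy "restore" with the default index restores the state held
        just before the interruption; another index restores a different
        preserved state; "hand_off" leaves the current application in
        control and restores nothing.
        """
        if policy == "restore":
            self.current = self.preserved.pop(index)
        elif policy != "hand_off":
            raise ValueError(f"unknown policy: {policy}")
        return self.current
```

In this sketch the caller picks the policy, which stands in for the claim's dependence on a predefined condition, the second command, or its execution result. A "hand_off" leaves the preserved states in place, so a later command can still restore one of them.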

Current Assignee: Samsung Electronics Co. Ltd.
Original Assignee: Great Northern Research, LLC
Inventors: Gagnon, Jean; Roy, Philippe; Lagassey, Paul J.
Primary Examiner(s): AZAD, ABUL K
Application Number: US12/241,028
Time in Patent Office: 1,303 Days
Field of Search: 704/275, 715/727, 715/728
US Class Current: 704/275
CPC Class Codes:
G10L 15/19   Grammatical context, e.g. d...
G10L 15/26   Speech to text systems G10L...
G10L 17/22   Interactive procedures; Man...