Speech interface system and method for control and interaction with applications on a computing system
2 Assignments
0 Petitions
Abstract
A speech processing system exploits statistical modeling and formal logic to receive and process speech input, which may represent data to be received, such as dictation, or commands to be processed by an operating system, application or process. A command dictionary and dynamic grammars are used in processing speech input to identify, disambiguate and extract commands. The logical processing scheme ensures that putative commands are complete and unambiguous before processing. Context sensitivity may be employed to differentiate data and commands. A multifaceted graphical user interface may be provided for interaction with a user to speech-enable interaction with applications and processes that do not necessarily have native support for speech input.
597 Citations
24 Claims
1. A speech processing method, comprising:
- receiving a speech input representing at least one of a command and a stream of data;
- analyzing the speech for characteristics of a command structure, and if so, entering a command mode;
- in a command mode, analyzing the speech input with respect to a set of at least one grammar representation, to determine an ambiguity and a completeness;
- based on the determined ambiguity and completeness, prompting the user in a contextually appropriate manner for further speech input, to at least one of reduce ambiguity and increase completeness; and
- if the speech input is sufficiently unambiguous and sufficiently complete, generating an output representing the command; and
- in an absence of a characteristic of a command structure:
- treating the speech input as one representative of data; and
- generating an output as a symbolic representation of the speech input,
wherein the output generated representing the command is targeted to one of a plurality of respective applications while preserving a respective prior system state, wherein at least one of:
- after command execution, and in dependence on a result thereof, a system state is selectively restored or processing assumed by another application without restoring the prior system state; and
- a command restores one of a previously preserved system state.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
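Illustrative sketch (not part of the claims): the routing described in claim 1 can be pictured as a function that first looks for a command cue, then checks the candidate command for ambiguity and completeness, and otherwise passes the words through as dictated text. Everything named here is an assumption made for illustration: the `COMMANDS` table, the use of a leading trigger verb as the "characteristic of a command structure", the slot names, and the `process` function. The patent does not prescribe any of them, and state preservation is omitted.

```python
# Hypothetical grammar: (command name, trigger verb, required slots).
COMMANDS = [
    ("open_file", "open", ("filename",)),
    ("open_mail", "open", ("folder",)),
    ("send_mail", "send", ("recipient",)),
]


def process(words, filled=None):
    """Classify recognized words as a command, a prompt request, or data.

    words:  list of recognized tokens
    filled: slot name -> value already extracted from the utterance
    """
    filled = dict(filled or {})
    # Assumed command cue: a known trigger verb in first position.
    candidates = [c for c in COMMANDS if words and words[0] == c[1]]
    if not candidates:
        # No command structure detected: emit the words as dictated text.
        return ("data", " ".join(words))

    # Ambiguity check: keep candidates whose grammar accepts every filled slot.
    consistent = [c for c in candidates if set(filled) <= set(c[2])]
    if len(consistent) != 1:
        names = sorted(c[0] for c in (consistent or candidates))
        return ("prompt", "Did you mean: " + ", ".join(names) + "?")

    # Completeness check: every required slot must have a value.
    name, _verb, required = consistent[0]
    missing = [slot for slot in required if slot not in filled]
    if missing:
        return ("prompt", "Please say the " + ", ".join(missing) + ".")
    return ("command", (name, filled))
```

In this sketch a bare "open" is ambiguous between two hypothetical commands and triggers a disambiguation prompt, while "send" with no recipient is unambiguous but incomplete and triggers a completion prompt.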
10. A speech processing method, comprising:
- analyzing a set of contexts to determine available commands;
- formulating command structures corresponding to the determined available commands;
- statistically modeling at least portions of the command structures;
- receiving a natural language speech input representing at least one command;
- processing the speech input with respect to the statistically modeled portions of the command structures;
- determining, with respect to the statistically modeled portions of the command structures, if the speech input likely represents a command;
- if the speech input likely represents at least one command, determining a completeness and an ambiguity of the likely at least one command;
- if the likely at least one command is too ambiguous or incomplete for execution, prompting the speaker for further input to decrease an ambiguity or increase the completeness;
- if the likely at least one command is sufficiently unambiguous and complete for execution, executing the command,
wherein the command is targeted to one of a plurality of respective applications, while preserving a respective prior system state, wherein at least one of:
- after command execution, and in dependence on a result thereof, a system state is selectively restored or processing assumed by another application without restoring the prior system state; and
- a command restores one of a previously preserved system state.
Dependent claims: 11, 12, 13, 14, 15, 16
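Illustrative sketch (not part of the claims): the first steps of claim 10 derive the available commands from the active contexts, model them statistically, and then ask whether an utterance likely represents a command. The example below uses a deliberately simple stand-in for the statistical model, an add-one-smoothed unigram model compared against a uniform distribution over an assumed vocabulary. The `CONTEXTS` table, `VOCAB_SIZE`, the zero threshold, and all function names are hypothetical; the patent does not specify this model.

```python
import math
from collections import Counter

# Hypothetical mapping: context name -> command phrases available in it.
CONTEXTS = {
    "editor": ["save file", "close file"],
    "mail": ["send message", "close message"],
}
VOCAB_SIZE = 1000  # assumed size of the recognizer vocabulary


def available_commands(active_contexts):
    """Collect the command phrases offered by the active contexts."""
    return [cmd for ctx in active_contexts for cmd in CONTEXTS.get(ctx, [])]


def build_model(commands):
    """Unigram counts over the words of the available command phrases."""
    return Counter(word for cmd in commands for word in cmd.split())


def command_score(words, counts):
    """Log-likelihood ratio: smoothed command model vs. uniform vocabulary."""
    total = sum(counts.values())
    score = 0.0
    for w in words:
        p_cmd = (counts[w] + 1) / (total + VOCAB_SIZE)  # add-one smoothing
        p_uniform = 1 / VOCAB_SIZE
        score += math.log(p_cmd / p_uniform)
    return score


def likely_command(utterance, active_contexts):
    """True if the utterance fits the command model better than chance."""
    counts = build_model(available_commands(active_contexts))
    return command_score(utterance.split(), counts) > 0
```

Because the model is rebuilt from the active contexts, the same utterance can count as a likely command in one context and as ordinary speech in another, which is the context sensitivity the claim relies on.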
17. A speech processing method, comprising:
- receiving a natural language speech input representing commands and data in the form of spoken words;
- analyzing the speech for contextual indicia to distinguish between spoken commands instructing a device to take automated action, and spoken words intended as data;
- determining whether speech analyzed to comprise commands represents a sufficiently complete command capable of at least partial execution, or whether additional command input is required;
- if required additional command input is not received within a contextually appropriate period, prompting the speaker for additional input to complete the command sufficient for at least partial execution;
- at least partially executing commands; and
- passing speech containing words intended as data to a data sink,
wherein a command is targeted to one of a plurality of respective applications while preserving a respective prior system state, wherein at least one of:
- after command execution, and in dependence on a result thereof, a system state is selectively restored or processing assumed by another application without restoring the prior system state; and
- a command restores one of a previously preserved system state.
Dependent claims: 18, 19, 20, 21
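Illustrative sketch (not part of the claims): claim 17 prompts the speaker only when required input has not arrived within a contextually appropriate period. One way to picture this is a pending command that records when input last arrived and is polled with the current time. The `PendingCommand` class, its slot names, and the fixed `timeout_s` are assumptions for illustration; the patent leaves open how the period is chosen. Time is passed in explicitly so the behavior is deterministic.

```python
class PendingCommand:
    """Tracks a partially specified command and when input last arrived."""

    def __init__(self, required, timeout_s, now):
        self.required = tuple(required)  # slot names needed for execution
        self.values = {}                 # slot name -> value received so far
        self.timeout_s = timeout_s       # assumed fixed waiting period
        self.last_input = now

    def add(self, slot, value, now):
        """Record further speech input and restart the waiting period."""
        self.values[slot] = value
        self.last_input = now

    def poll(self, now):
        """Decide whether to execute, keep waiting, or prompt the speaker."""
        missing = [s for s in self.required if s not in self.values]
        if not missing:
            return ("execute", dict(self.values))
        if now - self.last_input >= self.timeout_s:
            return ("prompt", missing)
        return ("wait", missing)
```

A caller would poll periodically: while the speaker is still within the waiting period the result is "wait", and only after the period lapses with slots still missing does the system prompt.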
22. A method for recursive processing of speech, comprising:
- receiving a natural language speech input representing commands and data in the form of spoken words to be processed, the speech input comprising a command structure wherein a command is targeted to one of a plurality of respective applications while preserving a respective prior system state and in which a processing result for a first portion of the speech input is necessary for determining a processing result for a second portion of the speech input;
- analyzing the speech for contextual indicia to distinguish between spoken commands instructing a device to take automated action, and spoken words intended as data;
- determining whether speech analyzed to comprise commands represents a sufficiently complete command capable of at least partial execution, or whether additional command input is required;
- at least partially executing commands;
- passing speech containing words intended as data to a data sink;
- assigning control of processing of the speech input to a first processing unit, for generating the processing result for the first portion of the speech input; and
- delegating, from the first processing unit, to a second processing unit, control of processing the second portion of the speech input, the determining of the processing result for the second portion of the speech input by the second processing unit being deferred until the processing result for the first portion is available, and after the processing result for the second portion is available, deferring control back to the first processing unit,
wherein at least one of:
- after execution of the command structure, and in dependence on a result thereof, a system state is selectively restored or processing assumed by another application without restoring the prior system state; and
- execution of command structure restores one of a previously preserved system state.
Dependent claims: 23
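Illustrative sketch (not part of the claims): the delegation pattern in claim 22 resembles an ordinary nested call. The first unit produces its result, hands the second portion (together with that result) to a second unit, and regains control when the second unit returns. The function names, the `trace` list, and the string operations standing in for real processing are all hypothetical.

```python
def second_unit(portion, first_result, trace):
    """Processes the second portion; it needs the first result to do so."""
    trace.append("second unit: processing")
    return f"{portion}({first_result})"


def first_unit(first_portion, second_portion, trace):
    """Processes the first portion, delegates the second, then resumes."""
    trace.append("first unit: processing")
    first_result = first_portion.upper()  # stand-in for real processing
    trace.append("first unit: delegating")
    # The second unit cannot start until first_result exists.
    second_result = second_unit(second_portion, first_result, trace)
    trace.append("first unit: control returned")
    return first_result, second_result
```

The recorded trace shows the ordering the claim describes: the second unit's work is deferred until the first result is available, and control then returns to the first unit.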
24. A speech processing method, comprising:
- receiving a natural language speech input representing commands and data in the form of spoken words targeted to one of a plurality of respective applications, an execution of a second command interrupting an execution of a first command, wherein a respective prior system state representing a system state at the time of interruption is preserved, and wherein a plurality of system states may be preserved concurrently;
- analyzing the speech for contextual indicia to distinguish between spoken commands instructing a device to take automated action, and spoken words intended as data;
- determining whether speech analyzed to comprise commands represents a sufficiently complete command capable of at least partial execution, or whether additional command input is required;
- at least partially executing commands;
- passing speech containing words intended as data to a data sink; and
- after execution of the second command, in dependence on at least one of:
(i) a predefined condition, (ii) the second command, and (iii) a result of an execution of the second command, one of:
(i) the preserved system state prior to interruption of the first command is restored, (ii) another preserved system state is restored, or (iii) the processing is assumed by an application without restoring the prior system state.
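Illustrative sketch (not part of the claims): the three outcomes listed in claim 24 can be pictured with a small manager that preserves a state each time a command is interrupted and, once the interrupting command finishes, applies one of three policies. The `StateManager` class, the policy names, and the use of plain strings as system states are assumptions for illustration only.

```python
class StateManager:
    """Preserves system states across interruptions (hypothetical model)."""

    def __init__(self, initial_state):
        self.current = initial_state
        self.preserved = []  # several states may be preserved concurrently

    def interrupt(self, new_state):
        """A second command interrupts: preserve the current state."""
        self.preserved.append(self.current)
        self.current = new_state

    def finish(self, policy, index=None, handoff_state=None):
        """Apply one of the three outcomes after the interrupting command."""
        if policy == "restore_prior":
            # (i) restore the state preserved at the latest interruption
            self.current = self.preserved.pop()
        elif policy == "restore_other":
            # (ii) restore a different preserved state, chosen by index
            if index is None:
                raise ValueError("restore_other requires an index")
            self.current = self.preserved.pop(index)
        elif policy == "handoff":
            # (iii) another application takes over; nothing is restored
            self.current = handoff_state
        else:
            raise ValueError(f"unknown policy: {policy}")
```

In this sketch a handoff leaves the earlier preserved states untouched, so a later command could still restore one of them.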
Specification