Process for automatic control of one or more devices by voice commands or by real-time voice dialog and apparatus for carrying out this process
First Claim
1. A process for the automatic control of one or several devices by speech command or by speech dialog in real-time operation, wherein:
entered speech commands are recognized by a speaker-independent compound-word speech recognizer and a speaker-dependent speech recognizer and are classified according to their recognition probability;
recognized, admissible speech commands are checked for their plausibility, and the admissible and plausible speech command with the highest recognition probability is identified as the entered speech command, and the functions assigned to this speech command of the device or devices or responses of the speech dialog system are initiated or generated;
a process wherein:
one of the speech commands and the speech dialogs is formed or controlled, respectively, on the basis of at least one syntax structure, at least one base command vocabulary and, if necessary, at least one speaker-specific additional command vocabulary;
the syntax structure and the base command vocabulary are provided in speaker-independent form and are fixed during real-time operation;
the speaker-specific additional command vocabulary or vocabularies are entered or changed by the respective speaker in that, during training phases during or outside of the real-time operation, the speech recognizer that operates on the basis of a speaker-dependent recognition method is trained by the respective speaker through single or multiple input of the additional commands for the speech-specific features of the respective speaker;
in the real-time operation, the speech dialog or the control of the device or devices takes place as follows:
speech commands spoken by the respective speaker are transmitted to a speaker-independent compound-word recognizer operating on the basis of phonemes or whole-word models and to the speaker-dependent speech recognizer, where they are respectively subjected to a feature extraction and are examined and classified in the compound-word speech recognizer with the aid of the features extracted there to determine the existence of base commands from the respective base command vocabulary according to the respectively specified syntax structure, and are examined and classified in the speaker-dependent speech recognizer with the aid of the features extracted there to determine the existence of additional commands from the respective additional command vocabulary;
the commands that have been classified as recognized with a certain probability and the syntax structures of the two speech recognizers are then joined to form hypothetical speech commands, and these are examined and classified according to the specified syntax structure as to their reliability and recognition probability;
the admissible hypothetical speech commands are subsequently examined as to their plausibility on the basis of predetermined criteria, and among the hypothetical speech commands recognized as plausible, the one with the highest recognition probability is selected and is identified as the speech command entered by the respective speaker;
subsequently, a function or functions assigned to the identified speech command of the respective device to be controlled are initiated, or a response or responses are generated in accordance with a specified speech dialog structure for continuing the speech dialog.
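The real-time steps of claim 1 (joining the outputs of the two recognizers, checking admissibility against the syntax structure, checking plausibility, and selecting the most probable remaining hypothesis) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `Hypothesis` type, the functions `join_hypotheses` and `select_command`, the example vocabulary, and in particular the choice to score a joined hypothesis by multiplying the two recognition probabilities. The claim does not say how the probabilities are combined.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Hypothesis:
    """A candidate command: its words and a recognition probability."""
    words: Tuple[str, ...]
    probability: float


def join_hypotheses(base: Sequence[Hypothesis],
                    additional: Sequence[Hypothesis]) -> List[Hypothesis]:
    """Join base-command and additional-command candidates.

    Assumption: a joined hypothesis is scored by the product of the two
    probabilities. Base commands may also stand on their own.
    """
    joined = list(base)
    for b, a in product(base, additional):
        joined.append(Hypothesis(b.words + a.words, b.probability * a.probability))
    return joined


def select_command(base: Sequence[Hypothesis],
                   additional: Sequence[Hypothesis],
                   is_admissible: Callable[[Hypothesis], bool],
                   is_plausible: Callable[[Hypothesis], bool]) -> Optional[Hypothesis]:
    """Return the admissible, plausible hypothesis with the highest probability."""
    candidates = [h for h in join_hypotheses(base, additional)
                  if is_admissible(h) and is_plausible(h)]
    return max(candidates, key=lambda h: h.probability, default=None)


# Hypothetical example data: a tiny syntax structure and a list of stored names.
SYNTAX = {("radio", "on"), ("call", "anna"), ("call", "hanna")}
STORED_NAMES = {"hanna"}


def is_admissible(h: Hypothesis) -> bool:
    # Admissible means the word sequence is allowed by the syntax structure.
    return h.words in SYNTAX


def is_plausible(h: Hypothesis) -> bool:
    # Example criterion: a "call" command needs a name with a stored number.
    return h.words[0] != "call" or h.words[-1] in STORED_NAMES


base = [Hypothesis(("call",), 0.9), Hypothesis(("radio", "on"), 0.4)]
additional = [Hypothesis(("anna",), 0.8), Hypothesis(("hanna",), 0.7)]
best = select_command(base, additional, is_admissible, is_plausible)
print(best.words if best else None)
```

In this example "call anna" has the highest joined score but fails the plausibility check, so "call hanna" is selected, which mirrors the order of the checks in the claim: admissibility first, then plausibility, then the highest recognition probability.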
Abstract
A speech dialog system in which a process for the automatic control of devices by speech dialog is used, applying methods of speech input, speech signal processing and speech recognition, syntactical-grammatical postediting, as well as dialog, executive sequencing and interface control. The system is characterized in that: syntax and command structures are fixed during real-time dialog operation; preprocessing, recognition and dialog control are designed for operation in a noise-encumbered environment; no user training is required for recognition of general commands; training of individual users is necessary for recognition of special commands; commands are input in linked form, with a variable number of words forming a command for speech input; the speech dialog is processed and executed in real time; and speech input and output take place in hands-free mode.
57 Claims
45. An apparatus for the automatic control of one or several devices by speech commands or by speech dialog in real-time operation, wherein the entered speech commands are recognized by a speaker-independent compound-word speech recognizer and a speaker-dependent speech recognizer and are classified according to their recognition probability, recognized, admissible speech commands are checked for their plausibility, the admissible and plausible speech command with the highest recognition probability is identified as the entered speech command, and the functions associated with the identified speech command for the device or devices, or the responses of the speech dialog system are initiated or generated;
with the apparatus including a voice input/output unit that is connected via a speech signal preprocessing unit with a speech recognition unit, which in turn is connected to a sequencing control, a dialog control, and an interface control, and wherein the speech recognition unit consists of a speaker-independent compound-word recognizer and a speaker-dependent additional speech recognizer, which are both connected on the output side with a unit for combined syntactical-grammatical or semantic postprocessing, with said unit being linked to the sequencing control, the dialog control, and the interface control.
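The unit connections recited in claim 45 can be pictured as a simple data flow: voice input, preprocessing, two recognizers in parallel, a shared postprocessing unit, and three controls. The sketch below is only one possible reading. All class, field, and function names are hypothetical, the units are stand-in callables rather than real recognizers, and forwarding the identified command to all three controls in turn is an assumption, since the claim states only that the units are linked.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Signal = Sequence[float]
Candidate = Tuple[str, float]  # (recognized text, recognition probability)
Recognizer = Callable[[Signal], List[Candidate]]
Control = Callable[[str], None]


@dataclass
class SpeechRecognitionUnit:
    """Holds the two recognizers; both see the same preprocessed signal."""
    speaker_independent: Recognizer
    speaker_dependent: Recognizer

    def recognize(self, signal: Signal) -> List[Candidate]:
        return self.speaker_independent(signal) + self.speaker_dependent(signal)


@dataclass
class Apparatus:
    """Wires preprocessing, recognition, postprocessing and the three controls."""
    preprocess: Callable[[Signal], Signal]
    recognition: SpeechRecognitionUnit
    postprocess: Callable[[List[Candidate]], Optional[str]]
    sequencing_control: Control
    dialog_control: Control
    interface_control: Control

    def handle(self, raw: Signal) -> Optional[str]:
        candidates = self.recognition.recognize(self.preprocess(raw))
        command = self.postprocess(candidates)
        if command is not None:
            # Assumption: every control is notified of the identified command.
            for control in (self.sequencing_control,
                            self.dialog_control,
                            self.interface_control):
                control(command)
        return command


# Stand-in units for illustration only; none of these perform real recognition.
log: List[Tuple[str, str]] = []
apparatus = Apparatus(
    preprocess=lambda s: [x * 0.5 for x in s],
    recognition=SpeechRecognitionUnit(
        speaker_independent=lambda s: [("radio on", 0.4)],
        speaker_dependent=lambda s: [("call hanna", 0.6)],
    ),
    postprocess=lambda cands: max(cands, key=lambda c: c[1])[0] if cands else None,
    sequencing_control=lambda c: log.append(("sequencing", c)),
    dialog_control=lambda c: log.append(("dialog", c)),
    interface_control=lambda c: log.append(("interface", c)),
)
command = apparatus.handle([0.1, 0.2])
print(command, log)
```

Passing each unit in as a callable keeps the wiring separate from the recognition methods themselves, so either recognizer could be replaced without touching the rest of the flow.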
Specification