Speech recognition accuracy in a multimodal input system
Abstract
A speech recognition method for use in a multimodal input system comprises receiving a multimodal input comprising digitized speech as a first modality input and data in at least one further modality input. Features in the speech and in the data in the at least one further modality are identified. The identified features in the speech and in the data are used to recognise words by comparing them with states in models for the words. The models have states for the recognition of speech and, for words that have features in at least one further modality associated with them, the models also have states for the recognition of events in the or each further modality.
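The claims describe each word model as an array of states whose dimensionality equals the number of modes used in recognition; for speech plus one gesture mode this is a two-dimensional grid of (speech state, gesture state) pairs. The following is a minimal sketch of how such a model could be scored with a Viterbi search over that N-dimensional grid. It is not the patentee's implementation and rests on several assumptions: observations are frame-synchronous with one scalar feature per mode per frame, each state emits from a hypothetical unit-variance 1-D Gaussian, and each mode moves left to right independently (stay or advance one state per frame). The function name `viterbi_nd` and the parameter `p_stay` are illustrative only.

```python
import itertools
import math

import numpy as np


def viterbi_nd(means, observations, p_stay=0.6):
    """Best path through an N-dimensional word model (hypothetical sketch).

    means: list of N sequences; means[m][k] is the Gaussian mean of state k in mode m.
    observations: array of shape (T, N); observations[t, m] is the mode-m feature at frame t.
    Returns (best log score, list of N-tuples of state indices, one per frame).
    """
    means = [np.asarray(mu, dtype=float) for mu in means]
    shape = tuple(len(mu) for mu in means)  # one array axis per mode
    n_modes = len(shape)
    T = observations.shape[0]
    log_stay, log_adv = math.log(p_stay), math.log(1.0 - p_stay)

    def emission(t):
        # Sum of per-mode Gaussian log-likelihoods, broadcast over the N-D state grid.
        total = np.zeros(shape)
        for m, mu in enumerate(means):
            ll = -0.5 * (observations[t, m] - mu) ** 2 - 0.5 * math.log(2 * math.pi)
            total = total + ll.reshape([-1 if d == m else 1 for d in range(n_modes)])
        return total

    start = (0,) * n_modes
    delta = np.full(shape, -np.inf)
    delta[start] = emission(0)[start]
    back = np.zeros((T,) + shape + (n_modes,), dtype=int)

    for t in range(1, T):
        new = np.full(shape, -np.inf)
        best_step = np.zeros(shape + (n_modes,), dtype=int)
        # Each mode independently stays (0) or advances (1): 2**N predecessor moves.
        for step in itertools.product((0, 1), repeat=n_modes):
            trans = sum(log_adv if s else log_stay for s in step)
            cand = np.full(shape, -np.inf)
            src = tuple(slice(0, n - s) for n, s in zip(shape, step))
            dst = tuple(slice(s, n) for n, s in zip(shape, step))
            cand[dst] = delta[src] + trans  # cand[k] = delta[k - step] + trans
            better = cand > new
            new[better] = cand[better]
            best_step[better] = step
        delta = new + emission(t)
        back[t] = best_step

    end = tuple(n - 1 for n in shape)
    score = float(delta[end])
    if not np.isfinite(score):
        return score, []  # too few frames to reach the final state in every mode
    path, k = [end], end
    for t in range(T - 1, 0, -1):
        step = back[t][k]
        k = tuple(int(a - b) for a, b in zip(k, step))
        path.append(k)
    path.reverse()
    return score, path
```

For example, a word with three speech states and a two-state gesture event could be aligned with `viterbi_nd([[0.0, 5.0, 10.0], [0.0, 3.0]], np.array([[0, 0], [0, 0], [5, 0], [10, 3], [10, 3]], dtype=float))`, which returns the path `[(0, 0), (0, 0), (1, 0), (2, 1), (2, 1)]`: the gesture event is aligned with the last speech state. Adding a mode adds one axis to the state array, so the grid size and the number of predecessor moves grow with the number of modes.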
77 Citations
31 Claims
1. A speech recognition method for use in a complementary multimodal input system, the method comprising the steps of:
receiving a complementary multimodal input comprising digitized speech as a first modality input and data in at least one further modality input;
identifying at least one feature in the speech and at least one feature in the data in said at least one further modality input; and
recognising words by comparing identified features in the speech and in the data with states in models for words, said models having states for the recognition of speech, and for words having at least one feature in said at least one further modality associated therewith, said models also having states for the recognition of events in said at least one further modality, wherein said models each comprise an array of states having a dimensionality equal to the number of modes in the received multimodal input that are used in recognising said words.
using the speech recognition method according to any preceding claim to generate recognised words as a first modality input; and
processing the recognised words and the further modality input or each further modality input in accordance with rules to generate an input for a process.
12. Speech recognition apparatus for use in a complementary multimodal input system, the apparatus comprising:
receiving means for receiving a complementary multimodal input comprising digitized speech as a first modality input and data in at least one further modality input;
identifying means for identifying at least one feature in the speech and at least one feature in the data in said at least one further modality input; and
recognition means for recognising words by comparing identified features in the speech and in the data with states in models for words, said models having states for the recognition of speech, and for words having at least one feature in said at least one further modality associated therewith, said models also having states for the recognition of events in said at least one further modality, wherein said recognition means is adapted to use said models each comprising an array of states having a dimensionality equal to the number of modes in the received multimodal input that are used in recognising said words.
speech input means for inputting speech as the first modality input;
speech digitizing means for digitizing the input speech;
further modality input means for inputting the data in the at least one further modality;
the speech recognition apparatus according to any one of claims 12 to 20 for generating recognised words using the digitized speech and the data in the at least one further modality; and
processing means for processing the recognised words and the further modality input or each further modality input in accordance with rules to generate an input for a process.
22. A processing system for implementing a process, the system comprising:
the multimodal input system according to claim 21 for generating an input; and
processing means for processing the generated input.
23. A method of recognising speech using multimodal input data, the method comprising the steps of:
receiving multimodal data comprising speech data and data in at least one further modality; and
recognising words by comparing features in the speech data and features in the further modality data with word models having states for the recognition of speech, wherein said word models each comprise an array of states each having a dimensionality equal to the number of modes in the received multimodal data.
25. A speech recognition apparatus, comprising:
a receiver operable to receive multimodal data comprising speech data and data in at least one further modality; and
a recogniser operable to recognise words by comparing features in the speech data and features in the further modality data with states in word models having states for the recognition of speech, each word model comprising an array of states having a dimensionality equal to the number of modes in the received multimodal input.
27. A speech recognition apparatus for use in a complementary multimodal input system, the apparatus comprising:
a receiver operable to receive a complementary multimodal input comprising digitized speech as a first modality input and data in at least one further modality input;
an identifier operable to identify at least one feature in the speech and at least one feature in the data in said at least one further modality input; and
a recogniser operable to recognise words by comparing identified features in the speech and in the data with states in models for words, said models having states for the recognition of speech, and for words having at least one feature in said at least one further modality associated therewith, the models also having states for the recognition of events in said at least one further modality, wherein said models each comprise an array of states having a dimensionality equal to the number of modes in the received multimodal input that are used in recognising said words.
28. A method of recognising data using multimodal input data, the method comprising the steps of:
receiving multimodal data comprising data input using a plurality of modalities; and
recognising words by comparing features in the data input using the plurality of modalities with models representing words, the models having states for the recognition of data, wherein said models each comprise an array of states each having a dimensionality equal to the number of modes in the received multimodal data.
29. An apparatus for recognising data using multimodal input data, the apparatus comprising:
receiving means for receiving multimodal data comprising data input using a plurality of modalities; and
recognising means for recognising words by comparing features in the data input using the plurality of modalities with models representing words, the models having states for the recognition of data, wherein said models each comprise an array of states each having a dimensionality equal to the number of modes in the received multimodal data.
Specification