Speech recognition accuracy in a multimodal input system

US 6,823,308 B2
Filed: 02/16/2001
Issued: 11/23/2004
Est. Priority Date: 02/18/2000
Status: Expired due to Fees

Abstract

A speech recognition method for use in a multimodal input system comprises receiving a multimodal input comprising digitized speech as a first modality input and data in at least one further modality input. Features in the speech and in the data in at least one further modality are identified. The identified features in the speech and in the data are used in the recognition of words by comparing the identified features with states in models for the words. The models have states for the recognition of speech and for words having features in at least one further modality associated with the words; the models also have states for the recognition of events in the further modality or each further modality.
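
As a rough illustration of the model structure the abstract describes, the sketch below represents a word model as a grid of states with one axis per modality, so that the dimensionality of the grid equals the number of modes used to recognise the word. This is only a sketch under stated assumptions: the class name `WordModel`, the `shape` field, and the example state counts are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class WordModel:
    """Hypothetical word model: a grid of states, one axis per modality."""
    word: str
    shape: tuple  # number of states along each modality axis, speech first

    @property
    def dimensionality(self) -> int:
        # One dimension per modality used in recognising the word.
        return len(self.shape)

    def states(self):
        # Each state is a coordinate tuple, e.g. (speech_state, pointing_events_seen).
        return list(product(*(range(n) for n in self.shape)))

    @property
    def final_state(self):
        # Assumed final state: the last position along every axis.
        return tuple(n - 1 for n in self.shape)


# Assumption: "this" is spoken together with one pointing event, so it gets
# 3 speech states x (0 or 1 pointing events seen), a 2-D array of states.
this_model = WordModel("this", shape=(3, 2))

# Assumption: "the" has no associated pointing event, so only the speech axis is used.
the_model = WordModel("the", shape=(3,))
```

In this layout a word that has a feature in a further modality associated with it simply gains an extra axis, which is one way to read the phrase "an array of states having a dimensionality equal to the number of modes".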

77 Citations

31 Claims

1. A speech recognition method for use in a complementary multimodal input system, the method comprising the steps of:
- receiving a complementary multimodal input comprising digitized speech as a first modality input and data in at least one further modality input;
  
  identifying at least one feature in the speech and at least one feature in the data in said at least one further modality input; and
  
  recognising words by comparing identified features in the speech and in the data with states in models for words, said models having states for the recognition of speech, and for words having at least one feature in said at least one further modality associated therewith, said models also having states for the recognition of events in said at least one further modality, wherein said models each comprise an array of states having a dimensionality equal to the number of modes in the received multimodal input that are used in recognising said words.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11):
  - 2. A speech recognition method according to claim 1, wherein said models of words are organised in a net of words in accordance with grammar rules.
  - 3. A speech recognition method according to claim 1, wherein said data in said at least one further modality input comprises data identifying events.
  - 4. A speech recognition method according to claim 1, wherein the words are recognised by sequentially comparing identified features in the speech with the states in a first dimension and also comparing identified features in the further modality input or each further modality input in the further dimension or each respective further dimension to try to reach a final state.
  - 5. A speech recognition method according to claim 1, wherein the states in the models for the recognition of speech comprise states of Hidden Markov models.
  - 6. A speech recognition method according to claim 1, wherein said identified features define events in the further modality input or each further modality input.
  - 7. A speech recognition method according to claim 6, wherein said events comprise pointing events comprising one or more actions.
  - 8. A speech recognition method according to claim 1, wherein said states have probabilities associated therewith and the recognition step comprises comparing the identified features with the states to determine a word with the highest probability at a final state.
  - 9. Program code for controlling a processor to implement the method of claim 1.
  - 10. A carrier medium carrying the program code according to claim 9.
  - 11. A multimodal input method comprising:

12. Speech recognition apparatus for use in a complementary multimodal input system, the apparatus comprising:
- receiving means for receiving a complementary multimodal input comprising digitized speech as a first modality input and data in at least one further modality input;
  
  identifying means for identifying at least one feature in the speech and at least one feature in the data in said at least one further modality input; and
  
  recognition means for recognising words by comparing identified features in the speech and in the data with states in models for words, said models having states for the recognition of speech, and for words having at least one feature in said at least one further modality associated therewith, said models also having states for the recognition of events in said at least one further modality, wherein said recognition means is adapted to use said models each comprising an array of states having a dimensionality equal to the number of modes in the received multimodal input that are used in recognising said words.
- Dependent claims (13, 14, 15, 16, 17, 18, 19, 20, 21, 22):
  - 13. Speech recognition apparatus according to claim 12, including storage means for storing said models.
  - 14. Speech recognition apparatus according to claim 12, wherein said recognition means is adapted to use said models organised in a net of words in accordance with grammar rules.
  - 15. Speech recognition apparatus according to claim 12, wherein said receiving means is adapted to receive said data in said at least one further modality input comprising data identifying events.
  - 16. Speech recognition apparatus according to claim 12, wherein said recognition means is adapted to recognise the words by sequentially comparing identified features in the speech with the states in a first dimension and also by comparing identified features in the further modality input or each further modality input in the further dimension or each respective further dimension to try to reach a final state.
  - 17. Speech recognition apparatus according to claim 12, wherein said recognition means is adapted to use states of Hidden Markov models as the states in the models for the recognition of speech.
  - 18. Speech recognition apparatus according to claim 12, wherein said identifying means is adapted to identify said features defining events in the further modality input or each further modality input.
  - 19. Speech recognition apparatus according to claim 18, wherein said events comprise pointing events comprising one or more actions.
  - 20. Speech recognition apparatus according to claim 12, wherein said recognition means is adapted to use said models wherein said states have probabilities associated therewith, and to compare the identified features with the states to determine a word with the highest probability at a final state.
  - 21. A multimodal input system comprising:
22. A processing system for implementing a process, the system comprising:
- the multimodal input system according to claim 21 for generating an input; and
  
  processing means for processing the generated input.

23. A method of recognising speech using multimodal input data, the method comprising the steps of:
- receiving multimodal data comprising speech data and data in at least further modality; and
  
  recognising words by comparing features in the speech data and features in the further modality data with word models having states for the recognition of speech, wherein said word models each comprise an array of states each having a dimensionality equal to the number of modes in the received multimodal data.
- Dependent claims (24):
  - 24. A speech recognition method according to claim 23, wherein the step of recognising words comprises sequentially comparing features in the speech data with the states in a first dimension and comparing features in the data of the further modality or in each further modality with states in the further dimension or each respective further dimension to try to reach a final state.

25. A speech recognition apparatus, comprising:
- a receiver operable to receive multimodal data comprising speech data and data in at least one further modality; and
  
  a recogniser operable to recognise words by comparing features in the speech data and features in the further modality data with states in word models having states for the recognition of speech, each word model comprising an array of states having a dimensionality equal to the number of modes in the received multimodal input.
- Dependent claims (26):
  - 26. A speech recognition apparatus according to claim 25, wherein said recogniser means is adapted to recognise words by sequentially comparing features in the speech data with the states in a first dimension and by comparing features in the data in the further modality or each further modality with states in the further dimension or each respective further dimension to try to reach a final state.

27. A speech recognition apparatus for use in a complementary multimodal input system, the apparatus comprising:
- a receiver operable to receive a complementary multimodal input comprising digitized speech as a first modality input and data in at least one further modality input;
  
  an identifier operable to identify at least one feature in the speech and at least one feature in the data in said at least one further modality input; and
  
  a recogniser operable to recognise words by comparing identified features in the speech and in the data with states in models for words, said models having states for the recognition of speech, and for words having at least one feature in said at least one further modality associated therewith, the models also having states for the recognition of events in said at least one further modality, wherein said models each comprise an array of states having a dimensionality equal to the number of modes in the received multimodal input that are used in recognising said words.

28. A method of recognising data using multimodal input data, the method comprising the steps of:
- receiving multimodal data comprising data input using a plurality of modalities; and
  
  recognising words by comparing features in the data input using the plurality of modalities with models representing words, the models having states for the recognition of data, wherein said models each comprise an array of states each having a dimensionality equal to the number of modes in the received multimodal data.
- Dependent claims (30, 31):
  - 30. Program code for controlling a processor to implement the method of claim 28.
  - 31. A carrier medium carrying the program code according to claim 30.

29. An apparatus for recognising data using multimodal input data, the apparatus comprising:
- receiving means for receiving multimodal data comprising data input using a plurality of modalities; and
  
  recognising means for recognising words by comparing features in the data input using the plurality of modalities with models representing words, the models having states for the recognition of data, wherein said models each comprise an array of states each having a dimensionality equal to the number of modes in the received multimodal data.
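
The recognition step that several claims describe (sequentially comparing speech features with states in a first dimension and further-modality features with states in a further dimension, then taking the word with the highest probability at a final state) can be sketched as a small best-path search over a two-dimensional state grid. This is a minimal sketch under stated assumptions, not the patented implementation: the Viterbi-style search is used here as a stand-in, and the dictionary layout (`emit`, `stay`, `clicks`, `p_click`), the feature labels, and all probabilities are hypothetical.

```python
def score_word(model, observations):
    """Probability of the best path that reaches the model's final state.

    Hypothetical model layout:
      emit[s][feature] : P(feature | speech state s)
      stay             : self-loop probability of a speech state
      clicks           : number of pointing events the word expects
      p_click          : probability assigned to each expected pointing event
    """
    emit, stay = model["emit"], model["stay"]
    last_speech, clicks = len(emit) - 1, model["clicks"]
    # paths[(s, c)] = probability of the best path ending in grid state (s, c);
    # s == -1 means no speech frame has been consumed yet.
    paths = {(-1, 0): 1.0}
    for mode, value in observations:
        nxt = {}
        for (s, c), p in paths.items():
            if mode == "speech":
                # First dimension: self-loop or advance by one speech state.
                moves = []
                if s >= 0:
                    moves.append((s, stay))
                if s < last_speech:
                    moves.append((s + 1, 1.0 if s < 0 else 1.0 - stay))
                for s2, p_move in moves:
                    q = p * p_move * emit[s2].get(value, 0.0)
                    if q > nxt.get((s2, c), 0.0):
                        nxt[(s2, c)] = q
            elif c < clicks:
                # Second dimension: one step per expected pointing event.
                q = p * model["p_click"]
                if q > nxt.get((s, c + 1), 0.0):
                    nxt[(s, c + 1)] = q
            # An unexpected pointing event (c == clicks) ends the path.
        paths = nxt
    return paths.get((last_speech, clicks), 0.0)


def recognise(models, observations):
    """Return the word whose final state has the highest probability."""
    scores = {word: score_word(m, observations) for word, m in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical models: "this" expects one pointing event, "his" expects none.
models = {
    "this": {"emit": [{"th": 0.9}, {"is": 0.9}], "stay": 0.5,
             "clicks": 1, "p_click": 0.8},
    "his": {"emit": [{"th": 0.4, "h": 0.6}, {"is": 0.9}], "stay": 0.5,
            "clicks": 0, "p_click": 0.0},
}

with_click = [("speech", "th"), ("point", "click"), ("speech", "is")]
without_click = [("speech", "th"), ("speech", "is")]

word_a, scores_a = recognise(models, with_click)
word_b, scores_b = recognise(models, without_click)
```

With these made-up numbers, the pointing event is what separates the two acoustically similar words: `word_a` is `"this"` because only that model can absorb the click and still reach its final state, while `word_b` is `"his"` because the `"this"` model never reaches the state that requires a click.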

Current Assignee: Canon Kabushiki Kaisha (Canon Inc.)
Original Assignee: Canon Kabushiki Kaisha (Canon Inc.)
Inventors: Fortescue, Nicolas David; Keiller, Robert Alexander
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Nolan, Daniel A
Application Number: US09/783,971
Publication Number: US 20010037201A1
Time in Patent Office: 1,376 Days
Field of Search: 704/200, 704/220, 704/240, 704/256, 704/9, 704/232, 707/3
US Class Current: 704/256
CPC Class Codes:
G10L 15/24 Speech recognition using no...
G10L 15/32 Multiple recognisers used i...
