Multimodal information inputting method and apparatus for embodying the same
First Claim
1. A multimodal information inputting apparatus comprising:
display means;
object-voice correspondence acquiring means for recognizing a pointed object based on both a movement of a cursor on said display means depending upon operation of said pointing device and a voice produced in parallel to said operation when a pointing device is operated to select objects being displayed on said display means; and
command generating means for generating a command for an application program based on correspondence information between said object and said voice obtained by said object-voice correspondence acquiring means, wherein said object-voice correspondence acquiring means includes, spoken language recognizing means for recognizing a language included in said voice based on voice information associated with said voice produced and recognizing a starting time and an ending time of said language;
reference recognizing means for recognizing objects as referent candidates for said voice containing a demonstrative word based on operation information associated with operation of said pointing device; and
merging means for retrieving an object corresponding to said voice containing said demonstrative word from said referent candidates, and merging information associated with said object with information associated with said voice corresponding to said object, and wherein said referent recognizing means selects said object as said referent candidate for said voice containing said demonstrative word if a moving speed of a cursor in a region of said object has a local minimum value which is less than a predetermined speed.
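The candidate-selection condition at the end of the claim (a local minimum of cursor speed inside an object's region, below a predetermined speed) can be illustrated with a minimal sketch. It is not the patented implementation: the names `Sample`, `Region`, `referent_candidates`, and `max_speed`, the rectangular regions, and the finite-difference speed estimate are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Sample:
    """One cursor sample (hypothetical structure): time in seconds, position in pixels."""
    t: float
    x: float
    y: float


@dataclass
class Region:
    """Axis-aligned rectangle standing in for a displayed object (an assumption)."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def referent_candidates(samples, regions, max_speed):
    """Return names of regions in which the cursor speed has a local minimum
    that is below max_speed. Speed is estimated per segment between
    consecutive samples and attributed to the segment's end point."""
    segments = []
    for a, b in zip(samples, samples[1:]):
        dt = b.t - a.t
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        segments.append((b, hypot(b.x - a.x, b.y - a.y) / dt))

    found = []
    for i in range(1, len(segments) - 1):
        point, speed = segments[i]
        # Local minimum: slower than the previous segment, not faster than the next.
        is_local_min = segments[i - 1][1] > speed <= segments[i + 1][1]
        if is_local_min and speed < max_speed:
            for region in regions:
                if region.contains(point.x, point.y) and region.name not in found:
                    found.append(region.name)
    return found


# The cursor slows down inside region "A" and sweeps quickly through region "B".
xs = [0, 100, 180, 185, 260, 400, 540]
demo_samples = [Sample(t=i * 0.1, x=x, y=50) for i, x in enumerate(xs)]
demo_regions = [Region("A", 150, 0, 220, 100), Region("B", 350, 0, 450, 100)]
print(referent_candidates(demo_samples, demo_regions, max_speed=100.0))
```

In this made-up trace only "A" qualifies: the cursor also crosses "B", but never slows below the threshold there.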
Abstract
A command for an application program is generated based on both the movement of a cursor on a display unit in response to operation of a pointing device and a voice produced in parallel with that operation, when the pointing device is operated to select an object displayed on the display unit connected to a computer. In particular, if the moving speed of the cursor within the region of an object has a local minimum value that is less than a predetermined speed, the object is selected as a referent candidate for the voice containing a demonstrative word. If a plurality of referent candidates, each having a local minimum value less than the predetermined speed, are present for the voice containing the demonstrative word, the candidate for which the time period during which the cursor moves within its region overlaps most with the time period during which the voice is produced is recognized as the referent for the voice.
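The disambiguation rule in the abstract (choose the candidate whose cursor dwell period overlaps most with the period of the utterance) can be sketched as follows. This is an illustration under stated assumptions, not the patented method: `Candidate`, `overlap`, and `pick_referent` are hypothetical names, and returning the first candidate on a tie is a choice the source does not specify.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    """A referent candidate (hypothetical structure): object name plus the
    times, in seconds, at which the cursor entered and left its region."""
    name: str
    enter: float
    leave: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals, or 0 if they are disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def pick_referent(candidates: Sequence[Candidate],
                  voice_start: float,
                  voice_end: float) -> Optional[str]:
    """Return the name of the candidate whose dwell interval overlaps most
    with the utterance interval. Ties go to the earliest listed candidate
    (an assumption); an empty candidate list yields None."""
    best = max(
        candidates,
        key=lambda c: overlap(c.enter, c.leave, voice_start, voice_end),
        default=None,
    )
    return best.name if best is not None else None


# Two candidates; the demonstrative word is spoken from 0.8 s to 1.6 s.
demo = [Candidate("folder", 0.2, 0.9), Candidate("file", 1.0, 2.0)]
print(pick_referent(demo, voice_start=0.8, voice_end=1.6))
```

Here "folder" overlaps the utterance for about 0.1 s and "file" for about 0.6 s, so "file" is selected.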
84 Citations
9 Claims
3. A multimodal information inputting method comprising the step of:
generating a command for an application program based on both a movement of a cursor on a display means in compliance with operation of a pointing device and a voice produced in parallel to said operation when said pointing device is operated to select an object being displayed on said display means which is connected to a computer, wherein said object is selected as a referent candidate for said voice containing a demonstrative word if a moving speed of said cursor in a region of said object has a local minimum value and said local minimum value is less than a predetermined speed.
Specification