Multimodal information inputting method and apparatus for embodying the same

US 5,781,179 A
Filed: 09/05/1996
Issued: 07/14/1998
Est. Priority Date: 09/08/1995
Status: Expired due to Fees

Abstract

A command for an application program is generated based on both the movement of a cursor on a display unit, which follows the operation of a pointing device, and a voice produced in parallel with that operation, when the pointing device is operated to select an object displayed on the display unit connected to a computer. In particular, if the moving speed of the cursor within the region of an object has a local minimum value that is less than a predetermined speed, the object is selected as a referent candidate for the voice containing a demonstrative word. If a plurality of referent candidates, each having a local minimum value less than the predetermined speed, are present for the voice containing the demonstrative word, the candidate whose time period of cursor movement within its region overlaps most with the time period during which the voice is produced is recognized as the referent for the voice.
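The selection rule described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Visit` record, the sampled `speeds` list, the function names, and the numeric threshold are all assumptions made for illustration, and a local minimum is approximated here as an interior sample that is lower than its predecessor and no higher than its successor.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Visit:
    """One pass of the cursor through an object's region (hypothetical record)."""
    obj: str
    enter: float         # time the cursor entered the region
    leave: float         # time the cursor left the region
    speeds: List[float]  # cursor speed samples taken inside the region


def min_local_minimum(speeds: List[float]) -> Optional[float]:
    """Return the smallest interior local minimum of the samples, or None."""
    minima = [
        speeds[i]
        for i in range(1, len(speeds) - 1)
        if speeds[i] < speeds[i - 1] and speeds[i] <= speeds[i + 1]
    ]
    return min(minima) if minima else None


def is_candidate(visit: Visit, threshold: float) -> bool:
    """An object is a referent candidate if the cursor speed in its region
    has a local minimum below the predetermined speed (threshold)."""
    m = min_local_minimum(visit.speeds)
    return m is not None and m < threshold


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def pick_referent(visits: List[Visit], voice_start: float, voice_end: float,
                  threshold: float) -> Optional[str]:
    """Among the candidates, pick the one whose region-visit interval
    overlaps most with the interval during which the voice was produced."""
    candidates = [v for v in visits if is_candidate(v, threshold)]
    if not candidates:
        return None
    best = max(
        candidates,
        key=lambda v: overlap(v.enter, v.leave, voice_start, voice_end),
    )
    return best.obj


if __name__ == "__main__":
    visits = [
        Visit("A", 0.0, 1.0, [300, 120, 40, 150]),
        Visit("B", 1.0, 2.5, [200, 90, 20, 80, 210]),
        Visit("C", 2.5, 3.0, [400, 350, 380]),  # slows down, but not enough
    ]
    print(pick_referent(visits, voice_start=0.8, voice_end=2.2, threshold=50.0))
```

In this sketch a cursor that merely passes through a region without slowing (a monotonic speed profile) produces no local minimum and is never treated as a candidate, which is one reading of how the speed criterion filters out incidental crossings.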

84 Citations

9 Claims

1. A multimodal information inputting apparatus comprising:
- display means;
  
  object-voice correspondence acquiring means for recognizing a pointed object based on both a movement of a cursor on said display means depending upon operation of said pointing device and a voice produced in parallel to said operation when a pointing device is operated to select objects being displayed on said display means; and
  
  command generating means for generating a command for an application program based on correspondence information between said object and said voice obtained by said object-voice correspondence acquiring means,
  
  wherein said object-voice correspondence acquiring means includes,
  
  spoken language recognizing means for recognizing a language included in said voice based on voice information associated with said voice produced and recognizing a starting time and an ending time of said language;
  
  reference recognizing means for recognizing objects as referent candidates for said voice containing a demonstrative word based on operation information associated with operation of said pointing device; and
  
  merging means for retrieving an object corresponding to said voice containing said demonstrative word from said referent candidates, and merging information associated with said object with information associated with said voice corresponding to said object, and
  
  wherein said referent recognizing means selects said object as said referent candidate for said voice containing said demonstrative word if a moving speed of a cursor in a region of said object has a local minimum value which is less than a predetermined speed.
- - 2. A multimodal information inputting apparatus recited in claim 1, wherein said merging means recognizes such a certain referent candidate as a referent for said voice that a time period during when said cursor moves in said region is overlapped at maximum with a time period during when said voice is produced if a plurality of referent candidates each having said local minimum value less than said predetermined speed are present for said voice containing said demonstrative word.

3. A multimodal information inputting method comprising the step of:
- generating a command for an application program based on both a movement of a cursor on a display means in compliance with operation of a pointing device and a voice produced in parallel to said operation when said pointing device is operated to select an object being displayed on said display means which is connected to a computer, wherein said object is selected as a referent candidate for said voice containing a demonstrative word if a moving speed of said cursor in a region of said object has a local minimum value and said local minimum value is being less than a predetermined speed.
- - 4. A multimodal information inputting method recited in claim 3, wherein such a certain referent candidate is recognized as a referent for said voice that a time period during when said cursor moves in said region is overlapped at maximum with a time period during when said voice is produced if a plurality of referent candidates each having said local minimum value less than said predetermined speed are present for said voice containing said demonstrative word.
  - 5. A multimodal information inputting method recited in claim 4, wherein information associated with said voice produced are stored in a queue in sequence and information associated with said object as said referent candidate are also stored in said queue in sequence, and said information associated with said voice is collated with said information associated with said voice containing said demonstrative word from the head of said queue to recognize said object serving as said referent.
  - 6. A multimodal information inputting method recited in claim 5, wherein said information associated with said voice produced is composed of a language as the result of language recognition, the number of object indicated by said language, a starting time of voiced sound area, and an ending time of voiced sound area.
  - 7. A multimodal information inputting method recited in claim 5, wherein said information associated with said object as said referent candidate is composed of a referent candidate, a region entering time of referent candidate, and a region leaving time of said referent candidate.
  - 8. A multimodal information inputting method recited in claim 4, wherein a merged result is derived by correlating information associated with said voice produced with information associated with said object recognized as a referent which can be correlated with said voice.
  - 9. A multimodal information inputting method recited in claim 8, wherein said merged result is stored in a stack unless a language associated with a command is included in said merged results, and a command for an application program is generated by means of concerned merged result as well as one or more merged results stored in said stack if said language associated with said command is included in said merged result.
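The queue and stack handling recited in claims 5 through 9 can be pictured with a short Python sketch. This is an illustrative reading only, not the patented implementation: the record fields follow claims 6 and 7, but the class names, the use of two separate queues, the `COMMAND_WORDS` set, and the shape of the generated command are assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass
class VoiceInfo:
    """Fields listed in claim 6."""
    language: str     # result of language recognition
    num_objects: int  # number of objects indicated by the language
    start: float      # starting time of the voiced sound area
    end: float        # ending time of the voiced sound area


@dataclass
class CandidateInfo:
    """Fields listed in claim 7."""
    obj: str      # referent candidate
    enter: float  # region entering time
    leave: float  # region leaving time


@dataclass
class Merged:
    """A voice item correlated with its referent(s), as in claim 8."""
    language: str
    objects: List[str]


COMMAND_WORDS = {"move", "delete"}  # hypothetical command vocabulary


def overlap(v: VoiceInfo, c: CandidateInfo) -> float:
    """Time overlap between a voiced area and a region visit."""
    return max(0.0, min(v.end, c.leave) - max(v.start, c.enter))


def merge(voices: Deque[VoiceInfo],
          candidates: Deque[CandidateInfo]) -> List[Merged]:
    """Collate voice items from the head of the queue with candidates,
    taking the candidate with maximum time overlap for each referent."""
    results = []
    while voices:
        v = voices.popleft()
        picked = []
        for _ in range(v.num_objects):
            best = max(candidates, key=lambda c: overlap(v, c), default=None)
            if best is None or overlap(v, best) <= 0.0:
                break
            picked.append(best.obj)
            candidates.remove(best)
        results.append(Merged(v.language, picked))
    return results


def generate_commands(merged: List[Merged]) -> List[Tuple[str, List[str]]]:
    """Stack merged results until one contains a command word, then emit
    a command built from that result plus the stacked ones (claim 9)."""
    stack: List[Merged] = []
    commands = []
    for m in merged:
        if m.language in COMMAND_WORDS:
            args = [o for prev in stack for o in prev.objects] + m.objects
            commands.append((m.language, args))
            stack.clear()
        else:
            stack.append(m)
    return commands


if __name__ == "__main__":
    voices = deque([
        VoiceInfo("this", 1, 0.0, 0.5),
        VoiceInfo("here", 1, 1.0, 1.4),
        VoiceInfo("move", 0, 1.5, 1.8),
    ])
    candidates = deque([
        CandidateInfo("obj1", 0.1, 0.6),
        CandidateInfo("obj2", 0.9, 1.5),
    ])
    print(generate_commands(merge(voices, candidates)))
```

The stack suits utterances in which the command word comes last: demonstratives such as "this" and "here" are merged with their referents and held until the command word arrives, at which point they become its arguments.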

Current Assignee: Nippon Telegraph and Telephone Corporation
Original Assignee: Nippon Telegraph and Telephone Corporation
Inventors: Nakajima, Hideharu; Kato, Tsuneaki
Primary Examiner(s): Liang, Regina
Application Number: US08/711,694
Time in Patent Office: 677 Days
Field of Search: 340/825.19, 345/145, 345/157, 345/163, 345/159, 345/158, 395/2.44, 395/2.55, 395/2.56, 395/2.57, 395/2.58, 395/2.59, 704/211, 704/246, 704/251, 704/253, 704/257
US Class Current: 345/157
CPC Class Codes:
G06F 2203/0381   Multimodal input, i.e. inte...
G06F 3/038   Control and interface arran...
G06F 3/167   Audio in a user interface, ...
G06F 9/451   Execution arrangements for ...
G10L 2015/223   Execution procedure of a sp...
G10L 2015/228   of application context
