Multi-modal picture

US 20030112267A1
Filed: 12/06/2002
Published: 06/19/2003
Est. Priority Date: 12/13/2001
Status: Abandoned Application

Abstract

A system for presenting a multi-modal picture includes picture presentation equipment for displaying an image of the picture and for enabling a user to interact with the picture by selecting a particular picture feature and asking a specific query relating to the feature. A voice browser system, controlled according to dialog scripts associated with the picture, determines an appropriate response having regard to the spoken user query and the selected picture feature. Each picture can have multiple narrators associated with it, and the user can choose which narrator is currently active. Picture authoring apparatus is also provided.
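
The core idea of the abstract, a response chosen from the selected picture feature together with the spoken query, can be pictured as a keyed lookup. The following is a minimal sketch only, not the dialog-script mechanism of the application: the feature names, query identifiers, and placeholder response texts are hypothetical, and the query is assumed to have already been recognised from speech.

```python
from typing import Dict, Optional, Tuple

# Hypothetical response table keyed by (picture feature, recognised query).
RESPONSES: Dict[Tuple[str, str], str] = {
    ("tower", "how_old"): "Placeholder answer about the tower's age.",
    ("tower", "who_built"): "Placeholder answer about the tower's builder.",
    ("bridge", "how_old"): "Placeholder answer about the bridge's age.",
}


def choose_response(selected_feature: Optional[str], query: str) -> Optional[str]:
    """Return the stored response for the selected feature and query, if any."""
    if selected_feature is None:
        return None  # no feature selected, so no feature-specific answer
    return RESPONSES.get((selected_feature, query))
```

Keying on the pair rather than on the query alone is what lets the same spoken question yield different answers for different parts of the picture.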

34 Claims

1. A system for presenting information concerning a picture to a user, the system comprising:
- a data store for holding responses, specific to said picture, in respect of specific user queries concerning particular picture features;
  
  a manually-operable feature-selection arrangement for enabling a user to select a feature in a displayed view of the picture, and for providing an output indication regarding what said particular feature, if any, the user has thereby selected;
  
  a voice dialog input-output subsystem including a speech recogniser for interpreting queries from a user;
  
  a control arrangement responsive to a user selecting a said particular feature and asking a specific query regarding that feature, to output the corresponding stored response.
- Dependent claims (2 to 19):
  - 2. A system according to claim 1, wherein image-map data is associated with the picture image for mapping image coordinates to said particular features, the selection arrangement being arranged to use the image-map data to determine what picture feature is selected by the user.
  - 3. A system according to claim 1, wherein said image includes label data positioned in the region of a said particular picture feature to indicate the identity of that feature, the selection arrangement being arranged to read the label data to determine what picture feature is selected by the user.
  - 4. A system according to claim 1, further comprising a display subsystem for displaying said image provided to it in the form of digital image data.
  - 5. A system according to claim 1, wherein the picture image is a hard-copy image.
  - 6. A system according to claim 1, wherein the data store is arranged also to hold responses concerning general queries that are not associated with any particular picture feature, the control arrangement being arranged to respond to the user voice input of a general query by returning the appropriate response.
  - 7. A system according to claim 1, wherein the control arrangement comprises processing means for processing decision logic code associated with the picture.
  - 8. A system according to claim 7, wherein said decision logic code and said responses are included in a common file.
  - 9. A system according to claim 7, wherein the control arrangement comprises a dialog manager of a multi-modal voice browser, the voice browser including said voice dialog input-output subsystem.
  - 10. A system according to claim 9, wherein the selection arrangement is arranged to provide key,value pairs to said voice browser to indicate when a user has selected a said particular feature.
  - 11. A system according to claim 1, wherein the voice interface subsystem and control arrangement are arranged to cooperate in recognising multiple different queries in respect of a particular picture feature selected by the user.
  - 12. A system according to claim 1, wherein each of at least some of the responses is associated with a specified narrator, the system being arranged to permit a user to receive only the response of a user-selected narrator in respect of at least one query.
  - 13. A system according to claim 12, wherein user selection of a narrator is arranged to be effected by voice input, the control arrangement being arranged to respond to the selection of a particular specified narrator by user voice input, by using only the said responses associated with that narrator in providing a response to a user query.
  - 14. A system according to claim 13, further comprising means for displaying, along with said picture image, identifiers of narrators associated with the picture.
  - 15. A system according to claim 13, further comprising a display subsystem for displaying said image provided to it in the form of digital image data and identifiers of narrators associated with the picture;
    - the display subsystem being arranged to respond to selection of a particular specified narrator by user voice input by indicating on the displayed image the said particular features for which responses are available concerning that narrator.
  - 16. A system according to claim 12, further comprising a display subsystem for displaying said image provided to it in the form of digital image data, and identifiers of narrators associated with the picture;
    - user selection of a narrator being arranged to be effected by the user using said selection arrangement to select a displayed narrator identifier, and the control arrangement being arranged to respond to the selection of a particular narrator by using only the said responses associated with that narrator in providing a response to a user query.
  - 17. A system according to claim 16, wherein the display subsystem is arranged to respond to selection of a particular specified narrator to indicate on the displayed image the said particular features for which responses are available concerning the currently-selected narrator.
  - 18. A system according to claim 1, wherein said selection arrangement is a pointing arrangement usable by the user to point to a feature of interest in a displayed view of the picture.
  - 19. A system according to claim 16, wherein said selection arrangement is a pointing arrangement usable by the user to point to a feature of interest in a displayed view of the picture.
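
Claims 2 and 10 together suggest a selection arrangement that maps image coordinates to a feature and reports the result as a key-value pair. A minimal sketch of that idea follows, assuming rectangular hotspots; the `Hotspot` class, the `selected_feature` key, and the sample coordinates are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Hotspot:
    """A rectangular image region tied to a picture feature (assumed shape)."""
    feature: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        # Right and bottom edges are treated as exclusive.
        return self.left <= x < self.right and self.top <= y < self.bottom


def feature_at(image_map: List[Hotspot], x: int, y: int) -> Optional[str]:
    """Return the first feature whose hotspot contains (x, y), or None."""
    for spot in image_map:
        if spot.contains(x, y):
            return spot.feature
    return None


def selection_event(image_map: List[Hotspot], x: int, y: int) -> Tuple[str, str]:
    """Report the selection as a key-value pair; an empty value means no feature."""
    feature = feature_at(image_map, x, y)
    return ("selected_feature", feature or "")


# Hypothetical image map for a picture with two features.
IMAGE_MAP = [Hotspot("tower", 10, 10, 60, 120), Hotspot("bridge", 80, 90, 200, 130)]
```

Reporting "no feature" explicitly, rather than staying silent, mirrors the claim 1 wording that the output indicates which feature, "if any", was selected.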

20. A multi-modal picture specified by data held on at least one data carrier, this data comprising:
- picture image data for displaying a picture image;
  
  response data indicative of voice responses intended to be given to specific user queries concerning particular picture features of the picture;
  
  first control data for enabling a determination to be made as to which said particular feature in the picture image, if any, a user is selecting when using a selection arrangement to indicate a feature in the displayed image; and
  
  second control data for determining, on the basis of a spoken user query and on which said particular picture feature is selected by the user using the selection arrangement, which said response is to be used to reply to the user query.
- Dependent claims (21 to 23):
  - 21. A multi-modal picture according to claim 20, wherein the first control data comprises image-map data mapping image coordinates to said particular features.
  - 22. A multi-modal picture according to claim 20, wherein the first control data comprises label data arranged to be positioned in the displayed image in the region of each said particular picture feature to indicate the identity of that feature.
  - 23. A multi-modal picture according to claim 20, wherein at least some of the said responses are associated with a narrator identified in the response data, the second control data enabling the determination of which said response is to be used to reply to the user query to be restricted to those responses associated with a said narrator that has been selected by the user.

24. A multi-modal picture according to claim 20, wherein the picture is of a non-topographic real-world scene.
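
The four kinds of data recited in claim 20, and the narrator restriction of claim 23, can be pictured as a single bundle. This is a sketch under stated assumptions rather than the format used in the application: the class names, field names, rule layout, and sample narrators are all hypothetical, and plain text stands in for the voice responses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Response:
    narrator: str  # narrator identified in the response data
    text: str      # stands in for the voice response


@dataclass
class MultiModalPicture:
    image_data: bytes                                # picture image data
    responses: Dict[str, Response]                   # response data, by id
    image_map: Dict[str, Tuple[int, int, int, int]]  # first control data
    # Second control data: (feature, query) -> candidate response ids.
    rules: Dict[Tuple[str, str], List[str]] = field(default_factory=dict)

    def select_response(self, feature: str, query: str,
                        narrator: Optional[str] = None) -> Optional[Response]:
        """Pick a response, restricted to the chosen narrator when one is set."""
        for response_id in self.rules.get((feature, query), []):
            response = self.responses[response_id]
            if narrator is None or response.narrator == narrator:
                return response
        return None


# Hypothetical picture with one feature and two narrators.
picture = MultiModalPicture(
    image_data=b"",
    responses={
        "r1": Response("grandmother", "Placeholder: grandmother's account of the house."),
        "r2": Response("uncle", "Placeholder: uncle's account of the house."),
    },
    image_map={"house": (0, 0, 100, 80)},
    rules={("house", "what_is_this"): ["r1", "r2"]},
)
```

With no narrator selected, the first candidate is returned; selecting a narrator narrows the candidates to that narrator's responses only.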

25. A multi-modal picture comprising a hard-copy picture image, and data held on at least one data carrier, this data comprising:
- response data indicative of voice responses intended to be given to specific user queries concerning particular picture features;
  
  first control data for enabling a determination to be made as to which said particular feature in the picture image, if any, a user is selecting when using a selection arrangement to indicate a feature of the image; and
  
  second control data for determining, on the basis of a spoken user query and on which said particular picture feature is selected by the user using the selection arrangement, which said response is to be used to reply to the user query.

26. A multi-modal picture according to claim 25, wherein the first control data comprises image-map data mapping image coordinates to said particular features.

27. A multi-modal picture according to claim 25, wherein the first control data comprises label data arranged to be positioned in or on the image in the region of each said particular picture feature to indicate the identity of that feature.

28. A multi-modal picture according to claim 25, wherein at least some of the said responses are associated with a narrator identified in the response data, the second control data enabling the determination of which said response is to be used to reply to the user query to be restricted to those responses associated with a said narrator that has been selected by the user.

29. A multi-modal picture according to claim 25, wherein the picture is of a non-topographic real-world scene.

30. A method of conveying information about particular features in a picture, the method comprising the steps of:
- (a) creating the following specifically-associated data;
  
  picture image data for displaying a picture image;
  
  response data indicative of voice responses intended to be given to specific user queries concerning particular picture features of the picture;
  
  first control data for enabling a determination to be made as to which said particular feature in the picture image, if any, a user is selecting when using a selection arrangement to indicate a feature in the displayed image; and
  
  second control data for determining, on the basis of a spoken user query and on which said particular picture feature is selected by the user, which said response is to be used to reply to the user query;
  
  (b) using the image data to display an image of the picture;
  
  (c) having a user use a manually-operated selection arrangement to select a feature in the displayed image and using the first control data to determine which said particular feature in the picture image, if any, the user is selecting;
  
  (d) receiving and interpreting a spoken query from the user to determine if a said specific query is being asked; and
  
  (e) using the second control data to determine, on the basis of the said particular feature determined as being selected in step (c) and the said specific query determined as being asked in step (d), which said response is to be used to reply and thereupon using the response data to output the corresponding voice response.
- Dependent claims (31, 33):
  - 31. A method according to claim 30, wherein said selection arrangement is a pointing arrangement usable by the user to point to a feature of interest in a displayed view of the picture.
  - 33. A method according to claim 31, wherein said selection arrangement is a pointing arrangement usable by the user to point to a feature of interest in a displayed view of the picture.
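
Step (d) of the method, deciding whether a spoken input is one of the specific queries, can be illustrated with a small phrase grammar. The sketch below assumes a speech recogniser has already produced a text transcript; exact phrase matching is a deliberate simplification and is not the recognition technique of the application, and the grammar entries are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Hypothetical grammar: query identifier -> phrasings accepted for it.
GRAMMAR: Dict[str, List[str]] = {
    "what_is_this": ["what is this", "what is that", "what am i looking at"],
    "how_old": ["how old is this", "how old is it", "when was this built"],
}


def normalise(utterance: str) -> str:
    """Lower-case the transcript and strip punctuation and extra spaces."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", utterance.lower())
    return " ".join(cleaned.split())


def interpret(utterance: str) -> Optional[str]:
    """Return the identifier of the specific query being asked, or None."""
    text = normalise(utterance)
    for query_id, phrasings in GRAMMAR.items():
        if text in phrasings:
            return query_id
    return None
```

A `None` result corresponds to the case where no specific query is being asked, so step (e) would have nothing to look up.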

32. A method of conveying information about particular features in a hard-copy picture, the method comprising the steps of:
- (a) creating the following specifically-associated data;
  
  response data indicative of voice responses intended to be given to specific user queries concerning particular picture features in said picture;
  
  first control data for enabling a determination to be made as to which said particular feature in the picture, if any, a user is selecting when using a selection arrangement to indicate a feature of the picture; and
  
  second control data for determining, on the basis of a spoken user query and on which said particular picture feature is selected by the user, which said response is to be used to reply to the user query;
  
  (b) making the picture and data available to a user;
  
  (c) having the user use a manually-operated selection arrangement to select a feature in the picture and using the first control data to determine which said particular feature in the picture, if any, the user is selecting;
  
  (d) receiving and interpreting a spoken query from the user to determine if a said specific query is being asked; and
  
  (e) using the second control data to determine, on the basis of the said particular feature determined as being selected in step (c) and the said specific query determined as being asked in step (d), which said response is to be used to reply and thereupon using the response data to output the corresponding voice response.

34. Apparatus for authoring a multi-modal picture, comprising:
- a first tool for defining image hotspots associated with particular picture-image features;
  
  a second tool with speech recognition capability, for recording user responses input by voice, to user-specified queries each associated with a particular said picture-image feature; and
  
  means for automatically generating control data for determining, on the basis of a spoken user query and on which said particular picture feature is selected by a user, which said response is to be used to reply to the user query.
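
The last element of claim 34, automatic generation of control data from the authored hotspots and recorded responses, might look roughly like the following. This is a sketch under assumptions: the JSON layout, the function name, and the tuple shapes are hypothetical, and text stands in for the recorded voice responses.

```python
import json
from typing import Dict, List, Tuple

Hotspots = Dict[str, Tuple[int, int, int, int]]  # feature -> (left, top, right, bottom)
Recording = Tuple[str, str, str]                 # (feature, query, response text)


def generate_control_data(hotspots: Hotspots, recordings: List[Recording]) -> str:
    """Build a JSON document linking each (feature, query) pair to its response."""
    rules: Dict[str, Dict[str, str]] = {}
    for feature, query, response in recordings:
        if feature not in hotspots:
            # A response must belong to a feature the first tool has defined.
            raise ValueError(f"no hotspot defined for feature {feature!r}")
        rules.setdefault(feature, {})[query] = response
    document = {
        "image_map": {name: list(box) for name, box in hotspots.items()},
        "rules": rules,
    }
    return json.dumps(document, indent=2, sort_keys=True)
```

Rejecting a recording whose feature has no hotspot keeps the generated rules consistent with the image map, so a presenter never reaches a response it cannot select.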

Current Assignee: Hewlett-Packard Development Company, L.P. (HP Inc.)
Original Assignee: Hewlett-Packard Company (HP Inc.)
Inventors: Belrose, Guillaume
Application Number: US10/313,867
Publication Number: US 20030112267A1
US Class Current: 345/728
CPC Class Codes:
- G06F 16/9558   Details of hyperlinks; Mana...
- G06F 2203/0381   Multimodal input, i.e. inte...
- G06F 3/038   Control and interface arran...
- G06F 3/16   Sound input; Sound output s...
