Multi-modal picture
Abstract
A system for presenting a multi-modal picture includes picture presentation equipment for displaying an image of the picture and for enabling a user to interact with the picture by selecting a particular picture feature and asking a specific query relating to the feature. A voice browser system, controlled according to dialog scripts associated with the picture, determines an appropriate response having regard to the spoken user query and the selected picture feature. Each picture can have multiple narrators associated with it, and the user can choose which narrator is currently active. Picture authoring apparatus is also provided.
34 Claims
-
1. A system for presenting information concerning a picture to a user, the system comprising:
a data store for holding responses, specific to said picture, in respect of specific user queries concerning particular picture features;
a manually-operable feature-selection arrangement for enabling a user to select a feature in a displayed view of the picture, and for providing an output indication regarding what said particular feature, if any, the user has thereby selected;
a voice dialog input-output subsystem including a speech recogniser for interpreting queries from a user;
a control arrangement responsive to a user selecting a said particular feature and asking a specific query regarding that feature, to output the corresponding stored response.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
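The interplay of the data store and the control arrangement in claim 1 can be pictured with a minimal Python sketch. It is an illustration only, not the patented implementation: the store layout, the `normalise` and `respond` helpers, and the sample entries are all hypothetical, and the speech recogniser is assumed to have already produced text.

```python
from typing import Dict, Optional, Tuple

# Hypothetical data store: (feature id, normalised query) -> stored response.
# The keys and texts are invented placeholders, not taken from the patent.
RESPONSES: Dict[Tuple[str, str], str] = {
    ("tower", "what is this"): "Stored response describing the tower.",
    ("tower", "how old is it"): "Stored response about the tower's age.",
    ("bridge", "what is this"): "Stored response describing the bridge.",
}


def normalise(query: str) -> str:
    """Lower-case, drop punctuation and collapse whitespace."""
    kept = "".join(ch for ch in query.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def respond(selected_feature: Optional[str], recognised_query: str) -> Optional[str]:
    """Return the stored response for the selected feature and query, if any.

    selected_feature is None when the selection arrangement reports that
    no particular feature was selected.
    """
    if selected_feature is None:
        return None
    return RESPONSES.get((selected_feature, normalise(recognised_query)))
```

Keying the store on the pair (feature, query) is one simple way to make the response depend on both inputs; a real system would more likely drive this from dialog scripts, as the abstract describes.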
-
-
20. A multi-modal picture specified by data held on at least one data carrier, this data comprising:
picture image data for displaying a picture image;
response data indicative of voice responses intended to be given to specific user queries concerning particular picture features of the picture;
first control data for enabling a determination to be made as to which said particular feature in the picture image, if any, a user is selecting when using a selection arrangement to indicate a feature in the displayed image; and
second control data for determining, on the basis of a spoken user query and on which said particular picture feature is selected by the user using the selection arrangement, which said response is to be used to reply to the user query.
Dependent claims: 21, 22, 23
-
-
24. A multi-modal picture according to claim 20, wherein the picture is of a non-topographic real-world scene.
-
25. A multi-modal picture comprising a hard-copy picture image, and data held on at least one data carrier, this data comprising:
response data indicative of voice responses intended to be given to specific user queries concerning particular picture features;
first control data for enabling a determination to be made as to which said particular feature in the picture image, if any, a user is selecting when using a selection arrangement to indicate a feature of the image; and
second control data for determining, on the basis of a spoken user query and on which said particular picture feature is selected by the user using the selection arrangement, which said response is to be used to reply to the user query.
-
-
26. A multi-modal picture according to claim 25, wherein the first control data comprises image-map data mapping image coordinates to said particular features.
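As an illustration of image-map data that maps image coordinates to features, the following Python sketch uses axis-aligned rectangles as hotspots. The rectangle representation, the `Hotspot` and `feature_at` names, and the sample coordinates are assumptions made for this example; the claim does not prescribe any particular hotspot shape.

```python
from typing import NamedTuple, Optional, Sequence


class Hotspot(NamedTuple):
    """Hypothetical image-map entry: a rectangle tied to one picture feature."""
    feature: str
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def feature_at(image_map: Sequence[Hotspot], x: int, y: int) -> Optional[str]:
    """Return the feature whose hotspot contains (x, y), or None if there is none.

    When hotspots overlap, the first matching entry wins.
    """
    for spot in image_map:
        if spot.left <= x < spot.right and spot.top <= y < spot.bottom:
            return spot.feature
    return None


# Invented sample data for the sketch.
IMAGE_MAP = [
    Hotspot("tower", 10, 10, 60, 120),
    Hotspot("bridge", 40, 100, 200, 140),
]
```

A `None` result corresponds to the "if any" wording of the claims: the user indicated a point that belongs to no particular feature.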
-
27. A multi-modal picture according to claim 25, wherein the first control data comprises label data arranged to be positioned in or on the image in the region of each said particular picture feature to indicate the identity of that feature.
-
28. A multi-modal picture according to claim 25, wherein at least some of the said responses are associated with a narrator identified in the response data, the second control data enabling the determination of which said response is to be used to reply to the user query to be restricted to those responses associated with a said narrator that has been selected by the user.
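The narrator restriction can be sketched in Python as a filter applied before the response is chosen. Everything here is hypothetical: the `Response` record, the `pick_response` function, and the reading that a response with no narrator is only eligible when no narrator is selected are assumptions of this sketch, not statements about the claimed system.

```python
from typing import NamedTuple, Optional, Sequence


class Response(NamedTuple):
    """Hypothetical response-data record."""
    feature: str
    query: str
    narrator: Optional[str]  # None means the response is tied to no narrator
    text: str


def pick_response(
    responses: Sequence[Response],
    feature: str,
    query: str,
    active_narrator: Optional[str],
) -> Optional[str]:
    """Choose a response, considering only those of the active narrator."""
    for r in responses:
        if r.narrator != active_narrator:
            continue  # restriction to the user-selected narrator
        if r.feature == feature and r.query == query:
            return r.text
    return None


# Invented sample data for the sketch.
RESPONSE_DATA = [
    Response("tower", "what is this", "guide", "Guide's response about the tower."),
    Response("tower", "what is this", "artist", "Artist's response about the tower."),
    Response("bridge", "what is this", None, "Narrator-independent response."),
]
```

The same feature and query can therefore yield different replies depending on which narrator the user has made active.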
-
29. A multi-modal picture according to claim 25, wherein the picture is of a non-topographic real-world scene.
-
30. A method of conveying information about particular features in a picture, the method comprising the steps of:
(a) creating the following specifically-associated data:
picture image data for displaying a picture image;
response data indicative of voice responses intended to be given to specific user queries concerning particular picture features of the picture;
first control data for enabling a determination to be made as to which said particular feature in the picture image, if any, a user is selecting when using a selection arrangement to indicate a feature in the displayed image; and
second control data for determining, on the basis of a spoken user query and on which said particular picture feature is selected by the user, which said response is to be used to reply to the user query;
(b) using the image data to display an image of the picture;
(c) having a user use a manually-operated selection arrangement to select a feature in the displayed image and using the first control data to determine which said particular feature in the picture image, if any, the user is selecting;
(d) receiving and interpreting a spoken query from the user to determine if a said specific query is being asked; and
(e) using the second control data to determine, on the basis of the said particular feature determined as being selected in step (c) and the said specific query determined as being asked in step (d), which said response is to be used to reply and thereupon using the response data to output the corresponding voice response.
Dependent claims: 31, 33
-
-
32. A method of conveying information about particular features in a hard-copy picture, the method comprising the steps of:
(a) creating the following specifically-associated data:
response data indicative of voice responses intended to be given to specific user queries concerning particular picture features in said picture;
first control data for enabling a determination to be made as to which said particular feature in the picture, if any, a user is selecting when using a selection arrangement to indicate a feature of the picture; and
second control data for determining, on the basis of a spoken user query and on which said particular picture feature is selected by the user, which said response is to be used to reply to the user query;
(b) making the picture and data available to a user;
(c) having the user use a manually-operated selection arrangement to select a feature in the picture and using the first control data to determine which said particular feature in the picture, if any, the user is selecting;
(d) receiving and interpreting a spoken query from the user to determine if a said specific query is being asked; and
(e) using the second control data to determine, on the basis of the said particular feature determined as being selected in step (c) and the said specific query determined as being asked in step (d), which said response is to be used to reply and thereupon using the response data to output the corresponding voice response.
-
-
34. Apparatus for authoring a multi-modal picture, comprising:
a first tool for defining image hotspots associated with particular picture-image features;
a second tool with speech recognition capability, for recording user responses input by voice, to user-specified queries each associated with a particular said picture-image feature; and
means for automatically generating control data for determining, on the basis of a spoken user query and on which said particular picture feature is selected by a user, which said response is to be used to reply to the user query.
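One way to picture the automatic generation of control data is the Python sketch below, which combines the hotspot features from the first tool with the recorded query/response pairs from the second tool. The `Recording` record, the `generate_control_data` function, and the use of a response identifier in place of recorded audio are all assumptions of this example rather than details from the patent.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Recording(NamedTuple):
    """Hypothetical output of the second tool for one recorded response."""
    feature: str      # feature the author associated with the query
    query: str        # recognised text of the author-specified query
    response_id: str  # handle to the recorded voice response


def generate_control_data(
    hotspot_features: Iterable[str],
    recordings: Iterable[Recording],
) -> Dict[Tuple[str, str], str]:
    """Build a (feature, query) -> response id table from the authoring inputs.

    Raises ValueError if a recording refers to a feature with no hotspot,
    since such a response could never be reached by selecting a feature.
    """
    known = set(hotspot_features)
    control: Dict[Tuple[str, str], str] = {}
    for rec in recordings:
        if rec.feature not in known:
            raise ValueError(f"no hotspot defined for feature {rec.feature!r}")
        control[(rec.feature, rec.query.strip().lower())] = rec.response_id
    return control
```

Checking each recording against the defined hotspots at authoring time catches inconsistencies before the picture is presented to a user.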
-
Specification