Information processing device and information processing method
Abstract
The present disclosure relates to an information processing device and an information processing method capable of recognizing an acquisition position of voice data on an image. A web server transmits image frame size information indicating an image frame size of image data and audio position information indicating an acquisition position of voice data. The present disclosure is applicable to an information processing system or the like that includes a file generation device, a web server, and a video playback terminal and that performs tiled streaming in a manner compliant with Moving Picture Experts Group phase - Dynamic Adaptive Streaming over HTTP (MPEG-DASH).
15 Claims
1. An information processing device comprising:
    a non-transitory computer-readable medium;
    a transmitter configured to transmit image frame size information and voice position information, the image frame size information indicating an image frame size of image data, and the voice position information indicating an acquisition position of voice data corresponding to a display area that is a uniquely identifiable sub area to be displayed in an image corresponding to the image data, the uniquely identifiable sub area being a portion of the image that is less than a total area of the image, and the voice data being requested based on the image frame size information and the voice position information,
    wherein the image is partitioned into a plurality of tiles, and the display area is composed of one or more of the tiles,
    wherein the image includes a plurality of objects, and the voice data is associated with one object, of the plurality of the objects, that corresponds to the display area, and
    wherein the transmitter is further configured to transmit the requested voice data associated with the one object that corresponds to the display area; and
    circuitry configured to generate a voice file including each of the voice data of the plurality of the objects,
    wherein the transmitter is further configured to transmit file specifying information used to specify the voice file of each of the objects and to transmit the voice file including the voice data associated with the one object corresponding to the display area, the voice data being requested based further on the file specifying information.
    (Dependent claims: 2, 3, 4, 5, 6)
7. An information processing method comprising:
    transmitting image frame size information and voice position information by an information processing device, the image frame size information indicating an image frame size of image data, and the voice position information indicating an acquisition position of voice data corresponding to a display area that is a uniquely identifiable sub area to be displayed in an image corresponding to the image data, the uniquely identifiable sub area being a portion of the image that is less than a total area of the image, and the voice data being requested based on the image frame size information and the voice position information,
    wherein the image is partitioned into a plurality of tiles, and the display area is composed of one or more of the tiles, and
    wherein the image includes a plurality of objects, and the voice data is associated with one object, of the plurality of the objects, that corresponds to the display area;
    transmitting by the information processing device, the requested voice data associated with the one object that corresponds to the display area; and
    generating a voice file including each of the voice data of the plurality of the objects,
    wherein file specifying information used to specify the voice file of each of the objects, and the voice file including the voice data associated with the one object corresponding to the display area, are further being transmitted by the information processing device, and the voice data is requested based further on the file specifying information.
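The transmitting-side steps recited in claims 1 and 7 (generating one voice file that holds the voice data of every object, and sending metadata that lets a client request it) might be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the names `VoiceObject`, `build_voice_file`, and `build_metadata` are hypothetical, the voice position is assumed to be a pair of fractions of the image frame, the file specifying information is assumed to be a plain file name, and the per-object byte offsets stand in for the data position information mentioned in claim 8.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class VoiceObject:
    """One object in the image together with its voice data (hypothetical type)."""
    object_id: int
    position: Tuple[float, float]  # assumption: (x, y) as fractions of the frame
    voice_data: bytes


def build_voice_file(objects: List[VoiceObject]) -> Tuple[bytes, Dict[int, Tuple[int, int]]]:
    """Concatenate the voice data of all objects into one file body.

    Returns the file body and, for each object id, the (offset, size)
    of that object's voice data inside the body.
    """
    body = bytearray()
    index: Dict[int, Tuple[int, int]] = {}
    for obj in objects:
        index[obj.object_id] = (len(body), len(obj.voice_data))
        body.extend(obj.voice_data)
    return bytes(body), index


def build_metadata(frame_size: Tuple[int, int],
                   objects: List[VoiceObject],
                   file_name: str,
                   index: Dict[int, Tuple[int, int]]) -> dict:
    """Assemble the information the transmitter would send ahead of the voice data."""
    return {
        "image_frame_size": frame_size,      # image frame size information
        "voice_file": file_name,             # file specifying information (assumed form)
        "objects": [
            {
                "object_id": obj.object_id,
                "voice_position": obj.position,          # voice position information
                "data_position": index[obj.object_id],   # (offset, size) in the voice file
            }
            for obj in objects
        ],
    }
```

With this layout a client that has chosen an object can slice that object's bytes out of the voice file using the recorded offset and size.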
8. An information processing device comprising:
    a non-transitory computer-readable medium;
    a receiver configured to receive image frame size information and voice position information, the image frame size information indicating an image frame size of image data, and the voice position information indicating an acquisition position of voice data corresponding to a display area that is a uniquely identifiable sub area to be displayed in an image corresponding to the image data, the uniquely identifiable sub area being a portion of the image that is less than a total area of the image, and the voice data being requested based on the image frame size information and the voice position information,
    wherein the image is partitioned into a plurality of tiles, and the display area is composed of one or more of the tiles; and
    circuitry configured to determine the acquisition position of the voice data on the image corresponding to the image data based on the image frame size information of the image data and the voice position information received by the receiver,
    wherein the image includes a plurality of objects, and the voice data is associated with one object, of the plurality of the objects, that corresponds to the display area,
    wherein the information processing device is further configured to obtain the requested voice data associated with the one object that corresponds to the display area,
    wherein the circuitry is further configured to determine an acquisition position of the voice data of each of the objects on the image corresponding to the image data based on the image frame size information and the voice position information of each of the objects,
    wherein the circuitry is further configured to select voice data of the object corresponding to the display area that is the uniquely identifiable sub area to be displayed in the image corresponding to the image data based on the determined acquisition position of the voice data of each of the objects,
    wherein the receiver is further configured to obtain the voice data selected by the voice selector by receiving, at the receiver, the selected voice data, and
    wherein the voice data of the object is contained in a voice file including the voice data of a plurality of the objects and data position information indicating a position of each of the objects in the voice file of the voice data.
    (Dependent claims: 9, 10, 11, 12, 13, 14)
15. An information processing method comprising:
    receiving image frame size information and voice position information, the image frame size information indicating an image frame size of image data, and the voice position information indicating an acquisition position of voice data corresponding to a display area that is a uniquely identifiable sub area to be displayed in an image corresponding to the image data, the uniquely identifiable sub area being a portion of the image that is less than a total area of the image, and the voice data being requested based on the image frame size information and the voice position information,
    wherein the image is partitioned into a plurality of tiles, and the display area is composed of one or more of the tiles;
    determining the acquisition position of the voice data on the image corresponding to the image data, based on the received image frame size information of the image data and the received voice position information,
    wherein the image includes a plurality of objects, and the voice data is associated with one object, of the plurality of the objects, that corresponds to the display area;
    obtaining the requested voice data associated with the one object that corresponds to the display area;
    determining an acquisition position of the voice data of each of the objects on the image corresponding to the image data based on the image frame size information and the voice position information of each of the objects; and
    selecting voice data of the object corresponding to the display area that is the uniquely identifiable sub area to be displayed in the image corresponding to the image data based on the determined acquisition position of the voice data of each of the objects,
    wherein the voice data selected by the voice selector is obtained by receiving the selected voice data,
    wherein the voice data of the object is contained in a voice file including the voice data of a plurality of the objects and data position information indicating a position of each of the objects in the voice file of the voice data.
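The receiving-side steps of claims 8 and 15 (determining where each object's voice data sits on the image, then selecting the objects that fall inside the display area made of tiles) can be illustrated with a minimal sketch. The claims do not state how the position is computed, so everything concrete here is an assumption: the image frame size is taken as a pixel width and height, the voice position as fractions of that frame, and the tiles as a regular grid. `Rect`, `tile_rect`, `acquisition_position`, and `select_objects` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Rect:
    """Axis-aligned pixel rectangle (hypothetical helper type)."""
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def tile_rect(frame_size: Tuple[int, int], cols: int, rows: int, tile_index: int) -> Rect:
    """Rectangle of one tile, assuming a regular cols x rows grid in row-major order."""
    width, height = frame_size
    tile_w, tile_h = width // cols, height // rows
    col, row = tile_index % cols, tile_index // cols
    return Rect(col * tile_w, row * tile_h, tile_w, tile_h)


def acquisition_position(frame_size: Tuple[int, int],
                         voice_position: Tuple[float, float]) -> Tuple[float, float]:
    """Map a fractional voice position onto pixel coordinates of the image."""
    width, height = frame_size
    fx, fy = voice_position
    return fx * width, fy * height


def select_objects(frame_size: Tuple[int, int],
                   voice_positions: Dict[int, Tuple[float, float]],
                   display_tiles: List[Rect]) -> List[int]:
    """Return ids of objects whose acquisition position lies inside the display area."""
    selected = []
    for object_id, position in voice_positions.items():
        px, py = acquisition_position(frame_size, position)
        if any(tile.contains(px, py) for tile in display_tiles):
            selected.append(object_id)
    return selected
```

For instance, with a 1920x1080 frame split into a 2x2 grid and a display area made of the two right-hand tiles, only an object positioned in the right half of the image would be selected, and only its voice data would then be requested.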
Specification