Information processing device and information processing method

US 10,523,975 B2
Filed: 07/01/2014
Issued: 12/31/2019
Est. Priority Date: 07/19/2013
Status: Active Grant

Abstract

The present disclosure relates to an information processing device and an information processing method capable of recognizing an acquisition position of voice data on an image. A web server transmits image frame size information indicating an image frame size of image data and audio position information indicating an acquisition position of voice data. The present disclosure is applicable to an information processing system or similar system that includes a file generation device, a web server, and a video playback terminal and performs tiled streaming in a manner compliant with Moving Picture Experts Group phase - Dynamic Adaptive Streaming over HTTP (MPEG-DASH).

15 Claims

1. An information processing device comprising:
- a non-transitory computer-readable medium;
  
  a transmitter configured to transmit image frame size information and voice position information, the image frame size information indicating an image frame size of image data, and the voice position information indicating an acquisition position of voice data corresponding to a display area that is a uniquely identifiable sub area to be displayed in an image corresponding to the image data, the uniquely identifiable sub area being a portion of the image that is less than a total area of the image, and the voice data being requested based on the image frame size information and the voice position information, wherein the image is partitioned into a plurality of tiles, and the display area is composed of one or more of the tiles, wherein the image includes a plurality of objects, and the voice data is associated with one object, of the plurality of the objects, that corresponds to the display area, and wherein the transmitter is further configured to transmit the requested voice data associated with the one object that corresponds to the display area; and
  
  circuitry configured to generate a voice file including each of the voice data of the plurality of the objects, wherein the transmitter is further configured to transmit file specifying information used to specify the voice file of each of the objects and to transmit the voice file including the voice data associated with the one object corresponding to the display area, the voice data being requested based further on the file specifying information.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The information processing device according to claim 1, wherein the image frame size information is configured as information indicating an angle of view in horizontal and vertical directions of the image data and a distance between a base point of the angle of view and an image plane.
  - 3. The information processing device according to claim 1, wherein the voice position information is configured as information indicating an angle in horizontal and vertical directions of a line connecting a position in which the voice data is acquired and a base point and a distance between the position in which the voice data is acquired and the base point.
  - 4. The information processing device according to claim 1, further comprising:
    - circuitry configured to:
      
      generate a metadata file of the voice data including the voice position information; and
      
      generate, as control information, information used to specify the image frame size information and the metadata file, wherein the transmitter is configured to transmit the generated control information and the generated metadata file.
  - 5. The information processing device according to claim 1, wherein the circuitry is further configured to:
    - generate data position information indicating a position of each of the objects in the voice file of the voice data.
  - 6. The information processing device according to claim 1, wherein the display area is composed of a number of tiles that is less than a total number of the tiles of the image.
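
Claims 1 and 5 describe a single voice file that holds the voice data of every object, together with data position information locating each object inside that file. A minimal sketch of that idea follows, assuming a flat byte layout and a simple offset/size index. The names `DataPosition`, `build_voice_file`, and `extract_object` are hypothetical illustrations, not taken from the patent, and the actual file format is not specified in the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPosition:
    """Hypothetical index entry locating one object's voice data in the file."""
    object_id: int
    offset: int  # byte offset from the start of the payload
    size: int    # length of the object's voice data in bytes


def build_voice_file(voice_data_by_object: dict[int, bytes]) -> tuple[bytes, list[DataPosition]]:
    """Concatenate per-object voice data and record where each object lands."""
    payload = bytearray()
    index: list[DataPosition] = []
    for object_id in sorted(voice_data_by_object):
        data = voice_data_by_object[object_id]
        index.append(DataPosition(object_id, len(payload), len(data)))
        payload.extend(data)
    return bytes(payload), index


def extract_object(payload: bytes, index: list[DataPosition], object_id: int) -> bytes:
    """Return only the bytes belonging to the requested object."""
    for entry in index:
        if entry.object_id == object_id:
            return payload[entry.offset:entry.offset + entry.size]
    raise KeyError(object_id)
```

With an index like this, a receiver that has selected one object could read just that object's byte range instead of the whole file. Whether the patent's implementation works this way is an assumption of the sketch.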

7. An information processing method comprising:
- transmitting image frame size information and voice position information by an information processing device, the image frame size information indicating an image frame size of image data, and the voice position information indicating an acquisition position of voice data corresponding to a display area that is a uniquely identifiable sub area to be displayed in an image corresponding to the image data, the uniquely identifiable sub area being a portion of the image that is less than a total area of the image, and the voice data being requested based on the image frame size information and the voice position information, wherein the image is partitioned into a plurality of tiles, and the display area is composed of one or more of the tiles, and wherein the image includes a plurality of objects, and the voice data is associated with one object, of the plurality of the objects, that corresponds to the display area;
  
  transmitting by the information processing device, the requested voice data associated with the one object that corresponds to the display area; and
  
  generating a voice file including each of the voice data of the plurality of the objects, wherein file specifying information used to specify the voice file of each of the objects, and the voice file including the voice data associated with the one object corresponding to the display area, are further being transmitted by the information processing device, and the voice data is requested based further on the file specifying information.

8. An information processing device comprising:
- a non-transitory computer-readable medium;
  
  a receiver configured to receive image frame size information and voice position information, the image frame size information indicating an image frame size of image data, and the voice position information indicating an acquisition position of voice data corresponding to a display area that is a uniquely identifiable sub area to be displayed in an image corresponding to the image data, the uniquely identifiable sub area being a portion of the image that is less than a total area of the image, and the voice data being requested based on the image frame size information and the voice position information, wherein the image is partitioned into a plurality of tiles, and the display area is composed of one or more of the tiles; and
  
  circuitry configured to determine the acquisition position of the voice data on the image corresponding to the image data based on the image frame size information of the image data and the voice position information received by the receiver, wherein the image includes a plurality of objects, and the voice data is associated with one object, of the plurality of the objects, that corresponds to the display area, wherein the information processing device is further configured to obtain the requested voice data associated with the one object that corresponds to the display area, wherein the circuitry is further configured to determine an acquisition position of the voice data of each of the objects on the image corresponding to the image data based on the image frame size information and the voice position information of each of the objects, wherein the circuitry is further configured to select voice data of the object corresponding to the display area that is the uniquely identifiable sub area to be displayed in the image corresponding to the image data based on the determined acquisition position of the voice data of each of the objects, wherein the receiver is further configured to obtain the voice data selected by the voice selector by receiving, at the receiver, the selected voice data, and wherein the voice data of the object is contained in a voice file including the voice data of a plurality of the objects and data position information indicating a position of each of the objects in the voice file of the voice data.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The information processing device according to claim 8, wherein the image frame size information is configured as information indicating an angle of view in horizontal and vertical directions of the image data and a distance between a base point of the angle of view and an image plane.
  - 10. The information processing device according to claim 8, wherein the voice position information is configured as information indicating an angle in horizontal and vertical directions of a line connecting a position in which the voice data is acquired and a base point, and a distance between the position in which the voice data is acquired and the base point.
  - 11. The information processing device according to claim 8, wherein the receiver is further configured to receive the voice file including voice data of the object selected by the voice selector among voice files including each of voice data of the plurality of the objects.
  - 12. The information processing device according to claim 8, further comprising:
    - circuitry configured to synthesize voice data of the plurality of the objects based on the determined acquisition position of the voice data of each of the objects, wherein the receiver obtains the voice data selected by the voice selector by receiving, from the voice synthesis processor, the selected voice data.
  - 13. The information processing device according to claim 12, further comprising:
    - circuitry configured to convert a size of image data in the display area that is the uniquely identifiable sub area to be displayed in the image corresponding to the image data; and
      
      circuitry configured to determine acquisition position of the voice data of each of the objects on the image corresponding to image data of the display area having the size converted by the converter, based on the image frame size information of the image data, the voice position information of each of the objects, and image frame size information of the display area.
  - 14. The information processing device according to claim 13,wherein the image frame size information of the display area is configured as information indicating an angle of view in horizontal and vertical directions of the display area and a distance between a base point of the angle of view and an image plane.
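
Claims 8 to 10 describe determining where each object's voice data sits on the image from two inputs: the image frame size information (angles of view and a distance to the image plane) and the voice position information (horizontal and vertical angles and a distance from the base point). The sketch below illustrates one way this could be computed, assuming a symmetric angle of view and a pinhole-style projection onto the image plane. Both assumptions, and all names (`ImageFrameSize`, `VoicePosition`, `position_on_image`, `select_objects`), are hypothetical; the specification may define the mapping differently.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageFrameSize:
    h_angle_deg: float  # full horizontal angle of view (assumed symmetric)
    v_angle_deg: float  # full vertical angle of view (assumed symmetric)
    distance: float     # base point to image plane


@dataclass(frozen=True)
class VoicePosition:
    object_id: int
    h_angle_deg: float  # horizontal angle from the base point, positive to the right
    v_angle_deg: float  # vertical angle from the base point, positive upward
    distance: float     # base point to the acquisition position


def position_on_image(frame: ImageFrameSize, voice: VoicePosition):
    """Project a voice position onto the image plane.

    Returns (u, v) normalized so the image spans [-1, 1] on both axes, or
    None when the object is not in front of the base point. The distances
    cancel out in normalized coordinates, so they are not used here.
    """
    theta = math.radians(voice.h_angle_deg)
    gamma = math.radians(voice.v_angle_deg)
    if abs(theta) >= math.pi / 2 or abs(gamma) >= math.pi / 2:
        return None
    half_w = math.tan(math.radians(frame.h_angle_deg) / 2)
    half_h = math.tan(math.radians(frame.v_angle_deg) / 2)
    u = math.tan(theta) / half_w
    v = math.tan(gamma) / (math.cos(theta) * half_h)
    return u, v


def select_objects(frame, voices, display_area):
    """Return ids of objects whose projected position lies in the display area.

    display_area is (u_min, v_min, u_max, v_max) in normalized coordinates.
    """
    u_min, v_min, u_max, v_max = display_area
    selected = []
    for voice in voices:
        pos = position_on_image(frame, voice)
        if pos is None:
            continue
        u, v = pos
        if u_min <= u <= u_max and v_min <= v <= v_max:
            selected.append(voice.object_id)
    return selected
```

For example, with a 90-degree horizontal angle of view, an object at a horizontal angle of 22.5 degrees lands in the right half of the image, so it would be selected for a display area covering `(0, -1, 1, 1)` while an object at -22.5 degrees would not.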

15. An information processing method comprising:
- receiving image frame size information and voice position information, the image frame size information indicating an image frame size of image data, and the voice position information indicating an acquisition position of voice data corresponding to a display area that is a uniquely identifiable sub area to be displayed in an image corresponding to the image data, the uniquely identifiable sub area being a portion of the image that is less than a total area of the image, and the voice data being requested based on the image frame size information and the voice position information, wherein the image is partitioned into a plurality of tiles, and the display area is composed of one or more of the tiles;
  
  determining the acquisition position of the voice data on the image corresponding to the image data, based on the received image frame size information of the image data and the received voice position information, wherein the image includes a plurality of objects, and the voice data is associated with one object, of the plurality of the objects, that corresponds to the display area;
  
  obtaining the requested voice data associated with the one object that corresponds to the display area;
  
  determining an acquisition position of the voice data of each of the objects on the image corresponding to the image data based on the image frame size information and the voice position information of each of the objects; and
  
  selecting voice data of the object corresponding to the display area that is the uniquely identifiable sub area to be displayed in the image corresponding to the image data based on the determined acquisition position of the voice data of each of the objects, wherein the voice data selected by the voice selector is obtained by receiving the selected voice data, wherein the voice data of the object is contained in a voice file including the voice data of a plurality of the objects and data position information indicating a position of each of the objects in the voice file of the voice data.
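
The claims state that the image is partitioned into tiles and that the display area is composed of one or more of them. The selecting step can therefore also be expressed per tile, as in the following sketch. It assumes a uniform grid numbered in row-major order from the top-left tile, and object positions already expressed in normalized image coordinates where the image spans [-1, 1] on both axes with the vertical axis pointing up. The grid layout, the coordinate convention, and the names `tile_of` and `select_by_tiles` are illustrative assumptions rather than details from the patent.

```python
def tile_of(u: float, v: float, cols: int, rows: int):
    """Return the row-major index of the tile containing (u, v), or None if outside the image."""
    if not (-1.0 <= u <= 1.0 and -1.0 <= v <= 1.0):
        return None
    # Clamp so that points exactly on the right or bottom edge fall in the last tile.
    col = min(int((u + 1.0) / 2.0 * cols), cols - 1)
    row = min(int((1.0 - v) / 2.0 * rows), rows - 1)
    return row * cols + col


def select_by_tiles(positions: dict[int, tuple[float, float]],
                    cols: int, rows: int,
                    display_tiles: set[int]) -> list[int]:
    """Return ids of objects whose position falls in any tile of the display area."""
    return sorted(
        object_id
        for object_id, (u, v) in positions.items()
        if tile_of(u, v, cols, rows) in display_tiles
    )
```

In a 2 x 2 grid, a display area made of tiles 1 and 3 covers the right half of the image, so only objects positioned in that half would have their voice data requested.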

Current Assignee
Sony Corporation (Sony Group Corp.)
Original Assignee
Sony Corporation (Sony Group Corp.)
Inventors
Hattori, Shinobu, Hirabayashi, Mitsuhiro, Nakagami, Ohji, Chinen, Toru, Shi, Runyu, Tsuji, Minoru, Yamamoto, Yuki
Primary Examiner(s)
Bruckart, Benjamin R
Assistant Examiner(s)
Doshi, Akshay

Application Number
US14/904,232
Publication Number
US 20160156944A1
Time in Patent Office
2,009 Days
CPC Class Codes

H04N 21/23418   involving operations for an...

H04N 21/23614   Multiplexing of additional ...

H04N 21/2368   Multiplexing of audio and v...

H04N 21/4728   for selecting a Region Of I...

H04N 21/8106   involving special audio dat...

H04N 21/816   involving special video dat...

H04N 21/84   Generation or processing of...

H04N 21/8456   by decomposing the content ...

H04N 21/85406   involving a specific file f...

H04N 5/765   Interface circuits between ...
