Synthesis of information from multiple audiovisual sources

US 8,602,887 B2
Filed: 06/03/2010
Issued: 12/10/2013
Est. Priority Date: 06/03/2010
Status: Active Grant

First Claim

Patent Images

1. A method for synthesizing information for a scene from multiple sources, wherein the sources are capture devices, comprising:

a) receiving scene information from a first source and a second source, the first and second sources spatially separated from each other and the scene;

b) determining a position for each of the first and second sources from the scene information and one or more cues detected in common from the scene by the first and second sources;

c) creating a representation of the scene based on the positions of the first and second sources determined in said step b) and the scene information received from the first and second sources.

View all claims

2 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

A system and method are disclosed for synthesizing information received from multiple audio and visual sources focused on a single scene. The system may determine the positions of capture devices based on a common set of cues identified in the image data of the capture devices. As a scene may often have users and objects moving into and out of the scene, data from the multiple capture devices may be time synchronized to ensure that data from the audio and visual sources are providing data of the same scene at the same time. Audio and/or visual data from the multiple sources may be reconciled and assimilated together to improve an ability of the system to interpret audio and/or visual aspects from the scene.

59 Citations

View as Search Results

20 Claims

1. A method for synthesizing information for a scene from multiple sources, wherein the sources are capture devices, comprising:
- a) receiving scene information from a first source and a second source, the first and second sources spatially separated from each other and the scene;
  
  b) determining a position for each of the first and second sources from the scene information and one or more cues detected in common from the scene by the first and second sources;
  
  c) creating a representation of the scene based on the positions of the first and second sources determined in said step b) and the scene information received from the first and second sources.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
- - 2. The method of claim 1, said step a) of receiving scene information comprising receiving image data from the first and second sources.
  - 3. The method of claim 2, said step a) of receiving image data from the first and second sources comprising the step of time synchronizing the image data received from the first source to the image data received from the second source.
  - 4. The method of claim 2, further comprising the step d) of correcting lens distortion in the image data from the first and second sources prior to said step b) of determining the position for each of the first and second sources.
  - 5. The method of claim 1, said step c) further comprising the step of creating a representation of the scene based on translating the scene information from the first and second sources to a common frame of reference including the first and second sources.
  - 6. The method of claim 1, said step b) comprising the step of determining the position for each of the first and second sources from image depth data included within the scene information.
  - 7. The method of claim 1, said step b) comprising the step of determining the position for each of the first and second sources from RGB data included within the scene information.
  - 8. The method of claim 1, said step a) comprising the step of receiving an audio signal from the scene, the method further comprising the step e) of determining a location of the audio signal based in part on said step b) of determining a position for each of the first and second sources.
  - 9. The method of claim 1, wherein positions of the first and second sources are fixed with respect to each other.

10. A method for synthesizing information for a scene from multiple sources, wherein the sources are capture devices, comprising:
- a) receiving scene information from a first source and a second source, an initial position of the first source being unknown with respect to the second source, the first and second sources spatially separated from each other and the scene, the scene information including at least one of image depth data and RGB data;
  
  b) determining a position for each of the first and second sources from at least one of the image data and RGB data, together with the scene information shared in common from the scene by the first and second sources; and
  
  c) creating a representation of the scene based on the positions of the first and second sources determined in said step b) and the scene information received from the first and second sources.
- View Dependent Claims (11, 12, 13, 14)
- - 11. The method of claim 10, said step c) comprising the step of stitching together a first portion of the scene representation from the first source with a second portion of the scene representation from the second source.
  - 12. The method of claim 11, said step of stitching together a first portion of the scene representation with a second portion of the scene representation comprising the step of assigning a confidence value to the first and second portions of the scene and using the first or second portion having the higher confidence value for an area of overlap between the first portion and the second portion.
  - 13. The method of claim 10, said step a) of receiving image data from the first and second sources comprising the step of time synchronizing the image data received from the first source to the image data received from the second source.
  - 14. The method of claim 10, said step a) comprising the step of receiving an audio signal from the scene, the method further comprising the step e) of determining a location of the audio signal based in part on said step b) of determining a position for each of the first and second sources.

15. A method for synthesizing information for a play space in a gaming application from multiple capture devices, capture devices in the multiple capture devices including a depth camera, an RGB camera and at least one microphone, comprising:
- a) receiving image depth data and RGB depth data from a first capture device and a second capture device, the image depth data and the RGB depth data from the first and second capture devices being time synchronized together, the first and second capture devices spatially separated from each other and the play space;
  
  b) determining a position and orientation for each of the first and second capture devices from a combination of the synchronized image depth data and RGB data, together with a plurality of cues detected in common from the play space by the first and second capture devices;
  
  c) creating a representation of the play space based on the positions of the first and the second capture devices determined in said step b) and the image depth data and RGB depth data received from the first and second capture devices;
  
  d) stitching together a first portion of the play space representation from the first capture device with a second portion of the play space representation from the second capture device; and
  
  e) rendering the representation of the play space on a display associated with the first and second capture devices.
- View Dependent Claims (16, 17, 18, 19, 20)
- - 16. The method of claim 15, further comprising the step of associating a voice detected by the first and second capture devices with a user by locating a source of the voice and determining the presence of the user at the source of the voice via the image depth data and/or the RGB data.
  - 17. The method of claim 15, further comprising the step of using the image depth data and RGB data from the first capture device to fill in an area of the representation of the play space which was occluded from view from the second capture device.
  - 18. The method of claim 15, said step of rendering the representation of the play space comprising the step of rendering the representation of the play space from a perspective not captured by the first or second capture device or any of the multiple capture devices.
  - 19. The method of claim 15, the first capture device and second capture device together enlarging an area of the play space that could be captured by the first and second capture devices alone.
  - 20. The method of claim 15, the first capture device capturing a wide angle view of the play space at a first resolution and the second capture device capturing an image of a select area of the play space at a second resolution, the second resolution higher than the first resolution.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Tardif, John A., Perez, Kathryn Stone, Kipman, Alex Aben-Athar, Yee, Dawson
Primary Examiner(s)
CUFF, MICHAEL A

Application Number

US12/792,961
Publication Number

US 20110300929A1
Time in Patent Office

1,286 Days
Field of Search

463/30, 463/31, 463/34, 463/40, 463/42
US Class Current

463/30
CPC Class Codes

A63F 13/213   comprising photodetecting m...

A63F 13/215   comprising means for detect...

A63F 13/22   Setup operations, e.g. cali...

A63F 13/424   involving acoustic input si...

A63F 13/428   involving motion or positio...

A63F 13/63   by the player, e.g. authori...

A63F 13/812   Ball games, e.g. soccer or ...

A63F 13/843   involving concurrently two ...

A63F 2300/1018   Calibration; Key and button...

A63F 2300/1087   comprising photodetecting m...

A63F 2300/5553   user representation in the ...

A63F 2300/6072   of an input signal, e.g. pi...

A63F 2300/6081   generating an output signal...

Synthesis of information from multiple audiovisual sources

First Claim

2 Assignments

0 Petitions

Accused Products

Abstract

59 Citations

20 Claims

Specification

Use Cases

Quick Links

Others

Synthesis of information from multiple audiovisual sources

First Claim

2 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

59 Citations

20 Claims

Specification

Subscription Required

Use Cases

Quick Links

Others