AUDIO, VIDEO, SIMULATION, AND USER INTERFACE PARADIGMS

US 20140347272A1
Filed: 08/13/2014
Published: 11/27/2014
Est. Priority Date: 09/15/2005
Status: Active Grant

Abstract

Consumer electronic devices have been developed with enormous information processing capabilities, high quality audio and video outputs, and large amounts of memory, and may also include wired and/or wireless networking capabilities. Additionally, relatively unsophisticated and inexpensive sensors, such as microphones, video cameras, and GPS or other position sensors, when coupled with devices having these enhanced capabilities, can be used to detect subtle features about users and their environments. A variety of audio, video, simulation and user interface paradigms have been developed to utilize the enhanced capabilities of these devices. These paradigms can be used separately or together in any combination. One paradigm includes automatically creating user identities using speaker identification. Another paradigm includes a control button with 3-axis pressure sensitivity for use with game controllers and other input devices.

20 Claims

1. A method comprising:
- acquiring activity data of a user located within range of one or more sensors, the one or more sensors associated with a display screen displaying video content, the displayed video content having a volume level, the activity data being in the form of one or more temporal audio or video samples;
  
  analyzing, using at least one processor operatively coupled with a memory, one or more of the temporal video samples to determine if the user has looked away from the display screen for a first predetermined period of time by searching successive images of the temporal video samples to detect a presence or absence of a frontal face corresponding to the user; and
  
  analyzing one or more of the temporal audio samples to determine if the user has had an emotional response relative to the activity on the display screen for a second predetermined period of time by searching a plurality of successive subsets of the temporal audio samples having a predetermined duration to detect a presence or absence of the user's voice activity in each subset, and determining whether or not the user's voice activity is present in a predetermined consecutive number of the subsets at a volume greater than a predetermined level.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein the one or more sensors include a microphone.
  - 3. The method of claim 1, wherein the one or more sensors include a video camera.
  - 4. The method of claim 1, further comprising:
    - automatically pausing the video content based upon a determination that the user has looked away from the display screen for the first predetermined period of time and has not had an emotional response relative to the activity on the screen for the second predetermined period of time.
  - 5. The method of claim 1, wherein analyzing one or more of the temporal audio samples to determine if the user has had an emotional response relative to the activity on the screen for the second predetermined period of time uses a noise and speaker recognition module configured to receive audio information from a microphone.
  - 6. The method of claim 1, further comprising:
    - automatically adjusting the volume level of the video content based upon a determination that the user has looked away from the display screen for the first predetermined period of time.
  - 7. The method of claim 1, further comprising:
    - automatically adjusting the volume level of the video content based upon a determination that the user has not had an emotional response relative to the activity on the display screen for the second predetermined period of time.
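The two determinations recited in claim 1 can be illustrated with a minimal sketch. It assumes that an upstream face detector has already produced one boolean per video frame, and that each audio subset has already been reduced to a (voice present, volume) pair; neither step is shown. The function names, the parameter names, and the conversion of each "predetermined period of time" into a count of frames or subsets are hypothetical choices made for illustration and are not taken from the patent.

```python
from typing import Iterable, Sequence, Tuple


def looked_away(face_present: Sequence[bool], frames_required: int) -> bool:
    """True if no frontal face was detected in `frames_required`
    successive frames (assumed to span the first predetermined period)."""
    if frames_required < 1:
        raise ValueError("frames_required must be at least 1")
    run = 0
    for present in face_present:
        run = 0 if present else run + 1
        if run >= frames_required:
            return True
    return False


def emotional_response(subsets: Iterable[Tuple[bool, float]],
                       consecutive_required: int,
                       volume_threshold: float) -> bool:
    """True if voice activity is present, at a volume greater than
    `volume_threshold`, in `consecutive_required` successive subsets."""
    if consecutive_required < 1:
        raise ValueError("consecutive_required must be at least 1")
    run = 0
    for voice_present, volume in subsets:
        if voice_present and volume > volume_threshold:
            run += 1
            if run >= consecutive_required:
                return True
        else:
            run = 0
    return False


# Hypothetical inputs: per-frame face flags and per-subset audio labels.
frames = [True, True, False, False, False, True]
audio = [(True, 0.2), (True, 0.7), (True, 0.8), (False, 0.0)]
print(looked_away(frames, frames_required=3))                                  # True
print(emotional_response(audio, consecutive_required=2, volume_threshold=0.5))  # True
```

Both functions track a run of consecutive qualifying items and reset it on the first non-qualifying one, which is one straightforward reading of "successive images" and "predetermined consecutive number of the subsets".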

8. A system comprising:
- one or more sensors configured for acquiring activity data of a user located within range of the one or more sensors associated with a display screen displaying video content, the displayed video content having a volume level, the activity data being in the form of one or more temporal audio or video samples;
  
  a processor configured for analyzing one or more of the temporal video samples to determine if the user has looked away from the display screen for a first predetermined period of time by searching successive images of the temporal video samples to detect a presence or absence of a frontal face corresponding to the user; and
  
  the processor configured for analyzing one or more of the temporal audio samples to determine if the user has had an emotional response relative to the activity on the display screen for a second predetermined period of time by searching a plurality of successive subsets of the temporal audio samples having a predetermined duration to detect a presence or absence of the user's voice activity in each subset, and determining whether or not the user's voice activity is present in a predetermined consecutive number of the subsets at a volume greater than a predetermined level.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The system of claim 8, wherein the one or more sensors include a microphone.
  - 10. The system of claim 8, wherein the one or more sensors include a video camera.
  - 11. The system of claim 8, wherein the processor is further configured for:
    - automatically pausing the video content based upon a determination by the processor that the user has looked away from the display screen for the first predetermined period of time and has not had an emotional response relative to the activity on the screen for the second predetermined period of time.
  - 12. The system of claim 8, wherein the processor is configured for analyzing one or more of the temporal audio samples to determine if the user has had an emotional response relative to the activity on the screen for the second predetermined period of time using a noise and speaker recognition module configured to receive audio information from a microphone.
  - 13. The system of claim 8, wherein the processor is further configured for:
    - automatically adjusting the volume level of the video content based upon a determination by the processor that the user has looked away from the display screen for the first predetermined period of time.
  - 14. The system of claim 8, wherein the processor is further configured for:
    - automatically adjusting the volume level of the video content based upon a determination by the processor that the user has not had an emotional response relative to the activity on the display screen for the second predetermined period of time.
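Claims 11, 13, and 14 describe actions the processor may take once the two determinations are available. The sketch below combines them into a single policy purely for illustration; the claims recite them as separate dependent claims and do not prescribe how they interact. `Player`, `apply_policy`, and `adjusted_volume` are hypothetical names, and lowering the volume is an assumption, since the claims say only "adjusting".

```python
from dataclasses import dataclass


@dataclass
class Player:
    """Hypothetical stand-in for the playback state of the video content."""
    volume: float = 1.0
    paused: bool = False


def apply_policy(player: Player, looked_away: bool, emotional_response: bool,
                 adjusted_volume: float = 0.3) -> Player:
    """Apply one possible combination of the claimed actions."""
    if looked_away and not emotional_response:
        # Claim 11: pause when the user looked away AND showed no response.
        player.paused = True
    elif looked_away or not emotional_response:
        # Claims 13 and 14: adjust volume on either determination alone.
        # Assumption: "adjusting" is modeled as setting a lower level.
        player.volume = adjusted_volume
    return player


# Hypothetical usage: the user looked away but is still reacting audibly.
state = apply_policy(Player(), looked_away=True, emotional_response=True)
print(state)
```

Pausing takes priority here so that a user who has both looked away and gone quiet does not merely get a volume change; that ordering is a design choice of the sketch, not something stated in the claims.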

15. A machine-readable non-transitory medium embodying information indicative of instructions for causing one or more machines to perform operations comprising:
- acquiring activity data of a user located within range of one or more sensors, the one or more sensors associated with a display screen displaying video content, the displayed video content having a volume level, the activity data being in the form of one or more temporal audio or video samples;
  
  analyzing one or more of the temporal video samples to determine if the user has looked away from the display screen for a first predetermined period of time by searching successive images of the temporal video samples to detect a presence or absence of a frontal face corresponding to the user; and
  
  analyzing one or more of the temporal audio samples to determine if the user has had an emotional response relative to the activity on the display screen for a second predetermined period of time by searching a plurality of successive subsets of the temporal audio samples having a predetermined duration to detect a presence or absence of the user's voice activity in each subset, and determining whether or not the user's voice activity is present in a predetermined consecutive number of the subsets at a volume greater than a predetermined level.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The medium of claim 15 wherein the one or more sensors include a microphone.
  - 17. The medium of claim 15 wherein the one or more sensors include a video camera.
  - 18. The medium of claim 15 further comprising instructions for:
    - automatically pausing the video content based upon a determination that the user has looked away from the display screen for the first predetermined period of time and has not had an emotional response relative to the activity on the screen for the second predetermined period of time.
  - 19. The medium of claim 15 wherein analyzing one or more of the temporal audio samples to determine if the user has had an emotional response relative to the activity on the screen for the second predetermined period of time uses a noise and speaker recognition module configured to receive audio information from a microphone.
  - 20. The medium of claim 15 further comprising instructions for:
    - automatically adjusting the volume level of the video content based upon a determination that the user has not had an emotional response relative to the activity on the display screen for the second predetermined period of time.
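The "successive subsets of the temporal audio samples having a predetermined duration" can be pictured as fixed-length, non-overlapping windows over the raw samples. The sketch below splits a sample buffer into such windows and computes a root-mean-square level for each. The simple energy floor used to flag voice activity is a stand-in chosen for illustration; it is not the noise and speaker recognition module mentioned in claim 19, and all function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def split_into_subsets(samples: Sequence[float], sample_rate: int,
                       subset_seconds: float) -> List[Sequence[float]]:
    """Cut samples into non-overlapping subsets of equal duration.
    A trailing partial subset is dropped."""
    size = int(sample_rate * subset_seconds)
    if size < 1:
        raise ValueError("subset must contain at least one sample")
    return [samples[i:i + size]
            for i in range(0, len(samples) - size + 1, size)]


def rms(subset: Sequence[float]) -> float:
    """Root-mean-square level of one subset, used here as its volume."""
    return math.sqrt(sum(x * x for x in subset) / len(subset))


def label_subsets(samples: Sequence[float], sample_rate: int,
                  subset_seconds: float,
                  activity_floor: float) -> List[Tuple[bool, float]]:
    """Return (voice_present, volume) per subset.
    Assumption: activity is flagged by an energy floor, not a real
    voice activity or speaker recognition model."""
    labels = []
    for subset in split_into_subsets(samples, sample_rate, subset_seconds):
        level = rms(subset)
        labels.append((level > activity_floor, level))
    return labels


# Hypothetical buffer: one silent second, one loud second, a partial tail.
buffer = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 3
print(label_subsets(buffer, sample_rate=4, subset_seconds=1.0,
                    activity_floor=0.1))  # [(False, 0.0), (True, 0.5)]
```

The resulting list of (voice present, volume) pairs is the kind of per-subset input that a consecutive-count check over a volume threshold would consume.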

Current Assignee: Sony Interactive Entertainment Inc. (Sony Group Corp.)
Original Assignee: Sony Computer Entertainment Incorporated (Sony Group Corp.)
Inventors: Hernandez-Abrego, Gustavo; Menendez-Pidal, Xavier; Osman, Steven; Chen, Ruxin; Deshpande, Rishi; Michaud-Wideman, Care; Marks, Richard; Larsen, Eric J.; Mao, Xiaodong
Granted Patent: US 9,405,363 B2
Field of Search
US Class Current: 345/156
CPC Class Codes:
A63F 13/213   comprising photodetecting m...
A63F 13/217   using environment-related i...
A63F 13/428   involving motion or positio...
G06F 2203/011   Emotion or mood input deter...
G06F 3/005   Input arrangements through ...
G06F 3/012   Head tracking input arrange...
G06F 3/015   Input arrangements based on...
G10L 17/00   Speaker identification or v...
G10L 17/04   Training, enrolment or mode...
