Audio volume control device, control method and program

US 9,398,247 B2
Filed: 07/19/2012
Issued: 07/19/2016
Est. Priority Date: 07/26/2011
Status: Active Grant


Abstract

An information processing apparatus includes a processor that receives captured image data and captured sound data corresponding to an environment in which content is reproduced and detects a user based on the captured image data and analyzes a situation of the environment based on a result of the detection and the captured sound data and controls an audio volume corresponding to reproduced content based on a result of the analyzing.

20 Claims

1. An information processing apparatus comprising:
- an input circuit for reception of captured image data and captured sound data corresponding to an environment in which content is reproduced;
  
  a processor that:
  
  processes the captured image data and the captured sound data corresponding to the environment in which content is reproduced;
  
  detects a user based on the captured image data;
  
  analyzes a situation of the environment based on a result of the detection and the captured sound data;
  
  determines a direction in the captured image data to a source of the captured sound data;
  
  determines if the direction in the captured image data to the source of the captured sound data is coincident with a location of a face of a human detected in the captured image data; and
  
  controls an audio volume corresponding to reproduced content based on a result of the analyzing, wherein
  
  when a sound level corresponding to the captured sound data is greater than or equal to a predetermined threshold value,
  
  the processor controls the audio volume corresponding to the reproduced content to remain unchanged when it is determined that the direction in the captured image data corresponding to the source of the captured sound data which is a human voice is coincident with the location of the face detected in the captured image data, and
  
  the processor controls the audio volume corresponding to the reproduced content to increase when the processor determines that the direction in the captured image data corresponding to the source of the captured sound data which is a human voice is not coincident with the location of the face detected in the captured image data,
  
  when the processor increases the audio volume when the processor determined that the direction in the captured image data corresponding to the source of the captured sound data which is a human voice is not coincident with the location of the face detected in the captured image data, the processor determines a volume increase amount based on the captured image data of a distance between the location of the detected user and the source of the captured sound data, and
  
  in an event of a manual adjustment of a setting, the processor once an environmental situation is over automatically returns to a previous setting before the environmental situation occurred.
  - 2. The information processing apparatus of claim 1, wherein the processor receives the captured image data from a camera positioned in the environment in which content is reproduced and detects the face based on the captured image data.
  - 3. The information processing apparatus of claim 2, wherein the processor detects a position corresponding to the detected face based on the captured image data.
  - 4. The information processing apparatus of claim 2, wherein the processor detects a plurality of faces based on the captured image data.
  - 5. The information processing apparatus of claim 2, wherein the processor determines face information corresponding to the detected face, the face information including at least one of an individual, age and gender.
  - 6. The information processing apparatus of claim 1, wherein the processor receives the sound data from a microphone positioned in the environment in which content is reproduced.
  - 7. The information processing apparatus of claim 1, wherein the processor determines a sound level corresponding to the captured sound data.
  - 8. The information processing apparatus of claim 1, wherein the processor determines whether the captured sound data is a human's voice or a sound other than a human's voice.
  - 9. The information processing apparatus of claim 1, wherein the processor controls the audio volume corresponding to the reproduced content to remain unchanged when it is determined that the level is less than the predetermined threshold value.
  - 10. The information processing apparatus of claim 1, wherein the processor determines whether the captured sound data is a human's voice or a sound other than a human's voice when it is determined that the level is greater than the predetermined threshold value.
  - 11. The information processing apparatus of claim 10, wherein the processor controls the audio volume corresponding to the reproduced content to be lowered when it is determined that the captured sound data is a human's voice and a face is not detected based on the captured image data.
  - 12. The information processing apparatus of claim 10, wherein the processor determines a direction corresponding to a source of the captured sound data when it is determined that the captured sound data is a human's voice and a face is detected based on the captured image data.
  - 13. The information processing apparatus of claim 10, wherein the processor determines whether the captured sound data corresponds to an environmental sound registered in advance when it is determined that the captured sound data is determined to be a sound other than a human's voice.
  - 14. The information processing apparatus of claim 13, wherein the processor controls the audio volume corresponding to the reproduced content to increase when it is determined that the captured sound data corresponds to an environmental sound that is registered in advance.
  - 15. The information processing apparatus of claim 13, wherein the processor controls the audio volume corresponding to the reproduced content based on previously stored settings corresponding to the environmental sound when it is determined that the captured sound data corresponds to the environmental sound stored in advance.
  - 16. The information processing apparatus of claim 1, wherein the processor determines an age of the detected user and, when the processor controls the audio volume to increase, the processor applies an increased gain to a predetermined audio frequency band.
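
As a reading aid, the branching recited in claims 1, 9, 11, 13 and 14 can be sketched as a small decision function. This is an illustrative sketch only, not the patented implementation: the names `Observation`, `Action` and `decide`, the 60 dB default threshold, and the treatment of an unregistered non-voice sound (volume left unchanged) are assumptions that do not appear in the claims.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP = "keep"    # leave the playback volume unchanged
    RAISE = "raise"  # increase the playback volume
    LOWER = "lower"  # decrease the playback volume


@dataclass
class Observation:
    """Hypothetical summary of one analysis pass over camera and microphone data."""
    sound_level_db: float              # level of the captured sound
    is_human_voice: bool               # voice vs. other sound (claims 8, 10)
    face_detected: bool                # at least one face in the image
    source_coincides_with_face: bool   # sound direction matches a face location
    is_registered_env_sound: bool      # matches a sound registered in advance


def decide(obs: Observation, threshold_db: float = 60.0) -> Action:
    """Map one observation to a volume action, following the claim language."""
    # Claim 9: below the threshold, the volume stays unchanged.
    if obs.sound_level_db < threshold_db:
        return Action.KEEP

    if obs.is_human_voice:
        # Claim 11: a voice is heard but no face is detected -> lower.
        if not obs.face_detected:
            return Action.LOWER
        # Claim 1: voice coming from a detected face -> unchanged.
        if obs.source_coincides_with_face:
            return Action.KEEP
        # Claim 1: voice coming from somewhere else -> increase.
        return Action.RAISE

    # Claims 13-14: a registered environmental sound -> increase.
    if obs.is_registered_env_sound:
        return Action.RAISE

    # Assumption: the claims do not address unregistered non-voice sounds.
    return Action.KEEP
```

For example, `decide(Observation(70.0, True, True, False, False))` yields `Action.RAISE`, the case in which a voice is heard from a direction that does not match any detected face. Claim 15 (applying stored per-sound settings) is left out of the sketch because the claims do not say what those settings contain.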

17. A method performed by an information processing apparatus, the method comprising:
- receiving captured image data and captured sound data corresponding to an environment in which content is reproduced;
  
  detecting a user based on the captured image data;
  
  determining a direction in the captured image data to the source of the captured sound data;
  
  determining if the direction in the captured image data to the source of the captured sound data is coincident with a location of a face of a human detected in the captured image data;
  
  analyzing a situation of the environment based on a result of the detection and the captured sound data; and
  
  controlling an audio volume corresponding to reproduced content based on a result of the analyzing, wherein
  
  when a sound level corresponding to the captured sound data is greater than or equal to a predetermined threshold value,
  
  the controlling includes controlling the audio volume corresponding to the reproduced content to remain unchanged when it is determined that the direction in the captured image data corresponding to the source of the captured sound data which is a human voice is coincident with the location of the face detected in the captured image data, and
  
  the controlling includes controlling the audio volume corresponding to the reproduced content to increase when the direction in the captured image data corresponding to the source of the captured sound data which is a human voice is not coincident with the location of the face detected in the captured image data,
  
  when the controlling includes controlling the audio volume to increase when the direction in the captured image data corresponding to the source of the captured sound data which is a human voice is not coincident with the location of the face detected in the captured image data, the controlling further includes determining a volume increase amount based on the captured image data of a distance between the location of the detected user and the source of the captured sound data, and
  
  in an event of a manual adjustment of a setting, the controlling once an environmental situation is over automatically returns to a previous setting before the environmental situation occurred.
  - 18. The method of claim 17, further comprising determining an age of the detected user and, when the controlling includes controlling the audio volume to increase, the controlling includes applying an increased gain to a predetermined audio frequency band.
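
The coincidence test and the distance-based increase amount recited in claim 17 might be approximated as follows. The claims do not state how direction or distance is computed, so everything here is an assumption made for illustration: a linear pixel-to-angle mapping, a 60 degree field of view, a 5 degree tolerance, angular separation as a proxy for distance, and a linear rule in which a nearer sound source yields a larger increase. The names `Face`, `pixel_to_azimuth`, `is_coincident` and `volume_increase_db` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Face:
    center_x: float  # pixel column of the face centre
    width: float     # face bounding-box width in pixels


def pixel_to_azimuth(x: float, image_width: int, fov_deg: float = 60.0) -> float:
    """Map a pixel column to a horizontal angle in degrees (0 at image centre).

    Assumes a simple linear camera model, which is only an approximation.
    """
    return (x / image_width - 0.5) * fov_deg


def is_coincident(source_azimuth_deg: float, face: Face, image_width: int,
                  fov_deg: float = 60.0, tolerance_deg: float = 5.0) -> bool:
    """Return True if the sound direction falls on the face, within a tolerance."""
    face_azimuth = pixel_to_azimuth(face.center_x, image_width, fov_deg)
    half_width_deg = (face.width / image_width) * fov_deg / 2.0
    return abs(source_azimuth_deg - face_azimuth) <= half_width_deg + tolerance_deg


def volume_increase_db(separation_deg: float, max_increase_db: float = 6.0,
                       max_separation_deg: float = 60.0) -> float:
    """Assumed rule: the closer the sound source is to the user, the larger
    the increase, falling off linearly to zero at max_separation_deg."""
    s = min(max(separation_deg, 0.0), max_separation_deg)
    return max_increase_db * (1.0 - s / max_separation_deg)


# Usage: a face centred in a 1920-pixel-wide frame, voice arriving from 20 degrees.
face = Face(center_x=960.0, width=192.0)
if not is_coincident(20.0, face, image_width=1920):
    increase = volume_increase_db(20.0)  # separation from the centred face
```

A real system would more likely estimate direction with a microphone array and distance from face size or depth data; the sketch only shows where those quantities enter the decision.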

19. A non-transitory computer-readable medium including computer-program instructions, which when executed by an information processing apparatus, cause the information processing apparatus to perform a method comprising:
- receiving captured image data and captured sound data corresponding to an environment in which content is reproduced;
  
  detecting a user based on the captured image data;
  
  determining a direction in the captured image data to the source of the captured sound data;
  
  determining if the direction in the captured image data to the source of the captured sound data is coincident with a location of a face of a human detected in the captured image data;
  
  analyzing a situation of the environment based on a result of the detection and the captured sound data; and
  
  controlling an audio volume corresponding to reproduced content based on a result of the analyzing, wherein
  
  when a sound level corresponding to the captured sound data is greater than or equal to a predetermined threshold value,
  
  the controlling includes controlling the audio volume corresponding to the reproduced content to remain unchanged when it is determined that the direction in the captured image data corresponding to the source of the captured sound data which is a human voice is coincident with the location of the face detected in the captured image data, and
  
  the controlling includes controlling the audio volume corresponding to the reproduced content to increase when the direction corresponding to the source of the captured sound data which is a human voice is not coincident with the location of the face detected in the captured image data, and
  
  when the controlling includes controlling the audio volume to increase when the direction in the captured image data corresponding to the source of the captured sound data which is a human voice is not coincident with the location of the face detected in the captured image data, the controlling further includes determining a volume increase amount based on the captured image data of a distance between the location of the detected user and the source of the captured sound data, and
  
  in an event of a manual adjustment of a setting, the controlling once an environmental situation is over automatically returns to a previous setting before the environmental situation occurred.
  - 20. The non-transitory computer-readable medium according to claim 19, further comprising determining an age of the detected user and, when the controlling includes controlling the audio volume to increase, the controlling includes applying an increased gain to a predetermined audio frequency band.
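
The age-dependent band gain of claims 16, 18 and 20 could look roughly like the sketch below, which computes per-band equalizer gains for a volume increase. The claims specify neither the age criterion nor the frequency band, so the 60-year cutoff, the 2 to 8 kHz band, the 3 dB extra gain and the function name `equalizer_gains` are all placeholders chosen for illustration.

```python
from typing import Dict, List, Tuple

Band = Tuple[int, int]  # (low_hz, high_hz)


def equalizer_gains(bands: List[Band], increase_db: float, age: int,
                    boost_band: Band = (2000, 8000), min_age: int = 60,
                    extra_db: float = 3.0) -> Dict[Band, float]:
    """Return the gain in dB to apply to each band when raising the volume.

    Every band receives the base increase. Bands overlapping the assumed
    boost_band receive extra_db on top when the detected user's estimated
    age is at least min_age (both values are assumptions, not claim terms).
    """
    gains: Dict[Band, float] = {}
    for low, high in bands:
        overlaps = low < boost_band[1] and high > boost_band[0]
        extra = extra_db if (age >= min_age and overlaps) else 0.0
        gains[(low, high)] = increase_db + extra
    return gains


# Usage: four coarse bands, a 4 dB increase, and an estimated age of 70.
bands = [(20, 250), (250, 2000), (2000, 8000), (8000, 20000)]
gains = equalizer_gains(bands, increase_db=4.0, age=70)
```

With these placeholder values only the 2 to 8 kHz band receives the additional gain; the other bands receive the plain 4 dB increase.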

Current Assignee
Sony Corporation (Sony Group Corp.)
Original Assignee
Sony Corporation (Sony Group Corp.)
Inventors
Tateishi, Kazuya
Primary Examiner(s)
YENKE, BRIAN P

Application Number
US14/127,772
Publication Number
US 20140313417A1
Time in Patent Office
1,461 Days
Field of Search
725/10, 725/12, 348/734, 381/104
US Class Current
1/1
CPC Class Codes
H04N 21/42203   sound input device, e.g. mi...
H04N 21/4223   Cameras H04N23/00 takes pre...
H04N 21/4394   involving operations for an...
H04N 21/4396   by muting the audio signal
H04N 21/44008   involving operations for an...
H04N 21/44218   Detecting physical presence...
H04N 21/4532   involving end-user characte...
H04N 5/60   for the sound signals
