Noise reduction based on mouth area movement recognition

US 9,263,044 B1
Filed: 06/27/2012
Issued: 02/16/2016
Est. Priority Date: 06/27/2012
Status: Active Grant

Abstract

A computing device can capture video data of at least a portion of a mouth area (e.g., mouth, lips, tongue, chin, jaw) of a user of the device. The computing device can also capture sound data including a voice of the user as well as noise (e.g., background noise). The video data can be processed to detect a movement of the portion of the mouth area. The movement of the portion of the mouth area can be analyzed and compared with mouth area movement models characteristic of oral communication (e.g., speech, song). If the movement of the portion of the mouth area corresponds to at least one model characteristic of oral communication, then the movement indicates that the user is likely engaging in oral communication. Noise reduction can be applied and/or increased on the captured sound data to reduce noise and in turn enhance the user's voice.
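
As an illustration only (this sketch is not part of the patent text), the comparison described in the abstract might look like the following in Python. It assumes that the mouth movement has already been reduced to one mouth-opening measurement per video frame, that each movement model is a reference sequence of the same length, and that Pearson correlation against a threshold is an acceptable matching criterion. The names `similarity`, `is_oral_communication`, and `noise_reduction_level`, and every numeric value, are hypothetical.

```python
from math import sqrt


def similarity(movement, model):
    """Pearson correlation between two equal-length sequences (assumed metric)."""
    n = len(movement)
    if n != len(model) or n < 2:
        return 0.0
    mean_a = sum(movement) / n
    mean_b = sum(model) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(movement, model))
    var_a = sum((a - mean_a) ** 2 for a in movement)
    var_b = sum((b - mean_b) ** 2 for b in model)
    if var_a == 0 or var_b == 0:
        return 0.0  # a motionless mouth matches no speech model
    return cov / sqrt(var_a * var_b)


def is_oral_communication(movement, models, threshold=0.8):
    """True if the movement matches at least one model (threshold is assumed)."""
    return any(similarity(movement, m) >= threshold for m in models)


def noise_reduction_level(movement, models, base=0.2, boosted=0.8):
    """Increase noise reduction while the user appears to be speaking.

    `base` and `boosted` are hypothetical strengths in [0, 1].
    """
    return boosted if is_oral_communication(movement, models) else base
```

A caller would pass the most recent window of mouth-opening measurements together with the stored models, and feed the returned level to whatever noise-reduction stage the device uses.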

78 Citations

25 Claims

1. A computer-implemented method, comprising:
- capturing video information using a camera of a computing device, the video information showing at least a portion of a mouth area of a user of the computing device;
  
  capturing audio information using a microphone of the computing device, the audio information including voice data generated by the user and an amount of noise;
  
  processing the video information to determine a movement of the portion of the mouth area of the user;
  
  applying noise reduction to the audio information to generate modified audio information that corresponds to a reduction of at least a portion of the noise;
  
  transmitting, over a communication network, the modified audio information;
  
  determining that the movement of the portion of the mouth area does not correspond to user speech; and
  
  causing at least one of capturing the audio information, applying the noise reduction, or transmitting the modified audio information to cease being performed for at least a period of time.

2. The computer-implemented method of claim 1, wherein determining that the movement of the portion of the mouth area does not correspond to the user speech includes:
- comparing the video information to one or more mouth area movement models that are characteristic of speech.

3. The computer-implemented method of claim 1, wherein the portion of the mouth area includes at least one of a mouth, a lip, a tongue, a chin, a jaw, a tooth, or facial hair of the user.
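
For illustration only (not part of the claims), the final step of claim 1, ceasing an operation "for at least a period of time" once the mouth movement is found not to correspond to speech, could be modeled as a simple hold-off gate. This Python sketch assumes the speech/non-speech decision is made elsewhere and passed in as a boolean; the class name `TransmitGate` and the 0.5-second hold are hypothetical.

```python
class TransmitGate:
    """Pauses transmission for `hold_seconds` after non-speech mouth movement."""

    def __init__(self, hold_seconds=0.5):
        self.hold_seconds = hold_seconds
        self.paused_until = 0.0  # timestamp before which transmission stays off

    def update(self, now, movement_is_speech):
        """Return True if audio captured at time `now` should be transmitted.

        `now` is a timestamp in seconds; `movement_is_speech` is the result
        of the (separate) mouth-movement analysis for the current frame.
        """
        if not movement_is_speech:
            # Each non-speech frame restarts the hold-off period.
            self.paused_until = now + self.hold_seconds
        return now >= self.paused_until
```

The same gate could guard audio capture or noise reduction instead of transmission, since the claim recites ceasing at least one of the three.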

4. A computer-implemented method, comprising:
- receiving image information showing at least a portion of a face of a user of a computing device;
  
  receiving audio information corresponding to the image information;
  
  processing the image information to determine a movement of the portion of the face of the user;
  
  applying noise reduction to the audio information to generate modified audio information;
  
  determining that the movement of the portion of the face of the user does not correspond to communication; and
  
  causing at least one of receiving the audio information or applying the noise reduction to cease being performed for at least a period of time.

5. The computer-implemented method of claim 4, wherein the portion of the face of the user comprises a portion of a mouth area of the user.

6. The computer-implemented method of claim 4, further comprising:
- transmitting, over a communication network, the modified audio information.

7. The computer-implemented method of claim 4, wherein the image information is received from at least one image capture component of the computing device and the audio information is received from at least one audio capture component of the computing device.

8. The computer-implemented method of claim 5, further comprising:
- reducing an audio level of the audio information based on determining that the movement of the portion of the mouth area does not correspond to oral communication.

9. The computer-implemented method of claim 5, wherein determining that the movement of the portion of the mouth area does not correspond to oral communication includes:
- comparing the image information to one or more mouth area movement models that are characteristic of oral communication.

10. The computer-implemented method of claim 9, wherein the one or more mouth area movement models are generated based at least in part upon one or more historical movements of the user.

11. The computer-implemented method of claim 4, wherein the noise reduction is automatically adjusted based at least in part on quality of the audio information relative to quality of noise in the audio information.

12. The computer-implemented method of claim 5, further comprising:
- performing voice transcription based at least in part upon comparing the movement of the portion of the mouth area to one or more mouth area movement models that are each characteristic of one or more phonemes.

13. The computer-implemented method of claim 5, wherein a beginning of the movement of the portion of the mouth area of the user correlates to a beginning of sound from a mouth of the user.

14. The computer-implemented method of claim 4, wherein the oral communication is associated with at least one of a phone call, a video chat, a voice message, speech recognition, voice transcription, or voice dictation.

15. The computer-implemented method of claim 4, wherein the oral communication utilizes at least one of a headset mode, a handset mode, a speaker-phone mode, or a hands-free mode.

16. The computer-implemented method of claim 4, wherein processing the video information and applying the noise reduction are performed via cloud computing.
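
For illustration only (not part of the claims), one possible reading of claim 11 is that the noise-reduction strength tracks a signal-to-noise ratio. The Python sketch below assumes that frames captured while the mouth is moving approximate "voice plus noise" and frames captured while it is still approximate "noise only"; that interpretation, the function names, and the 0 dB and 30 dB limits are all hypothetical.

```python
from math import log10


def mean_power(samples):
    """Average squared amplitude of a non-empty frame of audio samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech_frame, noise_frame):
    """Signal-to-noise ratio in decibels between two frames."""
    signal = mean_power(speech_frame)
    noise = mean_power(noise_frame)
    if noise == 0:
        return float("inf")   # no measurable noise
    if signal == 0:
        return float("-inf")  # no measurable signal
    return 10 * log10(signal / noise)


def reduction_strength(snr, low_db=0.0, high_db=30.0):
    """Map SNR to a strength in [0, 1]: full at or below low_db, off at or above high_db."""
    if snr >= high_db:
        return 0.0
    if snr <= low_db:
        return 1.0
    return (high_db - snr) / (high_db - low_db)
```

A linear mapping is only one option; a real system might smooth the estimate over several frames to avoid audible jumps in the noise-reduction strength.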

17. A computing device, comprising:
- at least one image capture component configured to capture image information;
  
  at least one audio capture component configured to capture audio information;
  
  a processor; and
  
  a memory device including instructions that, upon being executed by the processor, cause the computing device to:
  
  receive image information showing at least a portion of a face of a user of the computing device from the at least one image capture component;
  
  receive audio information corresponding to the image information from the at least one audio capture component;
  
  process the image information to determine a movement of the portion of the face of the user;
  
  apply noise reduction to the audio information to generate modified audio information;
  
  determine that the movement of the portion of the face of the user does not correspond to oral communication; and
  
  cause at least one of ceasing to receive the audio information or ceasing to apply the noise reduction for at least a period of time.

18. The computing device of claim 17, wherein the portion of the face of the user comprises a portion of a mouth area of the user.

19. The computing device of claim 18, wherein the instructions that cause the computing device to determine that the movement of the portion of the mouth area does not correspond to oral communication include causing the computing device to:
- compare the image information to one or more mouth area movement models that are characteristic of oral communication.

20. The computing device of claim 19, further comprising:
- a model library configured to store the one or more mouth area movement models that are characteristic of oral communication.

21. The computing device of claim 17, further comprising:
- a speaker configured to play audio outputted by the computing device, wherein the audio outputted by the computing device via the speaker contributes to noise in the audio information received from the at least one audio capture component.
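
For illustration only (not part of the claims), the "model library" of claim 20 could be as simple as a named collection of reference sequences. The Python sketch below also shows one way a model might be built from a user's historical movements, as recited in claim 10, by averaging past recordings. The class `ModelLibrary`, its methods, the averaging step, and the mean-absolute-difference distance are all assumptions.

```python
class ModelLibrary:
    """Stores mouth-movement models as named sequences of measurements."""

    def __init__(self):
        self._models = {}

    def add(self, name, sequence):
        """Store a ready-made model under `name`."""
        self._models[name] = list(sequence)

    def add_from_history(self, name, recordings):
        """Average equal-length past recordings into one model (assumed method)."""
        if not recordings:
            raise ValueError("need at least one recording")
        length = len(recordings[0])
        if any(len(r) != length for r in recordings):
            raise ValueError("recordings must share one length")
        self._models[name] = [sum(col) / len(recordings) for col in zip(*recordings)]

    def best_match(self, movement):
        """Return (name, distance) of the closest same-length model, or None."""
        best = None
        for name, model in self._models.items():
            if len(model) != len(movement):
                continue  # only compare sequences of equal length
            dist = sum(abs(a - b) for a, b in zip(movement, model)) / len(model)
            if best is None or dist < best[1]:
                best = (name, dist)
        return best
```

A caller would compare the returned distance against a tolerance of its own choosing to decide whether the movement corresponds to oral communication.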

22. A non-transitory computer-readable storage medium including instructions that, upon being executed by a processor of a computing device, cause the computing device to:
- receive image information showing at least a portion of a face of a user of the computing device;
  
  receive audio information corresponding to the image information;
  
  process the image information to determine a movement of the portion of the face of the user;
  
  apply noise reduction to the audio information to generate modified audio information;
  
  determine that the movement of the portion of the face of the user does not correspond to oral communication; and
  
  cause at least one of ceasing to receive the audio information or ceasing to apply the noise reduction for at least a period of time.

23. The non-transitory computer-readable storage medium of claim 22, wherein the portion of the face of the user comprises a portion of a mouth area of the user.

24. The non-transitory computer-readable storage medium of claim 23, wherein the instructions that cause the computing device to determine that the movement of the portion of the mouth area does not correspond to oral communication include causing the computing device to:
- compare the image information to one or more mouth area movement models that are characteristic of oral communication.

25. The non-transitory computer-readable storage medium of claim 23, wherein the noise reduction is automatically adjusted based at least in part on quality of the audio information relative to quality of noise in the audio information.

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Watanabe, Yuzo; Noble, Isaac S.; Cassidy, Ryan H.
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Kovacek, David
Application Number: US13/534,388
Time in Patent Office: 1,329 Days
Field of Search: 704/200-201, 704/226-245, 704/251-255, 704/260-261, 704/266, 704/270-271, 704/276, 704/E15.001-E15.05, 704/E21.001-E21.02, 704/E11.001-E11.007
US Class Current: 1/1
CPC Class Codes

G06F 18/00   Pattern recognition

G06F 18/22   Matching criteria, e.g. pro...

G06V 10/75   Organisation of the matchin...

G06V 40/171   Local features and componen...

G06V 40/176   Dynamic expression

G10L 15/24   Speech recognition using no...

G10L 15/25   using position of the lips,...

G10L 21/0208   Noise filtering
