Low bit rate audio-visual communication system having integrated perceptual speech and video coding

US 5,550,580 A
Filed: 05/31/1995
Issued: 08/27/1996
Est. Priority Date: 04/06/1994
Status: Expired due to Fees

3 Assignments

0 Petitions

Abstract

Disclosed is a low bit rate audio and video communication system which employs an integrated encoding system that dynamically allocates available bits among the audio and video signals to be encoded based on the content of the audio and video information and the manner in which the audio and video information will be perceived by a viewer. A dynamic bit allocation and encoding process will evaluate the current content of the audio and video information and allocate the available bits among the audio and video signals to be encoded. In addition, an appropriate audio encoding technique is dynamically selected based on the current content of the audio signal. A face location detection subroutine will detect and model the location of faces in each video frame, so that the facial regions may be encoded more accurately than other portions of the video frame. A lip motion detection subroutine will detect the location and movement of the lips of a person present in a video scene, in order to determine when a person is speaking and to encode the lip regions more accurately. The audio and video signals generated by a second party to a communication are monitored to determine if the second party is paying attention to the audio and video information transmitted by the first party to the communication.
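The abstract names a lip motion detection subroutine but does not spell out its algorithm here. As a minimal sketch, assuming the lip region has already been located (for example by the face location subroutine) and that frames are small grayscale arrays, lip motion can be flagged by frame differencing inside that region. The function names, the `(top, left, bottom, right)` box convention and the threshold of 8.0 are hypothetical illustrations, not the patent's method.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]          # grayscale frame as rows of 0..255 pixel values
Box = Tuple[int, int, int, int]  # hypothetical (top, left, bottom, right); bottom/right exclusive


def mean_abs_diff(prev: Frame, curr: Frame, box: Box) -> float:
    """Mean absolute pixel difference between two frames inside `box`."""
    top, left, bottom, right = box
    total = 0
    count = 0
    for r in range(top, bottom):
        for c in range(left, right):
            total += abs(curr[r][c] - prev[r][c])
            count += 1
    return total / count if count else 0.0


def lips_moving(frames: Sequence[Frame], lip_box: Box, threshold: float = 8.0) -> bool:
    """Flag lip motion if any consecutive pair of frames differs enough in the lip box.

    The threshold is a hypothetical value; motion outside `lip_box` is ignored.
    """
    return any(
        mean_abs_diff(a, b, lip_box) > threshold
        for a, b in zip(frames, frames[1:])
    )
```

Restricting the comparison to the lip box is what lets head or background motion elsewhere in the frame be ignored; a practical detector would also need to track the box as the face moves.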

9 Claims

1. A method of selecting an audio encoding technique for encoding an audio signal by an audio-visual communication system that contains audio and video information, said video information including a view of a facial region of at least one person, said facial region including lips, said method comprising the steps of:
- analyzing said audio signal to determine if there is audio activity;
- analyzing said video signal to determine if said lips of said person in said view are moving; and
- encoding said audio signal with a speech specific audio encoding technique if said analyzing steps determine that said lips are moving while there is audio activity and encoding said audio signal with a non-speech specific audio encoding technique if said analyzing steps determine that said lips are not moving while there is audio activity.

6. The method of claim 1, wherein the step of analyzing said audio signals further comprises determining if the amount of audio energy exceeds a predetermined threshold.

7. The method of claim 1, wherein a predetermined limited number of bits are available for encoding both the audio and video information, and further comprising the step of dynamically allocating relatively few of the predetermined limited number of bits for encoding the audio signal when said analyzing steps determine that audio activity is not correlated with lip movement.

8. The method of claim 1, wherein a predetermined limited number of bits are available for encoding both the audio and video information, and further comprising the step of dynamically allocating a higher number of the predetermined limited number of bits for encoding the audio signal when said analyzing steps determine that audio activity is correlated with lip movement.

9. The method of claim 1 further comprising the step of dynamically allocating an appropriate number of bits for audio and video information encoding.
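Claims 1 and 6 through 8 describe a decision rule: detect audio activity (claim 6 uses an energy threshold), check lip movement, select a speech-specific coder only when both coincide, and give audio a larger share of a fixed bit budget in that case. A minimal Python sketch of that rule follows. The enum, the mean-square energy measure, the 1e-3 threshold and the percentage splits are all hypothetical choices, and the lip-motion flag is assumed to come from a separate video analysis step.

```python
from enum import Enum
from typing import Sequence, Tuple


class AudioCoder(Enum):
    SPEECH = "speech-specific"
    NON_SPEECH = "non-speech-specific"
    NONE = "no audio activity"  # case not covered by claim 1


# Hypothetical share of the bit budget (in percent) given to audio for each decision.
AUDIO_SHARE_PERCENT = {
    AudioCoder.SPEECH: 50,      # claim 8: more bits when audio correlates with lip movement
    AudioCoder.NON_SPEECH: 20,  # claim 7: relatively few bits when it does not
    AudioCoder.NONE: 5,         # hypothetical; claims 7 and 8 do not address this case
}


def frame_energy(samples: Sequence[float]) -> float:
    """Mean-square energy of one audio frame (an assumed energy measure)."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def select_audio_coder(samples: Sequence[float], lips_moving: bool,
                       energy_threshold: float = 1e-3) -> AudioCoder:
    """Claim 1 decision, using claim 6's energy threshold as the activity test."""
    if frame_energy(samples) <= energy_threshold:
        return AudioCoder.NONE
    return AudioCoder.SPEECH if lips_moving else AudioCoder.NON_SPEECH


def split_bits(total_bits: int, coder: AudioCoder) -> Tuple[int, int]:
    """Split a fixed budget into (audio_bits, video_bits) with integer arithmetic."""
    audio_bits = total_bits * AUDIO_SHARE_PERCENT[coder] // 100
    return audio_bits, total_bits - audio_bits
```

For example, `split_bits(1000, select_audio_coder([0.5, -0.5, 0.4], lips_moving=True))` returns `(500, 500)`, while the same audio with `lips_moving=False` returns `(200, 800)`, leaving more of the budget for video.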

2. An apparatus for selecting an audio encoding technique for encoding an audio signal by an audio-visual communication system that contains audio and video information, said video information including a view of a facial region of at least one person, said facial region including lips, said apparatus comprising the steps of:
- means for analyzing said audio signal to determine if there is audio activity;
- means for analyzing said video signal to determine if said lips of said person in said view are moving;
- a speech specific audio encoder for encoding said audio signal if said means for analyzing said audio and video signals determine that said lips are moving while there is audio activity; and
- a non-speech specific audio encoder for encoding said audio signal if said means for analyzing said audio and video signals determine that said lips are not moving while there is audio activity.

3. A method of selecting an audio encoding technique for encoding an audio signal by an audio-visual communication system that contains audio and video information, said video information including a view of a facial region of at least one person, said facial region including lips, said method comprising the steps of:
- analyzing said audio signal to determine if there is audio activity;
- analyzing said video signal to determine if said lips of said person in said view are moving;
- encoding said audio signal with a speech specific audio encoding technique if said analyzing steps determine that said lips are moving while there is audio activity and encoding said audio signal with a non-speech specific audio encoding technique if said analyzing steps determine that said lips are not moving while there is audio activity; and
- encoding said audio signal with a comfort noise encoding technique if said step of analyzing said audio signal determines that there is no audio activity.

4. The method of claim 3 wherein a bit rate of about 1 kilobits per second is used for the comfort noise encoding technique.
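Claims 3 and 4 add a third branch: when the audio analysis finds no activity, a comfort noise coder runs at about 1 kbit/s regardless of lip movement. The sketch below is a minimal illustration under stated assumptions: it extends the selection to three modes and converts each mode's bit rate into a per-frame budget. Only the roughly 1 kbit/s comfort-noise figure comes from claim 4; the 8 and 16 kbit/s rates and the 20 ms frame length are hypothetical.

```python
from enum import Enum


class AudioMode(Enum):
    SPEECH = "speech-specific"
    NON_SPEECH = "non-speech-specific"
    COMFORT_NOISE = "comfort noise"


# Bit rates in bits per second. Only COMFORT_NOISE (about 1 kbit/s) is from claim 4;
# the other two values are hypothetical placeholders.
RATE_BPS = {
    AudioMode.SPEECH: 8_000,
    AudioMode.NON_SPEECH: 16_000,
    AudioMode.COMFORT_NOISE: 1_000,
}


def select_mode(audio_active: bool, lips_moving: bool) -> AudioMode:
    """Claim 3: comfort noise without audio activity, else speech vs. non-speech by lip motion."""
    if not audio_active:
        return AudioMode.COMFORT_NOISE
    return AudioMode.SPEECH if lips_moving else AudioMode.NON_SPEECH


def bits_per_frame(mode: AudioMode, frame_ms: int = 20) -> int:
    """Bits available for one frame: rate (bit/s) * duration (ms) / 1000."""
    return RATE_BPS[mode] * frame_ms // 1000
```

Under these assumed rates, a silent 20 ms frame costs `bits_per_frame(select_mode(False, True))` = 20 bits, compared with 160 bits for a speech frame, so silent intervals free most of the audio budget.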

5. An apparatus for selecting an audio encoding technique for encoding an audio signal by an audio-visual communication system that contains audio and video information, said video information including a view of a facial region of at least one person, said facial region including lips, said apparatus comprising the steps of:
- means for analyzing said audio signal to determine if there is audio activity;
- means for analyzing said video signal to determine if said lips of said person in said view are moving;
- a speech specific audio encoder for encoding said audio signal if said means for analyzing said audio and video signals determine that said lips are moving while there is audio activity;
- a non-speech specific audio encoder for encoding said audio signal if said means for analyzing said audio and video signals determine that said lips are not moving while there is audio activity; and
- a comfort noise audio encoder for encoding said audio signal if said means for analyzing said audio signal determines that there is no audio activity.

Current Assignee: Lucent Technologies, Inc. (Nokia Corporation)
Original Assignee: Lucent Technologies, Inc. (Nokia Corporation)
Inventors: Zhou, Yong
Primary Examiner(s): Kostak, Victor R.
Application Number: US08/455,378
Time in Patent Office: 454 Days
Field of Search: 348/15, 348/17-20, 348/462, 348/484, 348/152, 348/161, 382/115, 382/117, 382/118, 381/36-42, 381/34, 381/29
US Class Current: 348/14.1
CPC Class Codes:
- G06V 40/161 Detection; Localisation; No...
- G10L 19/012 Comfort noise or silence co...
- H04N 19/10 using adaptive coding
- H04N 19/115 Selection of the code volum...
- H04N 19/132 Sampling, masking or trunca...
- H04N 19/137 Motion inside a coding unit...
- H04N 19/14 Coding unit complexity, e.g...
- H04N 19/146 Data rate or code amount at...
- H04N 19/149 by estimating the code amou...
- H04N 19/15 by monitoring actual compre...
- H04N 19/164 Feedback from the receiver ...
- H04N 19/17 the unit being an image reg...
- H04N 19/176 the region being a block, e...
- H04N 19/20 using video object coding
- H04N 19/61 in combination with predict...
- H04N 19/62 by frequency transforming i...
- H04N 19/63 using sub-band based transf...
