Low bit rate audio-visual communication system having integrated perceptual speech and video coding
Abstract
Disclosed is a low bit rate audio and video communication system which employs an integrated encoding system that dynamically allocates available bits among the audio and video signals to be encoded based on the content of the audio and video information and the manner in which the audio and video information will be perceived by a viewer. A dynamic bit allocation and encoding process will evaluate the current content of the audio and video information and allocate the available bits among the audio and video signals to be encoded. In addition, an appropriate audio encoding technique is dynamically selected based on the current content of the audio signal. A face location detection subroutine will detect and model the location of faces in each video frame, in order that the facial regions may be more accurately encoded than other portions of the video frame. A lip motion detection subroutine will detect the location and movement of the lips of a person present in a video scene, in order to determine when a person is speaking and to encode the lip regions more accurately. The audio and video signals generated by a second party to a communication are monitored to determine if the second party is paying attention to the audio and video information transmitted by the first party to the communication.
9 Claims
1. A method of selecting an audio encoding technique for encoding an audio signal by an audio-visual communication system that contains audio and video information, said video information including a view of a facial region of at least one person, said facial region including lips, said method comprising the steps of:
analyzing said audio signal to determine if there is audio activity;
analyzing said video signal to determine if said lips of said person in said view are moving; and
encoding said audio signal with a speech specific audio encoding technique if said analyzing steps determine that said lips are moving while there is audio activity and encoding said audio signal with a non-speech specific audio encoding technique if said analyzing steps determine that said lips are not moving while there is audio activity.
Dependent claims: 6, 7, 8, 9
2. An apparatus for selecting an audio encoding technique for encoding an audio signal by an audio-visual communication system that contains audio and video information, said video information including a view of a facial region of at least one person, said facial region including lips, said apparatus comprising the steps of:
means for analyzing said audio signal to determine if there is audio activity;
means for analyzing said video signal to determine if said lips of said person in said view are moving;
a speech specific audio encoder for encoding said audio signal if said means for analyzing said audio and video signals determine that said lips are moving while there is audio activity; and
a non-speech specific audio encoder for encoding said audio signal if said means for analyzing said audio and video signals determine that said lips are not moving while there is audio activity.
3. A method of selecting an audio encoding technique for encoding an audio signal by an audio-visual communication system that contains audio and video information, said video information including a view of a facial region of at least one person, said facial region including lips, said method comprising the steps of:
analyzing said audio signal to determine if there is audio activity;
analyzing said video signal to determine if said lips of said person in said view are moving;
encoding said audio signal with a speech specific audio encoding technique if said analyzing steps determine that said lips are moving while there is audio activity and encoding said audio signal with a non-speech specific audio encoding technique if said analyzing steps determine that said lips are not moving while there is audio activity; and
encoding said audio signal with a comfort noise encoding technique if said step of analyzing said audio signal determines that there is no audio activity.
Dependent claims: 4
5. An apparatus for selecting an audio encoding technique for encoding an audio signal by an audio-visual communication system that contains audio and video information, said video information including a view of a facial region of at least one person, said facial region including lips, said apparatus comprising the steps of:
means for analyzing said audio signal to determine if there is audio activity;
means for analyzing said video signal to determine if said lips of said person in said view are moving;
a speech specific audio encoder for encoding said audio signal if said means for analyzing said audio and video signals determine that said lips are moving while there is audio activity;
a non-speech specific audio encoder for encoding said audio signal if said means for analyzing said audio and video signals determine that said lips are not moving while there is audio activity; and
a comfort noise audio encoder for encoding said audio signal if said means for analyzing said audio signal determines that there is no audio activity.
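For illustration only, the three-way encoder selection recited in claims 3 and 5 can be sketched as a small decision function. This is a minimal sketch, not the patented implementation: the names AudioEncoder and select_audio_encoder are hypothetical, and the two boolean inputs stand in for the audio-activity and lip-motion analyses, whose internals are not described in the claims.

```python
from enum import Enum


class AudioEncoder(Enum):
    """Hypothetical labels for the three encoding techniques named in the claims."""
    SPEECH_SPECIFIC = "speech-specific"
    NON_SPEECH_SPECIFIC = "non-speech-specific"
    COMFORT_NOISE = "comfort-noise"


def select_audio_encoder(audio_active: bool, lips_moving: bool) -> AudioEncoder:
    """Pick an encoder from the results of the audio and video analyses.

    audio_active: assumed output of the audio-activity analysis.
    lips_moving:  assumed output of the lip-motion analysis.
    """
    if not audio_active:
        # No audio activity: comfort noise, regardless of lip motion.
        return AudioEncoder.COMFORT_NOISE
    if lips_moving:
        # Audio activity while the lips move: treat the audio as speech.
        return AudioEncoder.SPEECH_SPECIFIC
    # Audio activity without lip motion: treat the audio as non-speech.
    return AudioEncoder.NON_SPEECH_SPECIFIC


if __name__ == "__main__":
    # Enumerate all four input combinations.
    for audio_active in (False, True):
        for lips_moving in (False, True):
            choice = select_audio_encoder(audio_active, lips_moving)
            print(audio_active, lips_moving, choice.value)
```

Claims 1 and 2 cover only the two branches in which audio activity is present; the comfort noise branch corresponds to the additional element in claims 3 and 5.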
Specification