Voice responsive image tracking system
Abstract
A camera tracking system that continuously tracks sound emitting objects is provided. A sound activation feature of the system enables a video camera to track speakers in a manner similar to the natural transition that occurs when people turn their eyes toward different sounds. The invented system is well suited for video-phone applications. The invented tracking system comprises a video camera for transmitting an image from its remote location, a screen for receiving images, and microphones for directing the camera. The camera may be coupled to the microphones via an interface for processing information transmitted from the microphones for directing the camera. The system may utilize the translucent properties of LCD screens by disposing a video camera behind such a screen and enabling persons at each remote location to look directly into the screen and at the camera. The interface enables intelligent framing of a speaker without mechanically repositioning the camera. The microphones are positioned using triangulation techniques. Characteristics of audio signals are processed by the interface for determining movement of the speaker for directing the camera. As the characteristics sensed by the microphones change, the interface directs the camera toward the speaker. The interface continuously directs the camera, until the change in the characteristics stabilizes, thus precisely directing the camera toward the speaker.
41 Claims
-
1. A camera tracking system that tracks sound emitting objects, the system comprising:
image reception means for generating video signals representative of an image;
audio sensing means for generating audio signals representative of speech and other sounds; and
interface means coupled to the audio sensing means and to the image reception means, the interface means sensing characteristics of the audio signals generated by the audio sensing means for automatically, digitally cropping and scaling the image to allow display of a framed video image including a sound emitting object.
Dependent claims: 2, 3, 4, 5
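The digital cropping and scaling recited in claim 1 can be illustrated with a minimal sketch. Nothing here is taken from the patent: the function names `crop_box` and `crop_and_scale`, the list-of-rows image format, and the nearest-neighbour resampling are assumptions chosen for illustration, and the source position `(cx, cy)` is assumed to have already been derived from the audio signals.

```python
def crop_box(cx, cy, crop_w, crop_h, img_w, img_h):
    """Return (left, top) of a crop_w x crop_h window centred on (cx, cy).

    The window is shifted as needed so that it stays inside the
    img_w x img_h image. Assumes crop_w <= img_w and crop_h <= img_h.
    """
    left = min(max(cx - crop_w // 2, 0), img_w - crop_w)
    top = min(max(cy - crop_h // 2, 0), img_h - crop_h)
    return left, top


def crop_and_scale(image, cx, cy, crop_w, crop_h, out_w, out_h):
    """Crop a window around (cx, cy) and resize it to out_w x out_h.

    `image` is a list of rows, each row a list of pixel values
    (a hypothetical format). Resizing uses nearest-neighbour sampling.
    """
    img_h, img_w = len(image), len(image[0])
    left, top = crop_box(cx, cy, crop_w, crop_h, img_w, img_h)
    return [
        [image[top + (y * crop_h) // out_h][left + (x * crop_w) // out_w]
         for x in range(out_w)]
        for y in range(out_h)
    ]
```

Clamping the window rather than letting it run past the image edge keeps the framed output a constant size even when the sound source is near the border of the captured image.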
-
-
6. A camera tracking system that tracks sound emitting objects, the system comprising:
a plurality of audio sensing means for generating audio signals representative of speech and other sounds, each of the audio sensing means generating an audio indicating signal indicative of sound sensed thereby and emitted by a sound emitting object;
image reception means for generating video signals representative of an image; and
interface means coupled to the plurality of audio sensing means and to the image reception means, the interface means continuously sensing the indicating signals generated by the audio sensing means for determining any change in the indicating signals for continuously determining a mobile location of a sound emitting object and directing the image reception means toward the sound emitting object, the interface means creating a framed video image including the sound emitting object.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
the interface means further comprising scaling and cropping means for selecting a portion of the image containing the sound emitting object and for framing the image in the selected portion creating the framed video image of the sound emitting object.
13. The system of claim 12 wherein the image reception means comprises a desired one of a video camera and a CCD camera.
-
14. The system of claim 13 further comprising:
at least one additional system located at at least one additional remote location, and signal transmission means for transmitting the audio indicative and framed video image signals to the remote locations and receiving audio and framed video signals from the remote locations for enabling communication between the locations.
-
-
15. The system of claim 6 wherein the interface means further includes means for sensing and differentiating tone and for tracking a sound emitting object having a selected tone.
-
16. The system of claim 6 wherein the interface means further includes filter means for sensing and tracking a particular sound emitting object in the presence of ambient noise and for distinguishing a speaker's voice in the presence of the voices of others.
-
17. The system of claim 6 wherein the interface means comprises a computing means.
-
18. The camera tracking system of claim 6 further comprising an image display means for displaying an image.
-
19. The camera tracking system of claim 6 wherein said image reception means is stationary.
-
-
20. A camera tracking system that tracks sound emitting objects, the system comprising:
image display means for displaying an image;
a plurality of audio sensing means for generating audio signals representative of speech and other sounds, each of the audio sensing means generating an audio indicating signal indicative of sound sensed thereby and emitted by a sound emitting object;
image reception means for generating video signals representative of an image, the image reception means being a desired one of a video camera and a CCD camera and capturing a wide field of view that includes an image of the sound emitting object; and
interface means coupled to the plurality of audio sensing means and to the image reception means, the interface means continuously sensing the indicating signals generated by the audio sensing means for determining any change in the indicating signals for continuously directing the image reception means toward a sound emitting object, wherein when the interface means determines that the indicating signal generated by at least one of the audio sensing means has changed, the interface means redirects the image reception means until the change in the indicating signals stabilizes indicating that the image reception means is directed toward the sound emitting object, the interface means having scaling and cropping means for selecting a portion of the field of view containing the image of the sound emitting means and for framing the image in the selected portion to transmit video signals representative of the image, wherein the camera is disposed behind the image display means and retained in a housing thereof, the image display means comprising a screen configured to enable the camera to capture the image of a user while enabling the user to look directly at the screen and gaze directly into the camera, wherein at least one user at each remote location may look directly into the screen, with the camera capturing their image so that users at each remote location are looking directly at the other users at the other remote locations.
Dependent claims: 21, 22
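The "redirects ... until the change in the indicating signals stabilizes" behaviour in claim 20 resembles a simple feedback loop. The sketch below is one possible reading, not the patented implementation: `measure_offset` is a hypothetical callback that returns the signed offset of the sound source from the current framing direction, and `gain`, `tolerance`, and `max_steps` are illustrative parameters.

```python
def redirect_until_stable(measure_offset, tolerance=1.0, gain=0.5, max_steps=100):
    """Adjust a framing direction until the measured offset settles.

    measure_offset(centre) is a hypothetical callback returning how far the
    sound source lies from the current direction `centre` (signed).
    Returns (final_centre, number_of_adjustments_made).
    """
    centre = 0.0
    for step in range(max_steps):
        offset = measure_offset(centre)
        if abs(offset) < tolerance:
            # The signal has stabilized: the source is (nearly) centred.
            return centre, step
        centre += gain * offset  # move part of the way toward the source
    return centre, max_steps


# Usage: a stationary source at position 40.0 in arbitrary units.
final_centre, steps = redirect_until_stable(lambda c: 40.0 - c)
```

A gain below 1 moves the frame only part of the way on each measurement, which trades speed for smoother transitions when the audio estimate is noisy.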
-
-
23. A camera tracking system that tracks sound emitting objects, the system comprising:
image display means for displaying an image;
a plurality of audio sensing means for generating audio signals representative of speech and other sounds, each of the audio sensing means generating an audio indicating signal indicative of sound sensed thereby and emitted by a sound emitting object;
image reception means for generating video signals representative of an image;
interface means coupled to the plurality of audio sensing means and to the image reception means, the interface means continuously sensing the indicating signals generated by the audio sensing means for determining any change in the indicating signals for continuously directing the image reception means toward a sound emitting object, wherein when the interface means determines that the indicating signal generated by at least one of the audio sensing means has changed, the interface means redirects the image reception means until the change in the indicating signals stabilizes indicating that the image reception means is directed toward the sound emitting object; and
a digital microphone worn on a deaf user's person for indicating to the user any change in the location of the sound emitting object, the microphone including means for delivering a tapping sensation to the user for indicating to the user the direction of the sound emitting object, the amplitude of the tapping indicating the amplitude of the sound sensed by the digital microphone.
-
-
24. A method for digitally framing an image by tracking sound emitting objects, the method comprising the steps of:
placing at least two audio sensing means at known positions relative to an image reception means;
sensing sound waves emitted by a sound emitting object, the sound waves representative of speech and other sounds;
generating audio signals representative of the sound waves sensed;
processing the audio signals using triangulation techniques to determine the position of the sound emitting object;
capturing a wide field-of-view image including the sound emitting object;
generating a framed video image by automatically digitally scaling and cropping the wide field-of-view image;
continuing to process the audio signal to continuously determine the location of the sound emitting object.
Dependent claims: 25, 26, 27, 28, 29, 30
transmitting the audio signals and the framed video image to a remote location; and
receiving the audio signals and the framed video image at the remote location.
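The position-finding steps of the method can be sketched for the simplest case of two microphones. This is a simplification rather than full triangulation, and none of it comes from the patent text: it assumes the processed characteristic is the difference in arrival time between the microphones, a distant (far-field) source, and a camera midway between the microphones facing straight ahead. The names `bearing_from_delay` and `bearing_to_column` are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second in air at about 20 °C


def bearing_from_delay(delay_s, mic_spacing_m):
    """Far-field bearing of a source, in radians, from a two-microphone delay.

    delay_s is the arrival-time difference between the microphones;
    0 means straight ahead, and the sign of the result follows delay_s.
    """
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against measurement noise
    return math.asin(ratio)


def bearing_to_column(bearing_rad, image_width, horizontal_fov_rad):
    """Map a bearing to a pixel column of the wide field-of-view image.

    Assumes an ideal pinhole camera with the given horizontal field of view.
    """
    focal_px = (image_width / 2) / math.tan(horizontal_fov_rad / 2)
    column = image_width / 2 + focal_px * math.tan(bearing_rad)
    return int(round(min(max(column, 0), image_width - 1)))
```

The resulting column can serve as the centre of the crop window when the wide field-of-view image is scaled and cropped.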
-
-
29. The method of claim 24 further comprising the step of maintaining the sound emitting object within the framed image.
-
30. The method of claim 29 wherein maintaining the sound emitting object within the framed image is accomplished by rescaling and recropping the wide field-of-view image when the sound emitting object moves.
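Keeping a moving source inside the frame, as in claims 29 and 30, can be pictured as shifting the crop window only when the source approaches its edge. The sketch below covers horizontal recropping only (not rescaling) and is an illustration under assumed names and parameters: `reframe`, `margin`, and the pixel-column representation of the source are all hypothetical.

```python
def reframe(left, crop_w, source_x, img_w, margin=20):
    """Return a new left edge for the crop window [left, left + crop_w).

    The window is moved only if source_x comes within `margin` pixels of
    either edge, and it is clamped to stay inside an image img_w wide.
    """
    if source_x < left + margin:
        left = source_x - margin
    elif source_x > left + crop_w - margin:
        left = source_x + margin - crop_w
    return min(max(left, 0), img_w - crop_w)
```

Leaving the window untouched while the source stays well inside it avoids jitter from small position changes.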
-
31. A sound tracking system for automatically reframing a video image, comprising:
a camera having a fixed position and generating a video image,
at least two microphones having known relative positions with respect to the camera,
an interface means for processing input from the at least two microphones to utilize triangulation to determine the position of an audio source and for creating a framed video image of the audio source from the video image generated by the camera, and
image processing means for cropping and scaling the video image generated by the camera to create the framed video image.
Dependent claims: 32, 33, 34, 35, 36, 37, 38, 39, 40, 41
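Triangulation in its textbook form locates a point by intersecting two bearing lines taken from known positions. The sketch below shows that geometry only. It assumes, beyond anything stated in the claim, that two bearing estimates toward the audio source are already available from two known sensing positions; the function name `triangulate` and the coordinate convention are hypothetical.

```python
import math


def triangulate(p1, theta1, p2, theta2):
    """Intersect two bearing lines and return the point (x, y).

    p1, p2 are (x, y) sensing positions; theta1, theta2 are bearings in
    radians, measured counter-clockwise from the +x axis.
    Returns None when the bearings are parallel (no unique intersection).
    """
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # 2-D cross product d1 x d2
    if abs(denom) < 1e-12:
        return None
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t = (dx * d2[1] - dy * d2[0]) / denom  # distance along the first line
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])
```

For example, sensing positions at (0, 0) and (2, 0) with bearings of 45 and 135 degrees place the source at approximately (1, 1). The function treats the bearings as infinite lines, so a caller may want to reject solutions that fall behind either sensing position.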
-
Specification