Visual speech detection using facial landmarks
2 Assignments
0 Petitions
Abstract
A data processing apparatus for detecting a probability of speech based on video data is disclosed. The data processing apparatus may include at least one processor, and a non-transitory computer-readable storage medium including instructions executable by the at least one processor, where execution of the instructions by the at least one processor causes the data processing apparatus to execute a visual speech detector. The visual speech detector may be configured to receive a coordinate-based signal. The coordinate-based signal may represent movement or lack of movement of at least one facial landmark of a person in a video signal. The visual speech detector may be configured to compute a probability of speech of the person based on the coordinate-based signal.
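The abstract does not say how the coordinate-based signal is derived from landmark positions. The sketch below shows one plausible construction and is only an illustration. It measures the gap between an upper-lip landmark and a lower-lip landmark in each frame, then divides that gap by the distance between the eyes so the value stays the same when the face moves closer to or farther from the camera. The landmark names (`upper_lip`, `lower_lip`, `left_eye`, `right_eye`) and the normalization are assumptions, not part of the patent text.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates of one landmark


def mouth_opening_signal(frames: List[Dict[str, Point]]) -> List[float]:
    """Return one value per video frame describing how far the mouth is open.

    Each frame maps hypothetical landmark names to (x, y) coordinates, as a
    face-landmark tracker might produce. The lip gap is divided by the
    inter-eye distance so the signal is roughly independent of face size.
    """
    signal = []
    for landmarks in frames:
        lip_gap = math.dist(landmarks["upper_lip"], landmarks["lower_lip"])
        eye_span = math.dist(landmarks["left_eye"], landmarks["right_eye"])
        # Guard against degenerate tracking output (both eyes at one point).
        signal.append(lip_gap / eye_span if eye_span > 0 else 0.0)
    return signal


frames = [
    {"upper_lip": (0, 0), "lower_lip": (0, 3), "left_eye": (-5, -10), "right_eye": (5, -10)},
    {"upper_lip": (0, 0), "lower_lip": (0, 0), "left_eye": (-5, -10), "right_eye": (5, -10)},
]
print(mouth_opening_signal(frames))  # [0.3, 0.0]
```

Frame-to-frame changes in this value would then correspond to the "movement or lack of movement" that the abstract refers to.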
20 Claims
1. A data processing apparatus for detecting a probability of speech based on video data, the data processing apparatus comprising:
at least one processor;
a non-transitory computer-readable storage medium including instructions executable by the at least one processor, wherein execution of the instructions by the at least one processor causes the data processing apparatus to execute:
a visual speech detector configured to receive a coordinate-based signal, the coordinate-based signal representing movement or lack of movement of at least one facial landmark of a person in a video signal;
the visual speech detector configured to calculate a short-term value representing short-term characteristics of the coordinate-based signal and a long-term value representing long-term characteristics of the coordinate-based signal, the visual speech detector configured to compute a probability of speech of the person based on a comparison of the short-term value and the long-term value, wherein, when the short-term value is greater than the long-term value, the visual speech detector computes the probability of speech as a value indicating that speech has occurred.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
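Claim 1 does not say how the short-term and long-term values are computed. The minimal sketch below assumes both are exponential moving averages of the frame-to-frame change in the coordinate-based signal: a fast average for the short term and a slow one for the long term. The smoothing constants are illustrative choices. So is the graded output returned when the short-term value does not exceed the long-term value, because the claim only fixes the case where the short-term value is greater.

```python
from typing import List


def speech_probabilities(signal: List[float],
                         fast_alpha: float = 0.5,
                         slow_alpha: float = 0.05) -> List[float]:
    """Per-frame speech probability from short-term vs long-term landmark activity.

    fast_alpha and slow_alpha are hypothetical smoothing factors for the
    short-term and long-term exponential moving averages.
    """
    probabilities = []
    short_term = long_term = 0.0
    previous = signal[0] if signal else 0.0
    for value in signal:
        activity = abs(value - previous)  # movement of the landmark since the last frame
        previous = value
        short_term += fast_alpha * (activity - short_term)
        long_term += slow_alpha * (activity - long_term)
        if short_term > long_term:
            probabilities.append(1.0)  # recent movement exceeds the baseline: speech
        elif long_term > 0:
            probabilities.append(short_term / long_term)  # assumed graded value, at most 1
        else:
            probabilities.append(0.0)  # no movement observed yet
    return probabilities


still = [0.3] * 6               # mouth not moving
talking = [0.5, 0.3, 0.5, 0.3]  # mouth opening and closing
probs = speech_probabilities(still + talking)
print(probs[:6])  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(probs[6])   # 1.0
```

This sketch uses the absolute frame-to-frame change, so a mouth held open but motionless is not counted as speech. That is a design choice of the sketch. The claim only requires that the signal represent movement or lack of movement.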
11. A method for detecting a probability of speech based on video data, the method comprising:
receiving, by at least one processor, a coordinate-based signal, the coordinate-based signal representing movement or lack of movement of at least one facial landmark of a person in a video signal;
building, by the at least one processor, a histogram of the coordinate-based signal;
determining, by the at least one processor, an active range of the person based on the histogram;
validating, by the at least one processor, the coordinate-based signal based on the active range;
calculating, by the at least one processor, a short-term value representing short-term characteristics of the validated coordinate-based signal and a long-term value representing long-term characteristics of the validated coordinate-based signal; and
computing, by the at least one processor, a probability of speech of the person based on a comparison of the short-term value and the long-term value of the validated coordinate-based signal.
Dependent claims: 12, 13, 14.
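Claim 11 adds a histogram and an active range before the comparison. The sketch below is one hypothetical reading of those steps:

- The histogram spans the observed values of the signal.
- The active range is the span of bins that holds roughly the central 90% of samples, so rare outliers such as tracking glitches fall outside it.
- Validation clamps samples to that range.

The bin count, the 90% coverage, and clamping rather than discarding samples are all assumptions. The validated output would then feed a short-term/long-term comparison like the one recited in claim 1.

```python
from typing import List, Tuple


def build_histogram(signal: List[float], num_bins: int = 20) -> Tuple[List[int], float, float]:
    """Count samples in equal-width bins; returns (counts, lower edge, bin width)."""
    low, high = min(signal), max(signal)
    width = (high - low) / num_bins or 1.0  # constant signal: use a dummy width
    counts = [0] * num_bins
    for value in signal:
        index = min(int((value - low) / width), num_bins - 1)
        counts[index] += 1
    return counts, low, width


def active_range(counts: List[int], low: float, width: float,
                 coverage: float = 0.9) -> Tuple[float, float]:
    """Value span of the bins holding approximately the central `coverage` share of samples."""
    tail = (1.0 - coverage) / 2 * sum(counts)  # samples allowed in each tail
    cumulative, first = 0, 0
    for i, count in enumerate(counts):
        cumulative += count
        if cumulative > tail:
            first = i
            break
    cumulative, last = 0, len(counts) - 1
    for i in range(len(counts) - 1, -1, -1):
        cumulative += counts[i]
        if cumulative > tail:
            last = i
            break
    return low + first * width, low + (last + 1) * width


def validate(signal: List[float], bounds: Tuple[float, float]) -> List[float]:
    """Clamp every sample into the active range (an assumed validation rule)."""
    lower, upper = bounds
    return [min(max(value, lower), upper) for value in signal]


signal = [0.2] * 50 + [0.4] * 50 + [5.0]  # one tracking glitch at 5.0
counts, low, width = build_histogram(signal)
bounds = active_range(counts, low, width)
validated = validate(signal, bounds)
print(round(validated[-1], 2))          # 0.44
print(validated[:100] == signal[:100])  # True
```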
15. A non-transitory computer-readable medium storing instructions that when executed cause at least one processor to detect a probability of speech based on video data, the instructions comprising instructions to:
receive a coordinate-based signal, the coordinate-based signal representing movement or lack of movement of at least one facial landmark of a person in a video signal;
partition the coordinate-based signal into a plurality of bins;
calculate a short-term value representing short-term characteristics of the partitioned coordinate-based signal and a long-term value representing long-term characteristics of the partitioned coordinate-based signal;
select a long-term bin from the plurality of bins that corresponds to the long-term value and a short-term bin from the plurality of bins that corresponds to the short-term value;
compare the long-term bin with the short-term bin; and
calculate a probability of speech of the person based on the comparison.
Dependent claims: 16, 17, 18, 19, 20.
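Claim 15 compares bins rather than raw values. The sketch below is one hypothetical reading of those steps:

1. Quantize each sample of the coordinate-based signal into one of `num_bins` equal-width bins.
2. Take the mean bin index over a short recent window as the short-term value, and over a longer window as the long-term value.
3. Round each value to select a bin.
4. Map the difference between the two bins linearly to a probability.

The window lengths, the rounding, and the linear mapping through `max_gap` are assumptions, not details taken from the claim.

```python
from typing import List


def quantize(signal: List[float], num_bins: int) -> List[int]:
    """Partition the signal's value range into equal-width bins; return each sample's bin."""
    low, high = min(signal), max(signal)
    if high == low:
        return [0] * len(signal)  # no variation: everything falls in one bin
    width = (high - low) / num_bins
    return [min(int((v - low) / width), num_bins - 1) for v in signal]


def binned_speech_probability(signal: List[float], num_bins: int = 10,
                              short_window: int = 5, long_window: int = 50,
                              max_gap: int = 3) -> float:
    """Probability of speech from comparing a short-term bin with a long-term bin.

    All parameters are hypothetical; max_gap is the bin difference treated as certain speech.
    """
    if not signal:
        raise ValueError("signal must contain at least one sample")
    bins = quantize(signal, num_bins)
    recent, history = bins[-short_window:], bins[-long_window:]
    short_term = sum(recent) / len(recent)    # short-term value (mean bin index)
    long_term = sum(history) / len(history)   # long-term value (mean bin index)
    short_bin = int(short_term + 0.5)         # select the bin nearest each value
    long_bin = int(long_term + 0.5)
    gap = short_bin - long_bin                # > 0 when recent values exceed the baseline
    return max(0.0, min(1.0, gap / max_gap))


print(binned_speech_probability([0.1] * 45 + [0.9] * 5))  # 1.0
print(binned_speech_probability([0.1] * 50))              # 0.0
print(binned_speech_probability([0.9] * 45 + [0.1] * 5))  # 0.0
```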
Specification