Behavior recognition system and method by combining image and speech
First Claim
1. A behavior recognition system by combining an image and a speech, comprising:
a database, for storing a plurality of image-and-speech relation modules, wherein each of the image-and-speech relation modules comprises a feature extraction parameter and an image-and-speech relation parameter;
a data analyzing module, for substituting a gesture image and a speech data corresponding to each other into each feature extraction parameter to obtain a plurality of image feature sequences and a plurality of speech feature sequences, and substituting each image feature sequence and each speech feature sequence corresponding to a same image-and-speech relation module into each image-and-speech relation parameter, so as to calculate a plurality of image-and-speech status parameters, wherein each image feature sequence comprises a plurality of image frame data, and the image frame data forms a plurality of image frame status combinations; each speech feature sequence comprises a plurality of speech frame data, and the speech frame data forms a plurality of speech frame status combinations, when the data analyzing module calculates each one of the image-and-speech status parameters, the data analyzing module substitutes each image frame status combination and each speech frame status combination into the image-and-speech relation parameter corresponding to the same image-and-speech relation module to calculate a plurality of image-and-speech sub-status parameters and selects one image-and-speech sub-status parameter from the plurality of image-and-speech sub-status parameters to serve as the image-and-speech status parameter corresponding to the image-and-speech relation module; and
a calculating module, for using the image feature sequences, the speech feature sequences, and the image-and-speech status parameters to calculate a recognition probability corresponding to each of the image-and-speech relation modules, and taking a target parameter from the recognition probabilities.
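The sub-status step in the claim above can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that a status combination is a tuple assigning one integer status label to each frame, that the relation parameter can be modelled as a scoring function, and that "selects one" means taking the maximum score; the claim itself does not fix any of these. All names (`status_combinations`, `status_parameter`, `toy_relation`) are hypothetical.

```python
from itertools import product
from typing import Callable, List, Tuple

# Hypothetical representation: a status combination assigns one status
# label (an int) to each frame of a feature sequence.
StatusCombo = Tuple[int, ...]


def status_combinations(num_frames: int, num_states: int) -> List[StatusCombo]:
    """Enumerate every way of labelling num_frames frames with num_states statuses."""
    return list(product(range(num_states), repeat=num_frames))


def status_parameter(
    image_combos: List[StatusCombo],
    speech_combos: List[StatusCombo],
    relation: Callable[[StatusCombo, StatusCombo], float],
) -> float:
    """Score every (image, speech) combination pair and keep one score.

    Assumption: "selects one" is implemented here as taking the maximum.
    """
    sub_status = [relation(i, s) for i in image_combos for s in speech_combos]
    return max(sub_status)


def toy_relation(image_combo: StatusCombo, speech_combo: StatusCombo) -> float:
    """Toy stand-in for a relation parameter: share of aligned frames whose labels agree."""
    pairs = list(zip(image_combo, speech_combo))
    return sum(a == b for a, b in pairs) / len(pairs)


image_combos = status_combinations(num_frames=2, num_states=2)
speech_combos = status_combinations(num_frames=2, num_states=2)
best = status_parameter(image_combos, speech_combos, toy_relation)
```

Exhaustive enumeration grows as `num_states ** num_frames`, so this form is only practical for very short sequences; it is meant to show the structure of the step, not an efficient way to compute it.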
Abstract
A behavior recognition system and method by combining an image and a speech are provided. The system includes a data analyzing module, a database, and a calculating module. A plurality of image-and-speech relation modules is stored in the database. Each image-and-speech relation module includes a feature extraction parameter and an image-and-speech relation parameter. The data analyzing module obtains a gesture image and a speech data corresponding to each other, and substitutes the gesture image and the speech data into each feature extraction parameter to generate image feature sequences and speech feature sequences. The data analyzing module uses each image-and-speech relation parameter to calculate image-and-speech status parameters. The calculating module uses the image-and-speech status parameters, the image feature sequences, and the speech feature sequences to calculate a recognition probability corresponding to each image-and-speech relation parameter, so as to take a maximum value among the recognition probabilities as a target parameter.
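The final selection described in the abstract, taking the maximum of the recognition probabilities as the target parameter, might be sketched as follows. The dictionary layout and the module names ("wave", "point", "stop") are illustrative assumptions and do not come from the patent.

```python
from typing import Dict, Tuple


def take_target(recognition_probabilities: Dict[str, float]) -> Tuple[str, float]:
    """Return the relation module with the highest recognition probability."""
    # Hypothetical layout: keys identify relation modules, values are
    # the recognition probabilities calculated for them.
    return max(recognition_probabilities.items(), key=lambda item: item[1])


probabilities = {"wave": 0.2, "point": 0.7, "stop": 0.1}
target_name, target_value = take_target(probabilities)
```

With these example values the selected module is "point", whose probability of 0.7 is the maximum.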
17 Claims
1. A behavior recognition system by combining an image and a speech, comprising:
a database, for storing a plurality of image-and-speech relation modules, wherein each of the image-and-speech relation modules comprises a feature extraction parameter and an image-and-speech relation parameter;
a data analyzing module, for substituting a gesture image and a speech data corresponding to each other into each feature extraction parameter to obtain a plurality of image feature sequences and a plurality of speech feature sequences, and substituting each image feature sequence and each speech feature sequence corresponding to a same image-and-speech relation module into each image-and-speech relation parameter, so as to calculate a plurality of image-and-speech status parameters, wherein each image feature sequence comprises a plurality of image frame data, and the image frame data forms a plurality of image frame status combinations; each speech feature sequence comprises a plurality of speech frame data, and the speech frame data forms a plurality of speech frame status combinations, when the data analyzing module calculates each one of the image-and-speech status parameters, the data analyzing module substitutes each image frame status combination and each speech frame status combination into the image-and-speech relation parameter corresponding to the same image-and-speech relation module to calculate a plurality of image-and-speech sub-status parameters and selects one image-and-speech sub-status parameter from the plurality of image-and-speech sub-status parameters to serve as the image-and-speech status parameter corresponding to the image-and-speech relation module; and
a calculating module, for using the image feature sequences, the speech feature sequences, and the image-and-speech status parameters to calculate a recognition probability corresponding to each of the image-and-speech relation modules, and taking a target parameter from the recognition probabilities.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A behavior recognition method by combining an image and a speech, comprising:
obtaining a gesture image and a speech data corresponding to each other;
providing a plurality of image-and-speech relation modules, wherein each of the image-and-speech relation modules comprises a feature extraction parameter and an image-and-speech relation parameter;
obtaining a plurality of image feature sequences and a plurality of speech feature sequences, wherein the gesture image and the speech data are individually substituted into the feature extraction parameters, so as to calculate the image feature sequences and the speech feature sequences, wherein each image feature sequence comprises a plurality of image frame data, and the image frame data forms a plurality of image frame status combinations; each speech feature sequence comprises a plurality of speech frame data, and the speech frame data forms a plurality of speech frame status combinations;
calculating a plurality of image-and-speech status parameters, wherein each image feature sequence and each speech feature sequence corresponding to a same image-and-speech relation module are substituted into each image-and-speech relation parameter, so as to obtain the image-and-speech status parameters, wherein the step of calculating each one of the image-and-speech status parameters comprises:
obtaining a plurality of image-and-speech sub-status parameters, wherein each image frame status combination and each speech frame status combination are substituted into the image-and-speech relation parameter corresponding to the same image-and-speech relation module, so as to calculate the image-and-speech sub-status parameters; and
selecting one image-and-speech sub-status parameter from the image-and-speech sub-status parameters to serve as the image-and-speech status parameter corresponding to the image-and-speech relation module;
calculating a plurality of recognition probabilities, wherein the image feature sequences, the speech feature sequences, and the image-and-speech status parameters are used to calculate a recognition probability corresponding to each of the image-and-speech relation modules; and
taking a target parameter from the recognition probabilities.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17
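The order of the method steps can be outlined in a short Python sketch. It is an illustration under stated assumptions, not the claimed method. The `RelationModule` class, its callable fields, the `probability` argument, and the toy values are all hypothetical. The claim does not say how a recognition probability is computed from the two feature sequences and the status parameter, so that computation is left to a caller-supplied function; taking the maximum as the target follows the abstract rather than the claim wording.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Frames = List[float]  # hypothetical: one number per frame of a feature sequence


@dataclass
class RelationModule:
    """Hypothetical stand-in for one image-and-speech relation module."""
    name: str
    extract_image: Callable[[Sequence[float]], Frames]   # feature extraction (image)
    extract_speech: Callable[[Sequence[float]], Frames]  # feature extraction (speech)
    relate: Callable[[Frames, Frames], float]            # yields the status parameter


def recognise(
    gesture: Sequence[float],
    speech: Sequence[float],
    modules: List[RelationModule],
    probability: Callable[[Frames, Frames, float], float],
) -> Tuple[str, Dict[str, float]]:
    """Run each module over the paired inputs and pick the best-scoring one."""
    probabilities: Dict[str, float] = {}
    for module in modules:
        image_seq = module.extract_image(gesture)     # image feature sequence
        speech_seq = module.extract_speech(speech)    # speech feature sequence
        status = module.relate(image_seq, speech_seq) # image-and-speech status parameter
        probabilities[module.name] = probability(image_seq, speech_seq, status)
    # Assumption: the target is the module with the maximum probability.
    target = max(probabilities, key=lambda name: probabilities[name])
    return target, probabilities


# Toy modules whose relation functions return fixed scores.
modules = [
    RelationModule("a", list, list, lambda image_seq, speech_seq: 0.9),
    RelationModule("b", list, list, lambda image_seq, speech_seq: 0.4),
]
target, probabilities = recognise(
    [0.1, 0.2], [0.3, 0.4], modules,
    probability=lambda image_seq, speech_seq, status: status,
)
```

In this toy run the probability function simply returns the status parameter, so module "a" is selected.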
Specification