INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD AND PROGRAM
First Claim
1. An information processing device comprising:
an audio-based speech recognition processing unit which is input with audio information as observation information of a real space, executes an audio-based speech recognition process, thereby generating word information that is determined to have a high probability of being spoken;
an image-based speech recognition processing unit which is input with image information as observation information of the real space, analyzes mouth movements of each user included in the input image, thereby generating mouth movement information in a unit of user;
an audio-image-combined speech recognition score calculating unit which is input with the word information from the audio-based speech recognition processing unit and input with the mouth movement information in a unit of user from the image-based speech recognition processing unit, executes a score setting process in which a mouth movement close to the word information is set with a high score, thereby executing a score setting process in a unit of user; and
an information integration processing unit which is input with the score and executes a speaker specification process based on the input score.
Abstract
An information processing device includes an audio-based speech recognition processing unit which is input with audio information as observation information of a real space, executes an audio-based speech recognition process, thereby generating word information that is determined to have a high probability of being spoken, an image-based speech recognition processing unit which is input with image information as observation information of the real space, analyzes mouth movements of each user included in the input image, thereby generating mouth movement information, an audio-image-combined speech recognition score calculating unit which is input with the word information and the mouth movement information, executes a score setting process in which a mouth movement close to the word information is set with a high score, thereby executing a score setting process, and an information integration processing unit which is input with the score and executes a speaker specification process.
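The data flow described in the abstract can be illustrated with a minimal sketch. Nothing below is taken from the specification: the mouth-shape table, the function names (`expected_mouth_shapes`, `score_users`, `specify_speaker`), the use of `difflib.SequenceMatcher` as the closeness measure, and the pick-the-maximum integration step are all assumptions chosen for illustration. The sketch takes the word information and the per-user mouth movement information as already computed, sets a higher score for the user whose mouth movement is closer to the word, and specifies the speaker from those scores.

```python
from difflib import SequenceMatcher

# Hypothetical, very coarse letter-to-mouth-shape table (an assumption,
# not something defined in the claims or abstract).
MOUTH_SHAPE = {
    "a": "open", "e": "spread", "i": "spread", "o": "round", "u": "round",
    "b": "closed", "m": "closed", "p": "closed",
}


def expected_mouth_shapes(word):
    """Mouth-shape sequence a speaker of `word` would be expected to show."""
    return [MOUTH_SHAPE.get(ch, "neutral") for ch in word.lower() if ch.isalpha()]


def score_users(word, mouth_movements):
    """Score each user in [0, 1]; a closer mouth movement gets a higher score."""
    expected = expected_mouth_shapes(word)
    return {
        user: SequenceMatcher(None, expected, observed).ratio()
        for user, observed in mouth_movements.items()
    }


def specify_speaker(scores):
    """Simplest possible integration step: take the highest-scoring user."""
    return max(scores, key=scores.get)


# Word information from the (assumed) audio-based recognizer.
word = "mama"
# Mouth movement information in a unit of user from the (assumed) image-based recognizer.
mouth_movements = {
    "user1": ["closed", "open", "closed", "open"],
    "user2": ["neutral", "neutral", "neutral", "neutral"],
}

scores = score_users(word, mouth_movements)
speaker = specify_speaker(scores)
```

In this toy input, `user1` shows exactly the mouth shapes expected for the recognized word and `user2` shows none of them, so `user1` is specified as the speaker. A real integration step would presumably combine the score with other observations rather than take a bare maximum.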
10 Claims
1. An information processing device comprising:
an audio-based speech recognition processing unit which is input with audio information as observation information of a real space, executes an audio-based speech recognition process, thereby generating word information that is determined to have a high probability of being spoken;
an image-based speech recognition processing unit which is input with image information as observation information of the real space, analyzes mouth movements of each user included in the input image, thereby generating mouth movement information in a unit of user;
an audio-image-combined speech recognition score calculating unit which is input with the word information from the audio-based speech recognition processing unit and input with the mouth movement information in a unit of user from the image-based speech recognition processing unit, executes a score setting process in which a mouth movement close to the word information is set with a high score, thereby executing a score setting process in a unit of user; and
an information integration processing unit which is input with the score and executes a speaker specification process based on the input score.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. An information processing method which is implemented in an information processing device comprising the steps of:
processing audio-based speech recognition in which an audio-based speech recognition processing unit is input with audio information as observation information of a real space, executes an audio-based speech recognition process, thereby generating word information that is determined to have a high probability of being spoken;
processing image-based speech recognition in which an image-based speech recognition processing unit is input with image information as observation information of a real space, analyzes mouth movements of each user included in the input image, thereby generating mouth movement information in a unit of user;
calculating an audio-image-combined speech recognition score in which an audio-image-combined speech recognition score calculating unit is input with the word information from the audio-based speech recognition processing unit and input with the mouth movement information in a unit of user from the image-based speech recognition processing unit, executes a score setting process in which a mouth movement close to the word information is set with a high score, and thereby executing a score setting process in a unit of user; and
processing information integration in which an information integration processing unit is input with the score and executes a speaker specification process based on the input score.
10. A program which causes an information processing device to execute an information process comprising the steps of:
processing audio-based speech recognition in which an audio-based speech recognition processing unit is input with audio information as observation information of a real space, executes an audio-based speech recognition process, thereby generating word information that is determined to have a high probability of being spoken;
processing image-based speech recognition in which an image-based speech recognition processing unit is input with image information as observation information of a real space, analyzes mouth movements of each user included in the input image, and thereby generating mouth movement information in a unit of user;
calculating an audio-image-combined speech recognition score in which an audio-image-combined speech recognition score calculating unit is input with the word information from the audio-based speech recognition processing unit and input with the mouth movement information in a unit of user from the image-based speech recognition processing unit, executes a score setting process in which a mouth movement close to the word information is set with a high score, thereby executing a score setting process in a unit of user; and
processing information integration in which an information integration processing unit is input with the score and executes a speaker specification process based on the input score.
Specification