SPEECH RECOGNITION ANALYSIS VIA IDENTIFICATION INFORMATION
2 Assignments
0 Petitions
Abstract
Embodiments are disclosed that relate to the use of identity information to help avoid the occurrence of false positive speech recognition events in a speech recognition system. One embodiment provides a method comprising receiving speech recognition data comprising a recognized speech segment, acoustic locational data related to a location of origin of the recognized speech segment as determined via signals from the microphone array, and confidence data comprising a recognition confidence value, and also receiving image data comprising visual locational information related to a location of each person in an image. The acoustic locational data is compared to the visual locational data to determine whether the recognized speech segment originated from a person in the field of view of the image sensor, and the confidence data is adjusted depending on this determination.
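The abstract does not say how the microphone array produces the acoustic locational data. One common approach, assumed here and not stated in this document, is to estimate the time difference of arrival between two microphones and convert it to a far-field direction of arrival. The sketch below uses a brute-force cross-correlation. The function names, the 343 m/s speed of sound, and the two-microphone geometry are illustrative assumptions, not part of the disclosure.

```python
import math
from typing import Sequence

# Assumed speed of sound in air at roughly 20 degrees C, in meters per second.
SPEED_OF_SOUND_M_S = 343.0


def estimate_delay_samples(sig_a: Sequence[float], sig_b: Sequence[float], max_lag: int) -> int:
    """Hypothetical helper: return the lag (in samples) that maximizes the
    cross-correlation of sig_b against sig_a. A positive lag means the sound
    reached microphone B later than microphone A."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_of_arrival_deg(delay_s: float, mic_spacing_m: float) -> float:
    """Far-field angle from the broadside of a two-microphone pair, in degrees.
    Positive angles point toward microphone A (the one the sound reached first)."""
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp so noise cannot push asin out of range
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    sample_rate_hz = 16_000
    mic_a = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    mic_b = [0.0, 0.0, 0.0] + mic_a[:-3]  # same impulse, arriving 3 samples later
    lag = estimate_delay_samples(mic_a, mic_b, max_lag=5)
    print(lag, direction_of_arrival_deg(lag / sample_rate_hz, mic_spacing_m=0.2))
```

A real array would typically use more than two microphones and a more robust correlation (for example, a frequency-domain weighting), but the geometry of turning a delay into an angle is the same.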
174 Citations
20 Claims
1. In a computing system comprising a microphone array and an image sensor, a method of operating a speech recognition input system, the method comprising:
receiving speech recognition data comprising a recognized speech segment, acoustic locational data related to a location of origin of the recognized speech segment as determined via signals from the microphone array, and confidence data comprising a recognition confidence value;
receiving image data comprising visual locational information related to a location of each person located in a field of view of the image sensor;
comparing the acoustic locational data to the visual locational data to determine whether the recognized speech segment originated from a person in the field of view of the image sensor; and
adjusting the confidence data based upon whether the recognized speech segment is determined to have originated from a person in the field of view of the image sensor.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 17, 18)
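The comparing and adjusting steps of claim 1 can be sketched as follows. This is a minimal illustration under assumptions the claim does not make: both locations are reduced to a horizontal azimuth in degrees in a shared reference frame, a "match" means the angles agree within a tolerance, and the adjustment is an additive boost or penalty clamped to [0, 1]. All names and numeric values are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Sequence


@dataclass(frozen=True)
class RecognizedSpeech:
    """Hypothetical container for the speech recognition data of claim 1."""
    segment: str                # recognized speech segment
    source_azimuth_deg: float   # acoustic locational data (from the microphone array)
    confidence: float           # recognition confidence value in [0, 1]


def originated_from_visible_person(speech: RecognizedSpeech,
                                   person_azimuths_deg: Sequence[float],
                                   tolerance_deg: float = 10.0) -> bool:
    """Compare the acoustic location against the visual location of each person in view."""
    return any(abs(speech.source_azimuth_deg - a) <= tolerance_deg
               for a in person_azimuths_deg)


def adjust_confidence(speech: RecognizedSpeech,
                      person_azimuths_deg: Sequence[float],
                      boost: float = 0.1,
                      penalty: float = 0.3,
                      tolerance_deg: float = 10.0) -> RecognizedSpeech:
    """Raise confidence when the speech lines up with a visible person, lower it otherwise."""
    if originated_from_visible_person(speech, person_azimuths_deg, tolerance_deg):
        new_conf = speech.confidence + boost
    else:
        new_conf = speech.confidence - penalty
    # Return a copy with the clamped confidence; the input object is left unchanged.
    return replace(speech, confidence=min(1.0, max(0.0, new_conf)))
```

With an empty list of people (no one in view), every segment takes the penalty, which matches the abstract's aim of suppressing false positive recognitions. A multiplicative weight would work equally well in place of the additive adjustment.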
9. An interactive entertainment system, comprising:
a depth-sensing camera;
a microphone array comprising a plurality of microphones; and
a computing device comprising a processor and memory comprising instructions stored thereon that are executable by the processor to:
receive speech recognition data comprising a recognized speech segment, acoustic locational data related to a location of origin of the recognized speech segment as determined via signals from the microphone array, and confidence data comprising a recognition confidence value;
receive image data comprising visual locational information related to a location of each person located in a field of view of the depth-sensing camera;
compare the acoustic locational data to the visual locational data to determine whether the recognized speech segment originated from a person in the field of view of the image sensor; and
adjust the confidence data based upon whether the recognized speech segment is determined to have originated from a person in the field of view of the depth-sensing camera.
(Dependent claims: 10, 11, 12, 13, 14, 15)
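In claim 9 the visual locational information comes from a depth-sensing camera. One way to make it comparable with an acoustic direction estimate, assumed here rather than taken from the claim, is to convert each person's 3-D position in camera space into a horizontal azimuth as seen from the microphone array. The sketch assumes x is lateral, y is vertical, and z is distance from the camera, all in meters, and that the array faces the same direction as the camera with a purely lateral offset. The function and parameter names are hypothetical, and the sign convention must match whatever the acoustic estimate uses.

```python
import math
from typing import Iterable, List, Tuple


def person_azimuths_deg(positions_m: Iterable[Tuple[float, float, float]],
                        array_offset_x_m: float = 0.0) -> List[float]:
    """Hypothetical conversion of (x, y, z) person positions from a depth camera
    into horizontal azimuths, in degrees, relative to the microphone array.

    Assumes the array lies on the camera's x axis at array_offset_x_m and points
    the same way as the camera. Positive azimuths are toward +x.
    """
    azimuths = []
    for x, _y, z in positions_m:
        if z <= 0:
            continue  # no valid depth reading in front of the sensor; skip this person
        azimuths.append(math.degrees(math.atan2(x - array_offset_x_m, z)))
    return azimuths


if __name__ == "__main__":
    people = [(0.0, 0.1, 2.5), (0.8, 0.0, 1.6)]
    print(person_azimuths_deg(people, array_offset_x_m=0.05))
```

The height coordinate is ignored because a linear microphone array typically resolves direction only in the horizontal plane; an array with vertical extent could compare elevation as well.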
16. A computer-readable storage medium comprising instructions stored thereon that are executable by a computing device to:
receive speech recognition data comprising a recognized speech segment, acoustic locational data related to a location of origin of the recognized speech segment as determined via signals from the microphone array, and confidence data comprising a recognition confidence value;
receive image data comprising visual locational information related to a location of each person located in a field of view of the depth-sensing camera;
compare the acoustic locational data to the visual locational data to determine whether the recognized speech segment originated from a person in the field of view of the image sensor;
adjust the confidence data based upon whether the recognized speech segment is determined to have originated from a person in the field of view of the depth-sensing camera;
if it is determined that the recognized speech segment originated from a person in the field of view of the image sensor, then determine whether a face of the person is facing the image sensor; and
adjust the confidence data based upon whether the face of the person is facing the image sensor.
(Dependent claims: 19, 20)
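Claim 16 adds a second check: whether the speaking person's face is oriented toward the image sensor. A minimal sketch follows, assuming a face tracker that reports head yaw in degrees (0 meaning straight at the sensor, None meaning no face was found) and a hypothetical 30-degree threshold. The claim does not specify how orientation is measured or how much the confidence should change, so the function name, threshold, and step sizes are all illustrative.

```python
from typing import Optional


def adjust_for_face_orientation(confidence: float,
                                originated_from_person: bool,
                                face_yaw_deg: Optional[float],
                                facing_threshold_deg: float = 30.0,
                                boost: float = 0.1,
                                penalty: float = 0.2) -> float:
    """Hypothetical second-stage adjustment based on face orientation.

    Applies only when the speech was already matched to a person in view;
    otherwise the confidence is returned unchanged. A missing face
    (face_yaw_deg is None) is treated as not facing the sensor.
    """
    if not originated_from_person:
        return confidence
    facing = face_yaw_deg is not None and abs(face_yaw_deg) <= facing_threshold_deg
    adjusted = confidence + boost if facing else confidence - penalty
    return min(1.0, max(0.0, adjusted))  # keep the result a valid confidence in [0, 1]
```

Treating an undetected face as "not facing" is a conservative choice: it lowers confidence when orientation cannot be confirmed, which favors fewer false positive recognitions.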
Specification