Environment adaptive speech recognition method and device
First Claim
1. A speech recognition method, comprising:
receiving, by a speech recognition device, an input speech, wherein the speech recognition device comprises a noise type detection engine, a storage area and a speech engine;
dividing the input speech, by the speech recognition device, into detection speech at a beginning of the input speech and a to-be-recognized speech following the detection speech, wherein a length of speech data comprised in the detection speech is less than a length of speech data comprised in the to-be-recognized speech;
selecting, by the noise type detection engine based on comparing the detection speech with a plurality of speech training samples under a plurality of different sample environments, a sample environment corresponding to a speech training sample among the plurality of speech training samples that has a minimum difference with the detection speech, as a detection environment type, wherein the plurality of sample environments comprises a quiet environment and a noise environment;
detecting, by the speech recognition device, a storage area;
outputting, by the speech recognition device, when a recognizable previous environment type exists in the storage area, a speech correction instruction according to a result of comparison between the detection environment type and the previous environment type, wherein the previous environment type comprises a quiet environment or a noise environment;
controlling, by the speech engine according to the speech correction instruction, correction on the to-be-recognized speech, and outputting an initial recognition result;
separately comparing, by the noise type detection engine, the received to-be-recognized speech with the plurality of the speech training samples, and selecting a sample environment corresponding to a speech training sample among the plurality of speech training samples that has a minimum difference with the to-be-recognized speech, as a current environment type;
storing, by the speech recognition device, the current environment type to the storage area, and abandoning the current environment type after a preset duration; and
outputting, by the speech recognition device, a final recognition result after a confidence value of the initial recognition result is adjusted according to the current environment type.
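The dividing and environment-selection steps recited above can be illustrated with a minimal Python sketch. The claim does not say how the "difference" between a speech segment and a training sample is measured, so the mean-energy feature, the function names, and the dictionary layout of the training samples are all assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple


def split_input(samples: Sequence[float], detection_len: int) -> Tuple[List[float], List[float]]:
    """Split input speech into a leading detection segment and the remainder."""
    # The claim requires the detection speech to be shorter than the
    # to-be-recognized speech, so reject any split that violates that.
    if not 0 < detection_len < len(samples) - detection_len:
        raise ValueError("detection speech must be non-empty and shorter than the rest")
    return list(samples[:detection_len]), list(samples[detection_len:])


def mean_energy(samples: Sequence[float]) -> float:
    """Hypothetical one-number feature: mean squared amplitude (an assumption)."""
    return sum(x * x for x in samples) / len(samples)


def select_environment(segment: Sequence[float],
                       training_samples: Dict[str, List[Sequence[float]]]) -> str:
    """Return the sample environment whose training sample differs least from the segment."""
    target = mean_energy(segment)
    best_env, best_diff = "", float("inf")
    for env, samples in training_samples.items():
        for sample in samples:
            # ASSUMPTION: "difference" is the absolute gap in mean energy.
            diff = abs(mean_energy(sample) - target)
            if diff < best_diff:
                best_env, best_diff = env, diff
    return best_env


# Illustrative data only; these are not real recordings.
training = {
    "quiet": [[0.01, -0.02, 0.01, 0.0]],
    "noise": [[0.5, -0.6, 0.4, -0.5]],
}
speech = [0.4, -0.5, 0.6, -0.4, 0.5, -0.5, 0.45, -0.55, 0.5, -0.4]
detection, to_recognize = split_input(speech, detection_len=3)
detection_env = select_environment(detection, training)   # detection environment type
current_env = select_environment(to_recognize, training)  # current environment type
```

The same nearest-sample routine serves both the detection speech and the to-be-recognized speech, which mirrors how the claim uses the noise type detection engine twice. A practical system would likely compare richer acoustic features than a single energy value.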
Abstract
A speech recognition method, a speech recognition device, and an electronic device are disclosed. In the method, a first determination is made from the sample environment corresponding to a detection speech and a previous environment type, and a corresponding speech correction instruction is output to a speech engine. A to-be-recognized speech is then input to the speech engine and to a noise type detection engine at the same time. The speech engine corrects the to-be-recognized speech according to the speech correction instruction, so that the quality of the original speech is not impaired by noise processing, and outputs a corresponding initial recognition result. The noise type detection engine determines a current environment type by comparing the to-be-recognized speech with speech training samples under different environments. Finally, the confidence of the initial recognition result is adjusted according to the current environment type.
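The two decision points in the abstract, choosing a speech correction instruction and adjusting the confidence of the initial recognition result, can be sketched as follows. The abstract does not state which instruction follows from which comparison result, nor the direction or size of the confidence adjustment. The policy below (enable noise reduction only when both environment types are noise, and raise the confidence by a fixed offset in a noise environment) is therefore an assumption chosen for illustration, and every name in it is hypothetical.

```python
from enum import Enum
from typing import Optional


class Env(Enum):
    QUIET = "quiet"
    NOISE = "noise"


class Correction(Enum):
    ENABLE_NOISE_REDUCTION = "enable_noise_reduction"
    DISABLE_NOISE_REDUCTION = "disable_noise_reduction"


def correction_instruction(detection_env: Env, previous_env: Optional[Env]) -> Correction:
    """Pick a correction instruction from the detection and previous environment types."""
    if previous_env is None:
        # ASSUMPTION: with no recognizable previous type, rely on the
        # detection environment type alone.
        use_noise_reduction = detection_env is Env.NOISE
    else:
        # ASSUMPTION: enable noise reduction only when both types agree on
        # noise, so clean speech is not degraded by unnecessary processing.
        use_noise_reduction = detection_env is Env.NOISE and previous_env is Env.NOISE
    if use_noise_reduction:
        return Correction.ENABLE_NOISE_REDUCTION
    return Correction.DISABLE_NOISE_REDUCTION


def adjust_confidence(confidence: float, current_env: Env, noise_offset: float = 0.1) -> float:
    """Adjust the confidence of the initial recognition result for the current environment."""
    # ASSUMPTION: a noise environment raises the confidence by a fixed
    # offset; a quiet environment leaves it unchanged.
    if current_env is Env.NOISE:
        confidence += noise_offset
    # Keep the result inside the usual [0, 1] confidence range.
    return max(0.0, min(1.0, confidence))
```

Keeping the instruction choice separate from the confidence adjustment reflects the ordering in the abstract: the instruction is fixed before recognition starts, while the confidence is adjusted only after the current environment type is known.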
17 Claims
1. A speech recognition method, comprising the steps recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A speech recognition device, comprising:
a processor, configured to:
acquire a detection speech and a to-be-recognized speech following the detection speech by sampling an input speech, and input the detection speech and the to-be-recognized speech into a noise type detection engine and a speech engine at the same time;
detect a storage area, and output, when a recognizable previous environment type exists in the storage area, a speech correction instruction according to a result of comparison between a detection environment type output by the noise type detection engine in response to receiving the detection speech and the to-be-recognized speech and the previous environment type; and
output a final recognition result after a confidence value of an initial recognition result output by the speech engine is adjusted according to a current environment type output by the noise type detection engine, wherein a length of speech data comprised in the detection speech is less than a length of speech data comprised in the to-be-recognized speech, and the previous environment type is a quiet environment or a noise environment;
the noise type detection engine interfaced to the processor, configured to:
separately compare the detection speech and the to-be-recognized speech that are output by the processor with a plurality of speech training samples under a plurality of different sample environments;
select a sample environment corresponding to a speech training sample that has a minimum difference with the detection speech, as a detection environment type;
select a sample environment corresponding to a speech training sample that has a minimum difference with the to-be-recognized speech, as a current environment type; and
store the current environment type to the storage area, and abandon the current environment type after a preset duration; and
the speech engine interfaced to the noise type detection engine and the processor, configured to receive the speech correction instruction from the processor and control correction on the received to-be-recognized speech according to the speech correction instruction output by the processor, and output an initial recognition result.
Dependent claims: 9, 10
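The storage behavior in this claim, storing the current environment type and abandoning it after a preset duration, can be sketched as a small store with an expiry time. The class name, the use of seconds as the unit, and the injectable clock are assumptions made for illustration; the claim does not specify how the preset duration is tracked.

```python
import time
from typing import Callable, Optional


class EnvironmentStore:
    """Hypothetical storage area holding one environment type with an expiry."""

    def __init__(self, preset_duration: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._preset_duration = preset_duration  # seconds (an assumption)
        self._clock = clock                      # injectable so tests can control time
        self._env_type: Optional[str] = None
        self._stored_at = 0.0

    def store(self, env_type: str) -> None:
        """Store the current environment type and record when it was stored."""
        self._env_type = env_type
        self._stored_at = self._clock()

    def previous(self) -> Optional[str]:
        """Return the previous environment type, or None if absent or expired."""
        if self._env_type is None:
            return None
        if self._clock() - self._stored_at >= self._preset_duration:
            # Abandon the stored type once the preset duration has elapsed.
            self._env_type = None
            return None
        return self._env_type


# Usage with a fake clock so the expiry can be shown without waiting.
now = [0.0]
store = EnvironmentStore(preset_duration=30.0, clock=lambda: now[0])
store.store("noise")
now[0] = 10.0
fresh = store.previous()    # still within the preset duration
now[0] = 30.0
expired = store.previous()  # preset duration elapsed, value abandoned
```

A `None` return corresponds to the case in which no recognizable previous environment type exists in the storage area, so the caller can distinguish it from a stored quiet or noise type.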
11. An electronic device, comprising a speech recognition device, a speech recording device connected to the speech recognition device, and a microphone connected to the speech recording device;
wherein the speech recording device is configured to collect and record an input speech by using the microphone, and is configured to input the recorded input speech to the speech recognition device;
wherein the speech recognition device is configured to:
receive an input speech;
divide the input speech into a detection speech at the beginning of the input speech and a to-be-recognized speech following the detection speech, wherein a length of speech data comprised in the detection speech is less than a length of speech data comprised in the to-be-recognized speech;
select, after comparing the detection speech with a plurality of speech training samples under a plurality of different sample environments, a sample environment corresponding to a speech training sample among the plurality of speech training samples that has a minimum difference with the detection speech, as a detection environment type, wherein the sample environment comprises a quiet environment and a noise environment;
detect a storage area in the speech recognition device;
output, when a recognizable previous environment type exists in the storage area, a speech correction instruction according to a result of comparison between the detection environment type and the previous environment type, wherein the previous environment type is a quiet environment or a noise environment;
control, according to the speech correction instruction, correction on the to-be-recognized speech, and output an initial recognition result;
separately compare the received to-be-recognized speech with the plurality of speech training samples, and select a sample environment corresponding to a speech training sample among the plurality of speech training samples that has a minimum difference with the to-be-recognized speech, as a current environment type;
store the current environment type to the storage area, and abandon the current environment type after a preset duration; and
output a final recognition result after a confidence value of the initial recognition result is adjusted according to the current environment type.
Dependent claims: 12, 13, 14, 15, 16, 17
Specification