Apparatus and method to classify sound to detect speech
Abstract
Audio frames are classified as speech, non-transient background noise, or transient noise events. Probabilities of speech or of a transient noise event, or other metrics, may be calculated to indicate confidence in the classification. Frames classified as speech or noise events are not used in updating models (e.g., spectral subtraction noise estimates, silence model, background energy estimates, signal-to-noise ratio) of non-transient background noise. Frame classification affects acceptance or rejection of a recognition hypothesis. Classifications and other audio-related information may be determined by circuitry in a headset and sent (e.g., wirelessly) to a separate processor-based recognition device.
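The gating described in the abstract, where only frames classified as non-transient background noise feed the background models, can be illustrated with a minimal Python sketch. The names `FrameClass` and `update_background_energy`, the exponential smoothing rule, and the smoothing factor `alpha` are assumptions made for illustration; the abstract does not specify how the estimates are computed, and the frame labels here are taken as given rather than produced by a classifier.

```python
from enum import Enum


class FrameClass(Enum):
    """The three frame classes named in the abstract (hypothetical enum)."""
    SPEECH = "speech"
    BACKGROUND = "non-transient background noise"
    TRANSIENT = "transient noise event"


def update_background_energy(estimate, frame_energy, label, alpha=0.5):
    """Return the new background energy estimate.

    Assumption: the estimate is an exponential moving average. Frames
    labeled SPEECH or TRANSIENT leave the estimate unchanged, so they
    cannot contaminate the background model.
    """
    if label is not FrameClass.BACKGROUND:
        return estimate
    return alpha * estimate + (1.0 - alpha) * frame_energy


# Illustrative (made-up) frame energies with pre-assigned labels.
frames = [
    (1.0, FrameClass.BACKGROUND),
    (9.0, FrameClass.SPEECH),
    (20.0, FrameClass.TRANSIENT),
    (2.0, FrameClass.BACKGROUND),
]

estimate = 1.0
for energy, label in frames:
    estimate = update_background_energy(estimate, energy, label)
```

In this sketch the loud speech and transient frames are skipped, so the estimate moves only in response to the two background frames.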
45 Claims
1. A method of operation in a speech recognition system, the method comprising:
analyzing each of a plurality of frames of audio by a sound classifier;
classifying a first number of the frames of audio as speech by the sound classifier;
classifying a second number of the frames of audio as non-transient background noise by the sound classifier;
classifying a third number of the frames of audio as transient noise events by the sound classifier; and
providing signals indicative at least of the classifications of the frames of audio to a speech recognizer.
Dependent claims: 2-19
20. A method of operation in a speech recognition system, the method comprising:
analyzing each of a plurality of frames of audio by a sound classifier;
classifying a first number of the frames of audio as speech by the sound classifier;
classifying a second number of the frames of audio as non-speech;
forming a hypothesis based on the audio;
adjusting a threshold at which a recognized hypothesis is either rejected or accepted based at least in part on the first and the second numbers of frames.
Dependent claims: 21
22. A method of operation in a speech recognition system, the method comprising:
analyzing a first segment of audio by a sound classifier;
determining at least two confidences among the following three confidences; a first confidence that the first segment of audio is speech; a second confidence that the first segment of audio is non-transient background noise; or a third confidence that the first segment of audio is transient background noise;
generating a hypothesis for a second segment of audio that includes the first segment of audio; and
adjusting a threshold at which the hypothesis is either rejected or accepted based at least in part on the at least two confidences.
Dependent claims: 23
24. A speech recognition system, comprising:
a sound classifier that includes at least one non-transitory processor-readable medium and at least one processor communicatively coupled to the at least one non-transitory processor-readable medium, and that analyzes each of a plurality of frames of audio, classifies a first number of the frames of audio as speech by the sound classifier, classifies a second number of the frames of audio as non-transient background noise by the sound classifier, classifies a third number of the frames of audio as transient noise events by the sound classifier, and provides signals indicative at least of the classifications of the frames of audio.
Dependent claims: 25-41
42. A speech recognition system, comprising:
a sound classifier that includes at least one non-transitory processor-readable medium and at least one processor communicatively coupled to the at least one non-transitory processor-readable medium, and that analyzes each of a plurality of frames of audio, classifies a first number of the frames of audio as speech, classifies a second number of the frames of audio as non-speech, forms a hypothesis based on the audio; and adjusts a threshold at which a recognized hypothesis is either rejected or accepted based at least in part on the first and the second numbers of frames.
Dependent claims: 43
44. A speech recognition system, comprising:
a sound classifier that includes at least one non-transitory processor-readable medium and at least one processor communicatively coupled to the at least one non-transitory processor-readable medium, and that analyzes a first segment of audio, determines at least two confidences among the following three confidences; a first confidence that the first segment of audio is speech; a second confidence that the first segment of audio is non-transient background noise; or a third confidence that the first segment of audio is transient background noise; generates a hypothesis for a second segment of audio that includes the first segment of audio; and adjusts a threshold at which the hypothesis is either rejected or accepted based at least in part on the at least two confidences.
Dependent claims: 45
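The threshold adjustment recited in claims 20, 22, 42 and 44 can be illustrated with a short Python sketch. Everything specific here is an assumption: the function names, the linear adjustment rules, the `max_shift` parameter, and the choice to make acceptance stricter when the audio looks less like speech. The claims only require that the threshold depend on the frame counts or on at least two of the confidences.

```python
def threshold_from_counts(base, n_speech, n_non_speech, max_shift=0.2):
    """Count-based adjustment (in the spirit of claims 20 and 42).

    Assumption: the threshold rises linearly with the fraction of
    frames classified as non-speech.
    """
    total = n_speech + n_non_speech
    if total == 0:
        return base  # no frames analyzed, leave the threshold alone
    return base + max_shift * (n_non_speech / total)


def threshold_from_confidences(base, speech_conf, transient_conf, max_shift=0.2):
    """Confidence-based adjustment (in the spirit of claims 22 and 44).

    Assumption: two of the three confidences are used, and the threshold
    moves up when transient-noise confidence exceeds speech confidence.
    """
    return base + max_shift * (transient_conf - speech_conf)


def accept(hypothesis_score, threshold):
    """Accept the hypothesis when its score meets the adjusted threshold."""
    return hypothesis_score >= threshold


# Mostly speech frames: the threshold rises only slightly.
mostly_speech = threshold_from_counts(0.5, n_speech=30, n_non_speech=10)
# Mostly non-speech frames: the threshold rises more.
mostly_noise = threshold_from_counts(0.5, n_speech=10, n_non_speech=30)

# Confident speech lowers the threshold; a likely transient raises it.
clean = threshold_from_confidences(0.5, speech_conf=0.9, transient_conf=0.1)
noisy = threshold_from_confidences(0.5, speech_conf=0.2, transient_conf=0.7)
```

With these illustrative numbers, a hypothesis scoring 0.6 would be accepted against `mostly_speech` and rejected against `mostly_noise`, which is the kind of behavior the claimed adjustment is meant to allow.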
Specification