Systems and methods for providing acoustic classification
Abstract
A speech recognition system receives an audio signal and detects various features of the audio signal. For example, the system classifies the audio signal into speech and non-speech portions, genders of speakers corresponding to the speech portions, and channel bandwidths used by the speakers. The system detects speaker turns based on changes in the speakers and assigns labels to the speaker turns. The system verifies the genders of the speakers and the channel bandwidths used by the speakers and identifies one or more languages associated with the audio signal. The system recognizes the speech portions of the audio signal based on the various features of the audio signal.
37 Claims
-
1. A speech recognition system, comprising:
-
audio classification logic configured to:
receive an audio stream, and detect a plurality of audio features from the audio stream by:
decoding the audio stream to generate a phone-class sequence relating to the audio stream, and processing the phone-class sequence to cluster speakers together, identify known ones of the speakers, and identify language used in the audio stream; and
speech recognition logic configured to recognize speech in the audio stream using the audio features.
-
-
9. The system of claim 1, wherein when identifying known ones of the speakers, the audio classification logic is configured to:
identify names of the known ones of the speakers.
-
10. A method for providing speech recognition, comprising:
-
receiving an audio signal;
detecting a plurality of audio features from the audio signal by:
generating a phone-class sequence relating to the audio signal, and processing the phone-class sequence to cluster speakers together, identify known ones of the speakers, and identify language associated with the audio signal; and
recognizing speech in the audio signal using the audio features.
-
-
18. The method of claim 10, wherein the identifying known ones of the speakers includes:
identifying a name of the known ones of the speakers.
-
19. An audio classifier, comprising:
-
speech event classification logic configured to:
receive an audio stream, classify the audio stream into speech and non-speech portions, identify genders of speakers corresponding to the speech portions, and identify channel bandwidths used by the speakers;
speaker change detection logic configured to identify speaker turns based on changes in the speakers;
speaker clustering logic configured to generate labels for the speaker turns;
bandwidth refinement logic configured to refine the identification of channel bandwidths by the speech event classification logic;
gender refinement logic configured to refine the identification of genders by the speech event classification logic; and
language identification logic configured to identify one or more languages associated with the audio stream.
-
-
28. An audio classifier, comprising:
-
means for receiving an audio stream;
means for classifying the audio stream into speech and non-speech portions, genders of speakers corresponding to the speech portions, and channel bandwidths used by the speakers;
means for detecting speaker turns based on changes in the speakers;
means for generating identifiers for the speaker turns;
means for verifying genders of the speakers and channel bandwidths used by the speakers; and
means for identifying one or more languages associated with the audio stream.
-
-
29. A method for detecting features of an audio signal, comprising:
-
receiving an audio signal;
classifying the audio signal into speech and non-speech portions, genders of speakers corresponding to the speech portions, and channel bandwidths used by the speakers;
detecting speaker turns based on changes in the speakers;
labeling the speaker turns;
verifying genders of the speakers and channel bandwidths used by the speakers; and
identifying one or more languages associated with the audio signal.
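The step of detecting speaker turns based on changes in the speakers can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a hypothetical upstream stage has already produced one speaker hypothesis per fixed-length segment; the claim itself does not say how speaker changes are found. The names `Turn` and `detect_turns` are illustrative and do not come from the patent.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Turn:
    speaker: str  # hypothetical per-segment speaker hypothesis
    start: int    # index of the first segment in the turn
    end: int      # index one past the last segment in the turn


def detect_turns(segment_speakers):
    """Group consecutive segments that share a speaker hypothesis into turns.

    A new turn starts wherever the hypothesis changes from one segment
    to the next (an assumed stand-in for acoustic change detection).
    """
    turns = []
    index = 0
    for speaker, run in groupby(segment_speakers):
        length = len(list(run))
        turns.append(Turn(speaker, index, index + length))
        index += length
    return turns
```

For example, `detect_turns(["A", "A", "B", "A"])` yields three turns covering segments 0 to 2, 2 to 3, and 3 to 4. The same speaker returning later starts a new turn rather than extending the earlier one.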
-
-
35. A speech recognition system, comprising:
-
audio classification logic configured to:
receive an audio signal, identify speech and non-speech portions of the audio signal, determine speaker turns based on the speech and non-speech portions of the audio signal, and determine one or more languages associated with the speaker turns; and
speech recognition logic configured to recognize the speech portions of the audio signal based on the speaker turns and the one or more languages.
-
-
36. A speech recognition system, comprising:
-
audio classification logic configured to:
receive an audio signal, identify speech and non-speech portions of the audio signal, determine speaker turns based on the speech and non-speech portions of the audio signal, assign labels to the speaker turns, and identify known speakers associated with the speaker turns; and
speech recognition logic configured to recognize the speech portions of the audio signal based on the speaker turns, the labels, and the known speakers.
-
-
37. A method for recognizing speech, comprising:
-
receiving an audio signal containing speech;
determining genders of speakers associated with the speech;
determining channel bandwidths used by the speakers;
identifying speaker turns based on changes in the speakers;
refining the determination of genders by selecting a majority one of the genders when more than one of the genders was associated with one of the speaker turns; and
refining the determination of channel bandwidths by selecting a majority one of the channel bandwidths when more than one of the channel bandwidths was associated with one of the speaker turns.
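The two refinement steps of claim 37 describe a majority vote within each speaker turn. The Python sketch below shows one way such a vote could work. It assumes per-segment labels and turns given as `(start, end)` index pairs. These representations, the function name `refine_by_majority`, and the tie-breaking behavior are illustrative assumptions, not details taken from the patent.

```python
from collections import Counter


def refine_by_majority(turns, labels):
    """Return a copy of `labels` in which every segment of a turn carries
    the label that occurs most often within that turn.

    `turns` is a list of (start, end) index pairs with `end` exclusive.
    On a tie, the label seen first in the turn wins (an assumption).
    """
    refined = list(labels)
    for start, end in turns:
        if start >= end:
            continue  # skip empty turns
        majority, _count = Counter(labels[start:end]).most_common(1)[0]
        refined[start:end] = [majority] * (end - start)
    return refined


# Hypothetical per-segment gender and bandwidth decisions for two turns.
turns = [(0, 4), (4, 7)]
genders = ["F", "F", "M", "F", "M", "M", "M"]
bandwidths = ["wide", "narrow", "wide", "wide", "narrow", "narrow", "wide"]

refined_genders = refine_by_majority(turns, genders)
refined_bandwidths = refine_by_majority(turns, bandwidths)
```

Here the single "M" decision inside the first turn is overruled by the three "F" decisions. The same function handles both genders and channel bandwidths because the vote does not depend on what the labels mean.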
-
Specification