Automatic language identification/verification system
Abstract
A language identification and verification system is described in which the language of a speech utterance is determined by finding the closest match between the utterance and multiple speaker sets. The system uses a speaker identification/verification system as a baseline to find a set of well-matched speakers in each of a plurality of languages. The unknown speech is then compared with speech features from these well-matched speakers, and a language decision is made on the basis of the closest match between the unknown speech features and the speech features of the well-matched reference speakers in a particular language. Prior-art language identification systems base their speech features on short-term spectral features determined at the system frame rate, which seriously limits their resolution and accuracy. To avoid this problem, the invention uses speech features derived from vocalic or syllabic nuclei, from which related phonetic speech features may then be extracted. Detection of these vocalic centers or syllabic nuclei is accomplished using a trained back-error propagation multi-level neural network.
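The abstract leaves open both the distance measure and the size of each speaker shortlist. The sketch below illustrates the closest-match idea under stated assumptions: each reference speaker is a list of feature vectors, a speaker's distance to the unknown utterance is the average nearest-neighbour Euclidean distance, and a language is scored by the mean distance of its `n_best` closest speakers. The function names, `n_best`, and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

def speaker_distance(test: List[Vector], reference: List[Vector]) -> float:
    """Average, over the test vectors, of the Euclidean distance to the nearest
    reference vector of one speaker (lower means a better match)."""
    return sum(min(math.dist(t, r) for r in reference) for t in test) / len(test)

def closest_language(test: List[Vector],
                     speakers: Dict[str, Dict[str, List[Vector]]],
                     n_best: int = 2) -> Tuple[str, Dict[str, float]]:
    """For each language, keep the n_best best-matched reference speakers,
    score the language by their mean distance, and return the closest one."""
    scores = {}
    for lang, refs in speakers.items():
        dists = sorted(speaker_distance(test, r) for r in refs.values())
        best = dists[:n_best]  # hypothetical shortlist of well-matched speakers
        scores[lang] = sum(best) / len(best)
    return min(scores, key=scores.get), scores

# Toy reference sets (hypothetical 2-D "features").
speakers = {
    "lang_a": {"a1": [[0.0, 0.0], [1.0, 0.0]], "a2": [[0.0, 1.0]]},
    "lang_b": {"b1": [[5.0, 5.0]], "b2": [[6.0, 5.0], [5.0, 6.0]]},
}
unknown = [[0.0, 0.5], [1.0, 0.2]]
lang, scores = closest_language(unknown, speakers)
print(lang)  # lang_a
```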
29 Claims
1. A Language Verification System comprising:
means for processing spoken text entered into the system whereby spoken text is converted into frames of speech, and wherein variations in input speech signals extrinsic to those introduced by a speaker's vocal tract are attenuated;
means for detecting and extracting phonetic speech features that are syllabic nuclei, from said frames of speech;
matching means for comparing said phonetic speech features with stored reference phonetic speech features and establishing a match score for said comparison proportional to degree of similarity between said phonetic speech features and said stored reference phonetic speech features; and
decision means for identifying said input speech as corresponding to one of a plurality of languages, whereby said language identification for said input speech is established on the basis of a comparison of said match score with at least one predetermined threshold score associated with at least one of said plurality of languages, said decision means encompasses a scoring methodology wherein multiple matched speakers within and across a multiplicity of languages are identified as to a language spoken based on a score selected from the group consisting of a minimum score, an average score and a combination minimum-average score.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 21, 22)
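Claim 1 does not say how variations extrinsic to the vocal tract (for example microphone or channel coloration) are attenuated. As an assumed stand-in, the sketch below frames the signal, computes a low-order real cepstrum per frame, and applies cepstral mean subtraction, a common channel-normalization technique; it is not necessarily the technique used in the patent. Frame length, hop size, coefficient count, and all function names are illustrative.

```python
import numpy as np

def frame_signal(x: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Split a 1-D signal into overlapping Hamming-windowed frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

def real_cepstrum(frames: np.ndarray, n_coeffs: int = 13) -> np.ndarray:
    """Low-order real cepstrum per frame: inverse FFT of the log magnitude spectrum."""
    mag = np.abs(np.fft.rfft(frames, axis=1))
    log_mag = np.log(np.maximum(mag, 1e-10))  # floor guards against log(0)
    return np.fft.irfft(log_mag, axis=1)[:, :n_coeffs]

def cepstral_mean_subtraction(cep: np.ndarray) -> np.ndarray:
    """A fixed gain (or, approximately, a short fixed channel) adds the same
    vector to every frame's cepstrum; subtracting the utterance mean removes it."""
    return cep - cep.mean(axis=0, keepdims=True)

def preprocess(x, frame_len: int = 256, hop: int = 128, n_coeffs: int = 13) -> np.ndarray:
    """Hypothetical front end: framing, cepstrum, then channel normalization."""
    x = np.asarray(x, dtype=float)
    return cepstral_mean_subtraction(real_cepstrum(frame_signal(x, frame_len, hop), n_coeffs))

rng = np.random.default_rng(0)
speech = rng.standard_normal(8000)  # placeholder for a real recording
feats = preprocess(speech)
print(feats.shape)  # (61, 13)
```

Because a constant gain shifts every frame's zeroth cepstral coefficient by the same amount, `preprocess(3.0 * speech)` returns the same features as `preprocess(speech)` up to floating-point error.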
10. In a Language Verification System comprising a means for processing spoken text into frames of speech, a means for detecting and extracting speech features from said frames of speech, matching means for comparing said speech features with stored reference speech features and establishing a matched score for said comparison proportional to a degree of similarity between said speech features and said stored reference speech features, and decision means for identifying input speech to said system as corresponding to one of a plurality of languages, the improvement therewith comprising:
means operable with said means for detecting and extracting speech features that are syllabic nuclei, to identify phonetic speech;
means operable with said matching means to establish a match score proportional to a degree of similarity between the phonetic speech features and stored reference phonetic speech features; and
means operable with said decision means whereby said language identification for said input speech is established on the basis of a comparison of said matched scores with at least one predetermined threshold score associated with one of said plurality of languages, said means operable with said decision means encompasses a scoring methodology wherein multiple matched speakers within and across a multiplicity of languages are identified as to a language spoken based on a score selected from the group consisting of a minimum score, an average score and a combination minimum-average score.
(Dependent claims: 13, 14, 15, 16, 17, 18, 19)
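The claims require features taken at syllabic nuclei, and the abstract says a trained back-error propagation neural network detects them, but neither specifies how nucleus frames are chosen from the network output. Assuming the network (not shown) emits a per-frame probability that the frame is a vocalic nucleus, a simple peak picker could select nucleus frames as below; `threshold`, `min_gap`, and both function names are hypothetical.

```python
from typing import List, Sequence

def pick_syllabic_nuclei(nucleus_prob: Sequence[float],
                         threshold: float = 0.5,
                         min_gap: int = 5) -> List[int]:
    """Return frame indices that are local maxima of the per-frame nucleus
    probability, reach `threshold`, and lie at least `min_gap` frames apart."""
    n = len(nucleus_prob)
    candidates = []
    for t in range(n):
        p = nucleus_prob[t]
        left = nucleus_prob[t - 1] if t > 0 else float("-inf")
        right = nucleus_prob[t + 1] if t < n - 1 else float("-inf")
        if p >= threshold and p >= left and p > right:
            candidates.append(t)
    # Greedy suppression: keep the strongest peaks first, then drop any
    # candidate that falls within min_gap frames of a peak already kept.
    kept: List[int] = []
    for t in sorted(candidates, key=lambda i: nucleus_prob[i], reverse=True):
        if all(abs(t - k) >= min_gap for k in kept):
            kept.append(t)
    return sorted(kept)

def nucleus_features(frames, nucleus_prob, **kwargs):
    """Keep only the per-frame feature vectors located at detected nuclei."""
    return [frames[t] for t in pick_syllabic_nuclei(nucleus_prob, **kwargs)]

# Hypothetical detector output for 15 frames.
probs = [0.1, 0.3, 0.8, 0.6, 0.2, 0.1, 0.4, 0.9, 0.7, 0.2, 0.1, 0.1, 0.6, 0.55, 0.1]
print(pick_syllabic_nuclei(probs))  # [2, 7, 12]
```

Raising `min_gap` trades missed nuclei in fast speech for fewer spurious peaks within one syllable.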
20. A method for automatically identifying the language of a speaker as corresponding to one of a plurality of languages, including the steps of:
processing spoken text, whereby said spoken text is converted into frames of speech and wherein variations in input speech signals extrinsic to those introduced by a speaker's vocal tract are attenuated;
detecting and extracting phonetic features that are syllabic nuclei from said frames of input speech;
comparing said phonetic speech features with stored reference phonetic speech features and establishing a match score for said comparison proportional to a degree of similarity between said phonetic speech features and said stored reference phonetic speech features; and
identifying said input speech as corresponding to one of a plurality of languages, whereby said language identification for said input speech is established on the basis of a comparison of said match score with at least one predetermined threshold score associated with at least one of said plurality of languages, wherein said match score and said at least one predetermined threshold score are both of a type selected from the group consisting of a minimum score, an average score and a combination minimum-average score.
(Dependent claims: 23, 24, 25, 26, 27, 28, 29)
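Claim 20 requires the match score and the per-language threshold to be of the same type: minimum, average, or combination minimum-average. The claims describe the match score as proportional to similarity; this sketch instead treats per-speaker scores as distances (lower is closer) so that the "minimum score" option picks the best-matched speaker. With similarity scores, swap min/max and flip the threshold comparison. The combination is modelled as a hypothetical weighted blend of the minimum and the average; function names, `weight`, and the toy numbers are illustrative only.

```python
from typing import Dict, Optional, Sequence

def aggregate(scores: Sequence[float], kind: str, weight: float = 0.5) -> float:
    """Combine per-speaker match scores (assumed distances) into one score."""
    lo = min(scores)
    avg = sum(scores) / len(scores)
    if kind == "min":
        return lo
    if kind == "average":
        return avg
    if kind == "min-average":
        # Hypothetical combination: weighted blend of minimum and average.
        return weight * lo + (1.0 - weight) * avg
    raise ValueError(f"unknown score type: {kind}")

def identify_language(per_language: Dict[str, Sequence[float]],
                      thresholds: Dict[str, float],
                      kind: str = "average") -> Optional[str]:
    """Accept each language whose aggregate score is within its threshold (same
    score type for both), then return the closest accepted language, or None."""
    accepted = {}
    for lang, scores in per_language.items():
        s = aggregate(scores, kind)
        if s <= thresholds[lang]:
            accepted[lang] = s
    return min(accepted, key=accepted.get) if accepted else None

# Toy distances of the well-matched speakers in each language.
per_language = {"en": [0.2, 0.4, 0.3], "fr": [0.1, 0.9, 0.8]}
thresholds = {"en": 0.4, "fr": 0.4}
for kind in ("min", "average", "min-average"):
    print(kind, identify_language(per_language, thresholds, kind))
# min fr
# average en
# min-average en
```

The example shows why the score type matters: a single very close French speaker wins under the minimum score, while the more consistent English set wins under the average and blended scores.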
Specification