Automatic language identification/verification system

US 5,689,616 A
Filed: 06/28/1996
Issued: 11/18/1997
Est. Priority Date: 11/19/1993
Status: Expired due to Term

Abstract

A language identification and verification system is described in which the language is determined by finding the closest match of a speech utterance to multiple speaker sets. The system uses a speaker identification/verification system as a baseline to find a set of well-matched speakers in each of a plurality of languages. Unknown speech is then compared to speech features from these well-matched speakers, and a language decision is arrived at based on the closest match between the unknown speech features and the speech features of the well-matched reference speakers in a particular language. Prior-art language identification systems base their speech features on short-term spectral features determined at the system frame rate, which seriously limits their resolution and accuracy. To avoid this problem, the invention uses speech features derived from vocalic or syllabic nuclei, from which related phonetic speech features may then be extracted. Detection of such vocalic centers or syllabic nuclei is accomplished using a trained back-error propagation multi-level neural network.
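The closest-match idea in the abstract can be illustrated with a minimal sketch. It assumes that each reference speaker is represented by a list of fixed-length feature vectors and that a smaller Euclidean distance means a better match; the function names, the toy 2-D vectors, and the use of a mean nearest-neighbor distance as the per-speaker score are all illustrative assumptions, not details taken from the patent.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def speaker_distance(unknown, reference):
    """Mean distance from each unknown vector to its nearest reference vector."""
    nearest = [min(euclidean(u, r) for r in reference) for u in unknown]
    return sum(nearest) / len(nearest)

def closest_language(unknown, speakers_by_language):
    """Return (language, distance) of the best-matching reference speaker."""
    best = None
    for language, speakers in speakers_by_language.items():
        for reference in speakers:
            d = speaker_distance(unknown, reference)
            if best is None or d < best[1]:
                best = (language, d)
    return best

# Hypothetical toy data: two reference speakers for one language, one for another.
references = {
    "english": [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]],
    "spanish": [[(5.0, 5.0), (6.0, 5.0)]],
}
unknown = [(0.9, 0.1), (0.1, 0.0)]
language, distance = closest_language(unknown, references)
```

In this toy setup the unknown vectors lie close to the first "english" speaker, so that language is selected. A real system would derive the vectors from detected syllabic nuclei rather than from hand-written points.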

29 Claims

1. A Language Verification System comprising:
- means for processing spoken text entered into the system whereby spoken text is converted into frames of speech, and wherein variations in input speech signals extrinsic to those introduced by a speaker's vocal tract are attenuated;
  
  means for detecting and extracting phonetic speech features that are syllabic nuclei, from said frames of speech;
  
  matching means for comparing said phonetic speech features with stored reference phonetic speech features and establishing a match score for said comparison proportional to degree of similarity between said phonetic speech features and said stored reference phonetic speech features; and
  
  decision means for identifying said input speech as corresponding to one of a plurality of languages, whereby said language identification for said input speech is established on the basis of a comparison of said match score with at least one predetermined threshold score associated with at least one of said plurality of languages, said decision means encompasses a scoring methodology wherein multiple matched speakers within and across a multiplicity of languages are identified as to a language spoken based on a score selected from the group consisting of a minimum score, an average score and a combination minimum-average score.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 21, 22)
  - 2. The Language Verification system of claim 1, wherein said detection of syllabic nuclei occurs through processing said input speech by a trained back error propagation multi-layer neural network.
  - 3. The Language Verification System of claim 1, wherein feature information respecting said speech input is realized by encoding a plurality of frames of said input speech from a sequence of said frames proximate to the syllabic nuclei.
  - 4. The Language Verification System of claim 1, wherein said matching means is implemented as a determination of nearest-neighbor Euclidian distances between said phonetic speech features and said stored reference phonetic speech features.
  - 5. The Language Verification System of claim 1, wherein said matching means is implemented in a multi-level artificial neural network using back error propagation.
  - 6. The Language Verification System of claim 5, wherein said multi-level artificial neural network operates at a first level to discriminate each of a plurality of languages at a single syllable level and at a second level to further separate said plurality of languages by sequential patterns.
  - 7. The Language Verification System of claim 5, wherein said back-error propagation neural network is implemented through a multiplicity of levels for a language discrimination, thereby causing said language verification system to be speaker independent.
  - 8. The Language Verification System of claim 1, wherein said means for extracting phonetic speech features and said matching means, in combination, are implemented in a Hidden Markov Model wherein a separate model is defined for each of a plurality of languages.
  - 9. The Language Verification System of claim 8, wherein said Hidden Markov Model is based on syllabic speech features and structure, and wherein said model is speaker independent.
  - 11. The Language Verification System of claim 3, wherein said detection of syllabic nuclei occurs through processing said input speech by a trained back error propagation multi-level neural network.
  - 12. The Language Verification System of claim 3, wherein feature information respecting said input speech is realized by encoding a plurality of frames of said input speech from a sequence of said frames proximate to said syllabic nuclei.
  - 21. The automatic language identification method of claim 2, wherein the detection of syllabic nuclei occurs through processing said input speech by a trained back error propagation multi-level neural network.
  - 22. The automatic language identification method of claim 2, wherein feature information respecting said input speech is realized by encoding a plurality of frames of said input speech from a sequence of said frames proximate to said syllabic nuclei.

10. In a Language Verification System comprising a means for processing spoken text into frames of speech, a means for detecting and extracting speech features from said frames of speech, matching means for comparing said speech features with stored reference speech features and establishing a matched score for said comparison proportional to a degree of similarity between said speech features and said stored reference speech features, and decision means for identifying input speech to said system as corresponding to one of a plurality of languages, the improvement therewith comprising:
- means operable with said means for detecting and extracting speech features that are syllabic nuclei, to identify phonetic speech;
  
  means operable with said matching means to establish a match score proportional to a degree of similarity between the phonetic speech features and stored reference phonetic speech features; and
  
  means operable with said decision means whereby said language identification for said input speech is established on the basis of a comparison of said matched scores with at least one predetermined threshold score associated with one of said plurality of languages, said means operable with said decision means encompasses a scoring methodology wherein multiple matched speakers within and across a multiplicity of languages are identified as to a language spoken based on a score selected from the group consisting of a minimum score, an average score and a combination minimum-average score.
- Dependent Claims (13, 14, 15, 16, 17, 18, 19)
  - 13. The Language Verification System of claim 10, wherein said matching means is implemented as a determination of nearest-neighbor Euclidian distances between said phonetic speech features and said stored reference phonetic speech features.
  - 14. The Language Verification System of claim 10, wherein said matching means is implemented in a multi-level artificial neural network using back error propagation.
  - 15. The Language Verification System of claim 14, wherein said multi-level artificial neural network operates at a first level to discriminate each of a plurality of languages at a single syllable level and at a second level to further separate said plurality of languages by sequential patterns.
  - 16. The Language Verification System of claim 14, wherein said back-error propagation neural network is implemented through a multiplicity of levels for a language discrimination, thereby causing said language verification system to be speaker independent.
  - 17. The Language Verification System of claim 10, wherein said means for extracting phonetic speech features and said matching means, in combination, are implemented in a Hidden Markov Model wherein a separate model is defined for each of a plurality of languages.
  - 18. The Language Verification System of claim 17, wherein said Hidden Markov Model is based on syllabic speech features and structure, and wherein said model is speaker independent.
  - 19. The Language Verification System of claim 10, wherein said decision means encompasses a scoring methodology wherein multiple matched speakers within and across a multiplicity of languages are identified as to a language spoken based on a minimum score, an average score or a combination thereof.
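
The minimum, average, and combined scoring options named in the claims can be sketched as follows. The claims do not define how the "combination minimum-average" score is formed, so the weighted mix below (and its `weight` parameter) is purely an assumption, as are the function names, the example scores, and the convention that scores are distances where lower is better and a language is accepted only if its score does not exceed its threshold.

```python
def language_score(speaker_scores, method="min", weight=0.5):
    """Collapse per-speaker distance scores for one language into one score."""
    if not speaker_scores:
        raise ValueError("at least one speaker score is required")
    lowest = min(speaker_scores)
    mean = sum(speaker_scores) / len(speaker_scores)
    if method == "min":
        return lowest
    if method == "average":
        return mean
    if method == "min-average":
        # Assumed combination: a weighted mix of the minimum and the average.
        return weight * lowest + (1.0 - weight) * mean
    raise ValueError(f"unknown scoring method: {method}")

def identify(scores_by_language, thresholds, method="min"):
    """Return the best-scoring language if it passes its threshold, else None."""
    scored = {lang: language_score(s, method)
              for lang, s in scores_by_language.items()}
    best = min(scored, key=scored.get)
    return best if scored[best] <= thresholds[best] else None

# Hypothetical distances from unknown speech to three matched speakers per language.
scores = {"english": [0.2, 0.4, 0.9], "spanish": [0.3, 0.35, 0.4]}
thresholds = {"english": 0.5, "spanish": 0.5}

by_min = identify(scores, thresholds, method="min")
by_average = identify(scores, thresholds, method="average")
rejected = identify(scores, {"english": 0.1, "spanish": 0.1}, method="min")
```

With these example numbers the minimum score favors the language with the single closest speaker, while the average favors the language whose speakers are consistently close, so the two methods disagree. Tightening the thresholds leads to a rejection (`None`), which corresponds to the verification use of the threshold comparison.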

20. A method for automatically identifying the language of a speaker as corresponding to one of a plurality of languages, including the steps of:
- processing spoken text, whereby said spoken text is converted into frames of speech and wherein variations in input speech signals extrinsic to those introduced by a speaker's vocal tract are attenuated;
  
  detecting and extracting phonetic features that are syllabic nuclei from said frames of input speech;
  
  comparing said phonetic speech features with stored reference phonetic speech features and establishing a match score for said comparison proportional to a degree of similarity between said phonetic speech features and said stored reference phonetic speech features; and
  
  identifying said input speech as corresponding to one of a plurality of languages, whereby said language identification for said input speech is established on the basis of a comparison of said match score with at least one predetermined threshold score associated with at least one of said plurality of languages, wherein said match score and said at least one predetermined threshold score are both of a type selected from the group consisting of a minimum score, an average score and a combination minimum-average score.
- Dependent Claims (23, 24, 25, 26, 27, 28, 29)
  - 23. The automatic language identification method of claim 20, wherein said comparing step is implemented as a determination of nearest-neighbor Euclidian distances between said phonetic speech features and said stored reference phonetic speech features.
  - 24. The automatic language identification method of claim 20, wherein said comparing step is implemented in a multi-level artificial neural network using back error propagation.
  - 25. The automatic language identification method of claim 24, wherein said multi-level artificial neural network operates at a first level to discriminate each of a plurality of languages at a single syllable level and at a second level to further separate said plurality of languages by sequential patterns.
  - 26. The automatic language identification method of claim 24, wherein said syllabic-delayed back-error propagation neural network is implemented through a multiplicity of levels for a language discrimination, thereby causing said language verification system to be speaker independent.
  - 27. The automatic language identification method of claim 20, wherein the step of extracting phonetic speech features and the comparing step, in combination, are implemented in a Hidden Markov Model wherein a separate model is defined for each of a plurality of languages.
  - 28. The automatic language identification method of claim 27, wherein said Hidden Markov Model is based on syllabic speech features and structure, and wherein said model is speaker independent.
  - 29. The automatic language identification method of claim 20, wherein said identifying step encompasses a scoring methodology wherein multiple matched speakers within and across a multiplicity of languages are identified as to a language spoken based on a minimum score, an average score or a combination thereof.
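
The first step of the method, converting speech into frames and attenuating variations that do not come from the speaker's vocal tract, can be sketched as below. The claims do not say how the attenuation is performed; subtracting the per-dimension mean across frames (as in cepstral mean subtraction, which removes a constant offset from log-domain features) is used here only as one assumed technique. The function names, frame length, and hop size are likewise hypothetical.

```python
def split_into_frames(samples, frame_length, hop):
    """Cut a sample sequence into overlapping fixed-length frames."""
    last_start = len(samples) - frame_length
    return [samples[i:i + frame_length] for i in range(0, last_start + 1, hop)]

def subtract_mean(feature_vectors):
    """Remove the per-dimension mean across frames.

    A constant offset added to every frame (for example a fixed channel
    bias in log-domain features) cancels out after this step.
    """
    count = len(feature_vectors)
    dims = len(feature_vectors[0])
    means = [sum(v[d] for v in feature_vectors) / count for d in range(dims)]
    return [[v[d] - means[d] for d in range(dims)] for v in feature_vectors]

# Ten samples, frames of four samples, advancing two samples at a time.
frames = split_into_frames(list(range(10)), frame_length=4, hop=2)

# Two hypothetical per-frame feature vectors, with and without a constant offset.
clean = subtract_mean([[1.0, 10.0], [3.0, 14.0]])
shifted = subtract_mean([[6.0, 15.0], [8.0, 19.0]])
```

Both `clean` and `shifted` normalize to the same vectors, which is the sense in which a fixed offset is attenuated. Variations that change over time would need a different treatment.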

Current Assignee: Micron Technology, Inc.
Original Assignee: ITT Corporation (ITT, Inc.)
Inventors: Li, Kung-Pu
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): MCFADDEN, SUSAN IRIS
Application Number: US08/672,306
Time in Patent Office: 508 Days
Field of Search: 395/2.4, 395/2.41, 395/2.52, 395/2.63, 395/2.64, 395/2.79
US Class Current: 704/232
CPC Class Codes:
G10L 15/005 Language recognition
G10L 25/30 using neural networks
