Automatic spoken language identification based on phoneme sequence patterns
Abstract
A language identification system that includes a universal phoneme decoder (UPD) is described. The UPD contains a universal phoneme set that both 1) represents all phonemes occurring in the set of two or more spoken languages and 2) captures phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities is calculated in order to identify the most likely phoneme occurring at each time in the audio files, in the set of two or more potential languages on which the UPD was trained. Each statistical language model (SLM) uses the set of unique phoneme patterns created for each language in the set to distinguish between the spoken human languages in the set. A run-time language identifier module identifies the particular human language being spoken by using the linguistic probabilities supplied by the SLMs, which are based on the set of unique phoneme patterns created for each language.
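As a rough illustration of how per-language statistical language models over phoneme patterns could distinguish languages, the sketch below trains one bigram model per language on phoneme sequences and scores a decoded sequence under each. The bigram order, the add-one smoothing, the toy phoneme symbols, and the names `PhonemeBigramSLM` and `identify_language` are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
from collections import Counter


class PhonemeBigramSLM:
    """Add-one-smoothed bigram model over phoneme symbols (illustrative only)."""

    def __init__(self, training_sequences, vocabulary):
        # The end marker can be predicted, so it counts toward the vocabulary size.
        self.vocab = set(vocabulary) | {"</s>"}
        self.bigrams = Counter()
        self.context = Counter()
        for seq in training_sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.context[prev] += 1

    def log_prob(self, seq):
        """Log-probability of a phoneme sequence under this language's model."""
        padded = ["<s>"] + list(seq) + ["</s>"]
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.context[prev] + len(self.vocab)
            total += math.log(numerator / denominator)
        return total


def identify_language(phonemes, models):
    """Return the best-scoring language and all per-language scores."""
    scores = {lang: model.log_prob(phonemes) for lang, model in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical universal phoneme set and toy training data for two languages.
UNIVERSAL_PHONEMES = ["a", "k", "s", "t", "th", "r", "rr"]
models = {
    "lang_a": PhonemeBigramSLM(
        [["th", "a", "t"], ["k", "a", "t"], ["s", "a", "t"]], UNIVERSAL_PHONEMES
    ),
    "lang_b": PhonemeBigramSLM(
        [["rr", "a", "s"], ["k", "a", "rr", "a"], ["t", "a", "rr", "a"]],
        UNIVERSAL_PHONEMES,
    ),
}

best, scores = identify_language(["k", "a", "rr", "a"], models)
print(best, scores)
```

Because both models share one phoneme inventory, the same decoded sequence can be scored under every language, and only the language-specific pattern statistics differ.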
20 Claims
1. A language identification engine, comprising:
a front-end module having an input to receive an audio stream;
a universal phoneme decoder to identify phonemes and phoneme sequences in the audio stream in each of two or more candidate languages;
a run-time language identifier module to receive the phonemes and phoneme sequences identified by the universal phoneme decoder, generate as an output from the universal phoneme decoder a stream of the identified phonemes and phoneme sequences for each of the two or more candidate languages, wherein the streams include a first stream of phonemes from the identified phonemes for a first of the two or more candidate languages, and a second stream of phonemes from the identified phonemes for a second of the two or more candidate languages, determine a confidence rating on an accuracy of an identification of the first candidate language of the two or more candidate languages for the first stream and an accuracy of an identification of the second candidate language of the two or more candidate languages for the second stream, and identify a particular human language being spoken in the received audio stream from the two or more candidate languages based on the confidence ratings; and
a processor to implement the modules making up the language identification engine.
Dependent claims: 2, 3, 4, 5, 6
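The confidence-rating step recited in claim 1 can be pictured with a minimal sketch, assuming each candidate language's phoneme stream arrives with a log-domain score. The softmax normalization, the `PhonemeStream` type, and the function names are hypothetical; the claim only requires that a confidence rating exist for each stream and that the language be chosen based on those ratings.

```python
import math
from dataclasses import dataclass


@dataclass
class PhonemeStream:
    language: str      # candidate language this stream was decoded for
    phonemes: list     # identified phonemes, in order
    log_score: float   # hypothetical per-stream score in the log domain


def confidence_ratings(streams):
    """Normalize per-stream log scores into ratings that sum to 1 (softmax)."""
    peak = max(s.log_score for s in streams)
    # Subtracting the peak keeps exp() numerically stable.
    weights = {s.language: math.exp(s.log_score - peak) for s in streams}
    total = sum(weights.values())
    return {lang: weight / total for lang, weight in weights.items()}


def identify(streams):
    """Pick the candidate language whose stream has the highest rating."""
    ratings = confidence_ratings(streams)
    return max(ratings, key=ratings.get), ratings


streams = [
    PhonemeStream("first_language", ["k", "a", "t"], log_score=-12.0),
    PhonemeStream("second_language", ["k", "a", "th"], log_score=-15.0),
]
language, ratings = identify(streams)
print(language, ratings)
```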
7. A method to identify spoken words in a human language with a language identification engine, comprising:
receiving an audio stream;
identifying, by a universal phoneme decoder, phonemes in the audio stream in each of two or more languages;
generating as an output from the universal phoneme decoder one or more streams of identified phonemes for each of the two or more languages with an associated confidence rating on an accuracy of the identification of the language for each stream, wherein the streams include a first stream of phonemes from the identified phonemes for a first of the two or more languages, and a second stream of phonemes from the identified phonemes for a second of the two or more languages; and
identifying a most likely particular human language being spoken in the received audio stream in the one or more streams of phonemes outputted from the universal phoneme decoder based on a set of unique phoneme patterns created for each language by the universal phoneme decoder and the confidence ratings.
Dependent claims: 8, 9, 10, 11, 12, 13, 14
15. A system including a continuous speech recognition engine hosted on a server that cooperates with a language identification engine, comprising:
an input to receive supplied audio files from a client machine over a wide area network to the server hosting the continuous speech recognition engine; and
wherein the language identification engine includes a front end module having an input to receive the supplied audio files, a universal phoneme decoder to identify phonemes and phoneme sequences in the audio files in each of two or more candidate languages, and a run-time language identifier module to receive the phonemes and phoneme sequences from the universal phoneme decoder, generate as an output from the universal phoneme decoder a stream of the identified phonemes and phoneme sequences for each of the two or more candidate languages, wherein the streams include a first stream of phonemes from the identified phonemes for a first of the two or more candidate languages, and a second stream of phonemes from the identified phonemes for a second of the two or more candidate languages, determine a confidence rating on an accuracy of an identification of the first candidate language of the two or more candidate languages for the first stream and an accuracy of an identification of the second candidate language of the two or more candidate languages for the second stream, and identify at least one of a particular spoken human language and a specific dialect of a spoken human language being spoken in the supplied audio files based on the confidence ratings.
Dependent claims: 16, 17, 18, 19, 20
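Claim 15 allows the result to be a language, a dialect, or both. One way this could work, sketched under stated assumptions, is to sum dialect-level confidence ratings per language and report a dialect only when it clearly dominates its language. The function name, the `(language, dialect)` keys, and the `min_dialect_share` threshold are hypothetical and are not taken from the claim.

```python
from collections import defaultdict


def identify_language_and_dialect(dialect_confidences, min_dialect_share=0.6):
    """Return (language, dialect or None) from (language, dialect) -> rating.

    The dialect is reported only if it holds at least min_dialect_share of
    its language's total confidence (an assumed, illustrative rule).
    """
    per_language = defaultdict(float)
    for (language, _dialect), rating in dialect_confidences.items():
        per_language[language] += rating

    language = max(per_language, key=per_language.get)
    total = per_language[language]
    if total == 0:
        return language, None

    candidates = {
        dialect: rating
        for (lang, dialect), rating in dialect_confidences.items()
        if lang == language
    }
    dialect = max(candidates, key=candidates.get)
    share = candidates[dialect] / total
    return language, (dialect if share >= min_dialect_share else None)


ratings = {
    ("english", "us"): 0.50,
    ("english", "uk"): 0.20,
    ("spanish", "castilian"): 0.30,
}
result = identify_language_and_dialect(ratings)
print(result)
```

When no dialect dominates, the sketch falls back to reporting the language alone, which corresponds to the "at least one of" wording.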
Specification