AUTOMATIC SPOKEN LANGUAGE IDENTIFICATION BASED ON PHONEME SEQUENCE PATTERNS
First Claim
1. A language identification engine, comprising:
a front end module having an input configured to receive an audio stream consisting of a spoken language of at least one of a set of two or more potential languages being spoken in the audio stream under analysis;
a universal phoneme decoder that contains a universal phoneme set that both
1) represents all phonemes occurring in the set of two or more spoken languages, and
2) captures phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities are calculated in order to identify a most likely phoneme occurring each time in the audio stream in the set of two or more potential languages on which the universal phoneme decoder was trained;
one or more statistical language models having logic configured to supply to a run-time language identifier module probabilities of how linguistically likely a particular uttered phoneme identified by the universal phoneme decoder comes from a particular spoken language based on an identified sequence of phonemes, wherein each statistical language model uses linguistic features from the identified phonemes from the universal phoneme decoder including the set of unique phoneme patterns created for each language in the set to distinguish between spoken human languages in the set of two or more spoken languages;
a bank of human language specific databases for the one or more statistical language models to reference, where each of the databases was filled with phoneme and phoneme sequences being trained on for a particular language in the set of two or more spoken languages, and each of the human language specific databases received the phoneme and phoneme sequences from a phone output from the same universal phoneme decoder independent of which spoken language in the set of two or more potential languages was being trained on; and
the run-time language identifier module identifies a particular human language being spoken in the received audio stream in the set of two or more potential languages by utilizing the one or more statistical language models.
Abstract
A language identification system that includes a universal phoneme decoder (UPD) is described. The UPD contains a universal phoneme set that both 1) represents all phonemes occurring in the set of two or more spoken languages, and 2) captures phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities are calculated in order to identify a most likely phoneme occurring each time in the audio files in the set of two or more potential languages on which the UPD was trained. Each statistical language model (SLM) uses the set of unique phoneme patterns created for each language in the set to distinguish between spoken human languages in the set of languages. The run-time language identifier module identifies a particular human language being spoken by utilizing the linguistic probabilities supplied by the one or more SLMs that are based on the set of unique phoneme patterns created for each language.
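The flow described in the abstract can be illustrated with a minimal sketch. It assumes the universal phoneme decoder has already turned audio into phoneme sequences, and it stands in a bigram model with add-one smoothing for each statistical language model; the text above does not specify the model order or the smoothing method, so both are assumptions. The phoneme symbols, the language labels `lang_a` and `lang_b`, and all function names are hypothetical.

```python
import math
from collections import Counter


def bigrams(phonemes):
    """Return adjacent phoneme pairs, padded with start and end markers."""
    padded = ["<s>"] + list(phonemes) + ["</s>"]
    return list(zip(padded, padded[1:]))


def fill_database(decoded_utterances):
    """Count phoneme bigrams and their left contexts for one language.

    Assumption: every utterance was decoded by the same universal
    phoneme decoder, regardless of the language being trained.
    """
    pair_counts, context_counts = Counter(), Counter()
    for phonemes in decoded_utterances:
        for left, right in bigrams(phonemes):
            pair_counts[(left, right)] += 1
            context_counts[left] += 1
    return pair_counts, context_counts


def log_likelihood(phonemes, database, vocab_size):
    """Add-one smoothed bigram log-probability of a phoneme sequence."""
    pair_counts, context_counts = database
    total = 0.0
    for left, right in bigrams(phonemes):
        numerator = pair_counts[(left, right)] + 1
        denominator = context_counts[left] + vocab_size
        total += math.log(numerator / denominator)
    return total


def identify_language(phonemes, databases, vocab_size):
    """Return the best-scoring language and the per-language scores."""
    scores = {
        lang: log_likelihood(phonemes, db, vocab_size)
        for lang, db in databases.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy data: decoder output for two made-up languages.
TRAINING = {
    "lang_a": [["s", "t", "a"], ["s", "t", "o"], ["t", "a", "s"]],
    "lang_b": [["a", "n", "o"], ["o", "n", "a"], ["n", "a", "n"]],
}
UNIVERSAL_PHONEMES = {"s", "t", "a", "o", "n"}
VOCAB_SIZE = len(UNIVERSAL_PHONEMES) + 1  # phonemes plus the end marker

# Each language-specific database is filled one language at a time.
DATABASES = {lang: fill_database(utts) for lang, utts in TRAINING.items()}

if __name__ == "__main__":
    best, scores = identify_language(["s", "t", "a"], DATABASES, VOCAB_SIZE)
    print(best, scores)
```

Scoring in the log domain keeps long phoneme sequences from underflowing, and the smoothing term gives a nonzero probability to phoneme pairs that a given language's database never saw.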
21 Claims
13. A continuous speech recognition engine hosted on a server cooperating with a language identification engine that improves an accuracy of probability estimates, comprising:
an input to receive supplied audio files from a client machine over a wide area network to the server hosting the continuous speech recognition engine;
a user interface to the continuous speech recognition engine; and
a language identification engine, wherein the continuous speech recognition engine receives an input from the language identification engine to identify the human language being spoken in the received audio files;
wherein the language identification engine at least includes
a front end module having an input configured to receive audio files consisting of a spoken language of at least one of a set of two or more potential languages being spoken in the audio files under analysis,
a universal phoneme decoder that contains a universal phoneme set that both
1) represents all phonemes occurring in the set of two or more spoken languages, and
2) captures phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities are calculated in order to identify a most likely phoneme occurring each time in the audio files in the set of two or more potential languages on which the universal phoneme decoder was trained,
one or more statistical language models having logic configured to supply to a run-time language identifier module probabilities of how linguistically likely a particular uttered phoneme identified by the universal phoneme decoder comes from a particular spoken language based on an identified sequence of phonemes, wherein each statistical language model uses linguistic features from the identified phonemes from the universal phoneme decoder including the set of unique phoneme patterns created for each language in the set to distinguish between spoken human languages in the set of two or more spoken languages,
a bank of human language specific databases for the one or more statistical language models to reference, where each of the databases was filled with phoneme and phoneme sequences being trained on for a particular language in the set of two or more spoken languages, and each of the human language specific databases received the phoneme and phoneme sequences from a phone output from the same universal phoneme decoder independent of which spoken language in the set of two or more potential languages was being trained on, and
the run-time language identifier module identifies a particular human language being spoken in the received audio files in the set of two or more potential languages by utilizing the linguistic probabilities supplied by the one or more statistical language models that are based on the set of unique phoneme patterns created for each language.
18. A method for language identification, comprising:
receiving an audio stream consisting of a spoken language of at least one of a set of two or more potential languages being spoken in the audio stream under analysis;
identifying a most likely phoneme occurring each time in the audio stream under analysis in the set of two or more potential languages with a universal phoneme decoder that contains a universal phoneme set that both
1) represents all phonemes occurring in the set of two or more spoken languages, and
2) captures phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities are calculated;
supplying probabilities of how linguistically likely a particular uttered phoneme identified by the universal phoneme decoder comes from a particular spoken language based on an identified sequence of phonemes from one or more statistical language models, wherein each statistical language model uses linguistic features from the identified phonemes from the universal phoneme decoder including the set of unique phoneme patterns created for each language in the set to distinguish between spoken human languages in the set of two or more spoken languages;
referencing a first human language specific database with a first statistical language model, where the first human language specific database was filled with phoneme and phoneme sequences being trained on for a particular language in the set of two or more spoken languages during a training phase, and each of the human language specific databases that includes the first human language specific database received the phoneme and phoneme sequences from a phone output from the same universal phoneme decoder independent of which spoken language in the set of two or more potential languages was being trained on, however, each human language specific database was filled one at a time on a per specific language basis; and
at run-time, identifying a particular human language being spoken in the received audio stream in the set of two or more potential languages by utilizing the linguistic probabilities supplied by the one or more statistical language models that are based on the set of unique phoneme patterns created for each language.
Specification