Automatic spoken language identification based on phoneme sequence patterns
Abstract
A language identification system that includes a universal phoneme decoder (UPD) is described. The UPD contains a universal phoneme set that 1) represents all phonemes occurring in a set of two or more spoken languages, and 2) captures phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities is calculated in order to identify the most likely phoneme occurring each time in the audio files, across the set of two or more potential languages on which the UPD was trained. Each statistical language model (SLM) uses the set of unique phoneme patterns created for each language in the set to distinguish between the spoken human languages in that set. A run-time language identifier module identifies the particular human language being spoken by utilizing the linguistic probabilities supplied by the SLMs, which are based on the set of unique phoneme patterns created for each language.
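The scoring step described in the abstract can be illustrated with a minimal sketch. It assumes the UPD has already turned audio into a sequence of universal phoneme symbols, and it uses an add-one smoothed bigram model as a stand-in for each SLM. The abstract does not specify the model type, so the bigram choice, the class name `PhonemeBigramModel`, the function `identify_language`, the language labels, and the toy phoneme data are all hypothetical.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical start-of-utterance marker


class PhonemeBigramModel:
    """Add-one smoothed bigram model over phoneme symbols (stand-in for an SLM)."""

    def __init__(self, sequences, phoneme_set):
        self.vocab_size = len(phoneme_set)
        self.bigrams = Counter()   # counts of (previous, current) phoneme pairs
        self.contexts = Counter()  # counts of each phoneme used as a context
        for seq in sequences:
            prev = START
            for ph in seq:
                self.bigrams[(prev, ph)] += 1
                self.contexts[prev] += 1
                prev = ph

    def log_prob(self, seq):
        """Log-probability of a phoneme sequence under this language's model."""
        total = 0.0
        prev = START
        for ph in seq:
            num = self.bigrams[(prev, ph)] + 1
            den = self.contexts[prev] + self.vocab_size
            total += math.log(num / den)
            prev = ph
        return total


def identify_language(seq, models):
    """Return the language whose model scores the sequence highest, plus all scores."""
    scores = {lang: model.log_prob(seq) for lang, model in models.items()}
    return max(scores, key=scores.get), scores


# Toy universal phoneme set and training data (illustrative only).
UNIVERSAL = {"a", "k", "o", "r", "s", "t"}
training = {
    "lang_a": [["s", "t", "a"], ["s", "t", "o"], ["t", "a", "s"]],
    "lang_b": [["k", "a", "r"], ["r", "o", "k"], ["k", "o", "r"]],
}
models = {lang: PhonemeBigramModel(seqs, UNIVERSAL) for lang, seqs in training.items()}

best, scores = identify_language(["s", "t", "a", "s"], models)
print(best, scores)
```

Because every model is defined over the same universal phoneme set, the per-language log-probabilities are directly comparable, which is what lets the run-time step reduce to a simple maximum.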
18 Claims
1. A language identification engine, comprising:
a front-end module having an input configured to receive an audio stream that corresponds to at least one of a set of two or more candidate languages being spoken in the audio stream under analysis;
a universal phoneme decoder that contains a universal phoneme set that
1) represents all phonemes occurring in the set of two or more candidate languages, and
2) captures phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities are calculated in order to identify a most likely phoneme occurring for phonemes in the audio stream in the set of two or more candidate languages;
one or more statistical language models having logic configured to supply to a run-time language identifier module probabilities of how linguistically likely a particular uttered phoneme identified by the universal phoneme decoder comes from a particular candidate language based on an identified sequence of phonemes;
wherein the run-time language identifier module identifies a particular human language being spoken in the received audio stream from the set of two or more candidate languages by utilizing the one or more statistical language models, which have been trained by the universal phoneme decoder; and
wherein the modules making up the language identification engine are implemented in electronic circuitry, software coding, and any combination of the two, where portions implemented in software coding are stored in an executable format by a processor on a non-transitory machine-readable medium.
Dependent claims: 2, 3, 4, 5
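As an illustration only (not part of the claim language), one way to picture a universal phoneme set that both covers every candidate language and records cross-language correspondences is a merge of per-language inventories. The claim does not say how the set is built, so the function name `build_universal_set`, the inventory layout, and the symbols below are hypothetical.

```python
def build_universal_set(inventories):
    """Merge per-language inventories into one shared phoneme set.

    `inventories` maps language -> {local_symbol: shared_symbol}. This
    mapping is an assumed input, not something the claim specifies.
    Returns the universal set and, for each shared symbol, the local
    symbols that correspond to it in each language.
    """
    universal = set()
    correspondences = {}
    for lang, mapping in inventories.items():
        for local_symbol, shared_symbol in mapping.items():
            universal.add(shared_symbol)
            per_lang = correspondences.setdefault(shared_symbol, {})
            per_lang.setdefault(lang, []).append(local_symbol)
    return universal, correspondences


# Toy inventories: two languages write the same shared sound "r" differently.
inventories = {
    "lang_a": {"r1": "r", "a": "a"},
    "lang_b": {"rr": "r", "o": "o"},
}
universal, correspondences = build_universal_set(inventories)
print(sorted(universal))
print(correspondences["r"])
```

In this sketch the universal set is the union of all shared symbols, and the correspondence table shows which sounds are common to several languages and which are specific to one.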
6. A method to identify spoken words in a human language with a language identification engine, comprising:
receiving an audio stream that includes a spoken language of at least one or more unidentified human languages being spoken in the audio stream under analysis;
identifying a most likely phoneme occurring each time in the audio stream under analysis with a universal phoneme decoder that
1) contains a universal phoneme set representing phonemes occurring in a set of two or more spoken languages, and
2) captures phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities are calculated;
generating as an output from the universal phoneme decoder one or more streams of identified phonemes with associated confidence ratings on an accuracy of the identification, where a first stream of identified phonemes is customized to at least one of 1) a particular spoken human language and 2) a specific dialect of a spoken human language, and the first stream contains one or more estimations for identifying the spoken phonemes in that particular spoken human language or specific dialect along with a confidence rating, where a second stream of identified phonemes is customized to at least one of 1) a particular spoken human language and 2) a specific dialect of a spoken human language, and the spoken human language or dialect chosen for the second stream is different than the first stream; and
at run-time, identifying a most likely particular human language being spoken in the received audio stream in the one or more streams of phonemes outputted from the universal phoneme decoder by utilizing linguistic probabilities supplied by one or more statistical language models that are based on the set of unique phoneme patterns created for each language by the universal phoneme decoder.
Dependent claims: 7, 8, 9, 10, 11, 12
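As an illustration only (not part of the claim language), the multi-stream step can be sketched as follows. The claim states that each stream carries confidence ratings but does not say how they are combined with the language-model probabilities. The sketch assumes a confidence-weighted average of log-probabilities and uses a simple per-language unigram table as a stand-in for an SLM. `PhonemeStream`, `stream_score`, `pick_language`, the floor value, and the toy numbers are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PhonemeStream:
    """One decoder output stream customized to a language or dialect."""
    language: str
    phonemes: list  # list of (phoneme, confidence in [0, 1]) pairs


def stream_score(stream, unigram_probs, floor=1e-4):
    """Confidence-weighted average log-probability of a stream (assumed rule)."""
    weighted = 0.0
    total_conf = 0.0
    for phoneme, conf in stream.phonemes:
        # Unseen phonemes fall back to a small floor probability.
        weighted += conf * math.log(unigram_probs.get(phoneme, floor))
        total_conf += conf
    return weighted / total_conf if total_conf > 0 else float("-inf")


def pick_language(streams, models):
    """Score each stream against its own language model and keep the best."""
    scores = {s.language: stream_score(s, models[s.language]) for s in streams}
    return max(scores, key=scores.get), scores


# Toy unigram tables standing in for per-language SLMs.
models = {
    "lang_a": {"s": 0.4, "t": 0.4, "a": 0.2},
    "lang_b": {"k": 0.5, "o": 0.3, "r": 0.2},
}
streams = [
    PhonemeStream("lang_a", [("s", 0.9), ("t", 0.8), ("a", 0.9)]),
    PhonemeStream("lang_b", [("k", 0.3), ("t", 0.2), ("o", 0.4)]),
]
chosen, stream_scores = pick_language(streams, models)
print(chosen, stream_scores)
```

Dividing by the total confidence keeps streams of different lengths comparable. A real system might instead penalize low overall confidence, which this sketch does not attempt.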
13. A system including a continuous speech recognition engine hosted on a server that cooperates with a language identification engine in order to improve an accuracy of probability estimates, comprising:
an input to receive supplied audio files from a client machine over a wide area network to the server hosting the continuous speech recognition engine; and
wherein the language identification engine at least includes
a front end module having an input configured to receive the supplied audio files that include a spoken language of at least one of a set of two or more candidate languages being spoken in the supplied audio files under analysis,
a universal phoneme decoder that
1) contains a universal phoneme set representing the phonemes occurring in the set of two or more spoken languages, and
2) captures phoneme correspondences between languages, such that a set of unique phoneme patterns and probabilities are calculated in order to identify a most likely phoneme occurring each time in the audio files in the set of two or more candidate languages in which the universal phoneme decoder was trained on,
one or more statistical language models having logic configured to supply to a run-time language identifier module probabilities of how linguistically likely a particular uttered phoneme identified by the universal phoneme decoder comes from a particular spoken language based on an identified sequence of phonemes, and
where the run-time language identifier module is configured to identify from the set of two or more candidate languages at least one of 1) a particular spoken human language and 2) a specific dialect of a spoken human language being spoken in the supplied audio files by utilizing the linguistic probabilities supplied by the one or more statistical language models that are based on the set of unique phoneme patterns created for each language by the universal phoneme decoder,
wherein the modules making up the language identification engine are implemented in electronic circuits, software coding, and any combination of the two, where portions implemented in software coding are stored in an executable format by a processor on a non-transitory machine-readable medium.
Dependent claims: 14, 15, 16, 17
18. A computing device assisted method to identify spoken words in a human language with a language identification engine, comprising:
receiving in the language identification engine an audio stream that corresponds to at least one of a set of two or more candidate languages being spoken in the audio stream under analysis;
identifying a most likely phoneme occurring each time in the audio stream under analysis with a universal phoneme decoder that uses a universal phoneme set that
1) represents all phonemes occurring in the set of two or more candidate languages, and
2) captures phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities are calculated in order to identify a most likely phoneme occurring for phonemes in the audio stream in the set of two or more candidate languages;
supplying a run-time language identifier module probabilities of how linguistically likely a particular uttered phoneme identified by the universal phoneme decoder comes from a particular candidate language based on an identified sequence of phonemes from one or more statistical language models;
using a run-time language identifier module in the language identification engine to identify a particular human language being spoken in the received audio stream from the set of two or more candidate languages by utilizing the one or more statistical language models, which have been trained by the universal phoneme decoder; and
wherein any modules making up the language identification engine are implemented in electronic circuitry, software coding, and any combination of the two, where portions implemented in software coding are stored in an executable format by a processor on a non-transitory machine-readable medium.
Specification