Automatic spoken language identification based on phoneme sequence patterns

US 8,781,812 B2
Filed: 03/18/2013
Issued: 07/15/2014
Est. Priority Date: 08/04/2009
Status: Active Grant

Abstract

A language identification system that includes a universal phoneme decoder (UPD) is described. The UPD contains a universal phoneme set that both 1) represents all phonemes occurring in the set of two or more spoken languages, and 2) captures phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities is calculated in order to identify a most likely phoneme occurring each time in the audio files in the set of two or more potential languages on which the UPD was trained. Each statistical language model (SLM) uses the set of unique phoneme patterns created for each language in the set to distinguish between spoken human languages in the set of languages. The run-time language identifier module identifies a particular human language being spoken by utilizing the linguistic probabilities supplied by the SLMs that are based on the set of unique phoneme patterns created for each language.
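
The abstract does not spell out how the "unique phoneme patterns" are derived. The following is a minimal sketch of one possible reading, assuming a pattern is a phoneme bigram that appears in one language's training data and in no other language's. The function names, the language labels `lang_a` and `lang_b`, and the toy phoneme sequences are all hypothetical and do not come from the patent.

```python
def ngrams(phonemes, n=2):
    """Return all contiguous n-grams of a phoneme sequence as tuples."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def distinctive_patterns(training, n=2):
    """Map each language to the n-grams seen only in that language's data.

    Assumption: "unique" means "not observed in any other language".
    """
    seen = {
        lang: {gram for seq in seqs for gram in ngrams(seq, n)}
        for lang, seqs in training.items()
    }
    unique = {}
    for lang, grams in seen.items():
        # Union of the n-grams observed in every other language.
        others = set().union(*(g for other, g in seen.items() if other != lang))
        unique[lang] = grams - others
    return unique


# Toy training data: decoded phoneme sequences per language (hypothetical).
training = {
    "lang_a": [["k", "a", "t", "a"], ["t", "a", "k"]],
    "lang_b": [["k", "a", "s", "a"], ["s", "a", "k"]],
}
patterns = distinctive_patterns(training)
```

In this toy data the bigrams `("k", "a")` and `("a", "k")` occur in both languages, so they carry no discriminating information and are dropped from both pattern sets.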

20 Claims

1. A language identification engine, comprising:
- a front-end module having an input to receive an audio stream;
  
  a universal phoneme decoder to identify phonemes and phoneme sequences in the audio stream in each of two or more candidate languages;
  
  a run-time language identifier module to receive the phonemes and phoneme sequences identified by the universal phoneme decoder, generate as an output from the universal phoneme decoder a stream of the identified phonemes and phoneme sequences for each of the two or more candidate languages, wherein the streams include a first stream of phonemes from the identified phonemes for a first of the two or more candidate languages, and a second stream of phonemes from the identified phonemes for a second of the two or more candidate languages, determine a confidence rating on an accuracy of an identification of the first candidate language of the two or more candidate languages for the first stream and an accuracy of an identification of the second candidate language of the two or more candidate languages for the second stream, and identify a particular human language being spoken in the received audio stream from the two or more candidate languages based on the confidence ratings; and
  
  a processor to implement the modules making up the language identification engine.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The language identification engine of claim 1, wherein, to identify the particular human language being spoken in the audio stream, the run-time language identifier module is to query one or more statistical language models cooperating with human language specific databases filled in a training process to observe enough phoneme sequences that correspond to spoken audio so that the run-time language identifier module is able to identify one of the two or more candidate languages.
  - 3. The language identification engine of claim 1, further comprising:
    - a language ID trainer coupled to the universal phoneme decoder to analyze the phonemes and phoneme sequences identified by the universal phoneme decoder, and fill human language specific databases used by one or more statistical language models for each candidate language on a per language basis.
  - 4. The language identification engine of claim 1, wherein the universal phoneme decoder during a training phase is applied to each candidate language in the two or more candidate languages to identify phonemes and phoneme sequences.
  - 5. The language identification engine of claim 4, wherein the phonemes and phoneme sequences identified by the universal phoneme decoder in the training phase are modeled using discrete Markov models.
  - 6. The language identification engine of claim 1 comprising:
    - statistical language models to supply to the run-time language identifier module probabilities of how linguistically likely a particular uttered phoneme identified by the universal phoneme decoder comes from one of the candidate languages, wherein the particular human language being spoken is identified based on the statistical language models.
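
Claim 1 requires a confidence rating per candidate-language stream but does not fix a formula for it. As a hedged illustration only, the sketch below assumes each stream has already been scored with a log-likelihood by its statistical language model, and turns those scores into normalized ratings with a softmax. The softmax, the `min_confidence` threshold, and the example scores are assumptions made for this sketch.

```python
import math


def confidence_ratings(log_likelihoods):
    """Convert per-language log-likelihoods into ratings that sum to 1."""
    peak = max(log_likelihoods.values())
    # Subtracting the peak keeps exp() numerically stable.
    weights = {lang: math.exp(ll - peak) for lang, ll in log_likelihoods.items()}
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}


def identify_language(log_likelihoods, min_confidence=0.0):
    """Pick the top-rated language, or None if it falls below the threshold."""
    ratings = confidence_ratings(log_likelihoods)
    best = max(ratings, key=ratings.get)
    chosen = best if ratings[best] >= min_confidence else None
    return chosen, ratings


# Hypothetical log-likelihoods for the first and second phoneme streams.
scores = {"english": -42.0, "spanish": -45.0}
language, ratings = identify_language(scores)
```

A non-zero `min_confidence` lets a caller decline to label audio when no candidate clearly dominates, which is one plausible use of a per-stream rating.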

7. A method to identify spoken words in a human language with a language identification engine, comprising:
- receiving an audio stream;
  
  identifying, by a universal phoneme decoder, phonemes in the audio stream in each of two or more languages;
  
  generating as an output from the universal phoneme decoder one or more streams of identified phonemes for each of the two or more languages with an associated confidence rating on an accuracy of the identification of the language for each stream, wherein the streams include a first stream of phonemes from the identified phonemes for a first of the two or more languages, and a second stream of phonemes from the identified phonemes for a second of the two or more languages; and
  
  identifying a most likely particular human language being spoken in the received audio stream in the one or more streams of phonemes outputted from the universal phoneme decoder based on a set of unique phoneme patterns created for each language by the universal phoneme decoder and the confidence ratings.
- Dependent claims (8, 9, 10, 11, 12, 13, 14):
  - 8. The method of claim 7, further comprising:
    - identifying a most likely phoneme sequence in the audio stream for each of the two or more languages and dialects being trained on with the universal phoneme decoder, where the universal phoneme decoder during a training phase outputs phonemes and phoneme sequences for that language or dialect being trained on and those phonemes and phoneme sequences are stored into an associated human language specific database.
  - 9. The method of claim 7, further comprising:
    - converting the received audio stream into time coded feature frames for language identification, recognizing, by the universal phoneme decoder, the time coded feature frames as a sequence of phonemes, together with start/end time associated with each feature frame, and identifying, by the universal phoneme detector, the phonemes uttered in each of the two or more languages.
  - 10. The method of claim 9, further comprising:
    - supplying a run-time language identifier module with the phoneme sequence from the universal phoneme decoder in the time coded feature frames, and determining a most probable candidate language based on a language identifying algorithm making use of a set of unique phoneme patterns to each candidate language.
  - 11. The method of claim 7, further comprising:
    - loading language identification parameters for each of the two or more languages to be identified into a run-time language identifier module during a run-time language identification phase, wherein a new utterance is compared to language-dependent statistical models, and a likelihood that a spoken language of uttered phonemes and phoneme sequences matches the two or more languages used to train the language-dependent statistical models is calculated by the run-time language identifier module.
  - 12. The method of claim 11, further comprising:
    - selecting one of the two or more languages as a match to an unknown language being spoken in the audio stream.
  - 13. The method of claim 7, wherein the first stream of phonemes is customized to at least one of the first candidate language and a specific dialect of the first candidate language, and the second stream of phonemes is customized to at least one of the second candidate language and a specific dialect of the second candidate language, wherein the language or dialect of the second stream is different from the language or dialect of the first stream.
  - 14. The method of claim 7, comprising:
    - storing statistical language models to supply probabilities of how linguistically likely a particular uttered phoneme identified by the universal phoneme decoder comes from one of the languages, wherein identifying a most likely particular human language being spoken comprises based on an identified sequence of phonemes, wherein the particular human language being spoken is identified based on the statistical language models.
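
Claim 9 describes recognizing time coded feature frames as a phoneme sequence with start and end times. A minimal sketch of that bookkeeping follows, assuming the decoder has already assigned one phoneme label to each frame and that frames advance in fixed 10 ms steps. Both assumptions, along with the names `FRAME_SECONDS` and `frames_to_segments`, are illustrative and not taken from the patent.

```python
from itertools import groupby

FRAME_SECONDS = 0.01  # assumed frame step of 10 ms


def frames_to_segments(frame_labels, frame_seconds=FRAME_SECONDS):
    """Collapse per-frame phoneme labels into (phoneme, start, end) segments."""
    segments = []
    index = 0  # index of the first frame in the current run
    for phoneme, run in groupby(frame_labels):
        length = len(list(run))
        start = round(index * frame_seconds, 3)
        end = round((index + length) * frame_seconds, 3)
        segments.append((phoneme, start, end))
        index += length
    return segments


# Hypothetical per-frame decoder output for a short utterance.
frames = ["h", "h", "h", "e", "e", "l", "l", "l", "l", "o"]
segments = frames_to_segments(frames)
```

The resulting phoneme order can be passed on for language scoring, while the timestamps remain available for locating a phoneme in the audio.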

15. A system including a continuous speech recognition engine hosted on a server that cooperates with a language identification engine, comprising:
- an input to receive supplied audio files from a client machine over a wide area network to the server hosting the continuous speech recognition engine; and
  
  wherein the language identification engine includes a front end module having an input to receive the supplied audio files, a universal phoneme decoder to identify phonemes and phoneme sequences in the audio files in each of two or more candidate languages, and a run-time language identifier module to receive the phonemes and phoneme sequences from the universal phoneme decoder, generate as an output from the universal phoneme decoder a stream of the identified phonemes and phoneme sequences for each of the two or more candidate languages, wherein the streams include a first stream of phonemes from the identified phonemes for a first of the two or more candidate languages, and a second stream of phonemes from the identified phonemes for a second of the two or more candidate languages, determine a confidence rating on an accuracy of an identification of the first candidate language of the two or more candidate languages for the first stream and an accuracy of an identification of the second candidate language of the two or more candidate languages for the second stream, and identify at least one of a particular spoken human language and a specific dialect of a spoken human language being spoken in the supplied audio files based on the confidence ratings.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The system of claim 15, further comprising:
    - a set of two or more human language specific databases, wherein the databases are to receive phoneme and phoneme sequences for a particular language in the two or more candidate languages from the universal phoneme decoder.
  - 17. The system of claim 15, wherein a language identification algorithm in the run-time language identifier module includes a second order discrete Markov model with a dialogue structure and branch logic, and the language identification algorithm uses the second order Markov model based on a set of phoneme and phoneme sequences associated with a particular language.
  - 18. The system of claim 15, further comprising:
    - a query input to receive query words of interest from a user of the client machine to a user interface of the continuous speech engine, and an intelligence engine to identify words from the query words and to return a hierarchical rank list of recognized words.
  - 19. The system of claim 15, wherein the continuous speech recognition engine further comprises:
    - a triggering and synchronization module to analyze call center audio conversations and identify when certain words of interest are spoken, wherein the triggering and synchronization module is to direct a user on the client machine to a time segment containing those words matching the words of interest and allow the user to listen to a segment of the supplied audio files associated with when those words of interest are spoken in the supplied audio files.
  - 20. The system of claim 15, comprising:
    - statistical language models to supply to the run-time language identifier module probabilities of how linguistically likely a particular uttered phoneme identified by the universal phoneme decoder comes from one of the candidate languages, wherein the particular human language being spoken is identified based on the statistical language models.
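
Claim 17 refers to a second order discrete Markov model over phoneme sequences. The sketch below illustrates only that part: each phoneme is conditioned on the two preceding phonemes, with add-one smoothing so unseen sequences keep a non-zero probability. The dialogue structure and branch logic named in the claim are not modelled here. The class name, the smoothing choice, the `<s>` padding symbol, and the toy data are assumptions for illustration.

```python
import math
from collections import defaultdict


class SecondOrderPhonemeModel:
    """Estimates P(p_t | p_{t-2}, p_{t-1}) from decoded phoneme sequences."""

    def __init__(self, sequences):
        self.context_counts = defaultdict(int)
        self.trigram_counts = defaultdict(int)
        self.vocab = set()
        for seq in sequences:
            self.vocab.update(seq)
            padded = ["<s>", "<s>"] + list(seq)  # two start symbols as context
            for i in range(2, len(padded)):
                context = (padded[i - 2], padded[i - 1])
                self.context_counts[context] += 1
                self.trigram_counts[context + (padded[i],)] += 1

    def log_likelihood(self, seq, vocab_size):
        """Sum of smoothed log-probabilities of each phoneme given two predecessors."""
        padded = ["<s>", "<s>"] + list(seq)
        total = 0.0
        for i in range(2, len(padded)):
            context = (padded[i - 2], padded[i - 1])
            numerator = self.trigram_counts.get(context + (padded[i],), 0) + 1
            denominator = self.context_counts.get(context, 0) + vocab_size
            total += math.log(numerator / denominator)
        return total


# Hypothetical per-language training sequences.
training = {
    "lang_a": [["k", "a", "t", "a"], ["t", "a", "k", "a"]],
    "lang_b": [["s", "a", "k", "o"], ["k", "o", "s", "a"]],
}
models = {lang: SecondOrderPhonemeModel(seqs) for lang, seqs in training.items()}
vocab_size = len(set().union(*(m.vocab for m in models.values())))

utterance = ["t", "a", "k", "a"]
scores = {lang: m.log_likelihood(utterance, vocab_size) for lang, m in models.items()}
best = max(scores, key=scores.get)
```

Sharing one `vocab_size` across all models keeps the smoothed scores comparable between languages, in keeping with the single universal phoneme set the patent describes.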

Current Assignee: Micro Focus IP Development Ltd. (Open Text Corporation)
Original Assignee: Longsand Limited (Open Text Corporation)
Inventors: Kadirkamanathan, Mahapathy; Waple, Christopher John
Primary Examiner(s): Godbold, Douglas
Application Number: US13/846,316
Publication Number: US 20130226583A1
Time in Patent Office: 484 Days
Field of Search: 704 2- 8
US Class Current: 704/8
CPC Class Codes: G10L 15/005 Language recognition; G10L 15/187 Phonemic context, e.g. pron...
