Method for identifying the language of individual words
Abstract
The method of recognizing the language of a single word has applications in spelling and grammar correction (e.g., identifying the appropriate language resources on a document, paragraph, sentence or even individual word basis), the automatic invocation of transliteration software based on the language of the words (e.g., automatic ASCII-to-Kanji substitution without requiring the user to explicitly switch into a Kanji mode), the automatic invocation of appropriate machine translation tools when the document's language differs from the user's native tongue(s), the use of document language identification to eliminate from database or web search results any documents not written in the user's native language, and the automatic identification of user-appropriate languages for the user interface.
12 Claims
1. A computer implemented method of determining if a word is from a target language comprising the steps of:
- decomposing the word into a plurality of non-overlapping n-grams covering the entire word without gaps and without crossing word boundaries and including a first n-gram, one or more following n-grams, if present, and a last n-gram,
- determining if the first n-gram, one or more of the following n-grams, if present, and the last n-gram match non-overlapping n-gram patterns characteristic of words in the target language, and
- identifying the word as from the target language if the plurality of non-overlapping n-grams match the non-overlapping n-gram patterns characteristic of words in the target language.
repeating the steps for each word in a sequence of words of not more than about five words, and selecting an appropriate language resource if at least one word in the sequence of words is identified as being of the target language.
10. The method of claim 1, further including the steps of:
- repeating the steps for each word in a sequence of words of not more than about five words, and
- selecting a language of a computer user interface if at least one word in the sequence of words is identified as being of the target language.
11. The method of claim 1, further including the steps of:
- repeating the steps for each word in a sequence of words of not more than about five words, and
- selecting a source language of a computer translation program if at least one word in the sequence of words is identified as being of the target language.
12. The method of claim 1, further including the steps of:
- repeating the steps for each word in a sequence of words of not more than about five words in a document query, and
- selecting a language of documents to be retrieved in an information retrieval system if at least one word from the document query is identified as being of the target language.
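As a rough illustration of the steps recited in claim 1 and in the sequence-level dependent claims, the following Python sketch decomposes a word into fixed-size, non-overlapping n-grams and checks them against position-specific pattern tables. The fixed n-gram size, the shorter trailing n-gram, the three-table layout, the contents of `TOY_PATTERNS`, the reading of "not more than about five words" as a hard cap, and all function names are assumptions made for this example; the claims do not specify them.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical toy tables, invented for illustration only.
# They are not real linguistic data and do not come from the patent.
TOY_PATTERNS: Dict[str, Set[str]] = {
    "first": {"su", "to", "ka"},
    "following": {"sh", "ky", "ra"},
    "last": {"i", "o", "te"},
}


def decompose(word: str, n: int = 2) -> List[str]:
    """Split a word into consecutive, non-overlapping n-grams.

    The n-grams cover the whole word without gaps and never reach past
    its boundaries. Assumption: the final n-gram may be shorter than n
    when len(word) is not a multiple of n.
    """
    return [word[i:i + n] for i in range(0, len(word), n)]


def is_target_language(word: str, patterns: Dict[str, Set[str]], n: int = 2) -> bool:
    """Return True if every n-gram of the word matches its positional table."""
    grams = decompose(word.lower(), n)
    if len(grams) < 2:
        # The claim recites a plurality of n-grams (at least a first and a last).
        return False
    first, *following, last = grams
    return (
        first in patterns["first"]
        and all(g in patterns["following"] for g in following)
        and last in patterns["last"]
    )


def any_word_matches(
    words: Sequence[str],
    patterns: Dict[str, Set[str]],
    n: int = 2,
    max_words: int = 5,
) -> bool:
    """Repeat the per-word test over a short sequence of words.

    Returns True if at least one of the first `max_words` words is
    identified as being of the target language; a caller could then
    select a language resource, UI language, translation source
    language, or retrieval language accordingly.
    """
    return any(is_target_language(w, patterns, n) for w in words[:max_words])
```

With these toy tables, `is_target_language("sushi", TOY_PATTERNS)` returns `True` because the word decomposes into `["su", "sh", "i"]`, while `"window"` decomposes into `["wi", "nd", "ow"]` and is rejected at the first n-gram. A real system would need pattern tables derived from actual text in the target language.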
Specification