Fast, efficient hardware mechanism for natural language determination
Abstract
A language in which a document is written is identified by comparing the words of the document to the most frequently used words in a plurality of candidate languages. The words are stored in a plurality of sets of word tables, each set of word tables storing a selected set of most frequently used words in a respective candidate language according to letter pairs in the words. In the preferred embodiment, each of the word tables is an N×N bit table, where each bit represents a given letter pair at a particular place in one of the most frequently used words in a respective candidate language. A set of table access registers is used to access the word tables and compare words from the document to the words stored in them; each table access register accesses the word tables for a respective candidate language. One or more word counting registers count the number of matches for a respective candidate language. A comparator selects the candidate language corresponding to the word counting register with the highest count as the language in which the document is written.
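The table layout described in the abstract can be illustrated in software. The following is a minimal sketch, not the patented hardware: it assumes N = 27 (the letters a to z plus a blank used to mark the end of a word), one table per pair of adjacent letter positions, and a maximum word length of five. The names `ALPHABET`, `MAX_LEN`, `new_table_set`, `letter_pairs`, `store_word` and `contains_word` are all hypothetical.

```python
# Assumptions (not from the patent text): 26 letters plus a blank pad,
# adjacent letter pairs only, and words of at most MAX_LEN letters.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
N = len(ALPHABET)   # N = 27 in this sketch
MAX_LEN = 5


def new_table_set():
    """Return one N x N bit table per letter position; each row is an int used as N bits."""
    return [[0] * N for _ in range(MAX_LEN)]


def letter_pairs(word):
    """Return (position, row, column) for each adjacent letter pair, or None if not encodable."""
    letters = word.lower()
    if not 1 <= len(letters) <= MAX_LEN:
        return None
    if any(ch not in ALPHABET[:-1] for ch in letters):
        return None
    padded = letters + " "  # the trailing blank encodes where the word ends
    return [
        (i, ALPHABET.index(padded[i]), ALPHABET.index(padded[i + 1]))
        for i in range(len(padded) - 1)
    ]


def store_word(tables, word):
    """Set one bit per letter pair of the word."""
    pairs = letter_pairs(word)
    if pairs is None:
        raise ValueError(f"cannot encode {word!r}")
    for position, row, column in pairs:
        tables[position][row] |= 1 << column


def contains_word(tables, word):
    """A word matches only if every one of its letter-pair bits is set."""
    pairs = letter_pairs(word)
    if pairs is None:
        return False
    return all((tables[position][row] >> column) & 1 for position, row, column in pairs)
```

After `store_word(tables, "the")`, the call `contains_word(tables, "the")` returns `True`, while `"then"` and `"th"` do not match because their final pairs were never set. In this sketch a word that was never stored can still match when each of its pairs was set by some other stored word: storing "ban" and "cat" makes "bat" match. That is one reason the choice of stored words matters in a scheme of this kind.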
74 Citations
17 Claims
1. A system for identifying a language in which a document is written, comprising:
- a plurality of sets of word tables, each set of word tables for storing a selected set of most frequently used words in a respective candidate language according to letter pairs in the words, wherein each of the word tables is an N×N bit table, wherein each bit represents a given letter pair at a particular place in one of the most frequently used words in a respective candidate language;
- a set of table access registers, each table access register for a respective candidate language, for accessing a respective set of word tables to compare words from the document to words stored in the word tables;
- a set of word counting registers, each word counting register for counting a number of matches for a respective candidate language; and
- a comparator for selecting a candidate language which corresponds to the word counting register having the highest count as the language in which the document is written.
Dependent claims: 2, 3, 4, 5
6. A system for identifying a language in which a document is written, comprising:
- a set of word tables for storing a selected set of most frequently used words in a candidate language according to letter pairs in the words, wherein each of the word tables is an N×N bit table, wherein each bit represents a given letter pair at a particular place in one of the most frequently used words in a respective candidate language;
- a table access register for accessing a respective set of word tables to compare words from the document to words stored in the word tables; and
- a word counting register for counting a number of matches for a candidate language.
Dependent claims: 7, 8, 9, 10, 11
12. A method for identifying a language in which a document is written, comprising the steps of:
- storing a plurality of sets of word tables, each set of word tables for storing a selected set of most frequently used words in a respective candidate language according to letter pairs in the words, wherein each of the word tables is an N×N bit table, wherein each bit represents a given letter pair at a particular place in one of the most frequently used words in a respective candidate language;
- accessing the word tables with a set of table access registers, each table access register for accessing a respective set of word tables to compare words from the document to words stored in the word tables;
- counting a number of matches with a set of word counting registers, each word counting register for counting a number of matches for a respective candidate language; and
- selecting a candidate language which corresponds to the word counting register having the highest count as the language in which the document is written.
Dependent claims: 13, 14, 15, 16, 17
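The four steps of the method claim (storing, accessing, counting, selecting) can be sketched in software as follows. This is an illustration under stated assumptions, not the claimed hardware: each language's set of bit tables is modelled as a Python set of (position, letter, next letter) triples, where membership stands for a set bit; the word lists are short illustrative samples, not the sets selected in the patent; and the names `COMMON_WORDS`, `letter_pairs`, `build_tables` and `identify_language` are hypothetical.

```python
# Illustrative word lists only (assumption); not the selected sets from the patent.
COMMON_WORDS = {
    "english": ["the", "of", "and", "to", "in", "is", "that"],
    "german": ["der", "die", "und", "in", "den", "von", "das"],
    "french": ["de", "la", "le", "et", "les", "des", "un"],
}


def letter_pairs(word):
    """Return the set of (position, letter, next letter) triples; a blank marks the word end."""
    padded = word.lower() + " "
    return {(i, padded[i], padded[i + 1]) for i in range(len(padded) - 1)}


def build_tables(common_words):
    """Storing step: one set of 'bits' per candidate language."""
    return {
        language: set().union(*(letter_pairs(w) for w in words))
        for language, words in common_words.items()
    }


def identify_language(text, tables):
    """Accessing, counting and selecting steps; returns (language, counts)."""
    counts = {language: 0 for language in tables}  # word counting registers
    for raw in text.split():
        word = raw.strip(".,;:!?\"'()")
        if not word:
            continue
        pairs = letter_pairs(word)
        for language, bits in tables.items():
            # A word matches when all of its letter-pair bits are set.
            if pairs <= bits:
                counts[language] += 1
    # Comparator: highest count wins; ties go to the first language listed.
    best = max(counts, key=counts.get)
    return best, counts
```

For example, `identify_language("the cat is in the house", build_tables(COMMON_WORDS))` selects `"english"`, since "the", "is" and "in" match the English tables while only "in" matches the German ones. Every language is checked for every word here in a loop; the claims describe a separate table access register and counting register per candidate language.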
Specification