Compact encoding of multi-lingual translation dictionaries
Abstract
A computerized multilingual translation dictionary includes a set of words and phrases for each of the languages it contains, plus a mapping that indicates, for each word or phrase in one language, the corresponding translations in the other languages. The set of words and phrases for each language is divided among corresponding concept groups based on an abstract pivot language. The words and phrases are encoded as token numbers assigned by a word-number mapper and laid out in a sequence that can be searched fairly rapidly with a simple linear scan. The complex associations of words and phrases to particular pivot-language senses are represented by including a list of pivot-language sense numbers with each word or phrase. The preferred coding of these sense numbers is a bit vector for each word, where each bit corresponds to a particular pivot element in the abstract language, and the bit is ON if the given word is a translation of that pivot element. Determining whether a word in language 1 translates to a word in language 2 then requires only a bitwise intersection of their associated bit vectors. Each word or phrase is prefixed by its bit-vector token number, so the bit-vector tokens do double duty: they also act as separators between the tokens of one phrase and those of another. A pseudo-Huffman compression scheme is used to reduce the size of the token stream. Because of the frequency skew of the bit-vector tokens, this produces a very compact encoding.
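The bit-vector test described in the abstract can be illustrated with a minimal sketch. The concept group number, the three pivot senses, and the sample English and French entries below are invented for illustration and do not come from the patent; only the idea of intersecting per-word sense vectors within a shared concept group is taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    text: str
    senses: int  # bit i is ON if the word translates pivot sense i of its concept group


# Hypothetical concept group 7 with three pivot senses (assumed for this sketch):
# bit 0 = financial institution, bit 1 = edge of a river, bit 2 = row of objects.
ENGLISH = {7: [Entry("bank", 0b011), Entry("shore", 0b010), Entry("row", 0b100)]}
FRENCH = {7: [Entry("banque", 0b001), Entry("rive", 0b010), Entry("rang", 0b100)]}


def translate(word, source, target):
    """Return target-language words sharing at least one pivot sense with `word`."""
    results = []
    for group_id, entries in source.items():
        for entry in entries:
            if entry.text != word:
                continue
            # Only the corresponding concept group of the target language is examined.
            for candidate in target.get(group_id, []):
                if entry.senses & candidate.senses:  # bitwise intersection is non-empty
                    results.append(candidate.text)
    return results


print(translate("bank", ENGLISH, FRENCH))   # ['banque', 'rive']
print(translate("shore", ENGLISH, FRENCH))  # ['rive']
```

In this sketch an ambiguous word such as "bank" has two bits set, so it matches both French candidates, while "shore" matches only the one that shares its single sense.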
142 Citations
23 Claims
1. A computer-implemented process for translating words or phrases in a first language to a corresponding word or phrase in a second language, comprising the steps:
(a) defining an abstract language comprising a set of concept groups, each concept group in the set comprising a set of words and phrases of the abstract language that have related meanings,
(b) separating the words and phrases of the first language and the second language into a set of concept groups corresponding in meaning to those of the abstract language, each word and phrase in the first and second languages being represented by a number,
(c) determining the words or phrases of the first language that match the word or phrase to be translated,
(d) determining the concept group of each word or phrase found in step (c),
(e) determining a translation vector associated with each word or phrase found in step (c) in the concept group determined in step (d),
(f) determining the concept group of the second language corresponding to that determined in step (d),
(g) comparing the translation vector determined in step (e) with a translation vector of each of the words and phrases in the corresponding concept group of said second language determined in step (f),
(h) reporting the word or phrase of said second language as the translation of the word or phrase in the first language when their associated translation vectors match at least in part,
(i) steps (c) through (h) being implemented by a computer.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10

11. A computer-implemented process for translating words or phrases in a first language to a corresponding word or phrase in a second language, comprising the steps:
(a) creating an abstract language comprising a set of concept groups, each concept group in the set comprising a set of words and phrases of the abstract language that have related meanings,
(b) separating the words and phrases of the first language and the second language into a set of concept groups corresponding in meaning to those of the abstract language,
(c) determining the words or phrases of the first language that match the word or phrase to be translated,
(d) determining the concept group of each word or phrase found in step (c),
(e) determining a first translation vector associated with each word or phrase found in step (c) in the concept group determined in step (d),
(f) determining the concept group of the second language corresponding to that determined in step (d),
(g) determining a plurality of second translation vectors each associated with one word or phrase of the concept group determined in step (f),
(h) said translation vectors being bit vectors comprising at least one of a plurality of bits with each bit representing one possible sense of a set of senses associated with each concept group,
(i) comparing the first translation vector determined in step (e) with each of the second translation vectors determined in step (g),
(j) reporting the word or phrase of said second language as the translation of the word or phrase in the first language when their associated translation vectors match at least in part,
(k) steps (c) through (j) being implemented by a computer.
Dependent claims: 12, 13, 14, 15, 16

17. A computer-implemented process for translating words or phrases in a source first language to a corresponding word or phrase in at least one other target second language, comprising the steps:
(a) defining an abstract third language comprising a set of concept groups, each concept group in the set comprising a set of words and phrases of the abstract language that have related meanings,
(b) separating the words and phrases of the source first language and target second language into a set of concept groups corresponding in meaning to those of the abstract third language,
(c) constructing and storing a number and a bit vector representing each word or phrase in said source first and said target second languages and a sense of each word or phrase, respectively,
(d) comparing the bit vector of the word or phrase of the source first language to be translated with the bit vectors of the words and phrases in the corresponding concept group of said target second language,
(e) reporting the word or phrase of said target second language as the translation of the word or phrase in the source first language when their associated bit vectors have at least one corresponding matched bit,
(f) steps (c) through (e) being implemented by a computer.
Dependent claims: 18, 19, 20

21. A computer-implemented process for translating words or phrases in a first language to a corresponding word or phrase in a second language, comprising the steps:
(a) providing first and second files containing words and phrases in the first and second languages, respectively, each said file representing the sense of each of said words and phrases by a respective first and second translation vector associated therewith, said words and phrases being divided among one or more corresponding concept groups that have related meanings, said translation vectors being bit vectors comprising at least one of a plurality of bits with each bit representing one possible sense of a set of senses associated with each concept group,
(b) inputting the word or phrase in the first language to be translated into the word or phrase of the second language,
(c) scanning the first file to determine a word or phrase therein matching the inputted word or phrase and the concept group of said matching word or phrase,
(d) upon finding a matching word or phrase in the first file, determining its first translation vector,
(e) scanning the corresponding concept group of the second file searching for a second translation vector that indicates a match to the first translation vector determined in step (d),
(f) upon finding a second translation vector in the second file in step (e), reporting the word or phrase associated therewith as the translation of the inputted word or phrase,
(g) steps (c) through (f) being implemented by a computer.
Dependent claims: 22, 23
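The file-scanning steps of claim 21, combined with the abstract's remark that bit-vector tokens also separate one phrase from the next, can be sketched as a linear scan over a token stream. Everything concrete here is an assumption made for illustration: the `GROUP_END` marker, the token numbering, the `VECTORS` table, and the sample word tables are hypothetical, and the pseudo-Huffman compression step is omitted.

```python
GROUP_END = 0  # hypothetical marker token that closes a concept group
# Hypothetical table mapping bit-vector token numbers to bit vectors.
VECTORS = {1: 0b001, 2: 0b010, 3: 0b011, 4: 0b100}

# Hypothetical word-number mappings, one per language.
EN_WORDS = {10: "bank", 11: "river", 13: "row"}
FR_WORDS = {10: "banque", 11: "rive", 13: "rang"}

# Each entry is a bit-vector token followed by the word tokens of one phrase.
EN_STREAM = [3, 10, 2, 11, 10, GROUP_END, 1, 13, GROUP_END]
FR_STREAM = [1, 10, 2, 11, GROUP_END, 1, 13, GROUP_END]


def entries(stream):
    """Yield (group_index, bit_vector, phrase_tokens) from one linear scan."""
    group, vector, phrase = 0, None, []
    for tok in stream:
        if tok == GROUP_END or tok in VECTORS:
            # A bit-vector token (or group end) terminates the previous phrase.
            if vector is not None:
                yield group, vector, tuple(phrase)
            phrase = []
            if tok == GROUP_END:
                group, vector = group + 1, None
            else:
                vector = VECTORS[tok]
        else:
            phrase.append(tok)
    if vector is not None:
        yield group, vector, tuple(phrase)


def translate(phrase, src_stream, src_words, dst_stream, dst_words):
    """Scan the source file for `phrase`, then its concept group in the target file."""
    wanted = tuple(phrase.split())
    found = []
    for group, vec, toks in entries(src_stream):
        if tuple(src_words[t] for t in toks) != wanted:
            continue
        for dst_group, dst_vec, dst_toks in entries(dst_stream):
            if dst_group == group and vec & dst_vec:
                found.append(" ".join(dst_words[t] for t in dst_toks))
    return found


print(translate("bank", EN_STREAM, EN_WORDS, FR_STREAM, FR_WORDS))
print(translate("river bank", EN_STREAM, EN_WORDS, FR_STREAM, FR_WORDS))
```

Because the comparison is restricted to the corresponding concept group, "row" in this sketch does not match "banque" even though both happen to use the same bit-vector token in different groups.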
Specification