Multilingual database creation system and method
Abstract
A method and apparatus for creating a cross-idea database for use in translating documents from a first language into a second language. The database associates words and word strings in the first language with words and word strings in the second language. To create the database, a word or word string in a first document, written in the first language, is translated into the second language using a known translator. The translated word or word string is then compared with a range of words or word strings in a second document written in the second language. The database provides information on how often words in the first language are associated with words in the second language. The method also adjusts the number of words in the range to find a range size that allows the cross-idea database to be created efficiently and accurately.
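One way to read the frequency information described above is as a table of counts: for each first-language word string, how many final ranges in the second document contain each second-language word. The Python sketch below assumes that reading; the function name `build_association_db` and its input format (pairs of a first-language string and the words of one final range) are hypothetical, and each word is counted at most once per range.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def build_association_db(
    aligned: Iterable[Tuple[str, List[str]]],
) -> Dict[str, Counter]:
    """Map each first-language string to counts of second-language words.

    Each input pair is (first-language word string, words of one final range
    in the second document). A word is counted at most once per range.
    """
    db: Dict[str, Counter] = {}
    for source_string, range_words in aligned:
        counts = db.setdefault(source_string, Counter())
        # dict.fromkeys removes duplicates while keeping first-seen order.
        for word in dict.fromkeys(range_words):
            counts[word] += 1
    return db


if __name__ == "__main__":
    db = build_association_db([
        ("red car", ["el", "coche", "rojo", "es", "veloz"]),
        ("red car", ["gusta", "el", "coche", "rojo"]),
    ])
    print(db["red car"]["coche"], db["red car"]["veloz"])  # 2 1
```

In this toy data "el" ties with "coche" and "rojo", which shows that raw counts alone do not separate content words from function words; the abstract does not say how, or whether, such counts are normalized.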
3 Claims
1. A method for creating a cross-idea association database comprising:
providing a first document in a first language and a second document in a second language, wherein said documents include parallel or comparable text with respect to each other;
locating in the first document all occurrences of a recurring word string;
translating the recurring word string into the second language to produce a recurring word string translation;
defining initial testing ranges in the second document corresponding to occurrences of the recurring word string in the first document, wherein the initial testing ranges include a desired number of words;
comparing words in the recurring word string translation with words in the initial testing ranges to identify matching words;
increasing the number of words in the initial testing ranges to form expanded testing ranges and comparing words in the recurring word string translation with words in the expanded testing ranges to identify matching words; and
identifying the expanded testing range as the final range if the number of matching words in the expanded testing ranges is not greater than the number of matching words in the initial testing ranges.
2. A computer device including a processor, a memory coupled to the processor, and a program stored in the memory, wherein the computer device is configured to execute the program and perform the steps of:
locating in a first document all occurrences of a recurring word string, wherein said first document is in a first language;
translating the recurring word string into a second language to produce a recurring word string translation;
defining initial testing ranges in a second document corresponding to occurrences of the recurring word string in the first document, wherein the second document is in the second language and includes parallel text or comparable text with respect to the first document, and wherein the initial testing ranges include a desired number of words;
comparing words in the recurring word string translation with words in the initial testing ranges to identify matching words;
increasing the number of words in the initial testing ranges to form expanded testing ranges and comparing words in the recurring word string translation with words in the expanded testing ranges to identify matching words; and
identifying the expanded testing range as the final range if the number of matching words in the expanded testing ranges is not greater than the number of matching words in the initial testing ranges.
3. A computer readable data storage medium having stored thereon a computer executable program for:
locating in a first document all occurrences of a recurring word string, wherein said first document is in a first language;
translating the recurring word string into a second language to produce a recurring word string translation;
defining initial testing ranges in a second document corresponding to occurrences of the recurring word string in the first document, wherein the second document is in the second language and includes parallel text or comparable text with respect to the first document, and wherein the initial testing ranges include a desired number of words;
comparing words in the recurring word string translation with words in the initial testing ranges to identify matching words;
increasing the number of words in the initial testing ranges to form expanded testing ranges and comparing words in the recurring word string translation with words in the expanded testing ranges to identify matching words; and
identifying the expanded testing range as the final range if the number of matching words in the expanded testing ranges is not greater than the number of matching words in the initial testing ranges.
Specification