Knowledge system method and apparatus
Abstract
A method and apparatus for automating the acquisition, reconstruction, and generation of knowledgebases of associated ideas and using such knowledgebases in many applications including machine translation of human languages, search and retrieval of unstructured text, or other data, based on concept search, voice recognition, data compression, and artificial intelligence systems.
236 Claims
-
1. A method for acquiring a knowledge base of associated ideas comprising the steps of:
-
providing a pair of documents representing the same idea in two different languages, wherein the first of said pair of documents is expressed in a first language, and the second of said pair of documents is expressed in a second language;
receiving a query to be analyzed, wherein said query is expressed in said first language, and wherein said query consists of a word or word string;
analyzing said first of said pair of documents to identify all occurrences of said query in said first of said pair of documents;
selecting a plurality of ranges of words in said second of said pair of documents, wherein said selected ranges correspond to the occurrences of said query in said first of said pair of documents;
calculating the frequency of words and word strings contained in said selected ranges;
tabulating said frequency based on occurrences of all unique words and word strings from said calculating step; and
returning a list of occurrences of all unique words and word strings if said unique words and word strings occur in more than one of the selected ranges using said tabulated frequency. - View Dependent Claims (2)
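The steps of claim 1 can be illustrated with a minimal Python sketch. The claim does not say how the ranges in the second document are chosen, so this sketch assumes a proportional position mapping with a fixed window; the names `associate`, `radius`, and `max_len` are hypothetical, and whitespace tokenization is a simplification.

```python
from collections import Counter

def find_occurrences(tokens, query):
    """Return the start indices where the query token sequence occurs."""
    n = len(query)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == query]

def select_range(target, source_len, pos, radius):
    """Assumed heuristic: map a source position proportionally into the
    target document and take a window of `radius` words on each side."""
    center = round(pos * len(target) / source_len)
    return target[max(0, center - radius):center + radius + 1]

def word_strings(tokens, max_len):
    """All unique word strings of 1..max_len words found in one range."""
    return {tuple(tokens[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(tokens) - n + 1)}

def associate(source_text, target_text, query, radius=3, max_len=3):
    """Tabulate word strings by the number of selected ranges containing
    them, and return only those found in more than one range."""
    source = source_text.lower().split()
    target = target_text.lower().split()
    hits = find_occurrences(source, query.lower().split())
    ranges = [select_range(target, len(source), p, radius) for p in hits]
    counts = Counter()
    for r in ranges:
        counts.update(word_strings(r, max_len))  # one count per range
    return {" ".join(ws): c for ws, c in counts.items() if c > 1}
```

With a toy English/Spanish pair, calling `associate(..., "red house", radius=2)` keeps strings such as "casa roja" that recur near every occurrence of the query and drops strings seen in only one range.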
-
-
3. A method for acquiring a knowledge base of associated ideas comprising the steps of:
-
providing a plurality of document pairs representing the same idea in two different languages, wherein one set of said plurality of document pairs is expressed in a first language, and a second set of said plurality of document pairs is expressed in a second language;
receiving a query to be analyzed, wherein said query is expressed in said first language, and wherein said query consists of a word or word string;
analyzing said first set of said plurality pairs to identify all occurrences of said query in said first set;
selecting a plurality of ranges of words in said second set of said plurality pairs, wherein said selected ranges correspond to the occurrences of said query in said first set;
calculating the frequency of words and word strings contained in said selected ranges, tabulating said frequency based on occurrences of all unique words and word strings from said calculating step; and
returning a list of occurrences of all unique words and word strings if said unique words and word strings occur in more than one of the selected ranges using said tabulated frequency. - View Dependent Claims (4)
-
-
5. A computer device including a processor, a memory coupled to the processor, and a program stored in the memory, wherein the computer is configured to execute the program and perform the steps of:
-
providing a pair of documents representing the same idea in two different languages, wherein the first of said pair of documents is expressed in a first language, and the second of said pair of documents is expressed in a second language;
receiving a query to be analyzed, wherein said query is expressed in said first language, and wherein said query consists of a word or word string;
analyzing said first of said pair of documents to identify all occurrences of said query in said first of said pair of documents;
selecting a plurality of ranges of words in said second of said pair of documents, wherein said selected ranges correspond to the occurrences of said query in said first of said pair of documents;
calculating the frequency of words and word strings contained in said selected ranges;
tabulating said frequency based on occurrences of all unique words and word strings from said calculating step; and
returning a list of occurrences of all unique words and word strings if said unique words and word strings occur in more than one of the selected ranges using said tabulated frequency. - View Dependent Claims (6)
-
-
7. A computer device including a processor, a memory coupled to the processor, and a program stored in the memory, wherein the computer is configured to execute the program and perform the steps of:
-
providing a plurality of document pairs representing the same idea in two different languages, wherein one set of said plurality of document pairs is expressed in a first language, and a second set of said plurality of document pairs is expressed in a second language;
receiving a query to be analyzed, wherein said query is expressed in said first language, and wherein said query consists of a word or word string;
analyzing said first set of said plurality pairs to identify all occurrences of said query in said first set;
selecting a plurality of ranges of words in said second set of said plurality pairs, wherein said selected ranges correspond to the occurrences of said query in said first set;
calculating the frequency of words and word strings contained in said selected ranges, wherein said frequency is based on occurrences of all unique words and word strings;
tabulating said frequency based on occurrences of all unique words and word strings from said calculating step; and
returning a list of occurrences of all unique words and word strings if said unique words and word strings occur in more than one of the selected ranges using said tabulated frequency. - View Dependent Claims (8)
-
-
9. A computer readable storage medium having stored thereon a program executable by a computer processor for performing the steps of:
-
providing a pair of documents representing the same idea in two different languages, wherein the first of said pair of documents is expressed in a first language, and the second of said pair of documents is expressed in a second language;
receiving a query to be analyzed, wherein said query is expressed in said first language, and wherein said query consists of a word or word string;
analyzing said first of said pair of documents to identify all occurrences of said query in said first of said pair of documents;
selecting a plurality of ranges of words in said second of said pair of documents, wherein said selected ranges correspond to the occurrences of said query in said first of said pair of documents;
calculating the frequency of words and word strings contained in said selected ranges;
tabulating said frequency based on occurrences of all unique words and word strings from said calculating step; and
returning a list of occurrences of all unique words and word strings if said unique words and word strings occur in more than one of the selected ranges using said tabulated frequency. - View Dependent Claims (10)
-
-
11. A computer readable storage medium having stored thereon a program executable by a computer processor for performing the steps of:
-
providing a plurality of document pairs representing the same idea in two different languages, wherein one set of said plurality of document pairs is expressed in a first language, and a second set of said plurality of document pairs is expressed in a second language;
receiving a query to be analyzed, wherein said query is expressed in said first language, and wherein said query consists of a word or word string;
analyzing said first set of said plurality pairs to identify all occurrences of said query in said first set;
selecting a plurality of ranges of words in said second set of said plurality pairs, wherein said selected ranges correspond to the occurrences of said query in said first set;
calculating the frequency of words and word strings contained in said selected ranges, tabulating said frequency based on occurrences of all unique words and word strings from said calculating step; and
returning a list of occurrences of all unique words and word strings if said unique words and word strings occur in more than one of the selected ranges using said tabulated frequency. - View Dependent Claims (12)
-
-
13. A method to tokenize associations for the efficient transfer of information, comprising the following steps:
-
creating an association; and
tokenizing said association by designating a token to be equal to said association;
wherein creating an association includes, providing a pair of documents representing the same idea in two different languages, wherein the first of said pair of documents is expressed in a first language, and the second of said pair of documents is expressed in a second language;
receiving a query to be analyzed, wherein said query is expressed in said first language, and wherein said query consists of a word or word string;
analyzing said first of said pair of documents to identify all occurrences of said query in said first of said pair of documents;
selecting a plurality of ranges of words in said second of said pair of documents, wherein said selected ranges correspond to the occurrences of said query in said first of said pair of documents;
calculating the frequency of words and word strings contained in said selected ranges, omitting the occurrence of a word or word string if the word or word string is a subset of a longer word string that occurs in more than one of the selected ranges;
tabulating said frequency based on occurrences of all unique words and word strings from said calculating step; and
returning a list of occurrences of all unique words and word strings if said unique words and word strings occur in more than one of the selected ranges using said tabulated frequency. - View Dependent Claims (14)
-
-
15. A method to tokenize associations for the efficient transfer of information, comprising the following steps:
-
creating an association; and
tokenizing said association by designating a token to be equal to said association;
wherein creating an association includes, providing a plurality of document pairs representing the same idea in two different languages, wherein one set of said plurality of document pairs is expressed in a first language, and a second set of said plurality of document pairs is expressed in a second language;
receiving a query to be analyzed, wherein said query is expressed in said first language, and wherein said query consists of a word or word string;
analyzing said first set of said plurality pairs to identify all occurrences of said query in said first set;
selecting a plurality of ranges of words in said second set of said plurality pairs, wherein said selected ranges correspond to the occurrences of said query in said first set;
calculating the frequency of words and word strings contained in said selected ranges, omitting the occurrence of a word or word string if the word or word string is a subset of a longer word string that occurs in more than one of the selected ranges;
tabulating said frequency based on occurrences of all unique words and word strings from said calculating step; and
returning a list of occurrences of all unique words and word strings if said unique words and word strings occur in more than one of the selected ranges using said tabulated frequency. - View Dependent Claims (16, 19)
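The omission rule in the calculating step of claims 13 and 15 can be sketched as a filter over an already tabulated frequency table. This is one possible reading, not the patent's stated implementation: it drops a word string entirely when it sits contiguously inside a longer string that itself occurs in more than one range. The names `is_contained` and `drop_subsumed` are hypothetical.

```python
def is_contained(short, long):
    """True if tuple `short` occurs contiguously inside the strictly
    longer tuple `long`."""
    n = len(short)
    return n < len(long) and any(
        long[i:i + n] == short for i in range(len(long) - n + 1))

def drop_subsumed(range_counts):
    """`range_counts` maps word-string tuples to the number of selected
    ranges containing them. Keep strings found in more than one range
    that are not part of a longer string also found in more than one."""
    frequent = [ws for ws, c in range_counts.items() if c > 1]
    return {ws: range_counts[ws] for ws in frequent
            if not any(is_contained(ws, other) for other in frequent)}
```

Applied to a table where "la", "casa", "casa roja", and "la casa roja" each occur in two ranges, only "la casa roja" survives, which keeps the longest recurring candidate rather than its fragments.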
-
-
17. A method for creating a knowledge base of associated ideas involving a source language, a target language, and a third language, comprising the steps of:
-
receiving a query to be analyzed, wherein said query is expressed in a source language, and wherein said query consists of a word or word string;
translating said query into a result expressed in said third language;
translating said result into a second result expressed in said target language; and
associating said query with said second result in said target language.
-
-
18. A method for creating a knowledge base of associated ideas involving a source language, a target language, and a plurality of third languages, comprising the steps of:
-
a. receiving a query to be analyzed, wherein said query is expressed in a source language, and wherein said query consists of a word or word string;
b. translating said query into a result expressed in one of said plurality of third languages;
c. translating said result into a second result expressed in said target language;
d. repeating steps b. and c. for each of said plurality of third languages;
e. returning each of said second results; and
f. associating one or more of said second results and said query for all second results produced by two or more of said plurality of languages.
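Steps a. through f. of claim 18 can be sketched in Python as a vote across intermediate languages. The lookup tables `to_third` and `from_third` are hypothetical stand-ins for whatever translation resource is available, and `min_agree` is an assumed parameter set to 2 to match "produced by two or more of said plurality of languages".

```python
from collections import defaultdict

def pivot_associations(query, to_third, from_third, min_agree=2):
    """to_third[lang] maps source strings to strings in third language
    `lang`; from_third[lang] maps those strings to the target language.
    Returns each target result with the third languages that produced it,
    keeping only results produced via at least `min_agree` languages."""
    support = defaultdict(set)
    for lang, table in to_third.items():            # step d: each language
        intermediate = table.get(query)             # step b
        if intermediate is None:
            continue
        second = from_third.get(lang, {}).get(intermediate)  # step c
        if second is not None:
            support[second].add(lang)               # step e: collect results
    # step f: associate only results confirmed by enough third languages
    return {res: sorted(langs) for res, langs in support.items()
            if len(langs) >= min_agree}
```

In a toy setup where two of three third languages lead to the same target string, only that string is associated with the query; a result reached through a single third language is discarded.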
-
-
20. A computer device including a processor, a memory coupled to the processor, and a program stored in the memory, wherein the computer is configured to execute the program and perform the steps of:
-
receiving a query to be analyzed, wherein said query is expressed in a source language, and wherein said query consists of a word or word string;
translating said query into a result expressed in said third language;
translating said result into a second result expressed in said target language; and
associating said query with said second result in said target language. - View Dependent Claims (22)
-
-
21. A computer device including a processor, a memory coupled to the processor, and a program stored in the memory, wherein the computer is configured to execute the program and perform the steps of:
-
a. receiving a query to be analyzed, wherein said query is expressed in a source language, and wherein said query consists of a word or word string;
b. translating said query into a result expressed in one of said plurality of third languages;
c. translating said result into a second result expressed in said target language;
d. repeating steps b. and c. for each of said plurality of third languages;
e. returning each of said second results in said target language; and
f. associating one or more of said second results and said query for all second results produced by two or more of said plurality of languages.
-
-
23. A computer readable storage medium having stored thereon a program executable by a computer processor for performing the steps of:
-
receiving a query to be analyzed, wherein said query is expressed in a source language, and wherein said query consists of a word or word string;
translating said query into a result expressed in said third language;
translating said result into a second result expressed in said target language; and
associating said query with said second result in said target language. - View Dependent Claims (25)
-
-
24. A computer readable storage medium having stored thereon a program executable by a computer processor for performing the steps of:
-
a. receiving a query to be analyzed, wherein said query is expressed in a source language, and wherein said query consists of a word or word string;
b. translating said query into a result expressed in one of said plurality of third languages;
c. translating said result into a second result expressed in said target language;
d. repeating steps b. and c. for each of said plurality of third languages;
e. returning each of said second results; and
f. associating one or more of said second results and said query for all second results produced by two or more of said plurality of languages.
-
-
26. A method to tokenize associations for the efficient transfer of information, comprising the following steps:
-
creating an association involving a source language, a target language, and a third language, using the following steps;
receiving a query to be analyzed, wherein said query is expressed in a source language, and wherein said query consists of a word or word string;
translating said query into a result expressed in said third language;
translating said result into a second result expressed in said target language;
associating said query with said second result in said target language; and
tokenizing said association by designating a token to be equal to said association. - View Dependent Claims (27)
-
-
28. A method to tokenize associations for the efficient transfer of information, comprising the following steps:
-
creating an association involving a source language, a target language, and a plurality of third languages, using the following steps;
a. receiving a query to be analyzed, wherein said query is expressed in a source language, and wherein said query consists of a word or word string;
b. translating said query into a result expressed in one of said plurality of third languages;
c. translating said result into a second result expressed in said target language;
d. repeating steps b. and c. for each of said plurality of third languages;
e. returning each of said second results;
f. associating one or more of said second results and said query for all second results produced by two or more of said plurality of languages; and
tokenizing said association by designating a token to be equal to said association. - View Dependent Claims (29)
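The tokenizing step, "designating a token to be equal to said association", can be read as a shared codebook in which a short token stands in for the full association during transfer. The sketch below assumes integer tokens and an in-memory table; the class name `TokenTable` and its methods are hypothetical and not taken from the patent.

```python
class TokenTable:
    """Assigns a small integer token to each distinct association."""

    def __init__(self):
        self._by_association = {}   # association -> token
        self._by_token = []         # token -> association

    def tokenize(self, association):
        """Return the token for an association, creating one if needed."""
        token = self._by_association.get(association)
        if token is None:
            token = len(self._by_token)
            self._by_association[association] = token
            self._by_token.append(association)
        return token

    def lookup(self, token):
        """Recover the association that a token stands for."""
        return self._by_token[token]
```

Two parties holding the same table could then exchange the token alone, for example `0` in place of the pair `("red house", "casa roja")`.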
-
-
30. A method for creating a knowledge base of associated ideas comprising the steps of:
-
providing a translation of words expressed in a first language to words and/or word strings expressed in a second language;
providing a corpus of documents expressed in said second language;
receiving a query to be analyzed, wherein said query is expressed in said first language, and wherein said query consists of a word string;
identifying for said query, all translations of each word comprising said word string query, to said second language utilizing said provided translation;
analyzing said corpus of documents for word strings expressed in said second language, wherein said analysis only identifies word strings having a user defined maximum number of words, and wherein said analysis only identifies word strings having translations obtained from a user defined minimum number of words expressed in a first language in said identifying step, wherein said analyzing only counts one translation for each of said words expressed in a first language; and
returning a list of said word strings expressed in said second language from said analysis of said corpus of documents as word string results. - View Dependent Claims (31, 32, 33, 45, 46, 47, 48)
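The analyzing step of claim 30 can be sketched as a sliding window over the second-language corpus. The sketch assumes single-word translations and whitespace tokenization, which is narrower than the claim's "words and/or word strings"; `candidate_strings`, `max_words`, and `min_words` are hypothetical names for the user-defined limits.

```python
def candidate_strings(query, translations, corpus, max_words, min_words):
    """Return second-language word strings of at most `max_words` words
    that contain translations of at least `min_words` query words."""
    options = {w: set(translations.get(w, ()))
               for w in query.lower().split()}
    results = set()
    for document in corpus:
        tokens = document.lower().split()
        for n in range(1, max_words + 1):
            for i in range(len(tokens) - n + 1):
                window = set(tokens[i:i + n])
                # Each query word counts once, however many of its
                # translations appear in the window.
                covered = sum(1 for w in options if options[w] & window)
                if covered >= min_words:
                    results.add(" ".join(tokens[i:i + n]))
    return sorted(results)
```

For the query "red house" with toy translations and a one-sentence corpus, setting both limits to 2 returns only the two-word string that covers both query words.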
-
-
34. A computer device including a processor, a memory coupled to the processor, and a program stored in the memory, wherein the computer is configured to execute the program and perform the steps of:
-
providing a translation of words expressed in a first language to words and/or word strings expressed in a second language;
providing a corpus of documents expressed in said second language;
receiving a query to be analyzed, wherein said query is expressed in said first language, and wherein said query consists of a word string;
identifying for said query, all translations of each word comprising said word string query, to said second language utilizing said provided translation;
analyzing said corpus of documents for word strings expressed in said second language, wherein said analysis only identifies word strings having a user defined maximum number of words, and wherein said analysis only identifies word strings having translations obtained from a user defined minimum number of words expressed in a first language in said identifying step, wherein said analyzing only counts one translation for each of said words expressed in a first language; and
returning a list of said word strings expressed in said second language from said analysis of said corpus of documents as word string results. - View Dependent Claims (35, 36, 37, 49, 50, 51, 52)
-
-
38. A computer readable storage medium having stored thereon a program executable by a computer processor for performing the steps of:
-
providing a translation of words expressed in a first language to words and/or word strings expressed in a second language;
providing a corpus of documents expressed in said second language;
receiving a query to be analyzed, wherein said query is expressed in said first language, and wherein said query consists of a word string;
identifying for said query, all translations of each word comprising said word string query, to said second language utilizing said provided translation;
analyzing said corpus of documents for word strings expressed in said second language, wherein said analysis only identifies word strings having a user defined maximum number of words, and wherein said analysis only identifies word strings having translations obtained from a user defined minimum number of words expressed in a first language in said identifying step, wherein said analyzing only counts one translation for each of said words expressed in a first language; and
returning a list of said word strings expressed in said second language from said analysis of said corpus of documents as word string results. - View Dependent Claims (39, 40, 41, 53, 54, 55, 56)
-
-
42. A method to tokenize associations for the efficient transfer of information, comprising the following steps:
-
creating an association; and
tokenizing said association by designating a token to be equal to said association;
wherein creating an association includes, providing a translation of words expressed in a first language to words and/or word strings expressed in a second language;
providing a corpus of documents expressed in said second language;
receiving a query to be analyzed, wherein said query is expressed in said first language, and wherein said query consists of a word string;
identifying for said query, all translations of each word comprising said word string query, to said second language utilizing said provided translation;
analyzing said corpus of documents for word strings expressed in said second language, wherein said analysis only identifies word strings having a user defined maximum number of words, and wherein said analysis only identifies word strings having translations obtained from a user defined minimum number of words expressed in a first language in said identifying step, wherein said analyzing only counts one translation for each of said words expressed in a first language;
returning a list of said word strings expressed in said second language from said analysis of said corpus of documents as a result. - View Dependent Claims (43, 44)
-
-
57. A method to tokenize associations for the efficient transfer of information, comprising the following steps:
-
creating an association; and
tokenizing said association by designating a token to be equal to said association;
wherein creating an association includes, providing a translation of words expressed in a first language to words and/or word strings expressed in a second language;
providing a corpus of documents expressed in said second language;
receiving a query to be analyzed, wherein said query is expressed in said first language, and wherein said query consists of a word string;
identifying for said query, all translations of each word comprising said word string query, to said second language utilizing said provided translation;
analyzing said corpus of documents for word strings expressed in said second language, wherein said analysis only identifies word strings having a user defined maximum number of words, and wherein said analysis only identifies word strings having translations obtained from a user defined minimum number of words expressed in a first language in said identifying step, wherein said analyzing only counts one translation for each of said words expressed in a first language;
returning a list of said word strings expressed in said second language from said analysis of said corpus of documents as a result;
providing a corpus of documents expressed in said first language;
identifying a user defined number of occurrences of said query in said corpus of documents expressed in said first language;
analyzing a user defined number of words and/or word strings to the left and to the right of each of said occurrences of said query and identifying word strings comprising the user defined number of words and/or word strings to the left of said query, said query, and the user defined number of words and/or word strings to the right of said query;
creating a list of returned word strings comprising the results of said analyzing step;
analyzing each returned word string individually and identifying all translations of each word comprising each of said returned word strings, to said second language utilizing said provided translation;
analyzing said corpus of documents for word strings expressed in said second language, wherein said analysis only identifies word strings having a user defined maximum number of words, and wherein said analysis only identifies word strings having translations obtained from a user defined minimum number of words expressed in the word string in a first language determined by said creating step, wherein said analyzing said corpus counts only one translation for each of said words expressed in said first language;
returning a list of said second word strings expressed in said second language from said analysis of said corpus of documents as a result;
analyzing said list of word strings and said list of second word strings to identify the number of occurrences wherein each word string on said list of word strings occurs as a word string subset of a word string on said list of second word strings;
returning a list based on said analyzing said list of word strings and said list of second word strings step. - View Dependent Claims (58, 59)
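The context-expansion step of claim 57, which collects a user-defined number of words to the left and right of each occurrence of the query, might look like the following sketch. It covers only that step, not the later translation and subset-counting steps; the function name and the parameters `left`, `right`, and `max_hits` are hypothetical.

```python
def context_strings(tokens, query, left, right, max_hits):
    """Return up to `max_hits` word strings made of `left` words before
    the query, the query itself, and `right` words after it."""
    q = query.split()
    found = []
    for i in range(len(tokens) - len(q) + 1):
        if len(found) >= max_hits:
            break
        if tokens[i:i + len(q)] == q:
            start = max(0, i - left)          # clip at document start
            found.append(" ".join(tokens[start:i + len(q) + right]))
    return found
```

Each returned string would then be analyzed individually in the same way as the original query.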
-
-
60. A method for acquiring a knowledge base of associated ideas comprising the steps of:
-
providing a translation of word strings expressed in a source language to word strings expressed in a target language;
receiving two segments of content expressed in said source language, wherein said first segment and said second segment have overlapping portions of said content;
translating, using said translation of word strings, said first segment of content to return a third segment expressed in said target language;
translating, using said translation of word strings, said second segment of content to return a fourth segment expressed in said target language;
analyzing said third segment and said fourth segment to determine if said third segment and said fourth segment have overlapping portions;
associating, if said third segment and said fourth segment have overlapping portions, the overlapping portions of said third segment and said fourth segment with the overlapping portions of said first segment and said second segment; and
associating, if said third segment and said fourth segment have overlapping portions, the combination of said third segment and said fourth segment as a single target language word string, merging said overlapping portions, with the combination of said first segment and said second segment as a single source word string, merging said overlapping portions.
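The overlap test and the two associating steps of claim 60 can be sketched as follows. The sketch assumes that an overlap means the end of one segment repeats the start of the next, and that the provided translation is a plain lookup table of word strings; `overlap`, `merge`, and `associate_segments` are hypothetical names.

```python
def overlap(left, right):
    """Longest suffix of `left` that is also a prefix of `right`."""
    for n in range(min(len(left), len(right)), 0, -1):
        if left[-n:] == right[:n]:
            return left[-n:]
    return []

def merge(left, right):
    """Join two token lists, writing the shared portion only once."""
    shared = overlap(left, right)
    return left + right[len(shared):], shared

def associate_segments(first, second, table):
    """Translate both source segments; if the translations overlap,
    return the overlap association and the merged association."""
    third, fourth = table.get(first), table.get(second)
    if third is None or fourth is None:
        return None
    source, source_shared = merge(first.split(), second.split())
    target, target_shared = merge(third.split(), fourth.split())
    if not source_shared or not target_shared:
        return None  # no overlap on one side, so nothing is associated
    return {
        "overlap": (" ".join(source_shared), " ".join(target_shared)),
        "combined": (" ".join(source), " ".join(target)),
    }
```

With toy segments "I want to buy" and "to buy a car", whose translations share one word, the sketch pairs the overlapping portions and also pairs the two merged word strings.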
-
-
61. A computer device including a processor, a memory coupled to the processor, and a program stored in the memory, wherein the computer is configured to execute the program and perform the steps of:
-
providing a translation of word strings expressed in a source language to word strings expressed in a target language;
receiving two segments of content expressed in said source language, wherein said first segment and said second segment have overlapping portions of said content;
translating, using said translation of word strings, said first segment of content to return a third segment expressed in said target language;
translating, using said translation of word strings, said second segment of content to return a fourth segment expressed in said target language;
analyzing said third segment and said fourth segment to determine if said third segment and said fourth segment have overlapping portions;
associating, if said third segment and said fourth segment have overlapping portions, the overlapping portions of said third segment and said fourth segment with the overlapping portions of said first segment and said second segment; and
associating, if said third segment and said fourth segment have overlapping portions, the combination of said third segment and said fourth segment as a single target language word string, merging said overlapping portions, with the combination of said first segment and said second segment as a single source word string, merging said overlapping portions.
-
-
62. A computer readable storage medium having stored thereon a program executable by a computer processor for performing the steps of:
-
providing a translation of word strings expressed in a source language to word strings expressed in a target language;
receiving two segments of content expressed in said source language, wherein said first segment and said second segment have overlapping portions of said content;
translating, using said translation of word strings, said first segment of content to return a third segment expressed in said target language;
translating, using said translation of word strings, said second segment of content to return a fourth segment expressed in said target language;
analyzing said third segment and said fourth segment to determine if said third segment and said fourth segment have overlapping portions;
associating, if said third segment and said fourth segment have overlapping portions, the overlapping portions of said third segment and said fourth segment with the overlapping portions of said first segment and said second segment; and
associating, if said third segment and said fourth segment have overlapping portions, the combination of said third segment and said fourth segment as a single target language word string, merging said overlapping portions, with the combination of said first segment and said second segment as a single source word string, merging said overlapping portions.
-
-
63. A method to tokenize associations for the efficient transfer of information, comprising the following steps:
-
creating an association; and
tokenizing said association by designating a token to be equal to said association;
wherein creating an association includes, providing a translation of word strings expressed in a source language to word strings expressed in a target language;
receiving two segments of content expressed in said source language, wherein said first segment and said second segment have overlapping portions of said content;
translating, using said translation of word strings, said first segment of content to return a third segment expressed in said target language;
translating, using said translation of word strings, said second segment of content to return a fourth segment expressed in said target language;
analyzing said third segment and said fourth segment to determine if said third segment and said fourth segment have overlapping portions;
associating, if said third segment and said fourth segment have overlapping portions, the overlapping portions of said third segment and said fourth segment with the overlapping portions of said first segment and said second segment;
associating, if said third segment and said fourth segment have overlapping portions, the combination of said third segment and said fourth segment as a single target language word string, merging said overlapping portions, with the combination of said first segment and said second segment as a single source word string, merging said overlapping portions. - View Dependent Claims (64)
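Illustrative sketch (not part of the claim language): one simple reading of "designating a token to be equal to said association" is to assign each distinct source/target pair a small integer, so that the integer can be transferred in place of the full word strings. The class name `AssociationTokenizer` and the choice of integer tokens are assumptions for this example only.

```python
class AssociationTokenizer:
    """Map each (source, target) association to a compact integer token."""

    def __init__(self):
        self._token_of = {}   # association -> token
        self._assoc_of = []   # token -> association

    def tokenize(self, source, target):
        """Return the token for an association, creating one if needed."""
        key = (source, target)
        if key not in self._token_of:
            self._token_of[key] = len(self._assoc_of)
            self._assoc_of.append(key)
        return self._token_of[key]

    def resolve(self, token):
        """Recover the association that a token stands for."""
        return self._assoc_of[token]


tokenizer = AssociationTokenizer()
token = tokenizer.tokenize("I want to buy a car", "quiero comprar un coche")
```

Both parties to a transfer would need the same token table for `resolve` to work; how that table is shared is outside the scope of this sketch.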
-
-
65. A method for converting content and reconstructing a knowledge base comprising the steps of:
-
a. receiving content expressed in a first language;
b. parsing said content expressed in a first language into a plurality of segments;
c. selecting a first segment and a second segment, with said first segment having an overlapping portion of said content with said second segment;
d. accessing a first target segment of said content expressed in a second language, said first target segment corresponding to one of said first and second segments;
e. accessing a second target segment of said content expressed in the second language, said second target segment corresponding to the other one of said first and second segments and having an overlapping portion with said first target segment;
f. determining said content expressed in the second language based on combining said first target and second target segments, merging overlapping portions;
g. providing said content expressed in said second language; and
h. repeating steps c. through g. for all of said plurality of segments, wherein the second segment is designated as the first segment, and a next segment, with overlapping portions with the second segment, is designated as the second segment; and
i. repeating step h. for all next segments in said plurality of segments until all of said content is converted into said second language.
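Illustrative sketch (not part of the claim language): steps a. through i. can be read as a loop that parses the content into overlapping segments, looks up a target segment for each, and merges each new target segment into the running result wherever it overlaps. The fixed-size sliding window, the `TABLE` lookup, and all names below are assumptions chosen to keep the example short; the claim does not prescribe how segments are parsed.

```python
def parse_segments(words, size, step):
    """Split words into windows of `size` words that advance by `step` words.
    Choosing step < size makes consecutive segments overlap."""
    segments, i = [], 0
    while True:
        segments.append(" ".join(words[i:i + size]))
        if i + size >= len(words):
            return segments
        i += step


def merge_overlap(left, right):
    """Merge two word lists on the longest suffix/prefix overlap, else None."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


def convert(text, table, size=4, step=2):
    """Convert `text` by chaining overlapping target segments."""
    segments = parse_segments(text.split(), size, step)
    merged = table[segments[0]].split()
    for segment in segments[1:]:  # the "next segment" of steps h. and i.
        merged = merge_overlap(merged, table[segment].split())
        if merged is None:
            raise ValueError("target segments do not overlap")
    return " ".join(merged)


# Hypothetical segment associations (English to Spanish).
TABLE = {
    "I want to buy": "quiero comprar",
    "to buy a car": "comprar un coche",
    "a car today": "un coche hoy",
}
converted = convert("I want to buy a car today", TABLE)
```

Here the sample sentence parses into three segments and converts to "quiero comprar un coche hoy". The sketch merges each new target segment into the accumulated result rather than into the previous target segment alone, which is a simplification.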
-
-
66. A method for converting content of a document by reconstructing a knowledge base comprising the steps of utilizing a database of segment associations between content in a first language and a second language wherein said conversion includes parsing and examining overlapping segments of content of the document in said first language with their respective translations that have overlapping segments of content in said second language, and merging overlapping segments from said examined first language content and said examined second language content, and associating the content of said first language content with said second language content after merging overlapping segments.
-
67. A method of converting a document and reconstructing a knowledge base, the method comprising the steps of:
-
a. providing content comprising data segments in a first language associated with data segments in a second language;
b. selecting from the document to be translated in a first language a data segment that begins with the first word of the document and exists in a database;
c. retrieving from the database a segment in the second language associated with the located first segment in the first language;
d. selecting at least a second delimited segment in the first language that has one or more overlapping portions with the previous delimited segment in the first language;
e. retrieving from the database a second segment in the second language associated with the selected second segment in the first language;
f. returning the two data segments in the first language and merging the overlapping portions as a single data segment in the first language;
g. returning, if the two data segments in the second language have overlapping portions, a single data segment in the second language merging the overlapping portions; and
h. associating said single data segment in said first language with said single data segment in said second language, thereby returning a conversion of said single data segment from said first language to said second language. - View Dependent Claims (68, 69, 70, 71)
-
-
72. A method of converting a document, the method comprising the steps of:
-
a. providing content comprising data segments in a first language associated with data segments in a second language;
b. selecting from the document to be translated in a first language a data segment that begins with the first word of the document and exists in a database;
c. retrieving from the database a segment in the second language associated with the located first segment in the first language;
d. selecting at least a second delimited segment in the first language that has one or more overlapping portions with the previous delimited segment in the first language;
e. retrieving from the database a second segment in the second language associated with the selected second segment in the first language that has an overlapping portion with the segment in the second language;
f. combining the two segments in the second language, merging the overlapping portions, to form a translation of the two segments in the first language, merging overlapping portions. - View Dependent Claims (73)
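Illustrative sketch (not part of the claim language): steps b. through f. can be read as a database-driven selection, where the first segment is an entry that begins at the first word of the document and each later segment is an entry that starts inside the previous one. The longest-match preference, the dictionary used as the database, and all names are assumptions for this example.

```python
def merge_overlap(left, right):
    """Merge two word lists on the longest suffix/prefix overlap, else None."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


def select_segments(words, db):
    """Pick overlapping source segments that all exist in `db`."""
    end = next((e for e in range(len(words), 0, -1)
                if " ".join(words[:e]) in db), None)
    if end is None:
        return None  # no entry begins with the first word of the document
    spans = [(0, end)]
    while end < len(words):
        prev_start, prev_end = spans[-1]
        best = None
        for s in range(prev_start + 1, prev_end):        # start inside previous
            for e in range(len(words), prev_end, -1):    # must extend past it
                if " ".join(words[s:e]) in db:
                    if best is None or e > best[1]:
                        best = (s, e)
                    break
        if best is None:
            return None  # no overlapping continuation in the database
        spans.append(best)
        end = best[1]
    return [" ".join(words[s:e]) for s, e in spans]


def translate(text, db):
    """Combine the target segments of the selected source segments."""
    segments = select_segments(text.split(), db)
    if segments is None:
        return None
    merged = db[segments[0]].split()
    for segment in segments[1:]:
        merged = merge_overlap(merged, db[segment].split())
        if merged is None:
            return None  # target segments lack an overlapping portion
    return " ".join(merged)


# Hypothetical database of segment associations (English to Spanish).
DB = {
    "I want to buy": "quiero comprar",
    "to buy a car": "comprar un coche",
}
segments = select_segments("I want to buy a car".split(), DB)
translation = translate("I want to buy a car", DB)
```

With the sample database, the selected segments are "I want to buy" and "to buy a car", and the combined translation is "quiero comprar un coche".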
-
-
74. A computer device including a processor, a memory coupled to the processor, and a program stored in the memory, wherein the computer is configured to execute the program and perform the steps of:
-
a. receiving content expressed in a first language;
b. parsing said content expressed in a first language into a plurality of segments;
c. selecting a first segment and a second segment, with said first segment having an overlapping portion of said content with said second segment;
d. accessing a first target segment of said content expressed in a second language, said first target segment corresponding to one of said first and second segments;
e. accessing a second target segment of said content expressed in the second language, said second target segment corresponding to the other one of said first and second segments and having an overlapping portion with said first target segment;
f. determining said content expressed in the second language based on combining said first target and second target segments, merging overlapping portions;
g. providing said content expressed in said second language; and
h. repeating steps c. through g. for all of said plurality of segments, wherein the second segment is designated as the first segment, and a next segment, with overlapping portions with the second segment, is designated as the second segment; and
i. repeating step h. for all next segments in said plurality of segments until all of said content is converted into a second language.
-
-
75. A computer device including a processor, a memory coupled to the processor, and a program stored in the memory, wherein the computer is configured to execute the program and perform the steps of:
-
a. providing content comprising data segments in a first language associated with data segments in a second language;
b. selecting from the document to be translated in a first language a data segment that begins with the first word of the document and exists in a database;
c. retrieving from the database a segment in the second language associated with the located first segment in the first language;
d. selecting at least a second delimited segment in the first language that has one or more overlapping portions with the previous delimited segment in the first language;
e. retrieving from the database a segment in the second language associated with the selected second segment in the first language;
f. returning the two data segments in the first language and merging the overlapping portions as a single data segment in the first language;
g. returning, if the two data segments in the second language have overlapping portions, a single data segment in the second language combining the overlapping portions; and
h. associating said single data segment in said first language with said single data segment in said second language, thereby returning a conversion of said single data segment from said first language to said second language. - View Dependent Claims (76, 77, 78, 79)
-
-
80. A computer device including a processor, a memory coupled to the processor, and a program stored in the memory, wherein the computer is configured to execute the program and perform the steps of:
-
a. providing content comprising data segments in a first language associated with data segments in a second language;
b. selecting from the document to be translated in a first language a data segment that begins with the first word of the document and exists in a database;
c. retrieving from the database a segment in the second language associated with the located first segment in the first language;
d. selecting at least a second delimited segment in the first language that has one or more overlapping portions with the previous delimited segment in the first language;
e. retrieving from the database a second segment in the second language associated with the selected second segment in the first language that has an overlapping portion with the segment in the second language;
f. combining the two segments in the second language, merging the overlapping portions, to form a translation of the two segments in the first language, merging overlapping portions. - View Dependent Claims (81)
-
-
82. A computer readable storage medium having stored thereon a program executable by a computer processor for performing the steps of:
-
a. receiving content expressed in a first language;
b. parsing said content expressed in a first language into a plurality of segments;
c. selecting a first segment and a second segment, with said first segment having overlapping portions of said content with said second segment;
d. accessing a first target segment of said content expressed in a second language, said first target segment corresponding to one of said first and second segments;
e. accessing a second target segment of said content expressed in the second language, said second target segment corresponding to the other one of said first and second segments and having an overlapping portion with said first target segment;
f. determining said content expressed in the second language based on combining said first target and second target segments, merging overlapping portions;
g. providing said content expressed in said second language; and
h. repeating steps c. through g. for all of said plurality of segments, wherein the second segment is designated as the first segment, and a next segment, with overlapping portions with the second segment, is designated as the second segment; and
i. repeating step h. for all next segments in said plurality of segments.
-
-
83. A computer readable storage medium having stored thereon a program executable by a computer processor for performing the steps of:
-
a. providing content comprising data segments in a first language associated with data segments in a second language;
b. selecting from the document to be translated in a first language a data segment that begins with the first word of the document and exists in a database;
c. retrieving from the database a segment in the second language associated with the located first segment in the first language;
d. selecting at least a second delimited segment in the first language that has one or more overlapping portions with the previous delimited segment in the first language;
e. retrieving from the database a second segment in the second language associated with the selected second segment in the first language;
f. returning the two data segments in the first language and merging the overlapping portions as a single data segment in the first language;
g. returning, if the two data segments in the second language have overlapping portions, a single data segment in the second language combining the overlapping portions; and
h. associating said single data segment in said first language with said single data segment in said second language, thereby returning a conversion of said single data segment from said first language to said second language. - View Dependent Claims (84, 85, 86, 87)
-
-
88. A computer readable storage medium having stored thereon a program executable by a computer processor for performing the steps of:
-
a. providing content comprising data segments in a first language associated with data segments in a second language;
b. selecting from the document to be translated in a first language a data segment that begins with the first word of the document and exists in a database;
c. retrieving from the database a segment in the second language associated with the located first segment in the first language;
d. selecting at least a second delimited segment in the first language that has one or more overlapping portions with the previous delimited segment in the first language;
e. retrieving from the database a second segment in the second language associated with the selected second segment in the first language that has an overlapping portion with the segment in the second language;
f. combining the two segments in the second language, merging the overlapping portions, to form a translation of the two segments in the first language, merging overlapping portions. - View Dependent Claims (89)
-
-
90. A computer system for converting content and reconstructing a knowledge base, comprising:
-
a. a computing device that receives content expressed in a first language and parses said content into at least a first segment and a second segment, said first segment having a first portion, said second segment having a second portion, said first portion and said second portion having overlapping portions of said content;
b. wherein said computing device accesses third and fourth segments of said content that are each expressed in a second language, said third segment corresponding to one of said first and second segments, said fourth segment corresponding to the other of said first and second segments and having an overlapping portion with said third segment; and
c. wherein said computing device determines said content expressed in the second language based on said third and fourth segments having an overlapping portion and provides said content in the second language. - View Dependent Claims (91, 92)
-
-
93. A method for creating a frequency association database in a single language comprising:
-
providing a collection of documents, wherein said collection includes at least one document;
receiving from a user a word or word string query to be analyzed;
searching said collection of documents for occurrences of said query;
creating a list of words and word strings occurring within a user-defined amount of words of said query; and
tabulating a list of frequency of occurrences of all recurring words and word strings occurring within a user-defined amount of words of said query. - View Dependent Claims (94, 95, 96, 105, 106, 107, 129)
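Illustrative sketch (not part of the claim language): the searching, listing, and tabulating steps above can be pictured as counting every word and short word string that falls within a user-defined window around each occurrence of the query. Whitespace tokenization, lower-casing, the `max_len` cap on word-string length, and the function names are assumptions for this example.

```python
from collections import Counter


def find_occurrences(words, query):
    """Return the start index of every occurrence of the query word list."""
    n = len(query)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == query]


def ngrams(words, max_len):
    """Yield every word and word string of up to `max_len` words."""
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def frequency_associations(documents, query, window, max_len=2):
    """Tabulate recurring words and word strings near the query."""
    q = query.lower().split()
    counts = Counter()
    for doc in documents:
        words = doc.lower().split()
        for i in find_occurrences(words, q):
            left = words[max(0, i - window):i]
            right = words[i + len(q):i + len(q) + window]
            counts.update(ngrams(left, max_len))
            counts.update(ngrams(right, max_len))
    # keep only recurring entries, i.e. those seen more than once
    return {s: c for s, c in counts.items() if c > 1}


docs = [
    "the stock market fell sharply today",
    "analysts say the stock market fell again",
]
table = frequency_associations(docs, "market", window=2)
```

For these two sample documents and a window of two words, the recurring entries are "the", "stock", "the stock", and "fell", each with a frequency of 2.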
-
-
97. A computer device including a processor, a memory coupled to the processor, and a program stored in the memory, wherein the computer is configured to execute the program and perform the steps of:
-
providing a collection of documents, wherein said collection includes at least one document;
receiving from a user a word or word string query to be analyzed;
searching said collection of documents for occurrences of said query;
creating a list of words and word strings occurring within a user-defined amount of words of said query; and
tabulating a list of frequency of occurrences of all recurring words and word strings occurring within a user-defined amount of words of said query. - View Dependent Claims (98, 99, 100, 108, 109, 110, 130)
-
-
101. A computer readable storage medium having stored thereon a program executable by a computer processor for performing the steps of:
-
providing a collection of documents, wherein said collection includes at least one document;
receiving from a user a word or word string query to be analyzed;
searching said collection of documents for occurrences of said query;
creating a list of words and word strings occurring within a user-defined amount of words of said query; and
tabulating a list of frequency of occurrences of all recurring words and word strings occurring within a user-defined amount of words of said query. - View Dependent Claims (102, 103, 104, 111, 112, 113, 131)
-
-
114. A method for associating words in a language comprising:
-
providing a collection of documents;
wherein said collection includes at least one document;
selecting a first word or word string, and a second word or word string;
locating all documents having occurrences of the first word or word string within a defined proximity range of the second word or word string, with said defined proximity range having an upper limit and a lower limit;
defining in the located documents a range, wherein the range is defined in relation to the first word or word string and the second word or word string;
searching said ranges for recurring words and word strings; and
associating the first word or word string and the second word or word string with recurring words and word strings based on frequency of occurrence of the recurring words and word strings within the ranges. - View Dependent Claims (115, 116, 117, 126)
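Illustrative sketch (not part of the claim language): the locating, defining, searching, and associating steps above might be implemented by finding places where the two terms occur within a bounded distance of each other, taking the surrounding span as the range, and counting words that recur across ranges. For brevity the sketch handles single words only; the `pad` parameter and the exact shape of the range are assumptions, since the claim only requires that the range be defined in relation to the two terms.

```python
from collections import Counter


def proximity_associations(documents, first, second, lower, upper, pad=1):
    """Count words that recur in ranges around co-occurrences of two terms."""
    counts = Counter()
    for doc in documents:
        words = doc.lower().split()
        pos_first = [i for i, w in enumerate(words) if w == first]
        pos_second = [i for i, w in enumerate(words) if w == second]
        for a in pos_first:
            for b in pos_second:
                if not lower <= abs(a - b) <= upper:
                    continue  # outside the defined proximity range
                lo, hi = min(a, b), max(a, b)
                span = words[max(0, lo - pad):hi + pad + 1]
                # count each word once per range, excluding the two terms
                counts.update({w for w in span if w not in (first, second)})
    return [(w, c) for w, c in counts.most_common() if c > 1]


docs = [
    "the doctor examined the patient carefully",
    "a doctor examined every patient at home",
    "the patient thanked the kind doctor",
]
associated = proximity_associations(docs, "doctor", "patient", lower=1, upper=3)
```

In this sample, the third document is skipped because its two terms are four words apart, and "examined" is the only word that recurs across the two remaining ranges.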
-
-
118. A computer device including a processor, a memory coupled to the processor, and a program stored in the memory, wherein the computer is configured to execute the program and perform the steps of:
-
providing a collection of documents;
wherein said collection includes at least one document;
selecting a first word or word string, and a second word or word string;
locating all documents having occurrences of the first word or word string within a defined proximity range of the second word or word string, with said defined proximity range having an upper limit and a lower limit;
defining in the located documents a range, wherein the range is defined in relation to the first word or word string and the second word or word string;
searching said ranges for recurring words and word strings; and
associating the first word or word string and the second word or word string with recurring words and word strings based on frequency of occurrence of the recurring words and word strings within the ranges. - View Dependent Claims (119, 120, 121, 127)
-
-
122. A computer readable storage medium having stored thereon a program executable by a computer processor for performing the steps of:
-
providing a collection of documents;
wherein said collection includes at least one document;
selecting a first word or word string, and a second word or word string;
locating all documents having occurrences of the first word or word string within a defined proximity range of the second word or word string, with said defined proximity range having an upper limit and a lower limit;
defining in the located documents a range, wherein the range is defined in relation to the first word or word string and the second word or word string;
searching said ranges for recurring words and word strings; and
associating the first word or word string and the second word or word string with recurring words and word strings based on frequency of occurrence of the recurring words and word strings within the ranges. - View Dependent Claims (123, 124, 125, 128)
-
-
132. A method for associating words and word strings in a single language comprising:
-
a. providing a collection of documents, wherein said collection includes at least one document;
b. receiving from a user a word or word string query to be analyzed;
c. searching said collection of documents for the query to be analyzed and returning documents containing the query to be analyzed;
d. determining a user-defined amount of words or word strings or both to the left of said query to be analyzed in said returned documents based on their frequency and creating a Left Signature List comprising said words or word strings or both to the left of said query to be analyzed in said returned documents;
e. searching said collection of documents for each word and word string on said Left Signature List;
f. determining a user-defined amount of words or word strings or both to the right of said words or word strings or both comprising said Left Signature List and creating Left Anchor Lists comprising said words or word strings or both to the right of said words or word strings or both on said Left Signature List based on their frequency in a collection of documents;
g. determining a user-defined number of words or word strings or both to the right of said query to be analyzed in said returned documents and creating a Right Signature List comprising said words or word strings or both to the right of said query to be analyzed in said returned documents based on their frequency;
h. searching said collection of documents for each word and word string on said Right Signature List;
i. determining a user-defined number of words or word strings or both to the left of said words or word strings or both comprising said Right Signature List and creating Right Anchor Lists comprising said words or word strings or both to the left of said words or word strings or both on said Right Signature List based on their frequency; and
j. ranking the results based on the frequency of each word or word string occurring on said Left Anchor Lists and the frequency of said word or word string occurring on said Right Anchor Lists. - View Dependent Claims (133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151)
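Illustrative sketch (not part of the claim language): steps d. through j. can be read as collecting the words that frame the query on the left and right (the Signature Lists), then collecting whatever else those frames surround (the Anchor Lists), and ranking words that appear on both sides. The sketch uses single adjacent words, counts how many Anchor Lists each word appears on, and multiplies the left and right counts; all three choices, along with the function names, are assumptions.

```python
from collections import Counter


def neighbors(docs, term, side, top_n):
    """Most frequent words immediately to the given side of `term`."""
    counts = Counter()
    for words in docs:
        for i, w in enumerate(words):
            if w != term:
                continue
            j = i - 1 if side == "left" else i + 1
            if 0 <= j < len(words):
                counts[words[j]] += 1
    return [w for w, _ in counts.most_common(top_n)]


def rank_by_shared_context(documents, query, top_n=3):
    docs = [d.lower().split() for d in documents]
    left_signature = neighbors(docs, query, "left", top_n)    # step d.
    right_signature = neighbors(docs, query, "right", top_n)  # step g.
    left_anchor, right_anchor = Counter(), Counter()
    for sig in left_signature:                                # steps e., f.
        left_anchor.update(neighbors(docs, sig, "right", top_n))
    for sig in right_signature:                               # steps h., i.
        right_anchor.update(neighbors(docs, sig, "left", top_n))
    scores = {
        w: left_anchor[w] * right_anchor[w]                   # step j.
        for w in set(left_anchor) & set(right_anchor)
        if w != query
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = [
    "the car stopped quickly",
    "the automobile stopped quickly",
    "a car stopped here",
    "a truck stopped there",
]
ranking = rank_by_shared_context(docs, "car")
```

For the query "car", the sample corpus yields "automobile" and "truck" as candidates because each shares a left frame and the right frame "stopped" with the query.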
-
-
152. A computer device including a processor, a memory coupled to the processor, and a program stored in the memory, wherein the computer is configured to execute the program and perform the steps of:
-
a. providing a collection of documents, wherein said collection includes at least one document;
b. receiving from a user a word or word string query to be analyzed;
c. searching said collection of documents for the query to be analyzed and returning documents containing the query to be analyzed;
d. determining a user-defined amount of words or word strings or both to the left of said query to be analyzed in said returned documents based on their frequency and creating a Left Signature List comprising said words or word strings or both to the left of said query to be analyzed in said returned documents;
e. searching said collection of documents for each word and word string on said Left Signature List;
f. determining a user-defined amount of words or word strings or both to the right of said words or word strings or both comprising said Left Signature List and creating Left Anchor Lists comprising said words or word strings or both to the right of said words or word strings or both on said Left Signature List based on their frequency in a collection of documents;
g. determining a user-defined number of words or word strings or both to the right of said query to be analyzed in said returned documents and creating a Right Signature List comprising said words or word strings or both to the right of said query to be analyzed in said returned documents based on their frequency;
h. searching said collection of documents for words or word strings or both on said Right Signature List;
i. determining a user-defined number of words or word strings or both to the left of said words or word strings or both comprising said Right Signature List and creating Right Anchor Lists comprising said words or word strings or both to the left of said words or word strings or both on said Right Signature List based on their frequency; and
j. ranking results based on the frequency of each word or word string occurring on said Left Anchor Lists and the frequency of said word or word string occurring on said Right Anchor Lists. - View Dependent Claims (153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171)
-
-
172. A computer readable storage medium having stored thereon a program executable by a computer processor for performing the steps of:
-
a. providing a collection of documents, wherein said collection includes at least one document;
b. receiving from a user a word or word string query to be analyzed;
c. searching said collection of documents for the query to be analyzed and returning documents containing the query to be analyzed;
d. determining a user-defined amount of words or word strings or both to the left of said query to be analyzed in said returned documents based on their frequency and creating a Left Signature List comprising said words or word strings or both to the left of said query to be analyzed in said returned documents;
e. searching said collection of documents for words or word strings or both on said Left Signature List;
f. determining a user-defined amount of words or word strings or both to the right of said words or word strings or both comprising said Left Signature List and creating Left Anchor Lists comprising said words or word strings or both to the right of said words or word strings or both on said Left Signature List based on their frequency in a collection of documents;
g. determining a user-defined number of words or word strings or both to the right of said query to be analyzed in said returned documents and creating a Right Signature List comprising said words or word strings or both to the right of said query to be analyzed in said returned documents based on their frequency;
h. searching said collection of documents for words or word strings or both on said Right Signature List;
i. determining a user-defined number of words or word strings or both to the left of said words or word strings or both comprising said Right Signature List and creating Right Anchor Lists comprising said words or word strings or both to the left of said words or word strings or both on said Right Signature List based on their frequency; and
j. ranking results based on the frequency of each word or word string occurring in said Left Anchor Lists and the frequency of said word or word string occurring on said Right Anchor Lists. - View Dependent Claims (173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191)
-
-
192. A method for associating words and word strings in a language comprising:
-
a. providing a collection of documents, wherein said collection includes at least one document;
b. receiving from a user a word or word string query to be analyzed;
c. searching said collection of documents for the query to be analyzed and returning documents containing the query to be analyzed;
d. determining a user-defined number of words or word strings of user-defined size or both to the left and right of the query in said returned documents containing the query to be analyzed;
e. returning a list with an entry or plurality of entries, wherein said entry or said plurality of entries contain said determined words or word strings or both to the left and right of the query in said returned documents;
f. searching said collection of documents for said entry or plurality of entries in said returned list; and
g. returning a list of words or word strings of user defined size or both that occur most frequently between said determined words or word strings or both to the left and right of said query in said returned documents. - View Dependent Claims (193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205)
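Illustrative sketch (not part of the claim language): steps d. through g. can be pictured as recording the word on each side of the query as a frame, then searching the collection for the same frames and counting what appears between them. The use of one word per side, the `max_gap` limit on the size of the returned word strings, and the function name are assumptions for this example.

```python
from collections import Counter


def fill_in_candidates(documents, query, max_gap=2):
    """Return word strings that occur between the query's left/right frames."""
    docs = [d.lower().split() for d in documents]
    q = query.lower().split()
    n = len(q)

    # Steps d. and e.: list the (left word, right word) frames of the query.
    frames = set()
    for words in docs:
        for i in range(len(words) - n + 1):
            if words[i:i + n] == q and i > 0 and i + n < len(words):
                frames.add((words[i - 1], words[i + n]))

    # Steps f. and g.: search for the frames and count what lies between.
    counts = Counter()
    for words in docs:
        for i in range(len(words)):
            for gap in range(1, max_gap + 1):
                j = i + gap + 1
                if j < len(words) and (words[i], words[j]) in frames:
                    filler = words[i + 1:j]
                    if filler != q:
                        counts[" ".join(filler)] += 1
    return counts.most_common()


docs = [
    "the car stopped quickly",
    "the automobile stopped quickly",
    "a car stopped here",
    "a truck stopped there",
    "the automobile stopped again",
]
candidates = fill_in_candidates(docs, "car")
```

With the sample collection, the frames of "car" are ("the", "stopped") and ("a", "stopped"); "automobile" fills such a frame twice and "truck" once.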
-
-
206. A computer device including a processor, a memory coupled to the processor, and a program stored in the memory, wherein the computer is configured to execute the program and perform the steps of:
-
a. providing a collection of documents, wherein said collection includes at least one document;
b. receiving from a user a word or word string query to be analyzed;
c. searching said collection of documents for the query to be analyzed and returning documents containing the query to be analyzed;
d. determining a user-defined number of words or word strings of user-defined size or both to the left and right of the query in said returned documents containing the query to be analyzed;
e. returning a list with an entry or plurality of entries, wherein said entry or said plurality of entries contain said determined words or word strings or both to the left and right of the query in said returned documents;
f. searching said collection of documents for said entry or plurality of entries in said returned list; and
g. returning a list of words or word strings of user defined size or both that occur most frequently between said determined words or word strings or both to the left and right of said query in said returned documents. - View Dependent Claims (207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219)
-
-
220. A computer readable storage medium having stored thereon a program executable by a computer processor for performing the steps of:
-
a. providing a collection of documents, wherein said collection includes at least one document;
b. receiving from a user a word or word string query to be analyzed;
c. searching said collection of documents for the query to be analyzed and returning documents containing the query to be analyzed;
d. determining a user-defined number of words or word strings of user-defined size or both to the left and right of the query in said returned documents containing the query to be analyzed;
e. returning a list with an entry or plurality of entries, wherein said entry or said plurality of entries contain said determined words or word strings or both to the left and right of the query in said returned documents;
f. searching said collection of documents for said entry or plurality of entries in said returned list; and
g. returning a list of words or word strings of user defined size or both that occur most frequently between said determined words or word strings or both to the left and right of said query in said returned documents. - View Dependent Claims (221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233)
-
-
234. A method for content conversion within a single language comprising the following steps:
-
a. providing a first plurality of word strings;
b. providing a second plurality of word strings, wherein each of said word strings in said second plurality corresponds to one of said word strings in said first plurality in a synonymous or near synonymous manner;
c. receiving a word string query to be analyzed;
d. parsing said word string query into a plurality of subset word strings, wherein a portion of each subset overlaps with a second portion of its adjoining subset or subsets;
e. analyzing each of said parsed subset word strings to identify, using said second plurality of word strings, synonymous word strings for each of said parsed subset word strings; and
f. replacing any parsed subset word string with a synonymous word string where it overlaps with said adjoining subsets.
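Illustrative sketch (not part of the claim language): one possible reading of steps d. through f. is that a subset word string is replaced by a synonymous word string only when the replacement still overlaps with the replacement chosen for the adjoining subset. The synonym table, the depth-first search over candidates, and all names below are assumptions; the subsets are supplied already parsed to keep the example short.

```python
def merge_overlap(left, right):
    """Merge two word lists on the longest suffix/prefix overlap, else None."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


def rephrase(subsets, synonyms):
    """Chain synonymous word strings whose adjoining parts overlap."""

    def extend(merged, index):
        if index == len(subsets):
            return merged
        for candidate in synonyms.get(subsets[index], []):
            joined = merge_overlap(merged, candidate.split())
            if joined is None:
                continue  # synonym does not overlap its neighbor, reject it
            result = extend(joined, index + 1)
            if result is not None:
                return result
        return None

    for candidate in synonyms.get(subsets[0], []):
        result = extend(candidate.split(), 1)
        if result is not None:
            return " ".join(result)
    return None


# Hypothetical near-synonymous word strings within one language.
SYNONYMS = {
    "I would like to": ["I want to"],
    "like to purchase a car": ["wish to acquire a car", "want to buy a car"],
}
subsets = ["I would like to", "like to purchase a car"]  # overlap: "like to"
rewritten = rephrase(subsets, SYNONYMS)
```

In this sample, "wish to acquire a car" is rejected because it shares no words with the end of "I want to", so the result is "I want to buy a car".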
-
-
235. A computer device including a processor, a memory coupled to the processor, and a program stored in the memory, wherein the computer is configured to execute the program and perform the steps of:
-
a. providing a first plurality of word strings;
b. providing a second plurality of word strings, wherein each of said word strings in said second plurality corresponds to one of said word strings in said first plurality in a synonymous or near synonymous manner;
c. receiving a word string query to be analyzed;
d. parsing said word string query into a plurality of subset word strings, wherein a portion of each subset overlaps with a second portion of its adjoining subset or subsets;
e. analyzing each of said parsed subset word strings to identify, using said second plurality of word strings, synonymous word strings for each of said parsed subset word strings; and
f. replacing any parsed subset word string with a synonymous word string where it overlaps with said adjoining subsets.
-
-
236. A computer readable storage medium having stored thereon a program executable by a computer processor for performing the steps of:
-
a. providing a first plurality of word strings;
b. providing a second plurality of word strings, wherein each of said word strings in said second plurality corresponds to one of said word strings in said first plurality in a synonymous or near synonymous manner;
c. receiving a word string query to be analyzed;
d. parsing said word string query into a plurality of subset word strings, wherein a portion of each subset overlaps with a second portion of its adjoining subset or subsets;
e. analyzing each of said parsed subset word strings to identify, using said second plurality of word strings, synonymous word strings for each of said parsed subset word strings; and
f. replacing any parsed subset word string with a synonymous word string where it overlaps with said adjoining subsets.
-
Specification