System and method for searching databases employing user profiles
Abstract
A computer program method enables a user to find the most relevant documents by searching distributed databases, i.e., the World Wide Web. The program employs the user's profile, based on the user's foci of interest, the user's query and a semantic analysis of the query and documents. In one embodiment, the retrieved documents are ranked according to relevancy based on the user's profile and query.
14 Claims
1. A method for locating documents stored within a plurality of distributed databases, the method comprising the steps of:
generating a user profile including a user focus of interest;
retrieving, in real-time, selected documents from the plurality of distributed databases by utilizing one of an automated query based on the user focus of interest, a user-formulated query, and a query formulated based on a user-specified reference;
extracting semantic information from the retrieved document;
sorting the semantic information to generate a list;
presenting the sorted list of semantic information to the user; and
terminating the retrieval of the documents when one of the query of the plurality of distributed databases is exhausted, an amount of the presented semantic information exceeds a user-specified maximum amount, and a termination condition predefined by the user is satisfied.
monitoring a selection of documents by the user; and
semantically analyzing one of a textual content supplied by the user, a content of a saved Uniform Resource Identifier in a user's browser, a content of a Uniform Resource Identifier supplied by the user, a text document in a recognized digital format, and a text document formulated in a natural language, by:
removing at least one syncategoremic expression therefrom, removing at least one morphological inflection therefrom, segmenting each of the selected documents into a plurality of segments, wherein each segment includes a set of data delineated by a punctuation character, segmenting each of the plurality of segments into a plurality of words, coupling each word of a first segment to each of a plurality of proximate words separated therefrom by no more than a predetermined number of words to generate a plurality of word pair couples, maintaining an incremental counter for each word and each word couple from the selected documents, sorting the word couples in descending order from a top to a bottom as a function of sorting criteria, wherein the sorting criteria include the following in the following order:
a frequency of occurrence of the word couple, a frequency of occurrence of the most frequent word in the word couple, a frequency of occurrence of the least frequent word in the word couple, an appearance of the word couple in a title of the document, and random selection of word couples, selecting a predefined subset from an initial portion of the sorted word couples, updating the user focus of interest, and manually modifying the selected subset of the sorted word couples.
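The semantic analysis recited above can be illustrated with a short Python sketch. It is one possible reading of the claim, not the patentee's implementation. The stop list standing in for syncategoremic expressions, the crude suffix stripping used in place of real morphological analysis, the default window of 3 words, the reading of "appearance in a title" as both words of the couple occurring in the title, and the names `segments`, `word_couples` and `rank_couples` are all assumptions.

```python
import random
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list standing in for the claim's
# "syncategoremic expressions" (articles, prepositions, conjunctions).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "to", "is", "for", "with"}


def strip_inflection(word: str) -> str:
    """Crude suffix stripping, assumed here in place of real morphological analysis."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def segments(text: str) -> list[list[str]]:
    """Split text at punctuation, then into lowercased, filtered, stripped words."""
    result = []
    for chunk in re.split(r"[.,;:!?()]", text.lower()):
        words = [strip_inflection(w) for w in re.findall(r"[a-z]+", chunk)
                 if w not in STOP_WORDS]
        if words:
            result.append(words)
    return result


def word_couples(text: str, window: int = 3) -> tuple[Counter, Counter]:
    """Count each word, and each unordered couple of words at most `window` positions apart."""
    word_counts: Counter = Counter()
    couple_counts: Counter = Counter()
    for seg in segments(text):
        word_counts.update(seg)
        for i, word in enumerate(seg):
            for other in seg[i + 1 : i + 1 + window]:
                if other != word:
                    couple_counts[tuple(sorted((word, other)))] += 1
    return word_counts, couple_counts


def rank_couples(text: str, title: str = "", window: int = 3,
                 top_k: int = 5, seed: int = 0) -> list[tuple[str, str]]:
    """Sort couples by the claim's criteria, in order, and keep the first `top_k`."""
    word_counts, couple_counts = word_couples(text, window)
    title_words = {w for seg in segments(title) for w in seg}
    rng = random.Random(seed)

    def key(couple):
        a, b = couple
        fa, fb = word_counts[a], word_counts[b]
        in_title = a in title_words and b in title_words
        # Negated counts make an ascending sort descending; `not in_title` puts
        # title couples first; the random draw breaks any remaining ties.
        return (-couple_counts[couple], -max(fa, fb), -min(fa, fb),
                not in_title, rng.random())

    return sorted(couple_counts, key=key)[:top_k]
```

For example, with the text "Patent search tools. Patent search engines rank documents. Search engines index patents." and the title "Search engines", the couple ("patent", "search") ranks first because it occurs three times; ("engine", "search") and ("engine", "patent") tie on the first three criteria and are separated by the title criterion. The "predefined subset from an initial portion" corresponds here to the `top_k` slice.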
3. The method according to claim 2, wherein the updating step includes the following substeps:
monitoring a record of dates on which the user accesses each of the selected documents for a first focus of interest, comparing a most recent date with a current date, and prompting the user to inspect and remove the first focus of interest if a difference between the most recent date and the current date exceeds a predefined constant.
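The staleness test in claim 3 reduces to comparing the most recent access date with the current date. The Python sketch below assumes that access dates are already grouped per focus of interest; the function name `foci_to_review`, the parameter names, and the choice to skip foci with no recorded accesses are hypothetical. The actual prompting of the user is left to the caller.

```python
from datetime import date


def foci_to_review(access_dates: dict[str, list[date]], today: date,
                   max_idle_days: int) -> list[str]:
    """Return foci of interest whose most recent access is older than the limit."""
    flagged = []
    for focus, dates in access_dates.items():
        if not dates:
            continue  # assumption: a focus with no recorded accesses is left alone
        idle_days = (today - max(dates)).days
        if idle_days > max_idle_days:  # "exceeds a predefined constant"
            flagged.append(focus)
    return flagged
```

Claim 9 recites essentially the same comparison applied to the dates on which the focus of interest itself is accessed, so the same check would serve there.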
4. The method according to claim 1, wherein the extracting step includes the following substeps:
removing at least one syncategoremic expression from a document selected by the user, removing at least one morphological inflection from the selected document, dividing the selected document into a plurality of segments, wherein each segment comprises a set of data delineated by a punctuation character, dividing each of the plurality of segments into a plurality of words, coupling each word of a first segment to each of a plurality of proximate words separated therefrom by no more than a predetermined number of words to generate a plurality of word pair couples, selecting a predetermined number of the subsets of word couples for use in representing the selected document, wherein the predetermined number of subsets are selected from the top of the sort, and determining the relevance of each of the subsets to one of the query based on the user focus of interest, the user-formulated query, and the query formulated from a user-specified reference.
5. The method according to claim 4, wherein the determining step includes the following substeps:
sorting the plurality of subsets in descending order as a function of:
a number of documents in which the word couples of the subset occur, a number of document titles in which the word couples of the subset occur, and a total frequency of occurrence of the words of each of the word couples of the subset collectively in all documents, extracting the topmost available subset from the sorted subsets;
comparing the topmost available subset to the rest of the subsets; and
extracting from the plurality of subsets each subset containing at least one word of the topmost available subset.
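The determining step of claim 5 can be read as a greedy grouping: sort the subsets, take the topmost one, and pull out every remaining subset that shares at least one word with it. In the Python sketch below, the `Subset` record, its precomputed statistics, and the repetition of the step until no subsets remain (one reading of "topmost available") are assumptions rather than details taken from the claim.

```python
from dataclasses import dataclass

Couple = tuple[str, str]


@dataclass
class Subset:
    """Hypothetical record for one subset of word couples and its statistics."""
    couples: frozenset[Couple]
    doc_count: int    # documents in which the subset's couples occur
    title_count: int  # document titles in which they occur
    total_freq: int   # combined frequency of the couples' words in all documents

    def words(self) -> set[str]:
        return {w for couple in self.couples for w in couple}


def group_subsets(subsets: list[Subset]) -> list[list[Subset]]:
    """Repeatedly extract the topmost subset together with every subset sharing a word with it."""
    remaining = sorted(subsets, key=lambda s: (s.doc_count, s.title_count, s.total_freq),
                       reverse=True)
    groups = []
    while remaining:
        top = remaining.pop(0)
        top_words = top.words()
        related = [s for s in remaining if s.words() & top_words]
        remaining = [s for s in remaining if not (s.words() & top_words)]
        groups.append([top, *related])
    return groups
```

Because Python's sort is stable, subsets that tie on all three criteria keep their input order.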
6. The method according to claim 1, wherein the presenting step includes the following substeps:
maintaining a counter for each word couple contained in the initial portion of the sorted subsets detected in each of the searched documents, and presenting to the user each document for which the word couple counter is at least a predetermined number.
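One reading of the presenting step in claim 6 is a per-document counter of detections of the top-ranked word couples, with a document shown only when that counter reaches a threshold. The Python sketch below follows that reading; the input layout (a `Counter` of couple occurrences per document id), the function name and the parameter names are hypothetical.

```python
from collections import Counter

Couple = tuple[str, str]


def documents_to_present(doc_couples: dict[str, Counter], top_couples: list[Couple],
                         min_hits: int) -> list[str]:
    """Return ids of documents in which the top word couples are detected at least `min_hits` times."""
    presented = []
    for doc_id, counts in doc_couples.items():
        # One counter per document: total detections of the top-ranked couples.
        # A Counter returns 0 for couples the document does not contain.
        hits = sum(counts[couple] for couple in top_couples)
        if hits >= min_hits:
            presented.append(doc_id)
    return presented
```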
7. A method for monitoring a user's activities to aid in a search for documents from a plurality of distributed databases, the method comprising the steps of:
generating a user profile including a user focus of interest;
recording a content of a user query;
recording a particular database from which a document was retrieved based on the user query;
recording a domain of the document from a corresponding Uniform Resource Identifier; and
recording a type of information contained in the retrieved document.
removing morphological inflections from the topic of interest;
coupling words within the topic of interest to create word couples;
adding the word couples to a user vocabulary;
maintaining a counter of an occurrence frequency of each of the word couples; and
prompting the user to approve a first one of the word couples as a focus of interest when a frequency of occurrence of the first word couple exceeds a predetermined constant.
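The vocabulary steps above amount to counting word couples across the user's topics of interest and proposing a couple once its count passes a constant. The Python sketch below assumes the topic words have already had their inflections removed, couples every pair of distinct words within a topic, and proposes each couple only once; the class name `UserVocabulary` and its methods are hypothetical.

```python
from collections import Counter
from itertools import combinations


class UserVocabulary:
    """Hypothetical store for the word couples seen in a user's topics of interest."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # the claim's "predetermined constant"
        self.counts: Counter = Counter()
        self.prompted: set[tuple[str, str]] = set()

    def add_topic(self, words: list[str]) -> list[tuple[str, str]]:
        """Record one topic (words assumed already stripped of inflections).

        Returns the couples whose frequency now exceeds the threshold and that
        have not yet been proposed to the user as a focus of interest.
        """
        couples = {tuple(sorted(pair)) for pair in combinations(set(words), 2)}
        to_prompt = []
        for couple in sorted(couples):
            self.counts[couple] += 1
            if self.counts[couple] > self.threshold and couple not in self.prompted:
                self.prompted.add(couple)
                to_prompt.append(couple)
        return to_prompt
```

With a threshold of 2, the couple ("patent", "search") is proposed on the third topic that contains both words and not again afterwards.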
9. The method according to claim 8, further comprising the steps of:
comparing the recorded information to the user focus of interest;
recording dates on which the user focus of interest is accessed;
comparing a most recent date on which the user focus of interest was accessed with a current date; and
prompting the user to inspect and remove the focus of interest if a difference between the most recent date and the current date exceeds a predefined constant.
10. The method according to claim 8, further comprising the steps of:
maintaining a counter for each database from which a document is retrieved;
sorting the databases in descending order of frequency to generate a list; and
adding the list to the user profile, the list being indicative of the user's preference among the databases.
11. The method according to claim 8, further comprising the steps of:
maintaining a counter for each user visit to a domain of a retrieved document; and
specifying the domain as a domain-restricted information acquisition when the frequency of visits to the domain exceeds a predefined value.
12. The method according to claim 8, further comprising the step of maintaining a counter for each of a plurality of types of information contained in retrieved documents, wherein the counter indicates the preferred type of information.
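Claims 10 to 12 each maintain a simple frequency counter: per database, per visited domain, and per type of information. The Python sketch below groups the three counters in one hypothetical `ActivityProfile` class, takes the domain from the host part of the document's URI, and treats "exceeds a predefined value" as a strict comparison; all names and the example URIs are assumptions.

```python
from collections import Counter
from typing import Optional
from urllib.parse import urlparse


class ActivityProfile:
    """Hypothetical per-user counters for claims 10 to 12."""

    def __init__(self, domain_threshold: int) -> None:
        self.domain_threshold = domain_threshold  # claim 11's "predefined value"
        self.databases: Counter = Counter()
        self.domains: Counter = Counter()
        self.info_types: Counter = Counter()

    def record(self, database: str, uri: str, info_type: str) -> None:
        self.databases[database] += 1
        self.domains[urlparse(uri).hostname or ""] += 1
        self.info_types[info_type] += 1

    def preferred_databases(self) -> list[str]:
        """Claim 10: databases in descending order of retrieval frequency."""
        return [db for db, _ in self.databases.most_common()]

    def restricted_domains(self) -> list[str]:
        """Claim 11: domains visited more often than the threshold."""
        return sorted(d for d, n in self.domains.items() if n > self.domain_threshold)

    def preferred_info_type(self) -> Optional[str]:
        """Claim 12: the most frequent type of information, if any."""
        return self.info_types.most_common(1)[0][0] if self.info_types else None
```

`Counter.most_common()` orders equal counts by first insertion, which keeps the database list deterministic when frequencies tie.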
13. An independently operating computer system for finding documents from a plurality of distributed databases comprising:
a memory arrangement;
a communication arrangement; and
a processor generating a user profile including a user focus of interest to be stored in the memory arrangement, the processor retrieving, in real-time, documents, using the communication arrangement, from the plurality of distributed databases by utilizing one of:
(i) an automated query based on the user focus of interest, (ii) a user-formulated query, and (iii) a query formulated from a user-specified reference, wherein the processor extracts semantic information from each retrieved document and sorts the semantic information to generate a list which is stored in the memory arrangement, the processor presenting the sorted list of semantic information to the user, the processor terminating the retrieval of the documents when one of the query of the plurality of distributed databases is exhausted, the amount of the semantic information presented exceeds a user-specified maximum amount, and a predefined termination condition is satisfied, the processor monitoring the user's activities to update the user profile.
14. A method for aiding a user in finding documents within a plurality of distributed databases, the method comprising the steps of:
generating a user profile including a user focus of interest;
retrieving, in real-time, documents from the plurality of distributed databases by utilizing one of:
(i) an automated query based on the user focus of interest, (ii) a user-formulated query, and (iii) a query formulated from a user-specified reference;
extracting semantic information from the retrieved documents;
sorting the semantic information to generate a list;
presenting the sorted list of semantic information to the user; and
terminating the retrieval of documents when one of the query of the plurality of distributed databases is exhausted, the amount of the semantic information presented exceeds a user-specified maximum amount, and a predefined termination condition is satisfied;
monitoring the user's activities including the steps of:
recording a content of one of an automated query based on the user focus of interest, a user-formulated query, and a query formulated from a user-specified reference;
recording databases from which documents are retrieved by the user;
recording a domain of retrieved documents from corresponding Uniform Resource Identifiers; and
recording a type of information contained in each of the retrieved documents.
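The retrieval loop of claim 14 combines three termination conditions with the recording of the user's activities. The Python sketch below covers only those two aspects (extraction and sorting are sketched separately for claim 2). The `Hit` and `ActivityLog` records, the function name `retrieve`, the treatment of "amount of semantic information" as a count of presented items, and the user's termination condition as a callable are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable
from urllib.parse import urlparse


@dataclass
class Hit:
    """Hypothetical record for one retrieved document."""
    database: str
    uri: str
    info_type: str
    semantic_info: str  # stands in for the extracted semantic information


@dataclass
class ActivityLog:
    """The four kinds of activity recorded in the monitoring step."""
    queries: list[str] = field(default_factory=list)
    databases: list[str] = field(default_factory=list)
    domains: list[str] = field(default_factory=list)
    info_types: list[str] = field(default_factory=list)


def retrieve(query: str, results: Iterable[Hit], max_items: int,
             user_stop: Callable[[list[Hit]], bool], log: ActivityLog) -> list[Hit]:
    """Present hits until the results run out, the maximum is reached,
    or the user's own termination condition holds."""
    log.queries.append(query)
    presented: list[Hit] = []
    for hit in results:  # the loop also ends when the query is exhausted
        if len(presented) >= max_items:
            break  # never present more than the user-specified maximum
        presented.append(hit)
        log.databases.append(hit.database)
        log.domains.append(urlparse(hit.uri).hostname or "")
        log.info_types.append(hit.info_type)
        if user_stop(presented):
            break
    return presented
```

Checking the maximum before presenting a hit is a design choice: it keeps the presented amount at or below the user's limit rather than stopping only after the limit has been exceeded.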
Specification