Category based, extensible and interactive system for document retrieval
Abstract
In information retrieval (IR) systems with high-speed access, especially search engines applied to the Internet and/or corporate intranet domains for retrieving accessible documents, automatic text categorization techniques are used to support the presentation of search query results within high-speed network environments.
An integrated, automatic and open information retrieval system (100) comprises a hybrid method based on linguistic and mathematical approaches for automatic text categorization. It solves the problems of conventional systems by combining an automatic content recognition technique with a self-learning hierarchical scheme of indexed categories. In response to a word submitted by a requester, said system (100) retrieves documents containing that word, analyzes the documents to determine their word-pair patterns, matches the document patterns to database patterns that are related to topics, and thereby assigns topics to each document. If the retrieved documents are assigned to more than one topic, a list of the document topics is presented to the requester, and the requester designates the relevant topics. The requester is then granted access only to documents assigned to relevant topics. A knowledge database (1408) linking search terms to documents and documents to topics is established and maintained to speed future searches. Additionally, new strategies are presented to deal with different update frequencies of changed Web sites.
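The matching step described in the abstract can be illustrated with a minimal sketch. It assumes that a document's pattern and each database pattern are sets of word pairs and that "similar" means a Jaccard overlap above a threshold; the abstract does not specify the similarity measure, so `jaccard`, `assign_topics`, the threshold value and the sample patterns are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[str, str]


def jaccard(a: FrozenSet[Pair], b: FrozenSet[Pair]) -> float:
    """Share of word pairs common to both sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_topics(
    doc_pairs: FrozenSet[Pair],
    topic_patterns: Dict[str, FrozenSet[Pair]],
    threshold: float = 0.25,  # assumed cut-off, not taken from the patent
) -> List[str]:
    """Return every topic whose stored pattern is similar enough to the document's."""
    return sorted(
        topic
        for topic, pattern in topic_patterns.items()
        if jaccard(doc_pairs, pattern) >= threshold
    )


# Hypothetical knowledge-database content relating patterns to topics.
TOPIC_PATTERNS = {
    "zoology": frozenset({("jaguar", "habitat"), ("jaguar", "prey")}),
    "automobiles": frozenset({("jaguar", "engine"), ("jaguar", "dealer")}),
}

doc = frozenset({("jaguar", "engine"), ("jaguar", "price")})
topics = assign_topics(doc, TOPIC_PATTERNS)
print(topics)
```

A set-overlap measure is used here only because it is easy to verify by hand; any other similarity function could be substituted without changing the surrounding flow.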
74 Claims
1. An interactive document retrieval system (100) designed to search for documents after receiving a search query from a requestor, said system comprising:
- a knowledge database (200) containing at least one data structure (202, 208, 210, 212, 214, 216 and/or 218) that relates text patterns to topics, and
- a query processor (400) that, in response to the receipt of a search query from a requester, performs the following steps:
  - searching for and trying to capture documents containing at least one term related to the search query,
  - if any documents are captured, analyzing the captured documents to determine their text patterns,
  - categorizing the captured documents by comparing each document's text pattern to the text patterns in the knowledge database (200), and if a document's text pattern is similar to a text pattern in the knowledge database (200), assigning to that document the similar word pattern's related topic,
  - presenting at least one list of the topics assigned to the categorized documents to the requester, and asking the requester to designate at least one topic from the list as a topic that is relevant to the requestor's search, and
  - granting the requestor access to the subset of captured and categorized documents to which topics designated by the requestor have been assigned,
wherein the word patterns determined by analysis are pairings of words, each pairing comprising two searchable words with one word occurring frequently within the document and the other word occurring near the one word frequently within the document.
Dependent claims: 2-23, 47-58, 72
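The word pairings recited in the final clause of claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the patented analysis: "frequently" is approximated by taking the most common non-stopword tokens, "near" by a fixed token window, and the function name `word_pairs`, its parameters and the stopword list are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple

# Assumed minimal stopword list; the claim only requires "searchable" words.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "its"}


def word_pairs(
    text: str,
    top_words: int = 2,   # how many frequent words to anchor pairs on (assumed)
    window: int = 2,      # how many tokens away still counts as "near" (assumed)
    min_count: int = 2,   # how often a pairing must recur to be kept (assumed)
) -> List[Tuple[str, str]]:
    """Pair each frequent word with the words that repeatedly occur near it."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    frequent = {word for word, _ in Counter(tokens).most_common(top_words)}

    near: Counter = Counter()
    for i, word in enumerate(tokens):
        if word not in frequent:
            continue
        start, stop = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i and tokens[j] != word:
                near[(word, tokens[j])] += 1

    # Keep only pairings that recur, i.e. the neighbour is frequently near.
    return sorted(pair for pair, count in near.items() if count >= min_count)


sample = (
    "The jaguar engine is loud. The jaguar engine is fast. "
    "A jaguar dealer sells the engine."
)
pairs = word_pairs(sample)
print(pairs)
```

In the sample text, "jaguar" and "engine" are the two most frequent words, so pairs are anchored on them; a neighbour that appears next to them only once, such as "dealer", is dropped by the `min_count` filter.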
24. An interactive method of searching for and retrieving documents after receiving a search query from a requestor, said method comprising the steps of:
- providing a knowledge database (200) containing at least one data structure (202, 208, 210, 212, 214, 216 and/or 218) that relates text patterns to topics,
- in response to the receipt of a search query from a requester, searching for and attempting to capture documents containing at least one term related to the search query,
- if any documents are captured, analyzing the captured documents to determine their text patterns,
- categorizing the captured documents by comparing each document's text pattern to the text patterns in the knowledge database (200), and when a document's word pattern is similar to a text pattern in the knowledge database (200), assigning to that document the similar text pattern's related topic,
- presenting at least one list of the topics assigned to the categorized documents to the requester, and asking the requester to designate at least one topic from the list as a topic that is relevant to the requestor's search, and
- granting the requestor access to the subset of captured and categorized documents to which topics designated by the requester have been assigned,
wherein the word patterns determined by analysis are pairings of words, each pairing comprising two searchable words with one word occurring frequently within the document and the other word occurring near the one word frequently within the document.
Dependent claims: 25-46, 59-71
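The interactive part of the method in claim 24 (presenting a topic list, receiving the requester's designation, and granting access to the matching subset) might look roughly like the sketch below. It assumes categorization has already produced a mapping from document identifiers to assigned topics; the names `topic_list`, `accessible_documents` and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def topic_list(assignments: Dict[str, Set[str]]) -> List[str]:
    """Collect every topic assigned to at least one captured document."""
    return sorted({topic for topics in assignments.values() for topic in topics})


def accessible_documents(
    assignments: Dict[str, Set[str]], designated: Iterable[str]
) -> List[str]:
    """Return the documents carrying at least one topic the requester designated."""
    chosen = set(designated)
    return sorted(doc for doc, topics in assignments.items() if topics & chosen)


# Hypothetical result of the categorizing step: document id -> assigned topics.
ASSIGNMENTS = {
    "doc-1": {"automobiles"},
    "doc-2": {"zoology"},
    "doc-3": {"automobiles", "motorsport"},
    "doc-4": set(),  # captured, but no similar pattern was found
}

offered = topic_list(ASSIGNMENTS)                              # list shown to the requester
granted = accessible_documents(ASSIGNMENTS, ["automobiles"])   # requester's choice
print(offered, granted)
```

Documents without any assigned topic never appear in the granted subset in this sketch, which follows from the claim granting access only to documents "to which topics designated by the requester have been assigned".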
73. An interactive document retrieval system, comprising:
- a knowledge database (1408) for relating identifications of analyzed documents to topics,
- a user interface (1402) for inputting a search query,
- a search engine (1406) for searching a resource for documents essentially matching an input search query and for outputting identifications of documents as a search result,
- a finding machine (1404) being supplied with the search result of the search engine (1406), for
  - accessing the knowledge database (1408) to check whether a document identified in the search result has already been analyzed before in relation with other search terms than the present search term,
  - forwarding the identification of a document along with its related topic as retrieved from the knowledge database (1408) to the user interface (1402) in case the document has already been analyzed before and its identification been stored together with its related topic in the knowledge database (1408), and
  - analyzing the identified document in case the document has not yet been analyzed before to relate a topic to the identification of the document and forwarding the identification of the document along with its related topic to the user interface (1402).
74. An interactive document retrieval method, the method comprising the steps of:
- relating (1408) identifications of analyzed documents to topics in a database,
- inputting (1402) a search term by means of a user interface,
- searching (1406) a resource for documents essentially matching an input search query and outputting identifications of documents as a search result,
- accessing the database (1408) to check whether a document identified in the search result has already been analyzed before in relation with other search terms than the present search term,
- forwarding the identification of a document along with its related topic as retrieved from the knowledge database (1408) to the user interface (1402) in case the document has already been analyzed before and its identification been stored together with its related topic in the knowledge database (1408), and
- analyzing the identified document in case the document has not yet been analyzed before to relate a topic to the identification of the document and forwarding the identification of the document along with its related topic to the user interface (1402).
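The lookup-before-analysis behaviour shared by claims 73 and 74 can be sketched as a simple cache check. The sketch assumes the knowledge database is a mapping from a document identification to a single topic and that a freshly analyzed result is stored back for later searches (the abstract says the database is maintained to speed future searches; the claims themselves do not spell out the write-back). The names `find_topics` and `fake_analyze` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def find_topics(
    result_ids: Iterable[str],
    knowledge_db: Dict[str, str],
    analyze: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Forward (identification, topic) for each search hit, analyzing only unseen documents."""
    forwarded = []
    for doc_id in result_ids:
        topic = knowledge_db.get(doc_id)
        if topic is None:
            # Not analyzed before: analyze now and remember the result (assumed write-back).
            topic = analyze(doc_id)
            knowledge_db[doc_id] = topic
        forwarded.append((doc_id, topic))
    return forwarded


calls: List[str] = []


def fake_analyze(doc_id: str) -> str:
    """Stand-in for the document analysis; records which documents were analyzed."""
    calls.append(doc_id)
    return "automobiles"


db = {"url-a": "zoology"}  # analyzed earlier under a different search term
first = find_topics(["url-a", "url-b"], db, fake_analyze)
second = find_topics(["url-b"], db, fake_analyze)
print(first, second, calls)
```

In this usage, `url-a` is served from the database without analysis, and `url-b` is analyzed once and then reused by the second search.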
Specification