Indexed, extensible, interactive document retrieval system
Abstract
An Internet or intranet based document retrieval system contains a database that relates document word-pair patterns to topics. In response to a word submitted by a requestor, the system retrieves documents containing that word, analyzes the documents to determine their word-pair patterns, matches the document patterns to database patterns that are related to topics, and thereby assigns topics to each document. If the retrieved documents are assigned to more than one topic, a list of the document topics is presented to the requestor, and the requestor designates the relevant topics. The requestor is then granted access only to documents assigned to relevant topics. A knowledge base linking search terms to documents and documents to topics is established and maintained to speed future searches.
329 Citations
64 Claims
-
1. An interactive document retrieval system designed to search for documents after receiving a search query from a requestor, said system comprising:
-
a knowledge database containing at least one data structure that relates word patterns to topics; and
a query processor that, in response to the receipt of a search query from a requestor, does the following: searching for and trying to capture documents containing at least one term related to the search query, if any documents are captured, analyzing the captured documents to determine their word patterns, categorizing the captured documents by comparing each document's word pattern to the word patterns in the database, and when a document's word pattern is similar to a word pattern in the database, assigning to that document the similar word pattern's related topic, presenting at least one list of the topics assigned to the categorized documents to the requestor, and asking the requestor to designate at least one topic from the list as a topic that is relevant to the requestor's search, and granting the requestor access to the subset of captured and categorized documents to which topics designated by the requestor have been assigned;
wherein the word patterns determined by analysis are pairings of words, each pairing comprising two searchable words with one word occurring frequently within the document and the other word occurring near the one word frequently within the document. -
2. A document retrieval system in accordance with claim 1 wherein
the word patterns determined by analysis are commonly occurring and searchable phrases. -
3. A document retrieval system in accordance with claim 1 wherein
the word patterns determined by analysis are pairings of words, each pairing comprising two searchable words. -
4. A document retrieval system in accordance with claim 3 wherein
one word in each pairing occurs frequently within the document and the other word in each pairing occurs near the one word frequently within the document. -
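The categorize, present, and grant-access steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the names `KNOWLEDGE_DB`, `assign_topics`, `categorize`, and `grant_access` are hypothetical, the sample pairings and topics are invented, and "similar" is simplified here to an exact match on a word pairing.

```python
from collections import defaultdict

# Hypothetical knowledge database: word pairing -> topic (invented sample data).
KNOWLEDGE_DB = {
    ("interest", "rate"): "Finance",
    ("river", "bank"): "Geography",
}

def assign_topics(doc_pairs, knowledge_db):
    """Return the topics whose stored pairings also occur in the document.

    Assumption: similarity is reduced to exact equality of pairings.
    """
    return {knowledge_db[p] for p in doc_pairs if p in knowledge_db}

def categorize(captured, knowledge_db):
    """Map each topic to the ids of the captured documents assigned to it."""
    by_topic = defaultdict(set)
    for doc_id, pairs in captured.items():
        for topic in assign_topics(pairs, knowledge_db):
            by_topic[topic].add(doc_id)
    return dict(by_topic)

def grant_access(by_topic, designated):
    """Return only the documents assigned to a topic the requestor designated."""
    allowed = set()
    for topic in designated:
        allowed |= by_topic.get(topic, set())
    return sorted(allowed)

# Captured documents, already reduced to their word pairings.
captured = {
    "doc1": {("interest", "rate"), ("central", "bank")},
    "doc2": {("river", "bank")},
}
by_topic = categorize(captured, KNOWLEDGE_DB)
topic_list = sorted(by_topic)                 # the list shown to the requestor
accessible = grant_access(by_topic, ["Finance"])
```

In this sketch a document that matches no stored pairing receives no topic and is therefore never offered to the requestor; whether the claimed system behaves that way is not stated in the claims.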
5. A document retrieval system in accordance with claim 1 wherein
the knowledge base is initially constructed by analyzing indexed documents to which topics have previously been assigned, thereby determining the indexed document's word patterns, and then storing in the knowledge database these word patterns for the indexed documents and the topics assigned to these documents, and then relating the word pattern of an indexed document to the topics assigned to that same indexed document. -
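As a rough illustration of the construction step in claim 5, the sketch below builds a pattern-to-topic mapping from documents that already carry topics. The function name `build_knowledge_db` and the input shape (a list of `(word_pairs, topics)` tuples) are assumptions made for the example; the claims do not prescribe a storage layout.

```python
def build_knowledge_db(indexed_docs):
    """Relate each word pairing of an indexed document to that document's topics.

    indexed_docs: iterable of (word_pairs, topics), where word_pairs is a set of
    (word, word) tuples already extracted from a pre-indexed document and
    topics is the set of topics previously assigned to it (hypothetical shape).
    """
    db = {}
    for word_pairs, topics in indexed_docs:
        for pair in word_pairs:
            # A pairing seen in several indexed documents accumulates their topics.
            db.setdefault(pair, set()).update(topics)
    return db
```

Storing a set of topics per pairing is one possible design; it lets the same pairing point to more than one topic when indexed documents disagree.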
6. A document retrieval system in accordance with claim 5 wherein
the word patterns determined by analysis are commonly occurring and searchable phrases. -
7. A document retrieval system in accordance with claim 5 wherein
the word patterns determined by analysis are pairings of words, each pairing comprising two searchable words. -
8. A document retrieval system in accordance with claim 7 wherein
one word in each pairing occurs frequently within the document and the other word in each pairing occurs near the one word frequently within the document. -
9. A document retrieval system in accordance with claim 1 wherein
the search query contains a phrase, and the term searched for is that phrase. -
10. A document retrieval system in accordance with claim 1 wherein
the search query contains at least one word, and the term searched for is at least one searchable word taken from the search query. -
11. A document retrieval system in accordance with claim 1 wherein
the search query contains several words, the term searched for is a searchable word taken from the search query, and several words in the search query are searched for in separate searches. -
12. A document retrieval system in accordance with claim 1 wherein
the search query contains at least one operator and at least one word, and the scope of the documents presented to the requestor is limited by the search query. -
13. A document retrieval system in accordance with claim 1 wherein
the knowledge database retains a record of words previously searched for, the documents captured by such previous searches, and the index terms assigned to the captured documents, and the knowledge database also retains linkages between the words previously searched for and the documents captured by such previously-conducted searches, such that the search, analysis, and categorizing steps may be bypassed when a word previously searched for is encountered in a later search query. -
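The bypass described in claim 13 resembles a lookup cache keyed by the searched word. The following is a minimal sketch under that reading; `SearchCache`, `search`, and the `run_full_pipeline` callback are hypothetical names, and the callback stands in for the search, analysis, and categorizing steps.

```python
class SearchCache:
    """Remembers which documents a word captured and which topics they received."""

    def __init__(self):
        self.query_links = {}   # searched word -> list of document ids
        self.doc_topics = {}    # document id -> set of assigned topics

    def lookup(self, word):
        """Return {doc id: topics} for a previously searched word, else None."""
        doc_ids = self.query_links.get(word)
        if doc_ids is None:
            return None
        return {d: self.doc_topics[d] for d in doc_ids}

    def store(self, word, categorized):
        """Record the linkage between a word and the documents it captured."""
        self.query_links[word] = list(categorized)
        self.doc_topics.update(categorized)


def search(word, cache, run_full_pipeline):
    """Skip search, analysis, and categorization when the word was seen before."""
    cached = cache.lookup(word)
    if cached is not None:
        return cached
    categorized = run_full_pipeline(word)   # hypothetical: returns {doc id: topics}
    cache.store(word, categorized)
    return categorized
```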
14. A document retrieval system in accordance with claim 13 wherein
the knowledge base is initially constructed by analyzing indexed documents to which topics have previously been assigned, thereby determining the indexed document's word patterns, and then storing in the knowledge database these word patterns for the indexed documents and the topics assigned to these documents, and then relating the word pattern of an indexed document to the topics assigned to that same indexed document. -
15. A document retrieval system in accordance with claim 13 wherein
the knowledge base is maintained by periodically checking to see if documents entered into the knowledge base have changed or been deleted from the searchable universe of documents, and if they have, then deleting all reference to such documents, as well as the words searched for that caused their capture, from the knowledge base, thereby forcing all searches for such words likely to capture such documents to be repeated anew if encountered in a later search query. -
16. A document retrieval system in accordance with claim 13 wherein
the knowledge base is maintained by periodically checking to see if documents entered into the knowledge base have been changed, and if so, reanalyzing and recategorizing such documents and also removing from the knowledge base linkages between such documents and words that they no longer contain. -
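The maintenance rule of claim 15 (drop a stale document together with every searched word that captured it) can be sketched as follows. The function `purge_stale` and the two dictionaries are hypothetical; how staleness is detected (for example by comparing a stored checksum or modification date) is left outside the sketch because the claims do not specify it.

```python
def purge_stale(query_links, doc_topics, stale_ids):
    """Remove stale documents and the searched words that led to them.

    query_links: searched word -> list of document ids (mutated in place)
    doc_topics:  document id -> set of topics (mutated in place)
    stale_ids:   ids of documents found to be changed or deleted
    """
    stale = set(stale_ids)
    for doc_id in stale:
        doc_topics.pop(doc_id, None)
    # Deleting the word forces the next search for it to be repeated anew.
    for word in [w for w, ids in query_links.items() if stale & set(ids)]:
        del query_links[word]
```

Claim 16 describes a gentler alternative in which changed documents are reanalyzed and only outdated linkages are removed; that variant would keep the word entry and rewrite its document list instead of deleting it.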
17. A document retrieval system in accordance with claim 1 wherein
the knowledge base is updated by periodically checking for new documents at some locations within the searchable universe of documents, and analyzing and categorizing such documents prior to those documents being captured by a search. -
18. A document retrieval system in accordance with claim 1 wherein
said knowledge database includes a topic combination table containing replacement topics for certain combinations of other topics that may appear within a captured document and that are assigned to such a document as a replacement for said other topics to improve categorization. -
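A topic combination table of the kind recited in claim 18 can be pictured as a mapping from a set of co-occurring topics to a single replacement topic. The sketch below is illustrative only: `COMBINATION_TABLE`, `apply_combinations`, and the sample topics are invented.

```python
# Hypothetical table: combination of topics -> replacement topic.
COMBINATION_TABLE = {
    frozenset({"Banking", "Law"}): "Financial regulation",
}

def apply_combinations(topics, table):
    """Replace each listed combination found among a document's topics."""
    result = set(topics)
    for combo, replacement in table.items():
        if combo <= result:          # every topic of the combination is present
            result -= combo
            result.add(replacement)
    return result
```

For example, a document assigned "Banking", "Law", and "Sports" would end up with "Financial regulation" and "Sports" under this table.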
19. A document retrieval system in accordance with claim 1 wherein
plural topics are assigned to at least some documents during categorization and are arranged hierarchically and linked to the at least some documents in the knowledge database, and wherein as many lists of topics as there are hierarchical topics associated with the categorized documents are presented to the requestor in sequence, such that the requestor designates multiple topics and subtopics, and such that search precision is improved by eliminating documents irrelevant to the requestor's designated topics from those to which the requestor is granted access. -
20. A document retrieval system in accordance with claim 19 wherein
the presentation of topics to the requestor at any given hierarchical level is suppressed when all the documents are associated with the same topic at that level. -
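The level-by-level narrowing of claims 19 and 20 might look like the sketch below. It assumes each document carries a tuple of topics ordered from broadest to narrowest, and it models the requestor as a `choose` callback; both are assumptions made for illustration, as is the choice to drop documents that have no topic at a level where a selection is made.

```python
def narrow(doc_paths, choose):
    """Present one topic list per hierarchical level and filter the documents.

    doc_paths: document id -> tuple of topics, broadest first (hypothetical shape)
    choose:    callback(level, topics) -> iterable of designated topics
    """
    remaining = dict(doc_paths)
    depth = max((len(p) for p in remaining.values()), default=0)
    for level in range(depth):
        topics = sorted({p[level] for p in remaining.values() if len(p) > level})
        if len(topics) <= 1:
            continue  # claim 20: all documents share the topic, so nothing is asked
        picked = set(choose(level, topics))
        remaining = {
            d: p for d, p in remaining.items()
            if len(p) > level and p[level] in picked
        }
    return sorted(remaining)
```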
21. A document retrieval system in accordance with claim 1 wherein analysis includes the following steps:
-
reduce the document data to a list of words;
address inflection and synonym problems;
eliminate non-searchable words;
select the most frequently occurring words; and
select frequently-occurring pairings of those words with adjacent words in the document.
-
-
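The five analysis steps of claim 21 can be sketched as a short pipeline. This is a simplified illustration, not the patented analysis: the stop-word list and the `SYNONYMS` lookup are tiny invented stand-ins for a real dictionary, adjacency is measured after non-searchable words are removed (an assumption), and the default `top_n=30` merely echoes the figure mentioned in claim 23.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is"}   # hypothetical list
SYNONYMS = {"rates": "rate", "banks": "bank"}              # hypothetical lookup

def analyze(text, top_n=30):
    """Return (frequent words, counts of pairings that involve a frequent word)."""
    # 1. Reduce the document data to a list of words.
    words = re.findall(r"[a-z]+", text.lower())
    # 2. Address inflection and synonym problems (toy table lookup).
    words = [SYNONYMS.get(w, w) for w in words]
    # 3. Eliminate non-searchable words.
    words = [w for w in words if w not in STOP_WORDS]
    # 4. Select the most frequently occurring words.
    frequent = {w for w, _ in Counter(words).most_common(top_n)}
    # 5. Count pairings of those words with adjacent words.
    pairs = Counter()
    for left, right in zip(words, words[1:]):
        if left in frequent or right in frequent:
            pairs[(left, right)] += 1
    return frequent, pairs
```

The pair counts returned here are raw; deciding which pairings count as "frequently occurring" is a separate thresholding step.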
22. A document retrieval system in accordance with claim 21 wherein
up to a predefined number of the most frequently occurring words are selected. -
23. A document retrieval system in accordance with claim 22 wherein
the predefined number is in the neighborhood of 30. -
24. A document retrieval system in accordance with claim 21 wherein
a word occurs frequently if the number of times it appears within a document divided by the total word content of the document exceeds a predetermined value. -
25. A document retrieval system in accordance with claim 24 wherein
the predetermined value is in the neighborhood of 0.001. -
26. A document retrieval system in accordance with claim 21 wherein
a pairing occurs frequently if the number of occurrences of a given pairing within a given document, divided by the number of occurrences of the frequently-occurring adjacent word of the pairing within the document, is greater than a predetermined value. -
27. A document retrieval system in accordance with claim 26 wherein
the predetermined value is in the neighborhood of 0.0001. -
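The two ratio tests of claims 24 through 27 can be written out directly. In the sketch below the function names are hypothetical, the default thresholds are the approximate values the claims mention (0.001 and 0.0001), and the denominator of the pairing ratio is taken to be the count of the first, frequently occurring word of the pairing, which is one reading of claim 26.

```python
from collections import Counter

WORD_THRESHOLD = 0.001     # "in the neighborhood of 0.001" (claim 25)
PAIR_THRESHOLD = 0.0001    # "in the neighborhood of 0.0001" (claim 27)

def frequent_words(words, threshold=WORD_THRESHOLD):
    """A word is frequent if count / total word content exceeds the threshold."""
    total = len(words)
    counts = Counter(words)
    return {w for w, c in counts.items() if c / total > threshold}

def frequent_pairs(words, frequent, threshold=PAIR_THRESHOLD):
    """A pairing is frequent if pair count / count of its frequent word exceeds the threshold.

    Assumption: only pairings whose first word is frequent are considered.
    """
    word_counts = Counter(words)
    pair_counts = Counter(
        (a, b) for a, b in zip(words, words[1:]) if a in frequent
    )
    return {p for p, c in pair_counts.items() if c / word_counts[p[0]] > threshold}
```

With thresholds this small, nearly every word and pairing in a short document passes; the tests only become selective for long documents, which is why the example checks below use larger illustrative thresholds.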
28. An interactive document retrieval system in accordance with claim 1 wherein:
-
the query processor is installed in at least one web server connecting to the Internet or to an intranet;
the knowledge database is installed on a database engine accessible to the web server;
the requestor communicates with the web server using a computer having a browser also connecting to the Internet or to the same intranet; and
searches are performed by a search engine accessible to the web server and conducting searches on the Internet or on the same intranet.
-
-
29. A document retrieval system in accordance with claim 28 wherein
multiple web servers are employed, interconnected to the Internet or to an intranet by a router and a firewall; and
the status of any given search procedure is maintained on the requestor's computer and is resubmitted to one of the web servers each time a search query or designation is submitted by the requestor.
-
-
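Claim 29 keeps the status of a search on the requestor's computer and resubmits it with every request, so any of the web servers can continue the session. A minimal sketch of that idea follows; the token format (base64-encoded JSON) and the names `encode_state`, `decode_state`, and `handle_request` are assumptions, and the sketch omits the signing or encryption a real deployment would likely need.

```python
import base64
import json

def encode_state(state):
    """Serialize the search status into a text token the browser can hold."""
    raw = json.dumps(state, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")

def decode_state(token):
    """Rebuild the search status from a token resubmitted by the browser."""
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))

def handle_request(token, designation):
    """Process one designation; any server can run this without shared memory."""
    state = decode_state(token) if token else {"designated": []}
    state["designated"].append(designation)
    return encode_state(state)   # returned to the requestor for the next request
```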
30. An interactive method of searching for and retrieving documents after receiving a search query from a requestor, said method comprising the steps of:
-
providing a knowledge database containing at least one data structure that relates word patterns to topics;
in response to the receipt of a search query from a requestor, searching for and attempting to capture documents containing at least one term related to the search query, if any documents are captured, analyzing the captured documents to determine their word patterns, categorizing the captured documents by comparing each document's word pattern to the word patterns in the knowledge database, and when a document's word pattern is similar to a word pattern in the database, assigning to that document the similar word pattern's related topic;
presenting at least one list of the topics assigned to the categorized documents to the requestor, and asking the requestor to designate at least one topic from the list as a topic that is relevant to the requestor's search; and
granting the requestor access to the subset of captured and categorized documents to which topics designated by the requestor have been assigned;
wherein the word patterns determined by analysis are pairings of words, each pairing comprising two searchable words with one word occurring frequently within the document and the other word occurring near the one word frequently within the document. -
31. A method of searching in accordance with claim 30 wherein
the word patterns determined by analysis are commonly occurring and searchable phrases. -
32. A method of searching in accordance with claim 30 which further includes
determining at least some word patterns that contain two searchable words. -
33. A method of searching in accordance with claim 32 which further includes
having at least some word patterns contain one word that occurs frequently within the document and another word that frequently occurs near to the one word within the document. -
34. A method of searching in accordance with claim 30 which further includes
constructing the knowledge base by analyzing indexed documents to which topics have previously been assigned, thereby determining the indexed document's word patterns, and then storing in the knowledge database these word patterns for the indexed documents and the topics assigned to these documents, and then relating the word pattern of an indexed document to the topics assigned to that same indexed document. -
35. A method of searching in accordance with claim 34 which further includes
determining at least some word patterns that are commonly occurring and searchable phrases. -
36. A method of searching in accordance with claim 34 which further includes
determining at least some word patterns that are pairings of words, each pairing comprising two searchable words. -
37. A method of searching in accordance with claim 36 which further includes
determining at least some word pattern pairings in which one word in each pairing occurs frequently within the document and the other word in each pairing frequently occurs near the one word within the document. -
38. A method of searching in accordance with claim 30 which accepts
search queries that contain a phrase and that search for the phrase. -
39. A method of searching in accordance with claim 30 which accepts search queries that contain at least one word and that search for the word.
-
40. A method of searching in accordance with claim 30 which accepts
search queries that contain several words and search for each word in separate searches. -
41. A method of searching in accordance with claim 30 which accepts
at least some search queries that contain at least one operator and at least one word and that search for the word and later use the operator to limit the scope of the documents presented to the requestor. -
42. A method of searching in accordance with claim 30 which further includes
retaining in the knowledge database a record of words previously searched for, the documents captured by such previous searches, and the index terms assigned to the captured documents, and retaining within the knowledge database linkages between the words previously searched for and the documents captured by such previously-conducted searches, such that the search, analysis, and categorizing steps may be bypassed when a word previously searched for is encountered in a later search query. -
43. A method of searching in accordance with claim 42 which further includes
initially constructing the knowledge base by analyzing indexed documents to which topics have previously been assigned, thereby determining the indexed document's word patterns, and then storing in the knowledge database these word patterns for the indexed documents and the topics assigned to these documents, and then relating the word pattern of an indexed document to the topics assigned to that same indexed document. -
44. A method of searching in accordance with claim 42 which further includes
maintaining the knowledge base by periodically checking to see if documents entered into the knowledge base have changed or been deleted from the searchable universe of documents, and if they have, then deleting all reference to such documents, as well as the words searched for that caused their capture, from the knowledge base, thereby forcing all searches for such words likely to capture such documents to be repeated anew if encountered in a later search query.
-
45. A method of searching in accordance with claim 42 which further includes
maintaining the knowledge base by periodically checking to see if documents entered into the knowledge base have been changed, and if so, reanalyzing and re-categorizing such documents and also removing from the knowledge base linkages between such documents and words that they no longer contain. -
46. A method of searching in accordance with claim 30 which further includes
updating the knowledge base by periodically checking for new documents at some locations within the searchable universe of documents, and analyzing and categorizing such documents prior to those documents being captured by a search. -
47. A method of searching in accordance with claim 30 which further includes
including in said knowledge database a topic combination table containing replacement topics for certain combinations of other topics that may appear within a captured document, and assigning a replacement topic to such a document as a replacement for said other topics to improve categorization. -
48. A method of searching in accordance with claim 30 which further includes
assigning plural topics to at least some documents during categorization, arranging them hierarchically, and linking them to the at least some documents in the knowledge database, and presenting to the requestor in hierarchical sequence as many lists of topics as there are hierarchical topics associated with the categorized documents, such that the requestor designates multiple topics and subtopics, and such that search precision is improved by eliminating documents irrelevant to the requestor's designated topics from those to which the requestor is granted access. -
49. A method of searching in accordance with claim 48 which further includes
suppressing the presentation of topics to the requestor at any given hierarchical level when all the documents are associated with the same topic at that level. -
50. A method of searching in accordance with claim 30 which further includes
reducing the document data to a list of words;
addressing inflection and synonym problems;
eliminating non-searchable words;
selecting the most frequently occurring words; and
selecting frequently-occurring pairings of those words with adjacent words in the document.
-
-
51. A method of searching in accordance with claim 50 which further includes
selecting up to a predefined number of the most frequently occurring words. -
52. A method of searching in accordance with claim 51 in which the number selected is in the neighborhood of 30.
-
53. A method of searching in accordance with claim 50 which further includes
determining whether a word occurs frequently by determining if the number of times the word appears within a document divided by the total word content of the document exceeds a predetermined value. -
54. A method of searching in accordance with claim 53 wherein the predetermined value is in the neighborhood of 0.001.
-
55. A method of searching in accordance with claim 50 which further includes
determining whether a pairing occurs frequently by determining whether the number of occurrences of a given pairing within a given document, divided by the number of occurrences of the adjacent word of the pairing within the document, is greater than a predetermined value. -
56. A method of searching in accordance with claim 55 wherein the predetermined value is in the neighborhood of 0.0001.
-
57. A method of searching in accordance with claim 30 which further includes:
arranging for communication with the requestor to occur using the internet protocol.
-
58. A method of searching in accordance with claim 52 which further includes
maintaining the status of any given search procedure with the requestor.
-
-
59. An interactive document retrieval system designed to search for documents after receiving a search query from a requestor,
said system comprising a knowledge database containing at least one data structure that relates word patterns to topics, wherein the knowledge database is initially constructed by analyzing indexed documents to which topics have previously been assigned, thereby determining the indexed document's word patterns, and then storing in the knowledge database these word patterns for the indexed documents and the topics assigned to these documents, and then relating the word pattern of an indexed document to the topics assigned to that same indexed document, and wherein the word patterns determined by analysis are pairings of words, each pairing comprising two searchable words with one word occurring frequently within the document and the other word occurring near the one word frequently within the document.
-
60. An interactive document retrieval system designed to search for documents after receiving a search query from a requestor, said system comprising:
-
a knowledge database containing at least one data structure that relates word patterns to topics and a query processor that categorizes the captured documents by comparing each document's word pattern to the word patterns in the database, and when a document's word pattern is similar to a word pattern in the database, assigns to that document the similar word pattern's related topic, wherein the word patterns determined by analysis are pairings of words, each pairing comprising two searchable words with one word occurring frequently within the document and the other word occurring near the one word frequently within the document.
-
-
61. A method of constructing a knowledge database for use in an interactive document retrieval system designed to search for documents after receiving a search query from a requestor,
wherein said database contains at least one data structure that relates word patterns to topics, said method comprising the steps of:
analyzing indexed documents to which topics have previously been assigned, thereby determining the indexed document's word patterns, and then storing in the knowledge database these word patterns for the indexed documents and the topics assigned to these documents, and then relating the word pattern of an indexed document to the topics assigned to that same indexed document, wherein the word patterns determined by analysis are pairings of words, each pairing comprising two searchable words with one word occurring frequently within the document and the other word occurring near the one word frequently within the document.
-
-
62. A method of searching for and retrieving documents after receiving a search query from a requestor, said method comprising the steps of:
-
providing a knowledge database containing at least one data structure that relates word patterns to topics;
in response to the receipt of a search query from a requestor, searching for and attempting to capture documents containing at least one term related to the search query, if any documents are captured, analyzing the captured documents to determine their word patterns, categorizing the captured documents by comparing each document's word pattern to the word patterns in the knowledge database, and granting the requestor access to at least one subset of captured and categorized documents, wherein the word patterns determined by analysis are pairings of words, each pairing comprising two searchable words with one word occurring frequently within the document and the other word occurring near the one word frequently within the document.
-
-
63. An interactive document retrieval system designed to search for documents after receiving a search query from a requestor, said system comprising:
-
a knowledge database containing at least one data structure that relates word patterns to topics, the knowledge database containing a word combination table, a query word table, a document identification table, and a query linkage table and also a dictionary and synonyms;
a query processor that, in response to the receipt of a search query from a requestor, does the following: searching for and trying to capture documents containing at least one term related to the search query, if any documents are captured, analyzing the captured documents to determine their word patterns, categorizing the captured documents by comparing each document's word pattern to the word patterns in the database, and when a document's word pattern is similar to a word pattern in the database, assigning to that document the similar word pattern's related topic, presenting at least one list of the topics assigned to the categorized documents to the requestor, and asking the requestor to designate at least one topic from the list as a topic that is relevant to the requestor's search; and
granting the requestor access to the subset of captured and categorized documents to which topics designated by the requestor have been assigned.
-
-
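Claim 63 names four tables plus a dictionary and synonyms but does not describe their columns. The sketch below shows one plausible in-memory layout; every field name, key choice, and method (`link`, `documents_for`) is an assumption made for illustration rather than something taken from the claims.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeDatabase:
    """Hypothetical layout for the tables named in claim 63."""
    word_combinations: dict = field(default_factory=dict)  # (word, word) -> set of topics
    query_words: dict = field(default_factory=dict)        # searched word -> query word id
    documents: dict = field(default_factory=dict)          # document id -> address
    query_links: set = field(default_factory=set)          # (query word id, document id)
    dictionary: set = field(default_factory=set)           # searchable words
    synonyms: dict = field(default_factory=dict)           # variant -> canonical word

    def link(self, word, doc_id, address):
        """Record that a search for `word` captured the document at `address`."""
        word_id = self.query_words.setdefault(word, len(self.query_words) + 1)
        self.documents[doc_id] = address
        self.query_links.add((word_id, doc_id))

    def documents_for(self, word):
        """Return the addresses of documents previously captured for `word`."""
        word_id = self.query_words.get(word)
        return sorted(self.documents[d] for w, d in self.query_links if w == word_id)
```

Keeping the linkage in its own table, separate from the word and document tables, lets a document be linked to many searched words without duplicating its address.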
64. An interactive method of searching for and retrieving documents after receiving a search query from a requestor, said method comprising the steps of:
-
providing a knowledge database containing at least one data structure that relates word patterns to topics;
in response to the receipt of a search query from a requestor, searching for and attempting to capture documents containing at least one term related to the search query, if any documents are captured, analyzing the captured documents to determine their word patterns, categorizing the captured documents by comparing each document's word pattern to the word patterns in the knowledge database, and when a document's word pattern is similar to a word pattern in the database, assigning to that document the similar word pattern's related topic;
presenting at least one list of the topics assigned to the categorized documents to the requestor, and asking the requestor to designate at least one topic from the list as a topic that is relevant to the requestor's search;
granting the requestor access to the subset of captured and categorized documents to which topics designated by the requestor have been assigned; and
building into the knowledge database a word combination table, a query word table, a document address table, and a query linkage table and also a dictionary and synonyms.
-
Specification