Document knowledge base research and retrieval system
Abstract
A knowledge base search and retrieval system, which includes factual knowledge base queries and concept knowledge base queries, is disclosed. A knowledge base stores associations among terminology/categories that have a lexical, semantic or usage association. Document theme vectors identify the content of documents through themes, as well as through a classification of the documents in categories that reflects what the documents are primarily about. The factual knowledge base queries identify, in response to an input query, documents relevant to the input query through expansion of the query terms as well as through expansion of themes. The concept knowledge base query does not identify specific documents in response to a query, but specifies terminology that identifies the potential existence of documents in a particular area.
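The factual query path summarized above, which claim 1 below spells out, can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes themes are already stored per document as plain string sets, and the function names select_by_terms and expand_by_themes, the any-term matching rule, and the sample data are all hypothetical.

```python
from typing import Dict, List, Set


def select_by_terms(doc_terms: Dict[str, Set[str]], query_terms: Set[str]) -> List[str]:
    """Select documents containing at least one query term (hypothetical matching rule)."""
    return sorted(doc for doc, terms in doc_terms.items() if terms & query_terms)


def expand_by_themes(doc_themes: Dict[str, Set[str]], selected: List[str]) -> List[str]:
    """Select documents, not previously selected, that share a theme with the selected ones."""
    identified: Set[str] = set()
    for doc in selected:
        identified |= doc_themes.get(doc, set())  # themes stored for each selected document
    already = set(selected)
    return sorted(
        doc for doc, themes in doc_themes.items()
        if doc not in already and themes & identified
    )


# Illustrative repository (invented data)
doc_terms = {"d1": {"oracle", "database"}, "d2": {"sql", "index"}, "d3": {"tomato"}}
doc_themes = {"d1": {"databases", "software"}, "d2": {"databases"}, "d3": {"gardening"}}

first = select_by_terms(doc_terms, {"oracle"})  # ["d1"]
extra = expand_by_themes(doc_themes, first)     # ["d2"], via the shared "databases" theme
```

Note that d2 is returned even though it contains none of the query terms; it is reached only through a theme it shares with d1.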
25 Claims
-
1. A method for processing queries in a search and retrieval system, said method comprising the steps of:
-
storing a plurality of themes for a repository of documents, wherein each theme for a document defines subject matter disclosed in a document, such that said themes stored for a document define the overall content for said document;
processing a query, which includes at least one query term, to select at least one document relevant to said at least one query term;
identifying said themes stored for said at least one document selected; and
selecting, in response to said query, at least one additional document, not previously selected, that comprises at least one theme in common with said themes identified in said documents selected.
-
-
2. The method as set forth in claim 1, further comprising the steps of:
-
storing a knowledge base comprising a directed graph that links terminology having a lexical, semantic or usage association;
generating an expanded set of query terms through use of said knowledge base by selecting additional terms having a lexical, semantic or usage association with said query terms; and
processing said query to select documents relevant to said expanded set of query terms.
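The knowledge-base expansion steps listed just above (storing a directed graph of associated terminology, generating an expanded set of query terms, and processing the query against that set) can be illustrated as a breadth-first walk over the graph. In this sketch the knowledge base is assumed to be an adjacency dict from a term to its associated terms; the hop limit max_hops, the helper names, and the any-term matching rule are hypothetical, since the claim does not say how far the expansion reaches.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical knowledge base: each term points to the terms it is associated with
# (lexically, semantically, or by usage). Edges are directed.
KnowledgeBase = Dict[str, List[str]]


def expand_query(kb: KnowledgeBase, query_terms: List[str], max_hops: int = 1) -> Set[str]:
    """Return the query terms plus every term reachable within max_hops edges."""
    expanded: Set[str] = set(query_terms)
    frontier = deque((term, 0) for term in query_terms)
    while frontier:
        term, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not follow edges beyond the hop limit
        for neighbor in kb.get(term, []):
            if neighbor not in expanded:
                expanded.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return expanded


def select_documents(doc_terms: Dict[str, Set[str]], terms: Set[str]) -> List[str]:
    """Select documents relevant to any term of the expanded set (assumed matching rule)."""
    return sorted(doc for doc, doc_ts in doc_terms.items() if doc_ts & terms)


kb = {"car": ["automobile", "vehicle"], "vehicle": ["transport"]}
terms = expand_query(kb, ["car"])  # {"car", "automobile", "vehicle"}
```

Because the graph is directed, "vehicle" leads to "transport" but not back to "car"; raising max_hops to 2 would add "transport" to the expanded set.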
-
-
3. The method as set forth in claim 1, wherein the step of storing a plurality of themes for a repository of documents comprises the steps of:
-
processing a plurality of documents to identify said themes for a document; and
classifying said documents, including themes identified for said documents, in categories so as to relate said themes to said categories.
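Claim 3 does not say how themes are identified or how categories are assigned. The sketch below substitutes a deliberately simple stand-in: word frequency picks a document's themes, and a hypothetical theme_to_category table relates each theme to a category. Only the two-step shape (identify themes, then classify them into categories) follows the claim; the stopword list and function names are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List

# Stand-in stopword list (assumption, not from the patent)
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is"}


def identify_themes(text: str, top_n: int = 3) -> Dict[str, int]:
    """Stand-in theme extractor: most frequent non-stopword tokens, counts used as weights."""
    tokens = [tok.strip(".,;:").lower() for tok in text.split()]
    counts = Counter(tok for tok in tokens if tok and tok not in STOPWORDS)
    return dict(counts.most_common(top_n))


def classify(themes: Dict[str, int], theme_to_category: Dict[str, str]) -> Dict[str, List[str]]:
    """Relate each theme to its (hypothetical) category, grouping themes by category."""
    by_category: Dict[str, List[str]] = {}
    for theme in themes:
        category = theme_to_category.get(theme)
        if category is not None:  # themes with no known category stay unclassified
            by_category.setdefault(category, []).append(theme)
    return by_category
```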
-
-
4. The method as set forth in claim 1, further comprising the steps of:
-
storing, for reference, a knowledge base that comprises a plurality of categories; and
storing document theme vectors that classify said documents and said themes identified for said documents in categories of said knowledge base.
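A document theme vector, as used in claim 4, could be held in a record like the one below. The field names, the float weights, and the category index built from the vectors are assumptions made for illustration; the claim only requires that documents and their themes be classified into categories of the knowledge base.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class ThemeVector:
    """Hypothetical layout of one document's theme vector."""
    doc_id: str
    theme_weights: Dict[str, float]   # theme -> weight of that theme in the document
    theme_categories: Dict[str, str]  # theme -> knowledge-base category it is classified in


def build_category_index(categories: Set[str], vectors: List[ThemeVector]) -> Dict[str, Set[str]]:
    """Map each knowledge-base category to the documents classified under it."""
    index: Dict[str, Set[str]] = {category: set() for category in categories}
    for vec in vectors:
        for category in vec.theme_categories.values():
            if category in index:  # skip categories the knowledge base does not define
                index[category].add(vec.doc_id)
    return index
```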
-
-
5. The method as set forth in claim 4, wherein the step of selecting at least one additional document comprises the steps of:
-
mapping said query term to a category of said knowledge base;
selecting a plurality of documents classified for said category;
selecting themes for said documents as identified in said document theme vectors; and
selecting additional documents based on themes identified in said documents selected.
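The four steps of claim 5 map onto one short function. This is a sketch under assumed inputs: term_to_category stands in for the knowledge-base lookup, category_docs for the category index derived from the theme vectors, and doc_themes for the themes recorded in those vectors. The rule that an additional document must share at least one theme is borrowed from claim 1; all names are hypothetical.

```python
from typing import Dict, List, Set


def related_documents(query_term: str,
                      term_to_category: Dict[str, str],
                      category_docs: Dict[str, Set[str]],
                      doc_themes: Dict[str, Set[str]]) -> List[str]:
    # Step 1: map the query term to a knowledge-base category (hypothetical lookup table).
    category = term_to_category.get(query_term)
    if category is None:
        return []
    # Step 2: select the documents classified for that category.
    seeds = category_docs.get(category, set())
    # Step 3: collect the themes recorded for those documents.
    themes: Set[str] = set()
    for doc in seeds:
        themes |= doc_themes.get(doc, set())
    # Step 4: select additional documents sharing at least one of those themes.
    return sorted(doc for doc, ts in doc_themes.items() if doc not in seeds and ts & themes)
```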
-
-
6. The method as set forth in claim 5, wherein the step of selecting additional documents comprises the step of generating at least one theme group that comprises themes common to more than one document.
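Claim 6 leaves the grouping rule open. One plausible reading, assumed here, is to collect the themes that occur in exactly the same set of two or more documents, so that each group pairs a theme set with the documents that all contain it. The function name and the (themes, documents) tuple layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

ThemeGroup = Tuple[FrozenSet[str], FrozenSet[str]]  # (themes, documents), assumed layout


def theme_groups(doc_themes: Dict[str, Set[str]]) -> List[ThemeGroup]:
    """Group themes that occur in exactly the same set of two or more documents."""
    docs_per_theme: Dict[str, Set[str]] = defaultdict(set)
    for doc, themes in doc_themes.items():
        for theme in themes:
            docs_per_theme[theme].add(doc)
    groups: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for theme, docs in docs_per_theme.items():
        if len(docs) > 1:  # keep only themes common to more than one document
            groups[frozenset(docs)].add(theme)
    return [(frozenset(themes), docs) for docs, themes in groups.items()]
```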
-
7. The method as set forth in claim 6, further comprising the step of ranking said theme groups based on order of importance of said theme groups.
-
8. The method as set forth in claim 7, wherein the step of ranking said theme groups comprises the step of ranking theme groups based on the number of themes in a group.
-
9. The method as set forth in claim 7, wherein the step of ranking said theme groups comprises the step of ranking theme groups based on the highest total theme weight of themes in a theme group.
-
10. The method as set forth in claim 7, wherein the step of ranking said theme groups comprises the step of ranking theme groups based on document strength of documents in a theme group.
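The three ranking criteria of claims 8 to 10 can be expressed as interchangeable scoring functions over theme groups. In this sketch a group is a (themes, documents) pair of frozensets, weights maps a (document, theme) pair to a theme weight, and strength maps a document to an assumed relevance score; the claims do not state how theme weight or document strength is computed, and every name here is hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

ThemeGroup = Tuple[FrozenSet[str], FrozenSet[str]]  # (themes, documents), assumed layout


def theme_count(group: ThemeGroup) -> float:
    """Number of themes in the group (claim 8 criterion)."""
    return len(group[0])


def total_theme_weight(group: ThemeGroup, weights: Dict[Tuple[str, str], float]) -> float:
    """Sum of (document, theme) weights across the group (claim 9 criterion)."""
    themes, docs = group
    return sum(weights.get((d, t), 0.0) for d in docs for t in themes)


def document_strength(group: ThemeGroup, strength: Dict[str, float]) -> float:
    """Sum of the assumed strengths of the group's documents (claim 10 criterion)."""
    return sum(strength.get(d, 0.0) for d in group[1])


def rank_groups(groups: List[ThemeGroup], score: Callable[[ThemeGroup], float]) -> List[ThemeGroup]:
    """Order theme groups from most to least important under the given score."""
    return sorted(groups, key=score, reverse=True)
```

Passing a different score function, for example lambda g: total_theme_weight(g, weights), switches the criterion without changing rank_groups.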
-
11. The method as set forth in claim 6, further comprising the step of ranking documents within each theme group.
-
12. The method as set forth in claim 11, wherein the step of ranking documents comprises the step of ranking documents based on theme weight of themes in a document.
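Ranking documents inside one theme group (claims 11 and 12) can reuse the same weights table. The sketch assumes a document's score is the sum of its weights for the group's themes only, and breaks ties alphabetically; both choices, and the function name, are illustrative rather than taken from the claims.

```python
from typing import Dict, FrozenSet, List, Tuple

ThemeGroup = Tuple[FrozenSet[str], FrozenSet[str]]  # (themes, documents), assumed layout


def rank_documents(group: ThemeGroup, weights: Dict[Tuple[str, str], float]) -> List[str]:
    """Order a group's documents by the summed weight of the group's themes in each one."""
    themes, docs = group

    def score(doc: str) -> float:
        return sum(weights.get((doc, t), 0.0) for t in themes)

    # Highest score first; ties broken alphabetically so the order is deterministic.
    return sorted(docs, key=lambda d: (-score(d), d))
```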
-
13. A computer readable medium comprising a plurality of instructions, which, when executed by a computer, cause the computer to perform the steps of:
-
storing a plurality of themes for a repository of documents, wherein each theme for a document defines subject matter disclosed in a document, such that said themes stored for a document define the overall content for said document;
processing a query, which includes at least one query term, to select at least one document relevant to said at least one query term;
identifying said themes stored for said at least one document selected; and
selecting, in response to said query, at least one additional document, not previously selected, that comprises at least one theme in common with said themes identified in said documents selected.
-
-
14. The computer readable medium as set forth in claim 13, further comprising instructions for:
-
storing a knowledge base comprising a directed graph that links terminology having a lexical, semantic or usage association;
generating an expanded set of query terms through use of said knowledge base by selecting additional terms having a lexical, semantic or usage association with said query terms; and
processing said query to select documents relevant to said expanded set of query terms.
-
-
15. The computer readable medium as set forth in claim 13, wherein the instructions for storing a plurality of themes for a repository of documents comprise instructions for:
-
processing a plurality of documents to identify said themes for a document; and
classifying said documents, including themes identified for said documents, in categories so as to relate said themes to said categories.
-
-
16. The computer readable medium as set forth in claim 13, further comprising instructions for:
-
storing, for reference, a knowledge base that comprises a plurality of categories; and
storing document theme vectors that classify said documents and said themes identified for said documents in categories of said knowledge base.
-
-
17. The computer readable medium as set forth in claim 16, wherein instructions for selecting at least one additional document comprise instructions for:
-
mapping said query term to a category of said knowledge base;
selecting a plurality of documents classified for said category;
selecting themes for said documents as identified in said document theme vectors; and
selecting additional documents based on themes identified in said documents selected.
-
-
18. The computer readable medium as set forth in claim 17, wherein instructions for selecting additional documents comprise instructions for generating at least one theme group that comprises themes common to more than one document.
-
19. The computer readable medium as set forth in claim 18, further comprising instructions for ranking said theme groups based on order of importance of said theme groups.
-
20. The computer readable medium as set forth in claim 19, wherein the instructions for ranking said theme groups comprise instructions for ranking theme groups based on the number of themes in a group.
-
21. The computer readable medium as set forth in claim 19, wherein the instructions for ranking said theme groups comprise instructions for ranking theme groups based on the highest total theme weight of themes in a theme group.
-
22. The computer readable medium as set forth in claim 19, wherein the instructions for ranking said theme groups comprise instructions for ranking theme groups based on document strength of documents in a theme group.
-
23. The computer readable medium as set forth in claim 18, further comprising instructions for ranking documents within each theme group.
-
24. The computer readable medium as set forth in claim 23, wherein the instructions for ranking documents comprise instructions for ranking documents based on theme weight of themes in a document.
-
25. A computer system comprising:
-
memory for storing a plurality of themes for a repository of documents, wherein each theme for a document defines subject matter disclosed in a document, such that said themes stored for a document define the overall content for said document; and
a processor unit for processing a query, which includes at least one query term, to select at least one document relevant to said at least one query term, to identify said themes for said at least one document selected, and to select, in response to said query, at least one additional document, not previously selected, that comprises at least one theme in common with said themes identified in said documents selected.
-
Specification