Dynamic content organization in information retrieval systems
Abstract
An information system and method provide organizational and navigational aids to a user to facilitate exploration and analysis of a document collection. The system includes a document collection containing a plurality of documents, and a knowledge base containing a plurality of topics. Each topic expresses an idea or concept, and is associated with a set of terms which describe the topic and with a set of documents in the document collection which are about the topic. Each topic also has topic-subtopic relationships with selected other topics, forming local topic hierarchies. A query analysis module receives a current query and processes the query against the document collection to select a set of documents that satisfy the query. A dynamic content organization module processes the document set according to defined parameters and a user selection or automatic selection of a desired topic arrangement to create various types of topic arrangements. These topic arrangements include supertopic, subtopic, perspective topic, and theme topic arrangements. A supertopic arrangement is a set of parent topics of a topic derived from the query, which parent topics best generalize the document set. A subtopic arrangement is a set of subtopics of a topic derived from the query which best cover and partition the document set. A perspective topic arrangement has perspective topics, each of which is a parent topic of a set of subtopics that cover and partition the document set. A theme topic arrangement has theme topics, each of which expresses a major subject or concept that describes the document set and distinguishes it from the rest of the document collection.
309 Citations
36 Claims
-
1. A computer-implemented method of creating a topic arrangement for a set of documents resulting from a query on a document collection, each document in the document collection associated with at least one topic, various ones of the topics having semantically related subtopics, each subtopic being a semantic refinement of its topic, each topic being a semantic generalization of its subtopics, the method comprising:
-
receiving a set of documents satisfying the query;
receiving a first selection of at least one topic derived from the query;
receiving a user selection of a type of topic arrangement from a plurality of topic arrangements for displaying topics semantically related to the first selected topic;
responsive to the user selection of the type of topic arrangement, selecting a set of topics for the topic arrangement as the set of topics which optimizes a predetermined set of parameters associated with the determined type of topic arrangement and the selected topic; and
displaying the topic arrangement including the selected topics.
-
2. The method of claim 1, wherein the parameters include:
an ideal number of topics in a topic arrangement of the determined type compared to an actual number of topics in the set of topics.
-
-
3. The method of claim 2, wherein the parameters include:
an ideal percentage of documents of the document set that should be associated with any of the topics included in the topic arrangement compared to an actual percentage of documents in the document set associated with any of the topics in the topic set.
-
4. The method of claim 2, wherein the parameters include:
an ideal percentage of documents of the document set that should be associated with more than one topic included in the topic arrangement compared to an actual percentage of documents in the document set associated with more than one topic in the topic set.
-
5. The method of claim 1, further comprising:
-
receiving a user selection of one of the displayed topics of the displayed topic arrangement;
modifying the query to incorporate the user selected topic to the query to form a modified query; and
processing the modified query on the document collection to select a new set of documents satisfying the modified query.
-
-
6. The method of claim 5, wherein modifying the query comprises:
replacing the first selected topic derived from the query with the user selected displayed topic.
-
7. A computer-implemented method of creating a topic arrangement for a set of documents resulting from a query on a document collection, each document in the document collection associated with at least one topic, various ones of the topics having semantically related subtopics, each subtopic being a semantic refinement of its topic, each topic being a semantic generalization of its subtopics, the method comprising:
-
receiving a set of documents satisfying the query;
receiving a first selection of at least one topic derived from the query;
displaying a set of topics semantically related to the selected topic, the set of topics including topics that are either a semantic refinement or a semantic generalization of the selected topic;
receiving a user selection of one of the displayed topics;
modifying the query to incorporate the user selected topic to the query to form a modified query; and
processing the modified query on the document collection to select a new set of documents satisfying the modified query.
-
8. The method of claim 7, wherein modifying the query comprises:
replacing the first selected topic derived from the query with the user selected displayed topic.
-
-
9. A computer-implemented method of creating a supertopic arrangement for a set of documents resulting from a query on a document collection, each document in the document collection associated with at least one topic, various ones of the topics having semantically related subtopics, each subtopic being a semantic refinement of its topic, each topic being a semantic generalization of its subtopics, the method comprising:
-
processing the query to select a set of documents satisfying the query;
receiving a selection of at least one topic derived from the query;
determining the supertopic arrangement as a combination of supertopics that are associated with the documents of the document set and with the selected topic and that optimally generalizes the document set with respect to parameters; and
displaying the supertopic arrangement.
-
10. The method of claim 9, further comprising:
-
receiving a user specification of a new query term;
conjoining the query term to the query to form a refined query that is a semantic refinement of the query; and
processing the refined query on the document collection to select a new set of documents satisfying the refined query.
-
-
11. The method of claim 9, further comprising:
-
receiving a user selection of one of the displayed list of supertopics;
disjoining the selected supertopic to the query to form a new query that is a semantic generalization of the query; and
processing the new query on the document collection to select a new set of documents satisfying the new query.
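The conjoining and disjoining steps recited above can be illustrated with a minimal Python sketch. The nested-tuple query representation, the `TOPIC_INDEX` mapping, and the function names are all assumptions made for illustration; the claims do not prescribe a query data structure.

```python
# Hypothetical index: topic -> ids of documents associated with that topic.
TOPIC_INDEX = {
    "jazz": {1, 2, 3},
    "live": {2, 3, 4},
    "music": {1, 2, 3, 4, 5},
}

def conjoin(query, term):
    """AND a term onto the query, giving a semantic refinement."""
    return ("AND", query, term)

def disjoin(query, term):
    """OR a term onto the query, giving a semantic generalization."""
    return ("OR", query, term)

def evaluate(query, topic_index):
    """Return the set of document ids that satisfy the query."""
    if isinstance(query, str):
        # A bare topic name matches the documents indexed under it.
        return set(topic_index.get(query, ()))
    op, left, right = query
    left_docs = evaluate(left, topic_index)
    right_docs = evaluate(right, topic_index)
    return left_docs & right_docs if op == "AND" else left_docs | right_docs

narrowed = evaluate(conjoin("jazz", "live"), TOPIC_INDEX)
broadened = evaluate(disjoin("jazz", "music"), TOPIC_INDEX)
```

In this sketch a conjoined query can only shrink the document set and a disjoined query can only grow it, which matches the refinement and generalization language of the claims.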
-
-
12. The method of claim 9, wherein determining the supertopic arrangement comprises:
-
creating a candidate set of supertopics for the selected topic by recursively including all parent topics of the selected topic in the candidate set;
for each of a plurality of combinations of supertopics of the candidate set of supertopics, rating the combination according to:
a number of supertopics in the combination and the ideal number of supertopics;
a number of supertopics in the combination and the maximum number of supertopics;
a number of documents in the document collection associated with more than one supertopic of the combination; and
selecting a most favorably rated combination as the supertopic arrangement.
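As a rough illustration of claim 12, the following Python sketch builds the candidate set by recursively following parent links and then rates every combination. The `PARENTS` and `DOC_TOPICS` data, the function names, and the additive penalty formula are assumptions; the claim names the rating inputs but not how they are combined.

```python
from itertools import combinations

# Hypothetical knowledge base fragment: topic -> its parent topics.
PARENTS = {
    "jazz": ["music"],
    "music": ["performing arts", "entertainment"],
    "performing arts": ["arts"],
    "entertainment": [],
    "arts": [],
}

# Hypothetical documents: document id -> topics associated with it.
DOC_TOPICS = {"d1": {"music", "arts"}, "d2": {"entertainment"}, "d3": {"music"}}

def candidate_supertopics(topic, parents):
    """Collect every ancestor of `topic` by recursively including parents."""
    found = set()

    def visit(current):
        for parent in parents.get(current, []):
            if parent not in found:
                found.add(parent)
                visit(parent)

    visit(topic)
    return found

def rate(combo, doc_topics, ideal, maximum):
    """Penalty score, lower is better (the weighting is an assumption)."""
    if len(combo) > maximum:
        return float("inf")
    chosen = set(combo)
    # Documents associated with more than one supertopic of the combination.
    overlap = sum(1 for topics in doc_topics.values() if len(topics & chosen) > 1)
    return abs(len(combo) - ideal) + overlap

def best_supertopic_arrangement(topic, parents, doc_topics, ideal=2, maximum=3):
    """Rate every combination of candidates and return the best one."""
    candidates = sorted(candidate_supertopics(topic, parents))
    combos = [c for size in range(1, len(candidates) + 1)
              for c in combinations(candidates, size)]
    return min(combos, key=lambda c: rate(c, doc_topics, ideal, maximum))
```

Exhaustive enumeration grows exponentially with the number of candidates, so a sketch like this is only practical for small local hierarchies.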
-
-
13. The method of claim 9, further comprising:
-
for each of the parameters:
scoring a plurality of combinations of supertopics with respect to the parameter;
selecting a number of the most favorably scored combinations for the parameter; and
scoring only the selected number with respect to the next parameter.
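The staged scoring of claim 13 can be sketched as a sequence of filters, where each parameter is scored only on the combinations that survived the previous one. The function name `staged_selection`, the sample combinations, and the two scorers are hypothetical.

```python
def staged_selection(combos, scorers, keep):
    """Score by each parameter in turn, keeping only the `keep` best each time.

    Each scorer returns a number where higher means more favorable.
    """
    survivors = list(combos)
    for score in scorers:
        ranked = sorted(survivors, key=score, reverse=True)
        survivors = ranked[:keep]
    return survivors

# Hypothetical inputs for illustration.
COMBOS = [("a",), ("a", "b"), ("b", "c"), ("a", "b", "c")]
IDEAL = 2
# Overlap counts are defined only for the combinations that reach stage two,
# so a lookup for any other combination would raise KeyError.
OVERLAP = {("a", "b"): 3, ("b", "c"): 1}

def closeness_to_ideal(combo):
    return -abs(len(combo) - IDEAL)

def low_overlap(combo):
    return -OVERLAP[combo]

result = staged_selection(COMBOS, [closeness_to_ideal, low_overlap], keep=2)
```

Because later scorers see only the survivors, an expensive parameter can be placed last and evaluated on a handful of combinations rather than on all of them.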
-
-
14. A computer-implemented method of creating a subtopic arrangement for a set of documents resulting from a query on a document collection, each document in the document collection associated with at least one topic, various ones of the topics having semantically related subtopics, each subtopic being a semantic refinement of its topic, each topic being a semantic generalization of its subtopics, the method comprising:
-
processing the query to select a set of documents satisfying the query;
receiving a selection of at least one topic derived from the query;
determining the subtopic arrangement as a combination of semantically related subtopics that are associated with the documents of the document set and with the selected topic and that optimally covers and partitions the document set with respect to parameters including:
an ideal number of subtopics in the combination of subtopics, a maximum number of subtopics in the combination of subtopics, and an ideal percentage of the document set that is associated with at least one subtopic of the combination; and
displaying the subtopic arrangement.
-
15. The method of claim 14, wherein the parameters further include:
an ideal number of documents of the document collection associated with more than one subtopic of the combination of subtopics.
-
-
16. The method of claim 14, further comprising:
-
receiving a user specification of a new query term;
conjoining the query term to the query to form a refined query that is a semantic refinement of the query; and
processing the refined query on the document collection to select a new set of documents satisfying the refined query.
-
-
17. The method of claim 14, further comprising:
-
receiving a user selection of a displayed subtopic;
conjoining the selected subtopic to the query to form a refined query that is a semantic refinement of the query; and
processing the refined query on the document collection to select a new set of documents satisfying the refined query.
-
-
18. The method of claim 17, further comprising iteratively repeating the steps of receiving, conjoining, and processing to iteratively narrow the document set.
-
19. The method of claim 14, wherein determining the subtopic arrangement further comprises:
-
creating a candidate set of subtopics for the selected topic by recursively including each child topic of the selected topic in the candidate set that is also a topic associated with a document of the document set;
for each of a plurality of combinations of subtopics of the candidate set of subtopics, rating the combination according to:
a number of subtopics in the combination and the ideal number of subtopics;
a number of subtopics in the combination and the maximum number of subtopics;
a percentage of the documents in the document set that are associated with at least one subtopic of the combination and an ideal percentage; and
selecting a most favorably rated combination as the subtopic arrangement.
-
-
20. The method of claim 19, further comprising:
-
rating the combination according to a number of documents in the document collection associated with more than one subtopic of the combination;
wherein combinations for which documents of the document collection are associated with few subtopics of the combination are more favorably rated than combinations for which documents of the document collection are associated with many subtopics of the combination.
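A minimal Python sketch of the subtopic rating in claims 19 and 20 follows. It combines the three stated inputs (count against ideal and maximum, coverage against an ideal percentage, and multi-subtopic overlap) into one penalty. The normalization and equal weighting are assumptions, as are the sample data and all function names.

```python
from itertools import combinations

# Hypothetical document set: document id -> subtopics associated with it.
DOCS = {
    1: {"swing"},
    2: {"bebop"},
    3: {"swing", "bebop"},
    4: {"fusion"},
    5: {"swing"},
}

def coverage(combo, doc_topics):
    """Fraction of documents associated with at least one subtopic in combo."""
    chosen = set(combo)
    return sum(1 for t in doc_topics.values() if t & chosen) / len(doc_topics)

def overlap(combo, doc_topics):
    """Number of documents associated with more than one subtopic in combo."""
    chosen = set(combo)
    return sum(1 for t in doc_topics.values() if len(t & chosen) > 1)

def rate(combo, doc_topics, ideal_n, max_n, ideal_cov):
    """Penalty score, lower is better (equal weighting is an assumption)."""
    if len(combo) > max_n:
        return float("inf")
    return (abs(len(combo) - ideal_n) / max_n
            + abs(coverage(combo, doc_topics) - ideal_cov)
            + overlap(combo, doc_topics) / len(doc_topics))

def best_subtopic_arrangement(candidates, doc_topics, ideal_n=2, max_n=3, ideal_cov=1.0):
    """Rate every combination of candidate subtopics and return the best."""
    combos = [c for size in range(1, len(candidates) + 1)
              for c in combinations(sorted(candidates), size)]
    return min(combos, key=lambda c: rate(c, doc_topics, ideal_n, max_n, ideal_cov))

best = best_subtopic_arrangement({"swing", "bebop", "fusion"}, DOCS)
```

With this sample data the pair "fusion" and "swing" wins: it covers four of the five documents without placing any document under both subtopics, whereas "bebop" and "swing" reach the same coverage but share document 3.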
-
-
21. A computer-implemented method of creating a topic arrangement of documents resulting from a query, each document associated with at least one topic, various ones of the topics having semantically related subtopics, each subtopic being a semantic refinement of its topic, the method comprising:
-
receiving a set of documents satisfying the query;
determining from the set of documents a set of topics associated with the documents, each topic in the set of topics associated with at least one document in the set of documents;
selecting from the set of topics at least one topic having a plurality of semantically related subtopics that partition the set of documents into subsets of documents and that are associated with a substantial portion of the set of documents; and
displaying each selected topic and its subtopics.
-
22. The computer-implemented method of claim 21, wherein selecting from the set of topics at least one topic comprises:
-
determining for each topic a rating as a function of:
an ideal number of subtopics for a topic;
a number of subtopics of the topic that are associated with the set of documents; and
selecting a predefined number of topics having the most favorable ratings.
-
-
23. The computer-implemented method of claim 22, wherein determining for each topic a rating as a function, further comprises:
determining for each topic a rating as a function of an ideal percentage of the document set that are associated with subtopics of the topic.
-
24. The method of claim 21 wherein selecting from the set of topics at least one topic having a plurality of semantically related subtopics that partition the set of documents into subsets of documents comprises:
selecting topics for which the semantically related subtopics partition the set of documents into a number of subsets that does not exceed a maximum number of subsets.
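The perspective topic selection of claims 21 through 24 can be approximated as follows. This sketch keeps a topic only if at least two of its subtopics occur in the document set, the number of such subtopics does not exceed a maximum, and they cover a minimum share of the documents. The `min_coverage` threshold is an assumed stand-in for "a substantial portion", and the data and names are hypothetical.

```python
# Hypothetical knowledge base fragment: topic -> its subtopics.
CHILDREN = {
    "genre": ["jazz", "rock", "folk"],
    "era": ["1950s", "1960s"],
    "label": ["blue note"],
}

# Hypothetical document set: document id -> topics associated with it.
DOC_TOPICS = {
    1: {"jazz", "1950s"},
    2: {"rock", "1960s"},
    3: {"jazz"},
    4: {"rock", "blue note"},
}

def perspective_topics(children, doc_topics, max_subsets, min_coverage, k):
    """Return up to k (topic, active subtopics) pairs, best coverage first."""
    rated = []
    for topic, subtopics in children.items():
        # Keep only subtopics that occur in the document set.
        active = [s for s in subtopics
                  if any(s in topics for topics in doc_topics.values())]
        if len(active) < 2 or len(active) > max_subsets:
            continue
        chosen = set(active)
        covered = sum(1 for t in doc_topics.values() if t & chosen) / len(doc_topics)
        if covered >= min_coverage:
            rated.append((covered, topic, active))
    # Highest coverage first, ties broken alphabetically by topic name.
    rated.sort(key=lambda r: (-r[0], r[1]))
    return [(topic, active) for _, topic, active in rated[:k]]

selected = perspective_topics(CHILDREN, DOC_TOPICS, max_subsets=4, min_coverage=0.75, k=2)
```

Here "genre" is selected because its subtopics split all four documents into two groups, while "era" covers only half of them and "label" has a single active subtopic and so cannot partition anything.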
-
25. A computer-implemented method of creating a topic arrangement of documents resulting from a query, each document associated with at least one topic, various ones of the topics having semantically related subtopics, each subtopic being a semantic refinement of its topic, the method comprising:
-
processing the query to select a set of documents;
determining from the set of documents a set of topics associated with the documents, each topic in the set of topics associated with at least one document in the set of documents;
for each of a number of topics in the set of topics, rating each of a plurality of combinations of subtopics of the topic as a function of:
a number of subtopics in the combination of subtopics;
a percentage of the documents in the document set associated with at least one subtopic in the combination of subtopics;
selecting a number of most favorably rated topics; and
displaying the selected topics.
-
26. The method of claim 25, further comprising:
-
rating each of a plurality of combinations of subtopics of the topic as a further function of:
a number of documents in the set of documents associated with more than one subtopic in the combination of subtopics.
-
-
27. The method of claim 26, further comprising:
-
rating each of a plurality of combinations of subtopics of the topic as a further function of:
an ideal number of documents in the set of documents that should be associated with more than one subtopic in the combination of subtopics;
an ideal number of subtopics that should be in the combination of subtopics; and
an ideal percentage of the documents in the document set that should be associated with more than one subtopic in the combination of subtopics.
-
-
28. A computer-implemented method of creating a topic arrangement of documents resulting from a query, each document associated with a plurality of descriptive topics, various ones of the topics having semantically related subtopics, the method comprising:
-
processing the query to produce a first set of documents satisfying the query;
determining from the set of documents a set of topics, each topic in the set associated with at least one document in the first set of documents;
selecting from among the set of topics at least one topic that is associated with a second set of documents that is substantially similar to the unorganized first set of documents resulting from the query; and
displaying the at least one selected topic.
-
-
29. A computer-implemented method of creating a topic arrangement of documents resulting from a query in an information retrieval system including a document collection containing a plurality of documents, each document associated with a plurality of descriptive topics, various ones of the topics having semantically related subtopics, the method comprising:
-
processing the query to select a first set of documents less than the plurality of documents, and which satisfy the query;
determining from the set of documents a set of topics, each topic in the set associated with at least one document in the first set of documents;
selecting from among the set of topics a number of topics having a highest normalized frequency of occurrence in the first set of documents relative to a frequency of occurrence of the topic in the plurality of documents; and
displaying the selected number of topics.
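One plausible reading of the "normalized frequency of occurrence" in claim 29 is the share of result documents carrying a topic divided by the share of collection documents carrying it. The Python sketch below implements that reading only; the formula, the sample data, and the name `theme_topics` are assumptions rather than the patent's stated computation.

```python
from collections import Counter

# Hypothetical collection: document id -> topics associated with it.
COLLECTION = {
    "d1": {"jazz", "music"},
    "d2": {"jazz", "music"},
    "d3": {"music"},
    "d4": {"music"},
    "d5": {"music", "rock"},
    "d6": {"rock"},
}

# Hypothetical query result: a subset of the collection.
RESULT = {doc_id: COLLECTION[doc_id] for doc_id in ("d1", "d2", "d3")}

def theme_topics(result_docs, collection_docs, k):
    """Return the k topics most over-represented in the result set.

    Assumes result_docs is a subset of collection_docs, so every topic seen
    in the result set has a non-zero count in the collection.
    """
    in_result = Counter(t for topics in result_docs.values() for t in topics)
    in_collection = Counter(t for topics in collection_docs.values() for t in topics)
    ratio = {
        t: (in_result[t] / len(result_docs)) / (in_collection[t] / len(collection_docs))
        for t in in_result
    }
    # Highest ratio first, ties broken alphabetically.
    return sorted(ratio, key=lambda t: (-ratio[t], t))[:k]

themes = theme_topics(RESULT, COLLECTION, k=1)
```

In the sample data "music" appears in every result document but is also common across the collection, so "jazz" ranks higher: it is concentrated in the result set and therefore distinguishes it from the rest of the collection.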
-
-
30. An information retrieval system, comprising:
-
a document collection including a plurality of documents, each document associated with at least one topic;
a knowledge base including a plurality of topics, various ones of the topics having semantically related subtopics;
a user interface module that receives a query including a plurality of query terms; and
a dynamic content analysis module communicatively coupled to receive a set of documents satisfying the query, a first selected topic derived from the query terms, and a user selection of a type of topic arrangement, from a plurality of topic arrangements for displaying topics semantically related to the first selected topic, selecting a set of topics of the knowledge base for the topic arrangement as the set of topics which optimizes a predetermined set of parameters associated with the determined type and the selected topic, and display the topic arrangement including the selected topics.
-
-
31. An information retrieval system, comprising:
-
a document collection including a plurality of documents, each document associated with at least one topic;
a knowledge base including a plurality of topics, various ones of the topics having semantically related subtopics;
a user interface module that receives a query including a plurality of query terms; and
a dynamic content analysis module communicatively coupled to receive a set of documents satisfying the query, and to receive a first selected topic derived from the query terms, determine a supertopic arrangement as a combination of supertopics that are associated with the documents of the document set and with the selected topic and that optimally generalizes the document set with respect to parameters, and provide the supertopic arrangement to the user interface module to display.
-
-
32. An information retrieval system, comprising:
-
a document collection including a plurality of documents, each document associated with at least one topic;
a knowledge base including a plurality of topics, various ones of the topics having semantically related subtopics;
a user interface module that receives a query including a plurality of query terms; and
a dynamic content analysis module communicatively coupled to receive a set of documents satisfying the query, to receive a first selected topic derived from the query terms, to determine a subtopic arrangement as a combination of semantically related subtopics that are associated with the documents of the document set and with the selected topic and that optimally covers and partitions the document set with respect to parameters including an ideal number of subtopics in the combination of subtopics, a maximum number of subtopics in the combination of subtopics, and an ideal percentage of the document set that is associated with at least one subtopic of the combination, and to display the subtopic arrangement.
-
-
33. An information retrieval system, comprising:
-
a document collection including a plurality of documents, each document associated with at least one topic;
a knowledge base including a plurality of topics, various ones of the topics having semantically related subtopics;
a user interface module that receives a query including a plurality of query terms; and
a dynamic content analysis module communicatively coupled to receive a set of documents satisfying the query, and to determine from the set of documents a set of topics, each topic in the set of topics associated with at least one document in the set of documents, and to select at least one topic from the set of topics that has semantically related subtopics that optimally partition the set of documents into a plurality of subsets, the dynamic content analysis module providing the selected at least one topic to the user interface module to display.
-
34. The system of claim 33, further comprising:
a query analysis module communicatively coupled to receive the query from the user interface module and to process the query to select the set of documents from the document collection that satisfy the query terms.
-
-
35. The system of claim 34, wherein the query analysis module receives a query containing a plurality of topics as query terms, determines for each topic in the query a subset of documents of the knowledge base associated with the topic, and produces the document set as the intersection or union of the subsets of documents.
-
36. An information retrieval system, comprising:
-
a document collection including a plurality of documents, each document associated with at least one topic;
a knowledge base including a plurality of topics, various ones of the topics having semantically related subtopics;
a user interface module that receives a query including a plurality of query terms;
a query analysis module communicatively coupled to receive the query from the user interface module and to process the query to select a first set of documents from the document collection that satisfy the query terms; and
a dynamic content analysis module communicatively coupled to receive the set of documents and to determine from the set of documents a set of topics, each topic in the set of topics associated with at least one document in the set of documents, and to select from among the set of topics a number of topics having a highest normalized frequency of occurrence in the first set of documents relative to a frequency of occurrence of the topic in the plurality of documents.
-
Specification