Topic identification and use thereof in information retrieval systems
Abstract
A technique to determine topics associated with, or classifications for, a data corpus uses an initial domain-specific word list to identify word combinations (one or more words) that appear in the data corpus significantly more often than expected. Word combinations so identified are selected as topics and associated with a user-specified level of granularity. For example, topics may be associated with each table entry, each image, each sentence, each paragraph, or an entire file. Topics may be used to guide information retrieval and/or the display of topic classifications during user query operations.
48 Claims
1. A method to identify topics in a data corpus having a plurality of segments, comprising:
determining a segment-level actual usage value for one or more word combinations;
computing a segment-level expected usage value for each of the one or more word combinations; and
designating a word combination as a topic if the segment-level actual usage value of the word combination is substantially greater than the segment-level expected usage value of the word combination.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
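The three steps of claim 1 can be illustrated with a minimal sketch. The claim does not say how the usage values are computed or what "substantially greater" means, so everything specific here is an assumption made for illustration: segments are plain strings, the actual usage value is the number of segments containing every word of a combination, the expected usage value assumes the words occur independently of one another, and the hypothetical parameter ratio_threshold stands in for "substantially greater". Under this independence assumption a single word's expected value equals its actual value, so the sketch only considers combinations of two or more words.

```python
from collections import Counter
from itertools import combinations


def identify_topics(segments, word_list, ratio_threshold=2.0, max_len=2):
    """Return word combinations whose actual segment-level usage is well
    above the usage expected if their words occurred independently.

    All names and the independence model are illustrative assumptions,
    not the method defined in the patent specification.
    """
    n = len(segments)
    vocab = {w.lower() for w in word_list}

    # Domain-list words present in each segment (whitespace tokenization).
    seg_words = [vocab & set(seg.lower().split()) for seg in segments]

    # Number of segments in which each single word appears.
    word_counts = Counter(w for words in seg_words for w in words)

    # Actual usage: number of segments containing every word of a combination.
    actual = Counter()
    for words in seg_words:
        for k in range(2, max_len + 1):
            for combo in combinations(sorted(words), k):
                actual[combo] += 1

    topics = []
    for combo, act in actual.items():
        # Expected usage (assumed model): n * product of per-word segment rates.
        expected = float(n)
        for w in combo:
            expected *= word_counts[w] / n
        # "Substantially greater" is approximated by a ratio threshold.
        if act > ratio_threshold * expected:
            topics.append(combo)
    return sorted(topics)
```

A lower ratio_threshold admits more combinations as topics; a higher one keeps only combinations that co-occur far more often than chance would suggest under the assumed model.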
16. A program storage device, readable by a programmable control device, comprising instructions stored on the program storage device for causing the programmable control device to identify topics in a data corpus having a plurality of segments, the instructions causing the programmable control device to:
determine a segment-level actual usage value for one or more word combinations;
compute a segment-level expected usage value for each of the one or more word combinations; and
designate a word combination as a topic if the segment-level actual usage value of the word combination is substantially greater than the segment-level expected usage value of the word combination.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28
29. A method to display a list of topics associated with data items stored in a database, comprising:
identifying a result set based on an initial user query, the result set identifying a plurality of stored data items;
identifying those topics associated with the stored data items identified in the result set;
selecting for display a topic associated with the most identified stored data items;
selecting for display another topic, said another topic associated with the most identified stored data items not associated with a previously identified display topic, wherein this step is repeated until all identified stored items in the result set have been accounted for; and
displaying the selected display topics.
Dependent claims: 30, 31, 32, 33, 34, 35, 36, 37, 38
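The selection loop recited in claim 29 reads like a greedy covering procedure: pick the topic that covers the most result items, then repeatedly pick the topic that covers the most items not yet covered. The sketch below is one possible reading, not the patent's implementation. The function name, the item_topics mapping (item identifier to a set of topic labels), the alphabetical tie-break, and the decision to ignore items that have no topic at all are assumptions introduced for illustration.

```python
def select_display_topics(result_set, item_topics):
    """Greedily choose topics until every result item that has at least
    one topic is covered by a selected topic.

    result_set: iterable of item identifiers returned by the user query.
    item_topics: dict mapping item identifier -> set of topic labels
                 (a hypothetical data layout, not taken from the patent).
    """
    # Invert the mapping: topic -> result items associated with it.
    topic_items = {}
    for item in result_set:
        for topic in item_topics.get(item, ()):
            topic_items.setdefault(topic, set()).add(item)

    # Items with no topic cannot be accounted for by any topic, so only
    # items that appear under some topic are tracked (an assumption).
    uncovered = set().union(*topic_items.values())

    selected = []
    while uncovered:
        # Topic covering the most still-uncovered items; ties are broken
        # alphabetically because max() keeps the first maximum it sees.
        best = max(sorted(topic_items),
                   key=lambda t: len(topic_items[t] & uncovered))
        selected.append(best)
        uncovered -= topic_items[best]
    return selected
```

The returned list is ordered by selection round, so the first entry is the topic associated with the most items in the result set, matching the order in which the claim selects topics for display.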
39. A program storage device, readable by a programmable control device, comprising instructions stored on the program storage device for causing the programmable control device to display a list of topics associated with data items stored in a database, the instructions causing the programmable control device to:
identify a result set based on an initial user query, the result set identifying a plurality of stored data items;
identify those topics associated with the stored data items identified in the result set;
select for display a topic associated with the most identified stored data items;
select for display another topic, said another topic associated with the most identified stored data items not associated with a previously identified display topic, wherein this step is repeated until all identified stored items in the result set have been accounted for; and
display the selected display topics.
Dependent claims: 40, 41, 42, 43, 44, 45, 46, 47, 48
Specification