Lightweight subject indexing for e-mail collections
Abstract
A light weight subject indexing system including a candidate headword identification system for identifying candidate words in the subject line of a document which are not listed in a user modified common word list, a lexical context system for creating lexical context for an identified candidate headword, a ranking system for ranking all the candidate headwords identified for the subject lines of a document or message collection, and selecting among the ranked headwords for inclusion in an index based on that ranking, and an index creation system for listing candidate headwords selected by the ranking system.
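The candidate headword identification step described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the default word list, the tokenization rule, and the names `DEFAULT_COMMON_WORDS`, `build_common_words`, and `candidate_headwords` are all assumptions made for the example.

```python
import re

# Assumed starting list of common words; the source does not enumerate one.
DEFAULT_COMMON_WORDS = {"a", "an", "the", "of", "for", "to", "in", "on", "and", "re", "fwd"}


def build_common_words(added=(), removed=()):
    """Return the default list after hypothetical user modifications."""
    words = set(DEFAULT_COMMON_WORDS)
    words.update(w.lower() for w in added)
    words.difference_update(w.lower() for w in removed)
    return words


def candidate_headwords(subject, common_words):
    """Return the words of one subject line that are not in the common word list."""
    # Assumed tokenization: lowercase runs of letters and digits.
    tokens = re.findall(r"[A-Za-z0-9]+", subject.lower())
    return [t for t in tokens if t not in common_words]


if __name__ == "__main__":
    common = build_common_words(added=["meeting"])
    print(candidate_headwords("Re: Budget for the Q3 meeting", common))
```

Here the user adds "meeting" to the common word list, so it is no longer offered as a candidate headword.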
35 Claims
1. A method for creating a light weight subject index, comprising:
identifying, as candidate headwords, words in the subject lines of a collection of documents which are not listed in a user modified common word list;
creating lexical contexts for identified candidate headwords;
ranking the set of identified candidate headwords for a collection of documents and selecting among them for inclusion in an index; and
listing selected candidate headwords based on the results of ranking and selection, wherein the lexical context for a candidate headword within a subject line is identified as the words to the left and the right of the candidate headword up to, but not including, a barrier word.
Dependent claims: 2-15
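The lexical context rule in the wherein clause of claim 1 can be illustrated with a short sketch. The barrier word set, the tokenization, and the function name `lexical_context` are assumptions for illustration; the claim does not enumerate barrier words. The sketch returns the headword together with its neighbours as one phrase, which is also a presentation choice rather than something the claim requires.

```python
import re

# Assumed barrier words; the claim text does not list them.
BARRIER_WORDS = {"for", "of", "and", "re", "fwd", "on", "in", "to", "the"}


def lexical_context(subject, headword, barrier_words=BARRIER_WORDS):
    """Return the headword plus neighbouring words, stopping before any barrier word."""
    tokens = re.findall(r"[A-Za-z0-9]+", subject.lower())
    if headword not in tokens:
        return None
    i = tokens.index(headword)

    # Extend left until the next word is a barrier word or the line starts.
    left = i
    while left > 0 and tokens[left - 1] not in barrier_words:
        left -= 1

    # Extend right until the next word is a barrier word or the line ends.
    right = i
    while right < len(tokens) - 1 and tokens[right + 1] not in barrier_words:
        right += 1

    return " ".join(tokens[left:right + 1])


if __name__ == "__main__":
    subject = "Re: annual budget review for marketing team"
    print(lexical_context(subject, "budget"))  # annual budget review
    print(lexical_context(subject, "team"))    # marketing team
```

The barrier words "re" and "for" bound the context on each side and are themselves excluded from it.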
16. A system for creating a user specified index, comprising:
at least one user interface for specifying a desired index;
a document application system electrically connected to the at least one user interface; and
an indexing system for creating the desired index, the indexing system comprising:
a candidate headword identification system for identifying candidate words in the subject line of a document which are not listed in a user modified common word list;
a lexical context system for creating a lexical context for an identified candidate headword;
a ranking system for ranking the set of identified candidate headwords for a collection of documents and selecting among them for inclusion in an index; and
an index creation system for listing selected candidate headwords based on the results of ranking and selection, wherein the lexical context system identifies the lexical context for the candidate headword as the words to the left and the right of the candidate headword up to, but not including, a barrier word.
Dependent claims: 17-20
21. A light weight subject indexing system, comprising:
a candidate headword identification system for identifying candidate words in the subject line of a document which are not listed in a user modified common word list;
a lexical context system for creating a lexical context for an identified candidate headword;
a ranking system for ranking the set of identified candidate headwords for a collection of documents and selecting among them for inclusion in an index; and
an index creation system for listing selected candidate headwords based on the results of ranking and selection, wherein the lexical context system identifies the lexical context for the candidate headword as the words to the left and the right of the candidate headword up to, but not including, a barrier word.
Dependent claims: 22-35
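The ranking and index creation systems recited above might be sketched as follows. The claims shown here do not state a ranking criterion, so this example assumes one: the number of subject lines in which a candidate headword appears. The cutoff `top_n`, the alphabetical listing order, and the names `rank_headwords` and `build_index` are likewise hypothetical.

```python
import re
from collections import Counter

# Assumed common word list; see the claims for the user-modified list it stands in for.
COMMON_WORDS = {"a", "an", "the", "of", "for", "to", "in", "on", "and", "re", "fwd"}


def rank_headwords(subjects, common_words=COMMON_WORDS):
    """Rank candidate headwords by how many subject lines contain them (assumed criterion)."""
    counts = Counter()
    for subject in subjects:
        # Use a set so a word is counted once per subject line.
        tokens = set(re.findall(r"[A-Za-z0-9]+", subject.lower()))
        counts.update(t for t in tokens if t not in common_words)
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def build_index(subjects, top_n=3):
    """Select the top_n ranked headwords and list them alphabetically."""
    ranked = rank_headwords(subjects)
    selected = [word for word, _ in ranked[:top_n]]
    return sorted(selected)


if __name__ == "__main__":
    subjects = [
        "Re: budget review",
        "Budget meeting on Friday",
        "Fwd: meeting notes",
        "Budget notes",
    ]
    print(build_index(subjects))
```

A real index would presumably list each selected headword with its lexical contexts; the sketch lists headwords only to keep the ranking and selection steps visible.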
Specification