Method and product for determining salient features for use in information searching
Abstract
A method and product are provided for generating a word set for use in locating a document having a type similar to a type of document in a document collection. The method includes selecting multiple documents from the document collection, each document selected including multiple words, and stemming the words in each document selected to obtain multiple stem words. The method also includes determining a word count for each stem word in each document, and clustering the stem words based on the word count of each stem word in each document to obtain a word set. The product includes a storage medium having programmed instructions recorded thereon for performing the method steps.
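The four steps in the abstract (select, stem, count, cluster) can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the text here names neither a stemming algorithm nor a clustering algorithm, so the suffix-stripping `toy_stem`, the cosine-similarity grouping in `cluster_stems`, and the `threshold` parameter are all assumptions introduced for the example.

```python
import math
import re
from collections import Counter


def toy_stem(word):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_counts(documents):
    """Return one Counter per document mapping each stem word to its count."""
    return [
        Counter(toy_stem(w) for w in re.findall(r"[a-z]+", doc.lower()))
        for doc in documents
    ]


def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_stems(documents, threshold=0.9):
    """Group stems whose per-document count vectors point the same way.

    Hypothetical greedy scheme: each stem joins the first cluster whose
    seed (first member) has cosine similarity >= threshold, otherwise it
    starts a new cluster. Each returned cluster is a candidate word set.
    """
    counts = stem_counts(documents)
    stems = sorted(set().union(*counts))
    # One vector per stem: its count in document 0, document 1, ...
    vectors = {s: [c[s] for c in counts] for s in stems}
    clusters = []
    for s in stems:
        for cluster in clusters:
            if cosine(vectors[s], vectors[cluster[0]]) >= threshold:
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters


selected = ["billing totals listed", "billed total listing", "cats chasing cats"]
word_sets = cluster_stems(selected)
```

With these three sample documents, the stems that occur together in the first two documents fall into one cluster and the stems from the third document into another. A production system would likely substitute an established stemmer and clustering method for the toy versions used here.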
23 Claims
1. A computer method for generating a word set for use in locating a document having a type similar to a type of document in a document collection, the method comprising:
selecting a plurality of documents from the document collection, each document selected including a plurality of words;
stemming the plurality of words in each document selected to obtain a plurality of stem words;
determining a word count for each stem word in each document; and
clustering the plurality of stem words based on the word count of each stem word in each document to obtain a word set.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A product for generating a word set for use in locating a document having a type similar to a type of document in a document collection, the product comprising a storage medium having programmed instructions recorded thereon, the programmed instructions operative to select a plurality of documents from the document collection, each document selected including a plurality of words, stem the plurality of words in each document selected to obtain a plurality of stem words, determine a word count for each stem word in each document, and cluster the plurality of stem words based on the word count of each stem word in each document to obtain a word set.
19. A computer method for generating a feature set for use in locating a pattern having a type similar to a type of pattern in a pattern space, the method comprising:
selecting a plurality of patterns from the pattern space, each pattern selected including a plurality of features;
minimizing the plurality of features in each pattern selected to obtain a plurality of core features;
determining a count for each core feature in each pattern; and
clustering the plurality of core features based on the count of each core feature in each pattern to obtain a feature set.
Dependent claims: 20, 21, 22, 23
Specification