Method and system for clustering using generalized sentence patterns
Abstract
A method and system for clustering documents based on generalized sentence patterns of the topics of the documents are provided. A generalized sentence pattern (“GSP”) system identifies a “sentence” that describes the topic of a document. To cluster documents, the GSP system generates a “generalized sentence” form of the sentence that describes the topic of each document. The generalized sentence is an abstraction of the words of the sentence. The GSP system then identifies clusters of documents based on the patterns of their generalized sentences: documents are clustered together when the generalized sentence representations of their topics have a similar pattern.
40 Claims
1. A method in a computer system for identifying generalized sentence patterns for sentences, the method comprising:
for each sentence, generating a generalized sentence by generalizing parts of speech of the sentence; and
selecting a generalized sentence to be a generalized sentence pattern based on the generalized sentence being a subset of another generalized sentence.
Dependent claims: 2-13
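The two steps of claim 1 can be illustrated with a minimal sketch. It assumes a generalized sentence is represented as a set of terms, and that "generalizing" means replacing a word with a broader term from a lookup table. The `GENERAL_TERM` table, the `STOP_WORDS` set, and the function names are hypothetical and are not taken from the claim, which does not say how words are generalized.

```python
# Toy lexicon mapping a word to a more general term. This table is an
# assumption for illustration only; a real system would need a proper
# part-of-speech tagger and thesaurus.
GENERAL_TERM = {
    "laptop": "computer",
    "desktop": "computer",
    "buy": "purchase",
    "order": "purchase",
}

# Words dropped before generalizing (also an assumption).
STOP_WORDS = {"a", "the", "my", "new"}


def generalize(sentence):
    """Return the generalized sentence as a frozenset of general terms."""
    words = sentence.lower().split()
    return frozenset(GENERAL_TERM.get(w, w) for w in words if w not in STOP_WORDS)


def select_patterns(generalized):
    """Keep each generalized sentence that is a proper subset of another one."""
    unique = set(generalized)
    return {g for g in unique if any(g < other for other in unique)}


sentences = ["buy a laptop", "order a new desktop today", "cancel my order"]
generalized = [generalize(s) for s in sentences]
patterns = select_patterns(generalized)
print(patterns)
```

Here "buy a laptop" generalizes to the set {purchase, computer}, which is contained in the generalized form of the second sentence, so it is the only one selected as a pattern.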
14. A method in a computer system for identifying clusters of documents, the method comprising:
identifying generalized sentence patterns for sentences, each sentence representing a document;
selecting identified generalized sentence patterns to guide the identification of clusters; and
applying a cluster identification algorithm to identify clusters using the selected generalized sentence patterns to guide the identification.
Dependent claims: 15-28
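As a rough illustration of pattern-guided clustering, the sketch below groups documents under the most specific selected pattern that each document's generalized sentence contains. The claim does not name a particular cluster identification algorithm, so this grouping rule is a stand-in chosen for brevity. The set representation, the function name `cluster_by_patterns`, and the sample data are all assumptions.

```python
from collections import defaultdict


def cluster_by_patterns(generalized_docs, patterns):
    """Group document ids under the largest selected pattern each one contains.

    Documents that contain no selected pattern are grouped under None.
    """
    # Prefer longer (more specific) patterns; break ties alphabetically
    # so the result is deterministic.
    ordered = sorted(patterns, key=lambda p: (-len(p), sorted(p)))
    clusters = defaultdict(list)
    for doc_id, terms in generalized_docs.items():
        match = next((p for p in ordered if p <= terms), None)
        clusters[match].append(doc_id)
    return dict(clusters)


# Hypothetical documents, each already reduced to a generalized sentence.
docs = {
    "d1": frozenset({"purchase", "computer"}),
    "d2": frozenset({"purchase", "computer", "today"}),
    "d3": frozenset({"cancel", "purchase"}),
    "d4": frozenset({"weather", "report"}),
}
selected = [frozenset({"purchase", "computer"}), frozenset({"purchase"})]
clusters = cluster_by_patterns(docs, selected)
```

In this example `d1` and `d2` fall under the pattern {purchase, computer}, `d3` falls under {purchase}, and `d4` matches nothing.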
29. A method in a computer system for naming clusters of groups of generalized sentence patterns, the method comprising:
when generalized sentences in a cluster match one or more groups of generalized sentence patterns, selecting the generalized sentence pattern from the group with the highest support as the name.
Dependent claims: 30, 31
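One possible reading of the naming step is sketched below. It assumes that each pattern comes with a precomputed support value, that a pattern "matches" a cluster when at least one generalized sentence in the cluster contains it, and that the name is the matching pattern with the highest support. The claim's notion of "groups" of patterns is not modeled here; the function `name_cluster` and the sample support values are hypothetical.

```python
def name_cluster(cluster, pattern_support):
    """Return a name built from the highest-support pattern matching the cluster.

    cluster: list of generalized sentences (frozensets of terms).
    pattern_support: dict mapping each pattern (frozenset) to its support.
    Returns None when no pattern matches any sentence in the cluster.
    """
    matching = {
        p: s for p, s in pattern_support.items()
        if any(p <= g for g in cluster)
    }
    if not matching:
        return None
    best = max(matching, key=matching.get)
    return " ".join(sorted(best))


cluster = [
    frozenset({"purchase", "computer"}),
    frozenset({"purchase", "computer", "today"}),
]
pattern_support = {
    frozenset({"purchase", "computer"}): 4,
    frozenset({"purchase", "today"}): 2,
    frozenset({"weather"}): 9,  # high support, but matches nothing in this cluster
}
name = name_cluster(cluster, pattern_support)
```

The pattern {weather} has the highest support overall but does not match the cluster, so the name is built from {purchase, computer}.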
32. A computer-readable medium containing instructions for controlling a computer system to identify generalized sentence patterns for sentences, by a method comprising:
for each sentence, generating a generalized sentence by generalizing parts of speech of the sentence; and
selecting a generalized sentence pattern based on the frequency with which the generalized sentence pattern is a subset of a generalized sentence.
Dependent claims: 33-35
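Frequency-based selection can be sketched as a support count with a threshold. The example assumes generalized sentences are sets of terms, that the candidate patterns are the generalized sentences themselves, and that a pattern is kept when it is a subset of at least `min_support` generalized sentences. The threshold, the candidate choice, and the function name are assumptions. A fuller miner would likely also consider sub-sets of sentences as candidates, as in frequent itemset mining.

```python
def frequent_patterns(generalized, min_support):
    """Map each candidate pattern to its support, keeping those at or above min_support.

    Support here is the number of generalized sentences that contain the
    candidate as a subset (a sentence counts as containing itself).
    """
    candidates = set(generalized)
    counts = {
        c: sum(1 for g in generalized if c <= g)
        for c in candidates
    }
    return {c: n for c, n in counts.items() if n >= min_support}


generalized = [
    frozenset({"purchase", "computer"}),
    frozenset({"purchase", "computer", "today"}),
    frozenset({"purchase", "computer", "online"}),
    frozenset({"cancel", "purchase"}),
]
selected = frequent_patterns(generalized, min_support=2)
```

With a threshold of 2, only {purchase, computer} survives, because it is contained in three of the four generalized sentences while every other candidate is contained only in itself.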
36. A computer-readable medium containing instructions for controlling a computer system to identify clusters of documents, by a method comprising:
identifying generalized sentence patterns for sentences, each sentence representing a document; and
identifying clusters of documents using the identified generalized sentence patterns to guide the identification of clusters.
Dependent claims: 37-40
Specification