Method and system for clustering using generalized sentence patterns

  • US 7,584,100 B2
  • Filed: 06/30/2004
  • Issued: 09/01/2009
  • Est. Priority Date: 06/30/2004
  • Status: Active Grant
First Claim
1. A method in a computer system with a processor and memory for identifying clusters of documents, the method comprising:

  • providing sentences having words, each sentence representing a topic of a document;

    for each sentence representing the topic of a document, identifying a generalized sentence for the sentence, the generalized sentence representing a generalization of words of the sentence, a generalization including a part of speech of a word;

    identifying by the processor generalized sentence patterns for the identified generalized sentences, each generalized sentence pattern representing a pattern of generalizations of the generalized sentences;

    grouping the identified generalized sentence patterns into groups of generalized sentence patterns based on similarity of the generalized sentence patterns;

    selecting identified generalized sentence patterns to guide the identification of clusters wherein the groups of generalized sentence patterns are used to guide the identification of clusters; and

    applying a cluster identification algorithm to identify clusters of documents using the selected generalized sentence patterns to guide the identification such that documents whose generalized sentences are similar to the same generalized sentence pattern are identified as being in the same cluster, wherein similarity of generalized sentence patterns is defined as;
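
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several pieces are assumptions made only for illustration: the toy part-of-speech lexicon `TOY_POS` stands in for a real tagger, patterns are taken to be pairs of generalizations that occur in at least two generalized sentences, and Jaccard overlap is a stand-in for the claim's similarity definition, which is truncated in the text above. All function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy lexicon; a real system would use a part-of-speech tagger.
TOY_POS = {"install": "VERB", "schedule": "VERB", "cancel": "VERB", "new": "ADJ"}


def generalize(sentence):
    """Generalized sentence: known words become their POS tag, others are kept."""
    return frozenset(TOY_POS.get(w, w) for w in sentence.lower().split())


def jaccard(a, b):
    """Stand-in similarity (assumption); the claim's own definition is not shown."""
    return len(a & b) / len(a | b)


def frequent_patterns(gen_sentences, size=2, min_support=2):
    """Patterns: item sets of `size` generalizations found in >= min_support sentences."""
    counts = {}
    for gs in gen_sentences:
        for combo in combinations(sorted(gs), size):
            key = frozenset(combo)
            counts[key] = counts.get(key, 0) + 1
    return {p: c for p, c in counts.items() if c >= min_support}


def group_patterns(patterns, threshold=0.5):
    """Greedily group patterns whose similarity to a group member meets the threshold."""
    groups = []
    for p in sorted(patterns, key=sorted):
        for g in groups:
            if any(jaccard(p, q) >= threshold for q in g):
                g.append(p)
                break
        else:
            groups.append([p])
    return groups


def cluster(docs):
    """Map each selected pattern to the ids of the documents most similar to it."""
    gen = {doc_id: generalize(s) for doc_id, s in docs.items()}
    patterns = frequent_patterns(gen.values())
    groups = group_patterns(patterns)
    # Select one pattern per group (highest support) to guide the clustering.
    reps = [max(g, key=lambda p: patterns[p]) for g in groups]
    if not reps:
        return {}
    clusters = {rep: [] for rep in reps}
    for doc_id, gs in gen.items():
        best = max(reps, key=lambda p: jaccard(gs, p))
        clusters[best].append(doc_id)
    return clusters


# Each sentence stands for the topic of one document (for example, a subject line).
docs = {
    "d1": "Install new software",
    "d2": "install software",
    "d3": "schedule meeting",
    "d4": "cancel meeting",
}

for pattern, members in cluster(docs).items():
    print(sorted(pattern), members)
```

In this sketch the pattern-based assignment takes the place of the "cluster identification algorithm" step; a fuller version could instead use the selected patterns to seed or constrain a conventional clustering algorithm.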

  • 2 Assignments