System and method for efficiently generating cluster groupings in a multi-dimensional concept space
First Claim
1. A system for building a multi-dimensional semantic concept space over a stored document collection, comprising:
an extraction module identifying a plurality of documents within a stored document collection containing substantially correlated terms reflecting syntactic content, comprising:
an extractor extracting the terms in literal form from the documents;
a selector selecting the terms having frequencies of occurrence falling within a predefined threshold as being substantially correlated;
a vector module generating a vector reflecting latent semantic similarities discovered between substantially correlated documents logically projected at an angle θ from a common axis in a concept space;
a cluster module forming one or more arbitrary clusters at an angle σ from the common axis in the concept space, each cluster comprising documents having such an angle θ falling within a predefined variance of the angle σ for the cluster, and constructing a new arbitrary cluster at an angle σ from the common axis in the concept space, each new cluster comprising documents having such an angle θ falling outside the predefined variance of the angle σ for the remaining clusters.
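The extraction module's two elements (an extractor that pulls terms in literal form and a selector that keeps terms whose frequency of occurrence falls within a predefined threshold) can be illustrated with a short sketch. This is not the patented implementation: it assumes that "frequency of occurrence" means document frequency and that the "predefined threshold" is a band `[min_df, max_df]`. The function names, the regex tokenizer, and the sample documents are all hypothetical.

```python
import re
from collections import Counter


def extract_terms(text: str) -> list[str]:
    # Extractor (hypothetical): split into alphanumeric tokens and case-fold them.
    # No stemming is applied, so each term keeps its literal surface form.
    return re.findall(r"[a-z0-9]+", text.lower())


def select_correlated_terms(documents: list[str], min_df: int, max_df: int) -> set[str]:
    # Selector (hypothetical): count how many documents contain each term and keep
    # the terms whose document frequency lies inside the assumed band [min_df, max_df].
    df = Counter()
    for doc in documents:
        df.update(set(extract_terms(doc)))
    return {term for term, n in df.items() if min_df <= n <= max_df}


def identify_documents(documents: list[str], min_df: int, max_df: int) -> tuple[set[str], list[int]]:
    # Return the selected terms and the indices of the documents that contain at least one of them.
    terms = select_correlated_terms(documents, min_df, max_df)
    hits = [i for i, doc in enumerate(documents) if terms & set(extract_terms(doc))]
    return terms, hits


# Usage with made-up sample documents.
docs = ["Cats chase mice", "Cats chase birds", "Dogs chase cats", "Stock prices rose"]
terms, hits = identify_documents(docs, min_df=2, max_df=3)
print(sorted(terms), hits)  # ['cats', 'chase'] [0, 1, 2]
```

A band is used rather than a single cutoff so that very rare terms, which link few documents, and near-ubiquitous terms, which link almost all of them, can both be excluded. The claim does not specify where the bounds lie.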
Abstract
A system and method for efficiently generating cluster groupings in a multi-dimensional concept space is described. A plurality of terms is extracted from each document in a collection of stored unstructured documents. A concept space is built over the document collection. Terms substantially correlated between a plurality of documents within the document collection are identified. Each correlated term is expressed as a vector mapped along an angle θ originating from a common axis in the concept space. The difference between the angle θ for each document and the angle σ for each cluster within the concept space is determined. Each cluster is populated with those documents whose difference between the angle θ and that cluster's angle σ falls within a predetermined variance. A new cluster is created within the concept space for those documents whose difference between the angle θ and the angle σ of every existing cluster falls outside the predetermined variance.
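The angle-based grouping described above can be illustrated with a single-pass sketch. This is not the patent's own procedure. It assumes three things: θ is computed as the angle arccos(v·a / (|v||a|)) between a document vector v and a chosen common axis a; each cluster's σ stays fixed at the θ of the document that created it; and each document joins the nearest cluster only if |θ − σ| is within the variance, otherwise it starts a new cluster. The function names, the axis choice, and the sample vectors are hypothetical.

```python
import math


def angle_from_axis(vec, axis):
    # θ (radians): angle between a document vector and the common axis.
    dot = sum(v * a for v, a in zip(vec, axis))
    norm = math.sqrt(sum(v * v for v in vec)) * math.sqrt(sum(a * a for a in axis))
    if norm == 0:
        raise ValueError("document vector and axis must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def cluster_by_angle(vectors, axis, variance):
    # Each cluster records its angle sigma and the indices of its member documents.
    clusters = []
    for i, vec in enumerate(vectors):
        theta = angle_from_axis(vec, axis)
        # Find the existing cluster whose sigma is closest to this document's theta.
        best = min(clusters, key=lambda c: abs(theta - c["sigma"]), default=None)
        if best is not None and abs(theta - best["sigma"]) <= variance:
            best["members"].append(i)
        else:
            # Outside the variance of every existing cluster: construct a new one.
            clusters.append({"sigma": theta, "members": [i]})
    return clusters


# Usage: made-up 2-D term vectors, with the first term dimension as the hypothetical axis.
vectors = [(1.0, 0.0), (1.0, 0.1), (0.0, 1.0), (0.1, 1.0)]
clusters = cluster_by_angle(vectors, axis=(1.0, 0.0), variance=0.2)
print([c["members"] for c in clusters])  # [[0, 1], [2, 3]]
```

Keeping σ fixed at the seeding document's angle prevents clusters from drifting as members are added. The trade-off is that results depend on document order. Also note that a single angle from one axis collapses all directions at the same θ, so documents that differ substantially can still share a cluster.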
Specification