Streaming text data mining method & apparatus using multidimensional subspaces
Abstract
A streaming text data comparator performs real-time text data mining on streaming text data. The comparator receives a streaming text data document and generates a vector representation of the document's term frequencies with respect to an existing document collection. The comparator then transforms the term frequency vector into a projection in a precomputed multidimensional subspace that represents the original document collection. The comparator further calculates a relationship value representing the similarities or differences between the vector representation and the subspace, and compares the relationship value to a predetermined threshold to determine whether the streaming text data document is related to the original document collection. If the document is related, the comparator intercalates it into the document collection; if it is not related, the comparator may store or delete it.
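The pipeline in the abstract can be sketched in Python as follows. This is a minimal sketch, not the patented implementation, and every name in it is hypothetical. The abstract does not say how the subspace is precomputed, so the sketch assumes it is simply the span of the collection's term-frequency vectors, orthonormalized with modified Gram-Schmidt (a truncated SVD, as used in latent semantic indexing, would be a common alternative). It also assumes the relationship value is the cosine between the document vector and its projection, and uses an arbitrary threshold of 0.5.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased alphanumeric tokens; a deliberately simple, hypothetical tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(docs):
    # Sorted list of every term that occurs in the existing collection.
    return sorted({t for d in docs for t in tokenize(d)})


def tf_vector(text, vocab):
    # Term-frequency vector over the collection's vocabulary;
    # terms the collection has never seen are ignored.
    counts = Counter(tokenize(text))
    return [float(counts[t]) for t in vocab]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors, tol=1e-10):
    # Modified Gram-Schmidt: orthonormal basis for the span of `vectors`.
    # ASSUMPTION: this span stands in for the precomputed subspace.
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors already (numerically) in the span
            basis.append([wi / norm for wi in w])
    return basis


def project(v, basis):
    # Orthogonal projection of v onto span(basis): sum over q of (v . q) q.
    p = [0.0] * len(v)
    for q in basis:
        c = dot(v, q)
        p = [pi + c * qi for pi, qi in zip(p, q)]
    return p


def relationship_value(v, p):
    # ASSUMPTION: cosine between v and its projection, which equals ||p|| / ||v||.
    # 1.0 means v lies in the subspace; 0.0 means it is orthogonal (or empty).
    nv = math.sqrt(dot(v, v))
    return 0.0 if nv == 0.0 else math.sqrt(dot(p, p)) / nv


def process_stream(collection, stream, threshold=0.5):
    # Vocabulary and subspace are computed once, before the stream arrives.
    collection = list(collection)
    vocab = build_vocabulary(collection)
    basis = orthonormal_basis([tf_vector(d, vocab) for d in collection])
    unrelated = []
    for doc in stream:
        v = tf_vector(doc, vocab)
        if relationship_value(v, project(v, basis)) >= threshold:
            collection.append(doc)  # related: intercalate into the collection
        else:
            unrelated.append(doc)   # unrelated: keep aside (or simply drop)
    return collection, unrelated
```

Because the subspace is precomputed, documents intercalated during the stream do not change it in this sketch; periodically rebuilding the vocabulary and basis from the grown collection would be one way to let it track new material.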
39 Claims
1. A method of comparing a unit of streaming text data to an existing document collection, comprising:
computing a vector representation of the unit of streaming text data;
transforming the vector representation into a projection in a predetermined vector subspace; and
calculating a relationship value indicative of a relationship between the vector representation and the subspace.
(Dependent claims 2 through 13.)
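The projection and relationship value of claim 1 can be made concrete as follows, under assumptions the claim itself does not fix: v in R^m is the term-frequency vector of the unit over the collection's m terms, and the columns of U_k (an m-by-k matrix) are an orthonormal basis of the predetermined subspace, so U_k^T U_k = I_k. The cosine and residual below are two illustrative choices of relationship value, not the claimed measure.

```latex
\begin{aligned}
\mathbf{c} &= U_k^{\top}\mathbf{v} \in \mathbb{R}^{k}, \qquad
\mathbf{p} = U_k\,\mathbf{c} = U_k U_k^{\top}\mathbf{v},\\
\cos\theta &= \frac{\mathbf{v}^{\top}\mathbf{p}}{\lVert\mathbf{v}\rVert\,\lVert\mathbf{p}\rVert}
 = \frac{\lVert\mathbf{p}\rVert}{\lVert\mathbf{v}\rVert}
 = \frac{\lVert\mathbf{c}\rVert}{\lVert\mathbf{v}\rVert} \quad (\mathbf{p}\neq\mathbf{0}),\\
\lVert\mathbf{v}-\mathbf{p}\rVert^{2} &= \lVert\mathbf{v}\rVert^{2} - \lVert\mathbf{p}\rVert^{2}.
\end{aligned}
```

Here c is the k-dimensional coordinate form of the projection and p is the same projection expressed in term space. The middle equalities use U_k^T U_k = I_k, which gives v^T p = ||c||^2 = ||p||^2. The cosine measures similarity (1 when the unit lies entirely in the subspace), while the residual norm measures the difference, matching the "similarities or differences" wording of the abstract.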
14. A computer program product for comparing a unit of streaming text data to an existing document collection, including a computer-readable medium encoded with instructions configured to be executed by a processor in order to perform predetermined operations comprising:
computing a vector representation of the unit of streaming text data;
transforming the vector representation into a projection in a predetermined vector subspace; and
calculating a relationship value indicative of the relationship between the vector representation and the subspace.
(Dependent claims 15 through 26.)
27. A streaming text data comparator, comprising:
a streaming text data vectorizer configured to compute a vector representation of a unit of streaming text data;
a vector projector configured to transform the vector representation into a projection in a predetermined vector subspace; and
a relationship calculator configured to calculate a relationship value between the vector representation and the subspace.
(Dependent claims 28 through 39.)
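The three elements of claim 27 map naturally onto three small components composed into one comparator. The Python sketch below is hypothetical throughout: the class names, the regex tokenizer, and the choice of relative residual ||v - p|| / ||v|| as the relationship value (a difference measure, where 0.0 means the text lies in the subspace) are assumptions. The orthonormal basis is supplied by the caller rather than precomputed from a collection.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]


@dataclass
class StreamingTextVectorizer:
    # Hypothetical vectorizer: term frequencies over a fixed vocabulary.
    vocabulary: Sequence[str]

    def vectorize(self, text: str) -> Vector:
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        return [float(counts[t]) for t in self.vocabulary]


@dataclass
class VectorProjector:
    # Hypothetical projector; `basis` must hold orthonormal vectors
    # spanning the predetermined subspace.
    basis: Sequence[Vector]

    def project(self, v: Vector) -> Vector:
        p = [0.0] * len(v)
        for q in self.basis:
            c = sum(vi * qi for vi, qi in zip(v, q))
            p = [pi + c * qi for pi, qi in zip(p, q)]
        return p


class RelationshipCalculator:
    # ASSUMPTION: relative residual ||v - p|| / ||v||.
    # 0.0 means v lies in the subspace; 1.0 means v is orthogonal to it
    # (an all-zero v is also reported as 1.0, i.e. maximally unrelated).
    def calculate(self, v: Vector, p: Vector) -> float:
        nv = math.sqrt(sum(x * x for x in v))
        if nv == 0.0:
            return 1.0
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(v, p))) / nv


@dataclass
class StreamingTextComparator:
    vectorizer: StreamingTextVectorizer
    projector: VectorProjector
    calculator: RelationshipCalculator

    def compare(self, text: str) -> float:
        v = self.vectorizer.vectorize(text)
        return self.calculator.calculate(v, self.projector.project(v))
```

For example, with vocabulary `["cat", "dog", "stock", "market"]` and a basis spanning only the first two terms, `compare("cat dog")` yields 0.0 and `compare("stock market")` yields 1.0. Injecting the basis as fixed state mirrors the claim's "predetermined" subspace: the comparator never recomputes it per unit of streaming text.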
Specification