Computer-implemented system and method for text-based document processing
Abstract
A computer-implemented system and method for processing text-based documents. A frequency of terms data set is generated for the terms appearing in the documents. Singular value decomposition is performed upon the frequency of terms data set in order to form projections of the terms and documents into a reduced dimensional subspace. The projections are normalized, and the normalized projections are used to analyze the documents.
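The steps summarized in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization, raw term counts as the "frequency of terms data", NumPy's dense SVD, and a Euclidean length for the normalization; none of these choices is stated in the abstract, and the function names `term_document_matrix` and `normalized_projections` are hypothetical.

```python
import numpy as np
from collections import Counter

def term_document_matrix(docs):
    """Return (vocab, matrix) where matrix[i, j] counts term i in document j."""
    # Assumption: lowercase whitespace tokens stand in for the "terms".
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = np.array([[c[t] for c in counts] for t in vocab], dtype=float)
    return vocab, matrix

def normalized_projections(matrix, k, length=1.0):
    """Project terms and documents into a k-dimensional SVD subspace,
    then rescale every projection to the pre-selected `length`."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    term_proj = u[:, :k] * s[:k]       # one row per term
    doc_proj = vt[:k, :].T * s[:k]     # one row per document

    def rescale(rows):
        norms = np.linalg.norm(rows, axis=1, keepdims=True)
        norms[norms == 0] = 1.0        # exactly-zero rows are left as zeros
        return rows / norms * length

    return rescale(term_proj), rescale(doc_proj)

docs = [
    "stock market prices fell",
    "market prices rose",
    "the cat chased the mouse",
]
vocab, tdm = term_document_matrix(docs)
term_vecs, doc_vecs = normalized_projections(tdm, k=2)
```

After normalization every row of `doc_vecs` has the same length, so dot products between rows behave like cosine similarities and the rows can serve as fixed-width numeric features for each document.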
93 Citations
60 Claims
1. A computer-implemented method for processing text-based documents, comprising the steps of:
generating frequency of terms data for terms appearing in the documents;
performing singular value decomposition upon the frequency of terms data in order to form projections of the terms and documents into a reduced dimensional subspace, normalizing the projections to a pre-selected length; and
using the normalized projections to provide structured data about the documents.
Dependent claims: 2 to 48.
49. A computer-implemented method for processing unstructured text-based documents, comprising the steps of:
using a dimensionality reduction procedure in order to form projections of unstructured documents' terms into a reduced dimensional subspace;
using the reduced dimensional subspace to generate structured data about the unstructured documents;
combining the structured document data with additional structured data; and
analyzing the combined structured data.
Dependent claims: 50 to 58.
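The combining and analyzing steps of claim 49 can be pictured with a short sketch. It assumes the reduced-dimension document vectors already exist, that the additional structured data is numeric with one row per document, and that the analysis is an ordinary least-squares fit; the claim names no particular analysis, and `combine_features`, `fit_linear_model`, and `predict` are hypothetical helpers.

```python
import numpy as np

def combine_features(doc_vecs, extra):
    """Place document projections and other structured columns side by side."""
    doc_vecs = np.asarray(doc_vecs, dtype=float)
    extra = np.asarray(extra, dtype=float)
    if extra.ndim == 1:
        extra = extra[:, None]         # treat a single column as n x 1
    if doc_vecs.shape[0] != extra.shape[0]:
        raise ValueError("both inputs need one row per document")
    return np.hstack([doc_vecs, extra])

def fit_linear_model(features, target):
    """Least-squares fit with an intercept (an assumed choice of analysis)."""
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, np.asarray(target, dtype=float), rcond=None)
    return coef

def predict(features, coef):
    """Apply the fitted coefficients; the last coefficient is the intercept."""
    return features @ coef[:-1] + coef[-1]

# Hypothetical inputs: 2-D document projections plus one numeric column.
doc_vecs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0], [-0.8, 0.6], [0.8, -0.6]]
extra = [3.0, 1.0, 4.0, 1.0, 5.0]
combined = combine_features(doc_vecs, extra)
```

Any model that accepts a numeric feature table could replace the least-squares fit; the point of the sketch is only that text-derived columns and conventional columns end up in one table.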
59. A computer-implemented apparatus for processing text-based documents, comprising:
means for generating frequency of terms data for terms appearing in the documents;
means for performing singular value decomposition upon the frequency of terms data in order to form projections of the terms and documents into a reduced dimensional subspace, means for normalizing the projections to a pre-selected length; and
means for using the normalized projections to provide structured data about the documents.
60. A memory for storing data for access by a computer program being executed on a data processing system, comprising a data structure stored in said memory, said data structure including:
frequency of terms data for terms appearing in unstructured text-based documents; and
normalized reduced projections of the frequency of terms data, wherein the normalized reduced projections are used by the computer program to generate structured data about the unstructured text-based documents.
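One way to picture the data structure of claim 60 is a record that holds the term-frequency data next to the normalized projections. The sketch below is an assumption-laden illustration: the class name `DocumentIndex`, its field names, and the `most_similar` lookup are hypothetical, and the claim does not prescribe any particular layout or use.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class DocumentIndex:
    """Hypothetical container pairing term frequencies with projections."""
    vocab: tuple                   # term strings, one per matrix row
    term_frequencies: np.ndarray   # shape (terms, documents)
    doc_projections: np.ndarray    # shape (documents, k), rows of equal length

    def most_similar(self, doc_id):
        """Return the index of the document closest to `doc_id`.

        Assumes rows share one length, so a dot product ranks documents
        the same way cosine similarity would.
        """
        sims = self.doc_projections @ self.doc_projections[doc_id]
        sims[doc_id] = -np.inf     # exclude the query document itself
        return int(np.argmax(sims))

index = DocumentIndex(
    vocab=("market", "mouse"),
    term_frequencies=np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 3.0]]),
    doc_projections=np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]),
)
```

Here the similarity lookup stands in for "generating structured data"; a program could equally export `doc_projections` as columns of a table.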
Specification