Method and system for analyzing unstructured text in data warehouse
Abstract
A user initially analyzes a statistically significant sample of documents randomly drawn from a data warehouse to create a cached feature space and text classifier, which can then be used to establish a classification dimension in the data warehouse for in depth and detailed text analysis of the entire data set.
30 Claims
1. A computer-implemented method for analyzing information in a data warehouse, comprising:
selecting a sample of documents from the data warehouse;
generating at least one feature space of terms of interest in unstructured text fields of the documents using the sample;
generating at least one default classification using the feature space;
modifying the default classification to render a modified classification;
establishing at least one classifier using the modified classification; and
establishing a classification dimension in the data warehouse using the classifier.
Dependent claims: 2, 3, 4, 5, 6, 7
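The steps of claim 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the claim does not say how the feature space, the default classification, or the classifier are built, so the document-frequency cutoff, the "most frequent term" labeling, the nearest-centroid classifier, the function names, and the toy `warehouse` records below are all hypothetical choices.

```python
import math
import random
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase word tokens from an unstructured text field."""
    return re.findall(r"[a-z]+", text.lower())

def build_feature_space(sample, min_docs=2):
    """Assumed rule: keep terms found in at least min_docs sampled documents."""
    doc_freq = Counter()
    for doc in sample:
        doc_freq.update(set(tokenize(doc["text"])))
    return sorted(term for term, n in doc_freq.items() if n >= min_docs)

def vectorize(text, features):
    """Term-count vector of a text over the feature space."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in features]

def default_classification(sample, features):
    """Assumed heuristic: label each document by its most frequent feature term."""
    labels = {}
    for doc in sample:
        vec = vectorize(doc["text"], features)
        top = max(vec, default=0)
        labels[doc["id"]] = features[vec.index(top)] if top > 0 else "misc"
    return labels

def modify_classification(labels, merges):
    """Apply user edits, here a simple rename/merge map from old to new class."""
    return {doc_id: merges.get(label, label) for doc_id, label in labels.items()}

def train_classifier(sample, labels, features):
    """Nearest-centroid classifier: one mean vector per class."""
    sums = defaultdict(lambda: [0.0] * len(features))
    sizes = Counter()
    for doc in sample:
        label = labels[doc["id"]]
        sizes[label] += 1
        for i, value in enumerate(vectorize(doc["text"], features)):
            sums[label][i] += value
    return {label: [v / sizes[label] for v in vec] for label, vec in sums.items()}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def classify(text, centroids, features):
    vec = vectorize(text, features)
    return max(sorted(centroids), key=lambda label: cosine(vec, centroids[label]))

def build_dimension(docs, centroids, features):
    """Classification dimension: a class for every document, sampled or not."""
    return {doc["id"]: classify(doc["text"], centroids, features) for doc in docs}

# Hypothetical toy warehouse; real documents would come from a database.
warehouse = [
    {"id": 1, "text": "Printer jams on every page"},
    {"id": 2, "text": "Printer driver crashes"},
    {"id": 3, "text": "Password reset email never arrives"},
    {"id": 4, "text": "Password expired, cannot log in"},
    {"id": 5, "text": "Printer prints blank page"},
    {"id": 6, "text": "Forgot password after vacation"},
]

sample = random.Random(0).sample(warehouse, 4)  # random draw from the warehouse
features = build_feature_space(sample)
labels = modify_classification(
    default_classification(sample, features),
    {"printer": "hardware", "password": "account access"},
)
centroids = train_classifier(sample, labels, features)
print(build_dimension(warehouse, centroids, features))
```

Only the sample is used to build the feature space and the classifier; the full warehouse is touched once, in `build_dimension`, which is the cost profile the abstract describes.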

8. A service for analyzing information in a data warehouse of a customer, comprising:
receiving a sample of documents in the warehouse;
based on the sample, generating at least one initial classification;
using the initial classification to generate a classifier;
using the classifier to add documents not in the sample to a classification dimension; and
returning at least one of: the classification dimension, and an analysis rendered by using the classification dimension, to the customer.
Dependent claims: 9, 10, 11, 12, 13, 14, 15

16. A computer executing logic for analyzing unstructured text in documents in a data warehouse, the logic comprising:
establishing, based on only a sample of documents in the warehouse, a classification dimension listing all documents in the warehouse, the classification dimension being based on words in unstructured text fields in the documents.
Dependent claims: 17, 18, 19, 20, 21, 22, 23

24. A computer program product having means executable by a digital processing apparatus to analyze data in a data warehouse, comprising:
means for selecting a sample of documents from the data warehouse;
means for generating at least one feature space of terms of interest in unstructured text fields of the documents using the sample;
means for generating at least one classification using the feature space;
means for establishing at least one classifier using the classification;
means for identifying a subset of documents in the warehouse;
means for selecting features from the feature space that are relevant to the subset; and
means for comparing the subset with the sample using the features from the feature space that are relevant to the subset.
Dependent claims: 25, 26, 27, 28, 29, 30
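The last three elements of claim 24 (identify a subset, pick the features relevant to it, compare it with the sample) might look like the sketch below. The claim does not define "relevant", so the rule used here (a term is relevant if at least half of the subset's documents mention it) is an assumption, as are the `region` field, the function names, and the toy records.

```python
import re

def tokens(text):
    """Set of lowercase word tokens in a text field."""
    return set(re.findall(r"[a-z]+", text.lower()))

def term_rate(docs, term):
    """Fraction of docs whose text mentions the term."""
    if not docs:
        return 0.0
    return sum(term in tokens(d["text"]) for d in docs) / len(docs)

def relevant_features(subset, features, min_rate=0.5):
    """Assumed relevance rule: the term appears in at least min_rate of the subset."""
    return [t for t in features if term_rate(subset, t) >= min_rate]

def compare(subset, sample, features):
    """Map each relevant feature to its (subset rate, sample rate) pair."""
    return {
        t: (term_rate(subset, t), term_rate(sample, t))
        for t in relevant_features(subset, features)
    }

# Hypothetical data: a four-document sample plus two further warehouse documents.
sample_docs = [
    {"id": 1, "region": "east", "text": "Printer jams on every page"},
    {"id": 2, "region": "west", "text": "Password reset email never arrives"},
    {"id": 3, "region": "east", "text": "Printer driver crashes"},
    {"id": 4, "region": "west", "text": "Password expired"},
]
all_docs = sample_docs + [
    {"id": 5, "region": "east", "text": "Printer prints blank page"},
    {"id": 6, "region": "east", "text": "Forgot password"},
]
feature_space = ["page", "password", "printer"]  # assumed to be built from the sample

# The subset is picked with a structured field, then compared on text features.
east = [d for d in all_docs if d["region"] == "east"]
print(compare(east, sample_docs, feature_space))
```

A feature whose subset rate is well above its sample rate suggests what sets the subset apart from the warehouse as a whole; in this toy data, "printer" is mentioned in 75% of the east documents versus 50% of the sample.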
Specification