System and method for use in text analysis of documents and records
1 Assignment
0 Petitions
Abstract
Methods and systems are provided that enable text in various sections of data records to be separately catalogued, indexed, or vectorized for analysis in a text visualization and mining system. A text processing system receives a plurality of data records, where each data record has one or more attribute fields. The attribute fields containing textual information are identified. The specific textual content of each such attribute field is identified. An index is generated that associates the textual content contained in each attribute field with the attribute field containing that content. The index is operable for use in text processing. The plurality of data records may be located in a data table, and the textual information may be contained within cells of the data table. In another aspect, a plurality of data records is received, where at least some of the data records contain text terms. In response to selection of a first method, the first method is applied to weight text terms of the data records in a first manner to aid in distinguishing records from each other. In response to selection of a second method, the second method is applied to weight text terms of the data records in a second manner to aid in distinguishing records from each other. A vector is generated to distinguish each of the data records based on the text terms weighted by either the first or second method.
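As a rough illustration of the first aspect described in the abstract, the sketch below builds an index keyed by (attribute field, term) and then derives a count vector for each record. The record layout, the field names, the `tokenize` helper, and the use of raw term counts are all assumptions made for illustration; the abstract does not specify a tokenizer, an index layout, or a vector scheme.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase word tokenizer (an assumption; not specified by the source)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_field_index(records):
    """Map (field, term) -> {record_id: count} for every text-bearing field."""
    index = defaultdict(dict)
    for rec_id, record in enumerate(records):
        for field, value in record.items():
            if not isinstance(value, str):
                continue  # skip attribute fields that hold no text
            for term in tokenize(value):
                postings = index[(field, term)]
                postings[rec_id] = postings.get(rec_id, 0) + 1
    return index

def record_vectors(index, n_records, fields=None):
    """Build one count vector per record over the (field, term) keys.

    `fields` optionally restricts the vector to selected attribute fields.
    """
    keys = sorted(k for k in index if fields is None or k[0] in fields)
    vectors = [[0] * len(keys) for _ in range(n_records)]
    for col, key in enumerate(keys):
        for rec_id, count in index[key].items():
            vectors[rec_id][col] = count
    return keys, vectors

# Hypothetical records: two text fields and one non-text field.
records = [
    {"title": "Engine failure", "notes": "Engine stalled on takeoff", "year": 1999},
    {"title": "Brake wear", "notes": "Brake pads worn; engine fine", "year": 2001},
]
index = build_field_index(records)
keys, vectors = record_vectors(index, len(records), fields={"title"})
```

Because the index key includes the field name, the term "engine" in a title is kept separate from "engine" in the notes, so a vector can be built from one field, several fields, or all of them.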
119 Citations
17 Claims
1. A method of processing text for analysis in a text processing system, comprising:
receiving a plurality of data records, each data record having one or more attribute fields, each field being associated with a different section of the record and wherein at least one of the attribute fields contains textual information;
identifying the specific textual content of each field containing textual information;
generating an index that associates the specific textual content with the attribute field containing the specific textual content, wherein said index is operable for use in text processing; and
generating a vector for each data record that differentiates the textual information of that data record based on the specific textual content contained in at least one of that record's attribute field(s).
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A method of processing text for a data analysis and mining system, comprising the steps of:
receiving a plurality of data records, wherein at least some of the data records contain text terms;
applying a first method to weight text terms of the data records in a first manner to aid in distinguishing records from each other in response to selection of said first method;
applying a second method to weight text terms of the data records in a second manner to aid in distinguishing records from each other in response to selection of said second method; and
generating a vector to distinguish each of said data records based on the text terms weighted by either the first or second method.
Dependent claims: 12, 13, 14, 15, 16, 17
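The selectable weighting in claim 11 can be sketched as follows. The claim does not name the two weighting methods, so raw term frequency and TF-IDF are stand-ins chosen purely for illustration, and the names `tf_weights`, `tfidf_weights`, `WEIGHTING_METHODS`, and `vectorize` are hypothetical.

```python
import math
from collections import Counter

def tf_weights(docs):
    """First (assumed) method: raw term frequency per record."""
    return [dict(Counter(doc)) for doc in docs]

def tfidf_weights(docs):
    """Second (assumed) method: term frequency scaled by log(N / df)."""
    n = len(docs)
    # Document frequency: number of records in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]

# The selected method name decides which weighting is applied.
WEIGHTING_METHODS = {"tf": tf_weights, "tfidf": tfidf_weights}

def vectorize(docs, method="tf"):
    """Weight terms with the selected method, then emit one vector per record."""
    weighted = WEIGHTING_METHODS[method](docs)
    vocab = sorted({t for doc in docs for t in doc})
    return vocab, [[w.get(t, 0.0) for t in vocab] for w in weighted]

# Hypothetical, already-tokenized records.
docs = [["engine", "failure", "engine"], ["brake", "failure"]]
vocab, tf_vecs = vectorize(docs, method="tf")
_, tfidf_vecs = vectorize(docs, method="tfidf")
```

In this toy input, "failure" occurs in both records, so the TF-IDF stand-in gives it zero weight and the records are separated only by "engine" and "brake", whereas the raw-frequency stand-in keeps "failure" in both vectors.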
Specification