System and method for use in text analysis of documents and records

US 6,665,661 B1
Filed: 09/29/2000
Issued: 12/16/2003
Est. Priority Date: 09/29/2000
Status: Active Grant

Abstract

Methods and systems are provided that enable text in various sections of data records to be separately catalogued, indexed, or vectorized for analysis in a text visualization and mining system. A text processing system receives a plurality of data records, where each data record has one or a plurality of attribute fields associated with the records. The attribute fields containing textual information are identified. The specific textual content of each attribute field is identified. An index is generated that associates the textual content contained in each attribute field with the attribute field containing the textual content. The index is operable for use in text processing. The plurality of data records may be located in a data table and the textual information may be contained within cells of the data table. In another aspect, a plurality of data records is received, where at least some of the data records contain text terms. A first method is applied to weight text terms of the data records in a first manner to aid in distinguishing records from each other in response to selection of the first method. A second method is applied to weight text terms of the data records in a second manner to aid in distinguishing records from each other in response to selection of the second method. A vector is generated to distinguish each of the data records based on the text terms weighted by either the first or second method.
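The field-aware index described in the abstract can be pictured with a small sketch. This is an illustration only, not the patented implementation: the function name `build_field_index`, the whitespace tokenization, the lowercasing, and the sample records are all assumptions made for the example.

```python
from collections import defaultdict

def build_field_index(records):
    """Map each (field, term) pair to the ids of the records containing it.

    Hypothetical helper: records are dicts, one key per attribute field.
    """
    index = defaultdict(set)
    for record_id, record in enumerate(records):
        for field, value in record.items():
            if not isinstance(value, str):
                continue  # only fields that hold text are indexed
            for term in value.lower().split():
                index[(field, term)].add(record_id)
    return index

# Invented sample data: two records (rows) with three attribute fields (columns).
records = [
    {"title": "Gene expression analysis", "notes": "expression data", "year": 2000},
    {"title": "Text mining", "notes": "gene names in text", "year": 1999},
]
index = build_field_index(records)
```

Because the field name is part of each key, the same term can be looked up per field: here "gene" is found in the title of the first record and in the notes of the second.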

119 Citations

17 Claims

1. A method of processing text for analysis in a text processing system, comprising:
- receiving a plurality of data records, each data record having one or more attribute fields, each field being associated with a different section of the record and wherein at least one of the attribute fields contains textual information;
- identifying the specific textual content of each field containing textual information;
- generating an index that associates the specific textual content with the attribute field containing the specific textual content, wherein said index is operable for use in text processing; and
- generating a vector for each data record that differentiates the textual information of that data record based on the specific textual content contained in at least one of that record's attribute field(s).

Dependent claims:
- 2. The method of claim 1 wherein said plurality of data records are located in a data table.
- 3. The method of claim 2 wherein said textual information is contained with cells of said data table.
- 4. The method of claim 3 wherein said data records represent rows in said data table and columns of said table correspond to the attribute fields.
- 5. The method of claim 1 wherein said textual information includes a plurality of terms and wherein said step of generating an index comprises associating each term with the attribute field containing the term.
- 6. The method of claim 1 wherein only a selected number of the attribute fields containing textual information are used to generate said vector.
- 7. The method of claim 1 further comprising receiving a user selectable command for generating said index with textual information indexed either based on the case of the textual information or not based on the case of the textual information.
- 8. The method of claim 1 wherein said textual information is indexed in a manner that enables the textual information contained within different attribute fields to be compared.
- 9. The method of claim 1 wherein said data records and associated attribute fields are identified by record and attribute delimiters designated for a particular file format.
- 10. The method of claim 1 further comprising the step of enabling a user to specify the removal of numeric strings or combination of alphabetic and numeric strings from being considered as part of said textual information.
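The options named in claims 6, 7, 9, and 10 (selected fields, case handling, record and attribute delimiters, removal of numeric strings) might look like the following in practice. This is a rough sketch under stated assumptions: the function names, the tab and newline delimiters, the header row, the raw term counts used as vector components, and the sample data are all hypothetical and are not taken from the patent.

```python
import re
from collections import Counter

def parse_records(text, record_delim="\n", field_delim="\t"):
    """Split raw text into records (rows) and attribute fields (columns).

    Assumes the first row names the fields; the delimiters are configurable.
    """
    rows = [r for r in text.split(record_delim) if r]
    header = rows[0].split(field_delim)
    return [dict(zip(header, r.split(field_delim))) for r in rows[1:]]

def tokenize(value, case_sensitive=False, drop_numeric=True):
    """Extract terms, optionally folding case and dropping numeric strings."""
    terms = re.findall(r"[A-Za-z0-9]+", value)
    if drop_numeric:
        # Drops purely numeric strings and mixed alphanumeric strings alike.
        terms = [t for t in terms if not any(ch.isdigit() for ch in t)]
    if not case_sensitive:
        terms = [t.lower() for t in terms]
    return terms

def record_vectors(records, fields, **opts):
    """Return (vocabulary, vectors) of raw term counts over the chosen fields."""
    counts = [
        Counter(t for f in fields for t in tokenize(rec.get(f, ""), **opts))
        for rec in records
    ]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for t in vocab] for c in counts]

# Invented sample: a tab-delimited table with a header row and two records.
raw = "id\ttitle\tbody\n1\tProtein Folding\tfolding of protein p53\n2\tText Mining\tmining text records 2000\n"
records = parse_records(raw)
vocab, vectors = record_vectors(records, fields=["title", "body"])
```

With the defaults, "p53" and "2000" are excluded and case is ignored, so the two records are distinguished only by their alphabetic terms. Passing `fields=["title"]` restricts the vector to a single column, and `case_sensitive=True` keeps "Protein" and "protein" apart.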

11. A method of processing text for a data analysis and mining system, comprising the steps of:
- receiving a plurality of data records, wherein at least some of the data records contain text terms;
- applying a first method to weight text terms of the data records in a first manner to aid in distinguishing records from each other in response to selection of said first method;
- applying a second method to weight text terms of the data records in a second manner to aid in distinguishing records from each other in response to selection of said second method; and
- generating a vector to distinguish each of said data records based on the text terms weighted by either the first or second method.

Dependent claims:
- 12. The method of claim 11 further comprising the step of weighting only text terms corresponding to selected criteria.
- 13. The method of claim 12 wherein said data record is a table.
- 14. The method of claim 12 wherein said selected criteria is based on columns selected from said data table.
- 15. The method of claim 11 wherein said steps of applying said first and second method comprise applying topicality methods.
- 16. The method of claim 15 wherein said first method includes receiving user specified topicality values.
- 17. The method of claim 16 wherein said second method includes receiving a number of topics and cross terms for deriving topicality values.
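The idea of two selectable weighting methods feeding one vector step can be sketched as below. The claims do not say how topicality values are derived, so the second method here swaps in a plain inverse-document-frequency ranking purely as a stand-in. The names `user_weights`, `derived_weights`, `n_topics`, and `n_cross`, and the sample records, are hypothetical.

```python
import math
from collections import Counter

def user_weights(docs, topicality):
    """First method: the caller supplies a topicality value per term."""
    return {term: float(topicality.get(term, 0.0)) for doc in docs for term in doc}

def derived_weights(docs, n_topics, n_cross):
    """Second method (stand-in): keep the n_topics + n_cross terms with the
    highest inverse document frequency. This is NOT the patent's formula."""
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(len(docs) / n) for t, n in df.items()}
    ranked = sorted(idf, key=lambda t: (-idf[t], t))  # ties broken alphabetically
    return {t: idf[t] for t in ranked[: n_topics + n_cross]}

METHODS = {"user": user_weights, "derived": derived_weights}

def weighted_vectors(docs, method, **params):
    """Weight term counts with the selected method and build one vector per record."""
    weights = METHODS[method](docs, **params)
    vocab = sorted(t for t, w in weights.items() if w > 0)
    vectors = [[Counter(doc)[t] * weights[t] for t in vocab] for doc in docs]
    return vocab, vectors

# Invented sample: each record is already reduced to a list of text terms.
docs = [
    ["gene", "expression", "data"],
    ["gene", "text", "mining"],
    ["text", "data", "records"],
]
user_vocab, user_vecs = weighted_vectors(docs, "user", topicality={"gene": 2.0, "text": 1.0})
auto_vocab, auto_vecs = weighted_vectors(docs, "derived", n_topics=2, n_cross=1)
```

Both paths return vectors in the same shape, so downstream analysis does not need to know which weighting method was selected.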

Current Assignee: Battelle Memorial Institute
Original Assignee: Battelle Memorial Institute
Inventors: Nakamura, Grant C.; Crow, Vernon L.; Saffer, Jeffrey D.; Miller, Nancy E.; Scarberry, Randall E.; Calaprist, Augustin J.
Primary Examiner(s): Metjahic, Safet
Assistant Examiner(s): Alaubaidi, Haythim J.

Application Number: US09/672,599
Time in Patent Office: 1,173 Days
Field of Search: 707/2-3, 707/1
US Class Current: 1/1
CPC Class Codes:
G06F 16/313   Selection or weighting of t...
Y10S 707/99932   Access augmentation or opti...
Y10S 707/99933   Query processing, i.e. sear...
