Method of translating free-format data records into a normalized format based on weighted attribute variants

US 5,515,534 A
Filed: 09/29/1992
Issued: 05/07/1996
Est. Priority Date: 09/29/1992
Status: Expired due to Term

Abstract

A facility is provided for normalizing the format of stored data records using a dictionary that is generated from a training set of data records having predefined formats.

7 Claims

1. A system for generating a normalized version of a free-formatted data record, said data record being characterized by a plurality of data words, said system comprising a dictionary comprising sequences of data words associated with respective attribute fields corresponding to respective attribute fields defining a normalized data record and associated with respective weight values, means for partitioning said free-formatted record into a plurality of n-word tuples, where n≧1, means for associating each of said n-word tuples, that compares with a respective one of said sequences, with the associated one of said attribute fields and a respective score value, said score value being derived as a function of the associated one of said weight values, and for identifying for each of said attribute fields the associated n-word tuples having the largest score values, and means for forming different combinations from individual ones of the identified n-word tuples and output as said normalized data record that one of said combinations formed from those of said identified n-word tuples having score values which produce the largest sum.

2. The system set forth in claim 1 further comprising means for forming said dictionary, said means for forming said dictionary including a plurality of test records each having a predefined format corresponding with respective ones of said attribute fields, means for partitioning each of said test records into respective ones of said sequences of words based on said predefined format, and means for storing each of said sequences of words in said dictionary if it has not yet been stored therein or for incrementing the associated one of said weight values if it has been so stored, and for associating each of said stored sequences of words with a respective one of said attribute fields based on said predefined format.

3. The system set forth in claim 1 wherein said n-word tuples include specific and free-matching n-word tuples.
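
The dictionary formation recited in claim 2 can be illustrated with a minimal Python sketch. It is not taken from the patent specification: the function name `build_dictionary`, the representation of a test record as a field-to-text mapping, the lowercasing of words, and the choice of 1 as the weight of a newly stored sequence are all assumptions made for illustration.

```python
from collections import defaultdict

def build_dictionary(test_records):
    """Build {(word sequence, attribute field): weight} from test records.

    Each test record is assumed to be a dict mapping an attribute field
    to its text, which stands in for the "predefined format" of claim 2.
    """
    weights = defaultdict(int)
    for record in test_records:
        for field, value in record.items():
            sequence = tuple(value.lower().split())
            if sequence:
                # First sighting stores the sequence with weight 1 (assumed);
                # every later sighting increments the stored weight.
                weights[(sequence, field)] += 1
    return dict(weights)

# Hypothetical training data, invented for this example.
training = [
    {"name": "John Smith", "city": "New York"},
    {"name": "Mary Jones", "city": "New York"},
    {"name": "John Smith", "city": "Newark"},
]
dictionary = build_dictionary(training)
```

Keying the dictionary on the pair (sequence, field) lets the same word sequence carry different weights under different attribute fields, which is one plausible reading of "weighted attribute variants".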

4. A system for converting a free-formatted data record into a normalized data record, said free-formatted data record being characterized by a plurality of data components, said system comprising a plurality of test records each having a predefined format and each being formed from a plurality of data attributes corresponding to respective attribute fields defining a normalized data record, means for partitioning the data components forming each of said test records into respective specific tokens and free matching tokens, for associating each of said specific and said free matching tokens with respective ones of said data attributes based on said predefined format and for storing each of said specific and free matching tokens and its associated one of said attributes in a dictionary if it has not yet been stored therein or for incrementing an associated counter if it has been so stored, means, responsive to receipt of said free-formatted record from a source of information records, for partitioning the data components forming said record into a number of different sets of specific tokens and a number of different sets of free-matching tokens, for generating a score for each of the tokens of each of said sets that matches a corresponding token stored in said dictionary, in which said score is a function of the contents of the counter associated with the corresponding token, and means for forming the tokens associated with scores into different combinations and for outputting as the normalized version of said record that one of said combinations that is formed from those tokens having scores which produce the largest sum.

5. A method of converting data record having a free-formatted format into a data record having a normalized format, said free-formatted data record being characterized by a plurality of data words, said method comprising the steps of generating a dictionary comprising sequences of data words associated with respective attribute fields and associated with respective weight values, partitioning said free-formatted record into a plurality of ordered n-word tuples, where n≧1, associating each of said n-word tuples, that compares with a respective one of said sequences, with the associated one of said attribute fields and a respective score value, said score value being derived as a function of the associated one of said weight values, and identifying for each of said attribute fields the associated n-word tuples having the largest score values, and forming different combinations from individual ones of the identified n-word tuples and outputting as said normalized data record that one of said combinations formed from those of said identified n-word tuples having score values which produce the largest sum.

6. A method of translating a data record comprising a plurality of data words and having a free-formatted format into a data record having a normalized format, said method comprising the steps of partitioning said free-formatted data record into a number of different combinations of data words, forming different translated versions of said free-formatted data record from those of said combinations that compare with individual ones of a plurality of sequences of data words forming a dictionary of such sequences, in which each of said sequences is associated with (a) individual ones of a plurality of attribute fields defining the format of a normalized data record and (b) respective score values indicative of the likelihood of such association, and outputting as the normalized format for said free-formatted data record that one of said different translated versions that is formed from those of said combinations that are associated with score values that produce the largest sum of score values.

7. A method of converting a data record formed from a plurality of data words and having a free-formatted format into a data record having a normalized format, said method comprising the steps of storing in a memory a dictionary comprising sequences of data words associated with (a) respective attribute fields corresponding to the fields of a normalized data record and (b) respective weight values, partitioning said free-formatted data record into a plurality of n-word tuples, where n≧1, associating each of said n-word tuples that compares with at least a respective one said of sequences with the associated one of said attribute fields and a respective score value, said score value being derived as a function of the associated one of said weight values, and identifying for each of said attribute fields the associated n-word tuples having the largest score values, and forming different combinations from individual ones of the identified n-word tuples and outputting as said normalized data record that one of said combinations formed from those of said identified n-word tuples having score values which produce the largest sum.
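
The partition, score, and combine steps shared by claims 1, 5 and 7 can be sketched as follows. This is an illustrative reading of the claim language, not the patented implementation. The names `n_word_tuples` and `normalize`, the parameters `max_n` and `keep`, and the sample dictionary are hypothetical. The sketch also assumes that the score equals the stored weight (the claims only require "a function of" the weight) and that the tuples in one combination may not share a word (a constraint the claims do not state).

```python
from itertools import product

def n_word_tuples(words, max_n):
    """Yield (start, end, tuple) for each contiguous run of 1..max_n words."""
    for start in range(len(words)):
        for end in range(start + 1, min(start + max_n, len(words)) + 1):
            yield start, end, tuple(words[start:end])

def normalize(record, dictionary, max_n=3, keep=2):
    words = record.lower().split()
    # Associate each matching tuple with its attribute field and a score.
    candidates = {}  # field -> list of (score, start, end, tuple)
    for start, end, tup in n_word_tuples(words, max_n):
        for (sequence, field), weight in dictionary.items():
            if sequence == tup:
                candidates.setdefault(field, []).append((weight, start, end, tup))
    # Keep the `keep` best-scoring tuples per field; None leaves a field empty.
    fields = sorted(candidates)
    options = [sorted(candidates[f], key=lambda c: -c[0])[:keep] + [None]
               for f in fields]
    best_score, best = -1, {}
    for combo in product(*options):
        chosen = [c for c in combo if c is not None]
        used = [i for _, s, e, _ in chosen for i in range(s, e)]
        if len(used) != len(set(used)):
            continue  # assumed constraint: a word is used by one field only
        total = sum(c[0] for c in chosen)
        if total > best_score:
            best_score = total
            best = {f: " ".join(c[3])
                    for f, c in zip(fields, combo) if c is not None}
    return best, best_score

# Hypothetical dictionary: (word sequence, attribute field) -> weight.
sample_dictionary = {
    (("john", "smith"), "name"): 2,
    (("new", "york"), "city"): 2,
    (("york",), "name"): 1,
    (("newark",), "city"): 1,
}
result, score = normalize("New York John Smith", sample_dictionary)
```

In this example the tuple "york" also matches the name field with weight 1, but the combination of "new york" as city and "john smith" as name produces the largest sum of scores, so that combination is the one returned.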

Current Assignee: AT&T IPM Corp. (AT&T, Inc.)
Original Assignee: AT&T Corporation (AT&T, Inc.)
Inventors: Chuah, Mooi C.; Wong, Wing S.
Primary Examiner(s): AMSBURY, WAYNE P
Application Number: US07/953,403
Time in Patent Office: 1,316 Days
Field of Search: 395/600, 364/419
US Class Current: 707/695
CPC Class Codes:
G06F 16/258   Data format conversion from...
Y10S 707/968   Partitioning
Y10S 707/99942   Manipulating data structure...
