Classification of data records by comparison of records to a training database using probability weights
First Claim
1. A system for classifying natural language data, comprising:
means for storing a new record including a plurality of predictor data fields containing the natural language data expressed in natural language values, means for storing a plurality of training records, each training record including a plurality of predictor data fields, each predictor data field containing a feature, wherein each feature is a natural language term, and a target data field containing a target value representing a classification of a training record, and probability weight means for storing, for each feature, a probability weight value representing a probability that a new record will have the target value contained in the target data field if a feature contained in a corresponding predictor data field occurs in the new record, query means for extracting features from the new record and querying the training records with each feature extracted from the new record, the query means being responsive to a match between a feature extracted from the new record and a feature stored in said training record for providing the probability weight corresponding to the feature, and metric means for receiving the probability weights from the query means and accumulating for each training record a comparison score representing the probability that said training record matches the new record, and providing an output indicating said target field value of said training record as said target value of the new record.
11 Assignments
0 Petitions
Abstract
Classification of natural language data wherein the natural language data has an open-ended range of possible values or the data values do not have a relative order. A training database stores training records, wherein each training record includes predictor data fields, each containing a feature, wherein each feature is a natural language term, and a target data field containing a target value representing a classification of the record. Features may also include conjunctions of natural language terms and each feature may also be a member of a category subset of features. The training database stores, for each feature, a probability weight value representing the probability that a record will have the target value contained in the target data field if a feature contained in a corresponding predictor data field occurs in the record. Features are extracted from a new record and each feature from the new record is used to query the training records to determine the probability weights from the training records having matching features. The probability weights are accumulated for each training record to determine a comparison score representing the probability that the training record matches the new record and to provide an output indicating the training records most probably matching the new record.
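The classification scheme the abstract describes can be illustrated with a short serial sketch. This is a non-authoritative approximation only: the claims below cover data-parallel embodiments, and the toy records, field names, and the sum-of-weights comparison metric shown here are assumptions made for illustration.

```python
from collections import defaultdict

# Toy training records: predictor fields hold natural language terms
# (features); the target field holds the classification. All data invented.
training_records = [
    {"features": {"engine", "stalls", "cold"}, "target": "fuel_system"},
    {"features": {"engine", "noise", "idle"},  "target": "valve_train"},
    {"features": {"brakes", "squeal", "cold"}, "target": "brake_pads"},
]

# Probability weight for (feature, target): the probability that a record
# has the target value given that the feature occurs, estimated from counts.
feature_target_counts = defaultdict(int)
feature_counts = defaultdict(int)
for rec in training_records:
    for f in rec["features"]:
        feature_counts[f] += 1
        feature_target_counts[(f, rec["target"])] += 1

def weight(feature, target):
    return feature_target_counts[(feature, target)] / feature_counts[feature]

def classify(new_record_features):
    # Query each training record with each feature of the new record; on a
    # match, accumulate that feature's probability weight into the record's
    # comparison score, then report the best-scoring record's target value.
    scores = []
    for rec in training_records:
        score = sum(weight(f, rec["target"])
                    for f in new_record_features if f in rec["features"])
        scores.append((score, rec["target"]))
    return max(scores)  # (comparison score, predicted target value)

print(classify({"engine", "stalls", "cold"}))
```

With the toy data above, the first training record matches all three features of the new record and therefore accumulates the highest comparison score.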
269 Citations
37 Claims
1. A system for classifying natural language data, comprising:
means for storing a new record including a plurality of predictor data fields containing the natural language data expressed in natural language values, means for storing a plurality of training records, each training record including a plurality of predictor data fields, each predictor data field containing a feature, wherein each feature is a natural language term, and a target data field containing a target value representing a classification of a training record, and probability weight means for storing, for each feature, a probability weight value representing a probability that a new record will have the target value contained in the target data field if a feature contained in a corresponding predictor data field occurs in the new record, query means for extracting features from the new record and querying the training records with each feature extracted from the new record, the query means being responsive to a match between a feature extracted from the new record and a feature stored in said training record for providing the probability weight corresponding to the feature, and metric means for receiving the probability weights from the query means and accumulating for each training record a comparison score representing the probability that said training record matches the new record, and providing an output indicating said target field value of said training record as said target value of the new record. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
17. In a data parallel system, means for classifying natural language data, comprising:
a plurality of processing elements for storing a corresponding plurality of training records, each training record residing in a process in a memory associated with the corresponding processor element and including a plurality of predictor data fields, each predictor data field containing a feature, wherein each feature is a natural language term, and a target data field containing a target value representing a classification of a training record, a probability weight means for storing, for each feature, a probability weight value representing a probability that a new record will have said target value contained in the target data field if a feature contained in a corresponding predictor data field occurs in the new record, a query means for extracting features from the new record and querying the training records with each feature extracted from the new record, including a control means for storing the new record, extracting the features from the new record and transmitting the features to the processing elements, and the processing elements for reading the features from each associated training record and responsive to a match between a feature extracted from the new record and a feature stored in said training record for reading the probability weight corresponding to the feature, and a metric means for receiving the probability weights from the query means and accumulating for each training record a comparison score representing the probability that the training record matches the new record and providing an output indicating said training record most probably matching the new record, including the processing elements for receiving the probability weights, each processor being responsive to instructions from the control means for accumulating for each corresponding training record a comparison score representing the probability that the training record matches the new record, and a global combining means for providing an output indicating a training record most 
probably matching the new record.
18. A system for generating training records for use in classifying natural language data, comprising:
means for storing a plurality of historic records, each historic record including a target data field containing a target value representing a classification of a historic record, and a plurality of predictor data fields, each predictor data field containing a natural language term, and means for storing a plurality of training records, each training record including a plurality of predictor data fields, each predictor data field containing a feature, wherein each feature is a natural language term, and said target data field containing said target value representing a classification of a training record, and a probability weight memory for storing, for each feature, a probability weight value representing a probability that a new record will have the target value contained in the target data field if said feature contained in a corresponding predictor data field occurs in the new record, means for reading the natural language terms from each of the historic records and identifying each feature appearing in the historic records, probability weight generating means for selecting in turn each feature identified from the historic records, and for each feature, selecting in turn each historic record target value for which the feature appears, determining, for each historic record target value and for each feature, a probability weight, and generating, for each historic record, a corresponding training record and storing in the predictor data fields of the training record the features identified from the corresponding historic record, and in the target data field of the training record the target value from the historic record, and storing the probability weight generated for the feature and target in the probability weight memory.
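The training-record generation of claim 18 can be sketched serially: identify every feature in the historic records, then for each (feature, target) pair derive a weight as the fraction of that feature's occurrences carrying that target. The data and names below are illustrative assumptions, not from the patent.

```python
from collections import Counter

# Historic records: (target value, natural language terms). Toy data.
historic = [
    ("fuel_system", ["engine", "stalls"]),
    ("fuel_system", ["engine", "surges"]),
    ("valve_train", ["engine", "noise"]),
]

# Identify each feature appearing in the historic records, and count, for
# each feature and each target value under which it appears, co-occurrences.
n_feature = Counter()
n_feature_target = Counter()
for target, terms in historic:
    for feature in terms:
        n_feature[feature] += 1
        n_feature_target[(feature, target)] += 1

# Probability weight: fraction of the feature's occurrences with this target.
probability_weights = {
    (feature, target): n / n_feature[feature]
    for (feature, target), n in n_feature_target.items()
}

# Each historic record yields a corresponding training record: the same
# features in its predictor fields, the same target in its target field.
training_records = [{"features": terms, "target": target}
                    for target, terms in historic]

print(probability_weights[("engine", "fuel_system")])  # 2 of 3 "engine" rows
```

Here "engine" occurs in three historic records, two of them classified "fuel_system", so its weight for that target is 2/3.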
19. In a data parallel system, means for generating training records for use in classifying natural language data, comprising:
a first plurality of processing elements for storing a corresponding plurality of historic records, each historic record residing in a process in a memory of a corresponding processor element and including a target data field containing a target value representing a classification of a historic record, and a plurality of predictor data fields, each predictor data field containing a natural language term, and a second plurality of processing elements for storing a corresponding plurality of training records, each training record residing in said process in said memory of a corresponding processor element and including a plurality of predictor data fields, each predictor data field containing a feature, wherein each feature is a natural language term, and a target data field containing a target value representing a classification of a training record, a probability weight memory for storing, for each feature, a probability weight value representing a probability that a new record will have the target value contained in the target data field if said feature contained in a corresponding predictor data field occurs in the new record, a query means including a control means and the processing elements for reading the natural language terms from each of the historic records and identifying each feature appearing in the historic records, a probability weight generating means including the control means and the processing elements for selecting in turn each feature identified from the historic records, and for each feature, selecting in turn each historic record target value for which the feature appears, and, a global combining means for determining, for each historic record target value and for each feature a probability weight value representing the probability that a record will have the target value contained in the target data field if said feature contained in a corresponding predictor data field occurs in the record, and storing in the predictor data fields of each 
training record the features identified from the corresponding historic record, and in the target data fields of the corresponding training record the target value from the historic record, and storing the probability weight generated for the feature and target value in a probability weight memory.
20. In a data parallel system, means for classifying natural language data, comprising:
a plurality of processing elements for storing a corresponding plurality of training records, each training record including a plurality of predictor data fields, each predictor data field containing a feature, wherein each feature is a natural language term, and a target data field containing a target value representing a classification of a training record, control means for storing a new record containing features comprised of the natural language data, and for broadcasting the features of the new record to the processing elements storing the training records, each processing element being responsive to the broadcast features of the new record to construct, in a process associated with each training data record, a boolean comparison table for storing indications of matches between the broadcast new record features and the features of the corresponding training record, scanning means for reading the indications of matches in each boolean comparison table and determining, for each broadcast feature of the new record and each target value of the training records, a number of indications of matches between the broadcast feature and the features of the training record for each target value, and a number of indications of matches between the broadcast feature and the features of the training record over all target values, and for determining, for each broadcast feature and each target value, a probability weight representing a probability that the new sample will have the target value of the record if the new sample feature appears in the record. - View Dependent Claims (21, 22, 23)
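The boolean-comparison-table step of claim 20 can be approximated serially: each processing element's table is modeled as one dict of match indications per training record, and the per-target and over-all-targets match counts give the probability weight. Data and names are illustrative assumptions.

```python
# Toy training records; in the claim each would reside in its own
# processing element, with the new record's features broadcast to all.
training_records = [
    {"features": {"engine", "stalls"}, "target": "fuel_system"},
    {"features": {"engine", "noise"},  "target": "valve_train"},
    {"features": {"engine", "stalls"}, "target": "fuel_system"},
]
new_record_features = ["engine", "stalls"]

# One boolean comparison table per training record: a match indication
# for each broadcast feature of the new record.
tables = [
    {f: f in rec["features"] for f in new_record_features}
    for rec in training_records
]

# Scan the tables: for a broadcast feature, count matches for one target
# value and over all target values, then form the probability weight.
def weight(feature, target):
    per_target = sum(t[feature] for t, r in zip(tables, training_records)
                     if r["target"] == target)
    over_all = sum(t[feature] for t in tables)
    return per_target / over_all

print(weight("stalls", "fuel_system"))
```

"stalls" matches only the two "fuel_system" records, so its weight for that target is 1.0, while "engine" matches all three records and gets 2/3.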
24. In a data parallel system, means for constructing probability weight tables for storing probability weights of a plurality of features of each of a plurality of training records for use in classifying natural language data, wherein each training record includes a plurality of predictor data fields, each predictor data field containing a feature, wherein each feature is a natural language term, and a target data field containing a target value representing a classification of a record, wherein a probability weight is a probability that a new sample of a record containing natural language data will have the target value of a training record if a feature of the training record appears in the new sample of the record, comprising:
a plurality of processors for storing a corresponding plurality of record feature structures, each record feature structure corresponding to said training record and including a plurality of record feature structure entries, each record feature structure entry containing a feature or a possible conjunction of features of the corresponding training record, and wherein the record feature structure entries of each record feature structure are sorted according to the values of the keys formed by a concatenation of the values of the features of each entry and the target value of the corresponding training record, a plurality of processors for storing a corresponding plurality of probability weight tables, each probability weight table corresponding to said training record and containing an entry for each possible conjunction of features of the training record, a single feature of said training record being represented in the corresponding probability weight table as a conjunction of the single feature with itself, means for selecting sets of record feature structures, wherein each set of record feature structures has a common value for a feature portion of their keys, and selecting, within each set of record feature structures, a plurality of subsets of record feature structures, wherein the record feature structures of each subset of record feature structures have a common target field value, and wherein each subset of record feature structures corresponds to a feature of the record feature structure, means for determining, for each set of record feature structures, the number of record feature structures in the set of record feature structures, and for each subset of record feature structures in the set of record feature structures, the number of record feature structures in the subset, and for each subset in the set of record feature structures, dividing the number of record feature structures in the subset by the number of record feature structures in the set
of record feature structures to determine a conjunctive probability weight of the corresponding feature of the record feature structure, and writing the conjunctive probability weight for each feature of each record feature structure into a corresponding entry of the probability weight table of the corresponding training record.
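Claim 24's conjunctive-weight construction reduces to: group entries by their feature-conjunction key, sub-group by target value, and divide the subset count by the set count. The serial sketch below stands in for the sorted-key, data-parallel grouping of the claim; the entries are invented for illustration, and a single feature is keyed as the conjunction of the feature with itself.

```python
from collections import Counter

# One entry per record feature structure: (feature conjunction key, target).
# In the claim these are sorted by key so sets and subsets are contiguous;
# here Counter performs the equivalent serial grouping.
entries = [
    (("engine", "stalls"), "fuel_system"),
    (("engine", "stalls"), "fuel_system"),
    (("engine", "stalls"), "valve_train"),
    (("engine", "engine"), "fuel_system"),  # single feature with itself
]

set_sizes = Counter(key for key, _ in entries)  # per feature key (the set)
subset_sizes = Counter(entries)                 # per (key, target) subset

# Conjunctive probability weight: |subset with this target| / |set|.
conjunctive_weights = {
    (key, target): n / set_sizes[key]
    for (key, target), n in subset_sizes.items()
}

print(conjunctive_weights[(("engine", "stalls"), "fuel_system")])
```

The conjunction ("engine", "stalls") appears three times, twice with "fuel_system", so that entry of the weight table is 2/3.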
25. In a data parallel system including a means for storing a plurality of weight tables for storing probability weights of features of each of a plurality of training records, wherein a probability weight is a probability that a new sample of a record containing natural language data will have the target value of a training record if a feature of the training record appears in the new sample of the record, means for classifying new records containing natural language data, comprising:
a plurality of processors for storing a corresponding plurality of training records, each training record including a plurality of predictor data fields, each predictor data field containing a feature, wherein each feature is a natural language term, and a target data field containing a target value representing a classification of said training record, a plurality of processors for storing a corresponding plurality of probability weight tables, each probability weight table corresponding to said training record and containing an entry for each possible conjunction of features of the training record, a single feature of said training record being represented in the corresponding probability weight table as a conjunction of the single feature with itself, a control means for storing a new record containing features comprised of natural language data, and for broadcasting the features of the new sample to the processors storing the training records, each processor being responsive to the broadcast features of the new sample to construct, in a process associated with each training data record, a boolean comparison table for storing indications of matches between the broadcast new sample features and the features of the corresponding training record, scanning means for performing logical AND operations on each combination of the match indications in each boolean comparison table to find conjunctive feature matches, wherein each AND operation represents a conjunctive feature, and, for each conjunctive feature match found in a boolean comparison table, using the values of the conjunctive features resulting in the match as indices into the probability weight table of the corresponding training record and reading from the probability weight table a conjunctive probability weight of the conjunctive feature. - View Dependent Claims (26)
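Claim 25's conjunctive matching can be sketched for one training record: AND the match indications of each combination of broadcast features, and on a conjunctive match use the feature values to index the record's probability weight table. A single feature is stored as the conjunction of the feature with itself, so pairs drawn with replacement cover both cases. The record, the table, and its weights below are invented for illustration.

```python
from itertools import combinations_with_replacement

# One training record; in the claim each record and its weight table
# reside in a processor, with the new record's features broadcast to all.
training_record = {"features": {"engine", "stalls", "cold"},
                   "target": "fuel_system"}

# Probability weight table: one entry per possible conjunction of the
# record's features (keys sorted so lookup is order-independent).
weight_table = {
    ("engine", "engine"): 0.50, ("stalls", "stalls"): 1.00,
    ("cold", "cold"): 0.50,     ("engine", "stalls"): 0.90,
    ("cold", "engine"): 0.40,   ("cold", "stalls"): 0.60,
}

new_record_features = ["engine", "stalls"]
# Boolean comparison table: one match indication per broadcast feature.
matches = {f: f in training_record["features"] for f in new_record_features}

# AND each combination of match indications; on a conjunctive match, use
# the conjoined feature values as indices into the weight table.
found = {}
for a, b in combinations_with_replacement(new_record_features, 2):
    if matches[a] and matches[b]:          # logical AND of the indications
        key = tuple(sorted((a, b)))
        found[key] = weight_table[key]

print(found)
```

Both broadcast features match, so the single features and their pairwise conjunction all yield weight lookups.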
27. In a data parallel system including a plurality of processing elements, each processing element including a memory for storing data and an associated processor to operate on the memory for performing operations on the data residing in the memory, and control means for issuing instructions for directing operations of the system, each processor being responsive to the instructions for performing the operations in parallel on the data stored in the associated memory, a method for classifying natural language data, comprising steps of:
storing a plurality of training records in a corresponding plurality of processor elements, each training record residing in a process in the memory of the processor element and including a plurality of predictor data fields, each predictor data field containing a feature, wherein each feature is a natural language term, and a target data field containing a target value representing a classification of a training record, and storing in a probability weight memory, for each feature, a probability weight value representing a probability that a new record will have the target value contained in the target data field if a feature contained in a corresponding predictor data field occurs in the new record, querying the training records with each feature extracted from a new record, by storing the new record in the control means and, by operation of the control means, extracting the features from the new record and transmitting the features to the processors of the processing elements, and in the processors of the processing elements, reading the features from each associated training record, and responsive to each match between a feature extracted from the new record and a feature stored in a training record, reading the probability weight corresponding to the feature, and accumulating, in the processing element for each corresponding training record and according to a selected metric, a comparison score representing the probability that said training record matches the new record and providing an output indicating said training record most probably matching the new record, and selecting the target field value of a training record as a target value of the new record.
28. A method for implementing in a data parallel system which includes a plurality of processing elements, each processing element including a memory for storing data and an associated processor to operate on the memory for performing operations on the data residing in the memory, a global combining means for performing operations on outputs of the processing elements, and control means for issuing instructions for directing operations of the system, each processor being responsive to the instructions for performing the operations in parallel on the data stored in the associated memory, said method for generating training records for use in classifying natural language data, comprising steps of:
storing in a first plurality of processing elements a corresponding plurality of historic records, each historic record residing in a process in the memory of a corresponding processor element and including a target data field containing a target value representing a classification of a historic record, and a plurality of predictor data fields, each predictor data field containing a natural language term, and storing in a second plurality of processing elements a corresponding plurality of training records, each training record residing in said process in the memory of a corresponding processor element and including a plurality of predictor data fields, each predictor data field containing a feature, wherein each feature is a natural language term, and a target data field containing a target value representing a classification of a training record, and by operation of the control means and the processing elements, reading the natural language terms from each of the historic records and identifying each feature appearing in the historic records, generating a probability weight for each feature, including by operation of the control means and the processing elements, selecting in turn each feature identified from the historic records, and for each feature, selecting in turn each historic record target value for which the feature appears, and, by operation of the global combining means, determining, for each historic record target value and for each feature, a probability weight value representing a probability that a record will have the target value contained in the target data field if said feature contained in a corresponding predictor data field occurs in the record, and storing in the predictor data fields of each training record the features identified from the corresponding historic record, and in the target data fields of the corresponding training record the target value from the historic record, and storing in a probability weight memory, for each feature, the probability weight value generated for the feature and target value.
29. A method for implementing in a data parallel system which includes a plurality of processing elements, each processing element including a memory for storing data and an associated processor to operate on the memory for performing operations on the data residing in the memory, a global combining means for performing operations on outputs of the processing elements, and control means for issuing instructions for directing operations of the data parallel system, each processor of each processing element being responsive to the instructions for performing the operations in parallel on the data stored in the memory of the processing element, said method for classifying natural language data, comprising the steps of:
storing in the processing elements a plurality of training records, each training record including a plurality of predictor data fields, each predictor data field containing a feature, wherein each feature is a natural language term, and a target data field containing a target value representing a classification of a training record, storing a new record containing features comprised of the natural language data in the control means and, by operation of the control means, broadcasting the features of the new record to the processing elements storing the training records, in each processing element, and responsive to the broadcast features of the new record, constructing in a process associated with each training data record a boolean comparison table for storing indications of matches between the broadcast new record features and the features of the corresponding training record, in the processing elements, scanning each of the boolean comparison tables for indications of matches and determining, for each broadcast feature and each target value of the training records, a number of indications of matches between the broadcast feature and the features of the training record for each target value, and a number of indications of matches between the broadcast feature and the features of the training record over all target values, and determining, for each broadcast feature and each target value, a probability weight representing a probability that the new record will have the target value of a training record if a new sample feature appears in the training record. - View Dependent Claims (30, 31, 32)
33. In a data parallel system which includes a plurality of processing elements, each processing element including a memory for storing data and an associated processor to operate on the memory for performing operations on the data residing in the memory, a global combining means for performing operations on outputs of the processing elements, and control means for issuing instructions for directing operations of the data parallel system, each processor being responsive to the instructions for performing the operations in parallel on the data stored in the associated memory, a method for constructing probability weight tables for storing probability weights of a plurality of features of each of a plurality of training records for use in classifying natural language data, wherein each training record includes a plurality of predictor data fields, each predictor data field containing a feature, wherein each feature is a natural language term, and a target data field containing a target value representing a classification of a record, wherein a probability weight is a probability that a new sample of a record containing natural language data will have the target value of a training record if a feature of the training record appears in the new sample of the record, comprising the steps of:
storing in a first plurality of processors a corresponding plurality of record feature structures, each record feature structure corresponding to said training record and including a plurality of record feature structure entries, each record feature structure entry containing a feature or a possible conjunction of features of the corresponding training record, and wherein the record feature structure entries of each record feature structure are sorted according to the values of the keys formed by the concatenation of the values of the features of each entry and the target value of the corresponding training record, storing in a second plurality of processors a corresponding plurality of probability weight tables, each probability weight table corresponding to said training record and containing an entry for each possible conjunction of features of the training record, a single feature of said training record being represented in the corresponding probability weight table as a conjunction of the single feature with itself, in the processors, selecting sets of record feature structures, wherein each set of record feature structures has a common value for a feature portion of their keys, and selecting, within each set of record feature structures, a plurality of subsets of record feature structures, wherein the record feature structures of each subset of record feature structures have a common target field value, and wherein each subset of record feature structures corresponds to a feature of the record feature structure, in the processors, determining, for each set of record feature structures, a number of record feature structures in the set of record feature structures, and for each subset of record feature structures in the set of record feature structures, the number of record feature structures in the subset, and for each subset in the set of record feature structures, dividing the number of record feature structures in the subset by the number of record feature structures in the
set of record feature structures to determine a conjunctive probability weight of the corresponding feature of the record feature structure, and writing the conjunctive probability weight for each feature of each record feature structure into a corresponding entry of the probability weight table of the corresponding training record.
34. A method for implementing in a data parallel system which includes a plurality of processing elements, each processing element including a memory for storing data and an associated processor to operate on the memory for performing operations on the data residing in the memory, a global combining means for performing operations on outputs of the processing elements, and control means for issuing instructions for directing operations of the system, each processor of each processing element being responsive to the instructions for performing the operations in parallel on the data stored in the memory of the processing element, said method for classifying new records containing natural language data, comprising the steps of:
storing in a plurality of processors a corresponding plurality of training records, each training record including a plurality of predictor data fields, each predictor data field containing a feature, wherein said feature is a natural language term, and a target data field containing a target value representing a classification of a training record, storing in a plurality of processors a corresponding plurality of probability weight tables, each probability weight table corresponding to said training record and containing a probability weight entry for each possible conjunction of features of the training record, wherein a single feature of said training record is represented in the corresponding probability weight table as a conjunction of the single feature with itself, and wherein a probability weight is a probability that a new sample record containing natural language data will have the target value of said training record if said feature of the training record appears in the new record, storing a new record containing features comprised of natural language data and broadcasting the features of the new sample to the processors storing the training records, in each processor and responsive to the broadcast features of the new sample, constructing in a process associated with each training data record a boolean comparison table for storing indications of matches between the broadcast new sample features and the features of the corresponding training record, in the processors, performing logical AND scanning operations on each combination of the indications of matches in each boolean comparison table to find conjunctive feature matches, wherein each AND operation represents a conjunctive feature, and, for each indication of said conjunctive feature match found in a boolean comparison table, using the values of the conjunctive features resulting in the indication of said conjunctive feature match as indices into the probability weight table of the corresponding training record
and reading from the probability weight table the conjunctive probability weight of the conjunctive feature. - View Dependent Claims (35)
36. A system for classifying natural language data, comprising:
means for storing a new record including a plurality of predictor data fields containing the natural language data expressed in natural language values, means for storing a plurality of training records, each training record including a plurality of predictor data fields, each predictor data field containing a feature, wherein said feature is a natural language term, said feature is a member of one of a plurality of category subsets of features, and said feature appearing in identical form in a multiplicity of the category subsets comprises a corresponding multiplicity of separate and distinct features, and a target data field containing a target value representing a classification of said training record, and probability weight means for storing, for said feature, a probability weight value representing a probability that a new record will have the target value contained in the target data field if said feature contained in a corresponding predictor data field occurs in the new record, query means for extracting features from the new record and querying the training records with said feature extracted from the new record, the query means being responsive to a match between said feature extracted from the new record and said feature stored in a training record for providing the probability weight corresponding to the feature, and metric means for receiving the probability weights from the query means and accumulating for said training record a comparison score representing the probability that said training record matches the new record, and providing an output indicating a target field value of said training record as a target value of the new record.
37. A system for comparing a new data record to training data records, comprising:
means for storing a new record including a plurality of data fields containing new record data values, means for storing a plurality of training records, each training record including a plurality of data fields containing training record data values, probability weight means for storing a probability weight for each training record data value, each probability weight value representing a probability that a new record will have a match with said training record if a data value in a training record data field occurs in a new record data field, and comparison means for querying the training records with each data value of the new record, accumulating for said training record a comparison score of the probability weights for each match between a new record data value and a training record data value, and providing an output indicating a comparison score.
Specification