Detection of attributes in unstructured data
Abstract
A method for processing information includes receiving a set of records, which include a plurality of fields containing data regarding respective items, and selecting a field that occurs in all of the records and contains multiple terms in each of the records. At least first and second terms that occur among the terms in the selected field in the records are identified, such that the records are partitioned into at least first and second respective subsets by occurrences of the at least first and second terms in the selected field. Responsively to partitioning of the records by the occurrences, it is determined that the at least first and second terms correspond to at least first and second different values of an attribute of the items. The data are classified according to the values of the attribute.
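The steps in the abstract can be illustrated with a small sketch. It is not the patented implementation: the test used here (two terms whose record subsets are disjoint and together cover every record) is a deliberately strict assumption chosen for clarity, and the names `term_index`, `partitioning_pairs`, `classify`, and the sample records are all hypothetical.

```python
from itertools import combinations


def term_index(records, field):
    """Map each term in the chosen field to the set of record ids containing it."""
    index = {}
    for rid, rec in enumerate(records):
        for term in set(rec[field].lower().split()):
            index.setdefault(term, set()).add(rid)
    return index


def partitioning_pairs(records, field):
    """Return term pairs whose record subsets are disjoint and cover all records.

    Assumption: such a pair is treated as two values of one attribute.
    """
    index = term_index(records, field)
    all_ids = set(range(len(records)))
    pairs = []
    for a, b in combinations(sorted(index), 2):
        disjoint = not (index[a] & index[b])
        covering = (index[a] | index[b]) == all_ids
        if disjoint and covering:
            pairs.append((a, b))
    return pairs


def classify(records, field, terms):
    """Label each record with whichever of the given terms occurs in the field."""
    classified = []
    for rec in records:
        words = set(rec[field].lower().split())
        value = next((t for t in terms if t in words), None)
        classified.append({**rec, "attribute_value": value})
    return classified


# Hypothetical records: every record has a multi-term "description" field.
records = [
    {"sku": "A1", "description": "cotton shirt red large"},
    {"sku": "A2", "description": "cotton shirt blue small"},
    {"sku": "A3", "description": "cotton shirt red small"},
    {"sku": "A4", "description": "cotton shirt blue large"},
]

pairs = partitioning_pairs(records, "description")
by_colour = classify(records, "description", ("blue", "red"))
```

With these sample records, "cotton" and "shirt" occur everywhere and so partition nothing, while the pairs blue/red and large/small each split the records cleanly, which is the kind of evidence the abstract describes for treating terms as values of a single attribute.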
20 Claims
1. A method for processing information, comprising:
receiving a set of records, which comprise a plurality of fields containing data regarding respective items;
selecting a field that occurs in all of the records and contains multiple terms in each of the records;
identifying at least first and second terms that occur among the terms in the selected field in the records, such that the records are partitioned into at least first and second respective subsets by occurrences of the at least first and second terms in the selected field;
determining, responsively to partitioning of the records by the occurrences, that the at least first and second terms correspond to at least first and second different values of an attribute of the items; and
classifying the data according to the values of the attribute and outputting the classified data.
9. Apparatus for processing information, comprising:
a memory, which is configured to store a set of records, which comprise a plurality of fields containing data regarding respective items; and
a processor, which is configured to select a field that occurs in all of the records and contains multiple terms in each of the records, to identify at least first and second terms that occur among the terms in the selected field in the records, such that the records are partitioned into at least first and second respective subsets by occurrences of the at least first and second terms in the selected field, to determine responsively to partitioning of the records by the occurrences, that the at least first and second terms correspond to at least first and second different values of an attribute of the items, and to classify the data according to the values of the attribute.
15. A computer software product, comprising a computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to receive a set of records, which comprise a plurality of fields containing data regarding respective items, to select a field that occurs in all of the records and contains multiple terms in each of the records, to identify at least first and second terms that occur among the terms in the selected field in the records, such that the records are partitioned into at least first and second respective subsets by occurrences of the at least first and second terms in the selected field, to determine responsively to partitioning of the records by the occurrences, that the at least first and second terms correspond to at least first and second different values of an attribute of the items, and to classify the data according to the values of the attribute.
17. The product according to claim 17, wherein the instructions cause the computer to compute a metric that increases in response to a union of the subsets and decreases in response to an intersection of the subsets, and to select the terms to add to the group so as to maximize the metric.
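The claim above names the behaviour of the metric but not its formula. As a sketch under a stated assumption, the metric below is the number of records covered by the group (the union) minus the number of records matched by more than one group term (the intersection), and terms are added greedily while the metric improves. The functions `group_metric` and `grow_group` and the sample index are hypothetical.

```python
def group_metric(index, group):
    """Assumed metric: |union of subsets| minus |records hit by 2+ group terms|."""
    covered = set()
    overlap = set()
    for term in group:
        overlap |= covered & index[term]  # records already claimed by an earlier term
        covered |= index[term]
    return len(covered) - len(overlap)


def grow_group(index, seed):
    """Greedily add the term that most increases the metric; stop when none does."""
    group = [seed]
    best = group_metric(index, group)
    while True:
        candidates = [t for t in sorted(index) if t not in group]
        if not candidates:
            break
        score, term = max(
            ((group_metric(index, group + [t]), t) for t in candidates),
            key=lambda pair: pair[0],
        )
        if score <= best:
            break
        group.append(term)
        best = score
    return group, best


# Hypothetical term index: term -> ids of the records whose selected field contains it.
index = {
    "red": {0, 1},
    "blue": {2, 3},
    "green": {4},
    "shirt": {0, 1, 2, 3, 4},
    "large": {0, 2},
}

group, score = grow_group(index, "red")
```

In this example the colour terms extend coverage without overlapping one another, so they are added, whereas "shirt" and "large" overlap records already covered and would lower the assumed metric.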
Specification