Extraction of attributes and values from natural language documents
Abstract
One or more classification algorithms are applied to at least one natural language document in order to extract both attributes and values of a given product. Supervised classification algorithms, semi-supervised classification algorithms, unsupervised classification algorithms or combinations of such classification algorithms may be employed for this purpose. The at least one natural language document may be obtained via a public communication network. Two or more attributes (or two or more values) thus identified may be merged to form one or more attribute phrases or value phrases. Once attributes and values have been extracted in this manner, association or linking operations may be performed to establish attribute-value pairs that are descriptive of the product. In a presently preferred embodiment, an (unsupervised) algorithm is used to generate seed attributes and values which can then support a supervised or semi-supervised classification algorithm.
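As an illustration of the seed-generation idea only, the following minimal sketch proposes seed attribute-value pairs from raw phrases without any labeled data. The heuristic is an assumption made for this example and is not taken from the patent text: a word is treated as a candidate attribute when it is preceded by several distinct words, each of which accounts for a notable share of its left-hand contexts. The function name `generate_seeds` and the parameters `min_values` and `min_share` are hypothetical.

```python
from collections import Counter, defaultdict


def generate_seeds(phrases, min_values=2, min_share=0.2):
    """Return {attribute: [values]} seeds from whitespace-tokenized phrases.

    Assumed heuristic (not from the patent): a word is a seed attribute if
    at least `min_values` distinct words each make up `min_share` or more
    of the words that immediately precede it.
    """
    preceding = defaultdict(Counter)
    for phrase in phrases:
        tokens = phrase.lower().split()
        # Count every (left word, right word) adjacency.
        for left, right in zip(tokens, tokens[1:]):
            preceding[right][left] += 1

    seeds = {}
    for word, counts in preceding.items():
        total = sum(counts.values())
        values = sorted(v for v, c in counts.items() if c / total >= min_share)
        if len(values) >= min_values:
            seeds[word] = values
    return seeds


phrases = [
    "steel shaft", "graphite shaft", "steel shaft",
    "leather grip", "rubber grip", "golf club",
]
seeds = generate_seeds(phrases)
```

Seeds of this kind could then serve as the initial labeled examples for a supervised or semi-supervised classifier; that later stage is not shown here.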
9 Claims
1. A method for associating a plurality of attributes and a plurality of values for a product within at least one natural language document to define attribute-value pairs, the method comprising:
determining, by a computer, correlations between two or more attributes of the plurality of attributes;
identifying at least one attribute phrase based on the correlations between the two or more attributes;
determining correlations between two or more values of the plurality of values;
identifying at least one value phrase based on the correlations between the two or more values;
associating an attribute of the plurality of attributes or an attribute phrase of the at least one attribute phrase with a value of the plurality of values or a value phrase of the at least one value phrase based on syntactic dependency therebetween; and
storing the attribute or attribute phrase and the associated value or value phrase as an attribute-value pair.
Dependent claims: 2, 3
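The correlation and phrase-identification steps of claim 1 can be sketched as follows. The claim does not name a correlation measure, so this example assumes pointwise mutual information (PMI) over adjacent tokens as a stand-in. The input format (sentences as lists of `(word, label)` tuples), the labels `"ATTR"` and `"VAL"`, the function name `merge_phrases`, and the threshold value are all hypothetical.

```python
import math
from collections import Counter


def merge_phrases(labeled_sentences, label, threshold=2.0):
    """Merge adjacent tokens tagged `label` whose PMI is at least `threshold`.

    PMI is an assumed correlation measure chosen for illustration.
    Returns the set of multi-word phrases that were formed.
    """
    unigrams, bigrams = Counter(), Counter()
    for sentence in labeled_sentences:
        words = [word for word, _ in sentence]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    def pmi(a, b):
        if bigrams[(a, b)] == 0:
            return float("-inf")
        joint = bigrams[(a, b)] / n_bi
        return math.log2(joint / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))

    phrases = set()
    for sentence in labeled_sentences:
        current = []  # run of adjacent, sufficiently correlated tokens
        for word, tag in sentence:
            if tag == label and current and pmi(current[-1], word) >= threshold:
                current.append(word)
                continue
            # The run ends here; keep it only if it spans several words.
            if len(current) > 1:
                phrases.add(" ".join(current))
            current = [word] if tag == label else []
        if len(current) > 1:
            phrases.add(" ".join(current))
    return phrases


data = [
    [("stainless", "VAL"), ("steel", "VAL"), ("shaft", "ATTR")],
    [("stainless", "VAL"), ("steel", "VAL"), ("blade", "ATTR")],
    [("red", "VAL"), ("steel", "VAL"), ("frame", "ATTR")],
    [("red", "VAL"), ("grip", "ATTR")],
    [("red", "VAL"), ("handle", "ATTR")],
    [("red", "VAL"), ("cover", "ATTR")],
]
value_phrases = merge_phrases(data, "VAL", threshold=2.0)
attribute_phrases = merge_phrases(data, "ATTR", threshold=2.0)
```

In this toy data, "stainless" and "steel" always occur together and are merged into one value phrase, while "red" precedes many different words, so "red steel" falls below the threshold and stays unmerged.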
4. An apparatus for associating a plurality of attributes and a plurality of values within at least one natural language document to define attribute-value pairs, comprising:
a correlation module operative to determine correlations between two or more attributes of the plurality of attributes, and to determine correlations between two or more values of the plurality of values;
a phrase determination module operative to identify at least one attribute phrase based on the correlations between the two or more attributes, and to identify at least one value phrase based on the correlations between the two or more values;
a syntactic dependency module operative to associate an attribute of the plurality of attributes or an attribute phrase of the at least one attribute phrase with a value of the plurality of values or a value phrase of the at least one value phrase based on syntactic dependency therebetween; and
a machine readable store storing the attribute or attribute phrase and the associated value or value phrase as an attribute-value pair.
Dependent claims: 5, 6
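A minimal sketch of what the syntactic dependency module and the store of claim 4 might look like is given below. It assumes that a dependency parser has already supplied a head index for every token and that an earlier step has labeled each token (or merged phrase) as an attribute, a value, or neither; neither component is shown. The `Token` type, the label strings, and the functions `associate` and `store_pairs` are hypothetical names introduced for this example, and an in-memory dictionary stands in for the machine readable store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    head: int   # index of the syntactic head; the root points to itself
    label: str  # "ATTR", "VAL", or "O" (assumed label set)


def associate(tokens):
    """Pair an attribute with a value when a dependency edge links them."""
    pairs = []
    for index, token in enumerate(tokens):
        if token.head == index:
            continue  # the root has no incoming edge
        head = tokens[token.head]
        if {token.label, head.label} == {"ATTR", "VAL"}:
            attr, val = (token, head) if token.label == "ATTR" else (head, token)
            pairs.append((attr.text, val.text))
    return pairs


def store_pairs(pairs, store):
    """Record attribute-value pairs in a dict keyed by attribute."""
    for attribute, value in pairs:
        values = store.setdefault(attribute, [])
        if value not in values:
            values.append(value)
    return store


# "graphite shaft and leather grip", with hand-written dependency heads.
sentence = [
    Token("graphite", 1, "VAL"),
    Token("shaft", 1, "ATTR"),
    Token("and", 1, "O"),
    Token("leather", 4, "VAL"),
    Token("grip", 1, "ATTR"),
]
pairs = associate(sentence)
store = store_pairs(pairs, {})
```

Only edges that directly connect an attribute to a value produce a pair, so the attribute-to-attribute edge between "grip" and "shaft" is ignored.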
7. A computer-readable medium having stored thereon executable instructions that, when executed, cause a computer to:
determine correlations between two or more attributes of a plurality of attributes for a product within at least one natural language document;
identify at least one attribute phrase based on the correlations between the two or more attributes;
determine correlations between two or more values of a plurality of values for the product within the at least one natural language document;
identify at least one value phrase based on the correlations between the two or more values;
associate an attribute of the plurality of attributes or an attribute phrase of the at least one attribute phrase with a value of the plurality of values or a value phrase of the at least one value phrase based on syntactic dependency therebetween; and
store the attribute or attribute phrase and the associated value or value phrase as an attribute-value pair.
Dependent claims: 8, 9
Specification