Extraction of attributes and values from natural language documents
Abstract
One or more classification algorithms are applied to at least one natural language document in order to extract both attributes and values of a given product. Supervised classification algorithms, semi-supervised classification algorithms, unsupervised classification algorithms or combinations of such classification algorithms may be employed for this purpose. The at least one natural language document may be obtained via a public communication network. Two or more attributes (or two or more values) thus identified may be merged to form one or more attribute phrases or value phrases. Once attributes and values have been extracted in this manner, association or linking operations may be performed to establish attribute-value pairs that are descriptive of the product. In a presently preferred embodiment, an unsupervised algorithm is used to generate seed attributes and values, which can then support a supervised or semi-supervised classification algorithm.
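As a rough illustration of the merging step mentioned in the abstract, the sketch below joins adjacent words that carry the same label into a single phrase when their correlation meets a threshold. The correlation measure used here (the fraction of a word's occurrences that are immediately followed by the next word), the threshold of 0.5, and the names `bigram_correlation` and `merge_labeled` are all assumptions made for illustration; the abstract does not specify any of them.

```python
from collections import Counter

def bigram_correlation(corpus):
    """Return corr(a, b): share of occurrences of `a` directly followed by `b`.

    Assumed stand-in for a 'correlation value'; the source does not define one.
    """
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))

    def corr(a, b):
        return bigrams[(a, b)] / unigrams[a] if unigrams[a] else 0.0

    return corr

def merge_labeled(tokens, corr, threshold=0.5):
    """Merge adjacent (word, label) tokens that share a label and correlate."""
    merged = []
    for word, label in tokens:
        if merged:
            prev_words, prev_label = merged[-1]
            same_label = label == prev_label and label in ("attribute", "value")
            if same_label and corr(prev_words[-1], word) >= threshold:
                prev_words.append(word)
                continue
        merged.append(([word], label))
    return [(" ".join(words), label) for words, label in merged]

# Toy corpus and labels, invented for this example.
corpus = [
    "stainless steel blade with rubber grip",
    "stainless steel frame",
    "carbon steel blade",
]
corr = bigram_correlation(corpus)
tokens = [("stainless", "value"), ("steel", "value"), ("blade", "attribute")]
phrases = merge_labeled(tokens, corr)
print(phrases)
```

In this toy run, "stainless" and "steel" are merged into one value phrase because every occurrence of "stainless" is followed by "steel", while "blade" stays separate because its label differs.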
29 Citations
30 Claims
1. A method comprising:
- labeling, by a device, a first portion of a first string of words as at least two attributes for a product associated with at least one document;
- labeling, by the device, a second portion of the first string of words as at least two values for the product, the first portion of the first string of words being different than the second portion of the first string of words;
- associating, by the device, the at least two attributes and the at least two values, the associating including:
  - associating an attribute, of the at least two attributes, with a value, of the at least two values, to provide an attribute-value pair by:
    - merging attributes, of the at least two attributes, having one or more first correlation values that satisfy a correlation threshold; and
    - merging values, of the at least two values, having one or more second correlation values that satisfy the correlation threshold;
- identifying, by the device, a second string of words, the second string of words being different than the first string of words, and the second string of words including a plurality of words that are included in the first string of words;
- determining, by the device, that a context associated with the first string of words is similar to a context associated with the second string of words; and
- labeling, by the device and based on determining that the context associated with the first string of words is similar to the context associated with the second string of words, a first portion of the second string of words as a value using the association of the at least two attributes and the at least two values.
Dependent claims: 2, 3, 4, 5, 6, 7

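The last three steps of claim 1 (identifying a second string, comparing contexts, and labeling a portion of the second string as a value) can be pictured with a small sketch. Everything specific here is an assumption for illustration: context is taken to be the set of unlabeled words in each string, similarity is measured with Jaccard overlap against a threshold of 0.5, attributes and values are single words, and the function names are hypothetical. The claim itself does not say how context similarity is determined.

```python
def context_words(words, labeled):
    """Words of a string that are not already labeled as attribute or value."""
    return {w for w in words if w not in labeled}

def jaccard(a, b):
    """Jaccard overlap of two sets (assumed similarity measure)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def label_second_string(first, pairs, second, threshold=0.5):
    """Label words of `second` as values when its context resembles `first`.

    pairs maps each attribute to its associated value, as extracted from `first`.
    Returns {word: ("value", attribute)} for newly labeled words.
    """
    first_words, second_words = first.lower().split(), second.lower().split()
    known = set(pairs) | set(pairs.values())
    # The second string must differ from the first yet share several words.
    shared = set(first_words) & set(second_words)
    if first_words == second_words or len(shared) < 2:
        return {}
    ctx1 = context_words(first_words, known)
    ctx2 = context_words(second_words, known)
    if jaccard(ctx1, ctx2) < threshold:
        return {}
    labels = {}
    for attribute, value in pairs.items():
        if attribute in first_words and value in first_words and attribute in second_words:
            # Reuse the value's position relative to its attribute in the first string.
            offset = first_words.index(value) - first_words.index(attribute)
            pos = second_words.index(attribute) + offset
            if 0 <= pos < len(second_words):
                labels[second_words[pos]] = ("value", attribute)
    return labels

# Toy strings and pairs, invented for this example.
first = "comes with a steel blade and rubber grip"
pairs = {"blade": "steel", "grip": "rubber"}
second = "comes with a titanium blade and leather grip"
new_labels = label_second_string(first, pairs, second)
print(new_labels)
```

Here the two strings share the context words "comes", "with", "a", and "and", so "titanium" and "leather" are labeled as values of "blade" and "grip" respectively. A string with little overlap, such as "the grip broke after one week", yields no labels.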
8. A device comprising:
- a memory to store instructions; and
- a processor to execute the instructions to:
  - label a first portion of a first string of words as at least two attributes for a product associated with at least one document;
  - label a second portion of the first string of words as at least two values for the product, the first portion of the first string of words being different than the second portion of the first string of words;
  - associate the at least two attributes with the at least two values, the processor, when associating the at least two attributes with the at least two values, being to:
    - associate an attribute, of the at least two attributes, with a value, of the at least two values, to provide an attribute-value pair by:
      - merging attributes, of the at least two attributes, having one or more first correlation values that satisfy a correlation threshold; and
      - merging values, of the at least two values, having one or more second correlation values that satisfy the correlation threshold;
  - identify a second string of words, the second string of words being different than the first string of words, and the second string of words including a plurality of words that are included in the first string of words;
  - determine that a context associated with the first string of words is similar to a context associated with the second string of words; and
  - label, based on determining that the context associated with the first string of words is similar to the context associated with the second string of words, a first portion of the second string of words as a value using the association of the at least two attributes and the at least two values.
Dependent claims: 9, 10, 11, 12, 13, 14

15. A non-transitory computer-readable medium storing instructions, the instructions comprising:
- one or more instructions which, when executed by at least one processor, cause the at least one processor to:
  - label a first portion of a first string of words in at least one document as at least two attributes for a product;
  - label a second portion of the first string of words as at least two values for the product, the first portion of the first string of words being different than the second portion of the first string of words;
  - associate the at least two attributes and the at least two values, the one or more instructions to associate the at least two attributes and the at least two values including:
    - one or more instructions to associate an attribute, of the at least two attributes, with a value, of the at least two values, to provide an attribute-value pair by:
      - merging attributes, of the at least two attributes, having one or more first correlation values that satisfy a correlation threshold; and
      - merging values, of the at least two values, having one or more second correlation values that satisfy the correlation threshold;
  - identify a second string of words, the second string of words being different than the first string of words, and the second string of words including a plurality of words that are included in the first string of words;
  - determine that a context associated with the first string of words is similar to a context associated with the second string of words; and
  - label, based on determining that the context associated with the first string of words is similar to the context associated with the second string of words, a first portion of the second string of words as a value using the association of the at least two attributes and the at least two values.
Dependent claims: 16, 17, 18, 19, 20, 21

22. A method comprising:
- labeling, by a device, a first set of attributes and a first set of values of a first string of words associated with a product using a first algorithm applied to the at least one document;
- identifying, by the device, a second set of attributes and a second set of values of the product via a second algorithm applied to the at least one document and based on the first set of attributes and the first set of values, the first algorithm being different than the second algorithm;
- associating, by the device, the first set of attributes and the second set of attributes with the first set of values and the second set of values, the associating including:
  - associating the first set of attributes with the first set of values to provide one or more attribute-value pairs by:
    - merging attributes, of the first set of attributes, having one or more first correlation values that satisfy a correlation threshold; and
    - merging values, of the first set of values, having one or more second correlation values that satisfy the correlation threshold; and
  - associating the second set of attributes with the second set of values to provide one or more attribute-value pairs by:
    - merging attributes, of the second set of attributes, having one or more first correlation values that satisfy the correlation threshold; and
    - merging values, of the second set of values, having one or more second correlation values that satisfy the correlation threshold;
- identifying, by the device, a second string of words, the second string of words being different than the first string of words, and the second string of words including a plurality of words that are included in the first string of words;
- determining, by the device, that a context associated with the first string of words is similar to a context associated with the second string of words; and
- labeling, by the device and based on determining that the context associated with the first string of words is similar to the context associated with the second string of words, a first portion of the second string of words as a value or an attribute using the association of the first set of attributes and the second set of attributes with the first set of values and the second set of values.
Dependent claims: 23, 24

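The two-algorithm arrangement in claim 22, where a second algorithm builds on the attributes and values produced by a different first algorithm, can be sketched on a toy corpus of two-word product phrases. Both algorithms below are assumptions chosen for illustration, not the claimed method: the first treats a word preceded by several distinct words as a seed attribute and those preceding words as seed values; the second repeatedly propagates labels across word pairs starting from the seeds. The names `seed_pairs` and `expand_from_seeds` and the `min_distinct` parameter are hypothetical.

```python
from collections import defaultdict

def seed_pairs(corpus, min_distinct=3):
    """First algorithm (assumed): frequency-based seed generation.

    A word preceded by at least `min_distinct` different words is taken as a
    seed attribute, and the words preceding it as seed values.
    """
    preceding = defaultdict(set)
    for sentence in corpus:
        words = sentence.lower().split()
        for left, right in zip(words, words[1:]):
            preceding[right].add(left)
    attributes = {w for w, lefts in preceding.items() if len(lefts) >= min_distinct}
    values = set().union(*(preceding[a] for a in attributes))
    return attributes, values

def expand_from_seeds(corpus, seed_attributes, seed_values):
    """Second algorithm (assumed): propagate labels outward from the seeds.

    A word right after a known value becomes an attribute; a word right before
    a known attribute becomes a value. Repeats until nothing new is found.
    Only suitable for the toy two-word phrases used here.
    """
    attributes, values = set(seed_attributes), set(seed_values)
    changed = True
    while changed:
        changed = False
        for sentence in corpus:
            words = sentence.lower().split()
            for left, right in zip(words, words[1:]):
                if left in values and right not in attributes:
                    attributes.add(right)
                    changed = True
                if right in attributes and left not in values:
                    values.add(left)
                    changed = True
    return attributes - set(seed_attributes), values - set(seed_values)

# Toy corpus, invented for this example.
corpus = [
    "steel blade", "titanium blade", "carbon blade",
    "steel frame", "aluminum frame", "aluminum handle",
]
first_attrs, first_vals = seed_pairs(corpus)
second_attrs, second_vals = expand_from_seeds(corpus, first_attrs, first_vals)
print(first_attrs, first_vals, second_attrs, second_vals)
```

On this corpus the first pass yields "blade" as the only seed attribute, and the second pass uses the seed value "steel" to reach "frame", then "aluminum", then "handle". On real sentences the propagation rule would need a far stricter notion of context than adjacency.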
25. A device comprising:
- a memory to store instructions; and
- a processor to execute the instructions to:
  - identify a first set of attributes and a first set of values of a first string of words for a product associated with at least one document;
  - identify a second set of attributes and a second set of values of the product based on the first set of attributes and the first set of values;
  - associate the first set of attributes and the second set of attributes with the first set of values and the second set of values, the processor, when associating the first set of attributes and the second set of attributes with the first set of values and the second set of values, being to:
    - associate the first set of attributes with the first set of values to provide one or more attribute-value pairs by:
      - merging attributes, of the first set of attributes, having one or more first correlation values that satisfy a correlation threshold; and
      - merging values, of the first set of values, having one or more second correlation values that satisfy the correlation threshold; and
    - associate the second set of attributes with the second set of values to provide one or more attribute-value pairs by:
      - merging attributes, of the second set of attributes, having one or more first correlation values that satisfy the correlation threshold; and
      - merging values, of the second set of values, having one or more second correlation values that satisfy the correlation threshold;
  - identify a second string of words, the second string of words being different than the first string of words, and the second string of words including a plurality of words that are included in the first string of words;
  - determine that a context associated with the first string of words is similar to a context associated with the second string of words; and
  - label, based on determining that the context associated with the first string of words is similar to the context associated with the second string of words, a first portion of the second string of words as a value or an attribute using the association of the first set of attributes and the second set of attributes with the first set of values and the second set of values.
Dependent claims: 26, 27

28. A non-transitory computer-readable medium storing instructions, the instructions comprising:
- one or more instructions which, when executed by at least one processor, cause the at least one processor to:
  - identify a first set of attributes and a first set of values of a first string of words for a product using a first algorithm applied to at least one document;
  - identify a second set of attributes and a second set of values of the product using a second algorithm applied to the at least one document and based on the first set of attributes and the first set of values, the first algorithm being different than the second algorithm;
  - associate the first set of attributes and the second set of attributes with the first set of values and the second set of values, the one or more instructions to associate the first set of attributes and the second set of attributes with the first set of values and the second set of values including:
    - one or more instructions to associate the first set of attributes with the first set of values to provide one or more attribute-value pairs by:
      - merging attributes, of the first set of attributes, having one or more first correlation values that satisfy a correlation threshold; and
      - merging values, of the first set of values, having one or more second correlation values that satisfy the correlation threshold; and
    - one or more instructions to associate the second set of attributes with the second set of values to provide one or more attribute-value pairs by:
      - merging attributes, of the second set of attributes, having one or more first correlation values that satisfy the correlation threshold; and
      - merging values, of the second set of values, having one or more second correlation values that satisfy the correlation threshold;
  - identify a second string of words, the second string of words being different than the first string of words, and the second string of words including a plurality of words that are included in the first string of words;
  - determine that a context associated with the first string of words is similar to a context associated with the second string of words; and
  - label, based on determining that the context associated with the first string of words is similar to the context associated with the second string of words, a first portion of the second string of words as a value or an attribute using the association of the first set of attributes and the second set of attributes with the first set of values and the second set of values.
Dependent claims: 29, 30

Specification