Composing text and structured databases
First Claim
1. A system for linking information from at least two data sources, the system comprising:
a first data source comprising a plurality of documents comprising text pertaining to at least one object;
a second data source comprising a plurality of structured records comprising at least one characteristic of the at least one object, each characteristic comprising one property name and an associated property value corresponding to the property name for the at least one object;
a processor for determining one or more traits for each object and for associating at least one record in the second data source with the at least one document from the first data source that refers to each object, wherein each trait is instance-based and comprises at least one characteristic that serves as a proxy for identifying each object from all other objects in the plurality of documents, and wherein at least one of the one or more traits has a different number of characteristics than another trait; and
wherein the system determines that an association of a first trait to a first record is correct if a first text from a first document pertains to either the first record or to an accessory of the product represented by the first record.
Abstract
A framework is provided for composing texts about objects with structured information about these objects, and thus disclosed are methodologies for linking information from at least two data sources—one comprising a plurality of documents comprising text pertaining to at least one object, and one comprising a plurality of structured records comprising at least one characteristic of the at least one object, each characteristic comprising one property name and an associated property value corresponding to the property name for the at least one object—by determining one or more instance-based traits for each object in both data sources and associating at least one record with at least one document that refers to each object, each trait comprising one or more characteristics that identifiably distinguish each object from all other objects.
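The trait idea in the abstract can be illustrated with a minimal sketch. It is not the patented method: it assumes a trait is a smallest set of (property name, property value) pairs held by exactly one record, and that a document refers to a record when every value of one of that record's traits appears in its text. The record data, document texts, and the function names `find_traits` and `link` are all hypothetical.

```python
from itertools import combinations

# Hypothetical structured records: property name -> property value.
RECORDS = {
    "r1": {"brand": "Acme", "model": "X100", "color": "black"},
    "r2": {"brand": "Acme", "model": "X100", "color": "silver"},
    "r3": {"brand": "Zenith", "model": "Q7", "color": "silver"},
}

# Hypothetical free-text documents.
DOCS = {
    "d1": "The silver Acme phone ships next week",
    "d2": "A short review of the Q7 handset",
}

def find_traits(records):
    """For each record, collect the minimal characteristic sets unique to it."""
    traits = {}
    for rid, props in records.items():
        items = sorted(props.items())
        found = []
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                cand = frozenset(combo)
                # Skip supersets of a smaller trait that was already found.
                if any(t <= cand for t in found):
                    continue
                owners = [o for o, p in records.items() if cand <= set(p.items())]
                if owners == [rid]:
                    found.append(cand)
        traits[rid] = found
    return traits

def link(documents, records):
    """Associate each document with records whose trait values all occur in it."""
    traits = find_traits(records)
    links = {}
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        links[doc_id] = sorted(
            rid for rid, ts in traits.items()
            if any(all(v.lower() in words for _, v in t) for t in ts)
        )
    return links

if __name__ == "__main__":
    print(find_traits(RECORDS))
    print(link(DOCS, RECORDS))
```

In this toy data, record r1 is identified by a single characteristic (its color), while r2 needs two characteristics together, so the traits differ in size as the claims require.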
19 Claims
10. A method for linking information from at least two data sources, the method comprising:
describing a plurality of objects using a set of characteristics as a plurality of traits, wherein each trait in the plurality of traits is instance-based and serves as a proxy to identify each object from other objects in the plurality of objects, and wherein at least one of the plurality of traits has a different number of characteristics than another trait;
using at least one additional characteristic that does not comprise the set of characteristics to score a relevancy of a document to a record from a first data source that shares at least one of the characteristics of another document from a second data source, wherein the additional characteristic is from the document; and
determining that an association of a first trait to a first record is correct if a first text from a first document pertains to either the first record or to an accessory of the product represented by the first record.
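The scoring step in claim 10 can be sketched as follows. The claim does not specify a scoring formula, so this example assumes one: the fraction of a record's characteristics outside the matching trait whose values also appear in the document. The function name `relevancy` and the sample data are hypothetical.

```python
import re

def relevancy(text, record, trait):
    """Score a document against a record using characteristics outside the trait.

    Assumed formula (not from the source): share of the record's non-trait
    property values that occur as words in the document text.
    """
    words = set(re.findall(r"\w+", text.lower()))
    extra = {n: v for n, v in record.items() if (n, v) not in trait}
    if not extra:
        return 0.0
    hits = sum(1 for v in extra.values() if v.lower() in words)
    return hits / len(extra)

if __name__ == "__main__":
    record = {"brand": "Acme", "model": "X100", "color": "black"}
    trait = frozenset({("model", "X100")})
    print(relevancy("Acme X100 now in red", record, trait))
    print(relevancy("The black Acme X100 reviewed", record, trait))
```

A document that matches the trait and also mentions more of the record's remaining characteristics scores higher, which gives one way to rank candidate records for the same document.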
15. A method comprising:
mapping a plurality of documents in a first data source to a plurality of traits in a second data source, wherein each trait in the plurality of traits comprises one or more characteristics, is instance-based, and serves as a proxy for uniquely identifying each object from among a plurality of objects, and wherein at least one of the plurality of traits has a different number of characteristics than another trait;
scoring the relevancy of a document from among the plurality of documents to at least one record from the second data source; and
determining that an association of a first trait to a first record is correct if a first text from a first document pertains to either the first record or to an accessory of the product represented by the first record.
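The correctness criterion shared by the independent claims can be sketched as a small evaluation helper. It assumes hand-labeled data that the source does not describe: a known subject for each document and a table mapping each accessory to the record of the product it belongs to. All names and identifiers here are hypothetical.

```python
def is_correct(linked_record, doc_subject, accessory_of):
    """True if the document is about the linked record or one of its accessories."""
    return (
        doc_subject == linked_record
        or accessory_of.get(doc_subject) == linked_record
    )

def precision(links, subjects, accessory_of):
    """Fraction of document-to-record links judged correct under the rule above."""
    if not links:
        return 0.0
    good = sum(
        1 for doc_id, rid in links.items()
        if is_correct(rid, subjects[doc_id], accessory_of)
    )
    return good / len(links)

if __name__ == "__main__":
    accessory_of = {"case-x100": "r1"}   # accessory id -> record id of its product
    links = {"d1": "r1", "d2": "r1", "d3": "r2"}
    subjects = {"d1": "r1", "d2": "case-x100", "d3": "case-x100"}
    print(precision(links, subjects, accessory_of))
```

Under this rule, a document about a phone case linked to the phone's record counts as a correct association, while the same document linked to a different phone does not.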
Specification