Reference resolution for text enrichment and normalization in mining mixed data
First Claim
1. A method for enrichment of text comprising:
- providing a database in which defined hierarchical relationships exist between different parts of the data, the database having a structured part and an unstructured part, the database including a set of fields and a set of records, each of the fields being distinguished as comprising either structured data or unstructured data, the structured data fields having a predefined relationship to the structured data of each field, the structured part of the database including the structured data in structured data fields of records, and the unstructured part including unstructured data comprising textual data for unstructured data fields of the records, whereby some of the records include both structured data in structured data fields and textual data for unstructured data fields;
- after providing the database, generating a model for the structured data in the structured data fields of the structured part of the database, the generating comprising associating referents in the database with designating terms which each describe a business object, the referents each comprising or referring to one of the business objects;
- identifying a plurality of candidate referring entities in the textual data of the unstructured data fields of the unstructured part of the provided database;
- for each candidate referring entity, computing a similarity measure which includes:
  - comparing the candidate referring entity in the textual data with the model to identify referring entities of the candidate referring entities and corresponding objects to which the referring entities refer, and
  - comparing the candidate referring entity in the textual data with the structured data of the same record; and
- based on the computed similarity measure, enriching the textual data for the unstructured data fields with information derived from the business objects for that record, the enrichment including annotating a free text entry in the database with information relating to a business object or referent.
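As a rough illustration of the steps of claim 1, the following Python sketch builds a model from the structured fields, takes word n-grams of the free text as candidate referring entities, scores each candidate with a two-part similarity measure (one comparison against the model, one against the same record's structured data), and annotates the text. Everything here is an assumption made for illustration, not the patented implementation: the toy records and field names, the use of `difflib.SequenceMatcher` as the string-similarity measure, and the hypothetical helpers `build_model`, `candidates`, `enrich` and `annotate` with their `threshold` and `record_bonus` parameters.

```python
from difflib import SequenceMatcher

# Hypothetical toy records: two structured fields plus one free-text field.
RECORDS = [
    {"id": 1, "product": "LaserJet 4200", "customer": "Acme",
     "note": "Customer reports the laserjet 4200 jams; Acme wants a refund."},
    {"id": 2, "product": "DeskJet 930", "customer": "Globex",
     "note": "Globex says the deskjet prints streaks."},
]
STRUCTURED_FIELDS = ("product", "customer")  # hypothetical structured fields
TEXT_FIELD = "note"                          # hypothetical free-text field


def build_model(records):
    """Associate each designating term (lower-cased cell value) with its referent."""
    model = {}
    for rec in records:
        for field in STRUCTURED_FIELDS:
            model[rec[field].lower()] = (field, rec[field])
    return model


def candidates(text, max_words=3):
    """Candidate referring entities: every word n-gram of up to max_words words."""
    words = [w.strip(".,;:!?") for w in text.split()]
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def enrich(record, model, threshold=0.85, record_bonus=0.1):
    """Return sorted (span, field, referent, score) annotations for one record."""
    same_record = {record[f].lower() for f in STRUCTURED_FIELDS}
    best = {}  # referent -> (score, span, field); keeps the best-scoring span
    for cand in candidates(record[TEXT_FIELD]):
        for term, (field, referent) in model.items():
            # Component 1: string similarity between the candidate and the model term.
            score = SequenceMatcher(None, cand.lower(), term).ratio()
            # Component 2 (assumed weighting): bonus if the referent is in this record.
            if term in same_record:
                score += record_bonus
            if score >= threshold and score > best.get(referent, (0.0,))[0]:
                best[referent] = (score, cand, field)
    return sorted((span, field, ref, round(s, 3)) for ref, (s, span, field) in best.items())


def annotate(text, annotations):
    """Naively tag the first occurrence of each matched span in the text."""
    for span, field, referent, _ in annotations:
        text = text.replace(span, f"{span} [{field}={referent}]", 1)
    return text
```

In this toy data, the partial mention "deskjet" in record 2 scores below the threshold on string similarity alone and is annotated as `DeskJet 930` only because the same-record comparison adds a bonus, which shows one way comparing against the record's own structured data could help resolve an incomplete mention.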
Abstract
A method for enrichment of text which enables mixed data mining includes generating a model for structured data found in tables of a database. In the model, semantically-linked terms are associated with referents, such as field names or cell content of the fields, of the structured data. The referents may be a business object or refer to a business object. A plurality of candidate referring entities in textual data in the database, such as chunks of free text, is identified. For each candidate referring entity, a similarity measure between the candidate referring entity in the textual data and the model is computed to identify referring entities of the candidate referring entities and corresponding business objects/referents to which the referring entities refer. The textual data is enriched with information derived from the business objects.
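The model-generation step can be pictured as a lookup table from designating terms to referents, where a referent is either a field name or a cell value, as the abstract describes. The sketch below assumes that the semantically linked terms for field names come from a hand-written synonym list (`FIELD_SYNONYMS`, invented here) and that cell values designate themselves after case and whitespace normalization; the source does not say how these terms are obtained. `Referent`, `generate_model`, `normalize` and `resolve` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Referent:
    """A referent: a whole field (value is None) or one cell value of that field."""
    field: str
    value: Optional[str] = None


# Hypothetical semantically linked terms for each structured field name.
FIELD_SYNONYMS: Dict[str, List[str]] = {
    "product": ["product", "item", "device"],
    "customer": ["customer", "client", "account"],
}


def normalize(term: str) -> str:
    """Case-fold and collapse whitespace so trivially different spellings match."""
    return " ".join(term.lower().split())


def generate_model(rows: Iterable[Dict[str, str]],
                   synonyms: Dict[str, List[str]]) -> Dict[str, Set[Referent]]:
    """Map each designating term to the set of referents it may designate."""
    model: Dict[str, Set[Referent]] = {}
    # Field names (and their assumed synonyms) designate the field itself.
    for field, terms in synonyms.items():
        for term in terms:
            model.setdefault(normalize(term), set()).add(Referent(field))
    # Cell contents of structured fields designate themselves; other fields are skipped.
    for row in rows:
        for field, cell in row.items():
            if field in synonyms:
                model.setdefault(normalize(cell), set()).add(Referent(field, cell))
    return model


def resolve(model: Dict[str, Set[Referent]], term: str) -> Set[Referent]:
    """Look up the referents a term may designate (empty set if unknown)."""
    return model.get(normalize(term), set())
```

Only fields listed in `FIELD_SYNONYMS` are treated as structured, so a free-text column such as a note never enters the model. Each term maps to a set of referents because the same string may designate more than one business object (for example, a value occurring in two fields), leaving disambiguation to a later scoring step.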
22 Claims
Dependent claims 2 to 21 of claim 1 are not reproduced here.
22. A system for enrichment of text comprising:
- a database in which defined hierarchical relationships exist between different parts of the data, the database having a structured part and an unstructured part, the database including a set of fields and a set of records, each of the fields being distinguished as comprising either structured data or unstructured data, the structured data fields having a predefined relationship to the structured data of each field, the structured part of the database including the structured data in structured data fields of records, and the unstructured part including unstructured data comprising textual data for unstructured data fields of the records, whereby some of the records include both structured data in structured data fields and textual data for unstructured data fields;
- a model for structured data in structured data fields of the structured part of the database, the model associating referents in the database with designating terms which each describe a business object, the referents each comprising or referring to one of the business objects, the model having been generated after providing the database; and
- a processor which:
  - identifies a plurality of candidate referring entities in the textual data of the unstructured data fields of the unstructured part of the database,
  - for each candidate referring entity, computes a similarity measure which includes:
    - comparing the candidate referring entity in the textual data with the model to identify referring entities of the candidate referring entities and corresponding objects to which the referring entities refer, and
    - comparing the candidate referring entity in the textual data with the structured data of the same record; and
  - based on the computed similarity measure, enriches the textual data for the unstructured data fields with information derived from the business objects for that record, the enrichment including annotating a free text entry in the database with information relating to a business object or referent.
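The system claim places the textual data and its annotations in the same database. A minimal sketch of that arrangement using Python's built-in `sqlite3` module follows, assuming a hypothetical `records` table with two structured columns and a free-text `note` column, plus an `annotations` table that stores each enrichment against its record. To keep the focus on storage, the similarity measure here is deliberately simple and is not taken from the source: the fraction of a referent's tokens found in the note, plus a fixed weight when the referent belongs to the same record. `make_demo_db`, `score` and `enrich_database` are hypothetical names.

```python
import re
import sqlite3

# Hypothetical schema: two structured columns, one free-text column,
# and a separate table that holds annotations of the free text.
SCHEMA = """
CREATE TABLE records (id INTEGER PRIMARY KEY, product TEXT, customer TEXT, note TEXT);
CREATE TABLE annotations (
    record_id INTEGER REFERENCES records(id),
    field TEXT, referent TEXT, score REAL
);
"""
STRUCTURED = ("product", "customer")  # hypothetical structured columns


def make_demo_db():
    """Create an in-memory database with two toy records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", [
        (1, "LaserJet 4200", "Acme", "The laserjet 4200 jams again, says Acme."),
        (2, "DeskJet 930", "Globex", "Globex reports streaks from the deskjet."),
    ])
    return conn


def tokens(text):
    """Lower-cased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(note_tokens, referent, same_record, record_weight=0.25):
    """Assumed measure: token coverage of the referent, plus a weight for same-record referents."""
    ref_tokens = tokens(referent)
    if not ref_tokens:
        return 0.0
    coverage = len(ref_tokens & note_tokens) / len(ref_tokens)
    return (1 - record_weight) * coverage + (record_weight if same_record else 0.0)


def enrich_database(conn, threshold=0.6):
    """Score every structured referent against every note and store matches as annotations."""
    rows = conn.execute("SELECT id, product, customer, note FROM records").fetchall()
    # Model of the structured part: each (field, cell value) pair is a referent.
    referents = {(field, row[i + 1]) for row in rows for i, field in enumerate(STRUCTURED)}
    for rec_id, *cells, note in rows:
        note_tokens = tokens(note)
        own = set(zip(STRUCTURED, cells))  # referents from this record's structured data
        for field, value in sorted(referents):
            s = score(note_tokens, value, (field, value) in own)
            if s >= threshold:
                conn.execute(
                    "INSERT INTO annotations VALUES (?, ?, ?, ?)",
                    (rec_id, field, value, round(s, 3)),
                )
    conn.commit()
```

Here "deskjet" alone covers only half of the tokens of "DeskJet 930", so record 2 receives that product annotation (score 0.625) only because of the same-record weight; the same partial mention in record 1 would score 0.375 and stay below the 0.6 threshold.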
Specification