Reference resolution for text enrichment and normalization in mining mixed data
First Claim
1. A method for enrichment of text comprising:
generating a model for structured data in a database, comprising associating referents in the database with designating terms which each describe an object, the referents each comprising or referring to one of the objects;
identifying a plurality of candidate referring entities in textual data in the database;
for each candidate referring entity, computing a similarity measure which includes comparing the candidate referring entity in the textual data with the model to identify referring entities of the candidate referring entities and corresponding objects to which the referring entities refer; and
enriching the textual data with information derived from the objects.
Abstract
A method for enrichment of text which enables mixed data mining includes generating a model for structured data found in tables of a database. In the model, semantically-linked terms are associated with referents, such as field names or cell content of the fields, of the structured data. The referents may be a business object or refer to a business object. A plurality of candidate referring entities in textual data in the database, such as chunks of free text, is identified. For each candidate referring entity, a similarity measure between the candidate referring entity in the textual data and the model is computed to identify referring entities of the candidate referring entities and corresponding business objects/referents to which the referring entities refer. The textual data is enriched with information derived from the business objects.
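The steps summarized in the abstract (build a model, identify candidates, score similarity, enrich) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the function names, the word n-gram candidate generator, the token-overlap (Jaccard) similarity measure, the 0.8 threshold, and the sample data.

```python
import re

def tokens(text):
    """Lowercase alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; an assumed stand-in for the similarity measure."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def build_model(structured_row, designating_terms):
    """Associate each referent (here: a cell value) with terms that describe it."""
    model = {}
    for field, value in structured_row.items():
        terms = {str(value)} | set(designating_terms.get(str(value), []))
        model[(field, str(value))] = terms
    return model

def candidate_entities(text, max_len=3):
    """Hypothetical candidate generator: all word n-grams up to max_len words."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return [" ".join(words[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(words) - n + 1)]

def enrich(text, model, threshold=0.8):
    """Annotate the text with the best-matching referent of each candidate."""
    annotations = []
    for candidate in candidate_entities(text):
        best_ref, best_score = None, 0.0
        for referent, terms in model.items():
            score = max(jaccard(candidate, term) for term in terms)
            if score > best_score:
                best_ref, best_score = referent, score
        if best_ref is not None and best_score >= threshold:
            field, value = best_ref
            annotations.append({"mention": candidate, "field": field,
                                "object": value, "score": best_score})
    return {"text": text, "annotations": annotations}

# Illustrative data only.
row = {"product": "LaserJet 4200", "part": "fuser"}
synonyms = {"fuser": ["fuser unit", "fusing assembly"]}
result = enrich("The fuser unit is noisy", build_model(row, synonyms))
```

In this sketch a candidate is kept only when its best score reaches the threshold, so loosely overlapping n-grams such as "the fuser unit" are discarded while exact matches on a designating term are annotated with the corresponding referent.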
23 Claims
1. A method for enrichment of text comprising:
generating a model for structured data in a database, comprising associating referents in the database with designating terms which each describe an object, the referents each comprising or referring to one of the objects;
identifying a plurality of candidate referring entities in textual data in the database;
for each candidate referring entity, computing a similarity measure which includes comparing the candidate referring entity in the textual data with the model to identify referring entities of the candidate referring entities and corresponding objects to which the referring entities refer; and
enriching the textual data with information derived from the objects.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A system comprising:
a database comprising records stored in memory which include records each comprising structured data arranged in fields of structured data and textual data in fields of textual data, at least some of the fields of structured data comprising referents which refer to business objects;
a processor which annotates the textual data with annotations which identify business objects referred to by the referents of the structured data by computing a similarity measure between textual chunks of the textual data and a business model associated with the structured data.
22. A method comprising:
associating designating terms with referents in fields of a database table, the database table comprising a plurality of records for which the fields include structured data, the referents each comprising or referring to a business object;
identifying candidate referring entities in portions of textual data, the textual data portions being in textual data fields of the database table or linked thereto whereby a textual data portion is associated with fewer than all of the plurality of records;
for each of a plurality of candidate referring entities, computing a similarity measure between the candidate referring entity and the designating terms associated with the referents of the same record of the database table; and
where the computed similarity measure exceeds a threshold, enriching the textual data with information derived from the business object for the referent.
Dependent claims: 23
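The record-scoped comparison recited in claim 22 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `notes` field name, the comma-based candidate splitter, the use of `difflib.SequenceMatcher` as the similarity measure, the 0.8 threshold, and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Character-level similarity in [0, 1]; an assumed stand-in measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def enrich_records(records, terms_by_referent, text_field="notes", threshold=0.8):
    """Compare each candidate only with referents of the same record."""
    enriched = []
    for record in records:
        # Referents of this record: the structured cell values (assumed strings).
        referents = [v for k, v in record.items() if k != text_field]
        # Hypothetical candidate identification: comma-separated text portions.
        candidates = [c.strip() for c in record[text_field].split(",") if c.strip()]
        annotations = []
        for candidate in candidates:
            for referent in referents:
                terms = [referent] + terms_by_referent.get(referent, [])
                score = max(similarity(candidate, term) for term in terms)
                if score > threshold:  # "exceeds a threshold"
                    annotations.append({"mention": candidate, "referent": referent})
        enriched.append({**record, "annotations": annotations})
    return enriched

# Illustrative data only.
terms = {"fuser": ["fusing assembly"]}
records = [
    {"part": "fuser", "notes": "fusing assembly, very noisy"},
    {"part": "toner", "notes": "fusing assembly"},
]
out = enrich_records(records, terms)
```

Because the designating terms are looked up only for referents in the same record, the mention "fusing assembly" is annotated in the first record (whose part is "fuser") but not in the second, where no referent of that record is described by the term.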
Specification