Reference resolution for text enrichment and normalization in mining mixed data

  • US 8,595,245 B2
  • Filed: 07/26/2006
  • Issued: 11/26/2013
  • Est. Priority Date: 07/26/2006
  • Status: Expired due to Fees
First Claim

1. A method for enrichment of text comprising:

  • providing a database in which defined hierarchical relationships exist between different parts of the data, the database having a structured part and an unstructured part, the database including a set of fields and a set of records, each of the fields being distinguished as comprising either structured data or unstructured data, the structured data fields having a predefined relationship to the structured data of each field, the structured part of the database including the structured data in structured data fields of records, and the unstructured part including unstructured data comprising textual data for unstructured data fields of the records, whereby some of records include both structured data in structured data fields and textual data for unstructured data fields;

    after providing the database, generating a model for the structured data in the structured data fields of the structured part of the database, the generating comprising associating referents in the database with designating terms which each describe a business object, the referents each comprising or referring to one of the business objects;

    identifying a plurality of candidate referring entities in the textual data of the unstructured data fields of the unstructured part of the provided database;

    for each candidate referring entity, computing a similarity measure which includes;

    comparing the candidate referring entity in the textual data with the model to identify referring entities of the candidate referring entities and corresponding objects to which the referring entities refer, and

    comparing the candidate referring entity in the textual data with the structured data of the same record; and

    based on the computed similarity measure, enriching the textual data for the unstructured data fields with information derived from the business objects for that record, the enrichment including annotating a free text entry in the database with information relating to a business object or referent.
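
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Record` class, the function names, the word-level tokenizer, the use of `difflib.SequenceMatcher` as the similarity measure, and the `threshold` and `same_record_bonus` values are all assumptions chosen for illustration. The sketch builds a model that maps designating terms (structured field values) to referents, extracts candidate referring entities from the free text, scores each candidate against the model and against the structured data of the same record, and annotates the text when the score is high enough.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
import re


@dataclass
class Record:
    """One database record: structured fields plus a free-text field (hypothetical layout)."""
    structured: dict
    text: str
    annotations: list = field(default_factory=list)


def build_model(records):
    """Map each designating term (lowercased structured value) to its referent (field, value)."""
    model = {}
    for rec in records:
        for fld, value in rec.structured.items():
            model[value.lower()] = (fld, value)
    return model


def candidates(text):
    """Assumed candidate extraction: every alphanumeric token in the free text."""
    return re.findall(r"[A-Za-z0-9]+", text)


def similarity(candidate, term):
    """Assumed string similarity in [0, 1]; the claim does not name a specific measure."""
    return SequenceMatcher(None, candidate.lower(), term.lower()).ratio()


def enrich(records, threshold=0.8, same_record_bonus=0.1):
    """Annotate each record's text with the best-matching referent per candidate."""
    model = build_model(records)
    for rec in records:
        own_terms = {v.lower() for v in rec.structured.values()}
        for cand in candidates(rec.text):
            best_score, best_referent = 0.0, None
            for term, referent in model.items():
                # Comparison with the model of the structured data.
                score = similarity(cand, term)
                # Comparison with the structured data of the same record:
                # a term present in this record's own fields is favored.
                if term in own_terms:
                    score = min(1.0, score + same_record_bonus)
                if score > best_score:
                    best_score, best_referent = score, referent
            if best_referent is not None and best_score >= threshold:
                rec.annotations.append((cand, best_referent, best_score))
    return records
```

For example, a record with the structured value `alternator` and the free text `altenator fails` would, under these assumptions, have the misspelled token `altenator` annotated with the referent `("part", "alternator")`, while unrelated tokens such as `fails` fall below the threshold and are left alone.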
