Determining the likelihood that an input descriptor and associated text content match a target field using natural language processing techniques in preparation for an extract, transform and load process

  • US 10,120,844 B2
  • Filed: 10/23/2014
  • Issued: 11/06/2018
  • Est. Priority Date: 10/23/2014
  • Status: Active Grant
First Claim

1. A system, comprising:

  • a processor; and

    a memory storing a program, which, when executed on the processor, performs an operation for mapping fields of an input document structured according to a first format, the operation comprising:

    identifying a plurality of first fields in the input document, wherein each first field includes an input descriptor and a text content value included with the input descriptor;

    identifying a plurality of mapping rules wherein each mapping rule specifies characteristics associated with a target field in a target format, wherein the characteristics comprise a target descriptor and a lexical answer type identifying lexical traits to locate in the plurality of first fields of the input document;

    for each first field:

    evaluating, via one or more natural language processing techniques, semantic properties of the input descriptor against the plurality of mapping rules to determine whether the input descriptor is consistent with one of the target fields;

    evaluating, via one or more natural language processing techniques, semantic properties of the text content value against the plurality of mapping rules to determine whether the text content value is consistent with one of the target fields based on the lexical answer type associated with the target field, and wherein evaluating further comprises:

    determining, for each mapping rule, a descriptor score associated with the input descriptor and a content score associated with the text content value, the descriptor score and the content score indicating a likelihood that the respective input descriptor and text content value match the characteristics specified in the mapping rule; and

    converging the descriptor score and the content score into a consolidated score based on a weighting between the descriptor score and the content score specified by the associated mapping rule; and

    determining, based on evaluating the semantic properties of the input descriptor and the text content against the plurality of mapping rules, that the first field corresponds to a target field; and

    upon determining that the first field corresponds to the target field, defining a mapping from the first field to the corresponding target field.
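
The scoring steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names MappingRule, descriptor_score, content_score, consolidated_score and map_fields are hypothetical, string similarity from difflib stands in for the natural language processing of the input descriptor, a regular expression stands in for the lexical answer type, and the 0.6 threshold and the sample rules are assumptions made only for illustration.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class MappingRule:
    """Hypothetical mapping rule; all field names here are assumptions."""
    target_field: str         # name of the field in the target format
    target_descriptor: str    # descriptor text expected for that field
    answer_type: str          # regex standing in for a lexical answer type
    descriptor_weight: float  # weight of the descriptor score; content gets the rest


def descriptor_score(descriptor: str, rule: MappingRule) -> float:
    # Character-level similarity as a crude stand-in for semantic comparison.
    a, b = descriptor.lower(), rule.target_descriptor.lower()
    return SequenceMatcher(None, a, b).ratio()


def content_score(value: str, rule: MappingRule) -> float:
    # 1.0 when the whole value has the expected lexical shape, otherwise 0.0.
    return 1.0 if re.fullmatch(rule.answer_type, value.strip()) else 0.0


def consolidated_score(descriptor: str, value: str, rule: MappingRule) -> float:
    # Weighted combination of the two scores, with the weight taken from the rule.
    w = rule.descriptor_weight
    return w * descriptor_score(descriptor, rule) + (1.0 - w) * content_score(value, rule)


def map_fields(fields, rules, threshold=0.6):
    """Map each (descriptor, value) pair to its best-scoring target field,
    keeping only matches whose consolidated score reaches the threshold."""
    mapping = {}
    for descriptor, value in fields:
        best = max(rules, key=lambda r: consolidated_score(descriptor, value, r))
        if consolidated_score(descriptor, value, best) >= threshold:
            mapping[descriptor] = best.target_field
    return mapping


# Sample rules and input fields, invented for this sketch.
RULES = [
    MappingRule("birth_date", "date of birth", r"\d{2}/\d{2}/\d{4}", 0.5),
    MappingRule("zip_code", "zip code", r"\d{5}", 0.5),
]
FIELDS = [("Birth date", "10/23/2014"), ("Postal code", "90210"), ("Notes", "n/a")]
MAPPING = map_fields(FIELDS, RULES)
print(MAPPING)
```

With these sample rules, "Birth date" and "Postal code" are mapped to birth_date and zip_code respectively, while "Notes" stays unmapped because its content matches neither answer type and a descriptor match alone cannot reach the threshold at a weight of 0.5.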

  • 1 Assignment