NATURAL LANGUAGE PROCESSING-ASSISTED EXTRACT, TRANSFORM, AND LOAD TECHNIQUES
First Claim
1. A method for mapping fields of an input document structured according to a first format, the method comprising:
- identifying a plurality of first fields in the input document, wherein each first field includes a descriptor and text content associated with the descriptor; and
- for each first field:
  - evaluating, via one or more natural language processing techniques, semantic properties of the descriptor and the text content against a plurality of mapping rules to determine whether the first field is consistent with one of a plurality of second fields in a target format, wherein each mapping rule specifies characteristics associated with one of the second fields in the target format, and
  - upon determining that the first field is consistent with one of the second fields, defining a mapping from the first field to the second field determined to be consistent with the first field.
Abstract
Embodiments presented herein disclose techniques for transforming input documents having disparate formats into a normalized format (e.g., Atom, RSS, HTML, or customized XML). According to one embodiment, a plurality of fields is identified in an input document that has a given format. Each field includes a descriptor and text content associated with the descriptor. For each field, semantic properties of the descriptor and text content are evaluated against a plurality of mapping rules to determine whether the field is consistent with one of a plurality of fields of a target format. Each mapping rule specifies characteristics associated with one of the fields in the target format. Upon determining that a field of the input document is consistent with a field of the target format, a mapping from the former to the latter is defined.
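The mapping flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the text here does not say which natural language processing techniques are used, so simple descriptor tokenization and an optional regular-expression check on the text content stand in for the semantic evaluation. The names `MappingRule`, `tokenize`, `is_consistent`, and `map_fields`, the rule contents, and the sample document are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class MappingRule:
    """Hypothetical rule: characteristics associated with one target-format field."""
    target_field: str                      # field name in the target format
    descriptor_terms: set                  # terms expected in a matching descriptor
    content_pattern: Optional[str] = None  # optional regex the text content must match


def tokenize(text):
    """Stand-in for NLP analysis: lowercase alphabetic tokens of a descriptor."""
    return set(re.findall(r"[a-z]+", text.lower()))


def is_consistent(descriptor, content, rule):
    """Check one input field (descriptor + text content) against one rule."""
    if not tokenize(descriptor) & rule.descriptor_terms:
        return False
    if rule.content_pattern is not None:
        return re.search(rule.content_pattern, content) is not None
    return True


def map_fields(input_fields, rules):
    """Define a mapping for each input field consistent with a target field."""
    mapping = {}
    for descriptor, content in input_fields.items():
        for rule in rules:
            if is_consistent(descriptor, content, rule):
                mapping[descriptor] = rule.target_field
                break  # first consistent rule wins in this sketch
    return mapping


# Assumed rules for an Atom-like target format (illustrative only).
RULES = [
    MappingRule("title", {"title", "headline", "subject"}),
    MappingRule("updated", {"date", "published", "updated", "modified"},
                r"\d{4}-\d{2}-\d{2}"),
    MappingRule("author", {"author", "byline", "creator"}),
]

# Assumed input document: descriptor -> text content.
doc = {
    "Headline": "Storm closes coastal roads",
    "pub_date": "2015-03-02",
    "Byline": "J. Smith",
    "internal_id": "8841",
}

mapping = map_fields(doc, RULES)
print(mapping)
```

In this sketch a field with no consistent rule (here `internal_id`) is simply left unmapped; how unmatched fields are handled in practice is not stated in the text above.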
30 Citations
7 Claims
Specification