
Modeling and extracting elements in semi-structured documents

  • US 10,114,906 B1
  • Filed: 07/31/2015
  • Issued: 10/30/2018
  • Est. Priority Date: 07/31/2015
  • Status: Active Grant
First Claim

1. A computer-implemented method for extracting a set of data elements from a representation of a semi-structured document based on a physics model describing the semi-structured document, the method comprising:

  • obtaining, by a computer system, a physics model of the semi-structured document, wherein the physics model comprises a set of relationships represented by physical objects connecting different data elements in the semi-structured document that describe relative positions of each data element in the set of data elements in the semi-structured document, wherein the physical objects describe a relationship between a first data element and a second data element in the semi-structured document as a function of a distance range between the first and second data elements and at least one of:

    an orientation relationship of the first and second data elements, or

    an angular displacement range between the first and second data elements, and

    wherein obtaining the physics model of the semi-structured document comprises:

    requesting, from a user via a graphical user interface (GUI) having one or more area selection tools for selecting data elements in the semi-structured document and one or more connector tools for defining relationships between data elements in the semi-structured document, information identifying a relationship between the first and second data elements in the semi-structured document; and

    obtaining, through the GUI, the set of relationships among the relative positions of the set of data elements in the semi-structured document from the user by receiving information defining the first and second data elements and information identifying a physical construct representing a relationship between the first and second data elements, wherein the physical construct comprises one or more of:

    a flexible component defining the distance range between the first and second elements as a function of a weight describing an amount of compression or stretching between the first and second elements, or

    a rigid component defining the distance range between the first and second elements as a substantially fixed distance between the first and second elements;

    extracting, by the computer systems, a set of data from the representation of the semi-structured document based on the physical model, wherein the representation of the semi-structured document comprises an electronic file, and wherein extracting the set of data comprises:

    identifying a probable range of positions in the representation of the semi-structured document at which the second data element is located, relative to a position of the first data element in the representation of the semi-structured document, based on the physical object connecting the first and second data elements in the physics model;

    identifying, within the probable range of positions, a position of the second data element in the representation of the semi-structured document;

    extracting data associated with the second data element from information located at the identified position in the representation of the semi-structured document;

    identifying, based on the identified position of the second data element and physical objects in the model describing relationships between the second data element and a plurality of other data elements, position information associated with the plurality of other data elements in the representation of the semi-structured document; and

    extracting data associated with each of the plurality of other data elements from information located, in the representation of the semi-structured document, at the position information associated with each of the plurality of other data elements; and

    upon extracting the set of data from the representation, inputting, by the computer systems, the extracted set of data to one or more applications without requiring manual input of the data into the one or more applications.
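
The extraction steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation; it only shows one way a connector with a distance range and an angular range could narrow down where a second element sits relative to a first, and how located elements can seed the search for further ones. Everything named here is hypothetical: the `Connector` class, its fields, the `extract` function, and in particular the rule that a flexible component's slack equals `rest_length * weight` (with `weight = 0` standing in for a rigid component). The document is assumed to be already reduced to a dictionary mapping `(x, y)` positions to text, with y increasing down the page.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Connector:
    """Hypothetical stand-in for a 'physical object' linking two data elements."""
    first: str            # label of the element whose position is already known
    second: str           # label of the element to locate
    rest_length: float    # nominal distance between the two elements
    angle_deg: float      # nominal direction from first to second
    angle_tol_deg: float  # allowed angular displacement on either side
    weight: float = 0.0   # assumption: 0 means rigid, larger means a looser spring

    def distance_range(self):
        # Assumed rule: the spring may compress or stretch by rest_length * weight.
        slack = self.rest_length * self.weight
        return self.rest_length - slack, self.rest_length + slack

    def nominal_target(self, origin):
        # Position the second element would occupy with no stretch or rotation.
        theta = math.radians(self.angle_deg)
        return (origin[0] + self.rest_length * math.cos(theta),
                origin[1] + self.rest_length * math.sin(theta))

    def admits(self, origin, candidate, eps=1e-6):
        # True when the candidate lies inside the probable range of positions.
        dx, dy = candidate[0] - origin[0], candidate[1] - origin[1]
        lo, hi = self.distance_range()
        if not (lo - eps <= math.hypot(dx, dy) <= hi + eps):
            return False
        angle = math.degrees(math.atan2(dy, dx))
        diff = abs((angle - self.angle_deg + 180.0) % 360.0 - 180.0)
        return diff <= self.angle_tol_deg + eps


def extract(tokens, anchor_label, anchor_pos, connectors):
    """Walk the connectors outward from one known element and collect values."""
    positions = {anchor_label: anchor_pos}
    values = {}
    pending = list(connectors)
    progressed = True
    while pending and progressed:
        progressed = False
        for conn in list(pending):
            if conn.first not in positions:
                continue  # the first element has not been located yet
            pending.remove(conn)
            progressed = True
            origin = positions[conn.first]
            hits = [p for p in tokens if p != origin and conn.admits(origin, p)]
            if hits:
                # Among admissible positions, prefer the one nearest the nominal spot.
                target = conn.nominal_target(origin)
                best = min(hits, key=lambda p: math.dist(p, target))
                positions[conn.second] = best
                values[conn.second] = tokens[best]
    return values


# Made-up page content: positions and strings are illustrative only.
tokens = {
    (0, 0): "Total:",
    (100, 0): "$42.00",
    (0, 50): "Date:",
    (104, 50): "2015-07-31",
    (100, 200): "Page 1",
}
model = [
    Connector("total_label", "total_value", 100, 0, 5),                # rigid
    Connector("total_value", "date_value", 50, 90, 10, weight=0.2),    # flexible
]
values = extract(tokens, "total_label", (0, 0), model)
print(values)
```

In this made-up example the rigid connector locates the total directly to the right of its label, and the flexible connector then locates the date roughly below the total even though it is offset by a few units. The returned dictionary is what a caller could hand to a downstream application in place of manually keyed data.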
