
Lean parsing: a natural language processing system and method for parsing domain-specific languages

  • US 10,579,721 B2
  • Filed: 09/22/2017
  • Issued: 03/03/2020
  • Est. Priority Date: 07/15/2016
  • Status: Active Grant
First Claim

1. A computer implemented method, comprising:

    receiving electronic textual data relating to a form for which one or more machine-executable form field functions need to be determined, the electronic textual data including natural language instructions relating to the determination of one or more form field values of the form;

    analyzing the electronic textual data to determine sentence data representing a plurality of separate sentences of the electronic textual data;

    separating the electronic textual data into a data array formed of the sentence data of the determined plurality of separate sentences;

    for each given sentence of sentence data representing sentences in the data array:

    isolating segment data of one or more segments of the sentence data while relating each resulting segment to prior and succeeding segments of the sentence data, further storing the isolated segment data in one or more segment data memory locations organized to retain structure and relation of one segment to another;

    for each segment of the segment data:

    classifying segment data of each segment as being of a segment type of a plurality of possible segment types, discarding segment data classified as being of one or more particular predetermined segment types; and

    parsing each segment data according to one or more predetermined lexicons and determining whether the segment contains one or more operators, an operator being a natural language token representing an operation that may be performed on data;

    upon determining that the segment data representing the segment contains operator data representing one or more operators:

    identifying all operators in the segment data representing the segment;

    identifying dependency data representing one or more dependencies of the segment data associated with each identified operator;

    discarding any tokens not identified as either an operator or a dependency; and

    applying one or more operator-specific rules to each identified operator of the segment data to determine a first predicate structure equivalent to the original natural language text of the segment; and

    upon determining that the segment data representing the segment does not contain operator data representing one or more operators:

    identifying each single or multiword token in the segment data that is a predetermined token of the domain;

    determining any remaining tokens of the segment that are not predetermined tokens of the domain and mapping the identified tokens and the remaining tokens to one or more predetermined rules, resulting in a first predicate structure for the segment data of the segment being analyzed;

    mapping one or more of the first predicate structures to one or more predetermined machine-executable functions; and

    implementing at least one of the mapped machine-executable functions in an electronic document preparation system.
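The claimed flow can be illustrated with a toy pipeline. The sketch below is a minimal, hypothetical illustration under stated assumptions, not the patented implementation: the regex sentence splitter, the comma/semicolon segmenter, the "see ..." reference classifier, the lexicons OPERATORS and DOMAIN_TOKENS, the predicate tuple format, the FUNCTIONS table and the run executor are all invented for this example. It mirrors the claim's order of steps: split text into sentences, isolate linked segments, discard segments of a predetermined type, take the operator or non-operator branch to build a predicate, then map predicates to executable functions and run them against form field values.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical lexicons for a toy tax-form domain (not taken from the patent).
OPERATORS = {"add": "ADD", "subtract": "SUB", "enter": "ENTER"}
DOMAIN_TOKENS = ["total income", "line 1", "line 2", "line 3"]
DISCARD_TYPES = {"reference"}  # hypothetical segment type with no computation

Predicate = Tuple[str, Tuple[str, ...]]

@dataclass
class Segment:
    text: str
    prev: Optional[int]  # index of the preceding segment in the sentence, if any
    next: Optional[int]  # index of the succeeding segment, if any

def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def isolate_segments(sentence: str) -> List[Segment]:
    # Hypothetical rule: clauses are delimited by commas or semicolons.
    parts = [p.strip() for p in re.split(r"[,;]", sentence.rstrip(".!?")) if p.strip()]
    n = len(parts)
    return [Segment(p, i - 1 if i > 0 else None, i + 1 if i < n - 1 else None)
            for i, p in enumerate(parts)]

def classify(seg: Segment) -> str:
    # Hypothetical classifier: cross-references ("see ...") are discardable.
    return "reference" if seg.text.lower().startswith("see ") else "instruction"

def tokenize(text: str) -> List[str]:
    # Greedy longest match for multiword domain tokens, else single words.
    words, tokens, i = text.lower().split(), [], 0
    multi = sorted((t.split() for t in DOMAIN_TOKENS), key=len, reverse=True)
    while i < len(words):
        for m in multi:
            if words[i:i + len(m)] == m:
                tokens.append(" ".join(m))
                i += len(m)
                break
        else:
            tokens.append(words[i])
            i += 1
    return tokens

def to_predicate(tokens: List[str]) -> Optional[Predicate]:
    ops = [OPERATORS[t] for t in tokens if t in OPERATORS]
    deps = tuple(t for t in tokens if t in DOMAIN_TOKENS)
    if ops:
        # Operator branch: every token that is neither an operator nor a
        # dependency is dropped. Simplification: one predicate per segment.
        return (ops[0], deps)
    # Non-operator branch: hypothetical rule "X is Y" -> ("ASSIGN", (X, Y)).
    rest = [t for t in tokens if t not in DOMAIN_TOKENS]
    if len(deps) == 2 and "is" in rest:
        return ("ASSIGN", deps)
    return None

def parse(text: str) -> List[Predicate]:
    predicates = []
    for sentence in split_sentences(text):
        for seg in isolate_segments(sentence):
            if classify(seg) in DISCARD_TYPES:
                continue  # discard segments of a predetermined type
            pred = to_predicate(tokenize(seg.text))
            if pred is not None:
                predicates.append(pred)
    return predicates

# Hypothetical mapping from predicate heads to executable form-field functions.
FUNCTIONS: Dict[str, Callable[[List[float]], float]] = {
    "ADD": lambda xs: sum(xs),
    "SUB": lambda xs: xs[0] - sum(xs[1:]),
}

def run(predicates: List[Predicate], form: Dict[str, float]) -> Dict[str, float]:
    form, result = dict(form), None  # work on a copy of the field values
    for head, args in predicates:
        if head in FUNCTIONS:
            result = FUNCTIONS[head]([form[a] for a in args])
        elif head == "ENTER" and result is not None:
            form[args[0]] = result  # write the last computed value
        elif head == "ASSIGN":
            form[args[1]] = form[args[0]]
    return form

if __name__ == "__main__":
    text = ("Add line 1 and line 2; enter the result on line 3. "
            "Line 3 is total income. See the instructions for details.")
    preds = parse(text)
    print(preds)
    # [('ADD', ('line 1', 'line 2')), ('ENTER', ('line 3',)), ('ASSIGN', ('line 3', 'total income'))]
    print(run(preds, {"line 1": 100.0, "line 2": 50.0}))
    # {'line 1': 100.0, 'line 2': 50.0, 'line 3': 150.0, 'total income': 150.0}
```

Returning one predicate per segment keeps the example short; the claim applies operator-specific rules to each identified operator, so a fuller version would emit one predicate per operator and use the stored prev/next links to resolve references such as "the result".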
