Preprocessing of string inputs in natural language processing

  • US 10,372,816 B2
  • Filed: 12/13/2016
  • Issued: 08/06/2019
  • Est. Priority Date: 12/13/2016
  • Status: Active Grant
First Claim

1. A computer system comprising:

  • a processing unit in communication with a memory; and

    a functional unit in communication with the processing unit having a tool for natural language processing, the tool to:

    determine optimal sentence boundary placement with a received string input comprising:

    identify two or more preliminary sentence boundaries within the input;

    identify two or more first potential sentences within the input utilizing the two or more preliminary sentence boundaries;

    assign a first score to each first potential sentence, wherein each assigned first score corresponds to a probability of each potential sentence of the two or more first potential sentences being an actual sentence; and

    selectively identify a grouping comprising at least two adjacent potential sentences based on a relationship to the assigned first scores;

    categorize each of the two adjacent sentences as one of ill-formed prose (IFP) and semi-structure entity constructs (SSECs);

    upon determining there are IFPs for further processing:

    merge the at least two adjacent first potential sentences to create a second potential sentence; and

    iteratively assign a second score to the created second potential sentence and merge at least one additional sentence adjacent to the created second potential sentence until there are no further IFPs to process, any SSECs are normalized, and a sentence boundary optimized output is created as a function of the iteratively assigned second score; and

    output the sentence boundary optimized output to replace the adjacent first and second potential sentences.
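
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the boundary rule, the `score` heuristic, the `is_ssec` test, the 0.7 threshold, and every function name are assumptions chosen only to make the flow concrete. A real system would presumably use a trained model to estimate the probability that a candidate is an actual sentence.

```python
import re


def split_preliminary(text):
    """Place preliminary boundaries at line breaks and after . ! ? (assumed rule)."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [p.strip() for p in parts if p and p.strip()]


def score(candidate):
    """Toy stand-in for a sentence-probability model; returns a value in [0, 1]."""
    s = 0.0
    if candidate[:1].isupper():
        s += 0.4  # starts with a capital letter
    if candidate.endswith((".", "!", "?")):
        s += 0.4  # ends with terminal punctuation
    if len(candidate.split()) >= 3:
        s += 0.2  # has at least three tokens
    return s


def is_ssec(candidate):
    """Assumed heuristic: 'Key: value' fragments are SSECs, everything else is IFP."""
    return re.match(r"^[\w ]{1,30}:\s*\S", candidate) is not None


def normalize_ssec(candidate):
    """Assumed normalization: collapse runs of whitespace."""
    return re.sub(r"\s+", " ", candidate)


def optimize_boundaries(text, threshold=0.7):
    """Merge low-scoring adjacent IFP candidates; normalize SSECs and leave them unmerged."""
    candidates = split_preliminary(text)
    output = []
    i = 0
    while i < len(candidates):
        current = candidates[i]
        i += 1
        if is_ssec(current):
            output.append(normalize_ssec(current))
            continue
        # Re-score after every merge and keep absorbing the next adjacent
        # IFP candidate until the merged sentence scores high enough.
        while (score(current) < threshold
               and i < len(candidates)
               and not is_ssec(candidates[i])):
            current = current + " " + candidates[i]
            i += 1
        output.append(current)
    return output


text = "The patient was seen\non Monday.\nBP:   120/80\nShe is doing well."
result = optimize_boundaries(text)
print(result)
```

In this example the line break inside the first sentence yields two low-scoring fragments that are merged into one sentence, while the `BP:` line is treated as an SSEC, normalized, and kept separate. The merge here is greedy and forward-only, which is a simplification of the iterative scoring the claim describes.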
