Processing of String Inputs Utilizing Machine Learning
Abstract
Natural language processing of raw text data for optimal sentence boundary placement. Raw text is extracted from a document and subjected to cleaning. The extracted raw text is examined to identify preliminary sentence boundaries, which are used to identify potential sentences in the raw text. One or more potential sentences are assigned a well-formedness score. The value of the score correlates with whether the potential sentence is truncated or ill-formed, or is well-formed. One or more preliminary sentence boundaries are optimized depending on the value of the score of the potential sentence(s). Accordingly, the processing produces an output with optimized sentence boundaries.
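The pipeline summarized in the abstract (cleaning, preliminary boundaries, well-formedness scoring, boundary optimization) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not specify a scoring model, a threshold, or a merge strategy, so `well_formedness`, `THRESHOLD`, `ABBREVIATIONS`, and the greedy merge in `optimize_boundaries` are hypothetical stand-ins rather than the patented method.

```python
import re

THRESHOLD = 0.6  # hypothetical cut-off between ill-formed and well-formed

# Abbreviations that often trigger false boundaries (illustrative list only).
ABBREVIATIONS = {"fig.", "dr.", "e.g.", "i.e.", "etc."}


def clean(raw):
    """Collapse runs of whitespace left over from document extraction."""
    return re.sub(r"\s+", " ", raw).strip()


def well_formedness(span):
    """Toy score in [0, 1]; a real system would likely use a trained model."""
    words = span.split()
    points = 0
    if span[0].isupper():
        points += 1
    if span[-1] in ".!?" and words[-1].lower() not in ABBREVIATIONS:
        points += 2
    if len(words) >= 3:
        points += 1
    return points / 4


def optimize_boundaries(raw):
    """Split at preliminary boundaries, then drop boundaries after low-scoring spans."""
    text = clean(raw)
    if not text:
        return []
    # Preliminary boundaries: whitespace that follows '.', '!' or '?'.
    spans = re.split(r"(?<=[.!?])\s+", text)
    sentences, buffer = [], spans[0]
    for nxt in spans[1:]:
        if well_formedness(buffer) < THRESHOLD:
            buffer = buffer + " " + nxt  # remove the boundary and merge the spans
        else:
            sentences.append(buffer)
            buffer = nxt
    sentences.append(buffer)
    return sentences
```

With the input `"The results are shown in Fig.\n3 for all runs.   The model converges."`, the span ending in "Fig." scores below the assumed threshold, so its boundary is removed and the sketch returns two sentences instead of three.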
18 Claims
1. A computer system comprising:
a processing unit in communication with a memory;
a functional unit in communication with the processing unit having tools for natural language processing, the tools to:
determine optimal sentence boundary placement within a received string input comprising:
identify two or more preliminary sentence boundaries within the input;
identify two or more potential sentences within the input utilizing the two or more preliminary sentence boundaries;
assign a first score to each identified potential sentence wherein the assigned first score corresponds to a probability of the potential sentence being an actual sentence;
selectively compare each identified potential sentence based on a relationship to the assigned first score to a first potential sentence in a first category and a second potential sentence in a second category;
selectively categorize each compared potential sentence based on the comparison into one of the first and second categories; and
transform the input into a sentence optimized output including modify at least one potential sentence utilizing the input, categorization, and at least one preliminary sentence boundary.
Dependent claims: 2, 3, 4, 5, 6
7. A computer program product for natural language processing, the computer program product comprising a computer readable storage medium having program code embodied therewith, the program code executable by a processor to:
determine optimal sentence boundary placement within a received string input comprising:
identify two or more preliminary sentence boundaries within the input;
identify two or more potential sentences within the input utilizing the two or more preliminary sentence boundaries;
assign a first score to each identified potential sentence wherein the assigned first score corresponds to a probability of the potential sentence being an actual sentence;
selectively compare each identified potential sentence based on a relationship to the assigned first score to a first potential sentence in a first category and a second potential sentence in a second category;
selectively categorize each compared potential sentence based on the comparison into one of the first and second categories; and
transform the input into a sentence optimized output including modify at least one potential sentence utilizing the input, categorization, and at least one preliminary sentence boundary.
Dependent claims: 8, 9, 10, 11, 12
13. A method for natural language processing comprising:
determining optimal sentence boundary placement within a received string input comprising:
identifying two or more preliminary sentence boundaries within the input;
identifying two or more potential sentences within the input utilizing the two or more preliminary sentence boundaries;
assigning a first score to each identified potential sentence wherein the assigned first score corresponds to a probability of the potential sentence being an actual sentence;
selectively comparing each identified potential sentence based on a relationship to the assigned first score to a first potential sentence in a first category and a second potential sentence in a second category;
selectively categorizing each compared potential sentence based on the comparison into one of the first and second categories; and
transforming the input into a sentence optimized output including modifying at least one potential sentence utilizing the input, categorization, and at least one preliminary sentence boundary.
Dependent claims: 14, 15, 16, 17, 18
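The scoring, comparing, and categorizing steps recited in the independent claims can be sketched roughly as follows. This is one possible reading, not the claimed implementation: the claims do not name the categories, fix any thresholds, or define the comparison. The labels "well-formed" and "ill-formed" are borrowed from the abstract, and `HIGH`, `LOW`, `first_score`, `features`, `distance`, and the choice of the highest- and lowest-scoring spans as category exemplars are all hypothetical.

```python
import re

# Hypothetical thresholds; the claims do not specify any values.
HIGH, LOW = 0.8, 0.4


def preliminary_boundaries(text):
    """Return the offset just after each '.', '!' or '?' (a naive rule)."""
    return [m.end() for m in re.finditer(r"[.!?]", text)]


def potential_sentences(text, boundaries):
    """Cut the input at each preliminary boundary, keeping non-empty pieces."""
    pieces, start = [], 0
    for end in boundaries:
        piece = text[start:end].strip()
        if piece:
            pieces.append(piece)
        start = end
    tail = text[start:].strip()
    if tail:
        pieces.append(tail)
    return pieces


def first_score(sentence):
    """Toy stand-in for the probability that a span is an actual sentence."""
    points = 0
    if sentence[0].isupper():
        points += 4
    if sentence[-1] in ".!?":
        points += 3
    if len(sentence.split()) >= 3:
        points += 3
    return points / 10


def features(sentence):
    """Assumed surface features used only for the comparison step."""
    return (int(sentence[0].isupper()),
            int(sentence[-1] in ".!?"),
            len(sentence.split()))


def distance(a, b):
    """Sum of absolute feature differences between two spans."""
    return sum(abs(x - y) for x, y in zip(features(a), features(b)))


def categorize(sentences):
    """Label clear cases by score; compare ambiguous ones to two exemplars."""
    if not sentences:
        return []
    scored = [(s, first_score(s)) for s in sentences]
    best = max(scored, key=lambda sp: sp[1])[0]   # exemplar of the first category
    worst = min(scored, key=lambda sp: sp[1])[0]  # exemplar of the second category
    result = []
    for s, p in scored:
        if p >= HIGH:
            label = "well-formed"
        elif p < LOW:
            label = "ill-formed"
        else:
            # Selective comparison: only ambiguous scores reach this branch.
            closer_to_best = distance(s, best) <= distance(s, worst)
            label = "well-formed" if closer_to_best else "ill-formed"
        result.append((s, p, label))
    return result
```

In this sketch the comparison runs only for spans whose score falls between the two thresholds, which is one way to read "selectively compare ... based on a relationship to the assigned first score". A transformation step would then modify or merge spans based on the resulting labels.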
Specification