Deep learning approach to grammatical correction for incomplete parses
First Claim
1. A method, comprising:
determining, by a parser executing on a processor, that a parse of an input string comprising a plurality of tokens is incomplete;
generating, based on a machine learning (ML) model:
(i) a plurality of candidate addition tokens for adding to the input string, and (ii) a plurality of candidate removal tokens for removing from the input string, comprising, for a first token of the plurality of tokens:
identifying a second token of the plurality of tokens, wherein the second token is immediately subsequent to the first token in the input string;
processing the first and second tokens using the ML model to generate a potential new token to be inserted between the first and second tokens without removing either the first or second token from the input string;
identifying a third token of the plurality of tokens, wherein the third token is immediately subsequent to the second token in the input string; and
processing the first and third tokens using the ML model to generate a potential removal token indicating a confidence that the second token should be removed from the input string;
selecting, from the plurality of candidate addition tokens and the plurality of candidate removal tokens, a first candidate token; and
modifying the input string based on the first candidate token to facilitate a complete parse of the modified input string by the parser.
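The generating step of claim 1 can be read as a sliding window over the tokens: each adjacent pair yields a candidate addition, and each token flanked by two neighbours yields a candidate removal scored from those neighbours. The sketch below is only an illustration under stated assumptions. The claim does not specify the model interface, so `suggest_insertion` and `removal_confidence` are hypothetical callables standing in for the ML model, and the `Candidate` record is likewise invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    kind: str          # "add" or "remove"
    index: int         # "add": position to insert at; "remove": position of the token to drop
    token: str
    confidence: float


def generate_candidates(
    tokens: List[str],
    suggest_insertion: Callable[[str, str], Tuple[str, float]],  # hypothetical model call
    removal_confidence: Callable[[str, str], float],             # hypothetical model call
) -> Tuple[List[Candidate], List[Candidate]]:
    """Return (candidate additions, candidate removals) for a token list."""
    additions: List[Candidate] = []
    removals: List[Candidate] = []
    for i in range(len(tokens) - 1):
        first, second = tokens[i], tokens[i + 1]
        # First and second tokens: propose a new token to insert between them.
        new_token, conf = suggest_insertion(first, second)
        additions.append(Candidate("add", i + 1, new_token, conf))
        if i + 2 < len(tokens):
            third = tokens[i + 2]
            # First and third tokens: score whether the second token should be removed.
            removals.append(
                Candidate("remove", i + 1, second, removal_confidence(first, third))
            )
    return additions, removals


# Toy stand-ins for the model, for illustration only.
def toy_suggest(left: str, right: str) -> Tuple[str, float]:
    return ("to", 0.8) if (left, right) == ("want", "go") else ("<none>", 0.0)


def toy_removal(left: str, right: str) -> float:
    return 0.9 if (left, right) == ("go", "home") else 0.1


additions, removals = generate_candidates(
    ["I", "want", "go", "go", "home"], toy_suggest, toy_removal
)
```

With these toy stand-ins, the highest-confidence removal targets the duplicated "go" and the highest-confidence addition proposes "to" between "want" and "go".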
1 Assignment
0 Petitions
Abstract
Performing an operation comprising determining that a parse of an input string comprising a plurality of tokens is incomplete, generating, based on a machine learning (ML) model: (i) a plurality of candidate addition tokens for adding to the input string, and (ii) a plurality of candidate removal tokens for removing from the input string, selecting, from the plurality of candidate addition tokens and the plurality of candidate removal tokens, a first candidate token, and modifying the input string based on the first candidate token to facilitate a complete parse of the modified input string by a parser.
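The select-and-modify portion of the operation can be sketched as follows. This is a hedged illustration, not the patented implementation: the `Edit` record, the `parse_is_complete` callable, and the policy of trying candidates in descending confidence until the parse completes are all assumptions made for the example. The abstract itself only states that a first candidate token is selected and the input string is modified based on it.

```python
from typing import Callable, List, NamedTuple, Optional, Sequence


class Edit(NamedTuple):
    kind: str          # "add" or "remove"
    index: int         # insertion position, or position of the token to remove
    token: str
    confidence: float


def apply_edit(tokens: List[str], edit: Edit) -> List[str]:
    """Return a new token list with the edit applied."""
    if edit.kind == "add":
        return tokens[:edit.index] + [edit.token] + tokens[edit.index:]
    return tokens[:edit.index] + tokens[edit.index + 1:]


def repair(
    tokens: Sequence[str],
    parse_is_complete: Callable[[List[str]], bool],  # hypothetical parser check
    additions: Sequence[Edit],
    removals: Sequence[Edit],
) -> Optional[List[str]]:
    """Try candidates from most to least confident; return the first
    modified token list that parses completely, or None if none does."""
    if parse_is_complete(list(tokens)):
        return list(tokens)
    ranked = sorted([*additions, *removals], key=lambda e: e.confidence, reverse=True)
    for edit in ranked:
        modified = apply_edit(list(tokens), edit)
        if parse_is_complete(modified):
            return modified
    return None


# Toy parser for illustration: only one token sequence counts as a complete parse.
def toy_parser(tokens: List[str]) -> bool:
    return tokens == ["the", "cat", "sat"]


fixed_by_removal = repair(
    ["the", "cat", "cat", "sat"],
    toy_parser,
    additions=[Edit("add", 1, "big", 0.4)],
    removals=[Edit("remove", 2, "cat", 0.9)],
)
fixed_by_addition = repair(
    ["the", "sat"],
    toy_parser,
    additions=[Edit("add", 1, "cat", 0.7)],
    removals=[],
)
```

Re-running the parser on each modified string keeps the parser as the final judge of completeness, so a confident but unhelpful candidate is simply skipped.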
25 Citations
20 Claims
1. A method, comprising:
determining, by a parser executing on a processor, that a parse of an input string comprising a plurality of tokens is incomplete;
generating, based on a machine learning (ML) model:
(i) a plurality of candidate addition tokens for adding to the input string, and (ii) a plurality of candidate removal tokens for removing from the input string, comprising, for a first token of the plurality of tokens:
identifying a second token of the plurality of tokens, wherein the second token is immediately subsequent to the first token in the input string;
processing the first and second tokens using the ML model to generate a potential new token to be inserted between the first and second tokens without removing either the first or second token from the input string;
identifying a third token of the plurality of tokens, wherein the third token is immediately subsequent to the second token in the input string; and
processing the first and third tokens using the ML model to generate a potential removal token indicating a confidence that the second token should be removed from the input string;
selecting, from the plurality of candidate addition tokens and the plurality of candidate removal tokens, a first candidate token; and
modifying the input string based on the first candidate token to facilitate a complete parse of the modified input string by the parser.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer program product, comprising:
a non-transitory computer-readable storage medium having computer readable program code embodied therewith, the computer readable program code executable by a processor to perform an operation comprising:
determining that a parse of an input string comprising a plurality of tokens is incomplete;
generating, based on a machine learning (ML) model:
(i) a plurality of candidate addition tokens for adding to the input string, and (ii) a plurality of candidate removal tokens for removing from the input string comprising, for a first token of the plurality of tokens:
identifying a second token of the plurality of tokens, wherein the second token is immediately subsequent to the first token in the input string;
processing the first and second tokens using the ML model to generate a potential new token to be inserted between the first and second tokens without removing either the first or second token from the input string;
identifying a third token of the plurality of tokens, wherein the third token is immediately subsequent to the second token in the input string; and
processing the first and third tokens using the ML model to generate a potential removal token indicating a confidence that the second token should be removed from the input string;
selecting, from the plurality of candidate addition tokens and the plurality of candidate removal tokens, a first candidate token; and
modifying the input string based on the first candidate token to facilitate a complete parse of the modified input string by a parser.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A system, comprising:
a processor; and
a memory storing one or more instructions which, when executed by the processor, performs an operation comprising:
determining that a parse of an input string comprising a plurality of tokens is incomplete;
generating, based on a machine learning (ML) model:
(i) a plurality of candidate addition tokens for adding to the input string, and (ii) a plurality of candidate removal tokens for removing from the input string comprising, for a first token of the plurality of tokens:
identifying a second token of the plurality of tokens, wherein the second token is immediately subsequent to the first token in the input string;
processing the first and second tokens using the ML model to generate a potential new token to be inserted between the first and second tokens without removing either the first or second token from the input string;
identifying a third token of the plurality of tokens, wherein the third token is immediately subsequent to the second token in the input string; and
processing the first and third tokens using the ML model to generate a potential removal token indicating a confidence that the second token should be removed from the input string;
selecting, from the plurality of candidate addition tokens and the plurality of candidate removal tokens, a first candidate token; and
modifying the input string based on the first candidate token to facilitate a complete parse of the modified input string by a parser.
Dependent claims: 16, 17, 18, 19, 20
Specification