Systems and methods for natural language processing including morphological analysis, lemmatizing, spell checking and grammar checking
First Claim
1. A system comprising at least one computer configured to form:
- a linguistic knowledgebase (LKB) for a natural language, the LKB comprising a set of computer-readable lexicon declarations, a set of computer-readable inflected form declarations, and a set of computer-readable syntax rule declarations;
- a computer-implemented word retriever connected to the LKB and configured to:
  receive a first word,
  perform a lookup of an inflected form declaration of the first word in the LKB,
  in response to performing the lookup of the inflected form declaration, perform a lookup of a lexicon declaration of the first word in the LKB,
  determine a first word interpretation of the first word according to the lexicon declaration and the inflected form declaration, the first word interpretation comprising a lemma of the first word and an inflection indicator of the first word;
- a computer-implemented form generator connected to the word retriever and configured to:
  receive a second word not necessarily distinct from the first word,
  produce a first set of words, each word of the first set of words having a predetermined spelling similarity to the second word, and
  for each word of the first set of words, receive from the word retriever a second word interpretation of said each word of the first set of words;
- a computer-implemented synthetic annotator connected to the word retriever and configured to:
  receive a word sequence,
  for each word of the word sequence, receive from the word retriever a third word interpretation of said each word of the word sequence, and
  determine a synthetic annotation of the word sequence, the synthetic annotation comprising the third word interpretation of said each word of the word sequence; and
- a computer-implemented syntax checker connected to the synthetic annotator and configured to:
  receive the synthetic annotation from the synthetic annotator,
  perform a lookup of a syntax rule declaration of the word sequence in the LKB according to the synthetic annotation, and
  perform a syntactic analysis of the word sequence according to the syntax rule declaration, to determine a synthetic dependency tree of the word sequence.
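The two-step lookup recited for the word retriever (inflected form declaration first, then lexicon declaration) can be illustrated with a minimal Python sketch. The dictionaries, the `WordInterpretation` class, and the `retrieve` function are hypothetical names chosen for illustration; the claim does not specify any data layout, and the toy English entries are invented.

```python
from dataclasses import dataclass

# Hypothetical toy LKB. The claim only requires "declarations";
# plain dictionaries are an assumption made for this sketch.
INFLECTED_FORMS = {
    # surface form -> list of (lexicon key, inflection indicator)
    "cat": [("cat", "singular")],
    "cats": [("cat", "plural")],
    "sleep": [("sleep", "base form")],
    "sleeps": [("sleep", "3rd-person singular present")],
}
LEXICON = {
    # lexicon key -> lexicon declaration
    "cat": {"lemma": "cat", "pos": "noun"},
    "sleep": {"lemma": "sleep", "pos": "verb"},
}


@dataclass(frozen=True)
class WordInterpretation:
    lemma: str
    pos: str
    inflection: str


def retrieve(word):
    """Return every interpretation of `word`; an empty list if it is unknown."""
    interpretations = []
    # Step 1: look up the inflected form declaration(s) of the word.
    for lexicon_key, inflection in INFLECTED_FORMS.get(word.lower(), []):
        # Step 2: in response, look up the matching lexicon declaration.
        entry = LEXICON.get(lexicon_key)
        if entry is not None:
            interpretations.append(
                WordInterpretation(entry["lemma"], entry["pos"], inflection)
            )
    return interpretations
```

Under these assumptions, `retrieve("cats")` yields one interpretation carrying the lemma `cat` and the inflection indicator `plural`, while an unknown word yields an empty list.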
Abstract
In some embodiments, a linguistic application exploits a linguistic knowledgebase (LKB) including, among others, lexicon data, inflection form data, and syntax data for a natural language such as English or Romanian. The application employs a set of modules including a word retriever, a form generator, and a syntax checker, which are interconnected to perform a number of higher-level text-processing operations such as synthetic and analytic annotation, lemmatizing, spell checking, and grammar checking.
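As a rough illustration of how a form generator could support spell checking, the Python sketch below produces spelling variants of a word and keeps those that a word lookup recognizes. It assumes that "predetermined spelling similarity" means a single-character edit (deletion, replacement, insertion, or transposition); the source does not define the similarity measure, and `KNOWN_WORDS`, `similar_spellings`, and `suggest` are hypothetical names with invented sample data.

```python
import string

# Hypothetical stand-in for the word retriever: surface form -> lemma.
KNOWN_WORDS = {"cat": "cat", "cats": "cat", "cart": "cart", "sleeps": "sleep"}


def similar_spellings(word):
    """Return all strings one edit away from `word` (assumed similarity measure)."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    replaces = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    return (deletes | replaces | inserts | transposes) - {word}


def suggest(word):
    """Keep only the variants for which the lookup returns an interpretation."""
    return sorted(c for c in similar_spellings(word) if c in KNOWN_WORDS)
```

With this toy data, `suggest("cta")` returns `["cat"]`, since transposing the last two letters gives a known form.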
19 Claims
1. A system comprising at least one computer configured to form the linguistic knowledgebase, word retriever, form generator, synthetic annotator, and syntax checker recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6
7. A method comprising:
- employing a computer-implemented word retriever to:
  receive a first word,
  perform a lookup of an inflected form declaration of the first word in a linguistic knowledgebase (LKB) of a natural language, the LKB comprising a set of computer-readable lexicon declarations, a set of computer-readable inflected form declarations, and a set of computer-readable syntax rule declarations,
  in response to performing the lookup of the inflected form declaration, perform a lookup of a lexicon declaration of the first word in the LKB,
  determine a first word interpretation of the first word according to the lexicon declaration and the inflected form declaration, the first word interpretation comprising a lemma of the first word and an inflection indicator of the first word;
- employing a computer-implemented form generator connected to the word retriever to:
  receive a second word not necessarily distinct from the first word,
  produce a first set of words, each word of the first set of words having a predetermined spelling similarity to the second word, and
  for each word of the first set of words, receive from the word retriever a second word interpretation of said each word of the first set of words;
- employing a computer-implemented synthetic annotator connected to the word retriever to:
  receive a word sequence,
  for each word of the word sequence, receive from the word retriever a third word interpretation of said each word of the word sequence, and
  determine a synthetic annotation of the word sequence, the synthetic annotation comprising the third word interpretation of said each word of the word sequence; and
- employing a computer-implemented syntax checker connected to the synthetic annotator and configured to:
  receive the synthetic annotation from the synthetic annotator,
  perform a lookup of a syntax rule declaration of the word sequence in the LKB according to the synthetic annotation, and
  perform a syntactic analysis of the word sequence according to the syntax rule declaration, to determine a synthetic dependency tree of the word sequence.
Dependent claims: 8, 9, 10, 11, 12, 13
14. A non-transitory computer-readable medium encoding instructions which, when executed by a computer system comprising at least one computer, cause the computer system to form:
- a linguistic knowledgebase (LKB) for a natural language, the LKB comprising a set of computer-readable lexicon declarations, a set of computer-readable inflected form declarations, and a set of computer-readable syntax rule declarations;
- a computer-implemented word retriever connected to the LKB and configured to:
  receive a first word,
  perform a lookup of an inflected form declaration of the first word in the LKB,
  in response to performing the lookup of the inflected form declaration, perform a lookup of a lexicon declaration of the first word in the LKB,
  determine a first word interpretation of the first word according to the lexicon declaration and the inflected form declaration, the first word interpretation comprising a lemma of the first word and an inflection indicator of the first word;
- a computer-implemented form generator connected to the word retriever and configured to:
  receive a second word not necessarily distinct from the first word,
  produce a first set of words, each word of the first set of words being a spelling variant of the second word having a predetermined spelling similarity to the second word, and
  for each word of the first set of words, receive from the word retriever a second word interpretation of said each word of the first set of words;
- a computer-implemented synthetic annotator connected to the word retriever and configured to:
  receive a word sequence,
  for each word of the word sequence, receive from the word retriever a third word interpretation of said each word of the word sequence, and
  determine a synthetic annotation of the word sequence, the synthetic annotation comprising the third word interpretation of said each word of the word sequence; and
- a computer-implemented syntax checker connected to the synthetic annotator and configured to:
  receive the synthetic annotation from the synthetic annotator,
  perform a lookup of a syntax rule declaration of the word sequence in the LKB according to the synthetic annotation, and
  perform a syntactic analysis of the word sequence according to the syntax rule declaration, to determine a synthetic dependency tree of the word sequence.
Dependent claims: 15, 16, 17, 18, 19
Specification