System for extracting information from a natural language text
First Claim
1. A method implemented by computer of extracting information from a natural-language text of words comprising identifying patterns, wherein the words of the text are encoded by comparing them, using a processor, with the contents of a predefined lexicon containing less than 1000 tool words, said tools being essentially constituted by articles, prepositions, conjunctions and verbal auxiliaries, and in that nominal groups are then identified by searching subsets of the resulting succession of encoded words to look for groups of encoded words that comply with predefined syntactical rules, wherein the words of the text are encoded by evaluating the grammatical function of each word by comparing each word with the contents of said lexicon of tool words, so as to identify the tool words in the text, the grammatical function of said tool words being predefined, and in that the grammatical functions of the other words, which are not recognized as being tool words, are deduced by comparing their locations relative to the words recognized as being tool words.
Abstract
In the method of extraction, the words of the text are encoded by comparing them with the contents of a lexicon of tool words (essentially articles, prepositions, conjunctions, and verbal auxiliaries), and nominal groups are then identified by searching subsets of the resulting succession of encoded words to look for groups of encoded words that comply with predefined syntactical rules.
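The two steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the tiny lexicon, the one-letter codes (A, P, C, V, X), the single syntactical rule, and the function names `encode` and `nominal_groups` are all assumptions made for illustration.

```python
import re

# Hypothetical miniature lexicon of tool words mapped to one-letter codes:
# A = article, P = preposition, C = conjunction, V = verbal auxiliary.
# The patent only states that the real lexicon holds fewer than 1000 entries.
TOOL_WORDS = {
    "the": "A", "a": "A", "an": "A",
    "of": "P", "in": "P", "on": "P", "with": "P",
    "and": "C", "or": "C",
    "is": "V", "are": "V", "has": "V", "have": "V",
}

# Assumed syntactical rule, expressed over the string of codes: an optional
# article, one or more non-tool words (X), optionally extended by
# preposition + optional article + non-tool words.
NOMINAL_GROUP = re.compile(r"A?X+(?:PA?X+)*")


def encode(text):
    """Pair each word with its lexicon code, or 'X' if it is not a tool word."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    return [(word, TOOL_WORDS.get(word, "X")) for word in words]


def nominal_groups(text):
    """Return the word groups whose code sequence matches the assumed rule."""
    encoded = encode(text)
    codes = "".join(code for _, code in encoded)
    # Each code is one character, so match offsets are also word indices.
    return [
        " ".join(word for word, _ in encoded[m.start():m.end()])
        for m in NOMINAL_GROUP.finditer(codes)
    ]


if __name__ == "__main__":
    print(nominal_groups("the analysis of the text and the lexicon of tool words"))
```

Because only the tool words are looked up, the sketch needs no dictionary of nouns or verbs. The price of this simplification is that every non-tool word is treated alike here, a limitation that the position-based deduction recited in the claims is meant to address.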
11 Claims
- 4. A system for extracting information from a natural-language text, said system comprising:
  an input unit for receiving said natural-language text;
  a lexicon file in which less than 1000 tool words with predefined grammatical functions are recorded, said tool words being essentially constituted by articles, prepositions, conjunctions and verbal auxiliaries;
  an analysis processor connected to said input unit, and to the lexicon file, and organized to act in a first stage to encode the words of the natural-language text by evaluating the grammatical function of each word by comparing each word with the contents of said lexicon file of tool words, so as to identify the tool words in the text and so as to evaluate the functions of the other words which are not recognized as being tool words, by comparing their locations relative to the locations of the words recognized as being tool words, and, in a second stage, to search subsets of the resulting succession of encoded words to look for groups of encoded words that comply with predefined syntactical rules, so as to identify nominal groups; and
  an output unit connected to said analysis processor for receiving the groups of encoded words recognized as being syntactical patterns.
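The two stages attributed to the analysis processor can be sketched as follows. This is a hedged illustration only: the `LEXICON` contents, the code names (ART, PREP, CONJ, AUX, VERB, NOM), the positional rule (a non-tool word directly after a verbal auxiliary is taken to be a verb, any other non-tool word is taken to be nominal), and the nominal-group rule (optional article followed by nominal words) are assumptions, not rules disclosed in the claims.

```python
# Hypothetical stand-in for the lexicon file of tool words.
LEXICON = {
    "the": "ART", "a": "ART", "an": "ART",
    "of": "PREP", "with": "PREP", "on": "PREP",
    "and": "CONJ", "or": "CONJ",
    "is": "AUX", "are": "AUX", "has": "AUX",
}


def first_stage(words):
    """Encode tool words from the lexicon, then deduce the rest by position."""
    codes = [LEXICON.get(word) for word in words]
    for i, code in enumerate(codes):
        if code is not None:
            continue  # tool word: its function is predefined
        previous = codes[i - 1] if i > 0 else None
        # Assumed positional rule: a non-tool word right after an auxiliary
        # is treated as a verb; any other non-tool word as nominal.
        codes[i] = "VERB" if previous == "AUX" else "NOM"
    return codes


def second_stage(words, codes):
    """Collect groups matching the assumed rule: optional ART, then NOM+."""
    groups = []
    i = 0
    while i < len(words):
        start = i
        j = i + 1 if codes[i] == "ART" else i
        k = j
        while k < len(words) and codes[k] == "NOM":
            k += 1
        if k > j:  # at least one nominal word was found
            groups.append(" ".join(words[start:k]))
            i = k
        else:
            i += 1
    return groups


def extract(text):
    """Input unit -> first stage -> second stage -> output list."""
    words = text.lower().split()
    return second_stage(words, first_stage(words))


if __name__ == "__main__":
    print(extract("the lexicon is compared with the words of the text"))
```

In this sketch "compared" is never looked up in any dictionary; it is kept out of the nominal groups solely because it follows the auxiliary "is", which is the kind of location-based evaluation the claim describes.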
Specification