System for extracting information from a natural language text

US 8,170,867 B2
Filed: 07/18/2003
Issued: 05/01/2012
Est. Priority Date: 07/19/2002
Status: Expired due to Fees

2 Assignments

0 Petitions

Abstract

In the method of extraction, the words of the text are encoded by comparing them with the contents of a lexicon of tool words (essentially articles, prepositions, conjunctions, and verbal auxiliaries), and nominal groups are then identified by searching subsets of the resulting succession of encoded words to look for groups of encoded words that comply with predefined syntactical rules.
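
The two stages named in the abstract (encoding words against a tool-word lexicon, then searching the encoded sequence for nominal groups) can be illustrated with a minimal Python sketch. It is not the patented implementation: the tiny English lexicon, the one-letter codes, the positional rule for non-tool words, and the single syntactic pattern are all assumptions chosen for illustration.

```python
import re

# Hypothetical miniature lexicon (assumption): tool word -> one-letter code.
# a = article, p = preposition, c = conjunction, x = verbal auxiliary.
TOOL_WORDS = {
    "the": "a", "a": "a", "an": "a",
    "of": "p", "in": "p", "on": "p", "with": "p",
    "and": "c", "or": "c",
    "is": "x", "are": "x", "has": "x", "have": "x",
}

# Assumed syntactic rule for a nominal group, written over the code letters:
# optional article, one or more nominal words, then any number of
# "preposition + optional article + nominal words" extensions.
NOMINAL_GROUP = re.compile(r"a?n+(?:pa?n+)*")


def encode(text):
    """Stage 1: return (word, code) pairs for every word of the text."""
    encoded = []
    previous_code = None
    for word in re.findall(r"[a-z]+", text.lower()):
        code = TOOL_WORDS.get(word)
        if code is None:
            # Illustrative positional deduction (assumption): a non-tool word
            # right after an auxiliary is coded verbal ("v"), otherwise
            # nominal ("n").
            code = "v" if previous_code == "x" else "n"
        encoded.append((word, code))
        previous_code = code
    return encoded


def nominal_groups(encoded):
    """Stage 2: return the word sequences whose codes match the rule."""
    codes = "".join(code for _, code in encoded)
    words = [word for word, _ in encoded]
    return [" ".join(words[m.start():m.end()])
            for m in NOMINAL_GROUP.finditer(codes)]


sentence = "The analysis of the text is made with a lexicon."
print(nominal_groups(encode(sentence)))
```

With this toy lexicon and rule, the example sentence yields the groups "the analysis of the text" and "a lexicon". Encoding the sequence as a string of code letters lets an ordinary regular expression stand in for the predefined syntactical rules.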

70 Citations

11 Claims

1. A method implemented by computer of extracting information from a natural-language text of words comprising identifying patterns, wherein the words of the text are encoded by comparing them, using a processor, with the contents of a predefined lexicon containing less than 1000 tool words, said tools being essentially constituted by articles, prepositions, conjunctions and verbal auxiliaries, and in that nominal groups are then identified by searching subsets of the resulting succession of encoded words to look for groups of encoded words that comply with predefined syntactical rules, wherein the words of the text are encoded by evaluating the grammatical function of each word by comparing each word with the contents of said lexicon of tool words, so as to identify the tool words in the text, the grammatical function of said tool words being predefined, and in that the grammatical functions of the other words, which are not recognized as being tool words, are deduced by comparing their locations relative to the words recognized as being tool words.

Dependent claims (2, 3):
- 2. A method according to claim 1, wherein the identified nominal groups are then evaluated so as to keep only those groups which are perceived as being the most important, by using predefined evaluation criteria.
- 3. A method according to claim 1, wherein the identified nominal groups are then evaluated so as to keep only those groups which are perceived as being the most important, by using predefined evaluation criteria.
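
The dependent claims leave the "predefined evaluation criteria" unspecified. As a hedged illustration only, the Python sketch below ranks nominal groups by a hypothetical score (occurrence count multiplied by word count) and keeps the top few. The function name `keep_most_important`, the scoring formula, and the `top_n` parameter are assumptions, not taken from the patent.

```python
from collections import Counter


def keep_most_important(groups, top_n=3):
    """Keep the top_n distinct groups under an assumed importance score."""
    counts = Counter(groups)

    def score(group):
        # Assumed criterion: frequent and longer groups matter more.
        return counts[group] * len(group.split())

    # Sort by descending score; ties are broken alphabetically so the
    # result is deterministic.
    ranked = sorted(counts, key=lambda g: (-score(g), g))
    return ranked[:top_n]


groups = [
    "tool words", "lexicon", "tool words",
    "predefined syntactical rules", "lexicon", "text",
]
print(keep_most_important(groups, top_n=2))
```

Any other criterion (position in the text, presence in a title, and so on) could be substituted by changing `score` alone.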

4. A system for extracting information from a natural-language text, said system comprising:
- an input unit for receiving said natural-language text;
- a lexicon file in which less than 1000 tool words with predefined grammatical functions are recorded, said tool words being essentially constituted by articles, prepositions, conjunctions and verbal auxiliaries;
- an analysis processor connected to said input unit, and to the lexicon file, and organized to act in a first stage to encode the words of the natural-language text by evaluating the grammatical function of each word by comparing each word with the contents of said lexicon file of tool words, so as to identify the tool words in the text and so as to evaluate the functions of the other words which are not recognized as being tool words, by comparing their locations relative to the locations of the words recognized as being tool words, and, in a second stage, to search subsets of the resulting succession of encoded words to look for groups of encoded words that comply with predefined syntactical rules, so as to identify nominal groups; and
- an output unit connected to said analysis processor for receiving the groups of encoded words recognized as being syntactical patterns.

Dependent claims (5, 6, 7, 8, 9, 10, 11):
- 5. A system according to claim 4, wherein the analysis processor further comprises means for evaluating the importance of the kept groups of encoded words in order to keep only those groups which are perceived as being the most important.
- 6. A system according to claim 5, wherein the analysis processor further comprises means for recognizing the language of the text received at the input unit.
- 7. A system according to claim 6, wherein the analysis processor further comprises means for regularizing the text received at the input unit so as to remove the amalgams of signs.
- 8. A system according to claim 5, wherein the analysis processor further comprises means for regularizing the text received at the input unit so as to remove the amalgams of signs.
- 9. A system according to claim 4, wherein the analysis processor further comprises means for recognizing the language of the text received at the input unit.
- 10. A system according to claim 9, wherein the analysis processor further comprises means for regularizing the text received at the input unit so as to remove the amalgams of signs.
- 11. A system according to claim 4, wherein the analysis processor further comprises means for regularizing the text received at the input unit so as to remove the amalgams of signs.
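
The system claims mention two preprocessing means, language recognition and text regularization, without describing how they work. The Python sketch below shows one plausible reading under stated assumptions: "amalgams of signs" is interpreted as punctuation fused to neighbouring words, and the language is guessed by counting hits against per-language tool-word sets. The `LEXICONS` contents and both function names are hypothetical.

```python
import re

# Hypothetical per-language tool-word sets (assumption, far from complete).
LEXICONS = {
    "en": {"the", "of", "and", "is", "in", "a", "with"},
    "fr": {"le", "la", "les", "de", "et", "est", "dans", "un", "une", "avec"},
}


def regularize(text):
    """Separate punctuation signs from words and collapse whitespace."""
    spaced = re.sub(r"([^\w\s])", r" \1 ", text)
    return re.sub(r"\s+", " ", spaced).strip()


def recognize_language(text):
    """Return the language whose tool words occur most often in the text."""
    words = re.findall(r"\w+", text.lower())
    hits = {lang: sum(word in lexicon for word in words)
            for lang, lexicon in LEXICONS.items()}
    # On a tie, max() returns the first language listed in LEXICONS.
    return max(hits, key=hits.get)


print(regularize("the text,which is long."))
print(recognize_language("Le chat est dans la maison"))
```

Reusing tool-word sets for language recognition is a design choice of this sketch: it keeps the preprocessing dependent on the same kind of small lexicon that the claims already require.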

Current Assignee: Alain Beauvieux, Eric Fourboul, Go-Albert France
Original Assignee: Alain Beauvieux, Eric Fourboul, Go-Albert France
Inventors: Germain, Nicolas
Primary Examiner(s): Hudspeth, David R
Assistant Examiner(s): SPOONER, LAMONT M
Application Number: US10/524,624
Publication Number: US 20110099001A1
Time in Patent Office: 3,210 Days
Field of Search: 704/1, 704/8-10, 715/255, 715/264
US Class Current: 704/9
CPC Class Codes: G06F 40/253 Grammatical analysis; Style...
