Device for retrieving data from a knowledge-based text

US 20040073874A1
Filed: 08/14/2003
Published: 04/15/2004
Est. Priority Date: 02/20/2001
Status: Abandoned Application

1 Assignment

0 Petitions

Abstract

The invention relates to a device and a method for extracting information from an unstructured text, said information including relevant instances of the classes/entities searched for by the user and the relations between these classes/entities. The device and method improve semi-automatically on a given domain, and the transition from one domain to a new one is greatly facilitated.
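The abstract and claim 1 set this device apart by identifying relations between the extracted entities, not only the entities themselves. The publication text shown here does not say how relations are detected, so the following Python is only a minimal sketch under assumed conventions: entity annotations already exist for each sentence and are listed in text order, and a relation is reported when a trigger word from a small hand-written table appears between two entities of the expected classes. `Entity`, `TRIGGERS` and `find_relations` are hypothetical names, and the company names are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical annotation produced by an upstream entity-recognition step."""
    text: str      # surface form as it appears in the sentence
    label: str     # class/entity type, e.g. "COMPANY"
    sentence: int  # index of the sentence containing the entity


# Hypothetical trigger table: (trigger word, subject class, object class) -> relation.
TRIGGERS = {
    ("acquired", "COMPANY", "COMPANY"): "ACQUISITION",
    ("heads", "PERSON", "COMPANY"): "HEAD_OF",
}


def find_relations(sentences, entities):
    """Return (subject, relation, object) triples for entity pairs in the same
    sentence whose in-between words contain a matching trigger.
    Assumes `entities` is listed in text order."""
    relations = []
    for idx, sentence in enumerate(sentences):
        in_sentence = [e for e in entities if e.sentence == idx]
        for i, subj in enumerate(in_sentence):
            subj_pos = sentence.find(subj.text)
            if subj_pos == -1:
                continue
            start = subj_pos + len(subj.text)
            for obj in in_sentence[i + 1:]:
                end = sentence.find(obj.text, start)
                if end == -1:
                    continue
                # Only the words strictly between the two mentions are checked.
                for word in sentence[start:end].split():
                    relation = TRIGGERS.get((word, subj.label, obj.label))
                    if relation:
                        relations.append((subj.text, relation, obj.text))
    return relations


sentences = ["Acme acquired Globex in 2001."]
entities = [Entity("Acme", "COMPANY", 0), Entity("Globex", "COMPANY", 0)]
print(find_relations(sentences, entities))  # [('Acme', 'ACQUISITION', 'Globex')]
```

A trigger table is only the simplest stand-in; in the claimed device the relation step operates on the entities already judged relevant, so any real implementation would sit downstream of the selection shown in claim 1.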

18 Claims

1. A device for extracting information from a text (10), comprising an extraction module (20) and a learning module (30) that cooperate with each other, the extraction module comprising means (212) for automatically selecting in the text (10) the contexts of instances of classes/entities of the information to be extracted, for automatically selecting from these contexts those that are relevant to a domain, and for enabling the user to modify this latter selection so that the learning module (30) improves the next output (70, 80) of the extraction module (20), characterized in that the extraction module (20) additionally comprises means (213) for identifying the relations existing in the text (10) between the relevant entities output by the means (212).
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The information extraction device as claimed in claim 1, characterized in that the selection module (20) comprises a program (211) able to recognize the structure of the text (10).
  - 3. The information extraction device as claimed in claim 1 or claim 2, characterized in that the selection module (20) simultaneously applies rules defined a priori and rules calculated by the learning module (30).
  - 4. The information extraction device as claimed in one of the preceding claims, characterized in that the selection module (20) is able to automatically apply similarity rules inferred from the context.
  - 5. The information extraction device as claimed in one of the preceding claims, characterized in that the learning module (30) and the selection module (20) are able to manage homonyms belonging to different classes/entities.
  - 6. The information extraction device as claimed in one of the preceding claims, characterized in that the learning module (30) is configured not to generate new rules from non-essential elements.
  - 7. The information extraction device as claimed in one of the preceding claims, characterized in that the learning module (30) is able to generate new rules from positive selections and from negative selections made by the user.
  - 8. The information extraction device as claimed in one of the preceding claims, characterized in that the outputs of the selection module can be arranged in a file or a database.
  - 9. The information extraction device as claimed in one of the preceding claims, characterized in that the vocabulary and grammar of the domain are represented by finite state machines.
  - 10. The information extraction device as claimed in the preceding claim, characterized in that the finite state machines are presented to the user in the form of graphs.
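Claims 9 and 10 state that the domain vocabulary and grammar are represented by finite state machines that the user sees as graphs. The following Python is a minimal sketch of that idea under assumptions of its own: a small token-level automaton recognizes a person-name context (a title such as "Dr." followed by one or more capitalized words), and its transition table is exported in Graphviz DOT syntax as one possible graph view. The token classes, states, and the names `token_class`, `match_persons` and `to_dot` are hypothetical and are not taken from the patent.

```python
import re

# Hypothetical vocabulary: tokens that open a person-name context.
TITLES = {"Mr.", "Mrs.", "Dr."}


def token_class(token):
    """Map a token to the edge label used by the automaton."""
    if token in TITLES:
        return "TITLE"
    if re.fullmatch(r"[A-Z][a-z]+", token):
        return "CAP"
    return "OTHER"


# States: 0 = start, 1 = after a title, 2 = after >= 1 capitalized word (accepting).
TRANSITIONS = {(0, "TITLE"): 1, (1, "CAP"): 2, (2, "CAP"): 2}
ACCEPTING = {2}


def match_persons(tokens):
    """Return (start, end) token spans of the longest TITLE CAP+ matches."""
    spans, i = [], 0
    while i < len(tokens):
        state, j, last_accept = 0, i, None
        while j < len(tokens):
            nxt = TRANSITIONS.get((state, token_class(tokens[j])))
            if nxt is None:
                break
            state, j = nxt, j + 1
            if state in ACCEPTING:
                last_accept = j  # remember the longest accepting prefix
        if last_accept is not None:
            spans.append((i, last_accept))
            i = last_accept  # resume scanning after the match
        else:
            i += 1
    return spans


def to_dot():
    """Render the transition table as a Graphviz DOT digraph for display."""
    lines = ["digraph person_name {"]
    for (src, label), dst in TRANSITIONS.items():
        lines.append(f'  {src} -> {dst} [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)
```

For example, `match_persons("Yesterday Dr. Jane Smith met Mr. Brown .".split())` returns `[(1, 4), (5, 7)]`. Keeping the longest accepting match is a common convention for finite-state taggers, but it is an assumption here, as is the choice of DOT for the graph view.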

11. A method for extracting information from a text (10), comprising a learning process (2000) and a selection process (1000), said selection process comprising a step (1100) of automatically selecting in the text the contexts of instances of classes/entities of the information to be extracted, a step (1110) of automatically selecting from these contexts those that are relevant to a domain, and a step (1130) in which the user modifies the outputs of the previous step, the modified outputs being taken into account in the learning process (2000) to improve the next result of the selection process (1000), characterized in that the selection process (1000) additionally comprises steps (1310, 1320, 1330) for identifying the relations existing in the text (10) between the relevant entities output by the steps (1120, 1130) of the selection process (1000).
- Dependent claims (12, 13, 14, 15, 16, 17, 18):
  - 12. The information extraction method as claimed in claim 11, characterized in that the selection process (1000) comprises a step for recognizing the structure of the text (10).
  - 13. The information extraction method as claimed in claim 11 or claim 12, characterized in that the selection process (1000) simultaneously applies rules defined a priori and rules calculated by the learning module (30).
  - 14. The information extraction method as claimed in one of claims 11 to 13, characterized in that the selection process (1000) can include the automatic application of similarity rules inferred from the context.
  - 15. The information extraction method as claimed in one of claims 11 to 14, characterized in that the learning process (2000) and the selection process (1000) enable the management of homonyms belonging to different classes.
  - 16. The information extraction method as claimed in one of claims 11 to 15, characterized in that the learning process (2000) is configured not to generate new rules from non-essential elements.
  - 17. The information extraction method as claimed in one of claims 11 to 16, characterized in that the learning process (2000) is able to generate new rules from positive selections and from negative selections made by the user.
  - 18. The information extraction method as claimed in one of claims 11 to 16, characterized in that the outputs of the selection process (1000) can be arranged in a file or a database (80).
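Claims 7, 16 and 17 describe a learning process that builds new rules from the user's positive and negative selections while ignoring non-essential elements. The patent text reproduced here does not define the rule format, so this Python sketch assumes the simplest possible one: a rule is a (preceding word, class) pair, it is kept only when users accepted it often enough and rejected it rarely enough, and a hypothetical stop list stands in for "non-essential elements". The thresholds `min_support` and `min_precision`, the `NON_ESSENTIAL` set, and the functions `learn_rules` and `apply_rules` are illustrative assumptions.

```python
from collections import Counter

# Hypothetical stand-in for "non-essential elements": words that never seed a rule.
NON_ESSENTIAL = {"the", "a", "an", "of", "in"}


def learn_rules(feedback, min_support=2, min_precision=0.75):
    """feedback: iterable of (left_word, label, accepted) triples, one per
    candidate context the user validated (True) or rejected (False).
    Returns the set of (left_word, label) rules to apply on the next pass."""
    pos, neg = Counter(), Counter()
    for left, label, accepted in feedback:
        key = (left.lower(), label)
        if key[0] in NON_ESSENTIAL:
            continue  # skip non-essential words entirely
        (pos if accepted else neg)[key] += 1
    rules = set()
    for key, p in pos.items():
        precision = p / (p + neg[key])  # negative selections lower precision
        if p >= min_support and precision >= min_precision:
            rules.add(key)
    return rules


def apply_rules(tokens, rules):
    """Propose (index, label) for each token preceded by a learned left word."""
    return [(i, label)
            for i in range(1, len(tokens))
            for (left, label) in sorted(rules)
            if tokens[i - 1].lower() == left]


feedback = [
    ("CEO", "PERSON", True), ("CEO", "PERSON", True),
    ("CEO", "PERSON", True), ("CEO", "PERSON", False),          # 3/4 accepted: kept
    ("near", "LOCATION", True), ("near", "LOCATION", True),
    ("near", "LOCATION", False), ("near", "LOCATION", False),  # 2/4 accepted: dropped
    ("the", "COMPANY", True), ("the", "COMPANY", True),        # non-essential: ignored
]
rules = learn_rules(feedback)
print(rules)                                          # {('ceo', 'PERSON')}
print(apply_rules(["Our", "CEO", "Smith", "spoke"], rules))  # [(2, 'PERSON')]
```

Feeding the retained rules back into `apply_rules` closes the loop described in claim 11: the user's corrections on one pass change which contexts the selection process proposes on the next.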

Current Assignee: Thales SA
Original Assignee: Thales SA
Inventors: Poibeau, Thierry; Sedogbo, Célestin
Application Number: US10/467,937
Publication Number: US 20040073874A1
US Class Current: 715/531
CPC Class Codes: G06F 16/313 Selection or weighting of t...