Optimization of fact extraction using a multi-stage approach
Abstract
Facts are extracted from electronic documents by recognizing factual descriptions, using a fact-word table to match against the words of the electronic documents. The words of those factual descriptions may be tagged with the appropriate part of speech. More detailed analysis is then performed on those factual descriptions rather than on the entire electronic document, and particularly on the text in the neighborhood of the fact-word matches. The analysis may involve identifying the linguistic constituents of each phrase and determining the role of each as either subject or object. Exclusion rules, based in part on the linguistic constituents, may be applied to eliminate phrases unlikely to be part of facts. Scoring rules may be applied to the remaining phrases, and for those phrases having a score in excess of a threshold, the corresponding sentence part, whole sentence, paragraph, or other document portion may be presented as representing one or more facts.
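The staged flow described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the contents of `FACT_WORDS`, the pronoun-based exclusion rule, the two scoring heuristics, the threshold value, and the naive split of a sentence into subject and object around the fact word (a real system would presumably use a part-of-speech tagger or parser for that step).

```python
import re
from dataclasses import dataclass

# Hypothetical fact-word table; the real table's contents are not given here.
FACT_WORDS = {"invented", "founded", "discovered", "wrote"}
# Hypothetical exclusion list: a lone pronoun subject is treated as too vague.
PRONOUNS = {"he", "she", "it", "they", "this", "that"}
THRESHOLD = 2  # assumed value; a phrase must score strictly above it


@dataclass
class Candidate:
    sentence: str
    subject: list[str]   # tokens before the fact word (naive "subject")
    fact_word: str
    obj: list[str]       # tokens after the fact word (naive "object")


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def scan(text):
    """Stage 1: cheap scan that keeps only sentences containing a fact word."""
    candidates = []
    for sentence in split_sentences(text):
        tokens = re.findall(r"[A-Za-z0-9']+", sentence)
        for i, tok in enumerate(tokens):
            if tok.lower() in FACT_WORDS:
                candidates.append(
                    Candidate(sentence, tokens[:i], tok.lower(), tokens[i + 1:])
                )
                break  # only the first fact-word match per sentence is used
    return candidates


def excluded(c):
    """Stage 2a: exclusion rules based on the (naive) constituents."""
    if not c.subject or not c.obj:
        return True
    return len(c.subject) == 1 and c.subject[0].lower() in PRONOUNS


def score(c):
    """Stage 2b: toy scoring rules applied to the remaining candidates."""
    s = 1  # base score for the fact-word match itself
    if all(t[0].isupper() for t in c.subject):
        s += 1  # subject looks like a proper name
    if any(t.isdigit() for t in c.obj):
        s += 1  # object mentions a number, such as a year
    return s


def extract_facts(text, threshold=THRESHOLD):
    """Return sentences whose candidate survives exclusion and beats the threshold."""
    return [
        c.sentence
        for c in scan(text)
        if not excluded(c) and score(c) > threshold
    ]
```

The point of the structure is that `excluded` and `score`, which stand in for the more detailed analysis, run only on the candidates produced by `scan`, never on the whole document.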
20 Claims
1. A method of finding facts within electronic resources, comprising:
- scanning an electronic resource to discover factual descriptions of sentences that comprise words matching words of a fact-word table;
- examining the discovered factual descriptions to identify the linguistic constituents of the factual descriptions; and
- determining whether to present a factual description as a fact based on the identified linguistic constituents.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer readable medium containing instructions that perform acts comprising:
- receiving a search term;
- parsing a plurality of electronic documents to discover factual descriptions of sentences that comprise words matching words of a fact-word table;
- examining the discovered factual descriptions to identify the linguistic constituents of the factual descriptions; and
- determining whether to present a factual description as a fact relevant to the search term based on the identified linguistic constituent.
Dependent claims: 12, 13, 14, 15
16. A computer system, comprising:
- storage containing a plurality of electronic resources that comprise textual information;
- a processor that receives a request to present facts that are related to the search term from a set of electronic documents, wherein the processor parses the plurality of electronic documents to discover factual descriptions of sentences that comprise words matching words of a fact-word table, examines the discovered factual descriptions to identify the linguistic constituents of the factual descriptions, determines whether to present a factual description as a fact based on the identified linguistic constituent, and presents at least a portion of sentences that contain the factual descriptions that are determined to be presented as a fact and that are related to the search term.
Dependent claims: 17, 18, 19, 20
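Claims 11 and 16 add a search term to the method of claim 1: only facts related to that term are presented. The claims as listed here do not say how relatedness is decided, so the sketch below assumes the simplest possible test, a case-insensitive whole-word match. The function name `facts_for_term` and its inputs are hypothetical.

```python
import re


def facts_for_term(fact_sentences, search_term):
    """Keep the fact sentences that mention the search term as a whole word.

    Assumption: "related to the search term" is approximated by a
    case-insensitive whole-word match; the claims do not specify the test.
    """
    pattern = re.compile(r"\b" + re.escape(search_term) + r"\b", re.IGNORECASE)
    return [s for s in fact_sentences if pattern.search(s)]
```

The input would be the sentences already judged to be facts, so the filter runs after the fact-extraction stages and never touches the full documents.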
Specification