Distinguishing facts from opinions using a multi-stage approach
First Claim
1. A computer-implemented method performed by a processor for distinguishing facts from opinions within electronic resources, comprising:
receiving a search term comprising a noun;
finding relevant electronic resources that match the search term;
displaying a list of relevant electronic resources and snippets of the relevant electronic resources in the list that comprise words matching the search term;
scanning a relevant electronic resource to discover factual descriptions of sentences that comprise the noun of the search term and one or more verbs matching words of a fact-word table constructed to include a list of verbs determined to be indicative of fact expressions;
eliminating portions of the relevant electronic resource from fact extraction processing that comprise words not matching the search term and the words of the fact-word table;
examining the discovered factual descriptions to identify the linguistic constituents of the factual descriptions after eliminating portions of the relevant electronic resource;
determining whether to present a factual description as a fact based on the identified linguistic constituents; and
presenting at least a portion of a sentence that contains the search term and a factual description determined to be a fact relevant to the search term.
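The scanning and eliminating steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation: the contents of `FACT_WORDS`, the regex-based sentence splitter, and the function names are all assumptions made for illustration. The sketch keeps only sentences that contain both the search noun and at least one verb from a fact-word table, and drops everything else from further processing.

```python
import re

# Hypothetical fact-word table: verbs assumed to be indicative of fact expressions.
FACT_WORDS = {"is", "was", "has", "contains", "weighs", "measures", "includes"}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Lowercase word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())

def candidate_sentences(text, noun):
    """Keep sentences containing both the search noun and a fact-word.

    Sentences that fail either test are eliminated from later stages.
    """
    noun = noun.lower()
    kept = []
    for sentence in split_sentences(text):
        words = tokenize(sentence)
        fact_hits = [w for w in words if w in FACT_WORDS]
        if noun in words and fact_hits:
            kept.append((sentence, fact_hits))
    return kept

doc = ("The camera weighs 450 grams. I think the camera looks great! "
       "Shipping was fast. The camera includes a zoom lens.")
result = candidate_sentences(doc, "camera")
for sentence, verbs in result:
    print(verbs, sentence)
```

In this toy input, the second sentence is dropped because it has no fact-word, and the third because it lacks the search noun. Matching here is on exact word forms; a fuller version would presumably normalize inflections.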
Abstract
Facts are extracted from electronic documents by recognizing factual descriptions using a fact-word table to match to words of the electronic documents. The words of those factual descriptions may be tagged with the appropriate part of speech. More detailed analysis is then performed on those factual descriptions, rather than on the entire electronic document, and particularly on the text in the neighborhood of the fact-word matches. The analysis may involve identifying the linguistic constituents of each phrase and determining each phrase's role as either subject or object. Exclusion rules may be applied to eliminate those phrases unlikely to be part of facts, the exclusion rules being based in part on the linguistic constituents. Scoring rules may be applied to remaining phrases, and for those phrases having a score in excess of a threshold, the corresponding sentence part, whole sentence, paragraph, or other document portion may be presented as representing one or more facts.
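The idea of restricting detailed analysis to the neighborhood of a fact-word match, and of assigning subject and object roles, might be sketched as follows. This is a simplification under stated assumptions: the fixed token window (`WINDOW`), the sample `FACT_WORDS` entries, and the rule that text left of the verb is the subject side and text right of it is the object side are all hypothetical. Real constituent identification would likely rely on part-of-speech tagging and phrase chunking.

```python
import re

# Hypothetical fact-word table entries.
FACT_WORDS = {"weighs", "contains", "includes", "measures"}
# Assumed neighborhood size, in tokens, on each side of a fact-word match.
WINDOW = 3

def neighborhoods(sentence, window=WINDOW):
    """Yield (subject_side, fact_word, object_side) for each fact-word match.

    Only the tokens within `window` positions of the match are retained;
    the rest of the sentence is ignored by later analysis.
    """
    tokens = re.findall(r"[A-Za-z0-9']+", sentence)
    for i, tok in enumerate(tokens):
        if tok.lower() in FACT_WORDS:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            yield " ".join(left), tok, " ".join(right)

spans = list(neighborhoods(
    "Honestly the new camera weighs 450 grams without its battery pack"))
print(spans)
```

A fixed window is the crudest possible notion of "neighborhood"; it is used here only to show that later stages see a small span rather than the whole document.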
20 Claims
1. A computer-implemented method performed by a processor for distinguishing facts from opinions within electronic resources, comprising:
receiving a search term comprising a noun;
finding relevant electronic resources that match the search term;
displaying a list of relevant electronic resources and snippets of the relevant electronic resources in the list that comprise words matching the search term;
scanning a relevant electronic resource to discover factual descriptions of sentences that comprise the noun of the search term and one or more verbs matching words of a fact-word table constructed to include a list of verbs determined to be indicative of fact expressions;
eliminating portions of the relevant electronic resource from fact extraction processing that comprise words not matching the search term and the words of the fact-word table;
examining the discovered factual descriptions to identify the linguistic constituents of the factual descriptions after eliminating portions of the relevant electronic resource;
determining whether to present a factual description as a fact based on the identified linguistic constituents; and
presenting at least a portion of a sentence that contains the search term and a factual description determined to be a fact relevant to the search term.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer readable storage medium containing executable program instructions that, when executed by a processor, cause the processor to perform acts comprising:
receiving a search term comprising a noun;
finding relevant electronic resources that match the search term;
displaying a list of relevant electronic resources and snippets of the relevant electronic resources in the list that comprise words matching the search term;
parsing a plurality of relevant electronic documents to discover factual descriptions of sentences that comprise the noun of the search term and one or more verbs matching words of a fact-word table constructed to include a list of verbs determined to be indicative of fact expressions;
eliminating portions of the relevant electronic documents from fact extraction processing that comprise words not matching the search term and words of the fact-word table;
examining the discovered factual descriptions to identify the linguistic constituents of the factual descriptions after eliminating portions of the electronic documents;
determining whether to present a factual description as a fact relevant to the search term based on the identified linguistic constituent by applying excluding rules to candidate factual descriptions in relation to the linguistic constituents, scoring candidate factual descriptions based on certainty of a matching fact-word and on individual weights of subject and object noun phrases, and eliminating candidate factual descriptions from consideration according to the excluding rules and scoring of the factual descriptions; and
presenting at least a portion of a sentence that contains the search term and a factual description determined to be a fact relevant to the search term.
Dependent claims: 12, 13, 14, 15
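The determining step of claim 11 combines excluding rules with a score built from the certainty of the matched fact-word and the weights of the subject and object noun phrases. The sketch below shows one way such a step could look. Every number and rule in it is an assumption: the certainty values, the opinion-marker list, the phrase weights, the way the three quantities are combined, and the threshold are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass

# Hypothetical certainty per fact-word (higher means more fact-like).
FACT_WORD_CERTAINTY = {"weighs": 0.9, "includes": 0.8, "is": 0.5, "seems": 0.1}
# Hypothetical excluding rule: phrases containing these words are dropped.
OPINION_MARKERS = {"i", "we", "my", "think", "probably", "great", "awful"}
# Assumed cutoff; a candidate is kept only if its score exceeds it.
THRESHOLD = 0.5

@dataclass
class Candidate:
    subject: str    # subject noun phrase
    fact_word: str  # matched verb from the fact-word table
    obj: str        # object noun phrase

def phrase_weight(phrase):
    """Toy weight: empty phrase 0.0, phrase with a digit 1.0, otherwise 0.6."""
    if not phrase.strip():
        return 0.0
    return 1.0 if any(ch.isdigit() for ch in phrase) else 0.6

def is_excluded(c):
    """Excluding rule: missing subject/object, or any opinion marker present."""
    if not c.subject.strip() or not c.obj.strip():
        return True
    words = (c.subject + " " + c.obj).lower().split()
    return any(w in OPINION_MARKERS for w in words)

def score(c):
    """Fact-word certainty scaled by the mean of subject and object weights."""
    certainty = FACT_WORD_CERTAINTY.get(c.fact_word.lower(), 0.0)
    return certainty * (phrase_weight(c.subject) + phrase_weight(c.obj)) / 2

def select_facts(candidates, threshold=THRESHOLD):
    """Apply excluding rules, then keep candidates scoring above threshold."""
    survivors = [c for c in candidates if not is_excluded(c)]
    return [(c, score(c)) for c in survivors if score(c) > threshold]

candidates = [
    Candidate("the camera", "weighs", "450 grams"),
    Candidate("my camera", "is", "great"),
    Candidate("the camera", "seems", "heavy"),
]
facts = select_facts(candidates)
for c, s in facts:
    print(f"{s:.2f}  {c.subject} {c.fact_word} {c.obj}")
```

Of the three sample candidates, the second is removed by the excluding rule and the third by its low score, leaving only the first. Multiplying certainty by the mean phrase weight is just one plausible combination; the claim does not state how the factors are combined.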
16. A computer system, comprising:
storage containing a plurality of electronic resources that comprise textual information;
a processor that receives a search term comprising a noun, finds relevant electronic resources that match the search term, displays a list of relevant electronic resources and snippets of the relevant electronic resources in the list that comprise words matching the search term, and receives a request to present facts that are related to the search term from a set of relevant electronic documents,
wherein the processor parses the relevant electronic documents to discover factual descriptions of sentences that comprise the noun of the search term and one or more verbs matching words of a fact-word table constructed to include a list of verbs determined to be indicative of fact expressions,
the processor eliminates portions of the relevant electronic documents from fact extraction processing that comprise words not matching the search term and words of the fact-word table,
the processor examines the discovered factual descriptions to identify the linguistic constituents of the factual descriptions after eliminating portions of the relevant electronic documents,
determines whether to present a factual description as a fact based on the identified linguistic constituent,
and presents at least a portion of sentences that contain the factual descriptions that are determined to be presented as a fact and that are related to the search term.
Dependent claims: 17, 18, 19, 20
Specification