Computer based summarization of natural language documents
Abstract
A system and method for summarizing the contents of a natural language document provided in electronic or digital form includes preformatting the document, performing linguistic analysis, weighting each sentence in the document as a function of quantitative importance, and generating one or more document summaries, from a plurality of selectable document summary types, as a function of the sentence weights.
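The last step of the abstract, choosing among selectable summary types once sentence weights are known, can be sketched as a simple dispatch table. This is only an illustration: the two summary types (`top_n`, `above_threshold`), their parameters, and the `generate_summary` function are hypothetical names, and the text above does not say which summary types the system actually offers.

```python
from typing import Callable, Dict, List, Tuple

# A sentence paired with an already-computed weight (assumed input).
WeightedSentence = Tuple[str, float]


def top_n(sentences: List[WeightedSentence], n: int = 2) -> List[str]:
    """Hypothetical summary type: keep the n heaviest sentences, in document order."""
    ranked = sorted(range(len(sentences)), key=lambda i: sentences[i][1], reverse=True)
    return [sentences[i][0] for i in sorted(ranked[:n])]


def above_threshold(sentences: List[WeightedSentence], threshold: float = 1.0) -> List[str]:
    """Hypothetical summary type: keep every sentence whose weight exceeds a threshold."""
    return [text for text, weight in sentences if weight > threshold]


# Registry of selectable summary types (names are assumptions for this sketch).
SUMMARY_TYPES: Dict[str, Callable[..., List[str]]] = {
    "top_n": top_n,
    "above_threshold": above_threshold,
}


def generate_summary(sentences: List[WeightedSentence], summary_type: str, **options) -> List[str]:
    """Generate a summary of the selected type as a function of the sentence weights."""
    return SUMMARY_TYPES[summary_type](sentences, **options)
```

For example, `generate_summary(doc, "top_n", n=2)` and `generate_summary(doc, "above_threshold", threshold=1.5)` produce two different summaries from the same weighted sentences.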
127 Citations
34 Claims
1. A method for summarizing the contents of a natural language document including a plurality of sentences and provided in electronic or digital form, said method comprising:
A. extracting words from sentences in said document, including determining knowledge at a fact level for each sentence by;
i) identifying the words within the sentence as parts of speech in the form of eSAOs, including identifying the words as at least one of subjects, objects, actions, adjectives, prepositions, indirect objects and adverbials; and
ii) determining if Cause-Effect relationships exist in the sentence based on semantic relationships between eSAOs in the sentence;
B. determining a weight for each eSAO and a Cause-Effect weight for each Cause-Effect relationship;
C. determining a sentence weight for each sentence in said document, using the weights of all eSAOs for said sentence and, if the sentence has a Cause-Effect relationship, the Cause-Effect weight for each Cause-Effect relationship in the sentence; and
D. generating one or more weight-based document summaries as a function of said sentence weights and at least one of displaying the summaries to a user and storing the summaries to a memory.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

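Steps B through D of claim 1 can be illustrated with a minimal sketch. The claim does not state how eSAO weights and Cause-Effect weights are combined, so the plain sum used here is an assumption, as are the class names, the field names, and the "keep the heaviest sentences" selection policy. The individual weights are taken as given inputs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ESAO:
    subject: str
    action: str
    obj: str
    weight: float  # assumed to come from an upstream scoring step


@dataclass
class CauseEffect:
    cause: ESAO
    effect: ESAO
    weight: float  # Cause-Effect weight, also assumed given


@dataclass
class Sentence:
    text: str
    esaos: List[ESAO] = field(default_factory=list)
    cause_effects: List[CauseEffect] = field(default_factory=list)


def sentence_weight(s: Sentence) -> float:
    # Assumption: the sentence weight is the sum of all eSAO weights plus
    # the weight of each Cause-Effect relationship found in the sentence.
    return sum(e.weight for e in s.esaos) + sum(c.weight for c in s.cause_effects)


def summarize(sentences: List[Sentence], max_sentences: int) -> List[str]:
    # Rank sentence indices by weight, keep the heaviest ones, then restore
    # document order so the summary reads naturally.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: sentence_weight(sentences[i]),
                    reverse=True)
    return [sentences[i].text for i in sorted(ranked[:max_sentences])]
```

Under this additive assumption, a sentence that carries a Cause-Effect relationship always outweighs an otherwise identical sentence without one.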
13. A method for summarizing the contents of a natural language document provided in electronic or digital form, said method comprising:
A. performing linguistic analysis, including;
i) extracting words from sentences in said document, including determining knowledge at a fact level for each sentence by;
a. identifying the words within the sentence as parts of speech in the form of eSAOs, including identifying the words as at least one of subjects, objects, and actions, and tagging substantially each word as a function of a part of speech of said word; and
b. determining if Cause-Effect relationships exist in the sentence based on semantic relationships between eSAOs in the sentence;
ii) parsing verbal sequences and noun phrases from said tagged words; and
iii) building a syntactical parsed tree from said verbal sequences and noun phrases, according to a set of rules, wherein words grouped by a rule become inaccessible to other rules;
B. weighting each sentence in the document as a function of quantitative importance and said syntactical parsed tree, including determining a Cause-Effect weight for each Cause-Effect relationship in the sentence; and
C. generating one or more weight-based document summaries, from a plurality of selectable document summary types, as a function of the sentence weights and at least one of displaying the summaries to a user and storing the summaries to a memory.
Dependent claims: 14, 15, 16, 17

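The tree-building constraint in step A(iii) of claim 13, where words grouped by a rule become inaccessible to other rules, can be sketched with a small bottom-up grouper. The tag names, the rule format (a rule name plus a sequence of labels), and the `Node` class are all hypothetical. The claim does not describe the actual rule set or its representation.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Node:
    label: str                      # part-of-speech tag or rule name
    word: str = ""                  # set only on leaf nodes
    children: List["Node"] = field(default_factory=list)

    def words(self) -> List[str]:
        if not self.children:
            return [self.word]
        return [w for child in self.children for w in child.words()]


# Hypothetical rule format: (name of the new group, labels it consumes).
Rule = Tuple[str, Tuple[str, ...]]


def apply_rule(nodes: List[Node], rule: Rule) -> List[Node]:
    name, pattern = rule
    if not pattern:
        raise ValueError("rule pattern must not be empty")
    out: List[Node] = []
    i = 0
    while i < len(nodes):
        window = nodes[i:i + len(pattern)]
        if tuple(n.label for n in window) == pattern:
            # Grouped nodes are replaced by one parent node, so later rules
            # see only the group label and can no longer match the words inside.
            out.append(Node(label=name, children=list(window)))
            i += len(pattern)
        else:
            out.append(nodes[i])
            i += 1
    return out


def parse(tagged: Sequence[Tuple[str, str]], rules: Sequence[Rule]) -> List[Node]:
    nodes = [Node(label=tag, word=word) for word, tag in tagged]
    for rule in rules:              # rules are applied in the given order
        nodes = apply_rule(nodes, rule)
    return nodes
```

With rules such as `("NP", ("DET", "NOUN"))` followed by `("VP", ("VERB", "NP"))`, a later rule that looks for a bare `NOUN` finds nothing at the top level, because every noun has already been absorbed into an `NP` group.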
18. A system for summarizing the contents of a natural language document provided in electronic or digital form, said system comprising:
A. at least one memory having a set of linguistic rules stored therein;
B. a linguistic analyzer coupled to said at least one memory and configured to;
i) extract words from sentences in said document, including determining knowledge at a fact level for each sentence by;
a. identifying the words within the sentence as parts of speech in the form of eSAOs, including identifying the words as at least one of subjects, objects, and actions, and tagging substantially each word as a function of a part of speech of said word; and
b. determining if Cause-Effect relationships exist in the sentence based on semantic relationships between eSAOs in the sentence;
ii) parsing verbal sequences and noun phrases from said tagged words; and
iii) building a syntactical parsed tree from said verbal sequences and noun phrases, according to said set of rules, wherein words grouped by a rule become inaccessible to other rules;
C. a sentence weighting module configured to access said syntactical phrase tree and to determine a weight for each sentence in the document as a function of quantitative importance and said syntactical parsed tree and if the sentence has a Cause-Effect relationship, determine the Cause-Effect weight for each Cause-Effect relationship in the sentence; and
D. a summary generator configured to generate one or more weight-based document summaries, from a plurality of selectable document summary types, as a function of the sentence weights.
Dependent claims: 19, 20, 21, 22

23. A system for summarizing the contents of a natural language document including a plurality of sentences and provided in electronic or digital form, said system comprising:
A. at least one memory having a set of linguistic rules stored therein;
B. a linguistic analyzer coupled to said at least one memory and configured to extract words from sentences in said document, including determining knowledge at a fact level for each sentence by;
i. identifying the words within the sentence as parts of speech in the form of eSAOs, including subjects, objects, actions, adjectives, prepositions, indirect objects and adverbials; and
ii. determining if Cause-Effect relationships exist in the sentence based on semantic relationships between eSAOs in the sentence;
C. a weighting module configured to determine a weight for each eSAO and a Cause-Effect weight for each Cause-Effect relationship and, to determine a sentence weight for each sentence in said document, using the weights of all eSAOs for said sentence and, if the sentence has a Cause-Effect relationship, the Cause-Effect weight for each Cause-Effect relationship in the sentence; and
D. a summary generator configured to generate one or more weight-based document summaries as a function of said sentence weights.
Dependent claims: 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34

Specification