Methods and systems for automated semantic knowledge leveraging graph theoretic analysis and the inherent structure of communication
First Claim
1. A system having a memory for generating a document summary, the system comprising:
receiving means for receiving a set of documents, the documents comprising a collection of related symbols wherein document structure specifies relationships between and among the symbols;
graph generating means, in communication with the receiving means, for generating, based on the received set of documents, conceptual graphs indicative of relevance of concepts or sub-concepts within one or more documents in the set of documents, the graph generating means comprising a semantic relevance filter for testing groups of concepts or sub-concepts as possible conceptual centers of semantic relevance and determining which concepts are semantically central and which are peripheral, thereby generating semantic relevance relationship vectors that define the relative semantic relevance between and among concepts in each document, wherein the semantic relevance filter utilizes information structure inherent in the documents in testing groups of concepts or sub-concepts, the information structure including any of symbol position, symbol co-proximity, highlighted symbols, or consecutive groupings of symbols, and wherein the testing process is iterated for each term, where the semantic relevance filter further comprises noise reduction means for removing noise from a document by analyzing the structure of the document, the noise reduction means including:
means for determining the grouping of concepts in the document;
means for determining a hierarchy among the concepts;
means for determining how information is distributed through the hierarchy; and
means for determining, based on the determined information distribution, what portion of the document is attempting to convey information;
and normalizing means, in communication with the testing means, for normalizing semantic relevance scores assigned to each concept, across the total number of concepts;
determining means, in communication with the graph generating means, for determining from the conceptual graphs a set of directed conceptual relation vectors, utilizing a selected heuristic, the heuristic including utilizing a derived semantic ordering to determine a direction of each vector, from general to specific; and
summary generating means, in communication with the determining means, for creating, based on the vectors, a summary of at least one document in the set of received documents, the summary being generated from sentences or sentence fragments in the document, the summary generating means including ranking means for ranking sentences or sentence fragments in accordance with any of core ideas contained therein or relationships expressed therebetween.
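The claim does not specify how the semantic relevance filter weighs its structural cues. The following is a minimal Python sketch under assumed weights: each term is tested as a candidate conceptual center by combining symbol position (terms in earlier sentences weigh more), symbol co-proximity (the number of neighbouring tokens within a window), and a flat bonus for highlighted symbols. The scores are then normalized across all concepts, and terms at or above the mean normalized score are treated as central. The names `relevance_scores` and `split_central`, the stopword list, and every weight are hypothetical, and the noise reduction means is not modelled.

```python
import re
from collections import defaultdict

# Hypothetical stopword list; the claimed system removes noise by analyzing structure instead.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(sentence):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def relevance_scores(sentences, highlighted=frozenset(), window=3,
                     position_weight=1.0, proximity_weight=0.5, highlight_bonus=2.0):
    """Score every term as a candidate conceptual center, then normalize to sum to 1."""
    raw = defaultdict(float)
    n = len(sentences)
    for i, sentence in enumerate(sentences):
        tokens = tokenize(sentence)
        # Symbol position: assumed linear decay from the first sentence to the last.
        position = position_weight * (n - i) / n
        for j, term in enumerate(tokens):
            raw[term] += position
            # Symbol co-proximity: count of other tokens within `window` positions.
            lo, hi = max(0, j - window), min(len(tokens), j + window + 1)
            raw[term] += proximity_weight * (hi - lo - 1)
            # Highlighted symbols (e.g. bold or heading text) receive a flat bonus.
            if term in highlighted:
                raw[term] += highlight_bonus
    total = sum(raw.values())
    # Normalizing means: divide by the total so scores are comparable across concepts.
    return {t: s / total for t, s in raw.items()} if total else {}


def split_central(scores):
    """Terms scoring at or above the mean are central; the rest are peripheral."""
    if not scores:
        return [], []
    mean = 1.0 / len(scores)  # normalized scores sum to 1, so this is their mean
    central = sorted(t for t, s in scores.items() if s >= mean)
    peripheral = sorted(t for t, s in scores.items() if s < mean)
    return central, peripheral
```

For example, `relevance_scores(["Graph analysis finds key concepts.", "Concepts form a graph.", "Noise words appear rarely."], highlighted={"graph"})` ranks `graph` highest, because it appears early, has many neighbours, and is highlighted.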
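The determining means directs each conceptual relation "from general to specific" using a derived semantic ordering, but the claim does not say how that ordering is derived. The sketch below assumes one possible heuristic: a concept that occurs in more sentences is treated as more general, two concepts co-occurring in a sentence form a relation, and pairs of equal generality are left undirected and omitted. The function `directed_relations` and this heuristic are hypothetical.

```python
from collections import Counter
from itertools import combinations


def directed_relations(sentence_terms):
    """Map (general, specific) concept pairs to their co-occurrence counts."""
    # Assumed semantic ordering: the number of sentences a concept appears in.
    generality = Counter(t for terms in sentence_terms for t in set(terms))
    edges = Counter()
    for terms in sentence_terms:
        for a, b in combinations(sorted(set(terms)), 2):
            if generality[a] == generality[b]:
                continue  # no ordering can be derived, so the pair stays undirected
            # Orient the vector from the more general concept to the more specific one.
            general, specific = (a, b) if generality[a] > generality[b] else (b, a)
            edges[(general, specific)] += 1
    return dict(edges)
```

With `[["graph", "concept"], ["graph", "edge"], ["graph", "concept", "node"]]`, `graph` occurs in every sentence and so becomes the source of every relation it takes part in, while `concept` points to the rarer `node`.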
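The summary generating means ranks sentences by the core ideas they contain and the relationships they express. A minimal extractive sketch, assuming the inputs are a dictionary of normalized concept scores and a dictionary of directed (general, specific) relations with counts: each sentence scores the sum of its concepts' relevance plus a weighted count of relations whose two endpoints both appear in it, and the top `k` sentences are returned. The function `summarize`, the `relation_weight` parameter, and the additive scoring are all hypothetical.

```python
import re


def summarize(sentences, concept_scores, relations, k=2, relation_weight=0.1):
    """Extractive summary: rank sentences by core-idea mass plus expressed relations."""
    ranked = []
    for idx, sentence in enumerate(sentences):
        terms = set(re.findall(r"[a-z]+", sentence.lower()))
        # Core ideas: total normalized relevance of the concepts the sentence mentions.
        core = sum(concept_scores.get(t, 0.0) for t in terms)
        # Relationships: directed relations whose two endpoints both occur in the sentence.
        links = sum(count for (general, specific), count in relations.items()
                    if general in terms and specific in terms)
        ranked.append((core + relation_weight * links, idx))
    # Highest score first; the earlier sentence wins ties.
    top = sorted(ranked, key=lambda pair: (-pair[0], pair[1]))[:k]
    # Present the selected sentences in their original document order.
    return [sentences[idx] for _, idx in sorted(top, key=lambda pair: pair[1])]
```

Returning the selected sentences in document order rather than rank order keeps the extracted summary readable as continuous text.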
Abstract
A system that processes a collection of one or more documents and thereby constructs a knowledge base is described. The system applies graph-theoretic analysis to documents, leveraging the structure inherent in communication. Using the automatically generated knowledge base, the system can provide intra-document analysis, such as variable summarization and indexing and identification of a document's key concepts, as well as improved filtering and semantic-level relevance matching of documents, context-dependent directories, document categorization, a better basis for natural language processing, and new knowledge and information obtained by amalgamating the data (collection intelligence).
115 Citations
22 Claims
Specification