Method and mechanism for the creation, maintenance, and comparison of semantic abstracts
First Claim
1. A method implemented in a computer system including one or more computers communicating with each other, each of the one or more computers including a memory, for determining a semantic abstract in a topological vector space for a semantic content of a document using a dictionary and a basis, where the document, dictionary, and basis are each stored on at least one of the one or more computers, comprising:
accessing the dictionary including a directed set of concepts, the directed set including at least one chain from a maximal element to each other concept in the dictionary;
accessing the basis, the basis including a subset of chains from the dictionary;
identifying dominant phrases in the document;
measuring how concretely each identified dominant phrase is represented in each chain in the basis and the dictionary;
constructing in the memory of one of the one or more computers dominant phrase vectors for the document using the measures of how concretely each identified dominant phrase is represented in each chain in the basis and the dictionary; and
determining the semantic abstract using the dominant phrase vectors.
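The measuring and constructing steps of the claim can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the toy dictionary `PARENTS`, the two-chain `BASIS`, and in particular the `concreteness` formula (the depth of the deepest chain link that is the concept or one of its ancestors, divided by the chain length). The claim text above does not specify the measure, and the dominant phrases are taken as given rather than identified from a document.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy dictionary: each concept maps to its more general parents.
# "thing" is the maximal element; every concept reaches it by following parents.
PARENTS: Dict[str, List[str]] = {
    "thing": [],
    "animal": ["thing"],
    "machine": ["thing"],
    "dog": ["animal"],
    "robot dog": ["dog", "machine"],
}

# Hypothetical basis: a subset of chains, each listed from the maximal element down.
BASIS: List[Tuple[str, ...]] = [
    ("thing", "animal", "dog"),
    ("thing", "machine"),
]


def ancestors_or_self(concept: str) -> Set[str]:
    """Return the concept together with every concept above it in the directed set."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def concreteness(concept: str, chain: Tuple[str, ...]) -> float:
    """Assumed measure: depth of the deepest chain link that is the concept
    or one of its ancestors, as a fraction of the chain length."""
    lineage = ancestors_or_self(concept)
    deepest = 0
    for depth, link in enumerate(chain, start=1):
        if link in lineage:
            deepest = depth
    return deepest / len(chain)


def phrase_vector(concept: str) -> List[float]:
    """One coordinate per basis chain."""
    return [concreteness(concept, chain) for chain in BASIS]


def semantic_abstract(dominant_phrases: List[str]) -> List[List[float]]:
    """In this sketch the semantic abstract is the list of dominant phrase vectors."""
    return [phrase_vector(phrase) for phrase in dominant_phrases]
```

For example, `semantic_abstract(["dog", "robot dog"])` yields one vector per phrase, each with one coordinate per chain in the basis.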
Abstract
Codifying the “most prominent measurement points” of a document can be used to measure semantic distances given an area of study (e.g., white papers on some subject area). A semantic abstract is created for each document. The semantic abstract is a semantic measure of the subject or theme of the document providing a new and unique mechanism for characterizing content. The semantic abstract includes state vectors in the topological vector space, each state vector representing one lexeme or lexeme phrase about the document. The state vectors can be dominant phrase vectors in the topological vector space mapped from dominant phrases extracted from the document. The state vectors can also correspond to words in the document that are most significant to the document's meaning (the state vectors are called dominant vectors in this case). One semantic abstract can be directly compared with another semantic abstract, resulting in a numeric semantic distance between the semantic abstracts being compared.
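As a rough illustration of comparing two semantic abstracts, the sketch below treats each abstract as a non-empty set of state vectors and returns a single number. The choice of the Hausdorff distance over Euclidean distances is an assumption made for this example; the abstract above only says that the comparison results in a numeric semantic distance, and the function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def directed_distance(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Largest distance from any vector in a to its nearest vector in b."""
    return max(min(math.dist(u, v) for v in b) for u in a)


def hausdorff_distance(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Assumed semantic distance: symmetric Hausdorff distance between two
    non-empty sets of state vectors."""
    return max(directed_distance(a, b), directed_distance(b, a))


# Two hypothetical semantic abstracts, each with two state vectors.
abstract_a = [(0.0, 0.0), (1.0, 0.0)]
abstract_b = [(0.0, 0.0), (1.0, 1.0)]
distance = hausdorff_distance(abstract_a, abstract_b)
```

Under this assumed metric, identical abstracts are at distance zero, and the distance grows as one abstract contains a state vector that is far from every state vector of the other.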
18 Claims
10. A computer-readable medium, said computer-readable medium having stored thereon a program, that, when executed by a computer, result in:
accessing a dictionary including a directed set of concepts, the directed set including at least one chain from a maximal element to each other concept in the dictionary;
accessing a basis, the basis including a subset of chains from the dictionary;
identifying dominant phrases in the document;
measuring how concretely each identified dominant phrase is represented in each chain in the basis and the dictionary;
constructing dominant phrase vectors for the document using the measures of how concretely each identified dominant phrase is represented in each chain in the basis and the dictionary; and
determining a semantic abstract using the dominant phrase vectors.
Specification