Method and mechanism for the creation, maintenance, and comparison of semantic abstracts
First Claim
1. A method for determining dominant phrase vectors in a topological vector space for a semantic content of a document on a computer system, the method comprising:
identifying a directed set of concepts as a dictionary, the directed set including a maximal element and at least one concept, and at least one chain from the maximal element to every concept;
selecting a subset of the chains to form a basis for the dictionary;
accessing dominant phrases for the document, the dominant phrases representing a condensed content for the document;
measuring how concretely each dominant phrase is represented in each chain in the basis and the dictionary;
constructing at least one state vector in the topological vector space for each dominant phrase using the measures of how concretely each dominant phrase is represented in each chain in the dictionary and the basis; and
collecting the state vectors into the dominant phrase vectors for the document.
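The steps of claim 1 can be sketched in code. The following is a minimal illustration, not the patented implementation: the toy dictionary `PARENTS`, the two chains in `BASIS`, and the `concreteness` measure (how far down a chain the phrase's lineage reaches, scaled to the range 0 to 1) are all assumptions made for this example, and each dominant phrase is assumed to map directly onto one concept in the dictionary.

```python
from typing import Dict, List, Set

# Hypothetical toy dictionary: each concept maps to its more general parents.
# "thing" is the maximal element; every concept reaches it by some chain.
PARENTS: Dict[str, List[str]] = {
    "thing": [],
    "animal": ["thing"],
    "mammal": ["animal"],
    "dog": ["mammal"],
    "artifact": ["thing"],
    "vehicle": ["artifact"],
    "car": ["vehicle"],
}

# Hypothetical basis: a chosen subset of chains, each listed from the
# maximal element down to a specific concept.
BASIS: List[List[str]] = [
    ["thing", "animal", "mammal", "dog"],
    ["thing", "artifact", "vehicle", "car"],
]


def ancestors(concept: str) -> Set[str]:
    """Return the concept together with every more general concept above it."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS[node])
    return seen


def concreteness(concept: str, chain: List[str]) -> float:
    """Assumed measure: position of the deepest chain link on the concept's
    lineage, divided by the index of the chain's last link."""
    lineage = ancestors(concept)
    depth = 0
    for index, link in enumerate(chain):
        if link in lineage:
            depth = index
    return depth / (len(chain) - 1)


def state_vector(concept: str) -> List[float]:
    """One coordinate per chain in the basis."""
    return [concreteness(concept, chain) for chain in BASIS]


def dominant_phrase_vectors(phrases: List[str]) -> Dict[str, List[float]]:
    """Collect the state vector of every dominant phrase for a document."""
    return {phrase: state_vector(phrase) for phrase in phrases}
```

Under these assumptions, a phrase that sits at the bottom of a chain scores 1.0 on that chain and 0.0 on a chain it shares only the maximal element with, so `state_vector("car")` has its weight entirely on the second coordinate.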
Abstract
Codifying the “most prominent measurement points” of a document can be used to measure semantic distances given an area of study (e.g., white papers on some subject area). A semantic abstract is created for each document. The semantic abstract is a semantic measure of the subject or theme of the document providing a new and unique mechanism for characterizing content. The semantic abstract includes state vectors in the topological vector space, each state vector representing one lexeme or lexeme phrase about the document. The state vectors can be dominant phrase vectors in the topological vector space mapped from dominant phrases extracted from the document. The state vectors can also correspond to words in the document that are most significant to the document's meaning (the state vectors are called dominant vectors in this case). One semantic abstract can be directly compared with another semantic abstract, resulting in a numeric semantic distance between the semantic abstracts being compared.
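The comparison of two semantic abstracts can be illustrated with a short sketch. The text above does not say which distance is used, so this example assumes a Hausdorff distance over Euclidean distances between state vectors; the function names and the `threshold` value are likewise hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Straight-line distance between two state vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def directed_distance(source: List[Vector], target: List[Vector]) -> float:
    """Largest distance from any vector in source to its nearest vector in target."""
    return max(min(euclidean(u, v) for v in target) for u in source)


def semantic_distance(abstract_a: List[Vector], abstract_b: List[Vector]) -> float:
    """Assumed metric: symmetric Hausdorff distance between two non-empty
    sets of state vectors."""
    return max(
        directed_distance(abstract_a, abstract_b),
        directed_distance(abstract_b, abstract_a),
    )


def closely_related(
    abstract_a: List[Vector], abstract_b: List[Vector], threshold: float = 0.5
) -> bool:
    """Classify two documents as related when their distance is within a
    hypothetical threshold."""
    return semantic_distance(abstract_a, abstract_b) <= threshold
```

A set-based distance is used here because each semantic abstract is a collection of state vectors rather than a single point; any other metric on such collections could be substituted without changing the surrounding steps.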
40 Claims
1. A method for determining dominant phrase vectors in a topological vector space for a semantic content of a document on a computer system, the method comprising:
identifying a directed set of concepts as a dictionary, the directed set including a maximal element and at least one concept, and at least one chain from the maximal element to every concept;
selecting a subset of the chains to form a basis for the dictionary;
accessing dominant phrases for the document, the dominant phrases representing a condensed content for the document;
measuring how concretely each dominant phrase is represented in each chain in the basis and the dictionary;
constructing at least one state vector in the topological vector space for each dominant phrase using the measures of how concretely each dominant phrase is represented in each chain in the dictionary and the basis; and
collecting the state vectors into the dominant phrase vectors for the document.
Dependent claims: 2, 3, 4, 5, 6
7. A method for determining dominant vectors in a topological vector space for a semantic content of a document on a computer system, the method comprising:
identifying a directed set of concepts as a dictionary, the directed set including a maximal element and at least one concept, and at least one chain from the maximal element to every concept;
selecting a subset of the chains to form a basis for the dictionary;
storing the document in computer memory accessible by the computer system;
extracting words from at least a portion of the document;
measuring how concretely each word is represented in each chain in the basis and the dictionary;
constructing a state vector in the topological vector space for each word using the measures of how concretely each word is represented in each chain in the dictionary and the basis;
filtering the state vectors; and
collecting the filtered state vectors into the dominant vectors for the document.
Dependent claims: 8, 9, 10, 11, 12, 13
14. A method for determining a semantic abstract in a topological vector space for a semantic content of a document on a computer system, the method comprising:
identifying a directed set of concepts as a dictionary, the directed set including a maximal element and at least one concept, and at least one chain from the maximal element to every concept;
selecting a subset of the chains to form a basis for the dictionary;
storing the document in computer memory accessible by the computer system;
determining dominant phrases for the document;
measuring how concretely each dominant phrase is represented in each chain in the basis and the dictionary;
constructing dominant phrase vectors in the topological vector space for the dominant phrases using the measures of how concretely each dominant phrase is represented in each chain in the dictionary and the basis;
selecting words for the document;
measuring how concretely each word is represented in each chain in the basis and the dictionary;
constructing dominant vectors in the topological vector space for the words using the measures of how concretely each word is represented in each chain in the dictionary and the basis; and
generating the semantic abstract using the dominant phrase vectors and the dominant vectors.
Dependent claims: 15, 16, 17, 18, 19, 20, 21
22. A method for comparing the semantic content of first and second documents on a computer system, the method comprising:
identifying a directed set of concepts as a dictionary, the directed set including a maximal element and at least one concept, and at least one chain from the maximal element to every concept;
selecting a subset of the chains to form a basis for the dictionary;
accessing dominant phrases for the first document, the dominant phrases representing a condensed content for the first document;
measuring how concretely each dominant phrase for the first document is represented in each chain in the basis and the dictionary;
constructing at least one state vector for the first document in a topological vector space for each dominant phrase for the first document using the measures of how concretely each dominant phrase for the first document is represented in each chain in the dictionary and the basis;
collecting the state vectors for the first document into the semantic abstract for the first document;
determining a semantic abstract for the second document;
measuring a distance between the semantic abstracts; and
classifying how closely related the first and second documents are using the distance.
Dependent claims: 23, 24, 25, 26, 27, 28, 29
30. A method for locating a second document on a computer with a semantic content similar to a first document, the method comprising:
identifying a directed set of concepts as a dictionary, the directed set including a maximal element and at least one concept, and at least one chain from the maximal element to every concept;
selecting a subset of the chains to form a basis for the dictionary;
accessing dominant phrases for the first document, the dominant phrases representing a condensed content for the first document;
measuring how concretely each dominant phrase for the first document is represented in each chain in the basis and the dictionary;
constructing at least one state vector for the first document in a topological vector space for each dominant phrase for the first document using the measures of how concretely each dominant phrase for the first document is represented in each chain in a dictionary and the basis;
collecting the state vectors for the first document into the semantic abstract for the first document;
locating a second document;
determining a semantic abstract for the second document;
measuring a distance between the semantic abstracts for the first and second documents;
classifying how closely related the first and second documents are using the distance; and
if the second document is classified as having a semantic content similar to the semantic content of the first document, selecting the second document.
Dependent claims: 31, 32
33. An apparatus on a computer system to determine a semantic abstract in a topological vector space for a semantic content of a document stored on the computer system, the apparatus comprising:
a phrase extractor adapted to extract phrases from the document;
a state vector constructor adapted to construct state vectors in the topological vector space for each phrase extracted by the phrase extractor, the state vectors measuring how concretely each phrase extracted by the phrase extractor is represented in each chain in a basis and a dictionary, the dictionary including a directed set of concepts including a maximal element and at least one chain from the maximal element to every concept in the directed set, the basis including a subset of chains in the directed set; and
collection means for collecting the state vectors into the semantic abstract for the document.
Dependent claims: 34, 35, 36
37. A method for determining a semantic abstract in a topological vector space for a semantic content of a document on a computer system, the method comprising:
extracting dominant phrases from the document using a phrase extractor, the dominant phrases representing a condensed content for the document;
identifying a directed set of concepts as a dictionary, the directed set including a maximal element and at least one concept, and at least one chain from the maximal element to every concept;
selecting a subset of the chains to form a basis for the dictionary;
measuring how concretely each dominant phrase is represented in each chain in the basis and the dictionary;
constructing at least one first state vector in the topological vector space for each dominant phrase using the measures of how concretely each dominant phrase is represented in each chain in the dictionary and the basis;
collecting the first state vectors into dominant phrase vectors for the document;
extracting words from at least a portion of the document;
constructing a second state vector in the topological vector space for each word using the dictionary and the basis;
filtering the second state vectors;
collecting the filtered second state vectors into dominant vectors for the document; and
generating the semantic abstract using the dominant phrase vectors and the dominant vectors.
Dependent claims: 38, 39, 40
Specification