Automatic Generation Of Ontologies Using Word Affinities
Abstract
In one embodiment, generating an ontology includes accessing an inverted index that comprises inverted index lists for words of a language. An inverted index list corresponding to a word indicates pages that include the word. A word pair comprises a first word and a second word. A first inverted index list and a second inverted index list are searched, where the first inverted index list corresponds to the first word and the second inverted index list corresponds to the second word. An affinity between the first word and the second word is calculated according to the first inverted index list and the second inverted index list. The affinity describes a quantitative relationship between the first word and the second word. The affinity is recorded in an affinity matrix, and the affinity matrix is reported.
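The flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the text above does not give the affinity formula, so the sketch assumes a ratio of pages containing both words to pages containing either word. The function name `affinity_matrix`, the dictionary-based index, and the sample data are likewise hypothetical.

```python
from itertools import combinations


def affinity_matrix(inverted_index):
    """Return {(first_word, second_word): affinity} for every word pair.

    `inverted_index` maps each word to the page identifiers that contain it
    (a hypothetical in-memory stand-in for the stored inverted index).
    """
    matrix = {}
    for first, second in combinations(sorted(inverted_index), 2):
        # Look up the two inverted index lists for this word pair.
        pages_first = set(inverted_index[first])
        pages_second = set(inverted_index[second])
        both = len(pages_first & pages_second)
        either = len(pages_first | pages_second)
        # ASSUMED formula: co-occurrence pages over pages with either word.
        affinity = both / either if either else 0.0
        # Record the affinity; this assumed measure is symmetric.
        matrix[(first, second)] = affinity
        matrix[(second, first)] = affinity
    return matrix


# Made-up example index: word -> pages that include the word.
index = {
    "tree": [1, 2, 3],
    "graph": [2, 3, 4],
    "node": [3],
}

matrix = affinity_matrix(index)

# "Report" the matrix by printing each recorded pair.
for pair in sorted(matrix):
    print(pair, round(matrix[pair], 3))
```

With this sample index, "tree" and "graph" share pages 2 and 3 out of four pages in total, so their assumed affinity is 0.5. A directional measure would need separate entries for each ordering rather than the mirrored values used here.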
23 Claims
1. A method comprising:
accessing an inverted index stored in a tangible storage medium, the inverted index comprising a plurality of inverted index lists for a plurality of words of a language, an inverted index list corresponding to a word indicating one or more pages that include the word;
for each word pair of the plurality of words, the word pair comprising a first word and a second word;
searching a first inverted index list and a second inverted index list, the first inverted index list corresponding to the first word, the second inverted index list corresponding to the second word;
calculating an affinity between the first word and the second word according to the first inverted index list and the second inverted index list, the affinity describing a quantitative relationship between the first word and the second word; and
recording the affinity in an affinity matrix; and
reporting the affinity matrix.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. One or more computer-readable media encoding software, when executed, operable to:
access an inverted index stored in a tangible storage medium, the inverted index comprising a plurality of inverted index lists for a plurality of words of a language, an inverted index list corresponding to a word indicating one or more pages that include the word;
for each word pair of the plurality of words, the word pair comprising a first word and a second word;
search a first inverted index list and a second inverted index list, the first inverted index list corresponding to the first word, the second inverted index list corresponding to the second word;
calculate an affinity between the first word and the second word according to the first inverted index list and the second inverted index list, the affinity describing a quantitative relationship between the first word and the second word; and
record the affinity in an affinity matrix; and
report the affinity matrix.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
23. A system comprising:
means for accessing an inverted index stored in a tangible storage medium, the inverted index comprising a plurality of inverted index lists for a plurality of words of a language, an inverted index list corresponding to a word indicating one or more pages that include the word;
means for, for each word pair of the plurality of words, the word pair comprising a first word and a second word;
searching a first inverted index list and a second inverted index list, the first inverted index list corresponding to the first word, the second inverted index list corresponding to the second word;
calculating an affinity between the first word and the second word according to the first inverted index list and the second inverted index list, the affinity describing a quantitative relationship between the first word and the second word; and
recording the affinity in an affinity matrix; and
means for reporting the affinity matrix.
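The "searching a first inverted index list and a second inverted index list" step recited in the independent claims could be carried out in several ways. One possibility is sketched below, assuming each inverted index list stores page identifiers in ascending order without duplicates. The claims do not state this ordering, and the helper name `count_common_pages` is hypothetical.

```python
def count_common_pages(first_list, second_list):
    """Count the pages that appear in both sorted inverted index lists.

    ASSUMPTION: both lists are sorted ascending and contain no duplicates.
    """
    i = j = common = 0
    while i < len(first_list) and j < len(second_list):
        if first_list[i] == second_list[j]:
            # The page contains both words.
            common += 1
            i += 1
            j += 1
        elif first_list[i] < second_list[j]:
            # Advance whichever list is behind.
            i += 1
        else:
            j += 1
    return common


# Made-up lists of page identifiers for two words.
shared = count_common_pages([1, 2, 3, 7], [2, 3, 4, 7, 9])
print(shared)
```

Walking both lists in step visits each entry at most once, so the search costs time proportional to the combined length of the two lists. The resulting count could then feed whatever affinity calculation is used.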
Specification