Efficiently Representing Word Sense Probabilities
First Claim
1. A method for efficiently representing word sense probabilities, the method comprising:
identifying one or more word senses associated with a word;
obtaining a word sense probability associated with each of the word senses;
mapping each word sense to a bucket by assigning a bucket score to each word sense, the bucket score based upon the word sense probability; and
storing the bucket score for each word sense in a semantic index.
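The four steps of the claim can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the cut points in `BOUNDARIES`, the function names `bucket_score` and `index_word`, the use of a plain dictionary as the semantic index, and the sense labels and probabilities for "bank" are all assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical probability cut points separating 4 buckets (scores 0..3).
BOUNDARIES = [0.05, 0.20, 0.50]


def bucket_score(probability, boundaries=BOUNDARIES):
    """Map a probability to a bucket score.

    A higher probability never receives a lower score, because the
    score is the number of cut points at or below the probability.
    """
    return bisect_right(boundaries, probability)


def index_word(word, sense_probabilities, semantic_index):
    """Store one bucket score per (word, sense) pair in the index."""
    for sense, probability in sense_probabilities.items():
        semantic_index[(word, sense)] = bucket_score(probability)
    return semantic_index


# Toy data: senses of "bank" with made-up probabilities.
senses = {
    "financial_institution": 0.70,
    "river_edge": 0.25,
    "aircraft_tilt": 0.04,
    "row_of_objects": 0.01,
}
index = index_word("bank", senses, {})
```

With four buckets, each stored score fits in two bits, whereas the original probability would typically need a full floating-point value.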
Abstract
Word sense probabilities are compressed for storage in a semantic index. Each word sense for a word is mapped to one of a number of “buckets” by assigning a bucket score to the word sense. The bucket scores are assigned by a scoring function that maximizes the entropy of the assigned bucket scores. Once the bucket scores have been assigned to the word senses, they are stored in the semantic index. The bucket scores stored in the semantic index may be used to prune one or more of the word senses prior to construction of the semantic index. The bucket scores may also be used to prune and rank the word senses at the time a query is performed using the semantic index.
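One common way to obtain bucket scores with maximal entropy is equal-frequency (quantile) bucketing, in which the cut points are chosen so that each bucket receives about the same share of the probabilities. The sketch below assumes that approach; the abstract does not spell out the scoring function, so the function names, the toy probabilities, and the comparison against equal-width buckets are illustrative assumptions only.

```python
import math
from bisect import bisect_right
from collections import Counter


def quantile_boundaries(probabilities, num_buckets):
    """Pick cut points so each bucket gets roughly the same number of values."""
    ordered = sorted(probabilities)
    n = len(ordered)
    return [ordered[(i * n) // num_buckets] for i in range(1, num_buckets)]


def assign_scores(probabilities, boundaries):
    """Monotonic mapping: the score counts the cut points at or below p."""
    return [bisect_right(boundaries, p) for p in probabilities]


def entropy_bits(scores):
    """Shannon entropy, in bits, of the empirical bucket-score distribution."""
    counts = Counter(scores)
    total = len(scores)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Skewed toy probabilities (made up): most senses are rare.
probs = [0.01, 0.02, 0.03, 0.04, 0.05, 0.10, 0.30, 0.90]

equal_freq = assign_scores(probs, quantile_boundaries(probs, 4))
equal_width = assign_scores(probs, [0.25, 0.50, 0.75])

print(entropy_bits(equal_freq))   # uses all four buckets evenly
print(entropy_bits(equal_width))  # most values collapse into bucket 0
```

On this skewed toy data, the equal-frequency scores spread evenly over the four buckets, while equal-width buckets put six of the eight values in bucket 0 and so carry less information per stored score. Heavily tied probabilities can unbalance the buckets, which this sketch does not handle.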
20 Claims
1. A method for efficiently representing word sense probabilities, the method comprising:
identifying one or more word senses associated with a word;
obtaining a word sense probability associated with each of the word senses;
mapping each word sense to a bucket by assigning a bucket score to each word sense, the bucket score based upon the word sense probability; and
storing the bucket score for each word sense in a semantic index.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer storage medium having computer executable instructions stored thereon which, when executed by a computer, cause the computer to:
store a semantic index;
identify one or more word senses associated with a word;
obtain a word sense probability for each of the word senses;
map each word sense to a bucket score; and
store the bucket score for each word sense probability in the semantic index.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16
17. A computing system for efficiently representing word sense probabilities, the computing system comprising:
a central processing unit;
a memory; and
a mass storage device coupled to the central processing unit storing a semantic index and program code that is executable by the central processing unit and which, when executed by the central processing unit, will cause the computing system to identify one or more word senses associated with a word, to obtain a word sense probability for each of the word senses, to create a monotonic mapping between each word sense and a bucket by assigning a bucket score to each word sense, and to store the bucket score for each word sense probability in the semantic index, and wherein approximately equal percentages of the word sense probabilities are associated with each bucket score.
Dependent claims: 18, 19, 20
Specification