Efficiently representing word sense probabilities
Abstract
Word sense probabilities are compressed for storage in a semantic index. Each word sense for a word is mapped to one of a number of “buckets” by assigning a bucket score to the word sense. A scoring function that maximizes the entropy of the assigned bucket scores is utilized to assign the bucket scores. Once the bucket scores have been assigned to the word senses, the bucket scores are stored in the semantic index. The bucket scores stored in the semantic index may be utilized to prune one or more of the word senses prior to construction of the semantic index. The bucket scores may also be utilized to prune and rank the word senses at the time a query is performed using the semantic index.
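As a rough illustration of the bucketing the abstract describes, the sketch below uses equal-frequency (quantile) binning, which is one common way to place approximately equal percentages of the probabilities in each bucket and so push the entropy of the bucket scores toward its maximum. It is not confirmed to be the scoring function used in the patent. The helper names `build_thresholds`, `bucket_score`, and `entropy_bits` are hypothetical, and the sample of probabilities is assumed to be non-empty.

```python
from bisect import bisect_right
from collections import Counter
from math import log2


def build_thresholds(probabilities, n_bits):
    """Return ascending cut points for 2**n_bits equal-frequency buckets.

    Hypothetical helper: assumes a non-empty sample of word sense probabilities.
    """
    buckets = 1 << n_bits
    ordered = sorted(probabilities)
    # The k-th cut point is the sample value at the k/buckets quantile.
    return [ordered[(k * len(ordered)) // buckets] for k in range(1, buckets)]


def bucket_score(probability, thresholds):
    """Map a probability to a bucket score in [0, 2**n_bits - 1].

    Counting the cut points at or below the probability keeps the mapping
    monotonic: a larger probability never receives a smaller score.
    """
    return bisect_right(thresholds, probability)


def entropy_bits(scores):
    """Shannon entropy (in bits) of the empirical bucket score distribution."""
    total = len(scores)
    return -sum((c / total) * log2(c / total) for c in Counter(scores).values())
```

With N = 2 and a sample of distinct, evenly spread probabilities, each of the four buckets receives a quarter of the values and the entropy reaches the 2-bit maximum. Tied probabilities can make the buckets uneven, which is consistent with the "approximately equal percentages" wording.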
10 Claims
1. A computer-implemented method for efficiently representing word sense probabilities, the method comprising performing computer-implemented operations for:
identifying by way of a computer one or more word senses associated with a word;
obtaining by way of the computer a word sense probability associated with each of the word senses;
mapping each word sense to exactly one of a plurality of N-bit binary numbers by generating a monotonic mapping between the word sense probabilities and the N-bit binary numbers by way of the computer, the N-bit binary numbers mapped to each word sense based upon the word sense probability and whereby approximately equal percentages of the word sense probabilities are associated with each N-bit binary number; and
storing the N-bit binary number mapped to each word sense in a semantic index by way of the computer.
Dependent claims: 2, 3, 4
5. A computer storage medium that is not a signal having computer executable instructions stored thereon which, when executed by a computer, cause the computer to:
store a semantic index;
identify one or more word senses associated with a word;
obtain a word sense probability for each of the word senses;
map each word sense to exactly one of a plurality of N-bit binary numbers by assigning an N-bit binary number to the word sense based upon the word sense probability such that a monotonic mapping exists between the word sense probabilities and the N-bit binary numbers, whereby approximately equal percentages of the word sense probabilities are associated with each of the plurality of N-bit numbers; and
store the N-bit binary number assigned to each word sense probability in the semantic index.
Dependent claims: 6, 7, 8, 9
10. A computing system for efficiently representing word sense probabilities, the computing system comprising:
a central processing unit;
a memory; and
a mass storage device coupled to the central processing unit storing a semantic index and program code that is executable by the central processing unit and which, when executed by the central processing unit, will cause the computing system to identify one or more word senses associated with a word, to obtain a word sense probability for each of the word senses, to create a mapping between each word sense and one of a plurality of N-bit binary numbers by assigning exactly one of the plurality of N-bit binary numbers to each word sense such that word senses having greater word sense probabilities are assigned greater N-bit binary numbers than word senses having lesser word sense probabilities and such that approximately equal percentages of the word sense probabilities are associated with each of the plurality of N-bit binary numbers, to store the N-bit binary number assigned to each word sense probability in the semantic index, and to utilize the N-bit binary numbers stored in the semantic index to prune one or more of the word senses prior to construction of the semantic index, to prune one or more of the word senses at the time a query is performed, or to rank the word senses when a query is performed.
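The pruning and ranking uses recited in claim 10 can be sketched as follows. This is a minimal illustration only: the dictionary layout, the threshold parameter `min_score`, the function names `prune_senses` and `rank_senses`, and the sense labels in the usage note are assumptions made for the example, not details taken from the patent.

```python
def prune_senses(scored_senses, min_score):
    """Keep only the word senses whose bucket score is at least min_score.

    scored_senses is a hypothetical mapping from a sense label to its
    N-bit bucket score (a small non-negative integer).
    """
    return {sense: score for sense, score in scored_senses.items() if score >= min_score}


def rank_senses(scored_senses):
    """Order sense labels from the highest bucket score to the lowest.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    return sorted(scored_senses, key=lambda sense: (-scored_senses[sense], sense))
```

For a word with senses scored `{"bank/finance": 3, "bank/river": 1, "bank/tilt": 0}`, pruning with `min_score=1` drops `"bank/tilt"`, and ranking the remaining senses puts `"bank/finance"` ahead of `"bank/river"`. Because the mapping from probability to score is monotonic, ranking by score never contradicts ranking by the original probabilities, although senses sharing a bucket can no longer be told apart.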
Specification