Word sense disambiguation
Abstract
Systems and methods for word sense disambiguation, including discerning one or more senses or occurrences, distinguishing between senses or occurrences, and determining a meaning for a sense or occurrence of a subject term. In a collection of documents containing terms and a reference collection containing at least one meaning associated with a term, the method includes forming a vector space representation of terms and documents. In some embodiments, the vector space is a latent semantic index vector space. In some embodiments, occurrences are clustered to discern or distinguish a sense of a term. In preferred embodiments, meaning of a sense or occurrence is assigned based on either correlation with an external reference source, or proximity to a reference source that has been indexed into the space.
12 Claims
1. In a vector space representing the latent semantic content of a collection of documents, a method for discerning the presence of at least one sense of a subject term, the method comprising:
- determining at least one cluster of documents within the vector space, each cluster corresponding to a subset of documents within the vector space containing a subject term, and
- determining an implicit position within the vector space of at least one sense of the subject term, the implicit position corresponding to at least one determined cluster.
3. In a collection of documents, each document containing a plurality of terms, a method for discerning the presence of at least one sense of a subject term, the method comprising:
- forming an m by n matrix, where each matrix element (i,j) corresponds to the number of occurrences of term i in document j;
- performing singular value decomposition and dimensionality reduction on the matrix to form a latent semantic indexed vector space;
- determining at least one cluster of documents within the vector space, each cluster corresponding to a subset of documents within the vector space containing a subject term; and
- determining an implicit position within the vector space of at least one sense of the subject term, the implicit position corresponding to at least one determined cluster.
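The steps recited in claim 3 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the toy corpus, the choice of k = 2 retained dimensions, the hypothetical `two_means` helper (a tiny two-cluster k-means), and the use of a cluster centroid as the "implicit position" of a sense are all assumptions made here, since the claim does not name a clustering algorithm or define how the position is computed.

```python
import numpy as np

# Toy corpus (an assumption for illustration); "bank" is the subject term.
docs = [
    "bank loan money interest",
    "bank money deposit loan",
    "bank interest deposit money",
    "bank river water shore",
    "bank water fishing river",
    "bank shore fishing water",
    "cat dog pet animal",  # does not contain the subject term
]
subject = "bank"

# m by n count matrix: element (i, j) = occurrences of term i in document j.
terms = sorted({t for d in docs for t in d.split()})
row = {t: i for i, t in enumerate(terms)}
A = np.zeros((len(terms), len(docs)))
for j, d in enumerate(docs):
    for t in d.split():
        A[row[t], j] += 1

# Singular value decomposition, then keep only the k largest singular values.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional row per document

# Restrict to the documents that contain the subject term.
has_subject = np.flatnonzero(A[row[subject]] > 0)
X = doc_vecs[has_subject]

def two_means(X, iters=10):
    """Hypothetical helper: 2-cluster k-means with a farthest-point start.

    Assumes both clusters stay non-empty, which holds for this toy data.
    """
    far = np.argmax(np.linalg.norm(X - X[0], axis=1))
    centers = np.stack([X[0], X[far]])
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centers = np.stack([X[labels == c].mean(axis=0) for c in (0, 1)])
    return labels, centers

# Each centroid is taken here as the implicit position of one sense.
labels, sense_positions = two_means(X)
print(labels, sense_positions)
```

In this sketch the three finance documents and the three river documents fall into separate clusters, and each row of `sense_positions` is one candidate sense position for "bank". A real collection would need a larger k and a clustering method suited to its size.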
4. In a vector space representing the latent semantic content of a collection of documents, and in a reference collection comprising at least one meaning associated with a term, a method for determining a meaning for a sense of a subject term, the subject term found in at least one document and associated with at least one meaning, the method comprising:
- determining at least one cluster of documents within the vector space, each cluster corresponding to a subset of the document collection, each member of the subset having at least one occurrence of a subject term;
- discerning an implicit position of a sense of the subject term, each implicit position corresponding to at least one determined cluster;
- discerning at least one non-subject term within the vicinity of the implicit position of the sense; and
- assigning to the sense having a discerned implicit position, the meaning, associated with the term in the reference collection, that correlates best with the discerned non-subject terms closest to the implicit position of the sense.
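The last two steps of claim 4 (finding nearby non-subject terms and choosing the best-correlated reference meaning) can be sketched as follows. This is a hedged illustration only: the two-dimensional term vectors, the sense positions, the reference definitions, and the helper names `cosine`, `nearest_terms`, and `assign_meaning` are all hypothetical. The claim does not specify how "correlates best" is measured, so simple word overlap between the nearby terms and each definition is assumed here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def nearest_terms(position, term_vecs, subject, top_n=3):
    """Non-subject terms ranked by closeness to a sense position."""
    candidates = [t for t in term_vecs if t != subject]
    candidates.sort(key=lambda t: cosine(position, term_vecs[t]), reverse=True)
    return candidates[:top_n]

def assign_meaning(position, term_vecs, subject, meanings, top_n=3):
    """Pick the meaning whose definition shares the most nearby terms.

    Word overlap is an assumed stand-in for the claim's "correlates best".
    """
    near = set(nearest_terms(position, term_vecs, subject, top_n))
    return max(meanings, key=lambda m: len(near & set(meanings[m].split())))

# Hypothetical term positions in a reduced vector space.
term_vecs = {
    "bank": (1.0, 0.0),
    "money": (0.9, 0.5), "loan": (0.8, 0.6), "deposit": (0.85, 0.5),
    "river": (0.9, -0.5), "water": (0.8, -0.6), "shore": (0.85, -0.5),
}
# Hypothetical reference collection: meanings associated with "bank".
meanings = {
    "financial institution": "an institution that accepts a deposit of money and makes a loan",
    "river edge": "the sloping land along the shore of a river or other water",
}

print(assign_meaning((1.0, 0.55), term_vecs, "bank", meanings))
print(assign_meaning((1.0, -0.55), term_vecs, "bank", meanings))
```

The first position sits among the finance terms and the second among the river terms, so the two calls select different meanings for the same subject term.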
6. In a collection of n documents and a reference collection, each document containing terms, the reference collection containing at least one meaning associated with a term, the total number of terms occurring at least once in the document collection equal to at least m, a method for determining a meaning for an occurrence of a subject term, the subject term found in at least one document and associated with at least one meaning, the method comprising:
- forming an m×n matrix, where each matrix element (i,j) corresponds to the number of occurrences of term i in document j;
- performing singular value decomposition and dimensionality reduction on the matrix to form a latent semantic indexed vector space;
- discerning the position, within the vector space, of an occurrence of a subject term; and
- assigning to the occurrence, the meaning, associated with the subject term in the reference collection, that correlates best with non-subject terms closest to the implicit position.
7. In a vector space representing the latent semantic content of a collection of documents, and in a reference collection comprising at least one meaning associated with a term, a method for determining a meaning for an occurrence of a subject term, the subject term found in at least one document and associated with at least one meaning, the method comprising:
- determining at least one cluster of documents within the vector space, each cluster corresponding to a subset of the document collection, each member of the subset having at least one occurrence of a subject term;
- discerning an implicit position of a sense of the subject term, each implicit position corresponding to at least one determined cluster;
- discerning at least one non-subject term within the vicinity of the implicit position of the sense; and
- assigning to the sense having a discerned implicit position, the meaning, associated with the term in the reference collection, that correlates best with the discerned non-subject terms closest to the implicit position of the sense.
9. In a collection of n source documents and a collection of x reference documents, each document containing terms, each reference document containing at least one meaning associated with a term, the total number of terms occurring at least once in the combination collections equal to at least m, a method for determining a meaning for an occurrence of a subject term, the subject term found in at least one source document and associated with at least one meaning, the method comprising:
- forming an m by [n+x] matrix, where each matrix element (i,j) corresponds to the number of occurrences of term i in document j;
- performing singular value decomposition and dimensionality reduction on the matrix to form a latent semantic indexed vector space;
- discerning the position, within the vector space, of an occurrence of a subject term; and
- assigning to the occurrence, the meaning, associated with the subject term, closest to the implicit position of the sense.
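Claim 9 differs from the earlier claims in that the reference documents are indexed into the same space as the source documents, so a meaning can be chosen by proximity alone. A minimal Python sketch of that idea follows. The toy source documents, the two reference definitions, k = 2, the hypothetical `meaning_for_occurrence` helper, and the approximation of an occurrence's position by the vector of the source document that contains it are all assumptions made for illustration.

```python
import numpy as np

# n source documents (toy data, assumed for illustration).
source_docs = [
    "bank loan money",
    "bank money deposit",
    "bank river water",
    "bank water shore",
]
# x reference documents, one per meaning of the subject term "bank".
reference_docs = {
    "financial institution": "bank money loan deposit institution",
    "river edge": "bank river water shore land",
}
meaning_names = list(reference_docs)
all_docs = source_docs + [reference_docs[name] for name in meaning_names]

# m by [n+x] count matrix over the combined collections.
terms = sorted({t for d in all_docs for t in d.split()})
row = {t: i for i, t in enumerate(terms)}
A = np.zeros((len(terms), len(all_docs)))
for j, d in enumerate(all_docs):
    for t in d.split():
        A[row[t], j] += 1

# Singular value decomposition and reduction to k dimensions.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
vecs = (np.diag(s[:k]) @ Vt[:k]).T
n = len(source_docs)
src_vecs, ref_vecs = vecs[:n], vecs[n:]

def meaning_for_occurrence(doc_index):
    """Return the meaning whose reference vector is closest (by cosine).

    The occurrence's position is approximated by its document's vector.
    """
    v = src_vecs[doc_index]
    sims = ref_vecs @ v / (np.linalg.norm(ref_vecs, axis=1) * np.linalg.norm(v))
    return meaning_names[int(np.argmax(sims))]

for j in range(n):
    print(source_docs[j], "->", meaning_for_occurrence(j))
```

Because the reference definitions share the reduced space with the source documents, no separate term-overlap step is needed in this variant; the nearest reference vector decides the meaning.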
11. In a collection of documents, each document containing a plurality of terms, a computer-implemented method for discerning the presence of at least one sense of a subject term, the method comprising:
- forming an m×n matrix, where each matrix element (i,j) corresponds to the number of occurrences of term i in document j;
- performing singular value decomposition and dimensionality reduction on the matrix to form a latent semantic indexed vector space;
- determining at least one cluster of documents within the vector space, each cluster corresponding to a subset of documents within the vector space containing a subject term; and
- determining an implicit position within the vector space of at least one sense of the subject term, the implicit position corresponding to at least one determined cluster.
12. In a collection of documents, each document containing a plurality of terms, a computer program product comprising instructions that when executed perform the method comprising the steps of:
- forming an m×n matrix, where each matrix element (i,j) corresponds to the number of occurrences of term i in document j;
- performing singular value decomposition and dimensionality reduction on the matrix to form a latent semantic indexed vector space;
- determining at least one cluster of documents within the vector space, each cluster corresponding to a subset of documents within the vector space containing a subject term; and
- determining an implicit position within the vector space of at least one sense of the subject term, the implicit position corresponding to at least one determined cluster.
Specification