Word sense disambiguation
Abstract
Systems and methods for word sense disambiguation, including discerning one or more senses or occurrences, distinguishing between senses or occurrences, and determining a meaning for a sense or occurrence of a subject term. In a collection of documents containing terms and a reference collection containing at least one meaning associated with a term, the method includes forming a vector space representation of terms and documents. In some embodiments, the vector space is a latent semantic index vector space. In some embodiments, occurrences are clustered to discern or distinguish a sense of a term. In preferred embodiments, meaning of a sense or occurrence is assigned based on either correlation with an external reference source, or proximity to a reference source that has been indexed into the space.
16 Claims
1. In a vector space representing the latent semantic content of a collection of documents, a method for discerning the presence of at least one sense of a subject term, the method comprising:
determining at least one cluster of documents within the vector space, each cluster corresponding to a subset of documents within the vector space containing a subject term, and determining an implicit position within the vector space of at least one sense of the subject term, the implicit position corresponding to at least one determined cluster.
3. In a collection of documents, each document containing a plurality of terms, a method for discerning the presence of at least one sense of a subject term, the method comprising:
forming an m by n matrix, where each matrix element (i,j) corresponds to the number of occurrences of term i in document j;
performing singular value decomposition and dimensionality reduction on the matrix to form a latent semantic indexed vector space;
determining at least one cluster of documents within the vector space, each cluster corresponding to a subset of documents within the vector space containing a subject term; and
determining an implicit position within the vector space of at least one sense of the subject term, the implicit position corresponding to at least one determined cluster.
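The steps recited in claim 3 can be sketched in Python with NumPy. This is a minimal illustration, not the patented implementation: the toy corpus, the choice of k = 2 retained dimensions, the assumption of exactly two senses, the small `two_means` helper, and the use of a cluster centroid as the "implicit position" of a sense are all assumptions made here for illustration.

```python
import numpy as np

# Hypothetical toy corpus: "bank" is used in a finance sense and a river sense.
docs = [
    "bank loan money interest",
    "bank money deposit loan",
    "bank river water shore",
    "river bank water fishing",
    "money interest deposit",
    "water fishing shore",
]
subject = "bank"

# Step 1: m x n matrix, element (i, j) = occurrences of term i in document j.
vocab = sorted({w for d in docs for w in d.split()})
row = {t: i for i, t in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[row[w], j] += 1

# Step 2: singular value decomposition, then keep only the k largest
# singular values (dimensionality reduction). k = 2 is an assumption.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional row per document

# Step 3: cluster only the documents that contain the subject term.
subject_docs = [j for j, d in enumerate(docs) if subject in d.split()]
X = doc_vecs[subject_docs]

def two_means(X, iters=10):
    """Tiny deterministic 2-means, seeded with X[0] and the point farthest from it."""
    far = int(np.argmax(np.linalg.norm(X - X[0], axis=1)))
    centers = np.stack([X[0], X[far]])
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centers = np.stack([X[labels == c].mean(axis=0) for c in range(2)])
    return labels, centers

labels, sense_positions = two_means(X)

# Step 4: treat each cluster centroid as the implicit position of one sense.
for c, pos in enumerate(sense_positions):
    members = [subject_docs[i] for i in np.flatnonzero(labels == c)]
    print(f"sense {c}: documents {members}, position {np.round(pos, 3)}")
```

On this toy corpus the two finance documents fall into one cluster and the two river documents into the other. Any clustering method could be substituted for `two_means`; the claim does not name one.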
4. (canceled)
5. In a vector space representing the latent semantic content of a collection of documents, and in a reference collection comprising at least one meaning associated with a term, a method for determining a meaning for a sense of a subject term, the subject term found in at least one document and associated with at least one meaning, the method comprising:
determining at least one cluster of documents within the vector space, each cluster corresponding to a subset of the document collection, each member of the subset having at least one occurrence of a subject term;
discerning an implicit position of a sense of the subject term, each implicit position corresponding to at least one determined cluster;
discerning at least one non-subject term within the vicinity of the implicit position of the sense; and
assigning to the sense having a discerned implicit position, the meaning, associated with the term in the reference collection, that correlates best with the discerned non-subject terms closest to the implicit position of the sense.
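The last two steps of claim 5 (finding non-subject terms near a sense position, then picking the reference meaning that correlates best with them) might look like the sketch below. Everything concrete here is hypothetical: the 2-D term vectors stand in for terms already indexed in the latent semantic space, the `reference` dictionary stands in for the reference collection, and "correlates best" is approximated by simple word overlap, which the claim does not prescribe.

```python
import numpy as np

# Hypothetical 2-D term vectors standing in for an existing latent semantic space.
term_vecs = {
    "bank":     np.array([0.9, 0.0]),
    "money":    np.array([0.8, 0.6]),
    "loan":     np.array([0.7, 0.7]),
    "interest": np.array([0.6, 0.8]),
    "river":    np.array([0.8, -0.6]),
    "water":    np.array([0.7, -0.7]),
    "shore":    np.array([0.6, -0.8]),
}

# Hypothetical reference collection: each meaning is described by a set of words.
reference = {
    "bank": {
        "financial institution": {"money", "loan", "deposit", "interest"},
        "land beside a river": {"river", "water", "shore", "slope"},
    }
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_terms(position, subject, top=3):
    """Non-subject terms ranked by cosine similarity to the sense position."""
    others = [t for t in term_vecs if t != subject]
    return sorted(others, key=lambda t: cosine(term_vecs[t], position), reverse=True)[:top]

def assign_meaning(position, subject, top=3):
    """Pick the meaning whose word set overlaps most with the nearest terms (assumed metric)."""
    neighbors = set(nearest_terms(position, subject, top))
    glosses = reference[subject]
    return max(glosses, key=lambda m: len(glosses[m] & neighbors))

# Hypothetical implicit positions of two senses, e.g. centroids of two document clusters.
sense_a = np.array([0.7, 0.7])
sense_b = np.array([0.7, -0.7])

print(assign_meaning(sense_a, "bank"))  # financial institution
print(assign_meaning(sense_b, "bank"))  # land beside a river
```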
7. In a collection of n documents and a reference collection, each document containing terms, the reference collection containing at least one meaning associated with a term, the total number of terms occurring at least once in the document collection equal to at least m, a method for determining a meaning for an occurrence of a subject term, the subject term found in at least one document and associated with at least one meaning, the method comprising:
forming an m×n matrix, where each matrix element (i,j) corresponds to the number of occurrences of term i in document j;
performing singular value decomposition and dimensionality reduction on the matrix to form a latent semantic indexed vector space;
discerning the position, within the vector space, of an occurrence of a subject term; and
assigning to the occurrence, the meaning, associated with the subject term in the reference collection, that correlates best with non-subject terms closest to the implicit position.
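Claim 7 works on a single occurrence rather than a clustered sense. A rough sketch follows, under stated assumptions: the position of an occurrence is approximated by the centroid of the latent vectors of the words in its context (the claim does not say how the position is discerned), k = 2 dimensions are kept, and "correlates best" is again approximated by word overlap with a hypothetical `reference` dictionary.

```python
import numpy as np

# Hypothetical toy corpus and reference collection.
docs = [
    "bank loan money interest",
    "bank money deposit loan",
    "bank river water shore",
    "river bank water fishing",
    "money interest deposit",
    "water fishing shore",
]
reference = {
    "bank": {
        "financial institution": {"money", "loan", "deposit", "interest"},
        "land beside a river": {"river", "water", "shore", "fishing"},
    }
}

# m x n term-document count matrix.
vocab = sorted({w for d in docs for w in d.split()})
row = {t: i for i, t in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[row[w], j] += 1

# SVD and dimensionality reduction; term vectors are the rows of U_k scaled by S_k.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def occurrence_position(context_words):
    """Assumed stand-in: centroid of the latent vectors of the known context words."""
    rows = [row[w] for w in context_words if w in row]
    return term_vecs[rows].mean(axis=0)

def meaning_for_occurrence(subject, context_words, top=3):
    pos = occurrence_position(context_words)
    others = [t for t in vocab if t != subject]
    nearest = sorted(others, key=lambda t: cosine(term_vecs[row[t]], pos), reverse=True)[:top]
    glosses = reference[subject]
    return max(glosses, key=lambda m: len(glosses[m] & set(nearest)))

print(meaning_for_occurrence("bank", ["loan", "interest"]))
print(meaning_for_occurrence("bank", ["water", "shore"]))
```

The sketch expects at least one context word to be in the vocabulary; a production version would need to handle an empty context explicitly.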
8. In a vector space representing the latent semantic content of a collection of documents, and in a reference collection comprising at least one meaning associated with a term, a method for determining a meaning for an occurrence of a subject term, the subject term found in at least one document and associated with at least one meaning, the method comprising:
determining at least one cluster of documents within the vector space, each cluster corresponding to a subset of the document collection, each member of the subset having at least one occurrence of a subject term;
discerning an implicit position of a sense of the subject term, each implicit position corresponding to at least one determined cluster;
discerning at least one non-subject term within the vicinity of the implicit position of the sense; and
assigning to the sense having a discerned implicit position, the meaning, associated with the term in the reference collection, that correlates best with the discerned non-subject terms closest to the implicit position of the sense.
10-11. (canceled)
12. In a collection of n source documents and a collection of x reference documents, each document containing terms, each reference document containing at least one meaning associated with a term, the total number of terms occurring at least once in the combination collections equal to at least m, a method for determining a meaning for an occurrence of a subject term, the subject term found in at least one source document and associated with at least one meaning, the method comprising:
forming an m by [n+x] matrix, where each matrix element (i,j) corresponds to the number of occurrences of term i in document j;
performing singular value decomposition and dimensionality reduction on the matrix to form a latent semantic indexed vector space;
discerning the position, within the vector space, of an occurrence of a subject term; and
assigning to the occurrence, the meaning, associated with the subject term, closest to the implicit position of the sense.
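Claim 12 differs from the earlier claims in that the reference documents are indexed into the same space as the source documents, so a meaning can be chosen by proximity alone. The sketch below illustrates that idea under assumptions that are not in the claim: the toy source and reference texts, k = 2 retained dimensions, cosine similarity as the proximity measure, and the position of an occurrence approximated by the position of the source document that contains it.

```python
import numpy as np

# n hypothetical source documents.
source_docs = [
    "bank loan money interest",
    "bank money deposit loan",
    "bank river water shore",
    "bank river water fishing",
]
# x hypothetical reference documents, each describing one meaning of "bank".
reference_docs = [
    ("financial institution", "bank money loan institution"),
    ("land beside a river", "bank river water land"),
]

# m by [n + x] matrix: source documents first, then reference documents.
all_texts = source_docs + [text for _, text in reference_docs]
vocab = sorted({w for t in all_texts for w in t.split()})
row = {t: i for i, t in enumerate(vocab)}
A = np.zeros((len(vocab), len(all_texts)))
for j, text in enumerate(all_texts):
    for w in text.split():
        A[row[w], j] += 1

# SVD and dimensionality reduction (k = 2 is an assumption).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one row per document, sources then references
n = len(source_docs)
ref_vecs = doc_vecs[n:]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def meaning_for_occurrence(doc_index):
    """Assign the meaning of the reference document closest to the occurrence.

    The occurrence's position is approximated by its source document's position.
    """
    pos = doc_vecs[doc_index]
    sims = [cosine(pos, r) for r in ref_vecs]
    return reference_docs[int(np.argmax(sims))][0]

for j in range(n):
    print(j, meaning_for_occurrence(j))
```

Because the reference documents share the space, no separate correlation step against neighboring terms is needed; the nearest reference vector decides the meaning.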
14. In a collection of documents, each document containing a plurality of terms, a computer-implemented method for discerning the presence of at least one sense of a subject term, the method comprising:
forming an m×n matrix, where each matrix element (i,j) corresponds to the number of occurrences of term i in document j;
performing singular value decomposition and dimensionality reduction on the matrix to form a latent semantic indexed vector space;
determining at least one cluster of documents within the vector space, each cluster corresponding to a subset of documents within the vector space containing a subject term; and
determining an implicit position within the vector space of at least one sense of the subject term, the implicit position corresponding to at least one determined cluster.
15. (canceled)
16. In a collection of documents, each document containing a plurality of terms, a computer program product comprising instructions that when executed perform the method comprising the steps of:
forming an m×n matrix, where each matrix element (i,j) corresponds to the number of occurrences of term i in document j;
performing singular value decomposition and dimensionality reduction on the matrix to form a latent semantic indexed vector space;
determining at least one cluster of documents within the vector space, each cluster corresponding to a subset of documents within the vector space containing a subject term; and
determining an implicit position within the vector space of at least one sense of the subject term, the implicit position corresponding to at least one determined cluster.
Specification