Word sense disambiguation
Abstract
Systems and methods for word sense disambiguation, including discerning one or more senses or occurrences, distinguishing between senses or occurrences, and determining a meaning for a sense or occurrence of a subject term. In a collection of documents containing terms and a reference collection containing at least one meaning associated with a term, the method includes forming a vector space representation of terms and documents. In some embodiments, the vector space is a latent semantic index vector space. In some embodiments, occurrences are clustered to discern or distinguish a sense of a term. In preferred embodiments, meaning of a sense or occurrence is assigned based on either correlation with an external reference source, or proximity to a reference source that has been indexed into the space.
16 Claims
-
1. In a vector space representing the latent semantic content of a collection of documents, a method for discerning the presence of at least one sense of a subject term, the method comprising:
-
determining at least one cluster of documents within the vector space, each cluster corresponding to a subset of documents within the vector space containing a subject term, and determining an implicit position within the vector space of at least one sense of the subject term, the implicit position corresponding to at least one determined cluster.
-
-
3. In a collection of documents, each document containing a plurality of terms, a method for discerning the presence of at least one sense of a subject term, the method comprising:
-
forming an m by n matrix, where each matrix element (i, j) corresponds to the number of occurrences of term i in document j;
performing singular value decomposition and dimensionality reduction on the matrix to form a latent semantic indexed vector space;
determining at least one cluster of documents within the vector space, each cluster corresponding to a subset of documents within the vector space containing a subject term; and
determining an implicit position within the vector space of at least one sense of the subject term, the implicit position corresponding to at least one determined cluster.
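The four steps of claim 3 can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: the toy corpus, the choice of k = 2 retained dimensions, the hypothetical helper `two_means` (plain k-means with two clusters), and the use of a cluster centroid as the "implicit position" of a sense are all assumptions made for the example.

```python
import numpy as np

# Toy corpus (hypothetical): "bank" is the subject term, used in two senses.
docs = [
    "bank money loan",
    "bank money interest",
    "bank river water",
    "bank river boat",
    "loan interest money",
    "water boat river",
]
subject = "bank"

# Step 1: m-by-n matrix, element (i, j) = occurrences of term i in document j.
vocab = sorted({w for d in docs for w in d.split()})
row = {t: i for i, t in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[row[w], j] += 1

# Step 2: singular value decomposition, keeping the k largest singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                          # assumed number of retained dimensions
doc_vecs = Vt[:k].T * s[:k]    # one k-dimensional row per document

# Step 3: cluster the documents that contain the subject term.
subj_docs = [j for j, d in enumerate(docs) if subject in d.split()]
X = doc_vecs[subj_docs]

def two_means(X, iters=10):
    """Plain k-means with 2 clusters (assumed clustering method).

    Seeds are row 0 and the row farthest from it; assumes the rows are
    not all identical, so neither cluster ends up empty.
    """
    far = np.argmax(np.linalg.norm(X - X[0], axis=1))
    centers = np.stack([X[0], X[far]])
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        centers = np.stack([X[labels == c].mean(axis=0) for c in range(2)])
    return labels, centers

# Step 4: take each cluster centroid as the implicit position of one sense.
labels, sense_positions = two_means(X)
for c in range(2):
    members = [subj_docs[i] for i in np.flatnonzero(labels == c)]
    print("sense", c, "documents", members, "position", sense_positions[c])
```

In this toy corpus the two financial documents and the two river documents containing "bank" fall into separate clusters, so two senses are discerned. The claim itself requires only "at least one" cluster, so fixing the count at two is a simplification.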
-
-
4. In a collection of n documents and a reference collection, each document containing terms, the reference collection containing at least one meaning associated with a term, the total number of terms occurring at least once in the document collection equal to at least m, a method for determining a meaning for a sense of a subject term, the subject term found in at least one document and associated with at least one meaning, the method comprising:
-
forming an m by n matrix, where each matrix element (i, j) corresponds to the number of occurrences of term i in document j;
performing singular value decomposition and dimensionality reduction on the matrix to form a latent semantic indexed vector space;
determining at least one cluster of documents within the vector space, each cluster corresponding to a subset of the document collection, each member of the subset having at least one occurrence of a subject term;
discerning an implicit position of a sense of the subject term, each implicit position corresponding to at least one determined cluster;
discerning at least one non-subject term within the vicinity of the implicit position of the sense; and
assigning to the sense having a discerned implicit position, the meaning, associated with the term in the reference collection, that correlates best with the discerned non-subject terms closest to the implicit position of the sense.
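The last two steps of claim 4 (finding non-subject terms near a sense position, then choosing the best-correlating meaning from the reference collection) might look like the sketch below. Everything concrete in it is an assumption for illustration: the 2-D term vectors and sense positions are made-up numbers standing in for an already-built vector space, the two glosses are a hypothetical reference collection, and "correlates best" is approximated by counting words shared between a gloss and the nearby terms.

```python
import numpy as np

# Hypothetical 2-D term vectors in an already-built LSI space (made-up numbers).
term_vecs = {
    "bank":     np.array([0.9, 0.0]),
    "money":    np.array([0.7, 0.6]),
    "loan":     np.array([0.5, 0.7]),
    "interest": np.array([0.4, 0.6]),
    "river":    np.array([0.7, -0.6]),
    "water":    np.array([0.5, -0.7]),
    "boat":     np.array([0.4, -0.6]),
}
subject = "bank"

# Hypothetical reference collection: candidate meanings of the subject term,
# each with a short dictionary-style gloss.
reference = {
    "financial institution": "an institution that accepts money deposits and makes a loan",
    "edge of a river": "sloping land beside a river or other body of water",
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest_terms(position, top=3):
    """Non-subject terms ranked by cosine similarity to a sense position."""
    others = [t for t in term_vecs if t != subject]
    ranked = sorted(others, key=lambda t: cosine(term_vecs[t], position), reverse=True)
    return ranked[:top]

def assign_meaning(position):
    """Pick the meaning whose gloss shares the most words with the nearby terms.

    Word overlap is an assumed stand-in for "correlates best".
    """
    near = set(nearest_terms(position))
    overlap = {m: len(near & set(g.split())) for m, g in reference.items()}
    return max(overlap, key=overlap.get)

sense_a = np.array([0.8, 0.5])     # hypothetical implicit position of one sense
sense_b = np.array([0.8, -0.5])    # hypothetical implicit position of another

print(nearest_terms(sense_a), "->", assign_meaning(sense_a))
print(nearest_terms(sense_b), "->", assign_meaning(sense_b))
```

With these numbers the first sense is surrounded by "money", "loan" and "interest" and receives the financial meaning, while the second is surrounded by river-related terms and receives the other meaning.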
-
-
5. In a vector space representing the latent semantic content of a collection of documents, and in a reference collection comprising at least one meaning associated with a term, a method for determining a meaning for a sense of a subject term, the subject term found in at least one document and associated with at least one meaning, the method comprising:
-
determining at least one cluster of documents within the vector space, each cluster corresponding to a subset of the document collection, each member of the subset having at least one occurrence of a subject term;
discerning an implicit position of a sense of the subject term, each implicit position corresponding to at least one determined cluster;
discerning at least one non-subject term within the vicinity of the implicit position of the sense; and
assigning to the sense having a discerned implicit position, the meaning, associated with the term in the reference collection, that correlates best with the discerned non-subject terms closest to the implicit position of the sense.
-
-
7. In a collection of n documents and a reference collection, each document containing terms, the reference collection containing at least one meaning associated with a term, the total number of terms occurring at least once in the document collection equal to at least m, a method for determining a meaning for an occurrence of a subject term, the subject term found in at least one document and associated with at least one meaning, the method comprising:
-
forming an m by n matrix, where each matrix element (i, j) corresponds to the number of occurrences of term i in document j;
performing singular value decomposition and dimensionality reduction on the matrix to form a latent semantic indexed vector space;
discerning the position, within the vector space, of an occurrence of a subject term; and
assigning to the occurrence, the meaning, associated with the subject term in the reference collection, that correlates best with non-subject terms closest to the implicit position.
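Claim 7 works on a single occurrence rather than on a cluster. The claim does not say how the position of an occurrence is computed, so the sketch below makes an assumption: the position is taken as the mean of the LSI vectors of the known words surrounding the occurrence. The toy corpus, k = 2, and the helper names `occurrence_position` and `nearest_terms` are likewise hypothetical.

```python
import numpy as np

# Toy corpus (hypothetical) used to build the vector space.
docs = [
    "bank money loan",
    "bank money interest",
    "bank river water",
    "bank river boat",
    "loan interest money",
    "water boat river",
]
subject = "bank"

# m-by-n count matrix, then SVD with k retained dimensions.
vocab = sorted({w for d in docs for w in d.split()})
row = {t: i for i, t in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[row[w], j] += 1

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                              # assumed number of retained dimensions
term_vecs = U[:, :k] * s[:k]       # one k-dimensional row per term

def occurrence_position(context):
    """Assumed definition: mean vector of the in-vocabulary context words."""
    words = [w for w in context.split() if w in row]
    return term_vecs[[row[w] for w in words]].mean(axis=0)

def nearest_terms(position, top=2):
    """Non-subject terms ranked by cosine similarity to a position."""
    norms = np.linalg.norm(term_vecs, axis=1) * np.linalg.norm(position)
    sims = term_vecs @ position / norms
    ranked = [vocab[i] for i in np.argsort(-sims) if vocab[i] != subject]
    return ranked[:top]

pos_fin = occurrence_position("the bank approved the loan")
pos_riv = occurrence_position("the bank of the river")
print(nearest_terms(pos_fin))
print(nearest_terms(pos_riv))
```

The nearest non-subject terms found this way can then be matched against the meanings in the reference collection, as the final step of the claim describes.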
-
-
8. In a vector space representing the latent semantic content of a collection of documents, and in a reference collection comprising at least one meaning associated with a term, a method for determining a meaning for an occurrence of a subject term, the subject term found in at least one document and associated with at least one meaning, the method comprising:
-
determining at least one cluster of documents within the vector space, each cluster corresponding to a subset of the document collection, each member of the subset having at least one occurrence of a subject term;
discerning an implicit position of a sense of the subject term, each implicit position corresponding to at least one determined cluster;
discerning at least one non-subject term within the vicinity of the implicit position of the sense; and
assigning to the sense having a discerned implicit position, the meaning, associated with the term in the reference collection, that correlates best with the discerned non-subject terms closest to the implicit position of the sense.
-
-
10. In a collection of n source documents and a collection of x reference documents, each document containing terms, each reference document containing at least one meaning associated with a term, the total number of terms occurring at least once in the combination collections equal to at least m, a method for determining a meaning for a sense of a subject term, the subject term found in at least one source document and associated with at least one meaning, the method comprising:
-
forming an m by [n+x] matrix, where each matrix element (i, j) corresponds to the number of occurrences of term i in document j;
performing singular value decomposition and dimensionality reduction on the matrix to form a latent semantic indexed vector space;
determining at least one cluster of documents within the vector space, each cluster corresponding to a subset of the [n+x] documents having at least one occurrence of a subject term;
discerning the implicit position of at least one sense of the subject term corresponding to at least one determined cluster; and
assigning to at least one sense corresponding to at least one discerned implicit position, the meaning of the subject term closest within the vector space to the implicit position of the sense.
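Claim 10 differs from claim 4 in that the reference documents are indexed into the same space as the source documents, so a meaning can be assigned by proximity alone. A minimal sketch follows, assuming a toy corpus, one hypothetical reference document per meaning, k = 2, plain two-cluster k-means over the source documents that contain the subject term, centroids as implicit positions, and cosine similarity as the closeness measure.

```python
import numpy as np

# n hypothetical source documents.
source_docs = [
    "bank money loan",
    "bank money interest",
    "bank river water",
    "bank river boat",
    "loan interest money",
    "water boat river",
]
# x hypothetical reference documents, one per meaning of the subject term.
references = {
    "financial institution": "bank money loan interest",
    "edge of a river": "bank river water boat",
}
subject = "bank"
meanings = list(references)
all_docs = source_docs + list(references.values())
n = len(source_docs)

# m-by-(n+x) count matrix over the combined collection.
vocab = sorted({w for d in all_docs for w in d.split()})
row = {t: i for i, t in enumerate(vocab)}
A = np.zeros((len(vocab), len(all_docs)))
for j, d in enumerate(all_docs):
    for w in d.split():
        A[row[w], j] += 1

# SVD and dimensionality reduction to k dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = Vt[:k].T * s[:k]
ref_vecs = doc_vecs[n:]            # positions of the indexed reference documents

# Cluster the source documents that contain the subject term (2-means, assumed).
subj_docs = [j for j in range(n) if subject in source_docs[j].split()]
X = doc_vecs[subj_docs]
far = np.argmax(np.linalg.norm(X - X[0], axis=1))
centers = np.stack([X[0], X[far]])
for _ in range(10):
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    centers = np.stack([X[labels == c].mean(axis=0) for c in range(2)])

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Assign to each sense the meaning whose reference document lies closest.
assigned = [
    meanings[int(np.argmax([cosine(c, r) for r in ref_vecs]))] for c in centers
]
for c, meaning in enumerate(assigned):
    print("sense", c, "->", meaning)
```

Because the reference documents share the space with the source documents, no separate correlation step against neighbouring terms is needed; the nearest indexed meaning is read off directly.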
-
-
12. In a collection of n source documents and a collection of x reference documents, each document containing terms, each reference document containing at least one meaning associated with a term, the total number of terms occurring at least once in the combination collections equal to at least m, a method for determining a meaning for an occurrence of a subject term, the subject term found in at least one source document and associated with at least one meaning, the method comprising:
-
forming an m by [n+x] matrix, where each matrix element (i, j) corresponds to the number of occurrences of term i in document j;
performing singular value decomposition and dimensionality reduction on the matrix to form a latent semantic indexed vector space;
discerning the position, within the vector space, of an occurrence of a subject term; and
assigning to the occurrence, the meaning, associated with the subject term, closest to the implicit position of the sense.
-
-
14. In a collection of documents, each document containing a plurality of terms, a computer-implemented method for discerning the presence of at least one sense of a subject term, the method comprising:
-
forming an m by n matrix, where each matrix element (i, j) corresponds to the number of occurrences of term i in document j;
performing singular value decomposition and dimensionality reduction on the matrix to form a latent semantic indexed vector space;
determining at least one cluster of documents within the vector space, each cluster corresponding to a subset of documents within the vector space containing a subject term; and
determining an implicit position within the vector space of at least one sense of the subject term, the implicit position corresponding to at least one determined cluster.
-
-
15. In a collection of documents, each document containing a plurality of terms, a computer program product for discerning the presence of at least one sense of a subject term when executed on a computer system, the computer program product comprising:
-
a computer-readable medium;
a matrix-forming module stored on the medium that forms an m by n matrix, where each matrix element (i, j) corresponds to the number of occurrences of term i in document j;
a singular value decomposition and dimensionality reduction module stored on the medium and coupled to the matrix-forming module that forms a latent semantic indexed vector space from the matrix;
a clustering module stored on the medium that determines at least one cluster of documents within the vector space, each cluster corresponding to a subset of documents within the vector space containing a subject term; and
a sense position determining module stored on the medium that determines an implicit position within the vector space of at least one sense of the subject term, the implicit position corresponding to at least one determined cluster.
-
-
16. In a collection of documents, each document containing a plurality of terms, a computer program product comprising instructions that when executed perform the method comprising the steps of:
-
forming an m by n matrix, where each matrix element (i, j) corresponds to the number of occurrences of term i in document j;
performing singular value decomposition and dimensionality reduction on the matrix to form a latent semantic indexed vector space;
determining at least one cluster of documents within the vector space, each cluster corresponding to a subset of documents within the vector space containing a subject term; and
determining an implicit position within the vector space of at least one sense of the subject term, the implicit position corresponding to at least one determined cluster.
-
Specification