DOCUMENT SIMILARITY CALCULATION DEVICE
Abstract
A document similarity calculation device, configured to calculate a similarity indicating the degree to which a plurality of documents are similar to one another, includes: an associative word group storage portion for storing an associative word group composed of words associated with one another; a word-in-document frequency matrix generation portion for generating a matrix of word frequency in document, that is, a matrix whose element for each combination of a word and a document is the frequency of that word in that document; a word-in-document frequency matrix transformation portion for transforming the generated matrix of word frequency in document based on the stored associative word group so as to reduce the number of dimensions of that matrix; and a similarity calculation portion for calculating the similarity based on the transformed matrix of word frequency in document.
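The matrix of word frequency in document can be pictured as a word-by-document count table. The following is a minimal sketch, not the device's actual implementation. It assumes naive lowercase whitespace tokenization and uses the sorted vocabulary as the row order. The function name `word_in_document_frequency_matrix` is hypothetical.

```python
from collections import Counter


def word_in_document_frequency_matrix(documents):
    """Build a word-by-document frequency matrix.

    Rows are words (sorted vocabulary), columns are documents, and element
    [i][j] is how often word i occurs in document j.
    Assumption: naive lowercase whitespace tokenization.
    """
    counts = [Counter(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    # Counter returns 0 for words absent from a document.
    matrix = [[c[word] for c in counts] for word in vocabulary]
    return vocabulary, matrix


vocab, m = word_in_document_frequency_matrix(["the car is fast", "the automobile is fast"])
# vocab: ['automobile', 'car', 'fast', 'is', 'the']
# m:     [[0, 1], [1, 0], [1, 1], [1, 1], [1, 1]]
```

Even though the two example documents describe the same thing, their columns differ because `car` and `automobile` occupy separate rows.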
10 Claims
1. A document similarity calculation device for calculating a similarity indicating a degree of how much a plurality of documents are similar to one another, the document similarity calculation device comprising:
a unit of storing associative word group for storing an associative word group composed of words associated with one another;
a unit of generating matrix of word frequency in document for generating a matrix of word frequency in document which is a matrix each element of which is the frequency of a word present in a document with respect to each combination of the word and the document;
a unit of transforming matrix of word frequency in document for transforming the generated matrix of word frequency in document based on the stored associative word group so as to reduce the number of dimensions of the matrix of word frequency in document; and
a unit of calculating similarity for calculating the similarity based on the transformed matrix of word frequency in document.
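The claim does not specify how the associative word group reduces the number of dimensions. One plausible reading is to merge the rows of all words in the same group into a single row by summing their frequencies. This sketch follows that reading under explicit assumptions: groups are given as a list of disjoint sets of words, the merge rule is summation, and the function name `transform_by_associative_groups` is hypothetical.

```python
def transform_by_associative_groups(vocabulary, matrix, groups):
    """Merge rows of words that belong to the same associative word group.

    vocabulary: row labels of the word-by-document matrix (one word per row).
    matrix: list of rows, each a list of per-document frequencies.
    groups: list of disjoint sets of associated words (hypothetical format).
    Returns the new row labels and the reduced matrix.
    """
    # Give every grouped word a label naming its whole group, e.g. "automobile|car".
    label_of = {}
    for group in groups:
        label = "|".join(sorted(group))
        for word in group:
            label_of[word] = label

    labels = []
    merged = {}
    for word, row in zip(vocabulary, matrix):
        label = label_of.get(word, word)  # ungrouped words keep their own row
        if label in merged:
            # Assumed merge rule: element-wise sum of frequencies.
            merged[label] = [a + b for a, b in zip(merged[label], row)]
        else:
            labels.append(label)
            merged[label] = list(row)
    return labels, [merged[label] for label in labels]


labels, reduced = transform_by_associative_groups(
    ["automobile", "car", "fast", "is", "the"],
    [[0, 1], [1, 0], [1, 1], [1, 1], [1, 1]],
    [{"car", "automobile"}],
)
# labels:  ['automobile|car', 'fast', 'is', 'the']
# reduced: [[1, 1], [1, 1], [1, 1], [1, 1]]
```

In this example, five word rows become four. The documents that used `car` and `automobile` now share a row, so a later similarity step can count them as a match.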
7. A document similarity calculation method for calculating a similarity indicating a degree of how much a plurality of documents are similar to one another, the document similarity calculation method comprising:
prestoring an associative word group composed of words associated with one another;
generating a matrix of word frequency in document which is a matrix each element of which is the frequency of a word present in a document with respect to each combination of the word and the document;
transforming the generated matrix of word frequency in document based on the stored associative word group so as to reduce the number of dimensions of the matrix of word frequency in document; and
calculating the similarity based on the transformed matrix of word frequency in document.
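The claims do not name a similarity measure. Cosine similarity between the document columns of a frequency matrix is a common choice, and this sketch assumes it. The function name `cosine_similarity_matrix` is hypothetical. A column with no nonzero entry gets similarity 0.0, which avoids division by zero.

```python
import math


def cosine_similarity_matrix(matrix):
    """Pairwise cosine similarity between the document columns of a
    word-by-document matrix (cosine is an assumed choice of measure)."""
    n_docs = len(matrix[0]) if matrix else 0
    columns = [[row[j] for row in matrix] for j in range(n_docs)]
    norms = [math.sqrt(sum(x * x for x in col)) for col in columns]
    sim = [[0.0] * n_docs for _ in range(n_docs)]
    for i in range(n_docs):
        for j in range(n_docs):
            if norms[i] and norms[j]:  # leave 0.0 for all-zero columns
                dot = sum(a * b for a, b in zip(columns[i], columns[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim
```

Consider two documents whose columns are [0, 1, 1, 1, 1] and [1, 0, 1, 1, 1]. Their similarity is 3 / (2 * 2) = 0.75. Now suppose the first two words are associated and their rows are merged. Both columns then become [1, 1, 1, 1], and the similarity rises to 4 / (2 * 2) = 1.0.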
9. A medium being readable by an information processing device and storing a document similarity calculation program comprising instructions for causing the information processing device to carry out a process for calculating a similarity indicating a degree of how much a plurality of documents are similar to one another, the process comprising:
prestoring an associative word group composed of words associated with one another;
generating a matrix of word frequency in document which is a matrix each element of which is the frequency of a word present in a document with respect to each combination of the word and the document;
transforming the generated matrix of word frequency in document based on the stored associative word group so as to reduce the number of dimensions of the matrix of word frequency in document; and
calculating the similarity based on the transformed matrix of word frequency in document.
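When read as a program, the whole process can be written compactly by expressing the transformation as a matrix product T = G F. Here F is the word-by-document frequency matrix, and G is a 0/1 aggregation matrix with one row per associative word group plus one row per ungrouped word. This end-to-end sketch is illustrative only. Whitespace tokenization, aggregation by summation through G, and cosine similarity are assumptions made here rather than details taken from the claims. The name `document_similarities` is hypothetical.

```python
import math
from collections import Counter


def document_similarities(documents, groups):
    """Return a documents x documents matrix of cosine similarities.

    Assumptions: whitespace tokenization, associative word groups given as
    sets of words, and cosine similarity as the measure.
    """
    counts = [Counter(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*counts))

    # F: matrix of word frequency in document (words x documents).
    F = [[c[word] for c in counts] for word in vocabulary]

    # G: 0/1 aggregation matrix (reduced dimensions x words). One row per
    # group that occurs in the corpus, plus one row per ungrouped word.
    grouped = set().union(*groups)
    row_specs = [set(g) & set(vocabulary) for g in groups]
    row_specs = [spec for spec in row_specs if spec]
    row_specs += [{word} for word in vocabulary if word not in grouped]
    G = [[1 if word in spec else 0 for word in vocabulary] for spec in row_specs]

    # T = G F: transformed matrix, with fewer rows than F when groups merge words.
    n_docs = len(documents)
    T = [[sum(g * f_row[j] for g, f_row in zip(g_row, F)) for j in range(n_docs)]
         for g_row in G]

    # Cosine similarity between the document columns of T.
    columns = [[row[j] for row in T] for j in range(n_docs)]

    def cosine(a, b):
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        if norm_a == 0 or norm_b == 0:
            return 0.0
        return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

    return [[cosine(a, b) for b in columns] for a in columns]
```

Take the documents `"the car is fast"` and `"the automobile is fast"` with the group `{"car", "automobile"}`. The off-diagonal similarity is 1.0. With no groups (`[]`), it is 0.75. This shows how the dimension-reducing transformation lets associated words count toward similarity.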
Specification