Evaluating distinctiveness of document
Abstract
Two document sets are compared in natural language processing, and the distinctiveness of each constituent element (such as a sentence, term or phrase) of one document set is evaluated. Both the target document and the comparison document are divided into document segments, and a sentence vector is constructed for each document segment, with components equal to the occurring frequencies of the terms occurring in that segment. A projection axis is then found which maximizes the ratio (squared sum of projected values originating from the target document)/(squared sum of projected values originating from the comparison document), where the projected values are obtained by projecting the sentence vectors of both documents on the projection axis. The degrees of distinctiveness of the individual sentences of the target document are calculated on the basis of the projected values.
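The ratio described in the abstract can be written compactly. The notation below is an assumption introduced for illustration and is not defined in the abstract: t_k and c_l denote the sentence vectors of the target and comparison documents, \alpha the projection axis, and S_T, S_C the squared sum matrices named in the claims.

```latex
J(\alpha)
  = \frac{\sum_{k} (\alpha^{\top} t_k)^2}{\sum_{l} (\alpha^{\top} c_l)^2}
  = \frac{\alpha^{\top} S_T \, \alpha}{\alpha^{\top} S_C \, \alpha},
\qquad
S_T = \sum_{k} t_k t_k^{\top},
\quad
S_C = \sum_{l} c_l c_l^{\top}.
```

Under this reading, and assuming S_C is invertible (or has been regularized), an axis that maximizes J satisfies the generalized eigenvalue problem S_T \alpha = \lambda S_C \alpha for the largest eigenvalue \lambda.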
41 Claims
1. A method of evaluating a degree of distinctiveness of each document segment contained in a target document including at least one document segment with respect to a comparison document including at least one document segment, and identifying a distinctive document segment, the method comprising:
(a) identifying a respective document segment vector for each document segment contained in the comparison document and the target document, each document segment vector having component values associated with occurring frequencies of terms occurring in its respective document segment;
(b) computing squared sum matrices respectively corresponding to the comparison document and the target document, from said document segment vectors;
(c) computing a predetermined number of orders of topic difference factor vectors of the target document from said squared sum matrices corresponding to the comparison document and the target document;
(d) computing respective degrees of distinctiveness of said respective orders and a total degree of distinctiveness for each document segment of the target document, from said corresponding document segment vector and said topic difference factor vectors of said respective orders; and
(e) identifying a distinctive document segment in the target document, on the basis of the degrees of distinctiveness of said respective orders or on the basis of the total degree of distinctiveness thereof.
Dependent claims: 2, 3, 4, 5, 6.
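Steps (a) and (b) of claim 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the whitespace tokenizer, the function names `segment_vectors` and `squared_sum_matrix`, the sample sentences, and the reading of "squared sum matrix" as the sum of outer products of the segment vectors are all assumptions.

```python
from collections import Counter


def segment_vectors(segments, vocabulary):
    """Step (a): one term-frequency vector per document segment."""
    vectors = []
    for segment in segments:
        # Naive whitespace tokenizer (assumption; the claims do not fix one).
        counts = Counter(segment.lower().split())
        vectors.append([counts[term] for term in vocabulary])
    return vectors


def squared_sum_matrix(vectors):
    """Step (b), as assumed here: S = sum over segments of d * d^T."""
    n = len(vectors[0])
    s = [[0] * n for _ in range(n)]
    for d in vectors:
        for i in range(n):
            for j in range(n):
                s[i][j] += d[i] * d[j]
    return s


# Hypothetical sample data: each string is one document segment.
target = ["the engine failed", "the engine overheated again"]
comparison = ["the weather was fine", "the weather stayed fine"]

# Shared vocabulary so both documents use the same vector components.
vocabulary = sorted({w for seg in target + comparison for w in seg.lower().split()})

target_vectors = segment_vectors(target, vocabulary)
s_target = squared_sum_matrix(target_vectors)
s_comparison = squared_sum_matrix(segment_vectors(comparison, vocabulary))
```

Both matrices are built over one shared vocabulary so that they can be compared component by component in the later steps.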
7. A method of evaluating a degree of distinctiveness of each combination of terms contained in a target document including at least one document segment with respect to a comparison document including at least one document segment, and identifying a distinctive combination of terms, the method comprising:
(a) identifying a respective document segment vector for each document segment contained in the comparison document and the target document, each document segment vector having component values associated with occurring frequencies of terms occurring in the document segment;
(b) computing squared sum matrices respectively corresponding to the comparison document and the target document, from the document segment vectors;
(c) computing a predetermined number of orders of topic difference factor vectors of the target document from the squared sum matrices corresponding to the comparison document and the target document;
(d) computing a term combination vector for each combination of terms in the target document, each term combination vector having components corresponding to the terms contained in the combination of terms being given values determined by occurring numbers of said terms in said combination of terms, and having other components equal to “0”;
(e) computing degrees of distinctiveness of the respective orders and a total degree of distinctiveness for each combination of terms of the target document, from the corresponding term combination vector and the topic difference factor vectors of said respective orders; and
(f) identifying a combination of terms in the target document as being distinctive, on the basis of the degrees of distinctiveness of said respective orders or the total degree of distinctiveness thereof.
Dependent claims: 8.
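The term combination vector of step (d) and the scoring of step (e) in claim 7 can be illustrated as below. This is a sketch under stated assumptions: the factor vector values are hypothetical placeholders, the per-order degree is taken to be the squared inner product, and the total degree is taken to be their unweighted sum. The claims do not specify these formulas.

```python
def term_combination_vector(combination, vocabulary):
    """Occurrence counts for terms in the combination, 0 for all other terms."""
    return [combination.count(term) for term in vocabulary]


def degrees_of_distinctiveness(vector, factor_vectors):
    """Per-order squared projections and their unweighted sum (assumed scoring)."""
    per_order = [
        sum(v * a for v, a in zip(vector, alpha)) ** 2 for alpha in factor_vectors
    ]
    return per_order, sum(per_order)


vocabulary = ["again", "engine", "failed", "fine", "weather"]

# Hypothetical topic difference factor vectors of orders 1 and 2.
factor_vectors = [
    [0.1, 0.8, 0.5, -0.2, -0.2],
    [0.6, 0.0, -0.3, 0.1, 0.1],
]

vector = term_combination_vector(["engine", "failed"], vocabulary)
per_order, total = degrees_of_distinctiveness(vector, factor_vectors)
```

A combination that repeats a term, such as `["engine", "engine", "failed"]`, would receive the value 2 in the component for that term.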
9. A method of evaluating a degree of distinctiveness of each term contained in a target document including at least one document segment with respect to a comparison document including at least one document segment, and identifying a distinctive term, the method comprising:
(a) identifying a respective document segment vector for each document segment contained in the comparison document and the target document, each document segment vector having component values associated with occurring frequencies of terms occurring in the document segment;
(b) computing squared sum matrices respectively corresponding to the comparison document and the target document, from the document segment vectors;
(c) computing a predetermined number of orders of topic difference factor vectors of the target document from the squared sum matrices respectively corresponding to the comparison document and the target document;
(d) computing values of inner products for each of said document segments of the target document and the comparison document, the values of inner products being calculated between the corresponding document segment vector and the topic difference factor vectors of the respective orders;
(e) computing degrees of distinctiveness of said respective orders and a total degree of distinctiveness for each term contained in the target document, on the basis of correlation coefficients between frequencies of each term in the respective document segments and the values of the inner products; and
(f) identifying a distinctive combination of terms in the target document, on the basis of the degrees of distinctiveness of said respective orders or the total degree of distinctiveness thereof.
Dependent claims: 10, 11, 12, 13, 14.
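Steps (d) and (e) of claim 9 relate each term's frequency across segments to the inner products of those segments with a topic difference factor vector. The sketch below uses the Pearson correlation coefficient as the correlation measure and takes the correlation itself as the degree of distinctiveness for one order; both choices, the sample vectors, and the value of `alpha` are assumptions made for illustration.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined correlation treated as no association (assumption)
    return cov / (var_x * var_y) ** 0.5


def term_distinctiveness(all_segment_vectors, alpha, term_index):
    """Correlate a term's per-segment frequency with the per-segment inner product."""
    projections = [sum(d * a for d, a in zip(vec, alpha)) for vec in all_segment_vectors]
    frequencies = [vec[term_index] for vec in all_segment_vectors]
    return pearson(frequencies, projections)


# Hypothetical segment vectors over the vocabulary ["engine", "weather"],
# covering the segments of both the target and the comparison document.
all_segment_vectors = [[2, 0], [1, 1], [0, 2], [0, 1]]
alpha = [1.0, -0.2]  # hypothetical first-order topic difference factor vector

engine_score = term_distinctiveness(all_segment_vectors, alpha, 0)
weather_score = term_distinctiveness(all_segment_vectors, alpha, 1)
```

With these sample values the term "engine" correlates positively with the projections and "weather" negatively, so "engine" would rank as the more distinctive term under this scoring.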
15. A method of evaluating a degree of distinctiveness of each combination of terms contained in a target document including at least one document segment with respect to a comparison document including at least one document segment, and identifying a distinctive combination of terms, the method comprising:
(a) identifying a respective document segment vector for each document segment contained in the comparison document and the target document, each document segment having component values associated with occurring frequencies of terms occurring in the document segment;
(b) computing squared sum matrices respectively corresponding to the comparison document and the target document, from the document segment vectors;
(c) computing a predetermined number of orders of topic difference factor vectors of the target document from the squared sum matrices respectively corresponding to the comparison document and the target document;
(d) computing values of inner products for each of said document segments of the target document and the comparison document, said values of inner products being calculated between the corresponding document segment vector and the topic difference factor vectors of the respective orders;
(e) computing degrees of distinctiveness of said respective orders and a total degree of distinctiveness for each combination of terms contained in the target document, on the basis of correlation coefficients between frequencies of each combination of terms in the respective document segments and the values of the inner products; and
(f) identifying a distinctive combination of terms in the target document, on the basis of the degrees of distinctiveness of said respective orders or the total degree of distinctiveness thereof.
Dependent claims: 16, 17, 18, 19, 20.
21. A method of evaluating a degree of distinctiveness of each document segment contained in a target document including at least one document segment with respect to a comparison document including at least one document segment, and identifying a distinctive document segment, the method comprising:
(a) identifying a respective document segment vector for each segment of the comparison document and the target document, each document segment vector having component values associated with occurring frequencies of terms occurring in the document segment;
(b) computing similarities of the document segment vector for each document segment of the target document, the similarities of the document segment vectors corresponding to the target document and the comparison document;
(c) computing a total degree of distinctiveness for each document segment of the target document, by using the similarities to the target document and the comparison document; and
(d) identifying a distinctive document segment in the target document, on the basis of the total degree of distinctiveness thereof.
Dependent claims: 22, 23, 24.
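Claim 21 replaces the factor vectors with similarities between each target segment and the two documents. The claim does not say how similarity or the total degree is computed, so the sketch below substitutes simple stand-ins: cosine similarity against the summed vector of each document, and the difference of the two similarities as the total degree. All names and sample vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def document_vector(segment_vectors):
    """Component-wise sum of a document's segment vectors (assumed aggregate)."""
    return [sum(column) for column in zip(*segment_vectors)]


def total_distinctiveness(segment, target_vectors, comparison_vectors):
    """Similarity to the target minus similarity to the comparison (assumed score)."""
    sim_target = cosine(segment, document_vector(target_vectors))
    sim_comparison = cosine(segment, document_vector(comparison_vectors))
    return sim_target - sim_comparison


# Hypothetical term-frequency vectors over a three-term vocabulary.
target_vectors = [[2, 0, 1], [1, 1, 0]]
comparison_vectors = [[0, 2, 0], [1, 1, 0]]

scores = [
    total_distinctiveness(v, target_vectors, comparison_vectors)
    for v in target_vectors
]
# Step (d): pick the target segment with the highest total degree.
most_distinctive = max(range(len(scores)), key=scores.__getitem__)
```

Here the first target segment scores highest because it resembles the target document as a whole far more than the comparison document, while the second segment is about equally close to both.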
25. A method of evaluating a degree of distinctiveness of each term contained in a target document including at least one document segment with respect to a comparison document including at least one document segment, and identifying a distinctive term, the method comprising:
(a) identifying a respective document segment vector for each document segment of the comparison document and the target document, the document segment vectors having component values associated with occurring frequencies of terms occurring in the document segment;
(b) computing similarities of the corresponding document segment vector for each document segment of the target document, the similarities of the corresponding document segment vectors corresponding to the target document and the comparison document;
(c) computing a total degree of distinctiveness for each document segment of the target document, by using the similarities to the target document and the comparison document;
(d) computing a total degree of distinctiveness for each document segment of the comparison document, by using said similarities to the target document and the comparison document;
(e) computing a total degree of distinctiveness for each of the terms contained in the target document, on the basis of correlation coefficients between frequencies of each term in the respective document segments of the target document and the comparison document and values of the total degrees of distinctiveness of the respective document segment vectors; and
(f) identifying a distinctive term in the target document, on the basis of the total degree of distinctiveness thereof.
26. A method of evaluating a degree of distinctiveness of each combination of terms contained in a target document including at least one document segment with respect to a comparison document including at least one document segment, and identifying a distinctive combination of terms, the method comprising:
(a) identifying a respective document segment vector for each document segment of the comparison document and the target document, said document segment vectors having component values associated with occurring frequencies of terms occurring in the document segment;
(b) computing similarities of the corresponding document segment vector for each of the document segments of the target document, the similarities of the corresponding document segment vectors corresponding to the target document and the comparison document;
(c) computing a total degree of distinctiveness for each document segment of the target document, by using the similarities to the target document and the comparison document;
(d) computing a degree of distinctiveness for each combination of terms contained in the target document, on the basis of correlation coefficients between frequencies of each combination of terms in the respective document segments and values of total degrees of distinctiveness of said respective document segments; and
(e) identifying a distinctive combination of terms in the target document, on the basis of the total degree of distinctiveness thereof.
Dependent claims: 27, 28, 29.
30. A method of evaluating a degree of distinctiveness of each combination of terms contained in a target document including at least one document segment with respect to a comparison document including at least one document segment, and identifying a distinctive combination of terms, the method comprising:
(a) identifying a respective document segment vector for each document segment of the comparison document and the target document, said document segment vectors having component values associated with occurring frequencies of terms occurring in the document segment;
(b) computing a term combination vector for each combination of terms in said target document, said term combination vectors having components corresponding to the terms contained in the combination of terms being given values determined by occurring numbers of said terms in said combination of terms, and having other components equal to “0”;
(c) computing similarities of the corresponding combination vector of terms for each combination of terms in the target document, the similarities of the corresponding combination vectors corresponding to the target document and the comparison document;
(d) computing a total degree of distinctiveness for each combination of terms in the target document, by using the similarities to the target document and the comparison document; and
(e) identifying a distinctive combination of terms in the target document, on the basis of the total degree of distinctiveness thereof.
Dependent claims: 31, 32, 33.
34. A method of evaluating a degree of distinctiveness of each document segment contained in a target document including at least one document segment with respect to a comparison document including at least one document segment, the method comprising:
(a) identifying a respective document segment vector for each document segment contained in the comparison document and the target document, each document segment vector having component values associated with occurring frequencies of terms occurring in its respective document segment;
(b) computing squared sum matrices respectively corresponding to the comparison document and the target document, from said document segment vectors;
(c) computing a predetermined number of orders of topic difference factor vectors of the target document from said squared sum matrices corresponding to the comparison document and the target document; and
(d) computing respective degrees of distinctiveness of said respective orders and a total degree of distinctiveness for each document segment of the target document, from said corresponding document segment vector and said topic difference factor vectors of said respective orders.
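Steps (c) and (d) of claim 34 can be sketched for the first order only. The sketch assumes that a topic difference factor vector is an axis maximizing the ratio given in the abstract, which leads to a generalized eigenvalue problem on the two squared sum matrices. The ridge term, the power iteration, the squared-projection score, and the sample matrices are illustrative assumptions. Higher orders would need an additional deflation step that is not shown.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def first_topic_difference_factor(s_target, s_comparison, ridge=1e-3, iterations=200):
    """Leading axis of S_T a = lambda (S_C + ridge * I) a via power iteration."""
    n = len(s_target)
    # The ridge keeps the comparison matrix invertible (assumption, not from the claims).
    regularized = [
        [s_comparison[i][j] + (ridge if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    alpha = [1.0] * n
    for _ in range(iterations):
        alpha = solve(regularized, matvec(s_target, alpha))
        norm = sum(v * v for v in alpha) ** 0.5
        alpha = [v / norm for v in alpha]
    return alpha


def distinctiveness(segment_vector, alpha):
    """Squared projection of a segment vector on the axis (assumed score)."""
    return sum(d * a for d, a in zip(segment_vector, alpha)) ** 2


# Hypothetical squared sum matrices over a two-term vocabulary, built from
# target segments [2, 0], [1, 1] and comparison segments [0, 2], [1, 1].
s_target = [[5, 1], [1, 1]]
s_comparison = [[1, 1], [1, 5]]

alpha = first_topic_difference_factor(s_target, s_comparison)
scores = [distinctiveness(v, alpha) for v in ([2, 0], [1, 1])]
```

In this example the axis weights the first term heavily and the second slightly negatively, so the target segment `[2, 0]` receives a higher score than `[1, 1]`, which also resembles a comparison segment.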
35. A method of evaluating a degree of distinctiveness of each combination of terms contained in a target document including at least one document segment with respect to a comparison document including at least one document segment, and identifying a distinctive combination of terms, the method comprising:
(a) identifying a respective document segment vector for each document segment contained in the comparison document and the target document, each document segment vector having component values associated with occurring frequencies of terms occurring in the document segment;
(b) computing squared sum matrices respectively corresponding to the comparison document and the target document, from the document segment vectors;
(c) computing a predetermined number of orders of topic difference factor vectors of the target document from the squared sum matrices corresponding to the comparison document and the target document;
(d) computing a term combination vector for each combination of terms in the target document, each term combination vector having components corresponding to the terms contained in the combination of terms being given values determined by occurring numbers of said terms in said combination of terms, and having other components equal to “0”; and
(e) computing degrees of distinctiveness of the respective orders and a total degree of distinctiveness for each combination of terms of the target document, from the corresponding term combination vector and the topic difference factor vectors of said respective orders.
36. A method of evaluating a degree of distinctiveness of each term contained in a target document including at least one document segment with respect to a comparison document including at least one document segment, the method comprising:
(a) identifying a respective document segment vector for each document segment contained in the comparison document and the target document, each document segment vector having component values associated with occurring frequencies of terms occurring in the document segment;
(b) computing squared sum matrices respectively corresponding to the comparison document and the target document, from the document segment vectors;
(c) computing a predetermined number of orders of topic difference factor vectors of the target document from the squared sum matrices respectively corresponding to the comparison document and the target document;
(d) computing values of inner products for each of said document segments of the target document and the comparison document, the values of inner products being calculated between the corresponding document segment vector and the topic difference factor vectors of the respective orders; and
(e) computing degrees of distinctiveness of said respective orders and a total degree of distinctiveness for each term contained in the target document, on the basis of correlation coefficients between frequencies of each term in the respective document segments and the values of the inner products.
37. A method of evaluating a degree of distinctiveness of each combination of terms contained in a target document including at least one document segment with respect to a comparison document including at least one document segment, the method comprising:
(a) identifying a respective document segment vector for each document segment contained in the comparison document and the target document, each document segment having component values associated with occurring frequencies of terms occurring in the document segment;
(b) computing squared sum matrices respectively corresponding to the comparison document and the target document, from the document segment vectors;
(c) computing a predetermined number of orders of topic difference factor vectors of the target document from the squared sum matrices respectively corresponding to the comparison document and the target document;
(d) computing values of inner products for each of said document segments of the target document and the comparison document, said values of inner products being calculated between the corresponding document segment vector and the topic difference factor vectors of the respective orders; and
(e) computing degrees of distinctiveness of said respective orders and a total degree of distinctiveness for each combination of terms contained in the target document, on the basis of correlation coefficients between frequencies of each combination of terms in the respective document segments and the values of the inner products.
38. A method of evaluating a degree of distinctiveness of each document segment contained in a target document including at least one document segment with respect to a comparison document including at least one document segment, the method comprising:
(a) identifying a respective document segment vector for each segment of the comparison document and the target document, each document segment vector having component values associated with occurring frequencies of terms occurring in the document segment;
(b) computing similarities of the document segment vector for each document segment of the target document, the similarities of the document segment vectors corresponding to the target document and the comparison document; and
(c) computing a total degree of distinctiveness for each document segment of the target document, by using the similarities to the target document and the comparison document.
39. A method of evaluating a degree of distinctiveness of each term contained in a target document including at least one document segment with respect to a comparison document including at least one document segment, the method comprising:
(a) identifying a respective document segment vector for each document segment of the comparison document and the target document, the document segment vectors having component values associated with occurring frequencies of terms occurring in the document segment;
(b) computing similarities of the corresponding document segment vector for each document segment of the target document, the similarities of the corresponding document segment vectors corresponding to the target document and the comparison document;
(c) computing a total degree of distinctiveness for each document segment of the target document, by using the similarities to the target document and the comparison document;
(d) computing a total degree of distinctiveness for each document segment of the comparison document, by using said similarities to the target document and the comparison document; and
(e) computing a total degree of distinctiveness for each of the terms contained in the target document, on the basis of correlation coefficients between frequencies of each term in the respective document segments of the target document and the comparison document and values of the total degrees of distinctiveness of the respective document segment vectors.
40. A method of evaluating a degree of distinctiveness of each combination of terms contained in a target document including at least one document segment with respect to a comparison document including at least one document segment, the method comprising:
(a) identifying a respective document segment vector for each document segment of the comparison document and the target document, said document segment vectors having component values associated with occurring frequencies of terms occurring in the document segment;
(b) computing similarities of the corresponding document segment vector for each of the document segments of the target document, the similarities of the corresponding document segment vectors corresponding to the target document and the comparison document;
(c) computing a total degree of distinctiveness for each document segment of the target document, by using the similarities to the target document and the comparison document; and
(d) computing a degree of distinctiveness for each combination of terms contained in the target document, on the basis of correlation coefficients between frequencies of each combination of terms in the respective document segments and values of total degrees of distinctiveness of said respective document segments.
41. A method of evaluating a degree of distinctiveness of each combination of terms contained in a target document including at least one document segment with respect to a comparison document including at least one document segment, the method comprising:
(a) identifying a respective document segment vector for each document segment of the comparison document and the target document, said document segment vectors having component values associated with occurring frequencies of terms occurring in the document segment;
(b) computing a term combination vector for each combination of terms in said target document, said term combination vectors having components corresponding to the terms contained in the combination of terms being given values determined by occurring numbers of said terms in said combination of terms, and having other components equal to “0”;
(c) computing similarities of the corresponding combination vector of terms for each combination of terms in the target document, the similarities of the corresponding combination vectors corresponding to the target document and the comparison document; and
(d) computing a total degree of distinctiveness for each combination of terms in the target document, by using the similarities to the target document and the comparison document.
Specification