Visual comparison of documents using latent semantic differences
First Claim
1. A method for comparing documents using latent semantic differences, the method comprising:
- receiving a plurality of documents from a user;
- extracting a plurality of linguistic units associated with the received plurality of documents, wherein a canonical unit for each linguistic unit from the extracted plurality of linguistic units is determined, wherein that at least one variation of each linguistic unit from the extracted plurality of linguistic units are present is determined, wherein a number of variations of each linguistic unit from the extracted plurality of linguistic units by utilizing a dictionary is determined, wherein the determined number of variations of each linguistic unit is tracked by utilizing a set of tables, wherein at least one start position and at least one end position with each linguistic unit from the extracted plurality of linguistic units is stored, wherein each linguistic unit from the extracted plurality of linguistic units includes a plurality of words in a contiguous sequence;
- building a plurality of latent semantic dimensions based on the extracted plurality of linguistic units;
- weighting the extracted plurality of linguistic units utilizing the built plurality of latent semantic dimensions;
- determining a plurality of latent semantic differences between the received plurality of documents based on weighted plurality of linguistic units;
- mapping the weighted plurality of linguistic units to a scaled visual feature; and
- generating a visualization to the user of the received plurality of documents based on the determined plurality of latent semantic differences and the scaled visual feature, wherein a plurality of mark-ups is added to the mapped plurality of linguistic units, wherein at least one value associated with at least one dimension of the mapped plurality of linguistic units from each of the received plurality of documents is correlated with at least one value associated with a hue, a saturation and a lightness based on the determined plurality of latent semantic differences associated with the at least one dimension of the mapped plurality of linguistic units, wherein the at least one value associated with the hue, saturation and lightness is translated into a hexadecimal code.
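The last limitation of the claim translates hue, saturation and lightness values into a hexadecimal code. The following is a minimal Python sketch of that single translation step, using the standard-library `colorsys` module. The function names `hsl_to_hex` and `difference_to_hex`, and the rule that a larger difference gives a more saturated, darker color, are assumptions made for illustration and are not taken from the patent.

```python
import colorsys


def hsl_to_hex(hue_deg: float, saturation: float, lightness: float) -> str:
    """Convert HSL (hue in degrees, saturation and lightness in 0..1) to '#rrggbb'."""
    # colorsys takes arguments in HLS order and expects hue scaled to 0..1.
    r, g, b = colorsys.hls_to_rgb((hue_deg % 360) / 360.0, lightness, saturation)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


def difference_to_hex(difference: float, hue_deg: float = 0.0) -> str:
    """Hypothetical mapping from a semantic-difference score to a highlight color."""
    # Assumption: the score is clamped to [0, 1]; 0 yields white, 1 the pure hue.
    d = max(0.0, min(1.0, difference))
    return hsl_to_hex(hue_deg, d, 1.0 - 0.5 * d)
```

Under these assumptions, a score of 0 leaves a linguistic unit unhighlighted (white) and a score of 1 renders it in the fully saturated hue, so intermediate scores shade smoothly between the two.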
Abstract
A method, computer system, and computer program product for comparing documents using latent semantic differences are provided. The present invention may include receiving documents from a user. The present invention may also include extracting linguistic units associated with the received documents. The present invention may then include building latent semantic dimensions based on the extracted linguistic units. The present invention may then include weighting the extracted linguistic units utilizing the built latent semantic dimensions. The present invention may then include determining latent semantic differences between the received documents based on the weighted linguistic units. The present invention may also include mapping the weighted linguistic units to a scaled visual feature. The present invention may further include generating a visualization to the user of the received documents based on the determined latent semantic differences and the scaled visual feature.
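The extraction step can be illustrated with a short Python sketch. It treats each linguistic unit as a run of contiguous words, resolves a canonical form through a dictionary, and records variations and start/end positions in tables. The dictionary contents, the bigram default, the whitespace-and-word-character tokenization, and the names `CANONICAL_WORDS`, `extract_units` and `build_tables` are all assumptions for illustration; the patent text shown here does not specify them.

```python
import re
from collections import defaultdict

# Hypothetical variation dictionary: surface word -> canonical word.
CANONICAL_WORDS = {"colour": "color", "colours": "color", "colors": "color", "codes": "code"}


def extract_units(text, n=2):
    """Return (canonical_unit, surface_form, start, end) for each run of n contiguous words."""
    # Keep character offsets so each unit's start and end positions can be stored.
    tokens = [(m.group().lower(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    units = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        surface = " ".join(word for word, _, _ in window)
        canonical = " ".join(CANONICAL_WORDS.get(word, word) for word, _, _ in window)
        units.append((canonical, surface, window[0][1], window[-1][2]))
    return units


def build_tables(units):
    """Track the variations and the positions seen for each canonical unit."""
    variations = defaultdict(set)   # canonical unit -> set of surface variations
    positions = defaultdict(list)   # canonical unit -> list of (start, end) offsets
    for canonical, surface, start, end in units:
        variations[canonical].add(surface)
        positions[canonical].append((start, end))
    return variations, positions
```

For example, calling `build_tables(extract_units("Colour codes and color code"))` groups the surface forms "colour codes" and "color code" under the single canonical unit "color code", so `len(variations["color code"])` gives the number of variations for that unit.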
17 Claims
1. A method for comparing documents using latent semantic differences, the method comprising:
- receiving a plurality of documents from a user;
- extracting a plurality of linguistic units associated with the received plurality of documents, wherein a canonical unit for each linguistic unit from the extracted plurality of linguistic units is determined, wherein that at least one variation of each linguistic unit from the extracted plurality of linguistic units are present is determined, wherein a number of variations of each linguistic unit from the extracted plurality of linguistic units by utilizing a dictionary is determined, wherein the determined number of variations of each linguistic unit is tracked by utilizing a set of tables, wherein at least one start position and at least one end position with each linguistic unit from the extracted plurality of linguistic units is stored, wherein each linguistic unit from the extracted plurality of linguistic units includes a plurality of words in a contiguous sequence;
- building a plurality of latent semantic dimensions based on the extracted plurality of linguistic units;
- weighting the extracted plurality of linguistic units utilizing the built plurality of latent semantic dimensions;
- determining a plurality of latent semantic differences between the received plurality of documents based on weighted plurality of linguistic units;
- mapping the weighted plurality of linguistic units to a scaled visual feature; and
- generating a visualization to the user of the received plurality of documents based on the determined plurality of latent semantic differences and the scaled visual feature, wherein a plurality of mark-ups is added to the mapped plurality of linguistic units, wherein at least one value associated with at least one dimension of the mapped plurality of linguistic units from each of the received plurality of documents is correlated with at least one value associated with a hue, a saturation and a lightness based on the determined plurality of latent semantic differences associated with the at least one dimension of the mapped plurality of linguistic units, wherein the at least one value associated with the hue, saturation and lightness is translated into a hexadecimal code.
Dependent claims: 2, 3, 4, 5, 6
7. A computer system for comparing documents using latent semantic differences, comprising:
- one or more processors, one or more computer-readable memories, one or more computer-readable tangible storage medium, and program instructions stored on at least one of the one or more tangible storage medium for execution by at least one of the one or more processors via at least one of the one or more memories, wherein the computer system is capable of performing a method comprising:
- receiving a plurality of documents from a user;
- extracting a plurality of linguistic units associated with the received plurality of documents, wherein a canonical unit for each linguistic unit from the extracted plurality of linguistic units is determined, wherein that at least one variation of each linguistic unit from the extracted plurality of linguistic units are present is determined, wherein a number of variations of each linguistic unit from the extracted plurality of linguistic units by utilizing a dictionary is determined, wherein the determined number of variations of each linguistic unit is tracked by utilizing a set of tables, wherein at least one start position and at least one end position with each linguistic unit from the extracted plurality of linguistic units is stored, wherein each linguistic unit from the extracted plurality of linguistic units includes a plurality of words in a contiguous sequence;
- building a plurality of latent semantic dimensions based on the extracted plurality of linguistic units;
- weighting the extracted plurality of linguistic units utilizing the built plurality of latent semantic dimensions;
- determining a plurality of latent semantic differences between the received plurality of documents based on weighted plurality of linguistic units;
- mapping the weighted plurality of linguistic units to a scaled visual feature; and
- generating a visualization to the user of the received plurality of documents based on the determined plurality of latent semantic differences and the scaled visual feature, wherein a plurality of mark-ups is added to the mapped plurality of linguistic units, wherein at least one value associated with at least one dimension of the mapped plurality of linguistic units from each of the received plurality of documents is correlated with at least one value associated with a hue, a saturation and a lightness based on the determined plurality of latent semantic differences associated with the at least one dimension of the mapped plurality of linguistic units, wherein the at least one value associated with the hue, saturation and lightness is translated into a hexadecimal code.
Dependent claims: 8, 9, 10, 11, 12
13. A computer program product for comparing documents using latent semantic differences, comprising:
- one or more computer-readable storage media and program instructions stored on at least one of the one or more non-transitory storage media, the program instructions executable by a processor to cause the processor to perform a method comprising:
- receiving a plurality of documents from a user;
- extracting a plurality of linguistic units associated with the received plurality of documents, wherein a canonical unit for each linguistic unit from the extracted plurality of linguistic units is determined, wherein that at least one variation of each linguistic unit from the extracted plurality of linguistic units are present is determined, wherein a number of variations of each linguistic unit from the extracted plurality of linguistic units by utilizing a dictionary is determined, wherein the determined number of variations of each linguistic unit is tracked by utilizing a set of tables, wherein at least one start position and at least one end position with each linguistic unit from the extracted plurality of linguistic units is stored, wherein each linguistic unit from the extracted plurality of linguistic units includes a plurality of words in a contiguous sequence;
- building a plurality of latent semantic dimensions based on the extracted plurality of linguistic units;
- weighting the extracted plurality of linguistic units utilizing the built plurality of latent semantic dimensions;
- determining a plurality of latent semantic differences between the received plurality of documents based on weighted plurality of linguistic units;
- mapping the weighted plurality of linguistic units to a scaled visual feature; and
- generating a visualization to the user of the received plurality of documents based on the determined plurality of latent semantic differences and the scaled visual feature, wherein a plurality of mark-ups is added to the mapped plurality of linguistic units, wherein at least one value associated with at least one dimension of the mapped plurality of linguistic units from each of the received plurality of documents is correlated with at least one value associated with a hue, a saturation and a lightness based on the determined plurality of latent semantic differences associated with the at least one dimension of the mapped plurality of linguistic units, wherein the at least one value associated with the hue, saturation and lightness is translated into a hexadecimal code.
Dependent claims: 14, 15, 16, 17
Specification