SYSTEMS AND METHODS FOR SEMANTIC SEARCH, CONTENT CORRELATION AND VISUALIZATION
First Claim
1. A computer-implemented method for comparing content overlap between a first document and a second document, comprising:
using a computer to parse a text of each of the first and second documents into constituent units;
using the computer to compute a digest of each of the first and second documents based on the constituent units;
using the computer to compare the computed digests; and
using the computer to compute a proportion of common contents between the first and second documents and a proportion of distinct contents between the first and second documents based on the comparison.
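The claim does not specify the constituent unit, the digest function, or how the proportions are normalized. The Python sketch below shows one possible reading, not the patented implementation. It assumes that sentences are the constituent units (split with a naive regex), that a document's digest is the set of SHA-1 hashes of its normalized sentences, and that both proportions are measured against the union of the two digests. All function names are hypothetical.

```python
import hashlib
import re


def split_units(text: str) -> list[str]:
    """Split text into sentence-like constituent units (naive regex, assumed)."""
    return [u for u in re.split(r"(?<=[.!?])\s+", text.strip()) if u]


def normalize(unit: str) -> str:
    """Lowercase and keep only alphanumeric tokens so punctuation edits still match."""
    return " ".join(re.findall(r"[a-z0-9]+", unit.lower()))


def digest(text: str) -> set[str]:
    """Hypothetical digest: the set of SHA-1 hashes of the normalized units."""
    return {
        hashlib.sha1(norm.encode("utf-8")).hexdigest()
        for norm in (normalize(u) for u in split_units(text))
        if norm
    }


def overlap_proportions(doc_a: str, doc_b: str) -> tuple[float, float]:
    """Return (common, distinct) proportions relative to the union of both digests."""
    a, b = digest(doc_a), digest(doc_b)
    union = a | b
    if not union:
        return 0.0, 0.0
    common = len(a & b) / len(union)
    distinct = len(a ^ b) / len(union)  # symmetric difference: units found in only one document
    return common, distinct


first = "The cat sat. The dog ran. Birds fly."
second = "The cat sat! A fish swam. Birds fly."
print(overlap_proportions(first, second))  # (0.5, 0.5)
```

Because both proportions are taken over the union, they sum to 1 for any pair of non-empty documents. The claim language would equally allow other normalizations, such as measuring overlap relative to the first document only.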
2 Assignments
0 Petitions
Abstract
Methods and systems are disclosed for searching over large (i.e., Internet-scale) data to discover relevant information artifacts based on similar content and/or relationships. Improvements over simple keyword- and phrase-based searching over Internet-scale data are shown, and search engines that provide accurate and contextually relevant search results are disclosed. The disclosed systems enable users to identify related documents and information artifacts and to quickly ascertain, via visualization, which of these documents are original, which are derived (or copied) from a source document or information artifact, and which subset is independently generated (i.e., an original document or information artifact).
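A common way to find documents that share content with a query document at large scale is to use an inverted index, which avoids comparing every pair of documents. The sketch below is an assumption about how such a lookup could work and is not the method disclosed here. It indexes hypothetical sentence-hash digests (SHA-1 of normalized sentences) and ranks candidate documents by the fraction of the query's sentences they contain. `DigestIndex`, `sentence_hashes`, and `related` are illustrative names.

```python
import hashlib
import re
from collections import Counter, defaultdict


def sentence_hashes(text: str) -> set[str]:
    """Hash each normalized sentence; sentences are split with a naive regex (assumed)."""
    hashes = set()
    for unit in re.split(r"(?<=[.!?])\s+", text.strip()):
        norm = " ".join(re.findall(r"[a-z0-9]+", unit.lower()))
        if norm:
            hashes.add(hashlib.sha1(norm.encode("utf-8")).hexdigest())
    return hashes


class DigestIndex:
    """Hypothetical inverted index mapping a sentence hash to the documents containing it."""

    def __init__(self) -> None:
        self._postings: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for h in sentence_hashes(text):
            self._postings[h].add(doc_id)

    def related(self, text: str, min_shared: int = 1) -> list[tuple[str, float]]:
        """Rank indexed documents by the fraction of the query's sentences they share."""
        query = sentence_hashes(text)
        shared: Counter[str] = Counter()
        for h in query:
            # .get() avoids inserting empty entries into the defaultdict
            for doc_id in self._postings.get(h, ()):
                shared[doc_id] += 1
        hits = [(d, n / len(query)) for d, n in shared.items() if n >= min_shared]
        return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


index = DigestIndex()
index.add("a", "The cat sat. Birds fly.")
index.add("b", "Fish swim. Birds fly.")
index.add("c", "Nothing in common here.")
results = index.related("The cat sat. Birds fly. Dogs bark.")
print(results)  # "a" ranks above "b"; "c" shares no sentence and is omitted
```

Each lookup only touches documents that share at least one sentence hash with the query. This keeps the candidate set small even when the index is large.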
39 Claims
1. A computer-implemented method for comparing content overlap between a first document and a second document, comprising:
using a computer to parse a text of each of the first and second documents into constituent units;
using the computer to compute a digest of each of the first and second documents based on the constituent units;
using the computer to compare the computed digests; and
using the computer to compute a proportion of common contents between the first and second documents and a proportion of distinct contents between the first and second documents based on the comparison.
Dependent claims: 2, 3, 4, 5, 6.
7. A computer-implemented method for comparing content overlap between a target document and at least one additional document, comprising:
using a computer to parse a text of each of the target and at least one additional document into constituent units;
using the computer to identify named entities within each of the constituent units;
using the computer to pair identified named entities which appear together within the constituent units; and
using the computer to assess a similarity between the target document and the at least one additional document based on a result of the paired entities.
Dependent claims: 8, 9, 10, 11, 12, 13.
14. A system for comparing content overlap between documents, the system comprising:
a server node in electronic communication with a user interface node, the server node being configured such that, when a user uses the user interface node to submit a content comparison request relating to a first document and a second document to the server node, the server node is configured to:
parse a text of each of the first and second documents into constituent units;
compute a digest of each of the first and second documents based on the constituent units;
compare the computed digests; and
compute a proportion of common contents between the first and second documents and a proportion of distinct contents between the first and second documents based on the comparison.
Dependent claims: 15, 16, 17, 18, 19.
20. A system for comparing content overlap between documents, comprising:
a server node in electronic communication with a user interface node, the server node being configured such that, when a user uses the user interface node to submit a content comparison request relating to a target document and at least one additional document, the server node is configured to:
parse a text of each of the target and at least one additional document into constituent units;
identify named entities within each of the constituent units;
pair identified named entities which appear together within the constituent units; and
assess a similarity between the target document and the at least one additional document based on a result of the paired entities.
Dependent claims: 21, 22, 23, 24, 25, 26.
27. A computer program product for comparing content overlap between a first document and a second document, the computer program product comprising a computer readable medium storing computer readable program code, the computer readable program code comprising:
a set of instructions for parsing a text of each of the first and second documents into constituent units;
a set of instructions for computing a digest of each of the first and second documents based on the constituent units;
a set of instructions for comparing the computed digests; and
a set of instructions for computing a proportion of common contents between the first and second documents and a proportion of distinct contents between the first and second documents based on the comparison.
Dependent claims: 28, 29, 30, 31, 32.
33. A computer program product for comparing content overlap between a target document and at least one additional document, the computer program product comprising a computer readable medium storing computer readable program code, the computer readable program code comprising:
a set of instructions for parsing a text of each of the target and at least one additional document into constituent units;
a set of instructions for identifying named entities within each of the constituent units;
a set of instructions for pairing identified named entities which appear together within the constituent units; and
a set of instructions for assessing a similarity between the target document and the at least one additional document based on a result of the paired entities.
Dependent claims: 34, 35, 36, 37, 38, 39.
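The entity-pairing claims (7, 20, and 33) leave open how named entities are identified and how the paired entities are scored. The following Python sketch is one hedged interpretation. A crude regex that matches runs of capitalized words (minus a hypothetical stopword list) stands in for a real named-entity recognizer; a practical system would more likely use a trained NER model. Entities that co-occur in the same sentence are paired as unordered tuples, and similarity is taken as the Jaccard index of the two documents' pair sets. Every function name here is illustrative.

```python
import re
from itertools import combinations

# Crude stand-in for a named-entity recognizer (assumption): runs of capitalized
# words, with leading capitalized function words stripped off.
STOPWORDS = {"The", "A", "An", "In", "On", "At", "It", "He", "She", "They", "We"}


def split_units(text: str) -> list[str]:
    """Split text into sentence-like constituent units (naive regex, assumed)."""
    return [u for u in re.split(r"(?<=[.!?])\s+", text.strip()) if u]


def naive_entities(unit: str) -> set[str]:
    """Return capitalized-word runs in the unit, dropping leading stopwords."""
    entities = set()
    for run in re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", unit):
        words = run.split()
        while words and words[0] in STOPWORDS:
            words.pop(0)
        if words:
            entities.add(" ".join(words))
    return entities


def entity_pairs(text: str) -> set[tuple[str, str]]:
    """Pair every two entities that co-occur in the same unit (sorted, so unordered)."""
    pairs: set[tuple[str, str]] = set()
    for unit in split_units(text):
        pairs.update(combinations(sorted(naive_entities(unit)), 2))
    return pairs


def pair_similarity(target: str, other: str) -> float:
    """Jaccard index of the two documents' entity-pair sets."""
    a, b = entity_pairs(target), entity_pairs(other)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


target = "Alice met Bob in Paris. Carol stayed home."
similar = "Bob and Alice toured Paris. Dave joined Carol."
unrelated = "Eve visited Rome with Frank."
print(pair_similarity(target, similar))    # 0.75
print(pair_similarity(target, unrelated))  # 0.0
```

Because entities are paired within a unit, this scheme rewards repeated associations (for example, "Alice" with "Paris") rather than bare mentions. Two documents that name the same people in unrelated sentences therefore score lower than documents that repeat the same relationships.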
Specification