Systems and methods for semantic search, content correlation and visualization
2 Assignments
0 Petitions
Abstract
Methods and systems for searching over large (i.e., Internet-scale) data to discover relevant information artifacts based on similar content and/or relationships are disclosed. Improvements over simple keyword- and phrase-based searching over Internet-scale data are shown. Search engines providing accurate and contextually relevant search results are disclosed. Users are enabled to identify related documents and information artifacts and to quickly ascertain, via visualization, which of these documents are original, which are derived (or copied) from a source document or information artifact, and which subset is independently generated (i.e., an original document or information artifact).
139 Citations
31 Claims
-
1. A computer-implemented method for comparing content overlap between a first electronic document and a second electronic document, comprising:
receiving, at a computer, a search request from a user for documents containing one or more keywords;
using the computer to access an electronic database and present to the user a first list of one or more documents in the electronic database based on the one or more keywords, the first list of one or more documents including a first hyperlink for the first electronic document;
receiving at the computer a request for the first electronic document via the first hyperlink;
determining, in response to the request for the first electronic document, a second list of documents in the electronic database that are similar to the first electronic document, the second list of documents including the second electronic document, and the determining comprising:
using the computer to parse a text of each of the first and second documents into constituent units;
using the computer to compute a digest of each of the first and second documents based on the constituent units;
using the computer to compare the computed digests;
using the computer to compute a proportion of common contents between the first and second documents and a proportion of distinct contents between the first and second documents based on the comparison;
using the computer to determine a date associated with the first document and a date associated with the second document; and
using the computer to determine a direction of borrowing based on the determined dates; and
using the computer to display to the user the contents of the first electronic document and a hyperlink to the second electronic document and a graphic indicating the direction of borrowing between the first document and the second document, wherein the graphic includes an arrow oriented to point in the borrowing direction showing a computed direction of flow of the information from a donor document to a borrower document, and wherein the graphic comprises a measure of relationship overlap between the first document and at least one of the second document and a selected portion of the second document.
View Dependent Claims (2, 20, 21, 22, 23, 24)
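The parse, digest, compare and proportion steps recited in claim 1 can be illustrated with a minimal Python sketch. The claim does not say what a "constituent unit" or a "digest" is, so everything concrete here is an assumption made for illustration: units are lowercase words, the digest is a set of SHA-1 hashes of overlapping k-word shingles, and the proportion of common contents is the Jaccard ratio of the two digests. The function names are hypothetical.

```python
import hashlib
import re


def parse_units(text):
    """Parse text into constituent units (assumed here to be lowercase words)."""
    return re.findall(r"\w+", text.lower())


def compute_digest(text, k=3):
    """Digest = set of hashes of k-word shingles (an illustrative choice)."""
    words = parse_units(text)
    if not words:
        return set()
    # A document shorter than k words still yields one shingle.
    count = max(len(words) - k + 1, 1)
    return {
        hashlib.sha1(" ".join(words[i:i + k]).encode("utf-8")).hexdigest()
        for i in range(count)
    }


def content_proportions(first_text, second_text, k=3):
    """Return (common, distinct) proportions from comparing the two digests."""
    first = compute_digest(first_text, k)
    second = compute_digest(second_text, k)
    union = first | second
    if not union:
        return 0.0, 0.0
    common = len(first & second) / len(union)
    return common, 1.0 - common
```

For example, `content_proportions("the quick brown fox jumps", "quick brown fox jumps high")` compares three shingles per document, two of which are shared.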
-
-
3. A computer-implemented method for comparing content overlap between a target electronic document and at least one additional electronic document, comprising:
receiving, at a computer, a search request from a user for documents containing one or more keywords;
using the computer to access an electronic database and present to the user a first list of one or more documents in the electronic database based on the one or more keywords, the first list of one or more documents including a first hyperlink for the first electronic document;
receiving at the computer a request for the first electronic document via the first hyperlink;
determining, in response to the request for the first electronic document, a second list of documents in the electronic database that are similar to the first electronic document, the second list of documents including the second electronic document, and the determining comprising:
using the computer to parse a text of each of the target and at least one additional document into constituent units;
using the computer to identify named entities within each of the constituent units;
using the computer to pair identified named entities which appear together within the same constituent units; and
using the computer to assess a similarity between the target document and the at least one additional document based on a result of the paired entities; and
using the computer to display to the user the contents of the first electronic document and a hyperlink to the second electronic document and a graphic indicating the similarity between the target document and the at least one additional document with an indication of similarity, wherein the graphic comprises:
a measure of relationship overlap between the target document and the at least one additional document;
a measure of relationship overlap between the target document and a selected portion of the at least one additional document;
a measure of a size of the target document;
a measure of a size of the at least one additional document; and
a measure of a size of the textual overlap between the target document and the at least one additional document, wherein at least two of the measure of textual overlap between the target document and the at least one additional document;
the measure of relationship overlap between the target document and the at least one additional document; and
the measure of relationship overlap between the target document and a selected portion of the at least one additional document are displayed as bars having a length corresponding to a magnitude of the respective measure.
View Dependent Claims (4, 5, 6, 26, 27, 28, 29, 30, 31)
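The entity-pairing steps of claim 3 can be sketched as follows. This is only an illustration under stated assumptions: constituent units are taken to be sentences, named-entity recognition is replaced by a crude placeholder (capitalized words that do not open the sentence), and similarity is the Jaccard ratio of the two sets of co-occurring entity pairs. None of these choices comes from the claim, and the function names are hypothetical.

```python
import re
from itertools import combinations


def entity_pairs(text):
    """Collect pairs of entities that appear together in the same sentence."""
    pairs = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z]+", sentence)
        # Placeholder NER (assumption): capitalized words after the first word.
        # This misses entities that open a sentence; a real system would use
        # a proper named-entity recognizer.
        entities = sorted({w for w in words[1:] if w[0].isupper()})
        pairs.update(combinations(entities, 2))
    return pairs


def relationship_overlap(target_text, other_text):
    """Similarity = shared entity pairs / all entity pairs (Jaccard ratio)."""
    target = entity_pairs(target_text)
    other = entity_pairs(other_text)
    if not target or not other:
        return 0.0
    return len(target & other) / len(target | other)
```

Sorting the entities before pairing makes each pair order-independent, so "Alice met Bob" and "Bob saw Alice" produce the same pair.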
-
-
7. A system for comparing content overlap between electronic documents, the system comprising:
a server node in electronic communication with a user interface node and one or more electronic databases, the server node being configured to:
receive a search request from a user for documents containing one or more keywords;
access the one or more electronic databases and present to the user a first list of one or more documents in the one or more electronic databases based on the one or more keywords, the first list of one or more documents including a first hyperlink for the first electronic document;
receive from the user a request for the first electronic document via the first hyperlink;
determine, in response to the request for the first electronic document, a second list of documents in the one or more electronic databases that are similar to the first electronic document, the second list of documents including the second electronic document, and wherein to determine the second list of documents the server node is configured to:
parse a text of each of the first and second documents into constituent units;
compute a digest of each of the first and second documents;
compare the computed digests;
compute a proportion of common contents between the first and second documents and a proportion of distinct contents between the first and second documents based on the comparison;
determine a date associated with the first document and a date associated with the second document;
determine a direction of borrowing based on the determined dates; and
display to the user the contents of the first electronic document and a hyperlink to the second electronic document and a graphic indicating the direction of borrowing between the first and second documents, wherein the indication of borrowing direction between the first document and the second document is displayed as an arrow oriented to point in the borrowing direction showing a computed direction of flow of the information from a donor document to a borrower document, and wherein the graphic comprises a measure of relationship overlap between the first document and at least one of the second document and a selected portion of the second document.
View Dependent Claims (8)
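The date-based direction of borrowing and the arrow graphic recited in claim 7 might look like the sketch below. It assumes that the earlier-dated document is the donor and the later-dated one the borrower, and that equal dates leave the direction undetermined; the claim itself does not address ties. The text arrow stands in for whatever graphic an actual interface would draw, and the function names are hypothetical.

```python
from datetime import date


def borrowing_direction(first_date, second_date):
    """Return (donor, borrower) labels, or None when the dates are equal.

    Assumption: the document with the earlier date is the donor.
    """
    if first_date == second_date:
        return None
    if first_date < second_date:
        return ("first", "second")
    return ("second", "first")


def render_borrowing(first_date, second_date, overlap):
    """Render a one-line text graphic: arrow from donor to borrower plus overlap."""
    direction = borrowing_direction(first_date, second_date)
    if direction is None:
        arrow = "--"  # no direction can be inferred from the dates
    elif direction[0] == "first":
        arrow = "-->"
    else:
        arrow = "<--"
    return f"first {arrow} second ({overlap:.0%} overlap)"
```

Calling `render_borrowing(date(2009, 1, 5), date(2010, 3, 1), 0.25)` yields an arrow pointing from the first document to the second, followed by the overlap measure as a percentage.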
-
-
9. A system for comparing content overlap between electronic documents, comprising:
a server node in electronic communication with a user interface node and one or more electronic databases, the server node being configured to:
receive a search request from a user for documents containing one or more keywords;
access the one or more electronic databases and present to the user a first list of one or more documents in the one or more electronic databases based on the one or more keywords, the first list of one or more documents including a first hyperlink for the first electronic document;
receive from the user a request for the first electronic document via the first hyperlink;
determine, in response to the request for the first electronic document, a second list of documents in the one or more electronic databases that are similar to the first electronic document, the second list of documents including the second electronic document, and wherein to determine the second list of documents the server node is configured to:
parse a text of each of the target and at least one additional document into constituent units;
identify named entities within each of the constituent units;
pair identified named entities which appear together within the same constituent units;
assess a similarity between the target document and the at least one additional document based on a result of the paired entities; and
display to the user the contents of the first electronic document and a hyperlink to the second electronic document and a graphic indicating the similarity between the target document and the at least one additional document with an indication of similarity, wherein the graphic comprises:
a measure of relationship overlap between the target document and the at least one additional document;
a measure of relationship overlap between the target document and a selected portion of the at least one additional document;
a measure of a size of the target document;
a measure of a size of the at least one additional document; and
a measure of a size of the textual overlap between the target document and the at least one additional document, wherein at least two of the measure of textual overlap between the target document and the at least one additional document;
the measure of relationship overlap between the target document and the at least one additional document; and
the measure of relationship overlap between the target document and a selected portion of the at least one additional document are displayed as bars having a length corresponding to a magnitude of the respective measure.
View Dependent Claims (10, 11, 12)
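The requirement that measures be "displayed as bars having a length corresponding to a magnitude of the respective measure" can be illustrated with a small text renderer. This is a sketch only: it assumes each measure has already been normalized to the range 0 to 1, and the labels, bar characters and function names are hypothetical.

```python
def bar(value, width=20):
    """Return a bar whose filled length is proportional to value (clamped to 0..1)."""
    value = min(max(value, 0.0), 1.0)
    filled = round(value * width)
    return "#" * filled + "." * (width - filled)


def render_bars(measures, width=20):
    """Render one labeled bar per measure; measures maps label -> value in 0..1."""
    if not measures:
        return ""
    label_width = max(len(name) for name in measures)
    lines = [
        f"{name.ljust(label_width)} |{bar(value, width)}| {value:.0%}"
        for name, value in measures.items()
    ]
    return "\n".join(lines)
```

A call such as `render_bars({"textual": 0.5, "relationship": 0.25})` produces two aligned rows, the first with a bar twice as long as the second.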
-
-
13. A computer program product for comparing content overlap between a first electronic document and a second electronic document, the computer program product comprising a non-transitory computer readable medium storing computer readable program code, the computer readable program code comprising:
a set of instructions for receiving a search request from a user for documents containing one or more keywords;
a set of instructions for accessing an electronic database and presenting to the user a first list of one or more documents in the electronic database based on the one or more keywords, the first list of one or more documents including a first hyperlink for the first electronic document;
a set of instructions for receiving a request for the first electronic document via the first hyperlink;
a set of instructions for determining, in response to the request for the first electronic document, a second list of documents in the electronic database that are similar to the first electronic document, the second list of documents including the second electronic document, and the set of instructions for determining comprising:
a set of instructions for parsing a text of each of the first and second documents into constituent units;
a set of instructions for computing a digest of each of the first and second documents based on the constituent units;
a set of instructions for comparing the computed digests;
a set of instructions for computing a proportion of common contents between the first and second documents and a proportion of distinct contents between the first and second documents based on the comparison; and
a set of instructions for determining a date associated with the first document and a date associated with the second document, and a set of instructions for determining a direction of borrowing based on the determined dates; and
a set of instructions for displaying to the user the contents of the first electronic document and a hyperlink to the second electronic document and a graphic indicating the direction of borrowing between the first and second documents, wherein the indication of borrowing direction between the first document and the second document is displayed as an arrow oriented to point in the borrowing direction showing a computed direction of flow of the information from a donor document to a borrower document, and wherein the graphic comprises a measure of relationship overlap between the first document and at least one of the second document and a selected portion of the second document.
View Dependent Claims (14, 15)
-
-
16. A computer program product for comparing content overlap between a target electronic document and at least one additional electronic document, the computer program product comprising a non-transitory computer readable medium storing computer readable program code, the computer readable program code comprising:
a set of instructions for receiving a search request from a user for documents containing one or more keywords;
a set of instructions for accessing an electronic database and presenting to the user a first list of one or more documents in the electronic database based on the one or more keywords, the first list of one or more documents including a first hyperlink for the target document;
a set of instructions for receiving a request for the target document via the first hyperlink;
a set of instructions for determining, in response to the request for the target document, a second list of documents in the electronic database that are similar to the target document, the second list of documents including at least one additional electronic document, and the set of instructions for determining comprising:
a set of instructions for parsing a text of each of the target and the at least one additional document into constituent units;
a set of instructions for identifying named entities within each of the constituent units;
a set of instructions for pairing identified named entities which appear together within the same constituent units;
a set of instructions for assessing a similarity between the target document and the at least one additional document based on a result of the paired entities; and
a set of instructions for displaying to the user the contents of the first electronic document and a hyperlink to the second electronic document and a graphic indicating the similarity between the target document and the at least one additional document with an indication of similarity, wherein the graphic comprises:
a measure of relationship overlap between the target document and the at least one additional document;
a measure of relationship overlap between the target document and a selected portion of the at least one additional document;
a measure of a size of the target document;
a measure of a size of the at least one additional document; and
a measure of a size of the textual overlap between the target document and the at least one additional document, wherein at least two of the measure of textual overlap between the target document and the at least one additional document;
the measure of relationship overlap between the target document and the at least one additional document; and
the measure of relationship overlap between the target document and a selected portion of the at least one additional document are displayed as bars having a length corresponding to a magnitude of the respective measure.
View Dependent Claims (17, 18, 19)
-
-
25. The method of claim 24, wherein the step of using the computer to compute a digest of each of the first and second documents further comprises using the computer to determine a number of sentences contained in each of the first and second documents and determining a sentence signature for each of the sentences, and wherein the step of using the computer to compare the computed digests further comprises using the sentence signatures and the respective numbers of sentences contained in each of the first and second documents to perform the comparison.
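Claim 25 refines the digest into a sentence count plus one signature per sentence. A minimal sketch of that idea follows, with several assumptions that the claim does not state: sentences are split on terminal punctuation, a signature is a truncated MD5 hash of the whitespace-normalized, lowercased sentence, and the comparison reports the share of each document's sentences whose signatures also occur in the other document. The function names are hypothetical.

```python
import hashlib
import re


def sentence_digest(text):
    """Return (number of sentences, list of per-sentence signatures)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    signatures = [
        # Signature (assumption): truncated MD5 of the normalized sentence.
        hashlib.md5(" ".join(s.lower().split()).encode("utf-8")).hexdigest()[:16]
        for s in sentences
    ]
    return len(sentences), signatures


def compare_sentence_digests(first_text, second_text):
    """Return the fraction of each document's sentences shared with the other."""
    first_count, first_sigs = sentence_digest(first_text)
    second_count, second_sigs = sentence_digest(second_text)
    shared = len(set(first_sigs) & set(second_sigs))
    first_share = shared / first_count if first_count else 0.0
    second_share = shared / second_count if second_count else 0.0
    return first_share, second_share
```

Using the sentence counts as denominators gives an asymmetric result: a short document copied wholesale into a long one scores high for the short document and low for the long one.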
Specification