SYSTEMS AND METHODS FOR SEMANTIC SEARCH, CONTENT CORRELATION AND VISUALIZATION

US 20110270606A1
Filed: 04/29/2011
Published: 11/03/2011
Est. Priority Date: 04/30/2010
Status: Active Grant

Abstract

Methods and systems for searching over large (i.e., Internet-scale) data to discover relevant information artifacts based on similar content and/or relationships are disclosed. Improvements over simple keyword- and phrase-based searching over Internet-scale data are shown. Search engines providing accurate and contextually relevant search results are disclosed. Users are enabled to identify related documents and information artifacts and to quickly ascertain, via visualization, which of these documents are original, which are derived (or copied) from a source document or information artifact, and which subset is independently generated (i.e., an original document or information artifact).

39 Claims

1. A computer-implemented method for comparing content overlap between a first document and a second document, comprising:
- using a computer to parse a text of each of the first and second documents into constituent units;
  
  using the computer to compute a digest of each of the first and second documents based on the constituent units;
  
  using the computer to compare the computed digests; and
  
  using the computer to compute a proportion of common contents between the first and second documents and a proportion of distinct contents between the first and second documents based on the comparison.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method of claim 1, further comprising using the computer to determine a date associated with the first document and a date associated with the second document, and using the computer to determine a direction of borrowing based on the determined dates.
  - 3. The method of claim 1, further comprising creating a graphical display to indicate the proportion of common contents and to indicate the proportion of distinct contents.
  - 4. The method of claim 1, further comprising creating a graphical display to indicate the proportion of common contents, the proportion of distinct contents, a length of the first document, and a length of the second document.
  - 5. The method of claim 1, wherein the step of using a computer to parse the text of each of the first and second documents into constituent units further comprises using the computer to parse the text of each of the first and second documents into sentences.
  - 6. The method of claim 5, wherein the step of using the computer to compute a digest of each of the first and second documents further comprises using the computer to determine a number of sentences contained in each of the first and second documents and determining a sentence signature for each of the sentences, and wherein the step of using the computer to compare the computed digests further comprises using the sentence signatures and the respective numbers of sentences contained in each of the first and second documents to perform the comparison.
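
The steps recited in claims 1, 5 and 6 can be illustrated with a short sketch. The claims do not specify how sentences are split, how a sentence signature is computed, or how the proportions are normalized, so everything here is an assumption: the regex splitter, the SHA-1 hash over normalized text, and all function names are hypothetical choices made only for illustration.

```python
import hashlib
import re

def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def sentence_signature(sentence):
    """Assumed signature: SHA-1 of the lowercased, whitespace-normalized sentence."""
    normalized = " ".join(sentence.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def compute_digest(text):
    """Digest = number of sentences plus the set of sentence signatures."""
    sentences = split_sentences(text)
    return {
        "count": len(sentences),
        "signatures": {sentence_signature(s) for s in sentences},
    }

def compare_digests(digest_a, digest_b):
    """Compare two digests and report common and distinct proportions."""
    shared = digest_a["signatures"] & digest_b["signatures"]
    union = digest_a["signatures"] | digest_b["signatures"]
    common = len(shared) / len(union) if union else 0.0
    return {
        "common": common,                  # share of all signatures found in both
        "distinct": 1.0 - common if union else 0.0,
        # Share of each document's sentences that also appear in the other one.
        "share_of_first": len(shared) / digest_a["count"] if digest_a["count"] else 0.0,
        "share_of_second": len(shared) / digest_b["count"] if digest_b["count"] else 0.0,
    }

first = "The cat sat. The dog barked. Rain fell."
second = "The dog barked. Rain fell. Snow followed."
result = compare_digests(compute_digest(first), compute_digest(second))
print(result)
```

Hashing normalized sentences means only exact (case- and whitespace-insensitive) sentence matches count as common content; near-duplicate detection would need a different signature scheme.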

7. A computer-implemented method for comparing content overlap between a target document and at least one additional document, comprising:
- using a computer to parse a text of each of the target and at least one additional document into constituent units;
  
  using the computer to identify named entities within each of the constituent units;
  
  using the computer to pair identified named entities which appear together within the constituent units; and
  
  using the computer to assess a similarity between the target document and the at least one additional document based on a result of the paired entities.
- Dependent claims (8, 9, 10, 11, 12, 13):
  - 8. The method of claim 7, further comprising using the computer to determine a date associated with the target document and a date associated with the at least one additional document, and using the computer to determine a direction of borrowing based on the determined dates.
  - 9. The method of claim 7, wherein the step of using the computer to assess further comprises using the computer to calculate a similarity score by applying a Term Frequency-Inverse Document Frequency (TF-IDF) formula to the paired entities.
  - 10. The method of claim 7, further comprising the steps of:
    - using the computer to compute a digest of each of the target document and the at least one additional document based on the constituent units;
      
      using the computer to compare the digest corresponding to the target document to the digest corresponding to the at least one additional document; and
      
      using a result of the comparing to determine which text in the target document is original to the target document.
  - 11. The method of claim 10, wherein the step of using a computer to parse the text of each of the target and at least one additional documents into constituent units further comprises using the computer to parse the text of each of the target and at least one additional documents into sentences.
  - 12. The method of claim 11, wherein the step of using the computer to compute a digest of each of the target document and the at least one additional document further comprises using the computer to determine a number of sentences contained in each of the target and at least one additional documents and determining a sentence signature for each of the sentences, and wherein the step of using the computer to compare the computed digests further comprises using the sentence signatures and the respective numbers of sentences contained in each of the target and at least one additional documents to perform the comparison.
  - 13. The method of claim 7, wherein the step of using the computer to assess further comprises using the computer to calculate a first similarity score by applying a Term Frequency-Inverse Document Frequency (TF-IDF) formula to the paired entities for all text, and using the computer to calculate a second similarity score by applying a TF-IDF formula to the paired entities only for text that is determined to be original to the target document.
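
Claims 7 and 9 describe pairing named entities that appear together in a constituent unit and scoring similarity with TF-IDF over those pairs. The sketch below is one possible reading, not the patented implementation: the gazetteer lookup stands in for a real named-entity recognizer, the smoothed IDF variant and the cosine comparison are assumptions (the claims name TF-IDF but give no formula), and all identifiers are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical gazetteer standing in for a real named-entity recognizer.
KNOWN_ENTITIES = {"Alice", "Bob", "Paris", "Acme"}

def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]

def entity_pairs(text, known_entities=KNOWN_ENTITIES):
    """Count unordered pairs of entities that co-occur within a sentence."""
    pairs = Counter()
    for sentence in split_sentences(text):
        words = set(re.findall(r"[A-Za-z]+", sentence))
        found = sorted(words & known_entities)
        pairs.update(combinations(found, 2))
    return pairs

def tfidf_vectors(pair_counts_by_doc):
    """Weight each pair count by a smoothed inverse document frequency."""
    n_docs = len(pair_counts_by_doc)
    doc_freq = Counter()
    for counts in pair_counts_by_doc:
        doc_freq.update(counts.keys())
    return [
        {pair: tf * (math.log((1 + n_docs) / (1 + doc_freq[pair])) + 1.0)
         for pair, tf in counts.items()}
        for counts in pair_counts_by_doc
    ]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

target = "Alice met Bob in Paris. Bob joined Acme."
other = "Alice met Bob in Paris."
unrelated = "Acme opened an office in Paris."

vec_target, vec_other, vec_unrelated = tfidf_vectors(
    [entity_pairs(target), entity_pairs(other), entity_pairs(unrelated)]
)
sim_other = cosine(vec_target, vec_other)
sim_unrelated = cosine(vec_target, vec_unrelated)
print(sim_other, sim_unrelated)
```

Because the score is built from entity pairs rather than single terms, two documents that mention the same entities but never together in one sentence share no pairs and score zero, as the `unrelated` document does here.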

14. A system for comparing content overlap between documents, the system comprising:
- a server node in electronic communication with a user interface node, the server node being configured such that, when a user uses the user interface node to submit a content comparison request relating to a first document and a second document to the server node, the server node is configured to:
  
  parse a text of each of the first and second documents into constituent units;
  
  compute a digest of each of the first and second documents based on the constituent units;
  
  compare the computed digests; and
  
  compute a proportion of common contents between the first and second documents and a proportion of distinct contents between the first and second documents based on the comparison.
- Dependent claims (15, 16, 17, 18, 19):
  - 15. The system of claim 14, wherein the server node is further configured to determine a date associated with the first document and a date associated with the second document, and to determine a direction of borrowing based on the determined dates.
  - 16. The system of claim 14, wherein the server node is further configured to create a graphical display to indicate the proportion of common contents and to indicate the proportion of distinct contents, and to cause the user interface node to display the created graphical display.
  - 17. The system of claim 14, wherein the server node is further configured to create a graphical display to indicate the proportion of common contents, the proportion of distinct contents, a length of the first document, and a length of the second document, and to cause the user interface node to display the created graphical display.
  - 18. The system of claim 14, wherein the server node is further configured to parse the text of each of the first and second documents into sentences.
  - 19. The system of claim 18, wherein the server node is further configured to:
    - compute the digests of each of the first and second documents by determining a number of sentences contained in each of the first and second documents and determining a sentence signature for each of the sentences, and
      
      compare the computed digests by using the sentence signatures and the respective numbers of sentences contained in each of the first and second documents to perform the comparison.
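
Claims 2 and 15 recite determining a direction of borrowing from the dates associated with the two documents, without saying how the dates are obtained or compared. A minimal sketch, assuming the dates are already known and that the earlier-dated document is treated as the source; the function name and return strings are illustrative only.

```python
from datetime import date

def direction_of_borrowing(first_date, second_date):
    """Assume the later-dated document borrowed from the earlier-dated one."""
    if first_date < second_date:
        return "second borrows from first"
    if second_date < first_date:
        return "first borrows from second"
    # Identical dates give no basis for choosing a direction.
    return "undetermined"

print(direction_of_borrowing(date(2010, 4, 30), date(2011, 4, 29)))
```

In practice this result would only be meaningful when the content comparison has already found overlap between the two documents.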

20. A system for comparing content overlap between documents, comprising:
- a server node in electronic communication with a user interface node, the server node being configured such that, when a user uses the user interface node to submit a content comparison request relating to a target document and at least one additional document, the server node is configured to:
  
  parse a text of each of the target and at least one additional document into constituent units;
  
  identify named entities within each of the constituent units;
  
  pair identified named entities which appear together within the constituent units; and
  
  assess a similarity between the target document and the at least one additional document based on a result of the paired entities.
- Dependent claims (21, 22, 23, 24, 25, 26):
  - 21. The system of claim 20, wherein the server node is further configured to determine a date associated with the target document and a date associated with the at least one additional document, and to determine a direction of borrowing based on the determined dates.
  - 22. The system of claim 20, wherein the server node is further configured to calculate a similarity score relating to the target document and the at least one additional document by applying a Term Frequency-Inverse Document Frequency (TF-IDF) formula to the paired entities.
  - 23. The system of claim 20, wherein the server node is further configured to:
    - compute a digest of each of the target document and the at least one additional document based on the constituent units;
      
      compare the digest corresponding to the target document to the digest corresponding to the at least one additional document; and
      
      use a result of the comparing to determine which text in the target document is original to the target document.
  - 24. The system of claim 23, wherein the server node is further configured to parse the text of each of the target and at least one additional documents into sentences.
  - 25. The system of claim 24, wherein the server node is further configured to:
    - compute the digests of each of the target document and the at least one additional document by determining a number of sentences contained in each of the target and at least one additional documents and determining a sentence signature for each of the sentences, and
      
      compare the computed digests by using the sentence signatures and the respective numbers of sentences contained in each of the target and at least one additional documents to perform the comparison.
  - 26. The system of claim 20, wherein the server node is further configured to calculate a first similarity score by applying a Term Frequency-Inverse Document Frequency (TF-IDF) formula to the paired entities for all text, and to calculate a second similarity score by applying a TF-IDF formula to the paired entities only for text that is determined to be original to the target document.
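
Claims 10 and 23 use the digest comparison to determine which text in the target document is original to it. One way to read this, sketched below under stated assumptions, is to keep the target sentences whose signatures do not occur in any additional document. The splitter, the SHA-1 signature, and the function names are hypothetical and are not taken from the claims.

```python
import hashlib
import re

def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]

def signature(sentence):
    """Assumed signature: SHA-1 of the lowercased, whitespace-normalized sentence."""
    normalized = " ".join(sentence.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def original_sentences(target_text, other_texts):
    """Return target sentences whose signature appears in no other document."""
    seen = set()
    for text in other_texts:
        seen.update(signature(s) for s in split_sentences(text))
    return [s for s in split_sentences(target_text) if signature(s) not in seen]

target = "Rain fell. Snow followed. Wind rose."
others = ["Rain fell.", "Wind rose. Fog lifted."]
print(original_sentences(target, others))
```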

27. A computer program product for comparing content overlap between a first document and a second document, the computer program product comprising a computer readable medium storing computer readable program code, the computer readable program code comprising:
- a set of instructions for parsing a text of each of the first and second documents into constituent units;
  
  a set of instructions for computing a digest of each of the first and second documents based on the constituent units;
  
  a set of instructions for comparing the computed digests; and
  
  a set of instructions for computing a proportion of common contents between the first and second documents and a proportion of distinct contents between the first and second documents based on the comparison.
- Dependent claims (28, 29, 30, 31, 32):
  - 28. The computer program product of claim 27, wherein the computer readable program code further comprises a set of instructions for determining a date associated with the first document and a date associated with the second document, and a set of instructions for determining a direction of borrowing based on the determined dates.
  - 29. The computer program product of claim 27, wherein the computer readable program code further comprises a set of instructions for creating a graphical display to indicate the proportion of common contents and to indicate the proportion of distinct contents.
  - 30. The computer program product of claim 27, wherein the computer readable program code further comprises a set of instructions for creating a graphical display to indicate the proportion of common contents, the proportion of distinct contents, a length of the first document, and a length of the second document.
  - 31. The computer program product of claim 27, wherein the computer readable program code further comprises a set of instructions for parsing the text of each of the first and second documents into sentences.
  - 32. The computer program product of claim 31, wherein the computer readable program code further comprises:
    - a set of instructions for determining a number of sentences contained in each of the first and second documents;
      
      a set of instructions for determining a sentence signature for each of the sentences; and
      
      a set of instructions for using the sentence signatures and the respective numbers of sentences contained in each of the first and second documents to perform a comparison between the first and second documents.
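
Claims 29 and 30 call for a graphical display of the common and distinct proportions and, optionally, the two document lengths. The claims do not describe the display itself, so the text-mode bar below is only a stand-in for whatever visualization is intended; the function name, the bar characters, and the use of sentence counts as the length measure are all assumptions.

```python
def render_overlap(common, distinct, len_first, len_second, width=40):
    """Render the proportions as a text bar: '#' for common, '.' for distinct."""
    common_cells = round(common * width)
    bar = "#" * common_cells + "." * (width - common_cells)
    return "\n".join([
        f"common {common:.0%} | distinct {distinct:.0%}",
        f"[{bar}]",
        f"first document: {len_first} sentences",
        f"second document: {len_second} sentences",
    ])

display = render_overlap(common=0.5, distinct=0.5, len_first=3, len_second=3)
print(display)
```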

33. A computer program product for comparing content overlap between a target document and at least one additional document, the computer program product comprising a computer readable medium storing computer readable program code, the computer readable program code comprising:
- a set of instructions for parsing a text of each of the target and at least one additional document into constituent units;
  
  a set of instructions for identifying named entities within each of the constituent units;
  
  a set of instructions for pairing identified named entities which appear together within the constituent units; and
  
  a set of instructions for assessing a similarity between the target document and the at least one additional document based on a result of the paired entities.
- Dependent claims (34, 35, 36, 37, 38, 39):
  - 34. The computer program product of claim 33, wherein the computer readable program code further comprises a set of instructions for determining a date associated with the target document and a date associated with the at least one additional document, and a set of instructions for determining a direction of borrowing based on the determined dates.
  - 35. The computer program product of claim 33, wherein the computer readable program code further comprises a set of instructions for calculating a similarity score by applying a Term Frequency-Inverse Document Frequency (TF-IDF) formula to the paired entities.
  - 36. The computer program product of claim 33, wherein the computer readable program code further comprises:
    - a set of instructions for computing a digest of each of the target document and the at least one additional document based on the constituent units;
      
      a set of instructions for comparing the digest corresponding to the target document to the digest corresponding to the at least one additional document; and
      
      a set of instructions for using a result of the comparing to determine which text in the target document is original to the target document.
  - 37. The computer program product of claim 33, wherein the computer readable program code further comprises a set of instructions for parsing the text of each of the target and at least one additional documents into sentences.
  - 38. The computer program product of claim 37, wherein the computer readable program code further comprises:
    - a set of instructions for determining a number of sentences contained in each of the target and at least one additional documents;
      
      a set of instructions for determining a sentence signature for each of the sentences; and
      
      a set of instructions for using the sentence signatures and the respective numbers of sentences contained in each of the target and at least one additional documents to perform a comparison between the target and at least one additional documents.
  - 39. The computer program product of claim 33, wherein the computer readable program code further comprises a set of instructions for calculating a first similarity score by applying a Term Frequency-Inverse Document Frequency (TF-IDF) formula to the paired entities for all text, and a set of instructions for calculating a second similarity score by applying a TF-IDF formula to the paired entities only for text that is determined to be original to the target document.
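
Claims 13, 26 and 39 recite two similarity scores: one over the paired entities for all text, and one over the paired entities only for text determined to be original to the target document. The sketch below assumes the entity pairs per sentence and the original/non-original flag have already been produced by earlier steps. The input layout, the smoothed IDF variant, and the cosine comparison are assumptions; the claims name TF-IDF without giving a formula.

```python
import math
from collections import Counter

def idf_table(docs):
    """Smoothed inverse document frequency for every pair seen in docs."""
    n_docs = len(docs)
    doc_freq = Counter()
    for counts in docs:
        doc_freq.update(counts.keys())
    return {pair: math.log((1 + n_docs) / (1 + df)) + 1.0
            for pair, df in doc_freq.items()}

def weighted(counts, idf):
    """TF-IDF vector: pair count times the pair's IDF weight."""
    return {pair: tf * idf[pair] for pair, tf in counts.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def similarity_scores(target_sentences, other_counts):
    """target_sentences: list of (entity_pairs, is_original) per sentence."""
    all_counts, original_counts = Counter(), Counter()
    for pairs, is_original in target_sentences:
        all_counts.update(pairs)
        if is_original:
            original_counts.update(pairs)
    idf = idf_table([all_counts, other_counts])
    other_vec = weighted(other_counts, idf)
    first_score = cosine(weighted(all_counts, idf), other_vec)        # all text
    second_score = cosine(weighted(original_counts, idf), other_vec)  # original text only
    return first_score, second_score

# Hypothetical input: one borrowed sentence and one original sentence.
target_sentences = [
    ([("Alice", "Bob")], False),
    ([("Acme", "Bob")], True),
]
other_counts = Counter({("Alice", "Bob"): 1, ("Acme", "Bob"): 1})
first_score, second_score = similarity_scores(target_sentences, other_counts)
print(first_score, second_score)
```

Comparing the two scores separates similarity that comes from copied passages from similarity that persists in the target's own text.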

Current Assignee: Contiem Incorporated
Original Assignee: Orbis Technology
Inventors: NIV, Michael; CROCHET, Larry
Granted Patent: US 9,489,350 B2
US Class Current: 704/9
CPC Class Codes: G06F 40/194 Calculation of difference b...
