DISCOVERY ENGINE

US 20140236941A1
Filed: 03/06/2014
Published: 08/21/2014
Est. Priority Date: 04/06/2012
Status: Active Grant

1 Assignment

0 Petitions

Abstract

A method that is relatively inexpensive to implement and that permits a user to search electronically stored documents using an entire document, multiple documents or portions of a document as the search criteria, and to collect, store and share the relevant documents from the search.

23 Citations

26 Claims

1. A system, comprising:
- a memory containing a set of instructions; and
  
  a processor for processing the set of instructions, wherein the instructions cause the processor to perform a method comprising:
  
  receiving a current instance of search criteria;
  
  determining tokens in the current instance of the search criteria;
  
  for each document of at least one dataset, determining each token that has at least one occurrence thereof within the current instance of the search criteria and within the document; and
  
  for each document of the at least one dataset, generating a similarity score indicating a degree of relevance of contents of the document to the current instance of the search criteria, wherein generating a similarity score includes characterizing similarity based on a number of times each token is present in both the document and the current instance of the search criteria and based on uniqueness of each token with respect to each other token.
- - 2. The system of claim 1 wherein:
    - the current instance of the search criteria includes a uniform resource locator (URL); and
      
      receiving the current instance of search criteria includes accessing information residing at a location designated by the URL, extracting at least a portion of the information that is in a format native to the location designated by the URL and generating search criteria in a text-based format from at least a portion of the identified information in the format native to the location designated by the URL.
  - 3. The system of claim 2 wherein the method further comprises:
    - sorting the similarity scores for at least a portion of the documents of the at least one dataset for creating a set of similarity scores associated with the current instance of the search criteria;
      
      enabling a document corresponding to one of the similarity scores of the set to be designated as a next instance of the search criteria; and
      
      causing the method to be performed for the next of the search in which the next instance of the search criteria is used as the current instance of the search criteria.
  - 4. The system of claim 1 wherein the method further comprises:
    - sorting the similarity scores for at least a portion of the documents of the at least one dataset for creating a set of similarity scores associated with the current instance of the search criteria;
      
      enabling a document corresponding to one of the similarity scores of the set to be designated as a next instance of the search criteria; and
      
      causing the method to be performed for the next of the search in which the next instance of the search criteria is used as the current instance of the search criteria.
  - 5. The system of claim 1 wherein generating the similarity score includes normalizing at least a portion of the first frequency counts as a function of page count of the respective one of the documents with respect to one or more other documents in the source of documents.
  - 6. The system of claim 5 wherein:
    - the current instance of the search criteria includes a uniform resource locator (URL); and
      
      receiving the current instance of search criteria includes accessing information residing at a location designated by the URL, extracting at least a portion of the information that is in a format native to the location designated by the URL and generating search criteria in a text-based format from at least a portion of the identified information in the format native to the location designated by the URL.
  - 7. The system of claim 6 wherein the method further comprises:
    - sorting the similarity scores for at least a portion of the documents of the at least one dataset for creating a set of similarity scores associated with the current instance of the search criteria;
      
      enabling a document corresponding to one of the similarity scores of the set to be designated as a next instance of the search criteria; and
      
      causing the method to be performed for the next of the search in which the next instance of the search criteria is used as the current instance of the search criteria.
  - 8. The system of claim 1 wherein:
    - similarity scores have been generated for documents from a first dataset and for documents from a second dataset; and
      
      the method further comprises normalizing the similarity scores of each one of the documents from the first dataset and each one of the documents from the second dataset with respect to all documents of the first and second datasets.
  - 9. The system of claim 8 wherein normalizing the similarity scores includes:
    - for each one of the datasets, determining an arithmetic mean of the similarity scores for all of the documents in a particular one of the datasets;
      
      for each one of the datasets, generating a dataset normalized similarity score for each document of the particular one of the datasets dependent upon the arithmetic mean of the similarity scores for all of the documents therein; and
      
      for each one of the documents of each one of the datasets, determining relevance of each one of the documents dependent upon the normalized similarity score thereof.
  - 10. The system of claim 8 wherein:
    - the current instance of the search criteria includes a uniform resource locator (URL); and
      
      receiving the current instance of search criteria includes accessing information residing at a location designated by the URL, extracting at least a portion of the information that is in a format native to the location designated by the URL and generating search criteria in a text-based format from at least a portion of the identified information in the format native to the location designated by the URL.
  - 11. The system of claim 8 wherein the method further comprises:
    - sorting the similarity scores for at least a portion of the documents of the at least one dataset for creating a set of similarity scores associated with the current instance of the search criteria;
      
      enabling a document corresponding to one of the similarity scores of the set to be designated as a next instance of the search criteria; and
      
      causing the method to be performed for the next of the search in which the next instance of the search criteria is used as the current instance of the search criteria.
  - 12. The system of claim 8 wherein generating the similarity score includes normalizing at least a portion of the first frequency counts as a function of page count of the respective one of the documents with respect to one or more other documents in the source of documents.
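
The scoring steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not prescribe a formula, so the regular-expression tokenizer, the use of a smoothed inverse document frequency as the "uniqueness" measure, and the names `tokenize` and `similarity_scores` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def similarity_scores(criteria, documents):
    """Return one score per document for the given search criteria.

    Assumption: "uniqueness" is modeled as a smoothed inverse document
    frequency. The claim language does not fix a specific formula.
    """
    criteria_counts = Counter(tokenize(criteria))
    doc_counts = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(documents)

    # Number of documents that contain each token of the search criteria.
    doc_freq = {
        token: sum(1 for counts in doc_counts if token in counts)
        for token in criteria_counts
    }

    scores = []
    for counts in doc_counts:
        score = 0.0
        # Only tokens present in both the criteria and the document contribute.
        for token in criteria_counts.keys() & counts.keys():
            shared = min(criteria_counts[token], counts[token])
            uniqueness = math.log((1 + n_docs) / (1 + doc_freq[token])) + 1.0
            score += shared * uniqueness
        scores.append(score)
    return scores
```

Calling `similarity_scores("the cat", ["the cat sat on the mat", "dogs chase cats", "the quick brown fox"])` gives the first document the highest score, the third a smaller positive score, and the second a score of zero, because it shares no token with the criteria.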

13. A non-transitory computer-readable medium having tangibly embodied thereon and accessible therefrom processor-executable instructions that, when executed by at least one data processing device of at least one computer, cause said at least one data processing device to perform a method comprising:
- receiving a current instance of search criteria;
  
  determining tokens in the current instance of the search criteria;
  
  for each document of at least one dataset, determining each token that has at least one occurrence thereof within the current instance of the search criteria and within the document; and
  
  for each document of the at least one dataset, generating a similarity score indicating a degree of relevance of contents of the document to the current instance of the search criteria, wherein generating a similarity score includes characterizing similarity based on a number of times each token is present in both the document and the current instance of the search criteria and based on uniqueness of each token with respect to each other token.
- - 14. The non-transitory computer-readable medium of claim 13 wherein:
    - the current instance of the search criteria includes a uniform resource locator (URL); and
      
      receiving the current instance of search criteria includes accessing information residing at a location designated by the URL, extracting at least a portion of the information that is in a hypertext markup language (HTML) format and generating search criteria in an extensible markup language (XML) format from at least a portion of the identified information in the HTML format.
  - 15. The non-transitory computer-readable medium of claim 13 wherein the method further comprises:
    - sorting the similarity scores for at least a portion of the documents of the at least one dataset for creating a set of similarity scores associated with the current instance of the search criteria;
      
      enabling a document corresponding to one of the similarity scores of the set to be designated as a next instance of the search criteria; and
      
      causing the method to be performed for the next of the search in which the next instance of the search criteria is used as the current instance of the search criteria.
  - 16. The non-transitory computer-readable medium of claim 13 wherein generating the similarity score includes normalizing at least a portion of the first frequency counts as a function of page count of the respective one of the documents with respect to one or more other documents in the source of documents.
  - 17. The non-transitory computer-readable medium of claim 16 wherein:
    - the current instance of the search criteria includes a uniform resource locator (URL); and
      
      receiving the current instance of search criteria includes accessing information residing at a location designated by the URL, extracting at least a portion of the information that is in a format native to the location designated by the URL and generating search criteria in a text-based format from at least a portion of the identified information in the format native to the location designated by the URL.
  - 18. The non-transitory computer-readable medium of claim 13 wherein:
    - similarity scores have been generated for documents from a first dataset and for documents from a second dataset; and
      
      the method further comprises normalizing the similarity scores of each one of the documents from the first dataset and each one of the documents from the second dataset with respect to all documents of the first and second datasets.
  - 19. The non-transitory computer-readable medium of claim 18 wherein normalizing the similarity scores includes:
    - for each one of the datasets, determining an arithmetic mean of the similarity scores for all of the documents in a particular one of the datasets;
      
      for each one of the datasets, generating a dataset normalized similarity score for each document of the particular one of the datasets dependent upon the arithmetic mean of the similarity scores for all of the documents therein; and
      
      for each one of the documents of each one of the datasets, determining relevance of each one of the documents dependent upon the normalized similarity score thereof.
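
The per-dataset normalization recited in claims 18 and 19 can be sketched as follows. The claims say only that the normalized score is "dependent upon" the arithmetic mean of each dataset's scores. Dividing each score by that mean is an assumption chosen for illustration, and the names `normalize_by_dataset_mean` and `rank_across_datasets` are hypothetical.

```python
from statistics import mean


def normalize_by_dataset_mean(scores_by_dataset):
    """Divide each score by the arithmetic mean of its own dataset.

    Assumption: division by the mean is one possible way to make scores
    from differently scaled datasets comparable.
    """
    normalized = {}
    for name, scores in scores_by_dataset.items():
        dataset_mean = mean(scores) if scores else 0.0
        normalized[name] = [
            score / dataset_mean if dataset_mean else 0.0 for score in scores
        ]
    return normalized


def rank_across_datasets(scores_by_dataset):
    """Rank every document of every dataset by its normalized score."""
    normalized = normalize_by_dataset_mean(scores_by_dataset)
    ranked = [
        (name, index, score)
        for name, scores in normalized.items()
        for index, score in enumerate(scores)
    ]
    ranked.sort(key=lambda item: item[2], reverse=True)
    return ranked
```

With raw scores `{"patents": [10.0, 30.0], "articles": [1.0, 3.0]}`, both datasets normalize to `[0.5, 1.5]`, so a strong match in a low-scoring dataset ranks alongside a strong match in a high-scoring one.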

20. A non-transitory computer-readable medium having tangibly embodied thereon and accessible therefrom processor-executable instructions that, when executed by at least one data processing device of at least one computer, cause said at least one data processing device to perform a method comprising:
- receiving a current instance of search criteria, wherein the current instance of the search criteria includes a uniform resource locator (URL);
  
  determining tokens in the current instance of the search criteria;
  
  for each document of at least one source of documents, performing a first frequency count for characterizing a number of times that each one of the tokens occurs within the text used as the current instance of the search criteria in comparison to each one of the documents in the at least one source of documents;
  
  for each one of the tokens, performing a second frequency count for characterizing an aggregate number of times that a particular one of the tokens occurs within all of the documents in the at least one source of documents; and
  
  for each document in the at least one source of documents, generating a similarity score between the text used as the current instance of the search criteria and a particular one of the documents, wherein the similarity score is a function of the first frequency count for the particular one of the documents and the second frequency count for each token in the particular one of the documents.
- - 21. The non-transitory computer-readable medium of claim 20 wherein receiving the current instance of search criteria includes:
    - accessing information residing at a location designated by the URL;
      
      extracting at least a portion of the information that is in a format native to the location designated by the URL; and
      
      generating search criteria in a text-based format from at least a portion of the identified information in the format native to the location designated by the URL.
  - 22. The non-transitory computer-readable medium of claim 20 wherein the method further comprises:
    - sorting the similarity scores for at least a portion of the documents of the at least one source of documents for creating a set of similarity scores associated with the current instance of the search criteria;
      
      enabling a document corresponding to one of the similarity scores of the set to be designated as a next instance of the search criteria; and
      
      causing the method to be performed for the next of the search in which the next instance of the search criteria is used as the current instance of the search criteria.
  - 23. The non-transitory computer-readable medium of claim 20 wherein generating the similarity score includes normalizing at least a portion of the first frequency counts as a function of page count of the respective one of the documents with respect to one or more other documents in the at least one source of documents.
  - 24. The non-transitory computer-readable medium of claim 23 wherein receiving the current instance of search criteria includes:
    - accessing information residing at a location designated by the URL;
      
      extracting at least a portion of the information that is in a format native to the location designated by the URL; and
      
      generating search criteria in a text-based format from at least a portion of the identified information in the format native to the location designated by the URL.
  - 25. The non-transitory computer-readable medium of claim 20 wherein:
    - similarity scores have been generated for documents from a first source of documents and for documents from a second source of documents; and
      
      the method further comprises normalizing the similarity scores of each one of the documents from the first source of documents and each one of the documents from the second source of documents with respect to all documents of the first and second sources of documents.
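
The two frequency counts recited in claim 20 can be sketched in Python. This is one possible reading, not the claimed implementation: the first count is taken here as the number of occurrences a token shares between the criteria text and a document, the second as the token's aggregate occurrences across all documents, and the score as the sum of their ratios. The tokenizer and the name `frequency_count_scores` are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def frequency_count_scores(criteria_text, documents):
    """Score each document from a per-document and an aggregate token count."""
    criteria_counts = Counter(tokenize(criteria_text))
    doc_counts = [Counter(tokenize(doc)) for doc in documents]

    # Second frequency count: occurrences of each token across all documents.
    aggregate = Counter()
    for counts in doc_counts:
        aggregate.update(counts)

    scores = []
    for counts in doc_counts:
        score = 0.0
        for token, in_criteria in criteria_counts.items():
            # First frequency count: occurrences shared by criteria and document.
            first = min(in_criteria, counts[token])
            if first:
                # Assumed combination: tokens that are rare overall weigh more.
                score += first / aggregate[token]
        scores.append(score)
    return scores
```

For the criteria `"alpha beta"` and the documents `["alpha beta beta", "beta gamma", "delta"]`, the first document scores highest because it matches both tokens, including `alpha`, which occurs only once in the whole collection.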

26. The non-transitory computer-readable medium of claim 25 wherein normalizing the similarity scores includes:
- for each one of the sources of documents, determining an arithmetic mean of the similarity scores for all of the documents in a particular one of the sources of documents;
  
  for each one of the sources of documents, generating a dataset normalized similarity score for each document of the particular one of the sources of documents dependent upon the arithmetic mean of the similarity scores for all of the documents therein; and
  
  for each one of the documents of each one of the sources of documents, determining relevance of each one of the documents dependent upon the normalized similarity score thereof.
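
Several dependent claims (2, 6, 10, 14, 17, 21 and 24) recite generating text-based search criteria from content found at a URL. The sketch below shows one way such a step might look when the native format is HTML. It assumes the page has already been fetched as a string, and it uses Python's standard `html.parser` module. The class `TextExtractor` and the function `html_to_criteria` are hypothetical names, not part of the patent.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_criteria(html):
    """Convert an HTML string into plain text usable as search criteria."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The resulting string could then be tokenized and scored like any other search criteria. Fetching the page itself is left out so the example runs without network access.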

Current Assignee: Enlyton, Inc.
Original Assignee: Enlyton, Inc.
Inventors: McKinzie, Chris; Johns, Mark Ellingham

Granted Patent: US 9,507,867 B2

Field of Search
US Class Current: 707/730
CPC Class Codes:

G06F 16/313   Selection or weighting of t...

G06F 16/334   Query execution G06F16/335 ...

G06F 16/93   Document management systems

G06F 16/951   Indexing; Web crawling tech...

G06F 16/953   Querying, e.g. by the use o...

G06F 16/9535   Search customisation based ...

G06F 16/954   Navigation, e.g. using cate...

G06F 16/9566   URL specific, e.g. using al...
