DISCOVERY ENGINE
1 Assignment
0 Petitions
Abstract
A method that is relatively inexpensive to implement and that permits a user to search electronically stored documents using an entire document, multiple documents, or portions of a document as the search criteria, and to collect, store, and share the relevant documents from the search.
23 Citations
26 Claims
1. A system, comprising:
a memory containing a set of instructions; and
a processor for processing the set of instructions, wherein the instructions cause the processor to perform a method comprising:
receiving a current instance of search criteria;
determining tokens in the current instance of the search criteria;
for each document of at least one dataset, determining each token that has at least one occurrence thereof within the current instance of the search criteria and within the document; and
for each document of the at least one dataset, generating a similarity score indicating a degree of relevance of contents of the document to the current instance of the search criteria, wherein generating a similarity score includes characterizing similarity based on a number of times each token present in both the document and the current instance of the search criteria and based on uniqueness of each token with respect to each other token.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A non-transitory computer-readable medium having tangibly embodied thereon and accessible therefrom processor-executable instructions that, when executed by at least one data processing device of at least one computer, cause said at least one data processing device to perform a method comprising:
receiving a current instance of search criteria;
determining tokens in the current instance of the search criteria;
for each document of at least one dataset, determining each token that has at least one occurrence thereof within the current instance of the search criteria and within the document; and
for each document of the at least one dataset, generating a similarity score indicating a degree of relevance of contents of the document to the current instance of the search criteria, wherein generating a similarity score includes characterizing similarity based on a number of times each token present in both the document and the current instance of the search criteria and based on uniqueness of each token with respect to each other token.
Dependent claims: 14, 15, 16, 17, 18, 19
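The method steps shared by claims 1 and 13 (tokenize the criteria, find the tokens each document has in common with it, then score by shared occurrences and token uniqueness) might be sketched as follows. The claims name these inputs but give no formula, so everything concrete here is an assumption made for illustration: the regex tokenizer, the use of `min()` over the two occurrence counts, the IDF-style uniqueness weight, and the hypothetical names `tokenize` and `similarity_scores`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; the claims do not fix a tokenization scheme."""
    return re.findall(r"[a-z0-9]+", text.lower())


def similarity_scores(criteria, documents):
    """Return one score per document, in the order the documents were given.

    Assumed scoring rule: for every token present in both the criteria and
    a document, add min(count in criteria, count in document) multiplied by
    an IDF-style weight standing in for "uniqueness".
    """
    criteria_counts = Counter(tokenize(criteria))
    doc_counts = [Counter(tokenize(d)) for d in documents]

    # Document frequency: in how many documents does each token appear?
    n_docs = len(documents)
    doc_freq = Counter()
    for counts in doc_counts:
        doc_freq.update(counts.keys())

    scores = []
    for counts in doc_counts:
        # Tokens with at least one occurrence in both criteria and document.
        shared = criteria_counts.keys() & counts.keys()
        score = sum(
            min(criteria_counts[t], counts[t]) * math.log(1 + n_docs / doc_freq[t])
            for t in shared
        )
        scores.append(score)
    return scores
```

Because the criteria is tokenized the same way as the documents, a whole document can be passed as `criteria` unchanged. A token that appears in every document still contributes a small positive weight under this assumed formula; a different uniqueness measure would change that behavior.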
20. A non-transitory computer-readable medium having tangibly embodied thereon and accessible therefrom processor-executable instructions that, when executed by at least one data processing device of at least one computer, cause said at least one data processing device to perform a method comprising:
receiving a current instance of search criteria, wherein the current instance of the search criteria includes a uniform resource locator (URL);
determining tokens in the current instance of the search criteria;
for each document of at least one source of documents, performing a first frequency count for characterizing a number of times that each one of the tokens occurs within the text used as the current instance of the search criteria in comparison to each one of the documents in the at least one source of documents;
for each one of the tokens, performing a second frequency count for characterizing an aggregate number of times that a particular one of the tokens occurs within all of the documents in the at least one source of documents; and
for each document in the at least one source of documents, generating a similarity score between the text used as the current instance of the search criteria and a particular one of the documents, wherein the similarity score is a function of the first frequency count for the particular one of the documents and the second frequency count for each token in the particular one of the documents.
Dependent claims: 21, 22, 23, 24, 25
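The two frequency counts recited in claim 20 can be illustrated with a short sketch. It assumes the text behind the URL has already been retrieved and tokenized, and it assumes one particular way of combining the counts (shared occurrences divided by aggregate frequency); the claim only says the score is "a function of" both. The function names are hypothetical.

```python
from collections import Counter


def first_frequency_count(criteria_tokens, doc_tokens):
    """For each criteria token, pair its count in the criteria with its count in one document."""
    in_criteria, in_doc = Counter(criteria_tokens), Counter(doc_tokens)
    return {t: (in_criteria[t], in_doc[t]) for t in in_criteria}


def second_frequency_count(criteria_tokens, all_doc_tokens):
    """For each criteria token, its aggregate count across every document in the source."""
    total = Counter()
    for doc_tokens in all_doc_tokens:
        total.update(doc_tokens)
    return {t: total[t] for t in set(criteria_tokens)}


def similarity(first, second):
    """Assumed combination: shared occurrences, discounted by aggregate frequency.

    The scored document must be one of the documents used for the second
    count, so any token found in it has an aggregate count of at least 1.
    """
    return sum(
        min(in_criteria, in_doc) / second[t]
        for t, (in_criteria, in_doc) in first.items()
        if in_doc > 0
    )
```

Dividing by the aggregate count means a token that is common across the whole source contributes less than a rare one, which is one plausible reading of why the claim keeps the two counts separate.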
26. The non-transitory computer-readable medium of claim 27 wherein normalizing the similarity scores includes:
for each one of the sources of documents, determining an arithmetic mean of the similarity scores for all of the documents in a particular one of the source of documents;
for each one of the sources of documents, generating a dataset normalized similarity score for each document of the particular one of the source of documents dependent upon the arithmetic mean of the similarity scores for all of the documents therein; and
for each one of the documents of each one of the sources of documents, determining relevance of each one of the documents dependent upon the normalized similarity score thereof.
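The per-source normalization in claim 26 can be sketched as below. The claim says only that the normalized score is "dependent upon" the arithmetic mean of the source; dividing each score by that mean is an assumption, as are the hypothetical names `normalize_by_source` and `rank_documents` and the nested-dictionary input layout.

```python
from statistics import fmean


def normalize_by_source(scores_by_source):
    """Divide each raw score by the arithmetic mean of its own source (assumed rule).

    scores_by_source maps a source name to {document id: raw similarity score}.
    """
    normalized = {}
    for source, scores in scores_by_source.items():
        mean = fmean(list(scores.values())) if scores else 0.0
        normalized[source] = {
            doc: (score / mean if mean else 0.0) for doc, score in scores.items()
        }
    return normalized


def rank_documents(normalized):
    """Merge all sources and order documents by normalized score, highest first."""
    merged = [
        (score, source, doc)
        for source, docs in normalized.items()
        for doc, score in docs.items()
    ]
    return [(source, doc) for score, source, doc in sorted(merged, reverse=True)]
```

Normalizing within each source before merging keeps a source whose raw scores run uniformly high from crowding out the others when relevance is determined across sources.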
Specification