Discovery engine
First Claim
1. A system for semantically searching a group of documents containing words, exclusive of stop words of the documents, thereby improving efficiency by flatly looking at the words being searched without attempting to understand the meaning of the words, comprising:
a memory containing a set of instructions; and
a processor for processing the set of instructions, wherein the instructions cause the processor to perform a method comprising;
receiving by the processor a current instance of a search criteria containing words;
determining by the processor a first total number of the words, exclusive of stop words, in the current instance of the search criteria;
storing in the memory by the processor the first total number;
for each of the words, exclusive of stop words, respectively, in the current instance of the search criteria, determining by the processor a respective first number of times that the word appears in the current instance of the search criteria;
storing in the memory by the processor the respective first number of times;
for each of the words, exclusive of stop words, respectively, in the current instance of the search criteria, calculating by the processor a first uniqueness score, respectively, for the word, respectively, based on the respective first number and the first total number;
storing in the memory by the processor the first uniqueness score, respectively, for the word, respectively;
for each of the words, exclusive of stop words, respectively, of the current instance of the search criteria and the documents, determining by the processor a respective second number of times that the word appears in the current instance of the search criteria and the documents;
storing in the memory by the processor the respective second number of times, as a first frequency score, respectively;
for each of the words, exclusive of stop words, of the current instance of the search criteria and the each of the documents, respectively, calculating by the processor a respective first significance magnitude factor based on the first frequency score, respectively, and the first uniqueness score, respectively;
storing in the memory by the processor the respective first significance magnitude factor;
determining by the processor a second total number of the words, exclusive of stop words, in the documents of the group;
storing in the memory by the processor the second total number;
for each of the words, exclusive of stop words, respectively, of the documents, respectively, determining by the processor a respective third number of times that the word appears in the documents of the group;
storing in the memory by the processor the respective third number of times;
for each of the words, exclusive of stop words, respectively, of the documents, calculating by the processor a second uniqueness score, respectively, for the word, respectively, based on the respective third number and the second total number;
storing in the memory by the processor the second uniqueness score, respectively, for the word, respectively;
for each of the words, exclusive of stop words of the documents, respectively, in each of the documents, respectively, determining by the processor a respective fourth number of times that the word appears in the document;
storing in the memory by the processor the respective fourth number, as a second frequency score, respectively;
for each of the words, exclusive of stop words, of the documents, calculating by the processor a respective second significance magnitude factor based on the second frequency score, respectively, and the second uniqueness score, respectively;
storing in the memory by the processor the respective second significance magnitude factor; and
for each document of the group, generating by the processor a respective similarity score of contents of the document to the current instance of the search criteria, wherein generating the respective similarity score includes characterizing each document based on the respective second significance magnitude factor compared to the respective first significance magnitude factor.
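The claim names four quantities (a total word count, a per-word count, a uniqueness score, and a significance magnitude factor) but does not state formulas for them. The following is a minimal sketch of one possible reading, not the patented method itself. It assumes a logarithmic uniqueness score, a significance factor equal to frequency times uniqueness, and cosine similarity for the final comparison. The stop-word list, the function names, and the sample texts are all hypothetical, and the query-side frequency is simplified to the count within the search criteria alone.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the claim does not enumerate one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def words(text):
    """Lower-case, split on whitespace, strip punctuation, drop stop words."""
    tokens = (t.strip(".,;:!?()\"'") for t in text.lower().split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def significance(word_list, uniqueness_counts=None):
    """Map each word to frequency * uniqueness.

    Assumed formulas (not given in the claim):
      uniqueness = log(1 + total / count), so rarer words score higher
      frequency  = raw count of the word in word_list
    uniqueness_counts supplies the counts used for uniqueness; it defaults
    to the counts of word_list itself (the search-criteria case).
    """
    freq = Counter(word_list)
    base = uniqueness_counts if uniqueness_counts is not None else freq
    total = sum(base.values())
    return {w: n * math.log(1 + total / base[w]) for w, n in freq.items()}


def similarity(sig_a, sig_b):
    """Cosine similarity between two word -> significance maps (assumed)."""
    dot = sum(v * sig_b.get(w, 0.0) for w, v in sig_a.items())
    norm_a = math.sqrt(sum(v * v for v in sig_a.values()))
    norm_b = math.sqrt(sum(v * v for v in sig_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


documents = {
    "d1": "The engine ranks documents by word counts.",
    "d2": "Cooking pasta in salted water.",
}
query = "ranking documents by counts of words"

# Second total and third numbers: word counts over the whole group.
group_counts = Counter(w for text in documents.values() for w in words(text))

# First significance factors (search criteria) and one score per document.
query_sig = significance(words(query))
scores = {
    name: similarity(query_sig, significance(words(text), group_counts))
    for name, text in documents.items()
}
```

Under these assumptions, `d1` shares three non-stop words with the query and scores above `d2`, which shares none. No stemming is applied, so "ranks" and "ranking" count as different words, consistent with the claim's statement that the words are compared without attempting to understand their meaning.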
Abstract
A method that is relatively inexpensive to implement and that permits a user to conduct searches of electronically stored documents using an entire document, multiple documents, or portions of a document as the search criteria, and to collect, store, and share the relevant documents from the search.
27 Claims
13. A non-transitory computer-readable medium having tangibly embodied thereon and accessible therefrom processor-executable instructions that, when executed by at least one data processing device of at least one computer, causes said at least one data processing device to perform a method comprising:
receiving a current instance of search criteria of words;
determining a first total number of words in the current instance of the search criteria;
for each of the words in the current instance of the search criteria, determining a respective first number of times that the word appears in the current instance of the search criteria;
for each of the words in the current instance of the search criteria, calculating a first uniqueness score, respectively, for the word in the search criteria based on the respective first number and the first total number;
for each of the words of the search criteria and each document of at least one dataset, determining a respective second number of times that the word appears in the search criteria and the document;
for each of the words of the current instance of the search criteria and the documents, calculating a respective first significance magnitude factor based on the respective second number and the first uniqueness score, respectively;
determining a second total number of words in the documents;
for each of the words, respectively, of each of the documents, respectively, determining a respective third number of times that the word appears in the document;
for each of the words, respectively, of the documents, calculating a second uniqueness score, respectively, for the word in the documents;
for each of the words of each document, determining a fourth number of times that the word appears in the document;
for each of the words of the documents, calculating a respective second significance magnitude factor based on the respective fourth number and the second uniqueness score, respectively;
for each document of the at least one dataset, generating a respective similarity score of contents of the document to the current instance of the search criteria, wherein generating the respective similarity score includes characterizing each document based on the respective second significance magnitude factor compared to the respective first significance magnitude factor;
thereby improving efficiency of data processing by flatly looking at the words being searched without attempting to understand the meaning of the words.
20. A non-transitory computer-readable medium having tangibly embodied thereon and accessible therefrom processor-executable instructions that, when executed by at least one data processing device of at least one computer, causes said at least one data processing device to perform a method comprising:
receiving a current instance of search criteria, wherein the current instance of the search criteria includes a uniform resource locator (URL);
determining a first total number of words in the current instance of the search criteria;
for each of the words in the current instance of the search criteria, determining a respective first number of times that the word appears in the current instance of the search criteria;
for each of the words in the current instance of the search criteria, calculating a first uniqueness score, respectively, for the word in the search criteria based on the respective first number and the first total number;
for each of the words of the search criteria and each document of at least one source of documents, performing a respective second number of times that the word each token appears in the search criteria and the document of the at least one source of documents;
for each of the words of the current instance of the search criteria and the documents, calculating a respective first significance magnitude factor based on the respective second number and the first uniqueness score, respective;
determining a second total number of words in the documents;
for each of the words, respectively, of each of the documents, respective, determining a respective third number of times that the word appears in the document;
for each of the words, respectively, of the documents, calculating a second uniqueness score, respectively, for the word in the documents;
for each of the words of each document, determining a fourth number of times that the word appears in the document;
for each of the words of the documents, calculating a respective second significance magnitude factor based on the respective fourth number and the second uniqueness score, respectively; and
for each document in the at least one source of documents, generating a respective similarity score between the text used as the current instance of the search criteria and the document, wherein the similarity score is a function of the respective second significance magnitude factor and the respective first significance magnitude factor for the document;
thereby improving efficiency of data processing by flatly looking at the words being searched without attempting to understand the meaning of the words.
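Claim 20 allows the search criteria to include a URL but does not say how the URL becomes a set of countable words. One plausible reading, sketched below as an assumption only, is that the page behind the URL is fetched and its visible text is split into words. The class `VisibleText` and the functions `html_to_words` and `criteria_from_url` are hypothetical names; only standard-library calls are used. Stop words are not filtered here because claims 13 and 20, unlike claim 1, do not recite that exclusion.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.request import urlopen

PUNCTUATION = ".,;:!?()\"'"


class VisibleText(HTMLParser):
    """Collect text that appears outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth of enclosing script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_words(html):
    """Return the lower-cased words of the visible text in an HTML string."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    tokens = (t.strip(PUNCTUATION).lower() for t in " ".join(parser.parts).split())
    return [t for t in tokens if t]


def criteria_from_url(url, timeout=10):
    """Fetch a page and return its words (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return html_to_words(response.read().decode(charset, errors="replace"))


# Offline illustration using an inline HTML string instead of a live URL.
sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><p>Discovery engine, discovery!</p></body></html>"
)
sample_words = html_to_words(sample)
first_total = len(sample_words)   # "first total number of words"
first_counts = Counter(sample_words)  # "respective first number of times"
```

The resulting word list can then be counted and scored in the same way as typed search words, which matches the claim's treatment of the URL as just another source of text for the search criteria.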
27. A method of semantically searching a group of documents containing words, the words are exclusive of stop words of the documents, by a computer including at least a processor and memory, thereby improving the efficiency of computer resources by flatly looking at the words being searched without attempting to understand the meaning of the words, comprising:
(A) indexing by the processor each document of the group by
(a) counting a first count of a total number of the words contained in the documents of the group,
(b) storing in the memory the first count,
(c) for each of the words, respectively, of the documents of the group, respectively, counting a second count, respectively, of a number of times that the word appears in the documents of the group,
(d) for each of the words, respectively, of the documents of the group, storing in the memory the second count, respectively,
(e) for each of the words, respectively, of the documents of the group, calculating a uniqueness score, respectively, based on the second count, respectively, for the word, and the first count, and
(f) for each of the words, respectively, of the documents of the group, storing in the memory the uniqueness score for the word;
(B) indexing by the processor each document of the group by
(a) for each of the words and for each of the documents, counting a third count, respectively, of a number of times the word appears in the document,
(b) for each of the words for each of the documents, respectively, storing in the memory the third count, respectively, as a frequency score, respectively,
(c) for each of the words and for each of the documents, calculating a first significance magnitude factor, respectively, based on the frequency score, respectively, and the uniqueness score, respectively, and
(d) for each of the words for each of the documents, respectively, storing in the memory the first significance magnitude factor, respectively;
(C) receiving by the processor a search criteria, the search criteria selected from the group consisting of; any of the documents, any search words, any other document not in the group, any URL, and combinations;
(D) indexing by the processor the search criteria and the documents of the group using the same steps set forth in (A) and (B) above using only words, exclusive of stop words, of the search criteria, to obtain a second significance magnitude factor, respectively, for each of the words, respectively, of the search criteria;
(E) comparing the second significance factor for each of the words in the search criteria to the first significance factor for the words, respectively, in each of the documents of the group;
(F) for each of the documents of the group, aggregating results of the comparing, for each of the words of the search criteria, into a similarity score, respectively, for the document in comparison to the search criteria;
(G) presenting the similarity scores, respectively, for the documents, respectively, so that the documents in the group having significance in respect of the similarity scores can be utilized by a person looking for documents in the group that are similar to the search criteria.
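Claim 27 separates the work into an indexing phase, steps (A) and (B), and a search phase, steps (C) through (G). The sketch below illustrates that separation under stated assumptions and should not be read as the patented implementation. The class `GroupIndex`, the stop-word list, the logarithmic uniqueness formula, and the per-word comparison (taking the smaller of the two significance factors and normalizing by the query total) are all hypothetical choices; the claim only requires that the scores be "based on" the counts and that the comparison results be aggregated. Step (D) is read here as scoring the criteria against its own word counts, which is one of several possible interpretations.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the claim does not enumerate one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "by"}


def tokenize(text):
    """Lower-case, split on whitespace, strip punctuation, drop stop words."""
    tokens = (t.strip(".,;:!?()\"'") for t in text.lower().split())
    return [t for t in tokens if t and t not in STOP_WORDS]


class GroupIndex:
    """Precomputed per-document significance factors for a document group."""

    def __init__(self, documents):
        per_doc = {name: Counter(tokenize(text)) for name, text in documents.items()}
        # (A) first count (group total) and second counts (per-word totals).
        group = Counter()
        for counts in per_doc.values():
            group.update(counts)
        total = sum(group.values())
        # Assumed uniqueness formula: rarer words in the group score higher.
        self.uniqueness = {w: math.log(1 + total / n) for w, n in group.items()}
        # (B) frequency score * uniqueness = first significance magnitude factor.
        self.significance = {
            name: {w: n * self.uniqueness[w] for w, n in counts.items()}
            for name, counts in per_doc.items()
        }

    def search(self, criteria):
        """Return (document name, similarity score) pairs, best match first."""
        # (D) score the criteria words against the criteria's own counts.
        counts = Counter(tokenize(criteria))
        total = sum(counts.values())
        query_sig = {w: n * math.log(1 + total / n) for w, n in counts.items()}
        query_mass = sum(query_sig.values())
        scores = {}
        for name, doc_sig in self.significance.items():
            # (E) compare per word; (F) aggregate into one score per document.
            overlap = sum(min(s, doc_sig.get(w, 0.0)) for w, s in query_sig.items())
            scores[name] = overlap / query_mass if query_mass else 0.0
        # (G) present the scores so the most similar documents come first.
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)


index = GroupIndex({
    "contract": "Lease contract between landlord and tenant.",
    "recipe": "Pasta recipe with tomato sauce.",
})
results = index.search("tenant lease dispute")
```

Because the group is indexed once in the constructor, repeated calls to `search` only need to score the new criteria, which reflects the claim's split between indexing and searching. In this example the "contract" document ranks first, since two of the three criteria words appear in it and none appear in "recipe".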
Specification