Discovery engine

  • US 9,507,867 B2
  • Filed: 03/06/2014
  • Issued: 11/29/2016
  • Est. Priority Date: 04/06/2012
  • Status: Active Grant
First Claim

1. A system for semantically searching a group of documents containing words, exclusive of stop words of the documents, thereby improving efficiency by flatly looking at the words being searched without attempting to understand the meaning of the words, comprising:

  • a memory containing a set of instructions; and

    a processor for processing the set of instructions, wherein the instructions cause the processor to perform a method comprising;

    receiving by the processor a current instance of a search criteria containing words;

    determining by the processor a first total number of the words, exclusive of stop words, in the current instance of the search criteria;

    storing in the memory by the processor the first total number;

    for each of the words, exclusive of stop words, respectively, in the current instance of the search criteria, determining by the processor a respective first number of times that the word appears in the current instance of the search criteria;

    storing in the memory by the processor the respective first number of times;

    for each of the words, exclusive of stop words, respectively, in the current instance of the search criteria, calculating by the processor a first uniqueness score, respectively, for the word, respectively, based on the respective first number and the first total number;

    storing in the memory by the processor the first uniqueness score, respectively, for the word, respectively;

    for each of the words, exclusive of stop words, respectively, of the current instance of the search criteria and the documents, determining by the processor a respective second number of times that the word appears in the current instance of the search criteria and the documents;

    storing in the memory by the processor the respective second number of times, as a first frequency score, respectively;

    for each of the words, exclusive of stop words, of the current instance of the search criteria and the each of the documents, respectively, calculating by the processor a respective first significance magnitude factor based on the first frequency score, respectively, and the first uniqueness score, respectively;

    storing in the memory by the processor the respective first significance magnitude factor;

    determining by the processor a second total number of the words, exclusive of stop words, in the documents of the group;

    storing in the memory by the processor the second total number;

    for each of the words, exclusive of stop words, respectively, of the documents, respectively, determining by the processor a respective third number of times that the word appears in the documents of the group;

    storing in the memory by the processor the respective third number of times;

    for each of the words, exclusive of stop words, respectively, of the documents, calculating by the processor a second uniqueness score, respectively, for the word, respectively, based on the respective third number and the second total number;

    storing in the memory by the processor the second uniqueness score, respectively, for the word, respectively;

    for each of the words, exclusive of stop words of the documents, respectively, in each of the documents, respectively, determining by the processor a respective fourth number of times that the word appears in the document;

    storing in the memory by the processor the respective fourth number, as a second frequency score, respectively;

    for each of the words, exclusive of stop words, of the documents, calculating by the processor a respective second significance magnitude factor based on the second frequency score, respectively, and the second uniqueness score, respectively;

    storing in the memory by the processor the respective second significance magnitude factor; and

    for each document of the group, generating by the processor a respective similarity score of contents of the document to the current instance of the search criteria, wherein generating the respective similarity score includes characterizing each document based on the respective second significance magnitude factor compared to the respective first significance magnitude factor.
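
The steps recited in the claim can be sketched in code. This is only an illustration under stated assumptions, not the patented implementation. The claim says each score is "based on" certain counts without giving formulas, so the sketch assumes the uniqueness score is the total word count divided by the word's count, the significance magnitude factor is the product of the frequency score and the uniqueness score, and the final comparison is a cosine similarity. The stop-word list and all function names (`tokenize`, `uniqueness_scores`, `significance`, `cosine`, `rank_documents`) are hypothetical.

```python
from collections import Counter
import math
import re

# Assumption: a tiny illustrative stop-word list; the claim does not enumerate one.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "in", "to", "is", "for", "on"})


def tokenize(text):
    """Lowercase the text, split on non-alphanumerics, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def uniqueness_scores(counts, total):
    """Assumed formula: total / count, so rarer words score higher."""
    return {w: total / c for w, c in counts.items()}


def significance(freq, uniq):
    """Assumed formula: frequency score multiplied by uniqueness score."""
    return {w: freq[w] * uniq[w] for w in freq if w in uniq}


def cosine(a, b):
    """Cosine similarity of two sparse vectors (an assumed comparison method)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_documents(query, documents):
    """Return one similarity score per document, in input order."""
    q_tokens = tokenize(query)
    doc_tokens = [tokenize(d) for d in documents]

    # Search-criteria side: first total number, first numbers, first uniqueness.
    q_total = len(q_tokens)
    q_counts = Counter(q_tokens)
    q_uniq = uniqueness_scores(q_counts, q_total)

    # Third numbers: occurrences of each word across the whole group.
    group_counts = Counter(t for toks in doc_tokens for t in toks)

    # Second numbers (first frequency score): occurrences in criteria plus documents.
    combined = q_counts + group_counts
    q_freq = {w: combined[w] for w in q_counts}
    q_sig = significance(q_freq, q_uniq)  # first significance magnitude factors

    # Document side: second total number and second uniqueness scores.
    group_total = sum(len(toks) for toks in doc_tokens)
    d_uniq = uniqueness_scores(group_counts, group_total)

    scores = []
    for toks in doc_tokens:
        d_freq = Counter(toks)  # fourth numbers (second frequency score)
        d_sig = significance(d_freq, d_uniq)  # second significance magnitude factors
        scores.append(cosine(q_sig, d_sig))
    return scores


if __name__ == "__main__":
    docs = [
        "Solar panel efficiency improves with cooling.",
        "The cat sat on the mat.",
    ]
    print(rank_documents("solar panel efficiency", docs))
```

Note that the sketch never consults a dictionary, thesaurus, or language model: it only counts words, which matches the claim's description of looking at the words "flatly" without attempting to understand their meaning. A document that shares no non-stop words with the search criteria scores zero under the assumed cosine comparison.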
