
Discriminating search results by phrase analysis

  • US 8,396,850 B2
  • Filed: 02/27/2009
  • Issued: 03/12/2013
  • Est. Priority Date: 02/27/2009
  • Status: Active Grant
First Claim

1. A computer-implemented method comprising:

    parsing, by a server computing device, each document of a corpus of documents to determine phrases found in each of the documents;

    analyzing, by the server computing device, each determined phrase with respect to each document to determine a frequency of occurrence of the phrase in the document relative to a frequency of occurrence of the phrase in the corpus;

    identifying, by the server computing device, documents that comprise a same statistically improbable phrase, wherein the statistically improbable phrase is one of the determined phrases having both of:

    a probability of occurrence in a document of the corpus of documents that is higher than probability of occurrence of other phrases in the document; and

    a probability of occurrence in the corpus of documents that is lower than probability of occurrence of other phrases in the corpus of documents; and

    grouping, by the server computing device, the identified documents that comprise the statistically improbable phrase into a single group of documents.
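
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that a "phrase" is a word bigram, and it stands in two fixed cut-offs for the claim's relative comparisons: the hypothetical parameters `min_doc_prob` and `max_corpus_prob`. The function names and the toy corpus are likewise invented for illustration.

```python
from collections import Counter, defaultdict
import re


def extract_phrases(text, n=2):
    """Parse text into lowercase word n-grams (treating n-grams as phrases is an assumption)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def group_by_improbable_phrase(corpus, n=2, min_doc_prob=0.15, max_corpus_prob=0.15):
    """Group document ids that share a phrase frequent in the document but rare in the corpus.

    The two thresholds are hypothetical stand-ins for the claim's
    "higher than other phrases in the document" and
    "lower than other phrases in the corpus" tests.
    """
    # Parse every document and count its phrases.
    doc_counts = {doc_id: Counter(extract_phrases(text, n))
                  for doc_id, text in corpus.items()}

    # Accumulate corpus-wide phrase counts.
    corpus_counts = Counter()
    for counts in doc_counts.values():
        corpus_counts.update(counts)
    corpus_total = sum(corpus_counts.values())

    groups = defaultdict(set)
    for doc_id, counts in doc_counts.items():
        doc_total = sum(counts.values())
        for phrase, count in counts.items():
            doc_prob = count / doc_total                         # frequency within the document
            corpus_prob = corpus_counts[phrase] / corpus_total   # frequency within the corpus
            if doc_prob >= min_doc_prob and corpus_prob <= max_corpus_prob:
                groups[phrase].add(doc_id)

    # Keep only phrases that tie at least two documents together.
    return {phrase: ids for phrase, ids in groups.items() if len(ids) > 1}


# Toy corpus (invented): "a" and "b" share a rare phrase, "c" and "d" share only common ones.
corpus = {
    "a": "quantum foam rises and quantum foam falls in the end",
    "b": "quantum foam is odd but quantum foam is real in the end",
    "c": "in the end the cat sat in the end",
    "d": "in the end the dog ran in the end",
}
groups = group_by_improbable_phrase(corpus)
```

With this toy corpus, `groups` maps "quantum foam" to documents "a" and "b". The phrases "in the" and "the end" are frequent in "c" and "d", but they are also frequent across the corpus, so they do not qualify. A production system would likely derive the cut-offs from the phrase distributions rather than fix them by hand.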
