Discriminating search results by phrase analysis
First Claim
1. A computer-implemented method comprising:
parsing, by a server computing device, each document of a corpus of documents to determine phrases found in each of the documents;
analyzing, by the server computing device, each determined phrase with respect to each document to determine a frequency of occurrence of the phrase in the document relative to a frequency of occurrence of the phrase in the corpus;
identifying, by the server computing device, documents that comprise a same statistically improbable phrase, wherein the statistically improbable phrase is one of the determined phrases having both of:
  a probability of occurrence in a document of the corpus of documents that is higher than probability of occurrence of other phrases in the document; and
  a probability of occurrence in the corpus of documents that is lower than probability of occurrence of other phrases in the corpus of documents; and
grouping, by the server computing device, the identified documents that comprise the statistically improbable phrase into a single group of documents.
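The four steps of the claim (parse, analyze, identify, group) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption for illustration and not taken from the patent: phrases are approximated as word bigrams, the two probability conditions are collapsed into a single ratio of in-document frequency to corpus frequency, the cutoff `min_ratio` is arbitrary, and the names `extract_phrases` and `group_by_improbable_phrase` are hypothetical.

```python
from collections import Counter, defaultdict


def extract_phrases(text, n=2):
    """Parsing step (assumed): treat every run of n consecutive words as a phrase."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def group_by_improbable_phrase(corpus, n=2, min_ratio=2.0):
    """Return {phrase: sorted list of document ids sharing that phrase}.

    corpus maps a document id to its text. A phrase is flagged for a document
    when its relative frequency in the document is at least min_ratio times
    its relative frequency in the whole corpus. The ratio test and the cutoff
    are assumptions; the claim states no formula or threshold.
    """
    # Analyzing step: count phrases per document and across the corpus.
    doc_counts = {doc_id: Counter(extract_phrases(text, n))
                  for doc_id, text in corpus.items()}
    corpus_counts = Counter()
    for counts in doc_counts.values():
        corpus_counts.update(counts)
    corpus_total = sum(corpus_counts.values())

    # Identifying step: flag phrases frequent in a document but rare in the corpus.
    groups = defaultdict(set)
    for doc_id, counts in doc_counts.items():
        doc_total = sum(counts.values())
        for phrase, count in counts.items():
            doc_freq = count / doc_total
            corpus_freq = corpus_counts[phrase] / corpus_total
            if doc_freq / corpus_freq >= min_ratio:
                groups[phrase].add(doc_id)

    # Grouping step: keep only phrases shared by two or more documents.
    return {p: sorted(ids) for p, ids in groups.items() if len(ids) >= 2}


if __name__ == "__main__":
    sample = {
        "d1": "zeta function proof the cat sat",
        "d2": "zeta function lemma the cat sat",
        "d3": "red apple pie the cat sat",
        "d4": "blue ocean wave the cat sat",
        "d5": "green forest path the cat sat",
    }
    print(group_by_improbable_phrase(sample))  # {'zeta function': ['d1', 'd2']}
```

In the sample, "the cat" appears in every document, so its in-document frequency equals its corpus frequency and it is never flagged. "zeta function" is concentrated in two documents, so those two are grouped together.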
1 Assignment
0 Petitions
Abstract
A statistical analysis parses documents for the phrases they contain. Each document is analyzed with a phrase analysis engine to determine key phrases that occur frequently throughout the document. One or more documents are grouped together based on a corresponding statistically improbable phrase.
53 Citations
20 Claims
1. A computer-implemented method comprising:
parsing, by a server computing device, each document of a corpus of documents to determine phrases found in each of the documents;
analyzing, by the server computing device, each determined phrase with respect to each document to determine a frequency of occurrence of the phrase in the document relative to a frequency of occurrence of the phrase in the corpus;
identifying, by the server computing device, documents that comprise a same statistically improbable phrase, wherein the statistically improbable phrase is one of the determined phrases having both of:
  a probability of occurrence in a document of the corpus of documents that is higher than probability of occurrence of other phrases in the document; and
  a probability of occurrence in the corpus of documents that is lower than probability of occurrence of other phrases in the corpus of documents; and
grouping, by the server computing device, the identified documents that comprise the statistically improbable phrase into a single group of documents.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A server comprising:
a processing device;
a memory coupled to the processing device, the memory storing a corpus of documents; and
a phrase analysis engine executable from the memory by the processing device, the phrase analysis engine comprising:
  a parser configured to parse each document of the corpus of documents to determine phrases found in each of the documents;
  an analyzer configured to:
    analyze each determined phrase with respect to each document to determine a frequency of occurrence of the phrase in the document relative to a frequency of occurrence of the phrase in the corpus; and
    identify documents that comprise a same statistically improbable phrase, wherein the statistically improbable phrase is one of the determined phrases having both of:
      a probability of occurrence in a document of the corpus of documents that is higher than probability of occurrence of other phrases in the document; and
      a probability of occurrence in the corpus of documents that is lower than probability of occurrence of other phrases in the corpus of documents; and
  a categorizer configured to group the identified documents that comprise the statistically improbable phrase into a single group of documents.
Dependent claims: 9, 10, 11, 12, 13, 14
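The division of the phrase analysis engine into a parser, an analyzer and a categorizer might be sketched as three small Python classes composed by an engine object. This is a hedged illustration of the component boundaries only. The class and method names, the use of word bigrams as phrases, and the frequency-ratio cutoff `min_ratio` are all assumptions and do not come from the patent.

```python
from collections import Counter
from dataclasses import dataclass, field


class Parser:
    """Determines the phrases found in a document (assumed here: word bigrams)."""

    def parse(self, text):
        words = text.lower().split()
        return Counter(" ".join(pair) for pair in zip(words, words[1:]))


class Analyzer:
    """Flags phrases whose in-document frequency is high relative to the corpus."""

    def __init__(self, min_ratio=2.0):
        self.min_ratio = min_ratio  # assumed cutoff; the claim gives no threshold

    def improbable_phrases(self, doc_counts, corpus_counts):
        doc_total = sum(doc_counts.values())
        corpus_total = sum(corpus_counts.values())
        return {
            phrase
            for phrase, count in doc_counts.items()
            if (count / doc_total) / (corpus_counts[phrase] / corpus_total)
            >= self.min_ratio
        }


class Categorizer:
    """Groups documents that share a flagged phrase."""

    def group(self, flagged_by_doc):
        groups = {}
        for doc_id, phrases in flagged_by_doc.items():
            for phrase in phrases:
                groups.setdefault(phrase, []).append(doc_id)
        # A group needs at least two documents sharing the same phrase.
        return {p: sorted(ids) for p, ids in groups.items() if len(ids) > 1}


@dataclass
class PhraseAnalysisEngine:
    """Wires the three components together over an in-memory corpus."""

    parser: Parser = field(default_factory=Parser)
    analyzer: Analyzer = field(default_factory=Analyzer)
    categorizer: Categorizer = field(default_factory=Categorizer)

    def run(self, corpus):
        counts = {doc_id: self.parser.parse(text) for doc_id, text in corpus.items()}
        corpus_counts = sum(counts.values(), Counter())
        flagged = {
            doc_id: self.analyzer.improbable_phrases(doc_counts, corpus_counts)
            for doc_id, doc_counts in counts.items()
        }
        return self.categorizer.group(flagged)
```

Keeping the components separate means any one of them can be swapped (for example, a parser that extracts noun phrases instead of bigrams) without touching the other two.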
15. A non-transitory computer-accessible storage medium including data that, when accessed by a computer system, cause the computer system to perform a method comprising:
parsing, by a server computing device, each document of a corpus of documents to determine phrases found in each of the documents;
analyzing, by the server computing device, each determined phrase with respect to each document to determine a frequency of occurrence of the phrase in the document relative to a frequency of occurrence of the phrase in the corpus;
identifying, by the server computing device, documents that comprise a same statistically improbable phrase, wherein the statistically improbable phrase is one of the determined phrases having both of:
  a probability of occurrence in a document of the corpus of documents that is higher than probability of occurrence of other phrases in the document; and
  a probability of occurrence in the corpus of documents that is lower than probability of occurrence of other phrases in the corpus of documents; and
grouping, by the server computing device, the identified documents that comprise the statistically improbable phrase into a single group of documents.
Dependent claims: 16, 17, 18, 19, 20
Specification