Fixed phrase detection for search
Abstract
A set of search requests may be analyzed to detect fixed phrases suitable for inclusion in a search index. Sets of candidate phrases may be identified among the search requests. Fixed phrases may be detected among the candidate phrases using statistical techniques, for example, by identifying phrases having a relatively high pointwise mutual information (PMI) with respect to component keywords. Fixed phrase detection may include keyword and/or phrase clustering. Clusters may correspond to topics defined using a latent Dirichlet allocation (LDA) procedure. Fixed phrase detection may include identifying phrases having relatively high PMI within particular clusters.
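For reference, the standard definition of pointwise mutual information for a two-keyword phrase is shown below. Estimating the probabilities from counts over N previous search requests is an assumption made here for illustration; the abstract does not fix a particular estimator, and the symbols c and N are not taken from the source.

```latex
\mathrm{PMI}(w_1, w_2)
  = \log \frac{p(w_1, w_2)}{p(w_1)\, p(w_2)}
  \approx \log \frac{c(w_1 w_2)\, N}{c(w_1)\, c(w_2)}
```

Here c(w) is the number of search requests containing keyword w, and c(w_1 w_2) is the number containing the phrase. As a worked case with made-up counts, N = 1000, c(w_1) = 50, c(w_2) = 40 and c(w_1 w_2) = 30 give a ratio of 15, or about 3.9 bits with a base-2 logarithm, meaning the two keywords co-occur far more often than independence would predict.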
25 Claims
1. A computer-implemented method facilitating a search of a collection of content, comprising:
    under control of one or more computer systems configured with executable instructions,
        associating keywords of previous searches of the collection of content with topics in accordance with a latent Dirichlet allocation;
        for each of the topics,
            identifying candidate phrases of the previous searches that contain at least one of the keywords associated with the topic;
            determining pointwise mutual information scores for the candidate phrases; and
            selecting at least some of the candidate phrases as fixed phrases based at least in part on a greatest pointwise mutual information score determined for the topic;
        determining relevance scores for each of at least some of the keywords and each of at least some of the fixed phrases with respect to the collection of content; and
        providing at least one result of the search for presentation, said at least one result at least referencing content selected from the collection of content based at least in part on the determined relevance scores.
    Dependent claims: 2, 3, 4, 5, 6
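The per-topic selection recited in claim 1 can be sketched as follows. This is a minimal illustration under several assumptions, not the patented implementation: the function name `fixed_phrases_by_topic` is hypothetical, the topic-to-keyword mapping is assumed to come from an LDA run that is not shown, candidate phrases are restricted to adjacent two-word phrases, PMI is estimated from counts over all queries, and only the single highest-scoring phrase per topic is kept.

```python
import math
from collections import Counter


def fixed_phrases_by_topic(queries, topic_keywords):
    """For each topic, return the candidate two-word phrase with the greatest PMI.

    queries: list of previous search strings.
    topic_keywords: dict mapping topic id -> set of keywords. Assumed to be
        the output of an LDA step that is outside this sketch.
    """
    tokenized = [q.lower().split() for q in queries]
    n = len(tokenized)
    # Number of queries containing each keyword / each adjacent keyword pair.
    word_df = Counter(w for toks in tokenized for w in set(toks))
    pair_df = Counter(p for toks in tokenized for p in set(zip(toks, toks[1:])))

    def pmi(pair):
        a, b = pair
        return math.log2(pair_df[pair] * n / (word_df[a] * word_df[b]))

    selected = {}
    for topic, keywords in topic_keywords.items():
        # Candidate phrases contain at least one keyword of the topic.
        candidates = sorted(p for p in pair_df if keywords & set(p))
        if candidates:
            # sorted() above makes tie-breaking deterministic.
            selected[topic] = " ".join(max(candidates, key=pmi))
    return selected
```

With a handful of made-up queries such as "harry potter books", "harry potter" and "cheap books", a topic whose keywords include "harry" and "potter" would select "harry potter", because that pair co-occurs more often than its individual keyword counts would suggest.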
7. A computer-implemented method facilitating a search, comprising:
    under control of one or more computer systems configured with executable instructions,
        clustering search terms extracted from previous searches into a plurality of search term clusters independent of human supervision;
        identifying at least one search phrase comprising at least two search terms from a distinguished cluster of the plurality of search term clusters as having a relatively high pointwise mutual information score with respect to search phrases of the distinguished cluster;
        determining at least one relevance score for said at least one search phrase with respect to a collection of content; and
        providing at least one result of the search for presentation, said at least one result at least referencing content selected from the collection of content based at least in part on said at least one relevance score.
    Dependent claims: 8, 9, 10, 11, 12, 13, 14
15. A computer-implemented method facilitating a search, comprising:
    under control of one or more computer systems configured with executable instructions,
        determining pointwise mutual information scores for search phrases of previous searches;
        selecting at least one of the search phrases for which the pointwise mutual information score is greater than a threshold;
        determining at least one relevance score for said at least one search phrase with respect to a collection of content; and
        providing at least one result of the search for presentation, said at least one result at least referencing content selected from the collection of content based at least in part on said at least one relevance score.
    Dependent claims: 16, 17, 18, 19
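The threshold-based selection of claim 15 differs from a per-topic maximum in that every phrase scoring above a fixed cutoff is kept, with no clustering step. A minimal sketch, assuming two-word phrases, a base-2 logarithm and count-based probability estimates; the function name `phrases_above_threshold` and the threshold value in the usage note are hypothetical.

```python
import math
from collections import Counter


def phrases_above_threshold(queries, threshold):
    """Return {phrase: PMI} for two-word phrases whose PMI exceeds threshold."""
    tokenized = [q.lower().split() for q in queries]
    n = len(tokenized)
    # Number of queries containing each keyword / each adjacent keyword pair.
    word_df = Counter(w for toks in tokenized for w in set(toks))
    pair_df = Counter(p for toks in tokenized for p in set(zip(toks, toks[1:])))
    scores = {
        " ".join(pair): math.log2(count * n / (word_df[pair[0]] * word_df[pair[1]]))
        for pair, count in pair_df.items()
    }
    # Keep only phrases strictly above the cutoff.
    return {phrase: score for phrase, score in scores.items() if score > threshold}
```

Calling `phrases_above_threshold(queries, 1.0)` keeps only phrases whose keywords co-occur more than twice as often as independence would predict. A suitable cutoff depends on the query log and would need tuning.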
20. A computerized system facilitating a search, comprising:
    a fixed phrase detector configured to, at least:
        cluster keywords of previous searches into a plurality of keyword clusters; and
        detect at least one fixed phrase at least in part by identifying at least one search phrase comprising keywords of a distinguished cluster of the keyword clusters as having a relatively high pointwise mutual information score with respect to counts of search phrases comprising keywords of the distinguished cluster;
    an index maintenance module configured at least to update a search index comprising a relevance score of each of a plurality of keywords and said at least one fixed phrase for at least some of a collection of content;
    a search module configured at least to provide at least one result of the search for presentation based at least in part on the search index; and
    one or more hardware processors collectively facilitating at least the fixed phrase detector, the index maintenance module and the search module.
    Dependent claims: 21, 22
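The index maintenance and search modules of claim 20 can be illustrated with a toy index that stores a relevance score for each keyword and each fixed phrase per content item. Everything here is an assumption for illustration: the names `count_term`, `build_index` and `search` are hypothetical, relevance is a raw occurrence count, and results are ranked by summed scores. The claim does not specify any of these choices.

```python
from collections import defaultdict


def count_term(tokens, term):
    """Count occurrences of a keyword or multi-word phrase in a token list."""
    parts = term.split()
    k = len(parts)
    return sum(tokens[i:i + k] == parts for i in range(len(tokens) - k + 1))


def build_index(documents, keywords, fixed_phrases):
    """Map each keyword or fixed phrase to {doc_id: relevance score}.

    Relevance is a plain occurrence count, a placeholder assumption.
    """
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        tokens = text.lower().split()
        for term in list(keywords) + list(fixed_phrases):
            count = count_term(tokens, term)
            if count:
                index[term][doc_id] = count
    return index


def search(index, query, fixed_phrases):
    """Rank documents by summed relevance of the query's phrases and keywords."""
    tokens = query.lower().split()
    # Fixed phrases found in the query are looked up in addition to keywords.
    terms = [p for p in fixed_phrases if count_term(tokens, p)] + tokens
    totals = defaultdict(int)
    for term in terms:
        for doc_id, score in index.get(term, {}).items():
            totals[doc_id] += score
    # Highest total first; ties broken by document id.
    return sorted(totals, key=lambda d: (-totals[d], d))
```

Indexing the fixed phrase as its own entry lets a document containing "new york" as a unit outrank one that merely contains "york" and "new" somewhere, which keyword-only scoring cannot distinguish.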
23. One or more non-transitory computer-readable media having collectively thereon computer-executable instructions that configure one or more computers to collectively, at least:
    cluster keywords of previous searches into a plurality of keyword clusters independent of human supervision;
    identify at least one search phrase comprising keywords of a distinguished cluster of the keyword clusters as having a relatively high pointwise mutual information score with respect to counts of search phrases comprising keywords of the distinguished cluster;
    determine at least one relevance score for said at least one search phrase with respect to a collection of content; and
    provide at least one search result for presentation, said at least one result at least referencing content selected from the collection of content based at least in part on said at least one relevance score.
    Dependent claims: 24, 25
Specification