Substitute term identification based on over-represented terms identification
First Claim
1. A computer-implemented method comprising:
receiving an original query that includes one or more query terms;
determining, by one or more computers, not to apply a weak query term substitution rule to the original query, wherein the weak query term substitution rule identifies a particular term as a substitute for one or more of the query terms;
after determining not to apply the weak query term substitution rule to the original query, obtaining an initial set of search results from a text corpus of indexed resources;
determining, using the particular term's frequency-inverse document frequency (tf-idf) weight and by one or more computers, that the particular term occurs in text associated with a subset of the initial set of search results at a higher rate than the particular term occurs in the text corpus as a whole;
in response to determining that the particular term occurs in text associated with the subset of the initial set of search results at the higher rate than the particular term occurs in the text corpus as a whole, applying the weak query term substitution rule to the original query, to revise the original query to include the particular term; and
obtaining a subsequent set of search results in response to the revised query.
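The steps of claim 1 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the toy `CORPUS`, the `search` ranking function, the `top_k` subset size, and the `min_weight` threshold are all hypothetical. The claim says the tf-idf weight is used in the determination but does not say how, so the sketch simply requires both a higher occurrence rate in the result subset and a tf-idf weight above a threshold.

```python
import math

# Hypothetical toy corpus standing in for the indexed resources.
CORPUS = [
    "cheap car rental deals in boston",
    "boston car rental and automobile hire",
    "automobile rental rates near boston airport",
    "history of the boston tea party",
    "recipes for clam chowder",
    "guide to hiking trails",
]


def tokenize(text):
    return text.lower().split()


def search(query_terms, corpus):
    """Toy retrieval: rank documents by how many query terms they contain."""
    scored = [(sum(t in tokenize(d) for t in query_terms), d) for d in corpus]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [d for score, d in ranked if score > 0]


def occurrence_rate(term, docs):
    """Fraction of all tokens in docs that equal term."""
    tokens = [tok for d in docs for tok in tokenize(d)]
    return tokens.count(term) / len(tokens) if tokens else 0.0


def tf_idf_weight(term, subset, corpus):
    """Term frequency over the result subset times idf over the whole corpus."""
    df = sum(term in tokenize(d) for d in corpus)
    idf = math.log(len(corpus) / df) if df else 0.0
    return occurrence_rate(term, subset) * idf


def revise_query(query_terms, weak_rule, corpus, top_k=3, min_weight=0.05):
    """weak_rule is a (query_term, substitute) pair that is not applied up front."""
    query_term, substitute = weak_rule

    # The weak rule is withheld; initial results use the original query only.
    initial = search(query_terms, corpus)
    subset = initial[:top_k]
    if query_term not in query_terms or not subset:
        return query_terms, initial

    # Over-representation test: rate in the subset vs. rate in the whole corpus.
    over_represented = (
        occurrence_rate(substitute, subset) > occurrence_rate(substitute, corpus)
    )
    weight = tf_idf_weight(substitute, subset, corpus)

    if over_represented and weight >= min_weight:
        # Apply the weak rule: the revised query includes the substitute term.
        revised = query_terms + [substitute]
        return revised, search(revised, corpus)
    return query_terms, initial


revised, results = revise_query(["car", "rental"], ("car", "automobile"), CORPUS)
print(revised)
print(results)
```

In this toy data, "automobile" appears more densely in the top results for "car rental" than in the corpus overall, so the withheld rule is applied after the first retrieval. A rule that proposes a term absent from the top results leaves the query unchanged.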
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for identifying substitute terms. According to one implementation, a method includes receiving an original query that includes one or more query terms; obtaining initial search results in response to the original query; identifying an over-represented term in text associated with a subset of the initial search results; determining that the over-represented term is associated with a particular query term; and in response to determining the over-represented term is associated with the particular query term, revising the original query to include the over-represented term.
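The flow in the abstract runs in the other direction from a fixed rule: over-represented terms are first discovered in the result text and then matched to a query term. The sketch below is one possible reading, not the patented implementation. The `min_ratio` threshold, the `associations` lookup table, and the sample documents are hypothetical, and the abstract does not say how the association between a term and a query term is determined.

```python
from collections import Counter


def term_rates(docs):
    """Map each term to its share of all tokens in docs."""
    counts = Counter(tok for d in docs for tok in d.lower().split())
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def over_represented_terms(result_subset, corpus, min_ratio=1.5):
    """Terms whose rate in the subset is at least min_ratio times the corpus rate."""
    subset_rates = term_rates(result_subset)
    corpus_rates = term_rates(corpus)
    return {
        term
        for term, rate in subset_rates.items()
        if term in corpus_rates and rate / corpus_rates[term] >= min_ratio
    }


def revise_with_associated_terms(query_terms, result_subset, corpus, associations):
    """Add over-represented terms that the (hypothetical) table ties to a query term."""
    over = over_represented_terms(result_subset, corpus)
    revised = list(query_terms)
    for query_term in query_terms:
        for candidate in associations.get(query_term, []):
            if candidate in over and candidate not in revised:
                revised.append(candidate)
    return revised


# Hypothetical data: the first three documents play the role of the result subset.
corpus = [
    "best pizza in ny and nyc boroughs",
    "nyc pizza by the slice",
    "ny style pizza dough recipe",
    "weather forecast for chicago",
    "chicago deep dish pizza",
    "train schedule to boston",
]
subset = corpus[:3]
associations = {"ny": ["nyc", "newyork"]}

print(revise_with_associated_terms(["ny", "pizza"], subset, corpus, associations))
```

A ratio test is used here instead of a tf-idf weight only to keep the example short; any measure that compares the subset against the corpus as a whole would fit the same structure.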
36 Claims
13. A non-transitory computer-readable medium storing software having stored thereon instructions, which, when executed by one or more computers, cause the one or more computers to perform operations of:
receiving an original query that includes one or more query terms;
determining not to apply a weak query term substitution rule to the original query, wherein the weak query term substitution rule identifies a particular term as a substitute for one or more of the query terms;
after determining not to apply the weak query term substitution rule to the original query, obtaining an initial set of search results from a text corpus of indexed resources;
determining, using the particular term's frequency-inverse document frequency (tf-idf) weight, that the particular term occurs in text associated with a subset of the initial set of search results at a higher rate than the particular term occurs in the text corpus as a whole;
in response to determining that the particular term occurs in text associated with the subset of the initial set of search results at the higher rate than the particular term occurs in the text corpus as a whole, applying the weak query term substitution rule to the original query, to revise the original query to include the particular term; and
obtaining a subsequent set of search results in response to the revised query.
25. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving an original query that includes one or more query terms;
determining not to apply a weak query term substitution rule to the original query, wherein the weak query term substitution rule identifies a particular term as a substitute for one or more of the query terms;
after determining not to apply the weak query term substitution rule to the original query, obtaining an initial set of search results from a text corpus of indexed resources;
determining, using the particular term's frequency-inverse document frequency (tf-idf) weight, that the particular term occurs in text associated with a subset of the initial set of search results at a higher rate than the particular term occurs in the text corpus as a whole;
in response to determining that the particular term occurs in text associated with the subset of the initial set of search results at the higher rate than the particular term occurs in the text corpus as a whole, applying the weak query term substitution rule to the original query, to revise the original query to include the particular term; and
obtaining a subsequent set of search results in response to the revised query.
Specification