Identification of semantic units from within a search query
First Claim
1. A method performed by a server device, the method comprising:
receiving, by the server device, information identifying documents selected in response to a search query;
generating, by a processor of the server device, a plurality of substrings from the received search query, each of the plurality of substrings including at least two words;
detecting, by the processor of the server device, an actual occurrence of one or more substrings, of the plurality of substrings, in one or more of the documents;
calculating, by the processor of the server device, for each of the one or more substrings generated from the search query, a value that indicates a fraction of the documents in which an actual occurrence of the substring is detected;
determining, by the processor of the server device, based on the calculated value for a particular substring of the one or more substrings, that the particular substring consists of words that form a single compound unit, the determining including:
determining that the calculated value exceeds a threshold associated with compound units of words; and
identifying, by the processor of the server device, a set of relevant documents based on determining that the particular substring forms the single compound unit.
Abstract
A search engine for searching a corpus improves the relevancy of the results by classifying multiple terms in a search query as a single semantic unit. A semantic unit locator of the search engine generates a subset of documents that are generally relevant to the query based on the individual terms within the query. Combinations of search terms that define potential semantic units from the query are then evaluated against the subset of documents to determine which combinations of search terms should be classified as a semantic unit. The resultant semantic units are used to refine the results of the search.
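The evaluation described in the abstract and in claim 1 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the whitespace tokenization, the case folding, and the threshold of 0.5 are all assumptions made for the example; the source only states that a fraction of documents is compared against a threshold.

```python
from typing import Dict, List


def multiword_substrings(query: str) -> List[str]:
    """Return every contiguous run of two or more words in the query."""
    words = query.lower().split()  # assumption: whitespace tokenization
    return [
        " ".join(words[i:j])
        for i in range(len(words))
        for j in range(i + 2, len(words) + 1)
    ]


def contains_phrase(doc_words: List[str], phrase_words: List[str]) -> bool:
    """True if phrase_words occurs as a contiguous word sequence in doc_words."""
    n = len(phrase_words)
    return any(
        doc_words[k:k + n] == phrase_words
        for k in range(len(doc_words) - n + 1)
    )


def occurrence_fractions(query: str, documents: List[str]) -> Dict[str, float]:
    """Map each multiword substring to the fraction of documents containing it."""
    if not documents:
        return {}
    tokenized = [doc.lower().split() for doc in documents]
    fractions = {}
    for substring in multiword_substrings(query):
        phrase_words = substring.split()
        hits = sum(1 for doc in tokenized if contains_phrase(doc, phrase_words))
        fractions[substring] = hits / len(documents)
    return fractions


def semantic_units(query: str, documents: List[str], threshold: float = 0.5) -> List[str]:
    """Keep substrings whose fraction exceeds the (assumed) threshold."""
    fractions = occurrence_fractions(query, documents)
    return [s for s, f in fractions.items() if f > threshold]


# Hypothetical documents already selected for the query.
docs = [
    "cheap hotels in new york city",
    "new york travel guide and hotels",
    "hotels near york for the new year",
    "best new york pizza",
]
print(semantic_units("new york hotels", docs))  # ['new york']
```

Here "new york" occurs as a contiguous phrase in three of the four documents (0.75), while "york hotels" and "new york hotels" occur in none, so only "new york" is treated as a compound unit.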
12 Claims
4. A system comprising:
a server connected to a network, the server to receive a search query from a user device via the network, the server including:
at least one processor; and
a memory operatively coupled to the at least one processor, the memory storing program instructions that when executed by the at least one processor, cause the at least one processor to:
identify documents relating to the received search query by comparing search terms in the received search query to an index of a corpus of documents;
generate a plurality of substrings from the received search query, each substring, of the plurality of substrings, including at least two words of the received search query;
detect an actual occurrence of a particular substring, of the plurality of substrings, in the identified documents;
calculate a value that indicates a fraction of the identified documents in which an actual occurrence of the particular substring was detected;
determine, based on the calculated value, that the particular substring consists of words that, when taken together, form a single compound unit, the at least one processor, when determining that the particular substring consists of words that, when taken together, form a single compound unit, is to:
determine that the calculated value exceeds a threshold value associated with substring consisting of words that, when taken together, form a single compound unit; and
identify a set of relevant documents based on determining that the particular substring consists of words that form the single compound unit.
9. A computer-readable medium, comprising:
a plurality of computer-executable instructions, which, when executed by one or more processors, cause the one or more processors to:
receive a search query;
receive information identifying documents selected in response to the search query;
generate a plurality of substrings from the search query, each of the substrings including at least two words;
detect, for one or more substrings, of the plurality of substrings generated from the search query, one or more actual occurrences of the one or more substrings in the identified documents;
calculate, for each of the one or more substrings generated from the search query, a value that indicates a fraction of the documents in which an actual occurrence of the each substring is detected;
determine, based on the calculated values, that at least one of the one or more substrings consists of words that, when taken together, form a single compound unit, the computer-executable instructions that cause the one or more processors to determine which of the one or more substrings consist of words that, when taken together, form a single compound unit, including computer-executable instructions that cause the one or more processors to:
determine which calculated values, of the calculated values, exceed a threshold value associated with substrings consisting of words that, when taken together, form respective single compound units; and
identify a set of relevant documents based on determining that the at least one of the one or more substrings form the single compound unit.
11. A server device, comprising:
a memory device storing computer-executable instructions; and
one or more processors to execute the computer-executable instructions, to:
generate a list of relevant documents based on individual search terms of a search query;
identify a subset of documents that are most relevant documents, to the search query, from the list of relevant documents;
form a plurality of multiword substrings of the search query, each of the plurality of multiword substrings including at least two words;
detect, for each of the plurality of multiword substrings, actual occurrences of the multiword substring in the subset of documents;
calculate, for each of the plurality of multiword substrings generated from the search query, a value that indicates a fraction of the relevant documents in which an actual occurrence of the each substring is detected;
identify, based on the detected actual occurrences of each of the plurality of multiword substrings, that words of at least one of the plurality of multiword substrings, when taken together, form a single compound unit, when identifying that words of at least one of the multiword substrings, when taken together, form a single compound unit, the one or more processors are to:
determine that the calculated value associated with the at least one of the plurality of multiword substrings exceeds a threshold value associated with compound units; and
identify a set of documents, the set of documents being identified based on the identified at least one of the plurality of multiword substrings.
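The first two steps of claim 11 (listing documents that match individual search terms, then narrowing to the most relevant subset) might be sketched as below. The claim does not say how relevance is scored; counting distinct query terms per document, the whitespace tokenization, and the parameter `k` are placeholder assumptions chosen only to make the example concrete.

```python
from typing import List


def relevant_documents(query: str, corpus: List[str]) -> List[str]:
    """Rank documents by how many distinct query terms they contain.

    Assumed scoring: one point per distinct query term present.
    Documents matching no term are dropped; ties keep corpus order.
    """
    terms = set(query.lower().split())
    scored = []
    for doc in corpus:
        score = len(terms & set(doc.lower().split()))
        if score > 0:
            scored.append((score, doc))
    # sorted() is stable, so equally scored documents stay in corpus order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked]


def most_relevant_subset(query: str, corpus: List[str], k: int) -> List[str]:
    """Return the top-k ranked documents (k is a hypothetical cutoff)."""
    return relevant_documents(query, corpus)[:k]


# Hypothetical corpus for illustration.
corpus = [
    "new york hotels and new york restaurants",
    "york minster visitor guide",
    "hotels in new york",
    "new cars for sale",
    "gardening tips",
]
subset = most_relevant_subset("new york hotels", corpus, k=2)
print(subset)
```

The multiword substrings of the query would then be checked for occurrences in `subset` only, rather than in the whole corpus, which keeps the fraction calculation focused on documents that are already likely to be relevant.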
Specification