Identification of semantic units from within a search query
Abstract
A search engine for searching a corpus improves the relevancy of the results by classifying multiple terms in a search query as a single semantic unit. A semantic unit locator of the search engine generates a subset of documents that are generally relevant to the query based on the individual terms within the query. Combinations of search terms that define potential semantic units from the query are then evaluated against the subset of documents to determine which combinations of search terms should be classified as a semantic unit. The resultant semantic units are used to refine the results of the search.
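The "combinations of search terms that define potential semantic units" can be illustrated with a minimal sketch. It assumes that a substring means a contiguous run of query words and that words are separated by whitespace; the function name `multiword_substrings` and the `min_words` parameter are hypothetical and do not come from the patent.

```python
def multiword_substrings(query: str, min_words: int = 2) -> list[str]:
    """Return every contiguous run of at least `min_words` words in `query`.

    Assumption: "substring" is read as a contiguous word sequence.
    """
    words = query.split()
    candidates = []
    for start in range(len(words)):
        # `end` is exclusive, so the shortest run starts at start + min_words.
        for end in range(start + min_words, len(words) + 1):
            candidates.append(" ".join(words[start:end]))
    return candidates


print(multiword_substrings("leather hand bag"))
# ['leather hand', 'leather hand bag', 'hand bag']
```

A query of n words yields n(n-1)/2 such candidates, so the candidate set stays small for typical query lengths.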
30 Claims
1. A method of identifying semantic units within a search query comprising:
identifying documents relating to the query by comparing search terms in the query to an index of a corpus;
generating a plurality of multiword substrings from the query in which each of the substrings includes at least two words;
calculating, for each of the generated substrings, a value that corresponds to a comparison between one or more of the identified documents and the generated substring;
selecting semantic units from the generated multiword substrings based on the calculated values; and
storing the selected semantic units in a computer-readable memory,
wherein the identification of the documents includes generating an initial list of relevant documents and selecting a predetermined number of most relevant ones of the documents in the initial list as the identified documents.
Dependent claims: 2, 3, 4
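The document-identification step of claim 1 (an initial list of relevant documents, cut down to a predetermined number of the most relevant ones) might be sketched as follows. The inverted index is modeled as a plain dictionary from term to document ids, and relevance is approximated by the number of matching query terms; both are assumptions made for illustration, as is the name `identify_documents`. The claim does not specify a ranking function.

```python
from collections import Counter


def identify_documents(query: str, index: dict[str, set[str]], k: int) -> list[str]:
    """Return the ids of the k documents matching the most query terms.

    `index` is a hypothetical inverted index: term -> ids of documents containing it.
    """
    hits = Counter()
    for term in query.lower().split():
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    # Initial list: every document matching at least one term, best first.
    # Ties are broken by document id so the result is deterministic.
    initial_list = sorted(hits, key=lambda doc_id: (-hits[doc_id], doc_id))
    # Keep only the predetermined number (k) of most relevant documents.
    return initial_list[:k]


index = {
    "new": {"d1", "d2", "d3"},
    "york": {"d1", "d2"},
    "hotels": {"d1", "d4"},
}
print(identify_documents("new york hotels", index, k=2))
# ['d1', 'd2']
```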
5. A method of identifying semantic units within a search query comprising:
identifying documents relating to the query by comparing search terms in the query to an index of a corpus;
generating a plurality of multiword substrings from the query in which each of the substrings includes at least two words;
calculating, for each of the generated substrings, a value that corresponds to a comparison between one or more of the identified documents and the generated substring;
selecting semantic units from the generated multiword substrings based on the calculated values; and
storing the selected semantic units in a computer-readable memory,
wherein the selection of the semantic units further includes selecting semantic units from the generated substrings that have calculated values above a predetermined threshold and discarding the generated substrings that overlap other ones of the generated substrings with higher calculated values.
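The selection rule in claim 5 (keep substrings whose value exceeds a threshold, discard those that overlap a higher-valued substring) can be read as a greedy pass over the candidates in descending order of value. The sketch below follows that reading, which is one interpretation rather than the patent's stated procedure. The `Candidate` type, its word-index fields, and the name `select_units` are hypothetical, and the scores in the usage example are made up.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    start: int    # index of the substring's first word in the query
    end: int      # index one past its last word
    score: float  # the calculated value for this substring


def select_units(candidates: list[Candidate], threshold: float) -> list[Candidate]:
    """Greedily keep above-threshold candidates that overlap no better one."""
    kept: list[Candidate] = []
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if cand.score <= threshold:
            break  # everything after this point is at or below the threshold
        # Two word spans are disjoint when one ends before the other starts.
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda c: c.start)


# Query words: new(0) york(1) city(2) hotel(3) rates(4); scores are illustrative.
candidates = [
    Candidate(0, 2, 0.9),  # "new york"
    Candidate(0, 3, 0.6),  # "new york city"
    Candidate(1, 3, 0.5),  # "york city"
    Candidate(3, 5, 0.4),  # "hotel rates"
    Candidate(2, 4, 0.1),  # "city hotel"
]
print(select_units(candidates, threshold=0.3))
```

With these made-up scores, "new york" and "hotel rates" survive; "new york city" and "york city" are discarded because they overlap the higher-valued "new york", and "city hotel" falls below the threshold.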
6. A method of locating documents in response to a search query, the method comprising:
receiving the search query from a user;
generating a list of relevant documents based on search terms of the query;
identifying a subset of documents that are most relevant ones of the documents in the list of relevant documents;
generating a plurality of multiword substrings of the query in which each of the multiword substrings includes at least two words;
calculating, for each of the generated substrings, a value related to one or more documents in the subset of documents that contain the substring;
selecting semantic units from the generated multiword substrings based on the calculated values, the selecting including selecting semantic units from the generated substrings that have calculated values above a predetermined threshold and discarding the generated substrings that overlap other ones of the generated substrings with higher calculated values;
refining the generated list of relevant documents based on the selected semantic units; and
transmitting the refined list of relevant documents to the user.
Dependent claims: 7, 8, 9
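Claim 6 does not say how the list of relevant documents is refined. One plausible reading, sketched here purely as an assumption, is to move documents that contain a selected semantic unit as an exact phrase ahead of those that do not, while otherwise preserving the original ranking. The function name `refine` and the `(doc_id, text)` representation are hypothetical, and the phrase match ignores punctuation handling for brevity.

```python
def refine(ranked_docs: list[tuple[str, str]], units: list[str]) -> list[str]:
    """Reorder (doc_id, text) pairs so documents with more phrase matches come first."""

    def padded(text: str) -> str:
        # Lowercase, collapse whitespace, and pad so matches align to word boundaries.
        return f" {' '.join(text.lower().split())} "

    def phrase_hits(text: str) -> int:
        haystack = padded(text)
        return sum(padded(unit) in haystack for unit in units)

    # sorted() is stable, so documents with equal hit counts keep their original order.
    reordered = sorted(ranked_docs, key=lambda doc: -phrase_hits(doc[1]))
    return [doc_id for doc_id, _ in reordered]


docs = [
    ("d1", "York is a new destination"),
    ("d2", "Hotels in New York"),
    ("d3", "new york pizza"),
]
print(refine(docs, ["new york"]))
# ['d2', 'd3', 'd1']
```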
10. A computer-readable medium storing instructions for causing at least one processor to perform a method that identifies semantic units within a search query, the method comprising:
identifying documents relating to the query by matching individual search terms in the query to an index of a corpus, the identification of the documents further including generating an initial list of relevant documents and selecting a predetermined number of the most relevant documents in the initial list to include in the identified documents;
forming a plurality of multiword substrings of the query in which each of the substrings includes at least two words;
calculating, for each of the substrings, a value relating to the portion of the identified documents that contain the substring;
selecting semantic units from the generated multiword substrings based on the calculated values; and
storing the selected semantic units in a memory.
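The value in claim 10, "relating to the portion of the identified documents that contain the substring", could be as simple as the fraction of identified documents in which the substring appears as an exact phrase. The sketch below assumes that definition and whitespace tokenization; the name `containing_fraction` is hypothetical, and the claim leaves room for other measures.

```python
def containing_fraction(substring: str, documents: list[str]) -> float:
    """Fraction of `documents` that contain `substring` as a whole-word phrase."""

    def padded(text: str) -> str:
        # Lowercase, collapse whitespace, and pad so matches align to word boundaries.
        return f" {' '.join(text.lower().split())} "

    if not documents:
        return 0.0
    needle = padded(substring)
    hits = sum(needle in padded(doc) for doc in documents)
    return hits / len(documents)


top_docs = [
    "Cheap hotels in New York",
    "New York travel guide",
    "York has a new museum",
    "Paris hotels",
]
print(containing_fraction("new york", top_docs))     # 0.5
print(containing_fraction("york travel", top_docs))  # 0.25
```

A substring that appears intact in a large share of the top-ranked documents is a stronger candidate for a semantic unit than one whose words merely co-occur.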
11. A computer-readable medium storing instructions for causing at least one processor to perform a method that identifies semantic units within a search query, the method comprising:
identifying documents relating to the query by matching individual search terms in the query to an index of a corpus;
forming a plurality of multiword substrings of the query in which each of the substrings includes at least two words;
calculating, for each of the substrings, a value relating to the portion of the identified documents that contain the substring;
selecting semantic units from the generated multiword substrings based on the calculated values; and
storing the selected semantic units in a memory,
wherein the selection of the semantic units further includes selecting semantic units from the generated substrings that have calculated values above a predetermined threshold and discarding substrings that overlap other substrings with a higher calculated value.
Dependent claims: 12, 13
14. A computer-readable medium storing instructions for causing a processor to perform a method, the method comprising:
receiving a search query from a user;
generating a list of relevant documents based on individual search terms of the query;
identifying a subset of documents that are the most relevant documents from the list of relevant documents;
forming a plurality of multiword substrings of the query in which each of the multiword substrings includes at least two words;
calculating, for each of the substrings, a value related to the portion of the subset of documents that contain the substring;
selecting semantic units from the generated multiword substrings based on the calculated values;
refining the generated list of relevant documents based on the selected semantic units; and
transmitting the refined list of relevant documents to the user,
wherein the selection of the semantic units further includes selecting semantic units from the generated substrings that have calculated values above a predetermined threshold and discarding substrings that overlap other substrings with a higher calculated value.
Dependent claims: 15, 16, 17, 18
19. A system comprising:
a server connected to a network, the server receiving search queries from users via the network, the server including:
at least one processor; and
a memory operatively coupled to the processor, the memory storing program instructions that when executed by the processor, cause the processor to:
identify a list of documents relating to the search query by matching individual search terms in the query to an index of a corpus;
generate a plurality of multiword substrings from the query in which each of the substrings includes at least two words;
calculate, for each of the generated substrings, a value relating to one or more documents of the identified list of documents that contain the generated substring; and
select semantic units from the generated multiword substrings as semantic units that have calculated values above a predetermined threshold and in which semantic units that overlap other substrings with a higher calculated value are discarded, the selected semantic units being stored in the memory.
Dependent claims: 20, 21, 22, 23, 24
25. A server comprising:
a processor; and
a memory operatively coupled to the processor, the memory including:
a ranking component configured to return a list of documents ordered by relevance in response to a search query; and
a semantic unit locator component configured to locate semantic units, each having a plurality of words, in search queries entered by a user based on a predetermined number of most relevant documents in the list of documents returned by the ranking component, the located semantic units being stored in the memory,
wherein the semantic unit locator is further configured to generate a plurality of substrings of the query, calculate, for each generated substring, a value relating to the portion of the predetermined number of the most relevant documents that contain the substring, and locate the semantic units from the generated values.
Dependent claims: 26, 27, 28, 29, 30
Specification