Search lexicon expansion
2 Assignments
0 Petitions
Abstract
One or more techniques and/or systems are disclosed for creating an expanded or improved lexicon for use in search-based semantic tagging. A set of first documents can be identified using a set of first lexicon elements as queries, and one or more first document patterns can be extracted from the set of first documents. The document patterns can be used to find, in a query log, one or more second documents that comprise the document patterns; the query log associates these second documents with the query terms used to return them. Those query terms can be extracted and used to expand the lexicon. Elements within the lexicon may, for example, be weighted based upon relevance to different query domains.
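The abstract mentions weighting lexicon elements by relevance to query domains but does not fix a formula. One possible reading is sketched below in Python, under assumptions that do not come from the patent: the query log is a list of (query, clicked URL) pairs, each clicked document already carries a domain label (the hypothetical `doc_domain` mapping), and a term's weight for a domain is the fraction of that term's clicks that land in the domain.

```python
from collections import Counter, defaultdict


def domain_weights(query_log, doc_domain, lexicon):
    """Weight each lexicon term by where its clicks land.

    Hypothetical inputs (not from the patent):
      query_log  -- iterable of (query, clicked_url) pairs
      doc_domain -- dict mapping clicked_url -> query-domain label
      lexicon    -- set of lexicon terms; a query counts only if it equals a term
    Returns {term: {domain: weight}}, with each term's weights summing to 1.
    """
    counts = defaultdict(Counter)
    for query, url in query_log:
        if query in lexicon and url in doc_domain:
            counts[query][doc_domain[url]] += 1

    weights = {}
    for term, per_domain in counts.items():
        total = sum(per_domain.values())
        weights[term] = {domain: n / total for domain, n in per_domain.items()}
    return weights
```

With two clicks from "jaws" on a document labeled movie and one click on a document labeled animal, "jaws" gets weight 2/3 for movie and 1/3 for animal. Normalizing per term keeps the weights comparable across terms with very different query volumes.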
44 Citations
21 Claims
1. A method performed by at least one processing unit, the method comprising:
identifying a seed lexicon term from a plurality of lexicon terms that are identified by a lexicon as members of a semantic class;
identifying result documents retrieved from a search engine by queries submitted to the search engine that include the seed lexicon term;
extracting a keyword that is shared by the result documents retrieved from the search engine by the queries that include the seed lexicon term; and
using the keyword shared by the result documents to expand the lexicon by:
identifying, in a query log, specific other result documents that also include the keyword shared by the result documents, wherein the query log does not indicate that the specific other result documents were retrieved from the search engine using the lexicon terms identified by the lexicon as members of the semantic class;
identifying another member of the semantic class in a particular query that has been submitted to the search engine to retrieve the specific other result documents; and
adding to the lexicon, as a new lexicon term, the another member of the semantic class.
Dependent claims: 2, 3, 4, 5, 6, 7.
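Claim 1 leaves open how the shared keyword is found and how the new member is isolated inside the retrieving query. The Python sketch below is one illustration, not the patented implementation; the data formats and the names `STOPWORDS`, `contains_term`, and `expand_with_keyword` are hypothetical. It treats a keyword as any non-stopword token present in every seed result document, and it assumes the new member is the retrieving query minus the context words that accompanied the seed (for example, "showtimes").

```python
# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "by", "for", "of", "the", "to"}


def contains_term(query, term):
    """True if `term` appears in `query` as whole words."""
    return f" {term} " in f" {query} "


def expand_with_keyword(lexicon, seed, query_log, docs):
    """Sketch of claim 1 under assumed data formats.

    lexicon   -- set of known members of the semantic class
    seed      -- one lexicon term chosen as the seed
    query_log -- list of (query, doc_id) pairs
    docs      -- dict mapping doc_id -> document text
    Returns a new set: the lexicon plus any candidate members found.
    """
    seed_queries = [q for q, _ in query_log if contains_term(q, seed)]
    seed_docs = {d for q, d in query_log if contains_term(q, seed)}
    if not seed_docs:
        return set(lexicon)

    # Keywords: tokens shared by every seed result document,
    # excluding stopwords and the seed term's own tokens.
    token_sets = [set(docs[d].split()) for d in seed_docs]
    keywords = set.intersection(*token_sets) - STOPWORDS - set(seed.split())

    # Documents the log already ties to some lexicon term.
    lexicon_docs = {d for q, d in query_log
                    if any(contains_term(q, t) for t in lexicon)}

    # Words that accompany the seed in its own queries (e.g. "showtimes").
    context = {w for q in seed_queries for w in q.split()} - set(seed.split())

    expanded = set(lexicon)
    for query, doc in query_log:
        if doc in lexicon_docs or not (keywords & set(docs[doc].split())):
            continue
        # Candidate member: the retrieving query minus the context words.
        candidate = " ".join(w for w in query.split() if w not in context)
        if candidate:
            expanded.add(candidate)
    return expanded
```

For instance, if "jaws showtimes" and "jaws" retrieve documents that both mention "film", then a document containing "film" that only the query "heat showtimes" retrieved yields the candidate "heat".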
8. A method performed by one or more processing units, the method comprising:
accessing a query log reflecting queries that users have submitted to a search engine to retrieve documents;
identifying, within the query log, first documents returned by the search engine in response to first queries that include a dictionary term that appears in a dictionary;
extracting a document pattern shared by the first documents; and
using the extracted document pattern to expand the dictionary by:
identifying second documents within the query log that also share the extracted document pattern shared by the first documents;
identifying a specific query term within the query log that the users submitted to the search engine to retrieve the second documents; and
expanding the dictionary by adding the specific query term to the dictionary as a new dictionary term.
Dependent claims: 9, 10, 11, 12, 13, 14.
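In claim 8 the shared characteristic is a document pattern. One plausible, but assumed, choice is a URL prefix. The sketch below takes the host plus the first path segment (via the standard library's `urllib.parse.urlsplit`) and keeps the most common prefix among the first documents. The log format and the names `url_pattern` and `expand_dictionary` are hypothetical, and each whole query that retrieved a second document is treated as the new dictionary term.

```python
from collections import Counter
from urllib.parse import urlsplit


def url_pattern(url):
    """Hypothetical document pattern: host plus first path segment."""
    parts = urlsplit(url)
    first_segment = parts.path.strip("/").split("/")[0]
    return f"{parts.netloc}/{first_segment}"


def expand_dictionary(dictionary, query_log):
    """Sketch of claim 8; query_log is a list of (query, clicked_url) pairs."""
    def has_dictionary_term(query):
        return any(f" {t} " in f" {query} " for t in dictionary)

    # First documents: clicked for queries containing a dictionary term.
    first_urls = [url for query, url in query_log if has_dictionary_term(query)]
    if not first_urls:
        return set(dictionary)

    # Shared pattern: the most frequent URL pattern among first documents.
    pattern, _ = Counter(url_pattern(u) for u in first_urls).most_common(1)[0]

    expanded = set(dictionary)
    for query, url in query_log:
        # Second documents share the pattern; their queries become new terms.
        if url_pattern(url) == pattern and not has_dictionary_term(query):
            expanded.add(query)
    return expanded
```

If "jaws" and "alien movie" both lead to `www.imdb.com/title/...` pages, then a query "heat" whose click also lands on a `www.imdb.com/title/...` page is added, while a click on a Wikipedia article is not.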
15. A system comprising:
one or more processing units; and
memory comprising instructions which, when executed by the one or more processing units, cause the one or more processing units to:
access a query log reflecting queries that users have submitted to a search engine to retrieve documents;
identify, in the query log, first documents that were returned by the search engine in response to users submitting lexicon terms to the search engine, the lexicon terms being identified by a lexicon as members of a particular query domain;
extract a document pattern from the first documents that were returned by the search engine in response to the users submitting the lexicon terms to the search engine; and
use the extracted document pattern to expand the lexicon by:
identifying, in the query log, second documents that include the extracted document pattern and that the query log does not indicate were retrieved by submitting the lexicon terms to the search engine;
identifying, in the query log, another member of the particular query domain that the users have submitted to the search engine to retrieve the second documents; and
adding the another member of the particular query domain to the lexicon as a new lexicon term.
Dependent claims: 16, 17, 18, 19, 20.
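Claim 15 adds two details to the claim 8 flow: the lexicon is organized by query domain, and second documents must be ones the query log does not tie to existing lexicon terms. The class below is a hypothetical in-memory sketch of such a system. `DomainLexicon`, its `expand` method, the rule that a query matches a lexicon term only on exact equality, and the caller-supplied `pattern_of` function are all assumptions, not details from the claim.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Set, Tuple


class DomainLexicon:
    """Hypothetical lexicon held in memory, keyed by query domain."""

    def __init__(self, terms_by_domain: Dict[str, Set[str]]):
        self.terms = {domain: set(t) for domain, t in terms_by_domain.items()}

    def expand(self, domain: str, query_log: Iterable[Tuple[str, str]],
               pattern_of: Callable[[str], str]) -> Set[str]:
        """Add new members to `domain`; returns the terms that were added."""
        log = list(query_log)
        members = self.terms[domain]

        # First documents: retrieved by submitting a lexicon term (exact match).
        first_docs = {doc for query, doc in log if query in members}
        if not first_docs:
            return set()
        pattern, _ = Counter(pattern_of(d) for d in first_docs).most_common(1)[0]

        added = set()
        for query, doc in log:
            # Skip documents the log already ties to a lexicon term, and
            # documents that do not carry the shared pattern.
            if doc in first_docs or pattern_of(doc) != pattern:
                continue
            if query not in members:
                added.add(query)
        members |= added  # updates self.terms[domain] in place
        return added
```

Because the pattern function is passed in, the exclusion rule stays independent of how patterns are defined. Note that a document already retrieved by "jaws" is skipped even if another query, such as "jaws 1975", also retrieved it.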
21. A method performed by at least one processing unit, the method comprising:
obtaining a lexicon of previously-identified members of a semantic class;
processing a query log to identify result documents that have been retrieved using the previously-identified members of the semantic class;
identifying a characteristic that is shared by the result documents that have been retrieved using the previously-identified members of the semantic class;
identifying further result documents that also have the characteristic that is shared by the result documents that have been retrieved using the previously-identified members of the semantic class; and
identifying a new member of the semantic class that has been used to retrieve the further result documents, the new member of the semantic class not being previously identified by the lexicon as belonging to the semantic class.
Specification