Dimensionally reduction of linguistics information
2 Assignments
0 Petitions
Abstract
A deep structured semantic module (DSSM) is described herein which uses a model that is discriminatively trained based on click-through data, e.g., such that a conditional likelihood of clicked documents, given respective queries, is maximized, and a conditional likelihood of non-clicked documents, given the queries, is reduced. In operation, after training is complete, the DSSM maps an input item into an output item expressed in a semantic space, using the trained model. To facilitate training and runtime operation, a dimensionality-reduction module (DRM) can reduce the dimensionality of the input item that is fed to the DSSM. A search engine may use the above-summarized functionality to convert a query and a plurality of documents into the common semantic space, and then determine the similarity between the query and documents in the semantic space. The search engine may then rank the documents based, at least in part, on the similarity measures.
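The training criterion described in the abstract can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the abstract does not give a formula, so the softmax over scaled cosine similarities, the smoothing factor `gamma`, and the function names `cosine` and `clicked_doc_loss` are hypothetical choices made for illustration. Minimizing the returned loss raises the conditional likelihood of the clicked document and lowers that of the non-clicked ones.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def clicked_doc_loss(query_vec, clicked_vec, unclicked_vecs, gamma=10.0):
    """Negative log-likelihood of the clicked document given the query.

    Assumption: the likelihood is a softmax over gamma-scaled cosine
    similarities in the semantic space; gamma is a hypothetical parameter.
    """
    sims = [cosine(query_vec, clicked_vec)]
    sims += [cosine(query_vec, d) for d in unclicked_vecs]
    logits = [gamma * s for s in sims]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[0]
```

With semantic vectors in which the clicked document lies close to the query, the loss is near zero; when a non-clicked document is closer, the loss grows, which is the signal a trainer would use to adjust the model.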
20 Claims
1. A method performed by a computing device, the method comprising:
obtaining a query comprising one or more words from a vocabulary having a first dimension;
transforming the one or more words of the query into a phonetic representation of the one or more words;
processing the phonetic representation to obtain a lower-dimension representation comprising a plurality of n-grams in an n-gram space having a second dimension that is smaller than the first dimension;
performing a natural language processing operation on the lower-dimension representation, the natural language processing operation comprising determining similarity measures reflecting similarity of the one or more words of the query to a plurality of documents;
based at least on the similarity measures, selecting a subset of the documents that are relevant to the query; and
outputting the selected subset of documents in response to the query.
Dependent claims: 2, 3, 4, 5, 6
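The steps of claim 1 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the patented implementation: the substitution table `_RULES` is a toy stand-in for a real phonetic dictionary, and the `#` word-boundary marker, the trigram size, the use of raw n-gram counts with cosine similarity, and the names `to_phonetic`, `letter_ngrams`, `encode`, and `search` are all hypothetical choices made for the example. In the described system the similarity would be computed after mapping into a semantic space; here it is computed directly on the n-gram counts to keep the example short.

```python
import math
from collections import Counter

# Toy spelling-to-sound rules (assumption): a stand-in for a real
# phonetic dictionary. Rules are applied in order.
_RULES = [("ph", "f"), ("ck", "k"), ("gh", "g"), ("c", "k"), ("q", "k"), ("z", "s")]


def to_phonetic(word):
    """Map a word to a crude phonetic spelling using the toy rules."""
    w = word.lower()
    for src, dst in _RULES:
        w = w.replace(src, dst)
    return w


def letter_ngrams(token, n=3):
    """Split a token into overlapping n-grams, with '#' marking word boundaries."""
    padded = f"#{token}#"
    if len(padded) < n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def encode(text, n=3):
    """Lower-dimension representation: counts of phonetic n-grams."""
    counts = Counter()
    for word in text.split():
        counts.update(letter_ngrams(to_phonetic(word), n))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, documents, top_k=2):
    """Score every document against the query and return the best matches."""
    q = encode(query)
    scored = [(cosine(q, encode(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]
```

For example, `search("fone case", ["phone case", "garden hose", "laptop bag"], top_k=1)` selects "phone case", because the misspelled query and the document map to the same phonetic trigrams under these toy rules. The n-gram space is bounded by the number of distinct symbol triples rather than by the number of distinct words, which is why its dimension can be smaller than that of the vocabulary.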
7. A system comprising:
a processing device; and
a computer readable storage medium storing instructions which, when executed by the processing device, cause the processing device to:
obtain a query comprising one or more words from a vocabulary having a first dimension;
transform the one or more words of the query into a phonetic representation of the query;
process the phonetic representation of the query to obtain a lower-dimension representation of the query, the lower-dimension representation comprising a plurality of n-grams in an n-gram space having a second dimension that is smaller than the first dimension;
use the lower-dimension representation to determine similarity measures reflecting similarity of the query to a plurality of documents;
based at least on the similarity measures, select a subset of the documents that are relevant to the query; and
output the selected subset of documents in response to the query.
Dependent claims: 8, 9, 10, 11, 12, 13
14. A computer readable storage medium storing computer readable instructions which, when executed by one or more processing devices, cause the one or more processing devices to perform acts comprising:
obtaining a query comprising one or more words from a vocabulary having a first dimension;
transforming the query into a phonetic representation of the query;
processing the phonetic representation to obtain a lower-dimension representation of the query, the lower-dimension representation comprising a plurality of n-grams in an n-gram space having a second dimension that is smaller than the first dimension;
using the lower-dimension representation of the query to determine similarity measures reflecting similarity of the query to a plurality of documents;
based at least on the similarity measures, selecting a subset of the documents that are relevant to the query; and
outputting the selected subset of documents in response to the query.
Dependent claims: 15, 16, 17, 18, 19, 20
Specification