MINING PHRASE PAIRS FROM AN UNSTRUCTURED RESOURCE
Abstract
A mining system applies queries to retrieve result items from an unstructured resource. The unstructured resource may correspond to a repository of network-accessible resource items. The result items that are retrieved may correspond to text segments (e.g., sentence fragments) associated with resource items. The mining system produces a structured training set by filtering the result items and establishing respective pairs of result items. A training system can use the training set to produce a statistical translation model. The translation model can be used in a monolingual context to translate between semantically-related phrases in a single language. The translation model can also be used in a bilingual context to translate between phrases expressed in two respective languages. Various applications of the translation model are also described.
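The flow described in the abstract (apply queries, retrieve result items, filter them, pair them) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: `build_training_set`, `toy_retrieve`, the word-count filter, and the in-memory corpus that stands in for the unstructured resource. The pairing rule (every two surviving items in the same result set form a pair) is one simple assumption among many possible ones.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def build_training_set(
    queries: List[str],
    retrieve: Callable[[str], List[str]],
    min_words: int = 3,   # assumed filter bound, not from the source
    max_words: int = 12,  # assumed filter bound, not from the source
) -> List[Tuple[str, str]]:
    """Turn per-query result sets into a list of text-segment pairs."""
    pairs: List[Tuple[str, str]] = []
    for query in queries:
        result_set = retrieve(query)  # result items for this query
        seen = set()
        kept: List[str] = []
        for item in result_set:
            # Normalize case and whitespace before filtering.
            text = " ".join(item.lower().split())
            n_words = len(text.split())
            # Drop segments that are too short, too long, or duplicated.
            if min_words <= n_words <= max_words and text not in seen:
                seen.add(text)
                kept.append(text)
        # Pair every two surviving items from the same result set.
        pairs.extend(combinations(kept, 2))
    return pairs


def toy_retrieve(query: str) -> List[str]:
    """Hypothetical stand-in for a retrieval module over an unstructured resource."""
    corpus = [
        "The shingles vaccine prevents a painful rash",
        "A vaccine against shingles can prevent the rash",
        "shingles vaccine",
        "The shingles vaccine prevents a painful rash",
        "Roof shingles protect the house from rain",
    ]
    terms = query.lower().split()
    # Return segments that contain every query term as a whole word.
    return [s for s in corpus if all(t in s.lower().split() for t in terms)]


if __name__ == "__main__":
    for left, right in build_training_set(["shingles vaccine"], toy_retrieve):
        print(left, "<->", right)
```

In this sketch the pairs come only from items that were returned for the same query, so no prior alignment between the underlying resource items is assumed. A real system would replace `toy_retrieve` with an actual search interface and would likely use stronger filters than a word count.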
20 Claims
1. A method, using electrical data processing functionality, for creating a training set for use in training a statistical translation model, comprising:
constructing queries;
presenting the queries to an electrical data retrieval module, the retrieval module configured to perform a searching operation within an unstructured resource based on the queries;
receiving result sets from the retrieval module, the result sets providing result items identified by the retrieval module as a result of the searching operation; and
performing processing on the result sets to produce a structured training set, the training set identifying pairs of the result items within the result sets, the training set providing a basis by which an electrical training system can learn the statistical translation model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. An electrical mining system for creating a training set for use in training a statistical translation model, comprising:
a query presentation module configured to construct queries;
an interface module configured to:
  present the queries to a retrieval module, the retrieval module configured to perform a searching operation within an unstructured resource based on the queries; and
  receive result sets from the retrieval module, the result sets providing result items identified by the retrieval module as a result of the searching operation; and
a training set preparation module configured to perform processing on the result sets to produce a structured training set, the training set identifying pairs of result items within the result sets, the training set providing a basis by which an electrical training system can learn the statistical translation model,
the result items within the result sets comprising text segments retrieved by the retrieval module from the unstructured resource, the text segments corresponding to at least sentence fragments of respective resource items within the unstructured resource, the resource items having no pre-identified relation to each other.
Dependent claims: 15, 16
17. A computer readable medium for storing computer readable instructions, the computer readable instructions providing a mining system when executed by one or more processing devices, the computer readable instructions comprising:
interface logic configured to retrieve result items from an unstructured resource on the basis of queries submitted to the unstructured resource, the unstructured resource corresponding to network-accessible resource items; and
training set preparation logic configured to establish a structured training set from the result items retrieved from the unstructured resource, the training set being constructed in a manner which is agnostic with respect to any similarity among the resource items as respective wholes and any parallelism within sentences contained within the resource items, the training set providing a basis by which an electrical training system can learn a statistical translation model.
Dependent claims: 18, 19, 20
Specification