Retrieval of domain relevant phrase tables
Abstract
A method for generating a phrase table for a target domain includes receiving a source corpus for the target domain and, for each of a set of comparative domain phrase tables, computing a measure of similarity between the source corpus and the comparative domain phrase table. Based on the computed similarity measures, a subset of the comparative domain phrase tables may be identified from the set of comparative domain phrase tables, and/or weights for combining them may be determined. A phrase table is then generated for the target domain based on at least a subset of the phrase tables.
20 Claims
1. A method for generating a phrase table for a target domain comprising:
receiving a source corpus for a target domain;
for each of a set of comparative domain phrase tables, computing a measure of similarity between the source corpus and the comparative domain phrase table, the measure of similarity being computed as a function of counts of n-grams of each of a plurality of sizes in the source corpus that are also present in the respective phrase table;
based on the computed similarity measures, identifying a subset of the comparative domain phrase tables from the set of comparative domain phrase tables; and
generating a phrase table for the target domain based on the subset of phrase tables;
wherein the computing of the similarity measures, identifying the subset of the phrase tables, and generating the phrase table is performed with a computer processor.
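The similarity computation and subset selection recited in claim 1 can be illustrated with a minimal sketch. The claim only requires that the measure be a function of counts of n-grams of several sizes in the source corpus that also occur in the phrase table; the plain average of per-size match ratios used here, the representation of a phrase table as a set of source-side token tuples, and the names `ngram_similarity`, `select_tables`, `max_n`, and `k` are all assumptions made for illustration, not the patented formula.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_similarity(corpus_sentences, table_source_phrases, max_n=3):
    """Average, over n = 1..max_n, of the fraction of corpus n-gram
    occurrences that also appear on the source side of the phrase table.
    The plain average is an illustrative choice (assumption)."""
    ratios = []
    for n in range(1, max_n + 1):
        counts = Counter(
            g for s in corpus_sentences for g in ngrams(s.split(), n)
        )
        total = sum(counts.values())
        if total == 0:
            continue  # the corpus has no n-grams of this size
        matched = sum(c for g, c in counts.items() if g in table_source_phrases)
        ratios.append(matched / total)
    return sum(ratios) / len(ratios) if ratios else 0.0


def select_tables(corpus_sentences, tables, k=2):
    """Score every comparative table and keep the names of the k most similar."""
    scores = {
        name: ngram_similarity(corpus_sentences, phrases)
        for name, phrases in tables.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: each table is a set of source-side phrases.
corpus = ["the patient was treated"]
tables = {
    "medical": {("the",), ("patient",), ("the", "patient"), ("was", "treated")},
    "legal": {("the",)},
    "sports": set(),
}
selected, scores = select_tables(corpus, tables, k=2)
```

In this toy setup the medical table matches the most unigrams and bigrams of the corpus, so it ranks first; a real system would read the source sides from actual phrase table files.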
14. A method for generating a phrase table for a target domain comprising:
receiving a source corpus for a target domain;
for each of a set of comparative domain phrase tables, computing a measure of similarity between the source corpus and the comparative domain phrase table, wherein the computing of the measure of similarity between the source corpus and the comparative domain phrase table comprises computing an occurrence of each of a set of phrases in the source corpus in biphrases of each of the phrase tables;
based on the computed similarity measures, identifying a subset of the comparative domain phrase tables from the set of comparative domain phrase tables; and
generating a phrase table for the target domain based on the subset of phrase tables;
wherein the computing of the similarity measures, identifying the subset of the phrase tables, and generating the phrase table is performed with a computer processor.
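Claim 14 bases the similarity on how often phrases from the source corpus occur in the biphrases of each phrase table. A rough sketch of that counting step is given below, assuming a biphrase is a (source phrase, target phrase) pair and that an occurrence means an exact match on the source side. The helper names `phrase_occurrences` and `coverage`, and the use of a coverage ratio as the resulting score, are hypothetical choices for illustration.

```python
def phrase_occurrences(corpus_phrases, biphrases):
    """Map each corpus phrase to the number of biphrases whose
    source side equals that phrase (exact match is an assumption)."""
    counts = dict.fromkeys(corpus_phrases, 0)
    for source_side, _target_side in biphrases:
        if source_side in counts:
            counts[source_side] += 1
    return counts


def coverage(corpus_phrases, biphrases):
    """Fraction of corpus phrases found in at least one biphrase."""
    occurrences = phrase_occurrences(corpus_phrases, biphrases)
    if not occurrences:
        return 0.0
    return sum(1 for c in occurrences.values() if c > 0) / len(occurrences)


# Hypothetical toy data.
corpus_phrases = {"heart rate", "blood pressure", "court order"}
medical_biphrases = [
    ("heart rate", "rythme cardiaque"),
    ("heart rate", "pouls"),
    ("blood pressure", "pression sanguine"),
]
occurrences = phrase_occurrences(corpus_phrases, medical_biphrases)
medical_coverage = coverage(corpus_phrases, medical_biphrases)
```

Running the same computation against every comparative table yields one score per table, which can then feed the subset selection step of the claim.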
19. A system for generating a phrase table for a target domain comprising:
a similarity computation component which, for each of a set of comparative domain phrase tables, computes a measure of similarity between an input source corpus and the comparative domain phrase table;
a multi-model computation component which identifies a subset of the comparative domain phrase tables from the set of comparative domain phrase tables based on the computed similarity measures and generates a phrase table for the target domain based on the subset of phrase tables; and
a processor which implements the similarity computation component and the multi-model computation component.
20. A method for generating a phrase table for a target domain comprising:
receiving a source corpus for a target domain;
for each of a set of comparative domain phrase tables, computing a measure of similarity between the source corpus and the comparative domain phrase table;
combining at least a subset of the comparative domain phrase tables from the set of comparative domain phrase tables in a weighted combination, weights for the combination being based on the computed similarity measures; and
wherein at least one of the computing of the similarity measures, identifying the subset of the phrase tables, and combining is performed with a computer processor.
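The weighted combination in claim 20 could look roughly like the sketch below. The claim states only that the weights are based on the computed similarity measures; normalizing the scores so they sum to one and linearly interpolating the translation probabilities are assumptions made here for illustration, as are the names `similarity_weights` and `combine_tables` and the dictionary representation of a phrase table.

```python
def similarity_weights(scores):
    """Turn similarity scores into weights that sum to 1.
    Simple normalization is an illustrative assumption."""
    if not scores:
        return {}
    total = sum(scores.values())
    if total == 0:
        # No table resembles the corpus: fall back to uniform weights.
        return {name: 1.0 / len(scores) for name in scores}
    return {name: s / total for name, s in scores.items()}


def combine_tables(tables, weights):
    """Linearly interpolate translation probabilities across tables.
    Each table maps a (source, target) biphrase to a probability."""
    combined = {}
    for name, table in tables.items():
        w = weights.get(name, 0.0)
        for biphrase, prob in table.items():
            combined[biphrase] = combined.get(biphrase, 0.0) + w * prob
    return combined


# Hypothetical toy data.
scores = {"medical": 0.6, "legal": 0.2}
tables = {
    "medical": {("bank", "banque"): 0.8},
    "legal": {("bank", "banque"): 0.4, ("bank", "rive"): 0.6},
}
weights = similarity_weights(scores)
target_table = combine_tables(tables, weights)
```

A biphrase missing from a table simply contributes nothing for that table, so entries supported by the most similar tables end up with the highest combined scores.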
Specification