Adaptive Web Mining of Bilingual Lexicon for Query Translation
Abstract
Mining of translation pairs for cross-language translation uses a collective extraction model to exploit the similarity among the translation pairs and adaptively learn extraction patterns for each bilingual webpage. The process queries a web search engine by an initial term translation list to retrieve bilingual webpages containing translations, and crawls websites hosting the retrieved bilingual webpages to retrieve additional bilingual webpages. The process then extracts additional translation pairs from the bilingual webpages retrieved by learning translation patterns of the bilingual webpages retrieved and adaptively extracting translation pairs from the bilingual webpages using the learned translation patterns. More bilingual webpages may be acquired for additional website crawling and translation pair extracting by querying the web search engine by additional translation pairs.
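The loop described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names (`extract_pairs`, `mine_translation_pairs`), the `search` and `crawl_site` callables, the in-memory `FAKE_WEB`, and the `example` URLs are all hypothetical, and the per-page pattern learning is reduced to a simple assumption, namely that a page lists each pair on one line as source term, separator, target term. The collective extraction model itself is not reproduced here.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (source-language term, target-language term)


def extract_pairs(page_text: str, known_pairs: Iterable[Pair]) -> Set[Pair]:
    """Learn this page's separators from known pairs, then reuse them on every line."""
    lines = [line.strip() for line in page_text.splitlines() if line.strip()]
    separators: Set[str] = set()
    for source, target in known_pairs:
        for line in lines:
            # A line shaped "<source><separator><target>" reveals the page's pattern.
            if (len(line) > len(source) + len(target)
                    and line.startswith(source) and line.endswith(target)):
                separators.add(line[len(source):len(line) - len(target)])
    pairs: Set[Pair] = set()
    for line in lines:
        for sep in separators:
            left, found, right = line.partition(sep)
            if found and left and right:
                pairs.add((left, right))
                break
    return pairs


def mine_translation_pairs(
    seed_pairs: Iterable[Pair],
    search: Callable[[str, str], List[str]],      # hypothetical: pair -> hit URLs
    crawl_site: Callable[[str], Dict[str, str]],  # hypothetical: URL -> {url: text} for its site
    max_rounds: int = 3,
) -> Set[Pair]:
    """Query by each pair, crawl the hosting sites, extract new pairs, and repeat."""
    known: Set[Pair] = set(seed_pairs)
    frontier: List[Pair] = sorted(known)
    seen_urls: Set[str] = set()
    for _ in range(max_rounds):
        if not frontier:
            break
        new_pages: Dict[str, str] = {}
        for source, target in frontier:
            for hit_url in search(source, target):
                for url, text in crawl_site(hit_url).items():
                    if url not in seen_urls:
                        seen_urls.add(url)
                        new_pages[url] = text
        new_pairs: Set[Pair] = set()
        for text in new_pages.values():
            new_pairs |= extract_pairs(text, known) - known
        known |= new_pairs
        frontier = sorted(new_pairs)  # only newly found pairs drive the next round
    return known


# Tiny in-memory stand-in for a search engine and crawler (illustration only).
FAKE_WEB = {
    "a.example/1": "search engine - 搜索引擎\nweb page - 网页",
    "a.example/2": "dictionary - 词典",
    "b.example/1": "web page - 网页\ntranslation - 翻译",
}


def fake_search(source: str, target: str) -> List[str]:
    return [u for u, t in FAKE_WEB.items() if source in t and target in t]


def fake_crawl_site(url: str) -> Dict[str, str]:
    host = url.split("/")[0]
    return {u: t for u, t in FAKE_WEB.items() if u.split("/")[0] == host}


mined = mine_translation_pairs([("search engine", "搜索引擎")], fake_search, fake_crawl_site)
```

In this toy run the seed pair leads to the first site, the newly extracted pair leads to the second site, and a third pair is extracted there. The page `a.example/2` is crawled but yields nothing, because no known pair appears on it and so no pattern can be learned for that page under this simplified rule.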
20 Claims
1. A method for mining translation pairs for cross-language translation, the method comprising:
querying a web search engine by each translation pair of an initial term translation list to retrieve bilingual webpages containing translations;
crawling websites hosting the retrieved bilingual webpages to retrieve additional bilingual webpages;
extracting additional translation pairs from the bilingual webpages retrieved; and
querying the web search engine by each additional translation pairs to retrieve more bilingual webpages for additional website crawling and translation pair extracting.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A method for extracting translation pairs from bilingual webpages, the method comprising:
learning webpage blocks containing translation pairs in the bilingual webpages and classifying the webpage blocks into at least two different block classes;
learning translation patterns in the bilingual webpages and classifying candidate translation patterns in the classified webpage blocks into at least two different pattern classes; and
adaptively extracting translation pairs from the bilingual webpages using the learned translation patterns.
Dependent claims: 14, 15, 16, 17, 18
19. One or more computer readable media having stored thereupon a plurality of instructions that, when executed by a processor, causes the processor to:
query a web search engine by each translation pair of an initial term translation list to retrieve bilingual webpages containing translations;
crawl websites hosting the retrieved bilingual webpages to retrieve additional bilingual webpages;
extract additional translation pairs from the bilingual webpages retrieved; and
query the web search engine by each additional translation pairs to retrieve more bilingual webpages for additional website crawling and translation pair extracting.
Dependent claims: 20
Specification