MINING BILINGUAL DICTIONARIES FROM MONOLINGUAL WEB PAGES

  • US 20090070095A1
  • Filed: 09/07/2007
  • Published: 03/12/2009
  • Est. Priority Date: 09/07/2007
  • Status: Active Grant
First Claim

1. A method for identifying translation pairs from web pages, the method comprising:

  • receiving monolingual web page data of a source language;

    processing the web page data by:

    detecting the occurrence of a predefined pattern in the web page data;

    extracting a plurality of translation pair candidates, each of the translation pair candidates including a source language string and target language string;

    determining whether each translation pair candidate is a valid transliteration;

    for each translation pair that is determined not to be a valid transliteration, determining whether each translation pair candidate is a valid translation; and

    adding each translation pair that is determined to be a valid translation or transliteration to a dictionary.
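The steps recited in claim 1 can be sketched roughly as follows. This is a minimal illustration only, not the patented implementation: the claim does not say what the predefined pattern is or how validity is decided, so the parenthetical pattern (a run of Chinese characters followed by an English phrase in parentheses), the toy tables `TRANSLIT_TABLE` and `GLOSS_TABLE`, the similarity threshold, and all function names are assumptions introduced for the example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical "predefined pattern": source-language (CJK) run, then an
# ASCII phrase in half-width or full-width parentheses.
PATTERN = re.compile(r"([\u4e00-\u9fff]+)\s*[（(]\s*([A-Za-z][A-Za-z .'-]*?)\s*[)）]")

# Toy resources, assumed for illustration only.
TRANSLIT_TABLE = {"伦": "lun", "敦": "dun"}          # character -> romanization
GLOSS_TABLE = {"词": {"word"}, "典": {"dictionary", "canon"},
               "噪": {"noise"}, "声": {"sound", "voice"}}  # character -> glosses


def extract_candidates(page_text):
    """Detect the pattern and return (source, target) candidate pairs."""
    return [(m.group(1), m.group(2)) for m in PATTERN.finditer(page_text)]


def is_valid_transliteration(source, target, translit_table, threshold=0.6):
    """Romanize the source and compare its spelling with the target."""
    if not all(ch in translit_table for ch in source):
        return False
    romanized = "".join(translit_table[ch] for ch in source)
    normalized = re.sub(r"[^a-z]", "", target.lower())
    return SequenceMatcher(None, romanized, normalized).ratio() >= threshold


def is_valid_translation(source, target, gloss_table):
    """Accept the pair if any target word is a known gloss of a source character."""
    glosses = set()
    for ch in source:
        glosses |= gloss_table.get(ch, set())
    return any(word in glosses for word in target.lower().split())


def mine_pairs(page_text, translit_table, gloss_table):
    """Run the claimed steps: extract, test transliteration, then translation."""
    dictionary = {}
    for source, target in extract_candidates(page_text):
        if is_valid_transliteration(source, target, translit_table):
            dictionary[source] = target
        elif is_valid_translation(source, target, gloss_table):
            # Only reached for pairs that failed the transliteration test.
            dictionary[source] = target
    return dictionary


# Sample monolingual (Chinese) page text with English strings in parentheses.
PAGE = "首都：伦敦 (London)。常用词：词典 (dictionary)，噪声 (hello)。"
MINED = mine_pairs(PAGE, TRANSLIT_TABLE, GLOSS_TABLE)
```

With the sample data, "伦敦 (London)" passes the transliteration test, "词典 (dictionary)" fails it but passes the translation test, and "噪声 (hello)" fails both and is left out. Note that the simple pattern takes the whole run of Chinese characters before the parenthesis as the source string; a real system would need some way to find where the source term actually begins.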
