WORD BREAKER FROM CROSS-LINGUAL PHRASE TABLE
First Claim
1. A method of automatically building a word breaker for segmenting words of a source language into morphemes comprising:
- accessing, at a processor, a cross-lingual phrase table comprising a plurality of source language phrases, each source language phrase having at least one target language translation;
- using the cross-lingual phrase table to infer and store, for source language words from the cross-lingual phrase table, morphemes comprising stems and affixes of the words.
Abstract
Automatic creation of word breakers, which segment words into morphemes, is described. Such word breakers may be used, for example, to improve information retrieval, machine translation or speech systems. In embodiments, a cross-lingual phrase table is available that comprises source language phrases (for example, Turkish) and potential translations in a target language (for example, English), with associated probabilities. In various examples, blocks of source language phrases that have similar target language translations are created from the phrase table. In various examples, inference using the target language translations in a block enables stem and affix combinations to be found for source language words without the need for input from human judges, prior knowledge of source language linguistic rules, or a source language lexicon.
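The block-then-infer idea in the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical phrase table of (source phrase, target translation, probability) rows, groups single-word source phrases into a block whenever their translations share an English content word, and then takes the longest common prefix of each block as the stem and the remainders as affixes. The names `TABLE`, `STOPWORDS`, `build_blocks`, `infer_morphemes` and `word_break`, the stopword list, the plural folding and the probability threshold are all illustrative assumptions; this prefix heuristic is a stand-in, not the inference procedure claimed in the patent.

```python
from collections import defaultdict
from os.path import commonprefix

# Hypothetical phrase-table rows: (source phrase, target translation, probability).
TABLE = [
    ("ev", "house", 0.8),
    ("evler", "houses", 0.7),
    ("evde", "in the house", 0.6),
    ("evim", "my house", 0.5),
    ("kitap", "book", 0.9),
    ("kitaplar", "books", 0.8),
    ("kitapta", "in the book", 0.6),
]

# Assumed list of English function words ignored when comparing translations.
STOPWORDS = {"a", "an", "the", "in", "on", "at", "my", "your", "of", "to", "from"}


def _normalize(token: str) -> str:
    """Crude English plural folding (assumption): 'houses' -> 'house'."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def build_blocks(table, min_prob: float = 0.1) -> dict[str, set[str]]:
    """Group single-word source phrases whose translations share a content word."""
    blocks: dict[str, set[str]] = defaultdict(set)
    for source, target, prob in table:
        if prob < min_prob or " " in source:
            continue  # skip unlikely translations and multi-word source phrases
        for token in target.lower().split():
            if token not in STOPWORDS:
                blocks[_normalize(token)].add(source)
    return blocks


def infer_morphemes(blocks) -> dict[str, tuple[str, str]]:
    """Take each block's longest common prefix as the stem; the rest is the affix."""
    segmentation: dict[str, tuple[str, str]] = {}
    for words in blocks.values():
        if len(words) < 2:
            continue  # a single word gives no evidence about affixes
        stem = commonprefix(sorted(words))
        if not stem:
            continue  # no shared prefix, so this block yields no segmentation
        for word in words:
            segmentation[word] = (stem, word[len(stem):])
    return segmentation


def word_break(word: str, segmentation) -> list[str]:
    """Segment a word using the stored stems and affixes; unknown words stay whole."""
    stem, affix = segmentation.get(word, (word, ""))
    return [m for m in (stem, affix) if m]


if __name__ == "__main__":
    seg = infer_morphemes(build_blocks(TABLE))
    print(word_break("evler", seg))    # ['ev', 'ler']
    print(word_break("kitapta", seg))  # ['kitap', 'ta']
    print(word_break("araba", seg))    # ['araba']
```

The common-prefix heuristic only handles suffixing patterns and discards a whole block if any member fails to share the prefix. It is meant to show the data flow (phrase table, blocks of similar translations, stored stem/affix pairs, word breaker), not to be a robust segmenter.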
20 Claims
1. A method of automatically building a word breaker for segmenting words of a source language into morphemes comprising:
- accessing, at a processor, a cross-lingual phrase table comprising a plurality of source language phrases, each source language phrase having at least one target language translation;
- using the cross-lingual phrase table to infer and store, for source language words from the cross-lingual phrase table, morphemes comprising stems and affixes of the words.
13. A method of automatically building a word breaker for segmenting words of a source language into morphemes comprising:
- accessing, at a processor, a cross-lingual phrase table comprising a plurality of source language phrases, each source language phrase having at least one target language translation;
- using the cross-lingual phrase table to infer and store, for source language words from the cross-lingual phrase table, morphemes comprising stems and affixes of the words, the inference using similar target language translations identified from the cross-lingual phrase table.
16. A word breaker building system comprising:
- a processor arranged to access a cross-lingual phrase table comprising a plurality of source language phrases, each source language phrase having at least one target language translation;
- the processor arranged to use the cross-lingual phrase table to infer and store, for source language words from the cross-lingual phrase table, morphemes comprising stems and affixes of the words.
Specification