UNSUPERVISED CHINESE WORD SEGMENTATION FOR STATISTICAL MACHINE TRANSLATION
First Claim
1. In a computing environment, a method comprising:
- receiving an unsegmented sentence; and
- segmenting the unsegmented sentence into a segmented sentence via a segmenter that includes a generative model.
Abstract
Described is using a generative model in processing an unsegmented sentence into a segmented sentence. A segmenter includes the generative model, which given an unsegmented sentence (e.g., in Chinese) provides candidate segmented sentences to a probability-based decoder that selects the segmented sentence. For example, the segmented (e.g., Chinese-language) sentence may be provided to a statistical machine translator that outputs a translated (e.g., English-language) sentence. The generative model may include a word sub-model that generates hidden words using a word model, a spelling sub-model that generates characters from the hidden words, and an alignment sub-model that generates translated words and alignment data from the characters. The word sub-model may correspond to a unigram model having words and associated frequency data therein, and the alignment sub-model may correspond to a word aligned corpus having source sentence, translated target sentence pairings therein. Training is also described.
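The decoding step described above can be illustrated with a minimal sketch. It assumes only a unigram word model and a Viterbi-style dynamic program that picks the most probable segmentation; the spelling and alignment sub-models are omitted. The vocabulary, the frequencies, and the names `UNIGRAM`, `UNKNOWN_LOGP`, `MAX_WORD_LEN`, and `segment` are hypothetical and are not taken from the patent.

```python
import math

# Hypothetical toy unigram word model: word -> relative frequency.
UNIGRAM = {
    "中国": 0.04,
    "中": 0.01,
    "国": 0.01,
    "人民": 0.03,
    "人": 0.02,
    "民": 0.005,
}
UNKNOWN_LOGP = math.log(1e-8)  # assumed penalty for an unseen single character
MAX_WORD_LEN = 4               # assumed upper bound on word length


def segment(sentence):
    """Return the most probable segmentation under the toy unigram model."""
    n = len(sentence)
    best = [-math.inf] * (n + 1)  # best[i]: best log-probability of sentence[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = sentence[start:end]
            if word in UNIGRAM:
                logp = math.log(UNIGRAM[word])
            elif len(word) == 1:
                logp = UNKNOWN_LOGP  # unseen characters stand alone as words
            else:
                continue  # unseen multi-character strings are not candidates
            score = best[start] + logp
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow the back-pointers from the end of the sentence to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(sentence[back[i]:i])
        i = back[i]
    return words[::-1]


print(" ".join(segment("中国人民")))
```

In this toy setting the two-word reading wins because the product of its word frequencies exceeds that of any split into shorter words. A fuller model along the lines of the abstract would also score each candidate against the translated sentence through the alignment sub-model.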
45 Citations
20 Claims
1. In a computing environment, a method comprising:
- receiving an unsegmented sentence; and
- segmenting the unsegmented sentence into a segmented sentence via a segmenter that includes a generative model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. In a computing environment, a system comprising, a generative model, including a word sub-model that generates hidden words using a word model, a spelling sub-model that generates characters from the hidden words, and an alignment sub-model that generates translated words and alignment data from the characters.
18. In a computing environment, a method comprising, configuring a generative model for use in segmenting an unsegmented sentence, including generating hidden words using a word model, generating characters from the hidden words, and generating candidate segmented sentences from the characters.
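The relationship between hidden words, characters, and candidate segmented sentences in the claims can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: `spell` stands in for a spelling step that simply concatenates words, `candidates` enumerates every split of a character string into lexicon words or single characters, and both names, the lexicon, and the `max_len` bound are hypothetical.

```python
def spell(words):
    """Generate the observed character string from a sequence of hidden words."""
    return "".join(words)


def candidates(chars, lexicon, max_len=4):
    """Yield every split of `chars` into lexicon words or single characters."""
    if not chars:
        yield []
        return
    for k in range(1, min(max_len, len(chars)) + 1):
        head = chars[:k]
        # A single character is always allowed, so every string has a candidate.
        if k == 1 or head in lexicon:
            for rest in candidates(chars[k:], lexicon, max_len):
                yield [head] + rest


lexicon = {"中国", "国人"}  # hypothetical word list
for words in candidates("中国人", lexicon):
    # Each candidate is a hidden word sequence that spells the same characters.
    assert spell(words) == "中国人"
    print(" ".join(words))
```

Exhaustive enumeration grows exponentially with sentence length, so a practical decoder would score candidates incrementally rather than list them all.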
Specification