Stimulus Description Collections
First Claim
1. In a computing environment, a method performed at least in part on at least one processor, comprising, presenting a stimulus to contributors, collecting a linguistic description from each responding contributor as to what the stimulus represented to the contributor, and maintaining at least some of the linguistic descriptions corresponding to that stimulus in association with one another as translation data for use in training a translation engine, or as paraphrase data for use in training a paraphrasing system, or both as translation data for use in training a translation engine, and as paraphrase data for use in training a paraphrasing system.
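The collection-and-association step of claim 1 can be illustrated with a small in-memory sketch. It is not the patented implementation. It assumes a hypothetical `Description` record (contributor ID, language tag, free text) and a hypothetical `DescriptionStore` that keys every description by the ID of the stimulus it describes, such as a video clip. A contributor who does not respond is represented as `None` and is skipped, because the claim collects descriptions only from responding contributors.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Description:
    """One contributor's description of a stimulus (hypothetical record type)."""
    contributor_id: str
    language: str  # assumed language tag, e.g. "en" or "es"
    text: str


class DescriptionStore:
    """Keeps all descriptions of the same stimulus associated with one another."""

    def __init__(self) -> None:
        # stimulus ID -> every description collected for that stimulus
        self._by_stimulus: dict[str, list[Description]] = defaultdict(list)

    def collect(self, stimulus_id: str,
                responses: dict[str, Optional[tuple[str, str]]]) -> int:
        """Store (language, text) responses; None marks a non-responding contributor.

        Returns how many descriptions were stored by this call."""
        stored = 0
        for contributor_id, response in responses.items():
            if response is None:
                continue  # only responding contributors supply a description
            language, text = response
            self._by_stimulus[stimulus_id].append(
                Description(contributor_id, language, text.strip()))
            stored += 1
        return stored

    def descriptions(self, stimulus_id: str) -> list[Description]:
        """All descriptions of one stimulus, in collection order."""
        return list(self._by_stimulus.get(stimulus_id, []))

    def by_language(self, stimulus_id: str) -> dict[str, list[str]]:
        """Description texts of one stimulus, grouped by language."""
        groups: dict[str, list[str]] = defaultdict(list)
        for d in self._by_stimulus.get(stimulus_id, []):
            groups[d.language].append(d.text)
        return dict(groups)
```

Keying by stimulus ID is what keeps the descriptions "in association with one another"; any later use as translation or paraphrase data can then read one stimulus at a time.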
3 Assignments
0 Petitions
Abstract
The subject disclosure generally describes a technology by which text and/or speech descriptions are collected by showing a stimulus such as video clips to contributors (e.g., of a crowd-sourcing service). The descriptions, which are in the language of each contributor's choice, are of the same stimulus and thus associated with one another. While each contributor may be monolingual, the technique allows for the collection of approximately bilingual data, since more than one language may be represented among the different contributors. The descriptions may be used as translation data for training a machine translation engine, and as paraphrase data (grouped by the same language) for training a machine paraphrasing system. Also described is evaluating the quality of a machine paraphrasing system via a distinctiveness metric.
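The abstract's dual use of one collection can be sketched as pair extraction over the descriptions of a single stimulus: descriptions in different languages form translation pairs, and descriptions in the same language form paraphrase pairs. This is a sketch under stated assumptions. Descriptions are plain `(language, text)` tuples, the function name `build_training_pairs` is hypothetical, and every unordered pair is emitted, whereas a real pipeline would likely filter pairs for quality. The translation pairs are only approximate, since each contributor described the stimulus independently rather than translating another contributor's text.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations


def build_training_pairs(descriptions: list[tuple[str, str]]):
    """Split one stimulus's (language, text) descriptions into training pairs.

    Returns (translation_pairs, paraphrase_pairs):
      translation_pairs: list of ((lang_a, text_a), (lang_b, text_b)), lang_a != lang_b
      paraphrase_pairs:  dict mapping a language to (text_a, text_b) pairs in it
    """
    translation_pairs: list[tuple[tuple[str, str], tuple[str, str]]] = []
    paraphrase_pairs: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for a, b in combinations(descriptions, 2):  # every unordered pair exactly once
        if a[0] == b[0]:
            # Same language, same stimulus: treat as a paraphrase pair.
            paraphrase_pairs[a[0]].append((a[1], b[1]))
        else:
            # Different languages: an approximate (not sentence-aligned) translation pair.
            translation_pairs.append((a, b))
    return translation_pairs, dict(paraphrase_pairs)
```

With two English descriptions and one Spanish description of the same clip, this yields one English paraphrase pair and two English-Spanish translation pairs.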
36 Citations
20 Claims
- 1. In a computing environment, a method performed at least in part on at least one processor, comprising, presenting a stimulus to contributors, collecting a linguistic description from each responding contributor as to what the stimulus represented to the contributor, and maintaining at least some of the linguistic descriptions corresponding to that stimulus in association with one another as translation data for use in training a translation engine, or as paraphrase data for use in training a paraphrasing system, or both as translation data for use in training a translation engine, and as paraphrase data for use in training a paraphrasing system.
- 14. One or more computer-readable media having computer-executable instructions, which when executed perform steps of a process, comprising: inputting input data corresponding to a set of words to a machine paraphrase system; receiving output data from the machine paraphrase system corresponding to a paraphrase of the input data; and evaluating quality of the machine paraphrase system, including obtaining a first score representing how well the output data retained the input data's original meaning, and a second score representing how distinct the output data is from the input data. (Dependent claims: 15, 16)
- 17. A system comprising, a source that provides a stimulus to contributors, a data collection mechanism configured to collect a linguistic description of that stimulus from each contributor, the data collection mechanism further configured to maintain translation data that associates linguistic descriptions of that stimulus that are in different languages with one another, and to maintain paraphrase data which, for at least one language, associates linguistic descriptions of that stimulus in that same language with one another.
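The two-score evaluation recited in claim 14 can be sketched as follows. The claim does not fix either formula, so both scores here are assumptions. The first score (meaning retention) is approximated by a simplified BLEU-like clipped n-gram precision of the output against reference texts; the input is always used as one reference, and other descriptions of the same stimulus can be passed as extra references. The second score (distinctness) is modeled loosely on the PINC ("Paraphrase In N-gram Changes") idea: the average, over n-gram orders, of the fraction of output n-grams that do not appear in the input. The whitespace tokenizer and all function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Sequence


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def meaning_score(output: str, references: Sequence[str], max_n: int = 2) -> float:
    """First score (proxy): mean clipped n-gram precision against references."""
    out = tokenize(output)
    refs = [tokenize(r) for r in references]
    precisions = []
    for n in range(1, max_n + 1):
        out_counts = Counter(ngrams(out, n))
        total = sum(out_counts.values())
        if total == 0:
            continue  # output too short for this n-gram order
        max_ref: Counter = Counter()  # highest count of each n-gram in any reference
        for ref in refs:
            for gram, count in Counter(ngrams(ref, n)).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in out_counts.items())
        precisions.append(clipped / total)
    return sum(precisions) / len(precisions) if precisions else 0.0


def distinctness_score(output: str, source: str, max_n: int = 4) -> float:
    """Second score: PINC-style share of output n-grams absent from the input."""
    out, src = tokenize(output), tokenize(source)
    scores = []
    for n in range(1, max_n + 1):
        out_grams = set(ngrams(out, n))
        if not out_grams:
            continue  # output too short for this n-gram order
        overlap = len(out_grams & set(ngrams(src, n)))
        scores.append(1.0 - overlap / len(out_grams))
    return sum(scores) / len(scores) if scores else 0.0


def evaluate_paraphrase(input_text: str, output_text: str,
                        extra_references: Sequence[str] = ()) -> tuple[float, float]:
    """Return (meaning score, distinctness score) for one paraphrase."""
    references = [input_text, *extra_references]
    return (meaning_score(output_text, references),
            distinctness_score(output_text, input_text))
```

The two scores pull in opposite directions: copying the input verbatim maximizes the first and drives the second to zero, which is why a paraphrase system is judged on both rather than on either alone.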
Specification