System and method to acquire paraphrases
First Claim
1. A method for acquiring paraphrases for use in natural language processing applications, the method being conducted using an automated survey platform that is configured to permit access to a group of crowd-workers for performing portions of a task while other portions are performed by the automated survey platform, and using a processor configured to perform the method, the method comprising:
receiving raw text as input to the processor, the raw text being a result of obtaining opinions of a group of consumers or customers of a product or service;
sentence breaking the raw text into individual sentences by the processor by use of automatic natural language processing techniques;
providing the individual sentences and a first survey by the processor through the automated survey platform to a plurality of annotating sources, wherein each annotating source reviews the individual sentences and determines an assessment of the individual sentences based on the first survey, and wherein the automated survey platform is accessed by the plurality of annotating sources for the first survey, the plurality of annotating sources for the first survey being crowd-workers connected via their respective networked computers over the internet with the automated survey platform;
receiving results of the first survey from the each of the annotating sources for the first survey by the processor;
filtering the results of the first survey by the processor to group the individual sentences that were the subject of the first survey into groups that have received like assessments from the annotating sources;
providing the filtered survey results and a second survey by the processor to a plurality of annotating sources for the second survey, the plurality of annotating sources for the second survey being crowd-workers connected via their respective networked computers over the internet with the automated survey platform, wherein each annotating source for the second survey conducts the second survey based on the filtered results to determine portions of the individual sentences included in the filtered results that indicate the assessment, and wherein the plurality of annotating sources for the first survey and the plurality of annotating sources for the second survey each comprises a sampling of people to complete the first survey and the second survey, respectively, and wherein the sampling of people comprises respondents to the automated survey platform;
receiving results of the second survey from the plurality of annotating sources for the second survey by the processor; and
automatically generating by the processor paraphrases based on the results of the second survey, wherein the paraphrases are pairs of expression that have a same meaning.
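The automated portions of the method recited above (sentence breaking, filtering by like assessments, and paraphrase generation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the claim: the function names, the regular-expression sentence breaker, the strict-majority rule used to decide that sentences "received like assessments", the label strings, and the choice to pair the portions highlighted within one assessment group as paraphrase candidates. The two surveys themselves are represented only by their results, since the crowd-workers perform them through the survey platform.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations


def break_sentences(raw_text):
    """Naive sentence breaker (assumption): split after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", raw_text.strip())
    return [p for p in parts if p]


def group_by_assessment(first_survey):
    """Group sentences by the assessment a strict majority of workers gave them.

    first_survey maps each sentence to the list of labels returned by the
    annotating sources. Sentences without a strict majority are dropped
    (a hypothetical filtering rule, not one stated in the claim).
    """
    groups = defaultdict(list)
    for sentence, labels in first_survey.items():
        label, votes = Counter(labels).most_common(1)[0]
        if votes * 2 > len(labels):
            groups[label].append(sentence)
    return dict(groups)


def generate_paraphrases(groups, second_survey):
    """Pair the distinct portions that workers marked within the same group.

    second_survey maps each sentence to the portions marked as indicating
    the assessment. Treating portions from one group as expressions with
    the same meaning is an illustrative assumption.
    """
    pairs = {}
    for label, sentences in groups.items():
        portions = sorted(
            {p.lower() for s in sentences for p in second_survey.get(s, [])}
        )
        pairs[label] = list(combinations(portions, 2))
    return pairs


# Hypothetical review text and survey results.
raw = ("The battery lasts forever. I love the screen! "
       "The battery never dies. Shipping was slow.")
s = break_sentences(raw)
first = {
    s[0]: ["battery:positive"] * 3,
    s[1]: ["screen:positive", "screen:positive", "battery:positive"],
    s[2]: ["battery:positive", "battery:positive", "screen:positive"],
    s[3]: ["shipping:negative", "shipping:positive"],  # no majority, dropped
}
second = {s[0]: ["lasts forever"], s[1]: ["love the screen"], s[2]: ["never dies"]}
groups = group_by_assessment(first)
print(generate_paraphrases(groups, second))
```

With this toy input, the only candidate pair produced is ("lasts forever", "never dies") for the shared battery assessment. A practical system would likely need a more robust sentence breaker and some agreement check across workers for the second survey as well.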
Abstract
An automatic paraphrase acquisition technique is provided. A common theme of the various embodiments described herein resides in careful design of simple tasks that can elicit the necessary information for the automated process. These tasks are performed quickly and inexpensively. By gathering the results produced, paraphrases can be generated automatically using the method and/or system.
16 Claims
8. A system for acquiring paraphrases for use in natural language processing applications, the system including an automated survey platform that is configured to permit access to a group of crowd-workers for performing portions of a task while other portions are performed by the automated survey platform, the system comprising:
an input interface for raw text, the raw text being a result of obtaining opinions of a group of consumers or customers of a product or service;
a processor configured to
break the raw text into individual sentences by use of automatic natural language processing techniques,
provide the individual sentences and a first survey through the automated survey platform to a plurality of annotating sources, wherein each annotating source reviews the individual sentences and determines an assessment of the individual sentences based on the first survey, and wherein the automated survey platform is accessed by the plurality of annotating sources for the first survey, the plurality of annotating sources for the first survey being crowd-workers connected via their respective networked computers over the internet with the automated survey platform;
receive results of the first survey from the each of the annotating sources for the first survey,
filter the results of the first survey to group the individual sentences that were the subject of the first survey into groups that have received like assessments from the annotating sources,
provide the filtered survey results and a second survey to a plurality of annotating sources for the second survey to conduct the second survey to determine portions of the individual sentences included in the filtered survey results that indicate the assessment, the plurality of annotating sources for the second survey being crowd-workers connected via their respective networked computers over the internet with the automated survey platform,
receive results of the second survey from the plurality of annotating sources for the second survey, and
automatically generate paraphrases based on the results of the second survey, wherein the plurality of annotating sources for the first survey and the plurality of annotating sources for the second survey each comprises a sampling of people to complete the first survey and the second survey, respectively, and wherein the sampling of people comprises respondents to the automated survey platform; and
an output interface for the paraphrases, wherein the paraphrases are pairs of expression that have a same meaning.
15. A system for acquiring paraphrases for use in natural language processing applications, the system including an automated survey platform that is configured to permit access to a group of crowd-workers for performing portions of a task while other portions are performed by the automated survey platform, the system comprising:
means for receiving raw text as input, the raw text being a result of obtaining opinions of a group of consumers or customers of a product or service;
means for sentence breaking the raw text into individual sentences by use of automatic natural language processing techniques;
means for providing the individual sentences and a first survey through the automated survey platform to a plurality of annotating sources, wherein each annotating source reviews the individual sentences and determines an assessment of the individual sentences based on the first survey, and wherein the automated survey platform is accessed by the plurality of annotating sources for the first survey, the plurality of annotating sources for the first survey being crowd-workers connected via their respective networked computers over the internet with the automated survey platform;
means for receiving results of the first survey from each of the annotating sources for the first survey;
means for filtering the results of the first survey to group the individual sentences that were the subject of the first survey into groups that have received like assessments from the annotating sources;
means for providing the filtered results of the first survey and a second survey to a plurality of annotating sources for the second survey, to determine portions of the individual sentences included in the filtered results of the first survey that indicate the assessment, the plurality of annotating sources for the second survey being crowd-workers connected via their respective networked computers over the internet with the automated survey platform;
means for receiving results of the second survey from the plurality of annotating sources for the second survey;
means for automatically generating paraphrases based on the results of the second survey, wherein the plurality of annotating sources for the first survey and the plurality of annotating sources for the second survey each comprises a sampling of people to complete the first survey and the second survey, respectively, and wherein the sampling of people comprises respondents to the automated survey platform; and
means for output of the paraphrases, wherein the paraphrases are pairs of expression that have a same meaning.
Specification