Paraphrase acquisition
Abstract
Methods and apparatus, including systems and computer program products, for acquiring potential paraphrases from textual input. In one aspect, textual input is received and three maps are generated. In the first map, each key is an ngram identified in the textual input, and the associated value is a unique identifier. In the second map, each key is an anchor identified from an ngram, and the associated value is the one or more middle portions associated with that anchor. In the third map, each key is a potential paraphrase pair identified from the middle portions, and the associated value is the one or more unique anchors associated with that pair.
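As an illustration of the first map described in the abstract, the following is a minimal sketch rather than the patented implementation. It assumes whitespace tokenization, lowercasing, and consecutive integers as the unique identifiers; the names `ngrams` and `build_first_map` and the `min_n`/`max_n` parameters are hypothetical and do not come from the patent.

```python
def ngrams(tokens, n):
    """Yield every contiguous run of n tokens as a tuple."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def build_first_map(text, min_n=3, max_n=5):
    """Map each distinct ngram in the text to a unique integer identifier.

    Assumptions (not from the patent): tokens are split on whitespace,
    and identifiers are assigned in order of first occurrence.
    """
    tokens = text.lower().split()
    first_map = {}
    for n in range(min_n, max_n + 1):
        for gram in ngrams(tokens, n):
            # setdefault leaves an existing identifier untouched,
            # so a repeated ngram keeps the id it was first given.
            first_map.setdefault(gram, len(first_map))
    return first_map


first_map = build_first_map("a b c a b c", min_n=3, max_n=3)
```

Here the repeated trigram `("a", "b", "c")` appears once as a key, so the map holds three entries for the four trigram positions in the input.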
79 Citations
24 Claims
1. A method, comprising:
receiving textual input;
processing, by one or more computers, the textual input to produce a first output comprising first keys, wherein each first key is an ngram occurring in the textual input;
processing, by the one or more computers, the first output to produce a second output comprising second keys and associated second values, wherein each second key is an anchor extracted from at least one of the first keys, wherein each anchor is defined by a beginning portion and an ending portion of a respective first key ngram, the beginning portion and ending portion being separated by a middle portion of the respective first key ngram, and wherein each second value is a set of middle portions associated with each distinct anchor; and
processing, by the one or more computers, the second output to produce a third output comprising third keys and associated third values, wherein each third key is a potential paraphrase pair identified from a second value set of middle portions, and each third value is a set of anchors associated with the potential paraphrase pair.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
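The three processing steps of claim 1 can be sketched as three small functions. This is an illustrative reading under stated assumptions, not the claimed implementation: sentences are tokenized on whitespace, ngrams have a fixed length `n`, and the beginning and ending portions of an anchor are each `edge` tokens wide. All function names and the `n` and `edge` parameters are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def extract_ngrams(sentences, n):
    """First output: the set of ngrams (first keys) found in the input."""
    grams = set()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(len(tokens) - n + 1):
            grams.add(tuple(tokens[i:i + n]))
    return grams


def anchors_to_middles(grams, edge):
    """Second output: anchor -> set of middle portions seen with it.

    An anchor is the (beginning, ending) pair of an ngram; the width
    `edge` of each portion is an assumption made for this sketch.
    """
    second = defaultdict(set)
    for gram in grams:
        if len(gram) <= 2 * edge:
            continue  # no middle portion would remain
        anchor = (gram[:edge], gram[-edge:])
        second[anchor].add(gram[edge:-edge])
    return second


def pairs_to_anchors(second):
    """Third output: candidate paraphrase pair -> set of shared anchors."""
    third = defaultdict(set)
    for anchor, middles in second.items():
        # Sorting gives each unordered pair one canonical key.
        for a, b in combinations(sorted(middles), 2):
            third[(a, b)].add(anchor)
    return third


sentences = ["the cat sat on the mat", "the cat lay on the mat"]
grams = extract_ngrams(sentences, n=5)
second = anchors_to_middles(grams, edge=2)
third = pairs_to_anchors(second)
```

With this input, the only anchor that collects two different middle portions is "the cat ... on the", so `third` holds a single candidate pair, "lay" and "sat", associated with that one anchor. The number of distinct anchors stored per pair could serve as a rough signal of how well supported the pair is, although claim 1 itself does not specify any scoring.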
9. A computer storage medium encoded with a computer program, the program comprising instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising:
receiving textual input;
processing the textual input to produce a first output comprising first keys, wherein each first key is an ngram occurring in the textual input;
processing the first output to produce a second output comprising second keys and associated second values, wherein each second key is an anchor extracted from at least one of the first keys, wherein each anchor is defined by a beginning portion and an ending portion of a respective first key ngram, the beginning portion and ending portion being separated by a middle portion of the respective first key ngram, and wherein each second value is a set of middle portions associated with each distinct anchor; and
processing the second output to produce a third output comprising third keys and associated third values, wherein each third key is a potential paraphrase pair identified from a second value set of middle portions, and each third value is a set of anchors associated with the potential paraphrase pair.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A system comprising:
one or more computers; and
a computer-readable storage device having stored thereon instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
receiving textual input;
processing the textual input to produce a first output comprising first keys, wherein each first key is an ngram occurring in the textual input;
processing the first output to produce a second output comprising second keys and associated second values, wherein each second key is an anchor extracted from at least one of the first keys, wherein each anchor is defined by a beginning portion and an ending portion of a respective first key ngram, the beginning portion and ending portion being separated by a middle portion of the respective first key ngram, and wherein each second value is a set of middle portions associated with each distinct anchor; and
processing the second output to produce a third output comprising third keys and associated third values, wherein each third key is a potential paraphrase pair identified from a second value set of middle portions, and each third value is a set of anchors associated with the potential paraphrase pair.
Dependent claims: 18, 19, 20, 21, 22, 23, 24
Specification