Method and system for event phrase identification
Abstract
The invention provides a system and method for identifying text in a word set. The method may include retrieving a target term set including a plurality of target terms; retrieving the word set including a plurality of text words; normalizing target terms in the target term set to generate normalized terms; normalizing text words in the word set to generate normalized words; comparing the normalized terms with the normalized words to determine (1) a first match between a first normalized term and a first normalized word; and (2) a second match between a second normalized term and a second normalized word. The method may further include determining a distance between a text word position of the first normalized word and a text word position of the second normalized word to determine if any relative positions satisfy threshold criteria, and identifying a first text word position and a second text word position as constituting possible identified text once a relative position of the text word position of the first normalized word and a text word position of the second normalized word satisfies the threshold criteria.
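The steps summarized in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the `normalize` rule (lowercasing, stripping punctuation, dropping a trailing plural "s"), the function name `find_candidate_spans`, the `max_distance` threshold, and the requirement that the two matches come from different terms are all hypothetical choices made for the example.

```python
import re


def normalize(word: str) -> str:
    """Hypothetical normalizer: lowercase, strip punctuation, drop a plural 's'."""
    base = re.sub(r"[^a-z0-9]", "", word.lower())
    if len(base) > 3 and base.endswith("s") and not base.endswith("ss"):
        base = base[:-1]
    return base


def find_candidate_spans(target_terms, text_words, max_distance=3):
    """Return (first_position, second_position) pairs that satisfy the threshold.

    Assumption: the threshold criterion is "at most max_distance words apart",
    and the two matches must come from two different normalized terms.
    """
    normalized_terms = {normalize(term) for term in target_terms}

    # Record the text word position of every word whose normalized form
    # matches a normalized term.
    matches = []
    for position, word in enumerate(text_words):
        base = normalize(word)
        if base in normalized_terms:
            matches.append((position, base))

    # Compare the positions of each pair of matches against the threshold.
    spans = []
    for i, (pos_a, term_a) in enumerate(matches):
        for pos_b, term_b in matches[i + 1:]:
            if term_a != term_b and pos_b - pos_a <= max_distance:
                spans.append((pos_a, pos_b))
    return spans


words = "Acme Corp announces a merger with Beta Inc".split()
print(find_candidate_spans(["merger", "announces"], words))
```

In this example "announces" (position 2) and "merger" (position 4) are two words apart, so the pair of positions is reported as possible identified text when the assumed threshold is 3.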
55 Citations
22 Claims
1. A method for identifying text in a word set comprising:
retrieving a target term set including a plurality of target terms;
retrieving the word set including a plurality of text words;
normalizing target terms in the target term set to generate normalized terms;
normalizing text words in the word set to generate normalized words;
comparing the normalized terms with the normalized words to determine:
a first match between a first normalized term and a first normalized word; and
a second match between a second normalized term and a second normalized word; and
determining a distance between a text word position of the first normalized word and a text word position of the second normalized word to determine if any relative positions satisfy threshold criteria, and identifying a first text word position and a second text word position as constituting possible identified text once a relative position of the text word position of the first normalized word and a text word position of the second normalized word satisfies the threshold criteria.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 20, 21, 22
12. A system for identifying text in a word set comprising:
an input portion that retrieves a target term set including a plurality of target terms, and that retrieves the word set including a plurality of text words;
a normalizing portion that normalizes target terms in the target term set to generate normalized terms, the normalizing portion further normalizing text words in the word set to generate normalized words;
a comparing portion that compares the normalized terms with the normalized words to determine:
a first match between a first normalized term and a first normalized word; and
a second match between a second normalized term and a second normalized word; and
a locations array processing portion that determines a distance between a text word position of the first normalized word and a text word position of the second normalized word to determine if any relative positions satisfy threshold criteria, and the locations array processing portion identifying a first text word position and a second text word position as constituting possible identified text once a relative position of the text word position of the first normalized word and a text word position of the second normalized word satisfies the threshold criteria.
Dependent claims: 13, 14, 15, 16
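One way to picture the four portions recited in claim 12 is as methods on a single object. The sketch below is an assumption-laden illustration, not the claimed system: the class name `PhraseIdentifier`, its method names, the dictionary used as a "locations array", and the simple word-count threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PhraseIdentifier:
    normalize: Callable[[str], str]  # stands in for the normalizing portion
    max_distance: int                # assumed form of the threshold criteria

    def retrieve(self, term_source, word_source):
        """Input portion: retrieve the target term set and the word set."""
        return list(term_source), list(word_source)

    def compare(self, terms, words):
        """Comparing portion: map each matched normalized term to its positions."""
        normalized_terms = {self.normalize(term) for term in terms}
        locations = {}
        for position, word in enumerate(words):
            base = self.normalize(word)
            if base in normalized_terms:
                locations.setdefault(base, []).append(position)
        return locations

    def process_locations(self, locations):
        """Locations array processing portion: test pairwise distances."""
        hits = sorted(
            (position, term)
            for term, positions in locations.items()
            for position in positions
        )
        return [
            (pos_a, pos_b)
            for i, (pos_a, term_a) in enumerate(hits)
            for pos_b, term_b in hits[i + 1:]
            if term_a != term_b and pos_b - pos_a <= self.max_distance
        ]

    def run(self, term_source, word_source):
        terms, words = self.retrieve(term_source, word_source)
        return self.process_locations(self.compare(terms, words))


identifier = PhraseIdentifier(normalize=str.lower, max_distance=2)
print(identifier.run(["Profit", "Warning"], "Acme issued a profit warning today".split()))
```

Keeping the portions as separate methods mirrors the claim structure, so each step can be swapped out (for example, a different normalizer) without touching the others.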
17. A computer readable medium for identifying text in a word set, the computer readable medium comprising:
a first portion that retrieves a target term set including a plurality of target terms, and that retrieves the word set including a plurality of text words;
a second portion that normalizes target terms in the target term set to generate normalized terms, the second portion further normalizing text words in the word set to generate normalized words;
a third portion that compares the normalized terms with the normalized words to determine:
a first match between a first normalized term and a first normalized word; and
a second match between a second normalized term and a second normalized word; and
a fourth portion that determines a distance between a text word position of the first normalized word and a text word position of the second normalized word to determine if any relative positions satisfy threshold criteria, and the fourth portion identifying a first text word position and a second text word position as constituting possible identified text once a relative position of the text word position of the first normalized word and a text word position of the second normalized word satisfies the threshold criteria.
18. A method for identifying text in a word set comprising:
retrieving a target term set including a plurality of target terms;
retrieving the word set including a plurality of text words;
normalizing target terms in the target term set to generate normalized terms;
normalizing text words in the word set to generate normalized words;
comparing the normalized terms with the normalized words to determine:
a first match between a first normalized term and a first normalized word; and
a second match between a second normalized term and a second normalized word; and
determining a distance between a text word position of the first normalized word and a text word position of the second normalized word to determine if any relative positions satisfy threshold criteria, and identifying a first text word position and a second text word position as constituting possible identified text once a relative position of the text word position of the first normalized word and a text word position of the second normalized word satisfies the threshold criteria;
wherein normalizing words in the word set includes normalizing significant words and non-significant words, the normalizing words in the word set further includes applying a stop list against normalized words, so as to eliminate non-significant words; and
wherein comparing the normalized terms with the normalized words includes generating a normalized word list containing base words, each base word being associated with a respective text word position in the word set, and generating a normalized term list of all normalized terms; and
wherein identifying a first text word position and a second text word position as constituting possible identified text once a relative position of the text word position of the first normalized word and a text word position of the second normalized word satisfies the threshold criteria, includes outputting the text word that corresponds to the first text word position and outputting the text word that corresponds to the second text word position.
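The additional limitations of claim 18 (a stop list applied to normalized words, a normalized word list that keeps each base word's original position, a normalized term list, and output of the two matched text words) might look like the following. This is a sketch under assumptions: the contents of `STOP_LIST`, the punctuation-stripping `normalize`, the helper names, and the `max_distance` threshold are hypothetical.

```python
STOP_LIST = {"a", "an", "the", "of", "to", "in"}  # assumed stop list


def normalize(word):
    """Hypothetical normalizer: lowercase and strip surrounding punctuation."""
    return word.lower().strip(".,;:!?\"'()")


def build_word_list(text_words):
    """Normalized word list: (base word, original text word position) pairs.

    Every word is normalized first; the stop list is then applied to the
    normalized words to eliminate the non-significant ones.
    """
    word_list = []
    for position, word in enumerate(text_words):
        base = normalize(word)
        if base and base not in STOP_LIST:
            word_list.append((base, position))
    return word_list


def build_term_list(target_terms):
    """Normalized term list of all normalized terms."""
    return [normalize(term) for term in target_terms]


def identify_endpoints(target_terms, text_words, max_distance=4):
    """Output the two text words whose positions satisfy the assumed threshold."""
    term_list = build_term_list(target_terms)
    hits = [(base, pos) for base, pos in build_word_list(text_words) if base in term_list]
    for i, (base_a, pos_a) in enumerate(hits):
        for base_b, pos_b in hits[i + 1:]:
            if base_a != base_b and pos_b - pos_a <= max_distance:
                return text_words[pos_a], text_words[pos_b]
    return None


text = "The company announced a recall of 5,000 Vehicles.".split()
print(identify_endpoints(["Recall", "Vehicles"], text))
```

Because positions are taken from the original word set before stop words are dropped, the distance is measured in the original text, and the output words are the original, un-normalized text words.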
19. A system for identifying text in a word set comprising:
an input portion that retrieves a target term set including a plurality of target terms, and that retrieves the word set including a plurality of text words;
a normalizing portion that normalizes target terms in the target term set to generate normalized terms, the normalizing portion further normalizing text words in the word set to generate normalized words;
a comparing portion that compares the normalized terms with the normalized words to determine:
a first match between a first normalized term and a first normalized word; and
a second match between a second normalized term and a second normalized word; and
a locations array processing portion that determines a distance between a text word position of the first normalized word and a text word position of the second normalized word to determine if any relative positions satisfy threshold criteria, and the locations array processing portion identifying a first text word position and a second text word position as constituting possible identified text once a relative position of the text word position of the first normalized word and a text word position of the second normalized word satisfies the threshold criteria;
wherein the comparing portion compares a normalized word list containing base words, each base word being associated with a respective text word position in the word set, with a normalized term list, the normalizing portion using a stop list to determine if any of the normalized terms or any of the normalized words are insignificant; and
wherein the system outputs all text words between and including the text word that corresponds to the first text word position and the text word that corresponds to the second text word position, so as to output an identified phrase.
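The output step of claim 19, which emits every text word between and including the two identified positions, reduces to an inclusive slice. The function name `extract_phrase` and the choice to join the words with single spaces are assumptions made for this sketch.

```python
def extract_phrase(text_words, first_position, second_position):
    """Return all text words between and including the two positions."""
    start, end = sorted((first_position, second_position))
    # The slice end is exclusive, so add 1 to include the second word.
    return " ".join(text_words[start:end + 1])


words = "Acme Corp announces a merger with Beta Inc".split()
print(extract_phrase(words, 2, 4))  # announces a merger
```

Sorting the two positions first means the phrase comes out in reading order regardless of which match was found first.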
Specification