Method and system for event phrase identification

US 20040034649A1
Filed: 08/15/2002
Published: 02/19/2004
Est. Priority Date: 08/15/2002
Status: Active Grant

Abstract

The invention provides a system and method for identifying text in a word set. The method may include retrieving a target term set including a plurality of target terms; retrieving the word set including a plurality of text words; normalizing target terms in the target term set to generate normalized terms; normalizing text words in the word set to generate normalized words; comparing the normalized terms with the normalized words to determine (1) a first match between a first normalized term and a first normalized word; and (2) a second match between a second normalized term and a second normalized word. The method may further include determining a distance between a text word position of the first normalized word and a text word position of the second normalized word to determine if any relative positions satisfy threshold criteria, and identifying a first text word position and a second text word position as constituting possible identified text once a relative position of the text word position of the first normalized word and a text word position of the second normalized word satisfies the threshold criteria.
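
Illustrative sketch (not part of the patent text): the two-term case described in the abstract can be approximated in a few lines of Python. The `normalize` rule (lowercasing plus crude suffix stripping), the function name `find_pairs`, and the `max_distance` threshold are assumptions made for this example; the record above does not specify how normalization is performed or what the threshold criteria are. Distance is treated here as an unordered gap between word positions, which is also an assumption.

```python
import re
from itertools import product


def normalize(word: str) -> str:
    """Lowercase, drop punctuation, and strip a few common suffixes (assumed rule)."""
    base = re.sub(r"[^a-z0-9]", "", word.lower())
    for suffix in ("ing", "ed", "es", "s"):
        if base.endswith(suffix) and len(base) - len(suffix) >= 3:
            return base[: -len(suffix)]
    return base


def find_pairs(target_terms, text_words, max_distance=3):
    """Return (pos1, pos2) pairs where the first two target terms match
    text words that lie no more than max_distance positions apart."""
    first, second = (normalize(t) for t in target_terms[:2])
    normalized_words = [normalize(w) for w in text_words]
    first_positions = [i for i, w in enumerate(normalized_words) if w == first]
    second_positions = [i for i, w in enumerate(normalized_words) if w == second]
    return [
        (p1, p2)
        for p1, p2 in product(first_positions, second_positions)
        if 0 < abs(p2 - p1) <= max_distance
    ]


words = "Two regional banks announced they merged on Friday".split()
pairs = find_pairs(["bank", "merging"], words, max_distance=3)
print(pairs)  # [(2, 5)]
```

Here "banks" and "merged" normalize to the same base forms as the target terms "bank" and "merging", and their positions (2 and 5) are three words apart, so the pair satisfies the assumed threshold.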

55 Citations

22 Claims

1. A method for identifying text in a word set comprising:
- retrieving a target term set including a plurality of target terms;
- retrieving the word set including a plurality of text words;
- normalizing target terms in the target term set to generate normalized terms;
- normalizing text words in the word set to generate normalized words;
- comparing the normalized terms with the normalized words to determine:
  - a first match between a first normalized term and a first normalized word; and
  - a second match between a second normalized term and a second normalized word; and
- determining a distance between a text word position of the first normalized word and a text word position of the second normalized word to determine if any relative positions satisfy threshold criteria, and identifying a first text word position and a second text word position as constituting possible identified text once a relative position of the text word position of the first normalized word and a text word position of the second normalized word satisfies the threshold criteria.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 20, 21, 22):
  - 2. The method of claim 1, wherein normalizing words in the word set includes normalizing significant words and non-significant words.
  - 3. The method of claim 1, wherein normalizing words in the word set further includes applying a stop list against normalized words, so as to eliminate non-significant words.
  - 4. The method of claim 1, wherein the word set is at least a portion of a document.
  - 5. The method of claim 1, wherein retrieving the word set including a plurality of text words includes inputting a text document.
  - 6. The method of claim 5, wherein inputting the text document includes scanning the text document.
  - 7. The method of claim 1, wherein comparing the normalized terms with the normalized words includes generating a normalized word list containing base words, each base word being associated with a respective text word position in the word set.
  - 8. The method of claim 1, wherein comparing the normalized terms with the normalized words further includes generating a normalized word list of all normalized words, each normalized word in the normalized word list being associated with all the positions of a corresponding text word, in the word set; and generating a normalized term list of all normalized terms.
  - 9. The method of claim 8, further including comparing each normalized term in the normalized term list with each normalized word in the normalized word list.
  - 10. The method of claim 1, wherein identifying a first text word position and a second text word position as constituting possible identified text once a relative position of the text word position of the first normalized word and a text word position of the second normalized word satisfies the threshold criteria, includes outputting the text word that corresponds to the first text word position and outputting the text word that corresponds to the second text word position.
  - 11. The method of claim 10, wherein the method further includes outputting all text words between the text word that corresponds to the first text word position and the text word that corresponds to the second text word position, so as to output an identified phrase.
  - 20. The method of claim 1, wherein the comparing the normalized terms with the normalized words further includes determining:
    - a third match between a third normalized term and a third normalized word; and
    - determining a distance between the second text word position of the second normalized word and text word positions of the third normalized word to determine if any relative positions satisfy the threshold criteria, and identifying the second text word position and a third text word position as constituting possible identified text once a relative position of the second text word position and a text word position of the third normalized word satisfies the threshold criteria.
  - 21. The method of claim 20, further including outputting a phrase based on the first text word position, the second text word position and the third text word position if there are only three normalized terms.
  - 22. The method of claim 20, wherein the comparing the normalized terms with the normalized words further includes determining:
    - a fourth match between a fourth normalized term and a fourth normalized word; and
    - determining a distance between the third text word position of the third normalized word and text word positions of the fourth normalized word to determine if any relative positions satisfy the threshold criteria, and identifying the third text word position and a fourth text word position as constituting possible identified text once a relative position of the third text word position and a text word position of the fourth normalized word satisfies the threshold criteria.
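
Illustrative sketch (not part of the patent text): the dependent claims add a stop list (claim 3), a normalized word list that records every position of each word (claim 8), output of the words between matched positions (claims 10 and 11), and term-by-term extension to a third and fourth match (claims 20 and 22). One possible reading is shown below. The names `STOP_LIST`, `build_position_index`, `chain_matches`, and `phrase_for`, the stop words chosen, the plain lowercasing used as normalization, and the `max_distance` threshold are all hypothetical.

```python
from collections import defaultdict

STOP_LIST = {"the", "a", "an", "of", "to", "on", "in", "and"}  # illustrative only


def build_position_index(text_words, normalize=str.lower, stop_list=STOP_LIST):
    """Map each normalized, significant word to all of its text word positions.

    Positions refer to the original word set, so stop words still count
    toward the distance between the words that remain.
    """
    index = defaultdict(list)
    for position, word in enumerate(text_words):
        base = normalize(word)
        if base and base not in stop_list:
            index[base].append(position)
    return index


def chain_matches(normalized_terms, index, max_distance=3):
    """Extend candidate position chains one term at a time.

    Each new term's position is measured against the previous term's
    position, mirroring the second-to-third and third-to-fourth checks.
    """
    chains = [[p] for p in index.get(normalized_terms[0], [])]
    for term in normalized_terms[1:]:
        chains = [
            chain + [p]
            for chain in chains
            for p in index.get(term, [])
            if 0 < abs(p - chain[-1]) <= max_distance
        ]
    return chains


def phrase_for(chain, text_words):
    """Return every text word from the earliest to the latest matched position."""
    return " ".join(text_words[min(chain): max(chain) + 1])


words = "the bank agreed to acquire the smaller lender".split()
index = build_position_index(words)
chains = chain_matches(["bank", "acquire", "lender"], index)
print(chains)                        # [[1, 4, 7]]
print(phrase_for(chains[0], words))  # bank agreed to acquire the smaller lender
```

Building the index once means each target term is looked up directly rather than compared against every text word, although claim 9 also contemplates an exhaustive term-by-word comparison.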

12. A system for identifying text in a word set comprising:
- an input portion that retrieves a target term set including a plurality of target terms, and that retrieves the word set including a plurality of text words;
- a normalizing portion that normalizes target terms in the target term set to generate normalized terms, the normalizing portion further normalizing text words in the word set to generate normalized words;
- a comparing portion that compares the normalized terms with the normalized words to determine:
  - a first match between a first normalized term and a first normalized word; and
  - a second match between a second normalized term and a second normalized word; and
- a locations array processing portion that determines a distance between a text word position of the first normalized word and a text word position of the second normalized word to determine if any relative positions satisfy threshold criteria, and the locations array processing portion identifying a first text word position and a second text word position as constituting possible identified text once a relative position of the text word position of the first normalized word and a text word position of the second normalized word satisfies the threshold criteria.
- Dependent claims (13, 14, 15, 16):
  - 13. The system of claim 12, wherein the comparing portion compares a normalized word list containing base words, each base word being associated with a respective text word position in the word set, with a normalized term list.
  - 14. The system of claim 12, wherein the normalizing portion uses a stop list to determine if any of the normalized terms or any of the normalized words are insignificant.
  - 15. The system of claim 12, wherein the system outputs the text word that corresponds to the first text word position and outputs the text word that corresponds to the second text word position.
  - 16. The system of claim 15, wherein the system outputs all text words between the text word that corresponds to the first text word position and the text word that corresponds to the second text word position, so as to output an identified phrase.
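
Illustrative sketch (not part of the patent text): the four portions recited in claim 12 could be grouped as methods of a single class, with the "locations array" represented as one list of positions per target term. The class name `PhraseIdentifier`, its method names, the punctuation-trimming normalization, the stop words, and the `max_distance` default are hypothetical choices for this example only.

```python
from dataclasses import dataclass


@dataclass
class PhraseIdentifier:
    """Hypothetical grouping of the four portions recited in claim 12."""

    max_distance: int = 3
    stop_list: frozenset = frozenset({"the", "a", "of", "to"})

    def read_input(self, term_text: str, document_text: str):
        """Input portion: split raw strings into target terms and text words."""
        return term_text.split(), document_text.split()

    def normalize(self, word: str) -> str:
        """Normalizing portion: lowercase and trim punctuation (assumed rule)."""
        return word.lower().strip(".,;:!?\"'()")

    def compare(self, terms, words):
        """Comparing portion: build a locations array, one position list per term."""
        significant = [t for t in map(self.normalize, terms) if t not in self.stop_list]
        normalized_words = [self.normalize(w) for w in words]
        return [
            [i for i, w in enumerate(normalized_words) if w == term]
            for term in significant
        ]

    def process_locations(self, locations):
        """Locations array processing portion: pair positions of the first two
        terms that fall within max_distance of each other."""
        if len(locations) < 2:
            return []
        return [
            (p1, p2)
            for p1 in locations[0]
            for p2 in locations[1]
            if 0 < abs(p2 - p1) <= self.max_distance
        ]

    def identify(self, term_text: str, document_text: str):
        """Run all portions and output each identified phrase (claims 15 and 16)."""
        terms, words = self.read_input(term_text, document_text)
        pairs = self.process_locations(self.compare(terms, words))
        return [" ".join(words[min(p): max(p) + 1]) for p in pairs]


identifier = PhraseIdentifier()
print(identifier.identify(
    "price increase",
    "The supplier announced a Price, then an increase followed.",
))  # ['Price, then an increase']
```

The output keeps the original text words, including their capitalization and punctuation, because matching is done on normalized forms while the phrase is sliced from the unmodified word set.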

17. A computer readable medium for identifying text in a word set, the computer readable medium comprising:
- a first portion that retrieves a target term set including a plurality of target terms, and that retrieves the word set including a plurality of text words;
- a second portion that normalizes target terms in the target term set to generate normalized terms, the second portion further normalizing text words in the word set to generate normalized words;
- a third portion that compares the normalized terms with the normalized words to determine:
  - a first match between a first normalized term and a first normalized word; and
  - a second match between a second normalized term and a second normalized word; and
- a fourth portion that determines a distance between a text word position of the first normalized word and a text word position of the second normalized word to determine if any relative positions satisfy threshold criteria, and the fourth portion identifying a first text word position and a second text word position as constituting possible identified text once a relative position of the text word position of the first normalized word and a text word position of the second normalized word satisfies the threshold criteria.

18. A method for identifying text in a word set comprising:
- retrieving a target term set including a plurality of target terms;
- retrieving the word set including a plurality of text words;
- normalizing target terms in the target term set to generate normalized terms;
- normalizing text words in the word set to generate normalized words;
- comparing the normalized terms with the normalized words to determine:
  - a first match between a first normalized term and a first normalized word; and
  - a second match between a second normalized term and a second normalized word; and
- determining a distance between a text word position of the first normalized word and a text word position of the second normalized word to determine if any relative positions satisfy threshold criteria, and identifying a first text word position and a second text word position as constituting possible identified text once a relative position of the text word position of the first normalized word and a text word position of the second normalized word satisfies the threshold criteria;
- wherein normalizing words in the word set includes normalizing significant words and non-significant words, the normalizing words in the word set further includes applying a stop list against normalized words, so as to eliminate non-significant words; and
- wherein comparing the normalized terms with the normalized words includes generating a normalized word list containing base words, each base word being associated with a respective text word position in the word set, and generating a normalized term list of all normalized terms; and
- wherein identifying a first text word position and a second text word position as constituting possible identified text once a relative position of the text word position of the first normalized word and a text word position of the second normalized word satisfies the threshold criteria, includes outputting the text word that corresponds to the first text word position and outputting the text word that corresponds to the second text word position.

19. A system for identifying text in a word set comprising:
- an input portion that retrieves a target term set including a plurality of target terms, and that retrieves the word set including a plurality of text words;
- a normalizing portion that normalizes target terms in the target term set to generate normalized terms, the normalizing portion further normalizing text words in the word set to generate normalized words;
- a comparing portion that compares the normalized terms with the normalized words to determine:
  - a first match between a first normalized term and a first normalized word; and
  - a second match between a second normalized term and a second normalized word; and
- a locations array processing portion that determines a distance between a text word position of the first normalized word and a text word position of the second normalized word to determine if any relative positions satisfy threshold criteria, and the locations array processing portion identifying a first text word position and a second text word position as constituting possible identified text once a relative position of the text word position of the first normalized word and a text word position of the second normalized word satisfies the threshold criteria;
- wherein the comparing portion compares a normalized word list containing base words, each base word being associated with a respective text word position in the word set, with a normalized term list, the normalizing portion using a stop list to determine if any of the normalized terms or any of the normalized words are insignificant; and
- wherein the system outputs all text words between and including the text word that corresponds to the first text word position and the text word that corresponds to the second text word position, so as to output an identified phrase.

Current Assignee: General Electric Capital Corporation (General Electric Company)
Original Assignee: General Electric Capital Corporation (General Electric Company)
Inventors: Bufi, Corey Nicholas; Czarnecki, David Anthony; Simmons, Melvin Kurt
Granted Patent: US 7,058,652 B2
US Class Current: 707/102
CPC Class Codes:
G06F 16/951   Indexing; Web crawling tech...
G06F 40/205   Parsing
Y10S 707/99943   Generating database or data...
