System, method and apparatus for discovering phrases in a database

  • US 6,721,728 B2
  • Filed: 03/02/2001
  • Issued: 04/13/2004
  • Est. Priority Date: 03/02/2001
  • Status: Expired due to Fees
First Claim

1. A method of discovering phrases from a database comprising:

  • providing a selection of text;

    extracting a plurality of phrases from the provided text, by a process comprising:

    determining a plurality of phrase processing positions within the selection of text, by a process comprising:

    determining a plurality of phrase processing starting positions within the selection of text, by a process comprising:

    identifying a phrase starting position (T1);

    initializing values for an iterative process by a process comprising setting an interior stopterm counter to zero and setting a tuple size to two;

    determining a plurality of phrase processing ending positions within the selection of relevant text, by a process comprising:

    identifying a phrase ending position (T2);

    identifying a position immediately subsequent to the phrase ending position T2 as the phrase ending position T2;

    extracting a plurality of phrases, wherein the first position of each of the plurality of phrases is one of the plurality of phrase processing starting positions (T1) and the last position of each of the plurality of phrases is one of the plurality of phrase processing ending positions (T2), by a process comprising:

    identifying an indicated phrase, wherein an indicated phrase is a sequence of positions starting at T1 and ending at T2;

    determining a tuple size, wherein tuple size is a count of positions within the indicated phrase;

    determining if the tuple size is greater than a maximum phrase length, and when the tuple size is not greater than the maximum phrase length, outputting the indicated phrase as an extracted phrase;

    culling the extracted plurality of phrases;

    gathering a plurality of phrases, wherein the gathered plurality of phrases are related by relevance to a plurality of contextual patterns included within and among the culled and extracted plurality of phrases; and

    outputting the plurality of gathered phrases.
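
The extraction steps of the claim (a starting position T1, an ending position T2 that advances one position at a time, a tuple size that begins at two, and a maximum phrase length) can be illustrated with a minimal sketch. This is not the patented implementation. The function name `extract_phrases`, the whitespace-token representation, the rule that a phrase may not start or end on a stopterm, and the `max_interior_stopterms` limit are all assumptions made for illustration; the claim only states that an interior stopterm counter is initialized to zero and does not say how it is used. The culling and gathering steps are not covered.

```python
def extract_phrases(tokens, max_phrase_length=4, stopterms=frozenset(),
                    max_interior_stopterms=1):
    """Enumerate candidate phrases as token runs from T1 to T2 (inclusive).

    Hypothetical sketch: stopterm handling and the interior-stopterm
    limit are assumptions, not taken from the claim text.
    """
    phrases = []
    for t1, first in enumerate(tokens):
        # Assumption: a phrase may not start on a stopterm.
        if first in stopterms:
            continue
        interior_stopterms = 0   # counter initialized to zero
        t2 = t1 + 1              # initial tuple size is two
        while t2 < len(tokens):
            # Tuple size is the count of positions from T1 through T2.
            tuple_size = t2 - t1 + 1
            if tuple_size > max_phrase_length:
                break
            if tokens[t2] in stopterms:
                # Assumption: a stopterm cannot end a phrase; count it as
                # interior to any longer phrase that continues past it.
                interior_stopterms += 1
                if interior_stopterms > max_interior_stopterms:
                    break
            else:
                # Output the indicated phrase as an extracted phrase.
                phrases.append(tuple(tokens[t1:t2 + 1]))
            # Move T2 to the immediately subsequent position.
            t2 += 1
    return phrases


tokens = "the quality of service agreement".split()
result = extract_phrases(tokens, max_phrase_length=4,
                         stopterms=frozenset({"the", "of"}))
for phrase in result:
    print(" ".join(phrase))
```

With these assumed settings, the sample text yields "quality of service", "quality of service agreement", and "service agreement". Lowering `max_phrase_length` to three drops the four-token phrase, which mirrors the claim's comparison of tuple size against the maximum phrase length.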
