Text-representation code, system, and method

  • US 20040059565A1
  • Filed: 07/01/2003
  • Published: 03/25/2004
  • Est. Priority Date: 07/03/2002
  • Status: Active Grant
First Claim

1. A computer-executed method for representing a natural-language document in a vector form suitable for text manipulation operations, comprising
  (a) for each of a plurality of terms selected from one of (i) non-generic words in the document, (ii) proximately arranged word groups in the document, and (iii) a combination of (i) and (ii), determining a selectivity value calculated as the frequency of occurrence of that term in a library of texts in one field, relative to the frequency of occurrence of the same term in one or more other libraries of texts in one or more other fields, respectively, and
  (b) representing the document as a vector of terms, where the coefficient assigned to each term is a function of the selectivity value determined for that term.
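
Steps (a) and (b) of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. Several choices here are assumptions made only for illustration: the stop list `GENERIC_WORDS`, the whitespace tokenizer, the per-token normalization of frequency, averaging over the other libraries, the smoothing constant `eps`, and using the selectivity value itself as the coefficient (the claim only requires the coefficient to be "a function of" it). Only single words, option (i), are handled; word groups are left out.

```python
from collections import Counter

# Hypothetical stop list standing in for "generic words"; not from the source.
GENERIC_WORDS = {"a", "an", "the", "of", "and", "in", "to", "is", "for"}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:()\"'").lower() for w in text.split())
    return [w for w in words if w]


def term_frequency(library, term):
    """Occurrences of `term` per token across all texts in one library."""
    tokens = [t for text in library for t in tokenize(text)]
    if not tokens:
        return 0.0
    return Counter(tokens)[term] / len(tokens)


def selectivity(term, field_library, other_libraries, eps=1e-6):
    """Frequency in the field library relative to the mean frequency in the
    other libraries. `eps` is an assumed smoothing term that avoids division
    by zero when the term never appears elsewhere."""
    if not other_libraries:
        raise ValueError("at least one other library is required")
    in_field = term_frequency(field_library, term)
    elsewhere = [term_frequency(lib, term) for lib in other_libraries]
    baseline = sum(elsewhere) / len(elsewhere)
    return in_field / (baseline + eps)


def document_vector(document, field_library, other_libraries):
    """Map each non-generic word of the document to a coefficient.
    Assumption: the coefficient is the selectivity value itself."""
    terms = {t for t in tokenize(document) if t not in GENERIC_WORDS}
    return {t: selectivity(t, field_library, other_libraries)
            for t in sorted(terms)}


# Toy libraries, invented for this example.
optics = ["the laser emits a coherent beam", "a lens focuses the beam"]
biology = ["the cell divides", "a protein folds in the cell"]

vec = document_vector("The laser beam heats the cell", optics, [biology])
```

With these toy libraries, "beam" and "laser" receive large coefficients because they occur in the optics library and not in the biology library, while "cell" and "heats" receive zero because they never occur in the optics library. A real system would need much larger libraries and a less brittle smoothing scheme.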
