Method for calculating relevance between words based on document set and system for executing the method

  • US 8,407,233 B2
  • Filed: 12/10/2007
  • Issued: 03/26/2013
  • Est. Priority Date: 12/12/2006
  • Status: Active Grant
First Claim

1. A method, using a processor, of calculating relevance among words based on a relevance of each word in a document, the method comprising:

  • generating statistical information associated with relevance among words by calculating a crossing frequency of words associated with a number of times of each of cross-word being appeared in a document, an appearance frequency of a word, or a word-word combination frequency associated with an appearance and a non-appearance of a combination of a first word and a second word, wherein the appearance frequency is a number of times that a word appears and frequency information is generated based on one of the appearance frequency or the crossing frequency, or the word-word combination frequency to provide the statistical information, the calculation being performed by the processor according to word-word or word-document classification;

    standardizing the statistical information by applying a parameter to the calculated statistical information, wherein the standardizing the statistical information comprises generating a combination probability distribution of a random variable corresponding to a pair of words and standardizing the statistical information based on the word-word combination frequency, wherein the word-word combination frequency associated with the pair of words is a number of documents that include all words in the pair, a number of documents that do not include any word in the pair, and a number of documents that include one of the words in the pair, and wherein the random variable is defined in a point space of columns and rows that comprise appearance or non-appearance points of the word;

    determining, by the processor, the relevance among the words as a numerical value based on the standardization; and

    providing the numerical value associated with the relevance among words to a search system.
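
The standardizing step of the claim can be illustrated with a small sketch. The code below counts, for a pair of words, the documents containing both words, only one of them, or neither, and then normalizes those counts into a 2x2 joint probability distribution over appearance and non-appearance. The claim does not name the formula that turns this distribution into a numerical relevance value, so the use of mutual information here is purely an assumption for illustration, as are the function names and the whitespace tokenization.

```python
import math


def combination_counts(docs, w1, w2):
    """Count documents by appearance / non-appearance of w1 and w2.

    Returns (both, only_w1, only_w2, neither) as document counts.
    Tokenization by lowercased whitespace split is an assumption.
    """
    both = only_w1 = only_w2 = neither = 0
    for doc in docs:
        words = set(doc.lower().split())
        has1, has2 = w1 in words, w2 in words
        if has1 and has2:
            both += 1
        elif has1:
            only_w1 += 1
        elif has2:
            only_w2 += 1
        else:
            neither += 1
    return both, only_w1, only_w2, neither


def joint_distribution(counts):
    """Normalize the four counts into a 2x2 joint probability table.

    Rows: w1 appears / does not appear.
    Columns: w2 appears / does not appear.
    """
    both, only_w1, only_w2, neither = counts
    total = both + only_w1 + only_w2 + neither
    if total == 0:
        raise ValueError("document set is empty")
    return [
        [both / total, only_w1 / total],
        [only_w2 / total, neither / total],
    ]


def mutual_information(p):
    """Mutual information (in bits) of a 2x2 joint table.

    Assumed relevance measure; the claim does not specify one.
    """
    row = [p[0][0] + p[0][1], p[1][0] + p[1][1]]
    col = [p[0][0] + p[1][0], p[0][1] + p[1][1]]
    score = 0.0
    for i in range(2):
        for j in range(2):
            if p[i][j] > 0:
                score += p[i][j] * math.log2(p[i][j] / (row[i] * col[j]))
    return score


docs = ["apple banana", "apple banana cherry", "cherry date", "date"]
counts = combination_counts(docs, "apple", "banana")
relevance = mutual_information(joint_distribution(counts))
```

Mutual information is symmetric in the two words and is zero when their appearances are statistically independent, which is why it is a plausible stand-in here; note that it scores strong co-occurrence and strong mutual exclusion alike, so a real system might prefer a signed measure.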

  • 5 Assignments