Method and system for lexical mapping between document sets having a common topic

  • US 7,565,361 B2
  • Filed: 04/12/2005
  • Issued: 07/21/2009
  • Est. Priority Date: 04/22/2004
  • Status: Expired due to Fees
First Claim

1. A computer readable storage medium comprising instructions recorded thereon for causing a computer to perform a method of detecting, from a first document set and a second document set having a common topic, at least one of (a) terms in the second document set that correspond to specific terms in the first document set, and (b) terms in the first document set that correspond to specific terms in the second document set, the first and second document sets having been retrieved on the basis of a term list, said instructions comprising:

  • creating a first term matrix from the first document set on the basis of the frequency of each term listed in a first term list;

    creating a second term matrix from the second document set on the basis of the frequency of each term listed in a second term list;

    calculating a lexical mapping matrix from a product of the first term matrix and the second term matrix;

    selecting a predetermined number of terms in a specific row in the lexical mapping matrix in a descending order of values of elements to adopt the selected terms in the specific row as terms in the first document set that correspond to the specific terms in the second document set; and

    selecting a predetermined number of terms in a specific column in the lexical mapping matrix in the descending order of elements to adopt the selected terms in the specific column as terms in the second document set that correspond to the specific terms in the first document set;

    wherein;

    (a) the number of terms in the term list is s, (b) the number of terms selected from the first document set is n, (c) the first term matrix is represented by an s-by-n matrix P, (d) the frequency of the i-th term in the k-th document of the first document set is Exp(k,i), (e) the overall frequency of the i-th term is Etf(i), and (f) the total number of terms in the k-th document is Ewf(k), each of the elements (We(k,i)) of the matrix P is given by;

    $$\mathrm{We}(k,i) = \frac{\mathrm{Exp}(k,i)}{\mathrm{Etf}(i) \cdot \mathrm{Ewf}(k)} \qquad \text{[Equation 1]}$$

    (g) the number of terms selected from the second document set is m, (h) the second term matrix is represented by an s-by-m matrix Q, and (i) the frequency of the r-th term appearing in the k-th document of the second document set is Naive(k,r), (j) the overall frequency of the r-th term is Ntf(r), and (k) the total number of terms in the k-th document is Nwf(k), each of the elements (Wn(k,r)) of the matrix Q is given by;

    $$\mathrm{Wn}(k,r) = \frac{\mathrm{Naive}(k,r)}{\mathrm{Ntf}(r) \cdot \mathrm{Nwf}(k)}. \qquad \text{[Equation 2]}$$
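
The steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on several assumptions that the claim text does not settle: documents are already tokenized into lists of strings; the "overall frequency" of a term is taken to be its total count across the document set; and the product is taken as Q-transpose times P, so that rows index second-set terms and columns index first-set terms, which matches the row and column selection steps. The function names (`term_matrix`, `lexical_mapping`, `top_terms`) and the toy documents are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def term_matrix(docs: Sequence[Sequence[str]], terms: Sequence[str]) -> Matrix:
    """Build a documents-by-terms matrix with W(k, i) = freq(k, i) / (tf(i) * wf(k))."""
    counts = [[doc.count(t) for t in terms] for doc in docs]
    # Assumption: overall frequency of term i = its total count over the whole set.
    tf = [sum(row[i] for row in counts) for i in range(len(terms))]
    # Total number of terms in document k.
    wf = [len(doc) for doc in docs]
    return [
        [c / (tf[i] * wf[k]) if tf[i] and wf[k] else 0.0 for i, c in enumerate(row)]
        for k, row in enumerate(counts)
    ]


def lexical_mapping(p: Matrix, q: Matrix) -> Matrix:
    """Assumed orientation: T = Q^T P, an m-by-n matrix (second-set terms by first-set terms)."""
    n, m = len(p[0]), len(q[0])
    return [
        [sum(q[k][r] * p[k][i] for k in range(len(p))) for i in range(n)]
        for r in range(m)
    ]


def top_terms(scores: Sequence[float], terms: Sequence[str], count: int) -> List[str]:
    """Return the `count` terms with the largest scores, in descending order."""
    ranked = sorted(range(len(terms)), key=lambda j: scores[j], reverse=True)
    return [terms[j] for j in ranked[:count]]


# Toy data (invented for illustration): one document per topic in each set.
first_docs = [["myocardial", "infarction", "cardiac", "myocardial"],
              ["hepatitis", "hepatic", "hepatitis"]]
second_docs = [["heart", "attack", "heart", "chest"],
               ["liver", "sick", "liver"]]
first_terms = ["myocardial", "hepatitis"]
second_terms = ["heart", "liver"]

P = term_matrix(first_docs, first_terms)    # Equation 1 weights
Q = term_matrix(second_docs, second_terms)  # Equation 2 weights
T = lexical_mapping(P, Q)

# Row for "heart": first-set terms that correspond to it.
print(top_terms(T[0], first_terms, 1))
# Column for "hepatitis": second-set terms that correspond to it.
print(top_terms([row[1] for row in T], second_terms, 1))
```

Dividing each count by both the term's overall frequency and the document length, as Equations 1 and 2 do, damps terms that are common everywhere and documents that are simply long, so the product rewards term pairs that co-occur in documents on the same topic.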
