
System and method for identifying query-relevant keywords in documents with latent semantic analysis

  • US 7,440,947 B2
  • Filed: 11/12/2004
  • Issued: 10/21/2008
  • Est. Priority Date: 11/12/2004
  • Status: Expired due to Fees
First Claim

1. A method for identifying a set of query-relevant keywords comprised in one or more documents obtained from a query, wherein the query need not comprise the keywords, comprising:

    a) creating a term-weight matrix M comprising one or more document term-weight vectors d, wherein each document term-weight vector d comprises information on the frequency in one of the one or more documents obtained from the terms in the query;

    b) creating an expanded query term-weight vector q_expanded from the query term-weight vector q and the term-weight matrix M, wherein the expanded query term-weight vector q_expanded identifies terms related to the query, wherein the query need not comprise the terms within the expanded query term-weight vector q_expanded;

    c) creating a keyword vector using the document term-weight vectors d and the expanded query term-weight vector q_expanded, wherein the keyword vector identifies terms from the expanded query term-weight vector q_expanded that do not occur in at least one of the one or more documents, each term included in the keyword vector having a ranking value;

    d) generating an ordered list of document terms corresponding to the terms included in the keyword vector that occur in at least one of the one or more documents;

    e) selecting keywords from the ordered list of document terms having the highest ranking value; and

    f) highlighting the keywords in the one or more documents.
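Steps a) and b) can be illustrated with a minimal Python sketch. It assumes raw term counts as the term weights and a standard latent semantic analysis projection for the expansion: the query vector q is mapped onto the top-k left singular vectors U_k of M and back to term space, q_expanded = U_k U_k^T q, so terms that co-occur with the query terms in the retrieved documents gain weight even when the query never mentions them. The claim does not fix the weighting scheme, the rank k, or the tokenizer. The helper names (tokenize, term_weight_matrix, expand_query) and the power-iteration eigen-solver, used only to avoid a NumPy dependency, are hypothetical choices for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_weight_matrix(docs):
    """Step a (sketch): terms x documents matrix M of raw term frequencies.

    Row i of M is term vocab[i]; column j is document docs[j], so column j
    plays the role of the document term-weight vector d_j.
    """
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    M = [[c[t] for c in counts] for t in vocab]
    return vocab, M


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _top_eigenvectors(A, k, iters=300):
    """Top-k eigenvectors of a symmetric positive semidefinite matrix A,
    found one at a time by power iteration, each kept orthogonal to the
    vectors already found."""
    n = len(A)
    basis = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # deterministic starting vector
        for _ in range(iters):
            w = [_dot(row, v) for row in A]  # w = A v
            for b in basis:  # Gram-Schmidt against earlier eigenvectors
                c = _dot(w, b)
                w = [x - c * y for x, y in zip(w, b)]
            norm = math.sqrt(_dot(w, w))
            if norm < 1e-12:  # the remaining eigenvalues are (numerically) zero
                return basis
            v = [x / norm for x in w]
        basis.append(v)
    return basis


def expand_query(q, M, k):
    """Step b (assumed LSA form): q_expanded = U_k U_k^T q, where the columns
    of U_k are the top-k left singular vectors of M (eigenvectors of M M^T)."""
    A = [[_dot(ri, rj) for rj in M] for ri in M]  # M M^T, terms x terms
    q_expanded = [0.0] * len(q)
    for u in _top_eigenvectors(A, k):
        c = _dot(u, q)  # coordinate of q along u in the latent space
        q_expanded = [x + c * y for x, y in zip(q_expanded, u)]
    return q_expanded


if __name__ == "__main__":
    docs = ["car engine repair", "automobile engine repair", "pasta sauce recipe"]
    vocab, M = term_weight_matrix(docs)
    query_counts = Counter(tokenize("car"))
    q = [query_counts[t] for t in vocab]  # query term-weight vector q
    weights = dict(zip(vocab, expand_query(q, M, k=2)))
    print(round(weights["automobile"], 3))  # 0.1, although the query lacks "automobile"
```

On these toy documents the expanded query gives "automobile" the same weight as "car" (0.1) because both co-occur with "engine" and "repair", while the unrelated "pasta" stays at zero. A production version would more likely call a library SVD such as numpy.linalg.svd than hand-roll power iteration.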
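Steps c) through f) can be sketched in the same hedged way, taking an already-computed expanded query vector as input. The sketch assumes the keyword vector is the element-wise product of a document's term frequencies and the expanded-query weights, which keeps only expanded-query terms that actually occur in that document; the claim itself does not state the combination rule. Ties are broken alphabetically and highlighting wraps whole-word matches in HTML <mark> tags. All function names, the tie-break rule, the tag choice, and the sample weights are illustrative assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_vector(doc, q_expanded):
    """Step c (assumed form): weight each expanded-query term by its
    frequency in the document; terms absent from the document drop out."""
    tf = Counter(tokenize(doc))
    return {t: tf[t] * w for t, w in q_expanded.items() if tf[t] > 0}


def select_keywords(doc, q_expanded, n=3):
    """Steps d-e: order the document's terms by ranking value (descending,
    ties broken alphabetically) and keep the top n with a positive score."""
    ranked = sorted(keyword_vector(doc, q_expanded).items(),
                    key=lambda item: (-item[1], item[0]))
    return [t for t, score in ranked[:n] if score > 0]


def highlight(doc, keywords, open_tag="<mark>", close_tag="</mark>"):
    """Step f: wrap whole-word, case-insensitive matches of each keyword."""
    if not keywords:
        return doc
    # Longest alternatives first so a shorter keyword never pre-empts a longer one.
    alternatives = sorted(map(re.escape, keywords), key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, doc)


if __name__ == "__main__":
    doc = "The engine of the automobile needed repair; the car ran fine."
    # Hypothetical expanded-query weights for the query "car".
    q_expanded = {"car": 0.1, "automobile": 0.1, "engine": 0.2,
                  "repair": 0.2, "pasta": 0.0}
    keywords = select_keywords(doc, q_expanded, n=3)
    print(keywords)  # ['engine', 'repair', 'automobile']
    print(highlight(doc, keywords))
```

With n=3 the sketch highlights "engine", "automobile", and "repair" but not "car", even though "car" is the only query term, which is the behavior the claim describes: highlighted keywords need not appear in the query.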
