System, method, and program product for identifying and describing topics in a collection of electronic documents

  • US 6,775,677 B1
  • Filed: 03/02/2000
  • Issued: 08/10/2004
  • Est. Priority Date: 03/02/2000
  • Status: Expired due to Term
First Claim

1. A computer system for identifying and describing one or more topics in one or more documents in a document set, the system comprising:

  • one or more central processing units and one or more memories;

  • a term set process that creates a basic term set from the document set being a set of one or more basic terms of one or more words;

  • a document vector process that creates a document vector for each document that has a document vector direction representing what the document is about;

  • a topic vector process that creates one or more topic vectors from the document vectors, each topic vector having topic vector direction representing a topic in the document set;

  • a topic term set process that creates a topic term set for each topic vector that comprises one or more of the basic terms describing the topic represented by the topic vector, each of the basic terms in the topic term set associated with relevancy of the basic term;

  • a topic-document relevance process that creates a topic-document relevance for each topic vector and each document vector, the topic-document relevance representing relevance of the document to the topic, wherein the topic-document relevance for a given topic vector and a given document vector is determined using corresponding ones of the topic vector directions and the document vector directions, where the topic-document relevance for a given topic vector and a given document vector is determined by computing a cosine between the given topic vector and the given document vector; and

  • a topic sentence set process that creates a topic sentence set for each topic vector that comprises of one or more topic sentences that describe the topic represented by the topic vector, each of the topic sentences associated with relevance of the topic sentence to the topic represented by the topic vector.
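
The claim elements above can be illustrated with a minimal Python sketch. It is not the patented implementation: the whitespace tokenizer, the raw term counts used as document vectors, the hand-picked topic vector (the sum of two document vectors), and all function names are assumptions made for illustration only. The one step taken directly from the claim is that topic-document relevance is computed as the cosine between a topic vector and a document vector.

```python
import math
from collections import Counter


def basic_term_set(documents):
    """Collect the distinct lowercase words in the document set, in sorted order."""
    return sorted({word for doc in documents for word in doc.lower().split()})


def document_vector(document, terms):
    """Represent a document as raw counts of each basic term (an assumed weighting)."""
    counts = Counter(document.lower().split())
    return [float(counts[term]) for term in terms]


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def topic_terms(topic_vector, terms, k=3):
    """Return up to k basic terms with the largest positive weight in the topic vector."""
    ranked = sorted(zip(terms, topic_vector), key=lambda pair: pair[1], reverse=True)
    return [(term, weight) for term, weight in ranked[:k] if weight > 0]


# Hypothetical three-document collection.
documents = [
    "stock market prices fall",
    "market prices rise on stock rally",
    "team wins football match",
]

terms = basic_term_set(documents)
doc_vectors = [document_vector(doc, terms) for doc in documents]

# Assumption: the topic vector is hand-built from the first two documents.
# The claim does not say how topic vectors are derived from document vectors.
topic_vector = [a + b for a, b in zip(doc_vectors[0], doc_vectors[1])]

# Topic-document relevance: cosine between the topic vector and each document vector.
relevance = [cosine(topic_vector, vec) for vec in doc_vectors]

# Topic term set: basic terms paired with a relevancy weight.
described_by = topic_terms(topic_vector, terms)
```

Because the cosine depends only on direction, a long and a short document about the same subject receive similar relevance scores. In this sketch the third document shares no terms with the topic vector, so its relevance is zero. A topic sentence set could be ranked the same way by treating each sentence as a small document and scoring it against the topic vector.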
