System for information discovery

  • US 20030097375A1
  • Filed: 11/16/2002
  • Published: 05/22/2003
  • Est. Priority Date: 09/13/1996
  • Status: Active Grant
First Claim

1. A method for analyzing and characterizing a database of electronically formatted natural language based documents comprising the steps of:

  • a) subjecting the database to a sequence of word filters to eliminate terms in the database which do not discriminate document content, resulting in a filtered word set whose members are highly predictive of content;

  • b) defining a subset of the filtered word set as the topic set, said topic set being characterized as the set of filtered words which best discriminate the content of the documents which contain them;

  • c) forming a two dimensional matrix with the words contained within the topic set defining one dimension of said matrix and the words contained within the filtered word set comprising the other dimension of said matrix;

  • d) calculating matrix entries as the conditional probability that a document in the database will contain each word in the topic set given that it contains each word in the filtered word set; and

  • e) providing said matrix entries as vectors to interpret the document contents of said database.
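The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not say which word filters are applied or how the topic set is selected, so the sketch assumes document-frequency thresholds for the filters and a "most frequent filtered words" rule for the topic set. All function names and parameters (`tokenize`, `filtered_word_set`, `topic_set`, `association_matrix`, `min_df`, `max_df_ratio`, `k`) are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase a document and keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def document_frequency(docs):
    """Count, for each word, how many documents contain it."""
    return Counter(w for doc in docs for w in set(doc))


def filtered_word_set(docs, min_df=2, max_df_ratio=0.8):
    """Step (a), ASSUMED filters: drop words that are too rare or too common.

    The claim only requires 'a sequence of word filters'; these two
    document-frequency thresholds are an illustrative stand-in.
    """
    n = len(docs)
    df = document_frequency(docs)
    return sorted(w for w, c in df.items() if c >= min_df and c / n <= max_df_ratio)


def topic_set(docs, filtered, k):
    """Step (b), ASSUMED criterion: the k filtered words found in most documents."""
    df = document_frequency(docs)
    ranked = sorted(filtered, key=lambda w: (-df[w], w))
    return sorted(ranked[:k])


def association_matrix(docs, topics, filtered):
    """Steps (c) to (e): one vector per filtered word.

    Entry j of the vector for filtered word f estimates
    P(document contains topics[j] | document contains f).
    """
    doc_sets = [set(doc) for doc in docs]
    matrix = {}
    for f in filtered:
        containing_f = [s for s in doc_sets if f in s]
        # Every filtered word occurs in at least one document, so the
        # denominator below is never zero.
        matrix[f] = [
            sum(t in s for s in containing_f) / len(containing_f) for t in topics
        ]
    return matrix


if __name__ == "__main__":
    corpus = [
        "oil price rises",
        "oil price falls",
        "oil export grows",
        "wheat export grows",
    ]
    docs = [tokenize(text) for text in corpus]
    filtered = filtered_word_set(docs)
    topics = topic_set(docs, filtered, k=2)
    for word, vector in association_matrix(docs, topics, filtered).items():
        print(word, dict(zip(topics, vector)))
```

Each filtered word ends up with a vector of conditional probabilities over the topic words, which corresponds to the vectors that step (e) provides for interpreting the documents.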
