Method of thematic classification of documents, thematic classification module, and search engine incorporating such a module

  • US 7,003,519 B1
  • Filed: 09/22/2000
  • Issued: 02/21/2006
  • Est. Priority Date: 09/24/1999
  • Status: Expired due to Fees
First Claim

1. A method of thematically classifying documents, in particular for making up or updating thematic databases for a search engine, the method comprising the following steps:

    manually and/or automatically selecting a sample of documents representative of each theme;

    automatically identifying within the selected documents elements that are characteristic of each said theme;

    automatically allocating a coefficient to each identified element, wherein said coefficient is representative of a relevance of said element to a corresponding theme;

    downloading documents from a computer network;

    for each downloaded document to be classified, identifying said theme-characterizing elements that are contained in the document for each said theme, and for each theme corresponding to the elements, using the coefficients allocated to said elements to calculate a characteristic value representative of the relevance of that theme for the document, in order to decide whether or not the document relates to the theme, said theme-characterizing elements identification and calculation steps being performed automatically for each document downloaded from the computer network;

    automatically classifying the downloaded documents as a function of themes with which they deal;

    automatically storing the documents classified thematically in databases that can be interrogated on the basis of themes contained in a request; and

    making the databases available to users who interrogate the databases on the basis of themes contained in a request;

    and the step of allocating said coefficient to each identified element comprises the following steps for each theme:

    automatically calculating a frequency of the element in the selected documents relating to the theme;

    automatically calculating a frequency of the element in the selected documents that do not relate to the theme; and

    automatically calculating a ratio of the calculated frequencies of the theme-related element and of the non-theme-related element.
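
The coefficient and characteristic-value steps recited above can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not say how "frequency" is measured, how the coefficients are combined, or how the decision is made, so the sketch assumes that frequency is the fraction of sample documents containing the element, that the characteristic value is the sum of the coefficients of the elements present, and that a small `smoothing` constant guards against division by zero. The function names `element_frequency`, `coefficients`, and `theme_score` are hypothetical.

```python
def element_frequency(element, docs):
    """Fraction of documents (each a list of tokens) that contain the element.

    Assumption: the claim's "frequency" is read as a document frequency.
    """
    if not docs:
        return 0.0
    return sum(1 for doc in docs if element in doc) / len(docs)


def coefficients(theme_docs, other_docs, smoothing=0.01):
    """Ratio of in-theme frequency to out-of-theme frequency for each element.

    `smoothing` is an assumption added to avoid dividing by zero when an
    element never occurs outside the theme; it is not part of the claim.
    """
    elements = {token for doc in theme_docs for token in doc}
    coeffs = {}
    for element in elements:
        f_theme = element_frequency(element, theme_docs)
        f_other = element_frequency(element, other_docs)
        coeffs[element] = f_theme / (f_other + smoothing)
    return coeffs


def theme_score(doc_tokens, coeffs):
    """Characteristic value of a document for one theme.

    Assumption: the value is the sum of the coefficients of the
    theme-characterizing elements found in the document.
    """
    present = {token for token in set(doc_tokens) if token in coeffs}
    return sum(coeffs[token] for token in present)
```

Under these assumptions, an element that appears in every sample document of a theme and in none of the other sample documents receives a large coefficient, while an element equally common in both sets receives a coefficient close to 1. A document could then be assigned to the theme when `theme_score` exceeds a chosen threshold, which is likewise an assumption of this sketch.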
