
Methods for document indexing and analysis

  • US 7,333,984 B2
  • Filed: 03/18/2005
  • Issued: 02/19/2008
  • Est. Priority Date: 08/09/2000
  • Status: Expired due to Fees
First Claim

1. A method for analysis of a collection of documents which comprises the steps of:

    a) selecting a set of documents for analysis;

    b) preparing for electronic analysis of the documents by incorporating selected document sections into a database compatible format;

    c) selecting at least one document section;

    d) forming a list of analysis key words from one or more document sections into an initial word list;

    e) removing duplicate or nonessential words;

    f) standardizing word forms of the initial word list to a common form;

    g) setting a word list threshold;

    h) removing word forms from the initial word list that are present with a frequency less than the word list threshold;

    i) sorting the resulting word list by frequency;

    j) forming a first word correlation matrix using the initial word list;

    k) counting the frequency with which a word pair is found in a collection of selected documents;

    l) setting a first frequency count threshold;

    m) forming a first technology topics collection by selecting word pairs with frequency counts above the first frequency count threshold for each column in the first word correlation matrix;

    n) forming an additional word correlation matrix from the words in the collection of the first technology topics;

    o) forming an additional technology topics collection by associating word pairs from one or more first technology topics;

    p) optionally repeating steps n and o;

    q) counting, for each technology topic, the number of words appearing in each document of the document collection and optionally applying a weighting factor;

    r) assigning each document in the collection to an additional technology topic; and

    s) forming a standard picture of technology evolution by plotting individual documents on a graph with selected technology topics along one axis and a date along another axis;

    wherein the collection of documents is chosen from one or more of the group consisting of patents, scientific papers, trade journal articles, newspaper articles, press releases, web pages and magazine articles.
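
As an informal illustration only, steps d) through m) and q) through r) can be sketched in Python. This is not the patented implementation. The stop-word list, the plural-stripping rule used for step f), the use of per-document frequency, and all function names are assumptions made for the example, and the optional weighting factor of step q) is omitted.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed stop-word list standing in for the "nonessential words" of step e).
STOPWORDS = {"a", "an", "the", "of", "and", "for", "in", "to", "is", "with", "by", "on"}

def normalize(word):
    """Step f), crudely: lowercase and strip a trailing plural 's' (an assumption)."""
    w = word.lower()
    if len(w) > 3 and w.endswith("s"):
        w = w[:-1]
    return w

def doc_words(text):
    """Steps d)-f): distinct standardized key words of one document section."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return {normalize(t) for t in tokens} - STOPWORDS

def build_word_list(docs, word_threshold):
    """Steps g)-i): drop words below the threshold, then sort by frequency.

    Frequency here means the number of documents containing the word,
    which is one possible reading of the claim.
    """
    freq = Counter()
    for text in docs:
        freq.update(doc_words(text))
    kept = [(w, n) for w, n in freq.items() if n >= word_threshold]
    return sorted(kept, key=lambda wn: (-wn[1], wn[0]))

def pair_counts(docs, vocab):
    """Steps j)-k): count how many documents contain each word pair."""
    vocab = set(vocab)
    counts = Counter()
    for text in docs:
        present = sorted(doc_words(text) & vocab)
        counts.update(combinations(present, 2))
    return counts

def first_topics(counts, pair_threshold):
    """Steps l)-m): keep word pairs whose count is above the threshold."""
    return {pair: n for pair, n in counts.items() if n > pair_threshold}

def assign_documents(docs, topics):
    """Steps q)-r): assign each document to the topic sharing the most words.

    `topics` maps a hypothetical topic name to a set of words.
    Documents that match no topic word are assigned None.
    """
    assignments = []
    for text in docs:
        words = doc_words(text)
        scores = {name: len(words & tw) for name, tw in topics.items()}
        best = max(scores, key=scores.get)
        assignments.append(best if scores[best] > 0 else None)
    return assignments
```

A caller would pass a list of document-section strings to build_word_list, feed the surviving words to pair_counts, and threshold the result with first_topics. Plotting the assigned documents against a date axis, as in step s), is left out of the sketch.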
