Methods for document indexing and analysis
First Claim
1. A method for analysis of a collection of documents which comprises the steps of:
a) selecting a set of documents for analysis;
b) preparing for electronic analysis of the documents by incorporating selected document sections into a database compatible format;
c) selecting at least one document section;
d) forming a list of analysis key words from one or more document sections into an initial word list;
e) removing duplicate or nonessential words;
f) standardizing word forms of the initial word list to a common form;
g) setting a word list threshold;
h) removing word forms from the initial word list that are present with a frequency less than the word list threshold;
i) sorting the resulting word list by frequency;
j) forming a first word correlation matrix using the initial word list;
k) counting the frequency with which a word pair is found in a collection of selected documents;
l) setting a first frequency count threshold;
m) forming a first technology topics collection by selecting word pairs with frequency counts above the first frequency count threshold for each column in the first word correlation matrix;
n) forming an additional word correlation matrix from the words in the collection of the first technology topics;
o) forming an additional technology topics collection by associating word pairs from one or more first technology topics;
p) optionally repeating steps n and o;
q) counting, for each technology topic, the number of words appearing in each document of the document collection and optionally applying a weighting factor;
r) assigning each document in the collection to an additional technology topic; and
s) forming a standard picture of technology evolution by plotting individual documents on a graph with selected technology topics along one axis and a date along another axis;
wherein the collection of documents is chosen from one or more of the group consisting of patents, scientific papers, trade journal articles, newspaper articles, press releases, web pages and magazine articles.
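Steps d) through i) of the claim can be sketched in Python roughly as follows. This is a minimal illustration, not the patented implementation: the stop-word set, the plural-stripping rule in `standardize`, and the choice to count each word once per document section are all assumptions made for this example, since the claim specifies none of them.

```python
import re
from collections import Counter

# Hypothetical stop-word list (step e); the claim does not define "nonessential".
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "to", "with"}


def standardize(word):
    """Crude stand-in for step f: lowercase and strip a trailing plural 's'."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    return word


def build_word_list(sections, threshold):
    """Return (word, frequency) pairs sorted by descending frequency.

    Frequency here means the number of sections containing the word,
    which is an assumption of this sketch.
    """
    counts = Counter()
    for text in sections:
        # Steps d-f: extract words, standardize them, drop duplicates via a set.
        words = {standardize(w) for w in re.findall(r"[A-Za-z]+", text)}
        words -= STOP_WORDS  # step e: drop nonessential words
        counts.update(words)
    # Steps g-h: remove words whose frequency is less than the threshold.
    kept = [(w, n) for w, n in counts.items() if n >= threshold]
    # Step i: sort by frequency, breaking ties alphabetically.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept


titles = [
    "Polymer films for electrodes",
    "Electrode coating with polymer binders",
    "Method for coating films",
]
word_list = build_word_list(titles, threshold=2)
```

With these three sample titles and a threshold of 2, `word_list` holds `coating`, `electrode`, `film` and `polymer`, each with a count of 2; `binder` and `method` fall below the threshold and are dropped.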
Abstract
The present invention describes a method that is based on an analysis of document information and that can be used for conducting and potentially accelerating business opportunity assessments and technology investment decisions. Documents include patent documents, scientific and trade literature, magazines, e-zines, Internet search results and the like. The system can be implemented as an automatic or semi-automatic analysis system, and it ultimately provides a document index or visual index from which technology, investment or business decisions can be drawn. The method comprises the steps of selecting a document data set for analysis, selecting one or more analysis means, such as means to determine relationships among the documents in the selected set, and then forming a representation of the result for further analysis, display or planning.
4 Claims
1. A method for analysis of a collection of documents which comprises the steps of:
a) selecting a set of documents for analysis;
b) preparing for electronic analysis of the documents by incorporating selected document sections into a database compatible format;
c) selecting at least one document section;
d) forming a list of analysis key words from one or more document sections into an initial word list;
e) removing duplicate or nonessential words;
f) standardizing word forms of the initial word list to a common form;
g) setting a word list threshold;
h) removing word forms from the initial word list that are present with a frequency less than the word list threshold;
i) sorting the resulting word list by frequency;
j) forming a first word correlation matrix using the initial word list;
k) counting the frequency with which a word pair is found in a collection of selected documents;
l) setting a first frequency count threshold;
m) forming a first technology topics collection by selecting word pairs with frequency counts above the first frequency count threshold for each column in the first word correlation matrix;
n) forming an additional word correlation matrix from the words in the collection of the first technology topics;
o) forming an additional technology topics collection by associating word pairs from one or more first technology topics;
p) optionally repeating steps n and o;
q) counting, for each technology topic, the number of words appearing in each document of the document collection and optionally applying a weighting factor;
r) assigning each document in the collection to an additional technology topic; and
s) forming a standard picture of technology evolution by plotting individual documents on a graph with selected technology topics along one axis and a date along another axis;
wherein the collection of documents is chosen from one or more of the group consisting of patents, scientific papers, trade journal articles, newspaper articles, press releases, web pages and magazine articles.
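One way to picture steps j) through m) is as a word-pair co-occurrence count followed by a per-column threshold. The Python sketch below is illustrative only: the function names `pair_counts` and `first_topics`, the representation of each document as a set of standardized words, and the use of a dictionary keyed by word pair in place of an explicit matrix are assumptions of this example rather than details taken from the claim.

```python
from collections import defaultdict
from itertools import combinations


def pair_counts(doc_words, word_list):
    """Steps j-k: count in how many documents each word pair co-occurs.

    The dictionary keyed by an alphabetically ordered pair stands in for the
    upper triangle of the word correlation matrix.
    """
    vocab = set(word_list)
    counts = defaultdict(int)
    for words in doc_words:
        present = sorted(words & vocab)
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return dict(counts)


def first_topics(counts, word_list, pair_threshold):
    """Steps l-m: for each column word, keep partners whose pair count is
    above the threshold. Columns with no surviving partner are omitted."""
    topics = {}
    for column in word_list:
        partners = sorted(
            (b if a == column else a)
            for (a, b), n in counts.items()
            if column in (a, b) and n > pair_threshold
        )
        if partners:
            topics[column] = partners
    return topics


docs = [
    {"polymer", "film", "electrode"},
    {"polymer", "film"},
    {"electrode", "coating"},
]
vocabulary = ["polymer", "film", "electrode", "coating"]
counts = pair_counts(docs, vocabulary)
topics = first_topics(counts, vocabulary, pair_threshold=1)
```

In this toy collection only the pair (`film`, `polymer`) occurs in more than one document, so it is the only pair that survives a threshold of 1. Steps n) and o) could then be approximated by running the same two functions again over the words in `topics`.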
2. A method for analysis of a collection of documents which comprises the steps of:
a) selecting a set of documents for analysis;
b) preparing for electronic analysis of the documents by incorporating selected document sections into a database compatible format;
c) selecting at least one document section;
d) forming a list of analysis key words from one or more document sections into an initial word list;
e) removing duplicate or nonessential words;
f) standardizing word forms of the initial word list to a common form;
g) setting a word list threshold;
h) removing word forms from the initial word list that are present with a frequency less than the word list threshold;
i) sorting the resulting word list by frequency;
j) forming a first word correlation matrix using the initial word list;
k) counting the frequency with which a word pair is found in a collection of selected documents;
l) setting a first frequency count threshold;
m) forming a first technology topics collection by selecting word pairs with frequency counts above the first frequency count threshold for each column in the first word correlation matrix;
n) forming an additional word correlation matrix from the words in the collection of the first technology topics;
o) forming an additional technology topics collection by associating word pairs from one or more first technology topics;
p) optionally repeating steps n and o;
q) counting, for each technology topic, the number of words appearing in each document of the document collection and optionally applying a weighting factor;
r) assigning each document in the collection to an additional technology topic; and
s) forming a standard picture of technology evolution by plotting individual documents on a graph with selected technology topics along one axis and a date along another axis;
wherein the document section is chosen from one or more of the group consisting of document titles, headlines, abstracts, bodies, methods, examples, summaries, results and claims.
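Steps b) and c), together with the section types listed in this claim, might be approximated with a small relational table. The sketch below uses Python's standard `sqlite3` module; the table layout, the `ALLOWED_SECTIONS` labels and the function names are hypothetical choices made for illustration, since the claim only requires a "database compatible format".

```python
import sqlite3

# Section labels modeled on the group recited in the claim (hypothetical spelling).
ALLOWED_SECTIONS = {
    "title", "headline", "abstract", "body", "method",
    "example", "summary", "result", "claims",
}


def load_sections(conn, documents):
    """Step b: store selected document sections as rows of a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sections "
        "(doc_id TEXT, section TEXT, content TEXT)"
    )
    rows = [
        (doc_id, name, text)
        for doc_id, secs in documents.items()
        for name, text in secs.items()
        if name in ALLOWED_SECTIONS
    ]
    conn.executemany("INSERT INTO sections VALUES (?, ?, ?)", rows)
    conn.commit()


def select_section(conn, section):
    """Step c: fetch one section type for every document that has it."""
    if section not in ALLOWED_SECTIONS:
        raise ValueError("unknown section: " + section)
    cur = conn.execute(
        "SELECT doc_id, content FROM sections WHERE section = ? ORDER BY doc_id",
        (section,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load_sections(conn, {
    "D1": {"title": "Polymer films", "abstract": "Films for electrodes."},
    "D2": {"title": "Coating method", "footer": "page 3"},
})
titles = select_section(conn, "title")
```

Here the unrecognized `footer` entry is skipped on load, and `titles` contains one row per document. The text returned by `select_section` would then feed the word-list steps d) through i).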
3. A method for analysis of a collection of documents which comprises the steps of:
a) selecting a set of documents for analysis;
b) preparing for electronic analysis of the documents by incorporating selected document sections into a database compatible format;
c) selecting at least one document section;
d) forming a list of analysis key words from one or more document sections into an initial word list;
e) removing duplicate or nonessential words;
f) standardizing word forms of the initial word list to a common form;
g) setting a word list threshold;
h) removing word forms from the initial word list that are present with a frequency less than the word list threshold;
i) sorting the resulting word list by frequency;
j) forming a first word correlation matrix using the initial word list;
k) counting the frequency with which a word pair is found in a collection of selected documents;
l) setting a first frequency count threshold;
m) forming a first technology topics collection by selecting word pairs with frequency counts above the first frequency count threshold for each column in the first word correlation matrix;
n) forming an additional word correlation matrix from the words in the collection of the first technology topics;
o) forming an additional technology topics collection by associating word pairs from one or more first technology topics;
p) optionally repeating steps n and o;
q) counting, for each technology topic, the number of words appearing in each document of the document collection and optionally applying a weighting factor;
r) assigning each document in the collection to an additional technology topic; and
s) forming a standard picture of technology evolution by plotting individual documents on a graph with selected technology topics along one axis and a date along another axis, thereby creating an x-y plot;
wherein the x-axis of the x-y plot is time-based and the time base of the x-y plot is chosen from the group consisting of publication date, filing date, issue date and priority date.
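Steps q) through s), with the selectable time base of this claim, can be illustrated by scoring each document against each topic and emitting (date, topic) coordinates for an x-y plot. The following Python sketch rests on several assumptions that are not in the claim: a document is assigned to its single highest-scoring topic, ties go to the alphabetically first topic, documents matching no topic are left off the plot, and the `Document` class and date keys are hypothetical. Actual rendering is omitted; the returned points could be handed to any plotting library.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Document:
    doc_id: str
    words: set
    dates: dict  # keyed by "publication", "filing", "issue" or "priority"


def assign_topic(doc, topics, weights=None) -> Optional[str]:
    """Steps q-r: count topic words present in the document, optionally
    weighted, and return the best-scoring topic (None if nothing matches)."""
    weights = weights or {}
    best_topic, best_score = None, 0.0
    for name in sorted(topics):
        score = sum(weights.get(w, 1.0) for w in topics[name] if w in doc.words)
        if score > best_score:
            best_topic, best_score = name, score
    return best_topic


def plot_points(docs, topics, time_base="priority"):
    """Step s: build (x=date, y=topic, label) points for the chosen time base."""
    points = []
    for doc in docs:
        topic = assign_topic(doc, topics)
        if topic is not None and time_base in doc.dates:
            points.append((doc.dates[time_base], topic, doc.doc_id))
    return sorted(points)


topics = {
    "batteries": {"electrode", "polymer", "binder"},
    "coatings": {"coating", "film"},
}
docs = [
    Document("A", {"polymer", "electrode", "film"},
             {"filing": date(2001, 3, 1), "priority": date(2000, 5, 1)}),
    Document("B", {"coating", "film"},
             {"filing": date(1999, 7, 15), "priority": date(1999, 7, 15)}),
]
points = plot_points(docs, topics, time_base="filing")
```

Switching `time_base` between `"publication"`, `"filing"`, `"issue"` and `"priority"` changes only the x-coordinate of each point, which mirrors the choice recited in the wherein clause.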
Specification