Document relationship analysis system
First Claim
1. A system for analyzing relationships between documents, the system comprising:
a user interface;
an ingest memory configured to store source documents retrieved from an external document source;
a text index memory configured to store a text index;
a cluster index memory configured to store document vectors associated with each source document;
a text extraction pipeline automatically extracting text from source documents added to the ingest memory;
a document vector calculator automatically computing document vectors for source documents by applying term weights to the extracted text associated with the source document, the document vector calculator generating a plurality of profile document vectors associated with profile documents selected for use in a query against a target dataset;
an indexer automatically building an index of the extracted text and storing the text index in the text index memory;
a dataset manager component generating a result dataset containing documents of interest from a target dataset containing selected source documents based on a query by evaluating similarities between each profile document vector and the document vector calculated for each source document in the target dataset; and
a relationship analyzer component automatically selecting a visualization model for clustering the documents of interest based on the number of documents of interest in the result dataset and rendering the result set using the selected visualization model in the user interface.
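The document vector calculator and dataset manager recited above can be illustrated with a minimal sketch. The claim does not name a weighting scheme or a similarity measure, so the smoothed inverse-document-frequency weights, the cosine similarity, the `threshold` parameter, and every function name below are assumptions chosen for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


def idf_weights(texts):
    """Assumed term weights: smoothed inverse document frequency."""
    n = len(texts)
    df = Counter()
    for text in texts:
        df.update(set(tokenize(text)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def document_vector(text, weights):
    """Apply term weights to extracted text, giving a sparse vector."""
    counts = Counter(tokenize(text))
    return {term: tf * weights[term]
            for term, tf in counts.items() if term in weights}


def cosine(u, v):
    """Cosine similarity between two sparse vectors (assumed measure)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def result_dataset(profile_vectors, target_vectors, threshold=0.1):
    """Keep target documents whose best similarity to any profile
    vector meets the (hypothetical) threshold."""
    hits = {}
    for doc_id, vec in target_vectors.items():
        best = max((cosine(p, vec) for p in profile_vectors), default=0.0)
        if best >= threshold:
            hits[doc_id] = best
    return hits
```

Each source document is compared against every profile document vector separately, which mirrors the "each profile document vector" wording of claim 1; keeping the best score per document is one possible way to reduce those comparisons to a single decision.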
Abstract
A document relationship analysis system. Aspects of the system include ingesting, discovering, recommending, analyzing, and exporting documents of interest. The system dynamically searches large or streaming datasets using a tiered, multi-step approach that includes discovery techniques and recommender components to filter and refine these larger datasets into smaller datasets of documents of interest. The system dynamically selects and renders an appropriate visualization for result datasets based on predetermined measures that facilitate analysis of the documents of interest.
27 Citations
20 Claims
11. A method of analyzing relationships between documents, the method comprising the acts of:
extracting text from source documents received from an external document source;
storing the extracted text;
creating an index of the extracted text;
computing a document vector for each source document using the extracted text automatically when the extracted text is stored;
storing the document vectors for each source document;
extracting text from profile documents received from an external document source;
storing the extracted text from the profile documents;
computing a document vector for each profile document using the extracted text automatically when the extracted text is stored;
computing a combined profile document vector from the profile document vectors of selected profile documents associated with a query;
receiving a selection of a plurality of source documents as a target dataset and parameters of a query via a user interface;
generating a result dataset containing documents of interest from the target dataset based on the query by evaluating similarities between the combined profile document vector and the document vector calculated for each source document in the target dataset; and
automatically selecting a visualization model for clustering the documents of interest based on the number of documents of interest in the result dataset and rendering the result set using the selected visualization model in a user interface.
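The combined profile document vector recited in claim 11 can be sketched as follows. The claim does not say how the profile vectors are combined or how similarity is evaluated, so the component-wise mean, the cosine similarity, the `min_similarity` cutoff, and the function names are assumptions made for illustration.

```python
import math


def combine_profile_vectors(vectors):
    """Assumed combination: component-wise mean of sparse vectors."""
    if not vectors:
        return {}
    totals = {}
    for vec in vectors:
        for term, weight in vec.items():
            totals[term] = totals.get(term, 0.0) + weight
    return {term: total / len(vectors) for term, total in totals.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors (assumed measure)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def documents_of_interest(combined, target_vectors, min_similarity=0.3):
    """Score every target document against the single combined vector
    and return (doc_id, score) pairs above the hypothetical cutoff,
    most similar first."""
    scored = [(doc_id, cosine(combined, vec))
              for doc_id, vec in target_vectors.items()]
    kept = [pair for pair in scored if pair[1] >= min_similarity]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Unlike the per-profile comparison of claim 1, this variant makes one similarity evaluation per source document, which is the practical difference a combined vector introduces.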
20. A computer readable medium containing computer executable instructions which, when executed by a computer, perform a method for analyzing relationships between documents, the method comprising the acts of:
extracting text from source documents received from an external document source;
storing the extracted text;
creating an index of the extracted text;
indexing the extracted text from each source document automatically when the extracted text is stored;
storing the indexed text;
computing a document vector for each source document using the extracted text automatically when the extracted text is stored;
storing the document vectors for each source document;
extracting text from profile documents received from an external document source;
storing the extracted text from the profile documents;
computing a document vector for each profile document using the extracted text automatically when the extracted text is stored;
computing a combined profile document vector from the profile document vectors of selected profile documents associated with a query;
receiving a selection of a plurality of source documents as a target dataset and parameters of a query via a user interface;
generating a result dataset containing documents of interest from the target dataset based on the query by evaluating similarities between the combined profile document vector and the document vector calculated for each source document in the target dataset; and
automatically selecting a visualization model for clustering the documents of interest based on the number of documents of interest in the result dataset and rendering the result set using the selected visualization model in a user interface.
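The final act of the independent claims, selecting a visualization model from the number of documents of interest, might look like the sketch below. The claims give neither thresholds nor model names, so the tier boundaries (50 and 1000) and the names `node_link_graph`, `cluster_scatter`, and `density_heatmap` are hypothetical placeholders.

```python
# Hypothetical tiers: (largest result count handled, model name).
# Neither the boundaries nor the names come from the claims.
VISUALIZATION_TIERS = [
    (50, "node_link_graph"),
    (1000, "cluster_scatter"),
]
FALLBACK_MODEL = "density_heatmap"


def select_visualization_model(result_count):
    """Pick a visualization model from the size of the result dataset."""
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    for max_documents, model in VISUALIZATION_TIERS:
        if result_count <= max_documents:
            return model
    return FALLBACK_MODEL
```

Keeping the tiers in a data table rather than in branching logic means the "predetermined measures" mentioned in the abstract could be tuned without touching the selection code.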
Specification