Financial event and relationship extraction
Abstract
For automated text processing, the inventors devised, among other things, an exemplary system that automatically extracts financial events from various unstructured text-based sources, such as press releases and news articles. Extracted events, such as mergers & acquisitions, earnings guidance reports, and actual earnings announcements, are represented as structured data records that can be linked, searched, displayed, and used as a basis for controlling access to the source documents and other related financial documents for named entities.
29 Claims
1. A computer-implemented method of identifying and extracting by a computer financial information from tables in documents, the method comprising:
- automatically, without further intervention from a user, identifying by a computer a document from a set of documents retrieved by the computer from a document source database;
- screening the identified document by a support vector machine classifier to distinguish between tables and non-tables and identify one or more tables that contain a desired relation without performing a detailed extraction process;
- identifying within the identified document a table from a set of tables that contains at least one predetermined desired relation, wherein the at least one predetermined desired relation comprises a plurality of desired attributes and desired values;
- partitioning by the computer the identified table into a plurality of labels and one or more values, with one or more of the labels identified as a column label and one or more identified as a row label;
- determining by the computer a set of attribute-value pairs by associating each value of the one or more values partitioned from the identified table with a plurality of the labels, with an abstract table including the set of attribute-value pairs; and
- generating by the computer a set of data for inclusion into a database of financial information, the set of data generated for inclusion in the database of financial information based on the determined set of attribute-value pairs.
Dependent claims: 2, 3
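The partitioning and attribute-value steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes, purely for illustration, that the table has already been parsed into a rectangular grid of strings, that the first row holds the column labels, and that the first cell of each later row holds the row label. The names `partition_table` and `attribute_value_pairs` are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[str]]
Pair = Tuple[Tuple[str, str], str]  # ((row label, column label), value)


def partition_table(grid: Grid) -> Tuple[List[str], List[str], List[List[str]]]:
    """Split a grid into column labels, row labels, and value cells.

    Assumption (not from the claims): labels sit in the first row and
    the first column, and every row has the same number of cells.
    """
    column_labels = [cell.strip() for cell in grid[0][1:]]
    row_labels = [row[0].strip() for row in grid[1:]]
    values = [[cell.strip() for cell in row[1:]] for row in grid[1:]]
    return column_labels, row_labels, values


def attribute_value_pairs(grid: Grid) -> List[Pair]:
    """Associate each value with both its row label and its column label."""
    column_labels, row_labels, values = partition_table(grid)
    pairs: List[Pair] = []
    for r, row_label in enumerate(row_labels):
        for c, column_label in enumerate(column_labels):
            pairs.append(((row_label, column_label), values[r][c]))
    return pairs


# Hypothetical earnings table used only to exercise the sketch.
table = [
    ["", "Q1 2008", "Q1 2007"],
    ["Revenue", "1,200", "1,050"],
    ["Net income", "310", "275"],
]
pairs = attribute_value_pairs(table)
```

Each value ends up keyed by a plurality of labels (here one row label and one column label), which is one plausible reading of the "abstract table" of attribute-value pairs; real tables with nested or multi-row headers would need a richer label model.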
4. A computer-based information extraction system having at least one processor and at least one non-transitory memory for storing code, the system comprising:
- a document identifier set of code, stored in the memory, when executed by the processor adapted to automatically, without further intervention from a user, identify a document from a set of documents, the set of documents retrieved by the system from a document source database;
- a document screening set of code, stored in the memory, when executed by the processor adapted to screen the identified document by a support vector machine classifier to distinguish between tables and non-tables and identify one or more tables in the identified document that contain information of interest without performing a detailed extraction process;
- a table identifier set of code, stored in the memory, when executed by the processor adapted to automatically, without further intervention from a user, identify within the identified document a table from a set of tables that contains the information of interest, wherein the information of interest comprises a plurality of desired attributes and desired values;
- a normalization set of code, stored in the memory, when executed by the processor adapted to normalize information contained in the identified table by partitioning the identified table into a plurality of labels and one or more values, with one or more of the labels identified as a column label and one or more identified as a row label;
- a value association set of code, stored in a memory, when executed by the processor adapted to determine a set of attribute-value pairs by associating each value of the one or more values partitioned from the identified table with a plurality of the labels resulting in the set of attribute-value pairs; and
- a database set of code, stored in a memory, when executed by the processor adapted to generate a set of data for inclusion into a database of financial information, the set of data generated for inclusion into the database of financial information based at least in part on the determined set of attribute-value pairs.
Dependent claims: 5, 6, 7, 8, 9, 10, 11, 12, 13
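The screening step recited in the independent claims uses a support vector machine to separate tables from non-tables before any detailed extraction. The claims do not name the features, kernel, or training procedure, so the following Python sketch is only one possible illustration. It assumes a linear SVM trained by plain subgradient descent on the regularised hinge loss, two hand-picked features (share of numeric tokens, share of lines with several cells), and a tiny hypothetical training set. All function names and sample texts are invented for the example.

```python
import re
from typing import List, Sequence, Tuple

NUMERIC = re.compile(r"^[$(]?\d[\d,.]*\)?%?$")  # e.g. 1,200  0.42  (45)  12%
CELL_GAP = re.compile(r"\s{2,}|\t")             # assumed cell separator


def features(block: str) -> List[float]:
    """Two illustrative features; the claims do not specify any."""
    lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
    tokens = block.split()
    numeric_share = sum(1 for t in tokens if NUMERIC.match(t)) / max(len(tokens), 1)
    multi_cell_share = sum(
        1 for ln in lines if len(CELL_GAP.split(ln)) >= 2
    ) / max(len(lines), 1)
    return [numeric_share, multi_cell_share]


def train_linear_svm(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    lam: float = 0.01,
    eta: float = 0.1,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Minimise lam/2 * |w|^2 + hinge loss with fixed-step subgradient descent."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - eta * lam) for wi in w]  # regularisation shrink
            if margin < 1.0:                           # hinge loss is active
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
    return w, b


def is_table(block: str, w: Sequence[float], b: float) -> bool:
    """Classify a text block by the sign of the SVM decision function."""
    return sum(wi * xi for wi, xi in zip(w, features(block))) + b > 0.0


# Hypothetical training blocks: +1 for tables, -1 for prose.
training = [
    ("Revenue    1,200    1,050\nNet income    310    275", 1),
    ("Quarter    EPS\nQ1    0.42\nQ2    0.47", 1),
    ("The company expects revenue to grow next year.", -1),
    ("Shares rose 3 percent after the announcement on Tuesday.", -1),
]
w, b = train_linear_svm([features(text) for text, _ in training],
                        [label for _, label in training])

candidate_table = "Segment    2008    2007\nRetail    540    498"
candidate_prose = "Management will discuss the results on a conference call."
table_flag = is_table(candidate_table, w, b)
prose_flag = is_table(candidate_prose, w, b)
```

Because the classifier only looks at cheap surface features, it can discard non-tables without running the partitioning and association steps, which matches the "without performing a detailed extraction process" language. A production system would more likely use an established SVM library and a much larger labelled corpus.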
14. A computer-based method for extracting information, the method comprising:
- automatically, without further intervention from a user, identifying by a computer a document from a set of documents retrieved by the computer from a document source database;
- screening the identified document by a support vector machine classifier to distinguish between tables and non-tables and identify one or more tables that contain information of interest without performing a detailed extraction process;
- identifying within the identified document a table from a set of tables that contains the information of interest, wherein the information of interest comprises a plurality of desired attributes and desired values;
- normalizing by the computer information contained in the identified table by partitioning by a computer the identified table into a plurality of labels and one or more values, with one or more of the labels identified as a column label and one or more identified as a row label;
- determining by the computer a set of attribute-value pairs by associating each value of the one or more values partitioned from the identified table with a plurality of the labels resulting in the set of attribute-value pairs; and
- generating by the computer a set of data for inclusion into a database of financial information, the set of data generated for inclusion into the database of financial information based at least in part on the determined set of attribute-value pairs.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29
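The final step of claim 14, generating data for inclusion into a database of financial information, might look like the following Python sketch. It assumes an in-memory SQLite database, a hypothetical `financial_facts` table, and a hypothetical `to_number` helper that treats parenthesised figures as negatives (a common accounting convention, not something the claims state). The document identifier and sample pairs are invented for the example.

```python
import sqlite3
from typing import Iterable, Tuple

Pair = Tuple[Tuple[str, str], str]  # ((row label, column label), raw value)


def to_number(text: str) -> float:
    """Convert '1,200' or '(45)' to a float; '(45)' is read as -45."""
    cleaned = text.replace(",", "").replace("$", "").strip()
    if cleaned.startswith("(") and cleaned.endswith(")"):
        return -float(cleaned[1:-1])
    return float(cleaned)


def store_pairs(conn: sqlite3.Connection, document_id: str,
                pairs: Iterable[Pair]) -> int:
    """Write one row per attribute-value pair and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS financial_facts ("
        "document_id TEXT, row_label TEXT, column_label TEXT, value REAL)"
    )
    rows = [(document_id, row_label, column_label, to_number(raw))
            for (row_label, column_label), raw in pairs]
    conn.executemany("INSERT INTO financial_facts VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Hypothetical pairs, as produced by a value-association step.
conn = sqlite3.connect(":memory:")
sample_pairs = [
    (("Revenue", "Q1 2008"), "1,200"),
    (("Net loss", "Q1 2008"), "(45)"),
]
stored = store_pairs(conn, "doc-001", sample_pairs)
revenue = conn.execute(
    "SELECT value FROM financial_facts WHERE row_label = ?", ("Revenue",)
).fetchone()[0]
net_loss = conn.execute(
    "SELECT value FROM financial_facts WHERE row_label = ?", ("Net loss",)
).fetchone()[0]
```

Keeping the document identifier on every row preserves the link from each stored figure back to its source document, which is the kind of linkage the abstract describes for structured records.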
Specification