Financial event and relationship extraction

US 10,049,100 B2
Filed: 01/30/2009
Issued: 08/14/2018
Est. Priority Date: 01/30/2008
Status: Active Grant

Abstract

For automated text processing, the inventors devised, among other things, an exemplary system that automatically extracts financial events from various unstructured text-based sources, such as press releases and news articles. Extracted events, such as mergers & acquisitions, earnings guidance reports, and actual earnings announcements, are represented as structured data records that can be linked, searched, and displayed, and used as a basis for controlling access to the source documents and other related financial documents for named entities.

29 Claims

1. A computer-implemented method of identifying and extracting by a computer financial information from tables in documents, the method comprising:
- automatically, without further intervention from a user, identifying by a computer a document from a set of documents retrieved by the computer from a document source database;
  
  screening the identified document by a support vector machine classifier to distinguish between tables and non-tables and identify one or more tables that contain a desired relation without performing a detailed extraction process;
  
  identifying within the identified document a table from a set of tables that contains at least one predetermined desired relation, wherein the at least one predetermined desired relation comprises a plurality of desired attributes and desired values;
  
  partitioning by the computer the identified table into a plurality of labels and one or more values, with one or more of the labels identified as a column label and one or more identified as a row label;
  
  determining by the computer a set of attribute-value pairs by associating each value of the one or more values partitioned from the identified table with a plurality of the labels, with an abstract table including the set of attribute-value pairs; and
  
  generating by the computer a set of data for inclusion into a database of financial information, the set of data generated for inclusion in the database of financial information based on the determined set of attribute-value pairs.
- Dependent claims (2, 3):
  - 2. The method of claim 1, wherein identifying in the document a table that contains at least one predetermined desired relationship includes using a support vector machine.
  - 3. The method of claim 1, wherein the document is a Securities and Exchange Commission filing.
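
The partitioning and attribute-value steps of claim 1 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the first row of a rectangular table holds the column labels and the first column holds the row labels; the claims do not prescribe how labels are located. The function names and the sample compensation table are hypothetical.

```python
from typing import Dict, List, Tuple

Table = List[List[str]]


def partition_table(table: Table) -> Tuple[List[str], List[str], List[List[str]]]:
    """Split a rectangular table into column labels, row labels, and values.

    Assumption of this sketch: row 0 is the header row and column 0 holds
    the row labels. Real filings need a more careful layout analysis.
    """
    column_labels = table[0][1:]
    row_labels = [row[0] for row in table[1:]]
    values = [row[1:] for row in table[1:]]
    return column_labels, row_labels, values


def attribute_value_pairs(table: Table) -> List[Dict[str, str]]:
    """Associate every value with both its row label and its column label."""
    column_labels, row_labels, values = partition_table(table)
    pairs = []
    for row_label, row_values in zip(row_labels, values):
        for column_label, value in zip(column_labels, row_values):
            pairs.append(
                {"row_label": row_label, "column_label": column_label, "value": value}
            )
    return pairs


# Invented sample data, not taken from any filing.
compensation_table = [
    ["Name", "Title", "Fiscal Year", "Salary"],
    ["Jane Doe", "CEO", "2007", "500,000"],
    ["John Roe", "CFO", "2007", "350,000"],
]

for pair in attribute_value_pairs(compensation_table):
    print(pair)
```

Each value ends up tied to a plurality of labels (one row label and one column label), which is the shape of record the final "generating" step would load into a database.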

4. A computer-based information extraction system having at least one processor and at least one non-transitory memory for storing code, the system comprising:
- a document identifier set of code, stored in the memory, when executed by the processor adapted to automatically, without further intervention from a user, identify a document from a set of documents, the set of documents retrieved by the system from a document source database;
  
  a document screening set of code, stored in the memory, when executed by the processor adapted to screen the identified document by a support vector machine classifier to distinguish between tables and non-tables and identify one or more tables in the identified document that contain information of interest without performing a detailed extraction process;
  
  a table identifier set of code, stored in the memory, when executed by the processor adapted to automatically, without further intervention from a user, identify within the identified document a table from a set of tables that contains the information of interest, wherein the information of interest comprises a plurality of desired attributes and desired values;
  
  a normalization set of code, stored in the memory, when executed by the processor adapted to normalize information contained in the identified table by partitioning the identified table into a plurality of labels and one or more values, with one or more of the labels identified as a column label and one or more identified as a row label;
  
  a value association set of code, stored in a memory, when executed by the processor adapted to determine a set of attribute-value pairs by associating each value of the one more values partitioned from the identified table with a plurality of the labels resulting in the set of attribute-value pairs; and
  
  a database set of code, stored in a memory, when executed by the processor adapted to generate a set of data for inclusion into a database of financial information, the set of data generated for inclusion into the database of financial information based at least in part on the determined set of attribute-value pairs.
- Dependent claims (5, 6, 7, 8, 9, 10, 11, 12, 13):
  - 5. The system of claim 4, wherein the database set of code comprises a rule-based inference module when executed by the processor is adapted to populate the database of financial information based on the set of attribute-value pairs.
  - 6. The system of claim 4, wherein the table identifier set of code when executed by the processor is adapted to identify within a document a table by using relation-specific classifiers.
  - 7. The method of claim 4, wherein one or both of the table identifier set of code and the normalization set of code involve using supervised machine learning.
  - 8. The system of claim 7, wherein one or both of the table identifier set of code and the normalization set of code involve using annotation in performing the supervised machine learning.
  - 9. The system of claim 4, wherein the value association set of code when executed by the processor is further adapted to define a set of relations derived at least in part from the set of attribute-value pairs.
  - 10. The system of claim 9, wherein the set of relations includes a combination of two or more of the following:
    - name;
      
      age;
      
      title;
      
      salary;
      
      bonus;
      
      fiscal year;
      
      options;
      
      compensation.
  - 11. The system of claim 4, wherein the table identifier set of code when executed by the processor is adapted to identify within a document a table that contains information of interest based on determining the presence of a set of desired relations.
  - 12. The system of claim 11, wherein the table identifier set of code is based on using a support vector machine to train a set of at least one model for each of the set of desired relations.
  - 13. The system of claim 4, wherein the labels conform to a set of semantic rules with respect to a desired set of relations.
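
The screening step (a support vector machine that separates tables from non-tables before any detailed extraction) can be sketched as follows. The claims do not specify the features, kernel, or training procedure, so everything here is an assumption: two hand-picked features, a linear soft-margin SVM trained by batch subgradient descent on the hinge loss, and a tiny invented training set.

```python
import re
from typing import List, Sequence, Tuple

NUMBER = re.compile(r"\$?\d[\d,]*(?:\.\d+)?%?")


def features(block: str) -> List[float]:
    """Map a text block to two hypothetical features in [0, 1]."""
    lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
    tokens = block.split()
    numeric = sum(1 for tok in tokens if NUMBER.fullmatch(tok))
    # A line counts as multi-column if it contains a tab or 2+ spaces.
    multi_column = sum(1 for ln in lines if len(re.split(r"\t|\s{2,}", ln)) >= 2)
    return [numeric / max(len(tokens), 1), multi_column / max(len(lines), 1)]


def train_linear_svm(
    xs: Sequence[Sequence[float]],
    ys: Sequence[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Batch subgradient descent on the L2-regularised hinge loss."""
    n, dim = len(xs), len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified
                grad_w = [g - y * xi / n for g, xi in zip(grad_w, x)]
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def is_table(model: Tuple[List[float], float], block: str) -> bool:
    """Classify a block by the sign of the linear decision function."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, features(block))) + b > 0


# Invented training blocks: label +1 for tables, -1 for running text.
TABLES = [
    "Name  Salary  Bonus\nDoe  500,000  50,000\nRoe  350,000  20,000",
    "Year  Revenue  Net Income\n2006  1,200  300\n2007  1,500  410",
]
NON_TABLES = [
    "The company announced a merger with its largest competitor.\n"
    "The deal is expected to close in 2009.",
    "Shares rose 5% after the earnings call.\n"
    "Management reaffirmed guidance for the year.",
]

blocks = TABLES + NON_TABLES
labels = [1] * len(TABLES) + [-1] * len(NON_TABLES)
model = train_linear_svm([features(blk) for blk in blocks], labels)

print(is_table(model, "Name  Age  Title\nDoe  52  CEO\nRoe  47  CFO"))
print(is_table(model, "The board approved the plan.\nNo further details were given."))
```

Claim 12 speaks of training at least one model for each desired relation; under the same assumptions that would amount to calling `train_linear_svm` once per relation with relation-specific labels.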

14. A computer-based method for extracting information, the method comprising:
- automatically, without further intervention from a user, identifying by a computer a document from a set of documents retrieved by the computer from a document source database;
  
  screening the identified document by a support vector machine classifier to distinguish between tables and non-tables and identify one or more tables that contain information of interest without performing a detailed extraction process;
  
  identifying within the identified document a table from a set of tables that contains the information of interest, wherein the information of interest comprises a plurality of desired attributes and desired values;
  
  normalizing by the computer information contained in the identified table by partitioning by a computer the identified table into a plurality of labels and one or more values, with one or more of the labels identified as a column label and one or more identified as a row label;
  
  determining by the computer a set of attribute-pairs by associating each value of the one more values partitioned from the identified table with a plurality of the labels resulting in the set of attribute-value pairs; and
  
  generating by the computer a set of data for inclusion into a database of financial information, the set of data generated for inclusion into the database of financial information based at least in part on the determined set of attribute-value pairs.
- Dependent claims (15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29):
  - 15. The method of claim 14, wherein populating a database involves a rule-based process.
  - 16. The method of claim 14, wherein identifying a table of interest involves using relation-specific classifiers.
  - 17. The method of claim 14, wherein one or both of identifying a table of interest and normalizing information involve using supervised machine learning.
  - 18. The method of claim 17, wherein one or both of identifying a table of interest and normalizing information involve using annotation in performing the supervised machine learning.
  - 19. The method of claim 14, further comprising defining a set of relations derived at least in part from the set of attribute-value pairs.
  - 20. The method of claim 19, wherein the set of relations includes a combination of two or more of the following:
    - name;
      
      age;
      
      title;
      
      salary;
      
      bonus;
      
      fiscal year;
      
      options;
      
      compensation.
  - 21. The method of claim 14, wherein identifying a table of interest is based at least in part on determining the presence of a set of desired relations.
  - 22. The method of claim 21, wherein identifying a table of interest is based at least in part on using a support vector machine to train a set of at least one model for each of the set of desired relations.
  - 23. The method of claim 14, wherein the labels conform to a set of semantic rules with respect to a desired set of relations.
  - 24. The method of claim 14, further comprising:
    - automatically identifying and tagging a text segment in the document, the text segment comprising one or more of entity names, monetary expressions, and temporal expressions;
      
      automatically tagging the entity names, monetary expressions, and temporal expressions within the text segment in the document;
      
      identifying a financial event described within the automatically tagged text segment; and
      
      defining in memory a data record associated with the financial event, the data record including data derived from the tagged text segment.
  - 25. The method of claim 24, further comprising displaying on a display device at least a portion of the data record in association with a user selectable command feature of a graphical user interface for causing retrieval of a document including the text segment.
  - 26. The method of claim 24, wherein the text segment is a grammatically complete sentence.
  - 27. The method of claim 24, wherein the data record includes:
    - a company field including text identifying a named entity tagged in the text segment;
      
      a company ID field including a alphanumeric code identifying the named entity; and
      
      a time period field including an alphanumeric code identifying a financial reporting period.
  - 28. The method of claim 24, wherein the data record includes a field indicating whether a monetary expression tagged in the text segment is trending up or down based on a comparison to a previous value.
  - 29. The method of claim 24, further comprising:
    - automatically tagging entity names within a text segment as being one of a person, company, and location; and
      
      automatically associating one or more of the tagged entity names with an entry in a data set of named entities.
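
Claims 24 to 29 describe tagging a text segment and deriving a data record from it. The sketch below shows one possible shape for such a record, with a company field, a company ID field, a time period code, and an up/down trend flag. The regular expressions, the `KNOWN_COMPANIES` lookup, the company "Acme Corp", and its ID are all invented for illustration; the claims do not say how tagging or entity resolution is implemented.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns for monetary and temporal expressions.
MONEY = re.compile(r"\$\d[\d,]*(?:\.\d+)?(?:\s(?:million|billion))?")
QUARTER = re.compile(r"\bQ([1-4])\s?(\d{4})\b")
SCALE = {"million": 1e6, "billion": 1e9}

# Hypothetical data set of named entities: name -> alphanumeric company ID.
KNOWN_COMPANIES = {"Acme Corp": "C0001"}


@dataclass
class EarningsRecord:
    company: str
    company_id: str
    time_period: str  # e.g. "2008Q3"
    amount: float
    trend: Optional[str]  # "up", "down", "flat", or None if no prior value


def to_number(expression: str) -> float:
    """Convert a tagged monetary expression such as '$2.5 million' to a float."""
    parts = expression.lstrip("$").split()
    value = float(parts[0].replace(",", ""))
    if len(parts) > 1:
        value *= SCALE[parts[1]]
    return value


def extract_record(
    sentence: str, previous_value: Optional[float] = None
) -> Optional[EarningsRecord]:
    """Tag one sentence and build a record, or return None if a tag is missing."""
    company = next((name for name in KNOWN_COMPANIES if name in sentence), None)
    money = MONEY.search(sentence)
    quarter = QUARTER.search(sentence)
    if company is None or money is None or quarter is None:
        return None
    amount = to_number(money.group())
    trend = None
    if previous_value is not None:
        if amount > previous_value:
            trend = "up"
        elif amount < previous_value:
            trend = "down"
        else:
            trend = "flat"
    return EarningsRecord(
        company=company,
        company_id=KNOWN_COMPANIES[company],
        time_period=f"{quarter.group(2)}Q{quarter.group(1)}",
        amount=amount,
        trend=trend,
    )


sentence = "Acme Corp reported earnings of $2.5 million for Q3 2008."
print(extract_record(sentence, previous_value=2_000_000.0))
```

The unit of analysis here is a single grammatically complete sentence, matching claim 26; a production tagger would more likely use a trained named-entity recognizer than fixed patterns.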

Current Assignee: Thomson Reuters Enterprise Centre GmbH (The Woodbridge Co. Ltd.)
Original Assignee: Thomson Reuters Global Resources Unlimited Company (The Woodbridge Co. Ltd.)
Inventors: Schilder, Frank; Shaw, James
Primary Examiner(s): BORLINGHAUS, JASON M
Application Number: US12/363,524
Publication Number: US 20090327115A1
Time in Patent Office: 3,483 Days
Field of Search: 358462, 358448, 358453
US Class Current
CPC Class Codes

G06F 16/35   Clustering; Classification

G06F 16/36   Creation of semantic tools,...

G06F 40/295   Named entity recognition

G06Q 40/00   Finance; Insurance; Tax str...

G06Q 40/02   Banking, e.g. interest calc...
