Concept Driven Automatic Section Identification
Abstract
Mechanisms are provided for generating section metadata for an electronic document. These mechanisms receive a document and analyze the document to identify concepts present within textual content of the document. The mechanisms correlate concepts within the textual content with one another to identify concept groups based on the application of one or more rules defining related concepts or concept patterns. The mechanisms determine sections of text within the textual content based on the correlation of concepts within the textual content. Based on results of the determining, the mechanisms generate section metadata for the document and store the section metadata in association with the document for use by a document processing system.
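The analysis, correlation, and section-determination steps described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `CONCEPT_TERMS` lexicon, the `RULES` table, the `Section` type, and the choice of one line as the unit of text are hypothetical and do not come from the patent, which does not prescribe a particular lexicon, rule format, or granularity.

```python
import re
from dataclasses import dataclass

# Hypothetical concept lexicon: surface term -> concept label (illustrative only).
CONCEPT_TERMS = {
    "aspirin": "Drug",
    "ibuprofen": "Drug",
    "mg": "Dosage",
    "daily": "Frequency",
    "mother": "Relative",
    "father": "Relative",
    "diabetes": "Disease",
    "cancer": "Disease",
}

# Hypothetical rules: a set of related concepts -> the section label it suggests.
RULES = [
    ({"Drug", "Dosage"}, "Medications"),
    ({"Relative", "Disease"}, "Family History"),
]


@dataclass
class Section:
    label: str
    start_line: int  # 0-based, inclusive
    end_line: int    # 0-based, inclusive


def identify_concepts(line):
    """Analyze one line of text and return the set of concepts found in it."""
    tokens = re.findall(r"[a-z]+", line.lower())
    return {CONCEPT_TERMS[t] for t in tokens if t in CONCEPT_TERMS}


def classify(concepts):
    """Correlate concepts: return the label of the first rule fully matched."""
    for required, label in RULES:
        if required <= concepts:  # every concept in the rule is present
            return label
    return None


def determine_sections(text):
    """Merge adjacent lines that match the same rule into one section."""
    sections = []
    for i, line in enumerate(text.splitlines()):
        label = classify(identify_concepts(line))
        if label is None:
            continue
        last = sections[-1] if sections else None
        if last and last.label == label and last.end_line == i - 1:
            last.end_line = i
        else:
            sections.append(Section(label, i, i))
    return sections


doc = """Patient takes aspirin 81 mg daily.
Also ibuprofen 200 mg as needed.
Mother had diabetes.
Father had cancer."""

for section in determine_sections(doc):
    print(section)
```

In this sketch a section boundary falls wherever the matched rule changes between adjacent lines. A real system would likely use a richer annotator and rule language than a keyword lookup.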
20 Claims
1. A method, in a data processing system comprising a processor and a memory, for generating section metadata for an electronic document, the method comprising:
receiving, by the data processing system, an electronic document for processing;
analyzing, by the data processing system, the electronic document to identify concepts present within textual content of the electronic document;
correlating, by the data processing system, concepts within the textual content with one another to identify concept groups within the textual content based on the application of one or more rules defining related concepts or concept patterns;
determining, by the data processing system, at least one section of text within the textual content based on the correlation of concepts within the textual content;
generating, by the data processing system, based on results of the determining, section metadata for the electronic document to thereby identify the at least one section in the electronic document; and
storing, by the data processing system, the section metadata in association with the electronic document for use by a document processing system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a data processing system, causes the data processing system to:
receive an electronic document for processing;
analyze the electronic document to identify concepts present within textual content of the electronic document;
correlate concepts within the textual content with one another to identify concept groups within the textual content based on the application of one or more rules defining related concepts or concept patterns;
determine at least one section of text within the textual content based on the correlation of concepts within the textual content;
generate, based on results of the determining, section metadata for the electronic document to thereby identify the at least one section in the electronic document; and
store the section metadata in association with the electronic document for use by a document processing system.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19
20. An apparatus comprising:
a processor; and
a memory coupled to the processor, wherein the memory comprises instructions which, when executed by the processor, cause the processor to:
receive an electronic document for processing;
analyze the electronic document to identify concepts present within textual content of the electronic document;
correlate concepts within the textual content with one another to identify concept groups within the textual content based on the application of one or more rules defining related concepts or concept patterns;
determine at least one section of text within the textual content based on the correlation of concepts within the textual content;
generate, based on results of the determining, section metadata for the electronic document to thereby identify the at least one section in the electronic document; and
store the section metadata in association with the electronic document for use by a document processing system.
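The last two steps recited in each independent claim, generating section metadata and storing it in association with the document, could look roughly like the following. This is only a sketch under stated assumptions: the JSON layout, the `.sections.json` sidecar naming, the content hash used to tie metadata to a document, and the function names are all hypothetical. The claims do not specify a storage format or an association mechanism.

```python
import hashlib
import json
from pathlib import Path


def build_section_metadata(document_text, sections):
    """Generate metadata from (label, start_line, end_line) tuples.

    The SHA-256 digest is an assumed way to tie the metadata to one
    specific version of the document text.
    """
    digest = hashlib.sha256(document_text.encode("utf-8")).hexdigest()
    return {
        "document_sha256": digest,
        "sections": [
            {"label": label, "start_line": start, "end_line": end}
            for label, start, end in sections
        ],
    }


def store_section_metadata(document_path, metadata):
    """Store metadata next to the document as a hypothetical sidecar file."""
    path = Path(document_path)
    sidecar = path.with_name(path.name + ".sections.json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar


def load_section_metadata(document_path):
    """Read the sidecar back, as a downstream document processor might."""
    path = Path(document_path)
    sidecar = path.with_name(path.name + ".sections.json")
    return json.loads(sidecar.read_text(encoding="utf-8"))
```

A sidecar file keeps the original document unmodified. Embedding the metadata in the document itself or keeping it in a database keyed by document identifier would be equally consistent with "in association with the electronic document".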
Specification