Method and system for information extraction and modeling
Abstract
Systems and methods for modeling information from a set of documents are disclosed. A tool allows a user to extract and model concepts of interest and relations among the concepts from a set of documents. The tool automatically configures a database of the model so that the model and extracted concepts from the documents may be customized, modified, and shared.
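As a rough illustration of a model whose database is configured automatically, the sketch below stores concepts and relations in memory and derives a small SQLite schema from them. The `Model` class, the `configure_database` function, and the table layout are all hypothetical names and assumptions made for this example; the abstract does not specify any of them.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical in-memory model: named concepts plus labeled relations."""
    concepts: set[str] = field(default_factory=set)
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def add_concept(self, name: str) -> None:
        self.concepts.add(name)

    def add_relation(self, source: str, label: str, target: str) -> None:
        # A relation may only connect concepts that already exist in the model.
        if source not in self.concepts or target not in self.concepts:
            raise ValueError("both endpoints must be existing concepts")
        self.relations.add((source, label, target))


def configure_database(model: Model) -> sqlite3.Connection:
    """Create and populate tables derived from the model (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE concept (name TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE relation ("
        "source TEXT REFERENCES concept(name), "
        "label TEXT, "
        "target TEXT REFERENCES concept(name))"
    )
    conn.executemany(
        "INSERT INTO concept VALUES (?)", [(c,) for c in sorted(model.concepts)]
    )
    conn.executemany(
        "INSERT INTO relation VALUES (?, ?, ?)", sorted(model.relations)
    )
    conn.commit()
    return conn


# Example: two concepts joined by one relation, then stored in the database.
model = Model()
model.add_concept("Person")
model.add_concept("Company")
model.add_relation("Person", "works_for", "Company")
conn = configure_database(model)
```

Because the tables are rebuilt from the model, a user who modifies the model can regenerate the database rather than editing the schema by hand.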
38 Claims
1. A method for visually modeling information sought from a set of documents implemented using a computer having a processor and a display, comprising:
identifying a set of documents;
applying a filter to the set of documents to produce raw text;
analyzing the raw text using a lexica module and a POS (part of speech) tagger by operation of the processor;
creating a set of POS (part of speech) tagged documents based on the analysis of the raw text, the set of POS (part of speech) tagged documents corresponding to the set of documents;
presenting the analysis of the raw text to a user;
creating a plurality of concepts based on the analysis of the raw text;
creating a visual model comprising visual elements corresponding to the plurality of concepts;
presenting the visual model to the user on the display;
enabling the user to add a new visual element to the visual model, the new visual element corresponding to a new concept;
enabling the user to add a new relation between visual elements in the visual model, the new relation between visual elements representing a new relation between concepts corresponding to the visual elements;
receiving a definition of a concept from the user via a selection of a visual model corresponding to the concept;
generating extractors, each extractor corresponding to one of the visual elements or the relations between the visual elements in the visual model;
based on a user selection of one of the visual elements or the relations, extracting a POS (part of speech) tagged document from the set of POS (part of speech) tagged documents using the corresponding extractor, the extracted POS (part of speech) tagged document containing information related to the concept corresponding to the selected visual element or the selected relation;
presenting the extracted POS (part of speech) tagged document to the user;
customizing the visual model based on user input in response to the extracted POS (part of speech) tagged document; and
exporting the customized model.
Dependent claims: 2, 3, 4, 5, 6
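The filtering, tagging, and extraction steps of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the toy `LEXICON`, the tag names, the regex-based filter, and the helper names `filter_to_raw_text`, `pos_tag`, and `make_extractor` are all hypothetical, and a real system would use a trained POS tagger rather than a lookup table.

```python
import re
from typing import Callable

# Hypothetical lexicon mapping lowercase words to POS tags (assumption).
LEXICON = {"the": "DT", "a": "DT", "acquired": "VBD", "hired": "VBD", "in": "IN"}

Tagged = list[tuple[str, str]]


def filter_to_raw_text(document: str) -> str:
    """Strip markup-like tags and collapse whitespace (stand-in for the filter)."""
    no_tags = re.sub(r"<[^>]+>", " ", document)
    return re.sub(r"\s+", " ", no_tags).strip()


def pos_tag(raw_text: str) -> Tagged:
    """Tag tokens by lexicon lookup; unknown words become NNP if capitalized, else NN."""
    tagged = []
    for token in re.findall(r"[A-Za-z]+", raw_text):
        tag = LEXICON.get(token.lower())
        if tag is None:
            tag = "NNP" if token[0].isupper() else "NN"
        tagged.append((token, tag))
    return tagged


def make_extractor(concept_terms: set[str]) -> Callable[[list[Tagged]], list[Tagged]]:
    """Build an extractor selecting tagged documents that mention any concept term."""
    terms = {t.lower() for t in concept_terms}

    def extract(tagged_docs: list[Tagged]) -> list[Tagged]:
        return [doc for doc in tagged_docs if any(w.lower() in terms for w, _ in doc)]

    return extract


# Identify documents, filter them to raw text, and build the tagged set.
documents = ["<p>Acme acquired Widgets</p>", "<p>The weather in Paris</p>"]
tagged_docs = [pos_tag(filter_to_raw_text(d)) for d in documents]

# One extractor per concept; selecting a concept runs its extractor.
extractors = {"Acquisition": make_extractor({"acquired"})}
hits = extractors["Acquisition"](tagged_docs)
```

Here selecting the hypothetical "Acquisition" concept returns only the first tagged document, which the user could then review before customizing the model.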
7. A method for visually modeling information sought from a set of documents implemented using a processor and a display, comprising:
identifying a set of documents;
applying a filter to the set of documents to produce raw text;
analyzing the raw text using a lexica module and a POS (part of speech) tagger by operation of the processor;
presenting the analysis of the raw text to a user;
creating a plurality of concepts based on the analysis of the raw text;
creating a visual model comprising visual elements corresponding to the plurality of concepts;
presenting the visual model to the user on the display;
enabling the user to add a new visual element to the visual model, the new visual element corresponding to a new concept;
enabling the user to add a new relation between visual elements in the visual model, the new relation between visual elements representing a new relation between concepts corresponding to the visual elements;
receiving a definition of a concept from the user via selection of a visual model corresponding to the concept;
generating extractors, each extractor corresponding to one of the visual elements or the relations between the visual elements in the visual model;
based on a user selection of one of the visual elements or the relations, extracting a document from the set of documents using the corresponding extractor, the extracted document containing information related to the concept corresponding to the selected visual element or the selected relation;
customizing the visual model based on user input in response to the extracted documents; and
exporting the customized model.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35
36. A system for visually modeling information sought from a set of documents, comprising:
a processor;
an identifying component configured to select a set of documents;
a filter component configured to apply a filter to the set of documents to produce raw text;
an analyzing component configured in the processor to analyze the raw text using a lexica module and a POS (part of speech) tagger;
a concept component configured to create a plurality of concepts based on the analysis of the raw text;
a visual model component configured in the processor to create a visual model comprising visual elements corresponding to the plurality of concepts;
a display configured to present the analysis of the raw text and the visual model to a user;
a graphical user interface configured to enable a user to add a new visual element to the visual model, the new visual element corresponding to a new concept;
the graphical user interface further configured to enable the user to add a new relation between the visual elements in the visual model, the new relation between visual elements representing a new relation between concepts corresponding to the visual elements;
a concept definition component configured to receive a definition of a concept from the user via a selection of a visual model corresponding to the concept;
a generation component configured in the processor to generate extractors, each extractor corresponding to one of the visual elements or the relations between the visual elements in the visual model;
an extraction component configured in the processor to extract a document from the set of documents using the corresponding extractor, based on a user selection of one of the visual elements or the relations, the extracted document containing information related to the concept corresponding to the selected visual element or the selected relation;
a customization component configured in the processor to customize the visual model based on user input in response to the extracted documents; and
an export component configured to export the customized model.
37. A system for visually modeling information sought from a set of documents, comprising:
means for identifying a set of documents;
means for applying a filter to the set of documents to produce raw text;
means for analyzing the raw text using a lexica module and a POS (part of speech) tagger;
means for presenting the analysis of the raw text to a user;
means for creating a plurality of concepts based on the analysis of the raw text;
means for creating a visual model comprising visual elements corresponding to the plurality of concepts;
means for presenting the visual model to the user;
means for enabling the user to add a new visual element to the visual model, the new visual element corresponding to a new concept;
means for enabling the user to add a new relation between visual elements in the visual model, the new relation between visual elements representing a new relation between concepts corresponding to the visual elements;
means for receiving a definition of a concept from the user via a selection of a visual element corresponding to the concept;
means for generating extractors, each extractor corresponding to one of the visual elements or the relations between the visual elements in the visual model;
means for, based on a user selection of one of the visual elements or the relations, extracting a document from the set of documents using the corresponding extractor, the extracted document containing information related to the concept corresponding to the selected visual element or the selected relation;
means for customizing the visual model based on user input in response to the extracted documents; and
means for exporting the customized model.
38. A computer-readable medium including instructions for performing a method for visually modeling information sought from a set of documents, the method comprising:
identifying a set of documents;
applying a filter to the set of documents to produce raw text;
analyzing the raw text using a lexica module and a POS (part of speech) tagger;
presenting the analysis of the raw text to a user;
creating a plurality of concepts based on the analysis of the raw text;
creating a visual model comprising visual elements corresponding to the plurality of concepts;
presenting the visual model to the user;
enabling the user to add a new visual element to the visual model, the new visual element corresponding to a new concept;
enabling the user to add a new relation between visual elements in the visual model, the new relation between visual elements representing a new relation between concepts corresponding to the visual elements;
receiving a definition of a concept from the user via a selection of a visual element corresponding to the concept;
generating extractors, each extractor corresponding to one of the visual elements or the relations between the visual elements in the visual model; and
based on a user selection of one of the visual elements or the relations, extracting a document from the set of documents using the corresponding extractor, the extracted document containing information related to the concept corresponding to the selected visual element or the selected relation, customizing the visual model based on user input in response to the extracted documents, and exporting the customized model.
Specification