Method and system for information extraction and modeling

US 7,890,533 B2
Filed: 05/17/2006
Issued: 02/15/2011
Est. Priority Date: 05/17/2006
Status: Active Grant

Abstract

Systems and methods for modeling information from a set of documents are disclosed. A tool allows a user to extract and model concepts of interest and relations among the concepts from a set of documents. The tool automatically configures a database of the model so that the model and extracted concepts from the documents may be customized, modified, and shared.
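
As a rough illustration of what "automatically configures a database of the model" could involve, the sketch below creates a small SQLite schema for concepts, relations, and extracted hits. The table names, column names, and sample values (`concept`, `relation`, `concept_hit`, "Company", "Person") are assumptions made for this example; the record does not describe the actual schema.

```python
import sqlite3

def configure_model_db(conn):
    """Create hypothetical tables for concepts, relations, and extracted hits."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS concept (
            id   INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE IF NOT EXISTS relation (
            id        INTEGER PRIMARY KEY,
            name      TEXT NOT NULL,
            source_id INTEGER NOT NULL REFERENCES concept(id),
            target_id INTEGER NOT NULL REFERENCES concept(id)
        );
        CREATE TABLE IF NOT EXISTS concept_hit (
            concept_id   INTEGER NOT NULL REFERENCES concept(id),
            document     TEXT NOT NULL,
            matched_text TEXT NOT NULL
        );
    """)

def add_concept(conn, name):
    """Insert a concept and return its row id."""
    cur = conn.execute("INSERT INTO concept (name) VALUES (?)", (name,))
    return cur.lastrowid

# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
configure_model_db(conn)

company = add_concept(conn, "Company")
person = add_concept(conn, "Person")
conn.execute(
    "INSERT INTO relation (name, source_id, target_id) VALUES (?, ?, ?)",
    ("employs", company, person),
)
conn.execute(
    "INSERT INTO concept_hit VALUES (?, ?, ?)",
    (company, "doc1.txt", "Acme"),
)

# Join hits back to concept names, as a shared model might be queried.
rows = conn.execute(
    "SELECT c.name, h.document, h.matched_text "
    "FROM concept_hit h JOIN concept c ON c.id = h.concept_id"
).fetchall()
```

Keeping concepts, relations, and hits in separate tables means the model can be modified or shared without touching the extracted results, which is one plausible reading of the abstract.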

38 Claims

1. A method for visually modeling information sought from a set of documents implemented using a computer having a processor and a display, comprising:
- identifying a set of documents;
  
  applying a filter to the set of documents to produce raw text;
  
  analyzing the raw text using a lexica module and a POS (part of speech) tagger by operation of the processor;
  
  creating a set of POS (part of speech) tagged documents based on the analysis of the raw text, the set of POS (part of speech) tagged documents corresponding to the set of documents;
  
  presenting the analysis of the raw text to a user;
  
  creating a plurality of concepts based on the analysis of the raw text;
  
  creating a visual model comprising visual elements corresponding to the plurality of concepts;
  
  presenting the visual model to the user on the display;
  
  enabling the user to add a new visual element to the visual model, the new visual element corresponding to a new concept;
  
  enabling the user to add a new relation between visual elements in the visual model, the new relation between visual elements representing a new relation between concepts corresponding to the visual elements;
  
  receiving a definition of a concept from the user via a selection of a visual model corresponding to the concept;
  
  generating extractors, each extractor corresponding to one of the visual elements or the relations between the visual elements in the visual model;
  
  based on a user selection of one of the visual elements or the relations, extracting a POS (part of speech) tagged document from the set of POS (part of speech) tagged documents using the corresponding extractor, the extracted POS (part of speech) tagged document containing information related to the concept corresponding to the selected visual element or the selected relation;
  
  presenting the extracted POS (part of speech) tagged document to the user;
  
  customizing the visual model based on user input in response to the extracted POS (part of speech) tagged document; and
  
  exporting the customized model.
  - 2. The method of claim 1, wherein the extractors are defined by the user.
  - 3. The method of claim 1, wherein the extractors are assigned automatically.
  - 4. The method of claim 1, further comprising:
    - extracting a POS (part of speech) tagged document from the set of POS (part of speech) tagged documents according to the customized model.
  - 5. The method of claim 1, wherein customizing comprises:
    - identifying the visual elements related to the set of documents based on a lexica module;
      
      receiving a selection of the visual elements from the user; and
      
      updating the visual model according to the selection.
  - 6. The method of claim 1, wherein customizing comprises:
    - associating a unique identifier selected from the group consisting of a color, a font, and a shape with one of the elements of the visual display.
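
The first steps of claim 1 (filter, lexica and POS analysis, concept creation, extractor generation, extraction) can be sketched roughly as below. This is a toy under stated assumptions, not the patented implementation: `LEXICA` and `POS_LOOKUP` are hypothetical lookup tables standing in for the lexica module and the POS tagger, the filter is assumed to be simple markup stripping, and the sample documents are invented.

```python
import re

# Hypothetical lexica module: maps surface terms to concept names.
LEXICA = {"acme": "Company", "globex": "Company", "acquired": "Acquisition"}

# Toy POS lookup table; a real system would use a trained tagger.
POS_LOOKUP = {"acme": "NNP", "globex": "NNP", "acquired": "VBD",
              "the": "DT", "in": "IN", "was": "VBD"}

def apply_filter(document):
    """Strip markup tags and collapse whitespace to produce raw text."""
    text = re.sub(r"<[^>]+>", " ", document)
    return re.sub(r"\s+", " ", text).strip()

def pos_tag(raw_text):
    """Return (token, tag) pairs; unknown words default to NN, digits to CD."""
    tagged = []
    for tok in re.findall(r"[A-Za-z0-9]+", raw_text):
        if tok.isdigit():
            tagged.append((tok, "CD"))
        else:
            tagged.append((tok, POS_LOOKUP.get(tok.lower(), "NN")))
    return tagged

def find_concepts(tagged_docs):
    """Collect every lexica concept that occurs in the tagged documents."""
    found = set()
    for tagged in tagged_docs.values():
        for tok, _tag in tagged:
            concept = LEXICA.get(tok.lower())
            if concept:
                found.add(concept)
    return found

def make_extractor(concept):
    """Build an extractor that tests whether a tagged document mentions the concept."""
    def extractor(tagged):
        return any(LEXICA.get(tok.lower()) == concept for tok, _tag in tagged)
    return extractor

documents = {
    "a.html": "<p>Acme acquired Globex in 2005.</p>",
    "b.html": "<p>The weather was mild.</p>",
}

# One POS-tagged document per source document.
tagged_docs = {name: pos_tag(apply_filter(doc)) for name, doc in documents.items()}
concepts = find_concepts(tagged_docs)
extractors = {c: make_extractor(c) for c in concepts}

# Simulate the user selecting the "Acquisition" element in the visual model.
selected = "Acquisition"
extracted = [name for name, tagged in tagged_docs.items()
             if extractors[selected](tagged)]
```

Here each extractor is a closure keyed by concept name, so selecting a visual element reduces to a dictionary lookup followed by a scan of the tagged documents.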

7. A method for visually modeling information sought from a set of documents implemented using a processor and a display, comprising:
- identifying a set of documents;
  
  applying a filter to the set of documents to produce raw text;
  
  analyzing the raw text using a lexica module and a POS (part of speech) tagger by operation of the processor;
  
  presenting the analysis of the raw text to a user;
  
  creating a plurality of concepts based on the analysis of the raw text;
  
  creating a visual model comprising visual elements corresponding to the plurality of concepts;
  
  presenting the visual model to the user on the display;
  
  enabling the user to add a new visual element to the visual model, the new visual element corresponding to a new concept;
  
  enabling the user to add a new relation between visual elements in the visual model, the new relation between visual elements representing a new relation between concepts corresponding to the visual elements;
  
  receiving a definition of a concept from the user via selection of a visual model corresponding to the concept;
  
  generating extractors, each extractor corresponding to one of the visual elements or the relations between the visual elements in the visual model;
  
  based on a user selection of one of the visual elements or the relations, extracting a document from the set of documents using the corresponding extractor, the extracted document containing information related to the concept corresponding to the selected visual element or the selected relation;
  
  customizing the visual model based on user input in response to the extracted documents; and
  
  exporting the customized model.
  - 8. The method of claim 7, wherein the extractors are defined by the user.
  - 9. The method of claim 7, wherein the extractors are assigned automatically.
  - 10. The method of claim 7, wherein analyzing the raw text using a lexica module and a POS (part of speech) tagger comprises:
    - searching the raw text to identify grammatical parts of speech and lexica.
  - 11. The method of claim 7, wherein the visual elements include:
    - a synonym for the associated concept.
  - 12. The method of claim 7, wherein creating a plurality of concepts based on the analysis of the raw text further comprises:
    - creating a data structure of the plurality of concepts in a database; and
      
      updating the data structure to indicate which documents in the set of documents include at least one of the plurality of concepts.
  - 13. The method of claim 7, further comprising:
    - creating a POS (part of speech) tagged document based on the analysis of the raw text, the POS (part of speech) tagged document corresponding to the extracted document; and
      
      marking the POS (part of speech) tagged document based on the information related to the concept corresponding to the selected visual element or the selected relation.
  - 14. The method of claim 7, wherein the extracted document comprises a subset of the set of documents.
  - 15. The method of claim 7, wherein the extracted document is presented in a list of documents.
  - 16. The method of claim 7, further comprising:
    - displaying the visual model using an entity-relationship diagram, wherein the visual elements are depicted as entities, and the relations between the visual elements are depicted as relations between the entities.
  - 17. The method of claim 7, further comprising:
    - receiving a color specified by the user for each of the visual elements; and
      
      adding the specified colors to the visual elements.
  - 18. The method of claim 17, further comprising:
    - displaying the information related to the concept corresponding to the selected visual element using the specified colors for each of the visual elements.
  - 19. The method of claim 7, further comprising:
    - exporting the set of documents together with the customized model.
  - 20. The method of claim 7, further comprising:
    - exporting the set of documents together with the customized model using formats that facilitate at least one of sale, exchange, and reuse of the customized model with matching sets of documents.
  - 21. The method of claim 7, further comprising:
    - extracting a document from the set of documents according to the customized model, the extracted document containing information related to the concept corresponding to the selected visual element.
  - 22. The method of claim 7, further comprising:
    - displaying the model using one of: a document, a graph, a table, a map, a spreadsheet, and a chart.
  - 23. The method of claim 7, further comprising:
    - modifying a document in the set of documents.
  - 24. The method of claim 7, further comprising:
    - receiving a series of user inputs to extract a document from the set of documents using at least two visual elements in the visual model and the corresponding extractors; and
      
      creating a new concept, wherein the new concept is based on the series of user inputs.
  - 25. The method of claim 24, further comprising:
    - updating the visual model to include a visual element that represents the new concept.
  - 26. The method of claim 25, further comprising:
    - automatically assigning a new extractor to the visual element of the new concept based on the series of user inputs; and
      
      automatically updating a database based on the new extractor.
  - 27. The method of claim 24, wherein the new concept is created by the user after extracting a document from the set of documents.
  - 28. The method of claim 24, further comprising:
    - assigning an extractor to an existing concept based on the series of user inputs.
  - 29. The method of claim 24, wherein the series of user inputs include a user selection of text to retrieve from the set of documents.
  - 30. The method of claim 7, further comprising:
    - processing the extracted document to create post-extraction information.
  - 31. The method of claim 30, wherein processing the extracted document to create post-extraction information includes:
    - aggregating the extracted document into one of: a document, a graph, a table, a map, a spreadsheet, and a chart.
  - 32. The method of claim 30, wherein processing the extracted document to create post-extraction information includes:
    - receiving a user selection of categories for the extracted document.
  - 33. The method of claim 30, wherein processing the extracted document to create post-extraction information includes:
    - assigning a status to the extracted document.
  - 34. The method of claim 7, further comprising:
    - presenting the extracted document to the user; and
      
      marking the information related to the concept contained in the extracted document with an adjustable indicator to indicate to the user where the information is located in the document.
  - 35. The method of claim 34, wherein the adjustable indicator comprises at least one of color, underlining, and font change.
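
Three of the dependent claims above lend themselves to a short sketch: the concept-to-document data structure of claim 12, deriving a new concept from two existing ones as in claim 24, and marking matches with an adjustable indicator as in claims 34 and 35. The regular-expression extractors, the bracket markers, and the sample documents are assumptions for illustration only; the claims do not prescribe any of them.

```python
import re

# Hypothetical extractors: each concept name maps to a regular expression.
extractors = {
    "Company": re.compile(r"\b(Acme|Globex)\b"),
    "Acquisition": re.compile(r"\bacquired\b", re.IGNORECASE),
}

documents = {
    "doc1": "Acme acquired Globex in 2005.",
    "doc2": "Globex opened a new office.",
}

def build_concept_index(docs, extractors):
    """Record which documents contain each concept (cf. claim 12)."""
    index = {name: set() for name in extractors}
    for doc_id, text in docs.items():
        for name, pattern in extractors.items():
            if pattern.search(text):
                index[name].add(doc_id)
    return index

def combine(index, first, second, new_name):
    """Derive a new concept from two selected ones (cf. claim 24).

    One possible reading: the new concept covers documents in which
    both selected concepts occur.
    """
    index[new_name] = index[first] & index[second]
    return index[new_name]

def mark(text, pattern, left="[[", right="]]"):
    """Wrap each match in an adjustable indicator (cf. claims 34 and 35)."""
    return pattern.sub(lambda m: f"{left}{m.group(0)}{right}", text)

index = build_concept_index(documents, extractors)
merger_docs = combine(index, "Company", "Acquisition", "Merger")
marked = mark(documents["doc1"], extractors["Company"])
```

The `left` and `right` arguments stand in for the color, underlining, or font change named in claim 35; a display layer would swap them for styling markup.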

36. A system for visually modeling information sought from a set of documents, comprising:
- a processor;
  
  an identifying component configured to select a set of documents;
  
  a filter component configured to apply a filter to the set of documents to produce raw text;
  
  an analyzing component configured in the processor to analyze the raw text using a lexica module and a POS (part of speech) tagger;
  
  a concept component configured to create a plurality of concepts based on the analysis of the raw text;
  
  a visual model component configured in the processor to create a visual model comprising visual elements corresponding to the plurality of concepts;
  
  a display configured to present the analysis of the raw text and the visual model to a user;
  
  a graphical user interface configured to enable a user to add a new visual element to the visual model, the new visual element corresponding to a new concept;
  
  the graphical user interface further configured to enable the user to add a new relation between the visual elements in the visual model, the new relation between visual elements representing a new relation between concepts corresponding to the visual elements;
  
  a concept definition component configured to receive a definition of a concept from the user via a selection of a visual model corresponding to the concept;
  
  a generation component configured in the processor to generate extractors, each extractor corresponding to one of the visual elements or the relations between the visual elements in the visual model;
  
  an extraction component configured in the processor to extract a document from the set of documents using the corresponding extractor, based on a user selection of one of the visual elements or the relations, the extracted document containing information related to the concept corresponding to the selected visual element or the selected relation;
  
  a customization component configured in the processor to customize the visual model based on user input in response to the extracted documents; and
  
  an export component configured to export the customized model.
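
The visual model and export components of claim 36 can be pictured as a small entity-relationship graph that serializes itself. The sketch below is a minimal illustration under assumptions: the class names (`VisualElement`, `Relation`, `VisualModel`), the `color` attribute, and JSON as the export format are choices made for this example and are not specified by the claim.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class VisualElement:
    """A node in the visual model; corresponds to one concept."""
    concept: str
    color: str = "black"

@dataclass
class Relation:
    """A named edge between two visual elements."""
    name: str
    source: str
    target: str

@dataclass
class VisualModel:
    elements: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)

    def add_element(self, concept, color="black"):
        """Add a new visual element for a new concept."""
        self.elements[concept] = VisualElement(concept, color)

    def add_relation(self, name, source, target):
        """Add a relation; both endpoints must already exist in the model."""
        if source not in self.elements or target not in self.elements:
            raise KeyError("both endpoints must already be in the model")
        self.relations.append(Relation(name, source, target))

    def export(self):
        """Serialize the model to JSON (an assumed export format)."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)

model = VisualModel()
model.add_element("Company", color="blue")
model.add_element("Person")
model.add_relation("employs", "Company", "Person")
exported = model.export()
```

Rejecting relations whose endpoints are missing keeps the exported model self-consistent, so a recipient can rebuild the diagram from the JSON alone.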

37. A system for visually modeling information sought from a set of documents, comprising:
- means for identifying a set of documents;
  
  means for applying a filter to the set of documents to produce raw text;
  
  means for analyzing the raw text using a lexica module and a POS (part of speech) tagger;
  
  means for presenting the analysis of the raw text to a user;
  
  means for creating a plurality of concepts based on the analysis of the raw text;
  
  means for creating a visual model comprising visual elements corresponding to the plurality of concepts;
  
  means for presenting the visual model to the user;
  
  means for enabling the user to add a new visual element to the visual model, the new visual element corresponding to a new concept;
  
  means for enabling the user to add a new relation between visual elements in the visual model, the new relation between visual elements representing a new relation between concepts corresponding to the visual elements;
  
  means for receiving a definition of a concept from the user via a selection of a visual element corresponding to the concept;
  
  means for generating extractors, each extractor corresponding to one of the visual elements or the relations between the visual elements in the visual model;
  
  means for, based on a user selection of one of the visual elements or the relations, extracting a document from the set of documents using the corresponding extractor, the extracted document containing information related to the concept corresponding to the selected visual element or the selected relation;
  
  means for customizing the visual model based on user input in response to the extracted documents; and
  
  means for exporting the customized model.

38. A computer-readable medium including instructions for performing a method for visually modeling information sought from a set of documents, the method comprising:
- identifying a set of documents;
  
  applying a filter to the set of documents to produce raw text;
  
  analyzing the raw text using a lexica module and a POS (part of speech) tagger;
  
  presenting the analysis of the raw text to a user;
  
  creating a plurality of concepts based on the analysis of the raw text;
  
  creating a visual model comprising visual elements corresponding to the plurality of concepts;
  
  presenting the visual model to the user;
  
  enabling the user to add a new visual element to the visual model, the new visual element corresponding to a new concept;
  
  enabling the user to add a new relation between visual elements in the visual model, the new relation between visual elements representing a new relation between concepts corresponding to the visual elements;
  
  receiving a definition of a concept from the user via a selection of a visual element corresponding to the concept;
  
  generating extractors, each extractor corresponding to one of the visual elements or the relations between the visual elements in the visual model;
  
  based on a user selection of one of the visual elements or the relations, extracting a document from the set of documents using the corresponding extractor, the extracted document containing information related to the concept corresponding to the selected visual element or the selected relation;
  
  customizing the visual model based on user input in response to the extracted documents; and
  
  exporting the customized model.

Current Assignee: Noblis, Inc.
Original Assignee: Noblis, Inc.
Inventors: Pollara, Victor J.
Primary Examiner(s): Ali; Mohammad
Assistant Examiner(s): Ruiz; Angelica
Application Number: US11/434,847
Publication Number: US 20100169299A1
Time in Patent Office: 1,735 Days
Field of Search: None
US Class Current: 707/790

CPC Class Codes

G06F 16/353   into predefined classes

G06F 40/247   Thesauruses; Synonyms

G06F 40/284   Lexical analysis, e.g. toke...

Y10S 707/99931   Database or file accessing
