Method and system for maximum-informativeness information extraction using a domain-specific ontology

US 8,176,048 B2
Filed: 11/10/2009
Issued: 05/08/2012
Est. Priority Date: 11/10/2009
Status: Active Grant



Abstract

A method transforms unstructured text into structured data in a domain-specific ontology. The method includes recording an input block of text using an information extraction module (IEM), accessing a domain-specific ontology and supplemental data in a knowledge source(s) via the IEM, processing the input text block, and using the IEM to generate a plurality of nodes in the domain-specific ontology. Each node classifies the unstructured text to corresponding objects of interest, thereby transforming the unstructured text into the structured data. An IEM is also provided having a computer device and an algorithm executable thereby to transform unstructured text into structured data in a domain-specific ontology. The IEM is adapted for recording a text phrase using the computer device, accessing and retrieving the domain-specific ontology and supplemental data from a knowledge source(s), and processing the text block using the computer device to generate a plurality of nodes in the domain-specific ontology.
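
As an illustration of the node structure the abstract describes, the following is a minimal sketch in which each ontology node stores one concept as a cluster of synonyms and a phrase is classified by cluster membership. The names `OntologyNode` and `classify`, and the sample automotive terms, are assumptions made for this example and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class OntologyNode:
    """One concept, represented as a cluster of synonyms (hypothetical layout)."""
    concept: str
    synonyms: Set[str] = field(default_factory=set)


def classify(phrase: str, nodes: List[OntologyNode]) -> Optional[str]:
    """Return the concept whose synonym cluster contains the phrase, else None."""
    key = phrase.strip().lower()
    for node in nodes:
        if key in node.synonyms:
            return node.concept
    return None


# Illustrative data only; a real ontology would be built from knowledge sources.
nodes = [
    OntologyNode("brake_pad", {"brake pad", "brake lining"}),
    OntologyNode("noise", {"squeal", "squeak", "grinding noise"}),
]

print(classify("Brake Lining", nodes))
print(classify("tire", nodes))
```

Exact-match lookup is the simplest possible choice here; the claimed method additionally maps sub-phrases and filters them by informativeness.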


16 Claims

1. A method for transforming unstructured text into structured data using a domain-specific ontology, the method comprising:
- recording the unstructured text using an information extraction module (IEM);
- discovering text phrases contained in the recorded unstructured text via the IEM;
- retrieving lexical data from a knowledge source using the IEM and the discovered text phrases;
- processing each of the discovered text phrases and the lexical data using the IEM to thereby generate a plurality of nodes in the domain-specific ontology, wherein each of the plurality of nodes represents a corresponding single concept as a cluster of synonyms;
- using the plurality of generated nodes to classify the discovered text phrases by corresponding objects of interest, thereby transforming the unstructured text into structured data;
- generating a list of sub-phrases of the discovered text phrases;
- mapping each sub-phrase in the generated list of sub-phrases into the domain-specific ontology via the IEM;
- using an informativeness function to quantify each of the sub-phrases of the discovered text phrases by a normalized relative importance informativeness score of between 0 and 1; and
- eliminating all sub-phrases from the domain-specific ontology that have an informativeness score that is less than a calibrated threshold.

2. The method of claim 1, further comprising:
- using the informativeness function to disambiguate different matches in the domain-specific ontology.

3. The method of claim 1, further comprising:
- automatically expanding a list of the synonyms in the cluster of synonyms using a thesaurus.

4. The method of claim 1, further comprising:
- using an un-annotated domain-specific text archive to process the unstructured text into a new block of text;
- automatically classifying phrases near the unstructured text; and
- using the classified phrases to infer a classification of the unstructured text.

5. The method of claim 1, further comprising:
- comparing a name of each node in the domain-specific ontology with all other nodes in the domain-specific ontology to thereby identify logical inconsistencies and linguistic inconsistencies.

6. The method of claim 1, further comprising:
- automatically cleaning the unstructured text by at least one of: splitting joined words in the unstructured text, joining split words in the unstructured text, expanding an abbreviation in the unstructured text, executing a spell check process on the unstructured text, removing from the unstructured text any words lacking a domain-specific meaning, and stemming words or phrases in the unstructured text using a stemmer program.
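
The sub-phrase steps of claim 1 (generate sub-phrases, score each between 0 and 1, drop those below a threshold) can be sketched as follows. The claims do not define the informativeness function, so the score used here, a mean inverse document frequency normalized by its maximum, is purely an assumption for illustration. The function names, the `doc_freq` table, and the threshold value are likewise hypothetical.

```python
import math


def sub_phrases(phrase):
    """Return every contiguous word sequence in the phrase."""
    words = phrase.lower().split()
    return [" ".join(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, len(words) + 1)]


def informativeness(sub, doc_freq, n_docs):
    """Assumed score in [0, 1]: mean smoothed IDF of the words over the max IDF.

    A word found in every document scores 0; an unseen word scores 1.
    """
    max_idf = math.log(n_docs + 1)
    idf = [math.log((n_docs + 1) / (doc_freq.get(w, 0) + 1))
           for w in sub.split()]
    return sum(idf) / (len(idf) * max_idf)


def prune(phrase, doc_freq, n_docs, threshold):
    """Keep only sub-phrases whose score is not less than the threshold."""
    return [s for s in sub_phrases(phrase)
            if informativeness(s, doc_freq, n_docs) >= threshold]


# Illustrative document frequencies from a hypothetical archive of 4 documents.
doc_freq = {"the": 4, "brake": 1, "pad": 1}
kept = prune("the brake pad", doc_freq, n_docs=4, threshold=0.5)
print(kept)
```

In this toy setup, sub-phrases containing the ubiquitous word "the" score lower than "brake pad" alone, which is the kind of behavior a calibrated threshold is meant to exploit.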

7. A method for transforming unstructured text into structured data using a domain-specific ontology, the method comprising:
- inputting the unstructured text into an information extraction module (IEM);
- embedding the unstructured text in a domain-specific text archive;
- retrieving data from a plurality of different knowledge sources, including the domain-specific text archive, via the IEM using the unstructured text;
- processing the unstructured text using the IEM based on the retrieved data to thereby generate a plurality of nodes in the domain-specific ontology, wherein each of the nodes represents a corresponding single concept as a cluster of synonyms for the unstructured text;
- transforming the unstructured text into the structured data via the plurality of nodes, including classifying the unstructured text to predetermined corresponding objects of interest;
- quantifying all sub-phrases of the classified unstructured text by relative informativeness as a normalized value between 0 and 1 using an informativeness function; and
- eliminating all quantified sub-phrases from the domain-specific ontology having a normalized value that is less than a calibrated threshold.

8. The method of claim 7, further comprising:
- using the informativeness function to disambiguate different matches in the domain-specific ontology.

9. The method of claim 7, further comprising:
- automatically expanding the list of synonyms in the cluster of synonyms using the IEM and a thesaurus.

10. The method of claim 7, further comprising:
- automatically classifying phrases near the unstructured text; and
- using the classified phrases to infer a classification of the unstructured text.

11. The method of claim 7, further comprising:
- comparing a name of each node in the domain-specific ontology with all other nodes in the domain-specific ontology to thereby identify logical inconsistencies and linguistic inconsistencies.
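
Claim 10 infers a classification for unclassified text from phrases classified nearby. One simple way to picture this is a majority vote over neighboring labels, sketched below. The voting rule, the `window` parameter, and the function name are assumptions for illustration; the claims do not specify how the inference is made.

```python
from collections import Counter


def infer_from_neighbors(labels, index, window=2):
    """Return the most common label within `window` positions of `index`.

    `labels` holds one label per phrase in archive order, with None for
    phrases that have not been classified. Returns None if no neighbor
    within the window carries a label.
    """
    lo = max(0, index - window)
    hi = min(len(labels), index + window + 1)
    votes = Counter(
        label
        for i, label in enumerate(labels[lo:hi], start=lo)
        if i != index and label is not None
    )
    return votes.most_common(1)[0][0] if votes else None


# Illustrative labels for five consecutive phrases; two are unclassified.
labels = ["brakes", "brakes", None, "engine", None]
print(infer_from_neighbors(labels, 2))
print(infer_from_neighbors(labels, 4, window=1))
```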

12. An information extraction module (IEM) comprising:
- a computer device; and
- an algorithm recorded in memory of the computer device and executable by the computer device, wherein the computer device executes the algorithm to thereby:
  - receive and record an input block of unstructured text;
  - discover text phrases contained in the received input block of unstructured text;
  - access and retrieve lexical data from a knowledge source using the discovered text phrases;
  - process the discovered text phrases and the retrieved lexical data to thereby generate a plurality of nodes in the domain-specific ontology;
  - map sub-phrases of the discovered text phrases into the domain-specific ontology;
  - use an informativeness function to quantify the sub-phrases by their relative informativeness as a normalized value between 0 and 1; and
  - eliminate all quantified sub-phrases from the domain-specific ontology having a normalized value that is less than a calibrated threshold;
- wherein each of the plurality of nodes classifies the unstructured text to predetermined corresponding objects of interest, thereby transforming the unstructured text into the structured data.

13. The IEM of claim 12, wherein the IEM uses the informativeness function to disambiguate different matches in the domain-specific ontology.

14. The IEM of claim 12, wherein the IEM automatically expands a list of synonyms using a general thesaurus.

15. The IEM of claim 12, wherein the IEM automatically classifies phrases near the unstructured text, and uses the classified phrases to infer a classification of the unstructured text.

16. The IEM of claim 12, wherein the IEM compares a name of each node in the domain-specific ontology with all other nodes in the domain-specific ontology to thereby identify logical inconsistencies and linguistic inconsistencies.
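
The pairwise name comparison in claim 16 can be pictured with a small sketch that flags node names differing only in case, punctuation, or a trailing plural "s". This covers only one kind of linguistic inconsistency. The normalization rule and the function names are assumptions for this example, not the patent's method.

```python
import re
from itertools import combinations


def normalize(name):
    """Lowercase, drop punctuation, and strip a naive plural 's' from each token."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t[:-1] if t.endswith("s") and len(t) > 3 else t
                    for t in tokens)


def name_conflicts(names):
    """Compare every node name with every other; return near-duplicate pairs."""
    return [(a, b) for a, b in combinations(names, 2)
            if a != b and normalize(a) == normalize(b)]


# Illustrative node names only.
names = ["Brake Pad", "brake-pads", "Rotor", "Engine Mount"]
print(name_conflicts(names))
```

The naive plural rule will misfire on words such as "chassis"; a stemmer of the kind mentioned in claim 6 would be a more robust substitute.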


Current Assignee: GM Global Technology Operations LLC (General Motors Company)
Original Assignee: GM Global Technology Operations LLC (General Motors Company)
Inventors: Morgan, Alexander P.
Primary Examiner(s): Pham, Hung Q
Application Number: US12/615,463
Publication Number: US 20110113069A1
Time in Patent Office: 910 Days
Field of Search: None
US Class Current: 707/736
CPC Class Codes:
G06F 40/247 Thesauruses; Synonyms
G06F 40/289 Phrasal analysis, e.g. fini...
