Method and apparatus for extracting entity names and their relations

US 9,430,742 B2
Filed: 06/02/2014
Issued: 08/30/2016
Est. Priority Date: 09/28/2000
Status: Expired due to Fees

Abstract

According to one embodiment of the invention, a method includes generating a person-name Information Gain (IG)-Tree and a relation IG-Tree from annotated data. The method also includes tagging and partial parsing of an input document. The names of the persons are extracted within the input document using the person-name IG-tree. Additionally, names of organizations are extracted within the input document. The method also includes extracting entity names that are not names of persons and organizations within the input document. Further, the relations between the identified entity names are extracted using the relation-IG-tree.
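
The abstract does not say how an IG-Tree is built. As a rough illustration only, the sketch below assumes the general IGTree idea from memory-based learning: rank features by information gain, then compress the annotated instances into a tree that tests the most informative feature first and stores a default class at every node. The function names, the three toy features (previous word, capitalization, next word), and the `PERSON`/`O` labels are all hypothetical and are not taken from the patent.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(instances, labels, feature_index):
    """Entropy reduction obtained by splitting on one feature."""
    partitions = defaultdict(list)
    for features, label in zip(instances, labels):
        partitions[features[feature_index]].append(label)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

def build_ig_tree(instances, labels, order):
    """Build a tree that tests features in the given order.

    Each node keeps the most frequent class seen below it as a default.
    """
    node = {"default": Counter(labels).most_common(1)[0][0], "children": {}}
    if not order or len(set(labels)) == 1:
        return node  # nothing left to test, or the class is already unambiguous
    feature, rest = order[0], order[1:]
    groups = defaultdict(list)
    for features, label in zip(instances, labels):
        groups[features[feature]].append((features, label))
    for value, members in groups.items():
        node["children"][value] = build_ig_tree(
            [f for f, _ in members], [l for _, l in members], rest
        )
    return node

def train(instances, labels):
    """Order features by descending information gain, then build the tree."""
    order = sorted(
        range(len(instances[0])),
        key=lambda i: information_gain(instances, labels, i),
        reverse=True,
    )
    return build_ig_tree(instances, labels, order), order

def classify(tree, order, features):
    """Follow matching arcs; on the first mismatch, return the last default."""
    node = tree
    for feature in order:
        child = node["children"].get(features[feature])
        if child is None:
            break
        node = child
    return node["default"]

# Hypothetical annotated instances: (previous word, capitalization, next word).
instances = [
    ("Mr.", "cap", "said"),
    ("Dr.", "cap", "joined"),
    ("the", "lower", "said"),
    ("the", "cap", "Corp."),
    ("Mr.", "cap", "joined"),
]
labels = ["PERSON", "PERSON", "O", "O", "PERSON"]

person_tree, feature_order = train(instances, labels)
```

With this toy data, `classify(person_tree, feature_order, ("Mr.", "cap", "visited"))` follows the `Mr.` arc and stops there, because the unseen next word has no matching arc. Falling back to the stored default is what lets such a tree label inputs it has never seen in full.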

12 Claims

1. A method comprising:
- generating a number of Information-Gain (IG)-Trees based on a memory learning technique and training sets relating to a document, wherein the training sets are based on global context features, wherein a global context feature offers a broader view of a word or a word sequence with regard to an entirety of the document;
  
  extracting entity names and relations between entity names based on the IG-Trees;
  
  receiving annotated data;
  
  parsing, at least partially, the annotated data, wherein parsing includes identifying syntactic structure of sentences within the annotated data; and
  
  extracting the training sets from the parsed annotated data, wherein the training sets are further based on features including one or more of local context features, surface linguistic features, and deep linguistic features, wherein the global context feature, when included in an IG-Tree, further offers a set of first verbs in a same sentence that appear before or after the word or the word sequence for the entirety of the document or corpus.
- Dependent claims (2, 3, 4)
  - 2. The method of claim 1, wherein the number of IG-Trees is generated based on raw data that has been annotated.
  - 3. The method of claim 1, wherein the number of IG-Trees is generated based on a number of features of the annotated data.
  - 4. The method of claim 1, wherein the number of IG-Trees is selected from a group consisting of a person-name IG-Tree, an entity-name IG-Tree, a noun phrase IG-Tree and a relation IG-Tree.
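
Claim 1 describes a global context feature that offers the set of first verbs appearing before or after a word, in the same sentence, across the whole document. One possible reading can be sketched as follows. This is an assumption-laden illustration, not the claimed implementation: it treats "first verb" as the nearest verb on each side of the word, assumes the document is already tokenized and tagged with Penn Treebank style tags (verbs start with `VB`), and handles only a single-token target rather than a word sequence. The function name and sample sentences are hypothetical.

```python
def first_verbs_around(tagged_sentences, target):
    """Collect, over all sentences, the nearest verb before and after `target`.

    `tagged_sentences` is a list of sentences, each a list of (word, tag) pairs.
    Returns a dict with two sets of lowercased verbs: "before" and "after".
    """
    before, after = set(), set()
    for sentence in tagged_sentences:
        for i, (word, _tag) in enumerate(sentence):
            if word != target:
                continue
            # Scan leftwards from the target for the closest verb.
            for w, t in reversed(sentence[:i]):
                if t.startswith("VB"):
                    before.add(w.lower())
                    break
            # Scan rightwards from the target for the closest verb.
            for w, t in sentence[i + 1:]:
                if t.startswith("VB"):
                    after.add(w.lower())
                    break
    return {"before": before, "after": after}

# Hypothetical three-sentence document, already tagged.
document = [
    [("Intel", "NNP"), ("hired", "VBD"), ("Zhang", "NNP"), ("yesterday", "NN")],
    [("Zhang", "NNP"), ("leads", "VBZ"), ("the", "DT"), ("team", "NN")],
    [("Analysts", "NNS"), ("say", "VBP"), ("Zhang", "NNP"),
     ("will", "MD"), ("join", "VB"), ("Intel", "NNP")],
]

zhang_context = first_verbs_around(document, "Zhang")
```

Because the sets are pooled over every sentence that mentions the word, the feature value is the same at each occurrence, which is what distinguishes it from a local context feature computed from a single sentence.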

5. A non-transitory machine-readable medium comprising instructions which, when executed by a machine, cause the machine to:
- generate a number of Information-Gain (IG)-Trees based on a memory-learning technique and training sets relating to a document, wherein the training sets are based on global context features, wherein a global context feature offers a broader view of a word or a word sequence with regard to an entirety of the document;
  
  extract entity names and relations between entity names based on the IG-Trees;
  
  receive annotated data;
  
  parse, at least partially, the annotated data, wherein parsing includes identifying syntactic structure of sentences within the annotated data; and
  
  extract the training sets from the parsed annotated data, wherein the training sets are further based on features including one or more of local context features, surface linguistic features, and deep linguistic features, wherein the global context feature, when included in an IG-Tree, further offers a set of first verbs in a same sentence that appear before or after the word or the word sequence for the entirety of the document or corpus.
- Dependent claims (6, 7, 8)
  - 6. The non-transitory machine-readable medium of claim 5, wherein the number of IG-Trees is generated based on raw data that has been annotated.
  - 7. The non-transitory machine-readable medium of claim 5, wherein the number of IG-Trees is generated based on a number of features of the annotated data.
  - 8. The non-transitory machine-readable medium of claim 5, wherein the number of IG-Trees is selected from a group consisting of a person-name IG-Tree, an entity-name IG-Tree, a noun phrase IG-Tree and a relation IG-Tree.

9. A system having a memory to store instructions, and a processing device to execute the instructions, wherein the instructions cause the processing device to:
- generate a number of Information-Gain (IG)-Trees based on a memory-learning technique and training sets relating to a document, wherein the training sets are based on global context features, wherein a global context feature offers a broader view of a word or a word sequence with regard to an entirety of the document;
  
  extract entity names and relations between entity names based on the IG-Trees;
  
  receive annotated data;
  
  parse, at least partially, the annotated data, wherein parsing includes identifying syntactic structure of sentences within the annotated data; and
  
  extract the training sets from the parsed annotated data, wherein the training sets are further based on features including one or more of local context features, global context features, surface linguistic features, and deep linguistic features, wherein the global context feature, when included in an IG-Tree, further offers a set of first verbs in a same sentence that appear before or after the word or the word sequence for the entirety of the document or corpus.
- Dependent claims (10, 11, 12)
  - 10. The system of claim 9, wherein the number of IG-Trees is generated based on raw data that has been annotated.
  - 11. The system of claim 9, wherein the number of IG-Trees is generated based on a number of features of the annotated data.
  - 12. The system of claim 9, wherein the number of IG-Trees is selected from a group consisting of a person-name IG-Tree, an entity-name IG-Tree, a noun phrase IG-Tree and a relation IG-Tree.

Current Assignee: Intel Corporation
Original Assignee: Intel Corporation
Inventors: Zhang, Yimin; Zhou, Joe F.
Primary Examiner(s): BURKE, JEFF A
Application Number: US14/293,898
Publication Number: US 20140289176A1
Time in Patent Office: 820 Days
Field of Search: 706/16-18, 706/20, 706/25-26, 707/755, 707/797
US Class Current: 1/1
CPC Class Codes:
G06F 16/313   Selection or weighting of t...
G06F 16/3344   using natural language anal...
G06F 16/81   Indexing, e.g. XML tags; Da...
G06F 16/94   Hypermedia Hyperlinking G06...
G06N 20/00   Machine learning
