Mixing knowledge sources with auto learning for improved entity extraction

US 8,499,008 B2
Filed: 07/24/2009
Issued: 07/30/2013
Est. Priority Date: 07/24/2009
Status: Active Grant

First Claim

1. A computer-implemented method, comprising:

extracting by a processor instances from a plurality of sources using a plurality of knowledge extractors;

aggregating the instances from the plurality of sources;

extracting a plurality of feature vectors for a plurality of instances using a plurality of feature generators, each feature vector extracted by a plurality of feature generators; and

building a model by using a modeler based, at least in part, upon the feature vectors extracted by the plurality of feature generators and extracted instances, wherein building the model comprises automatically generating training sets of instances.
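
Claim 1 does not specify the extractors, the feature generators, or the aggregation rule. The following is a minimal Python sketch of how the first three steps could fit together. It assumes, hypothetically, that a knowledge extractor is a function from a source text to candidate instances and that a feature generator is a function from an instance to named numeric features. Aggregation here simply unions the candidates while recording which extractors proposed each one. The names `extract_and_aggregate` and `build_feature_vector` are illustrative and do not come from the patent.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical interfaces (not defined by the patent):
# an extractor maps one source text to candidate instances,
# a feature generator maps one instance to named numeric features.
Extractor = Callable[[str], Iterable[str]]
FeatureGenerator = Callable[[str], Dict[str, float]]


def extract_and_aggregate(sources: List[str],
                          extractors: Dict[str, Extractor]) -> Dict[str, Set[str]]:
    """Run every extractor on every source and union the candidates.

    Returns a map from instance to the names of the extractors that
    proposed it, so provenance survives aggregation.
    """
    aggregated: Dict[str, Set[str]] = defaultdict(set)
    for source in sources:
        for name, extractor in extractors.items():
            for instance in extractor(source):
                aggregated[instance].add(name)
    return dict(aggregated)


def build_feature_vector(instance: str,
                         provenance: Set[str],
                         extractor_names: List[str],
                         generators: Dict[str, FeatureGenerator]) -> Dict[str, float]:
    """Merge the outputs of several feature generators into one sparse vector."""
    vector: Dict[str, float] = {}
    # One binary feature per extractor: did that system propose the instance?
    for name in extractor_names:
        vector[f"extractor:{name}"] = 1.0 if name in provenance else 0.0
    # Namespace each generator's features so their names cannot collide.
    for gen_name, generator in generators.items():
        for feature, value in generator(instance).items():
            vector[f"{gen_name}:{feature}"] = value
    return vector


if __name__ == "__main__":
    # Toy stand-ins for a pattern-based and a distributional extractor.
    known = ["Brad Pitt", "Tom Hanks"]
    extractors = {
        "pattern": lambda s: [n for n in known if "such as" in s and n in s],
        "distributional": lambda s: [n for n in known if "starred" in s and n in s],
    }
    sources = ["actors such as Brad Pitt and Tom Hanks", "Tom Hanks starred in Big"]
    instances = extract_and_aggregate(sources, extractors)
    generators = {"length": lambda inst: {"tokens": float(len(inst.split()))}}
    for inst, prov in sorted(instances.items()):
        print(inst, build_feature_vector(inst, prov, list(extractors), generators))
```

Keeping per-extractor provenance as features lets a downstream model learn how much to trust each system, which is the intuition behind combining knowledge sources rather than simply pooling their output.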

9 Assignments

0 Petitions

Abstract

The disclosed embodiments of computer systems and techniques use an ensemble semantics framework to combine knowledge acquisition systems, yielding resources of significantly higher quality than any of the systems produces in isolation. Gains in entity extraction are achieved by combining state-of-the-art distributional and pattern-based systems with a large set of features drawn from, for example, a web crawl, query logs, and wisdom-of-the-crowd sources. This results in improved query interpretation and greater relevance of, for example, search results and advertising.

20 Claims

1. A computer-implemented method, comprising:
- extracting by a processor instances from a plurality of sources using a plurality of knowledge extractors;
  
  aggregating the instances from the plurality of sources;
  
  extracting a plurality of feature vectors for a plurality of instances using a plurality of feature generators, each feature vector extracted by a plurality of feature generators; and
  
  building a model by using a modeler based, at least in part, upon the feature vectors extracted by the plurality of feature generators and extracted instances, wherein building the model comprises automatically generating training sets of instances.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The computer-implemented method of claim 1, wherein building the model further comprises utilizing trusted positive instances in generating at least a portion of the training sets.
  - 3. The computer-implemented method of claim 2, further comprising automatically generating the trusted positive instances with a trusted knowledge extractor of the plurality of knowledge extractors.
  - 4. The computer-implemented method of claim 1, wherein building the decoder further comprises utilizing external positive instances in generating at least a portion of the training sets.
  - 5. The computer-implemented method of claim 1, wherein building the decoder further comprises utilizing same class negative instances in generating at least a portion of the training sets.
  - 6. The computer-implemented method of claim 1, wherein building the decoder further comprises utilizing near class negative instances in generating at least a portion of the training sets.
  - 7. The computer-implemented method of claim 5, further comprising acquiring the same class negative instances by taking a random sample of instances extracted by a distributional knowledge extractor of the plurality of knowledge extractors.
  - 8. The computer-implemented method of claim 5, further comprising acquiring the same class negative instances by taking a random sample of instances extracted by a pattern based knowledge extractor of the plurality of knowledge extractors.
  - 9. The computer-implemented method of claim 1, wherein building the decoder further comprises utilizing generic negative instances in generating at least a portion of the training sets.
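
Dependent claims 2 through 9 name five sources of training labels (trusted positives, external positives, same-class negatives, near-class negatives, and generic negatives) but not how they are combined. The sketch below assumes each source is already available as a list of instance strings. Same-class negatives are drawn as a random sample of one extractor's output, as claims 7 and 8 describe. The claims do not say what happens when that sample contains a known positive; this sketch assumes such instances are dropped from the negatives. The function name `build_training_set` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional, Sequence, Set, Tuple

# (instance, label) pairs: 1 = positive, 0 = negative.
Labeled = List[Tuple[str, int]]


def build_training_set(trusted_positives: Iterable[str],
                       external_positives: Iterable[str],
                       extractor_output: Sequence[str],
                       near_class_negatives: Iterable[str],
                       generic_negatives: Iterable[str],
                       n_same_class: int,
                       seed: Optional[int] = None) -> Labeled:
    """Assemble a labeled training set without manual annotation (sketch)."""
    rng = random.Random(seed)
    # Positives: instances from a trusted extractor plus an external list.
    positives: Set[str] = set(trusted_positives) | set(external_positives)

    # Same-class negatives: a random sample of one extractor's output
    # (e.g. a distributional or pattern-based system). Assumption: known
    # positives are removed from the pool before sampling.
    pool = sorted(set(extractor_output) - positives)
    same_class = rng.sample(pool, min(n_same_class, len(pool)))

    # Near-class negatives (instances of related classes) and generic
    # negatives (arbitrary terms) are taken as given lists.
    negatives = set(same_class) | set(near_class_negatives) | set(generic_negatives)
    # Assumption: if a term appears in both sets, the positive label wins.
    negatives -= positives

    return [(x, 1) for x in sorted(positives)] + [(x, 0) for x in sorted(negatives)]


if __name__ == "__main__":
    training = build_training_set(
        trusted_positives=["Tom Hanks"],
        external_positives=["Meryl Streep"],
        extractor_output=["Tom Hanks", "Paris", "Big"],
        near_class_negatives=["Steven Spielberg"],
        generic_negatives=["weather"],
        n_same_class=2,
        seed=0,
    )
    print(training)
```

Sorting the pool before sampling makes the result reproducible for a fixed `seed`, since set iteration order is not stable across runs.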

10. A computer system, comprising:
- a processor; and
  
  a memory, at least one of the processor or the memory being configured to:
  
  extract instances from a plurality of sources using a plurality of knowledge extractors;
  
  aggregate the instances;
  
  extract a plurality of feature vectors using a plurality of feature generators, wherein one of the feature generators extracts contexts of a query log for a plurality of seeds;
  
  calculates an association statistic between the contexts and seeds;
  
  sorts the contexts by the calculated association statistics and selects a group of the sorted contexts;
  
  for each selected context, generates a feature for a candidate instance comprising the association statistic between the candidate instance and the context; and
  
  automatically generate labeled training sets based, at least in part, upon extracted feature vectors and one or more sources of negative instances.
- Dependent Claims (11, 12, 13, 14, 15, 16, 17):
  - 11. The computer system of claim 10, wherein at least a portion of the training sets are automatically generated based, at least in part, upon trusted positive instances.
  - 12. The computer system of claim 11, wherein the trusted positive instances are generated via a trusted knowledge extractor of the plurality of knowledge extractors.
  - 13. The computer system of claim 10, wherein at least a portion of the training sets are automatically generated based, at least in part, upon external positive instances.
  - 14. The computer system of claim 10, wherein at least a portion of the training sets are automatically generated based, at least in part, upon same class negative instances.
  - 15. The computer system of claim 14, wherein the same class negative instances are acquired as a random sample of instances extracted by a distributional knowledge extractor of the plurality of knowledge extractors.
  - 16. The computer system of claim 14, wherein the same class negative instances are acquired as a random sample of instances extracted by a pattern based knowledge extractor of the plurality of knowledge extractors.
  - 17. The computer system of claim 10, wherein at least a portion of the training sets are automatically generated based, at least in part, upon generic negative instances.
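
Claim 10 spells out one feature generator in more detail: it extracts query-log contexts of seed instances, scores each context by its association with the seeds, keeps the top-ranked contexts, and then scores each candidate against each kept context. The claim names neither the association statistic nor what counts as a context. The Python sketch below assumes pointwise mutual information, PMI(x, c) = log(n(x, c) × N / (n(x) × n(c))), where n(x, c) counts queries equal to context c with x in its slot, n(x) counts queries containing x, n(c) counts queries matching c with any filler, and N is the number of queries. It also assumes a context is the query with the instance replaced by a `*` slot, and it pools the seeds into one set when scoring contexts. The class `QueryLogFeatureGenerator` and its `top_k` parameter are hypothetical.

```python
import math
import re
from typing import Dict, List, Sequence


def _contexts(query: str, term: str) -> List[str]:
    """Context of `term` in `query`: the query with the term masked as '*'."""
    padded, slot = f" {query} ", f" {term} "
    if slot not in padded:
        return []
    return [padded.replace(slot, " * ", 1).strip()]


def _fill(context: str, term: str) -> str:
    """Put `term` into the '*' slot of `context`."""
    return context.replace("*", term, 1)


def _pmi(joint: int, count_x: int, count_c: int, total: int) -> float:
    """log(P(x, c) / (P(x) P(c))); 0.0 when there is no evidence (assumption)."""
    if joint == 0 or count_x == 0 or count_c == 0:
        return 0.0
    return math.log(joint * total / (count_x * count_c))


class QueryLogFeatureGenerator:
    """Query-log context features for candidate instances (illustrative sketch)."""

    def __init__(self, queries: Sequence[str], seeds: Sequence[str], top_k: int = 2):
        self.queries = [q.lower().strip() for q in queries]
        self.total = len(self.queries)
        seeds = [s.lower() for s in seeds]

        # 1. Extract every context in which some seed occurs.
        contexts = {c for q in self.queries for s in seeds for c in _contexts(q, s)}

        # 2. Association statistic between the pooled seed set and each context.
        seed_count = sum(self._count_term(s) for s in seeds)
        scores = {
            c: _pmi(sum(self._count_exact(_fill(c, s)) for s in seeds),
                    seed_count, self._count_context(c), self.total)
            for c in contexts
        }

        # 3. Sort contexts by association (ties broken alphabetically), keep top_k.
        self.contexts = sorted(scores, key=lambda c: (-scores[c], c))[:top_k]

    def _count_term(self, term: str) -> int:
        return sum(1 for q in self.queries if f" {term} " in f" {q} ")

    def _count_exact(self, query: str) -> int:
        return self.queries.count(query)

    def _count_context(self, context: str) -> int:
        # '*' matches any non-empty filler.
        pattern = re.escape(context).replace(re.escape("*"), ".+")
        return sum(1 for q in self.queries if re.fullmatch(pattern, q))

    def __call__(self, candidate: str) -> Dict[str, float]:
        # 4. One feature per selected context: PMI between candidate and context.
        cand = candidate.lower()
        count = self._count_term(cand)
        return {
            f"querylog:{c}": _pmi(self._count_exact(_fill(c, cand)), count,
                                  self._count_context(c), self.total)
            for c in self.contexts
        }
```

Returning 0.0 for an unseen candidate/context pair conflates "no evidence" with "statistically independent"; a real system might add a separate indicator feature for missing evidence.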

18. A non-transitory computer-readable medium, comprising:
- instructions for extracting instances from a plurality of sources using a plurality of knowledge extractors;
  
  instructions for aggregating the instances from the plurality of sources;
  
  instructions for extracting a plurality of feature vectors for a plurality of instances using a plurality of feature generators, each feature vector extracted by a plurality of feature generators; and
  
  instructions for building a model by using a modeler based, at least in part, upon the feature vectors extracted by the plurality of feature generators and extracted instances, wherein building the model comprises automatically generating training sets of instances.
- Dependent Claims (19, 20):
  - 19. The non-transitory computer-readable medium of claim 18, wherein building the model further comprises utilizing trusted positive instances in generating at least a portion of the training sets.
  - 20. The non-transitory computer-readable medium of claim 19, further comprising:
    - instructions for generating the trusted positive instances using a trusted knowledge extractor of the plurality of knowledge extractors.
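
Claims 1 and 18 end with building a model from the feature vectors and the automatically generated training sets, but they do not name the learning algorithm. As a stand-in, the sketch below trains a plain logistic-regression classifier by stochastic gradient ascent over sparse feature dictionaries. The choice of logistic regression, the learning-rate and epoch defaults, and the names `train_model` and `score` are assumptions made for illustration only.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse feature vector: feature name -> value


def score(weights: Vector, bias: float, features: Vector) -> float:
    """Probability that the instance belongs to the target class."""
    z = bias + sum(weights.get(f, 0.0) * v for f, v in features.items())
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_model(examples: List[Tuple[Vector, int]],
                epochs: int = 200, lr: float = 0.5) -> Tuple[Vector, float]:
    """Fit logistic regression by stochastic gradient ascent on the log-likelihood."""
    weights: Vector = {}
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = label - score(weights, bias, features)  # gradient factor (y - p)
            bias += lr * error
            for f, v in features.items():
                weights[f] = weights.get(f, 0.0) + lr * error * v
    return weights, bias


if __name__ == "__main__":
    # Toy labeled vectors, e.g. produced from automatically generated training sets.
    examples = [
        ({"extractor:pattern": 1.0, "querylog:movies": 1.0}, 1),
        ({"extractor:pattern": 1.0}, 0),
        ({"extractor:distributional": 1.0, "querylog:movies": 1.0}, 1),
        ({}, 0),
    ]
    w, b = train_model(examples)
    for features, label in examples:
        print(label, round(score(w, b, features), 3))
```

Storing weights in a dictionary keeps training proportional to the number of non-zero features per instance, which suits the sparse, namespaced vectors an ensemble of feature generators tends to produce.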

Current Assignee: R2 Solutions LLC (Acacia Research Corporation)
Original Assignee: Yahoo! Inc. (Apollo Global Management, Inc.)
Inventors: Pennacchiotti, Marco; Pantel, Patrick
Primary Examiner(s): NGUYEN, CAM LINH T
Application Number: US12/509,310
Publication Number: US 2011/0022550 A1
Time in Patent Office: 1,467 days
Field of Search: 707/694, 707/723, 707/736, 707/748-752, 707/793, 707/802, 707/803, 707/810
US Class Current: 707/803
CPC Class Codes:
G06F 16/953 Querying, e.g. by the use o...
G06N 20/00 Machine learning
