Mixing knowledge sources with auto learning for improved entity extraction
Abstract
The disclosed embodiments of computer systems and techniques use an ensemble semantics framework to combine knowledge acquisition systems, yielding significantly higher-quality resources than any one system in isolation. Gains in entity extraction are achieved by combining state-of-the-art distributional and pattern-based systems with a large set of features drawn from, for example, a web crawl, query logs, and wisdom-of-the-crowd sources. The result is improved query interpretation and greater relevance in, for example, search results and advertising.
20 Claims
1. A computer-implemented method, comprising:
extracting by a processor instances from a plurality of sources using a plurality of knowledge extractors;
aggregating the instances from the plurality of sources;
extracting a plurality of feature vectors for a plurality of instances using a plurality of feature generators, each feature vector extracted by a plurality of feature generators; and
building a model by using a modeler based, at least in part, upon the feature vectors extracted by the plurality of feature generators and extracted instances, wherein building the model comprises automatically generating training sets of instances.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
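The four steps of claim 1 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the patented implementation: the extractor names, the two toy feature generators, the use of a trusted positive list and a negative list to label training data without human annotation, and the nearest-centroid modeler are all hypothetical stand-ins for components the claim leaves unspecified.

```python
from collections import defaultdict

def aggregate(extractor_outputs):
    """Merge instances from all extractors, recording which extractors proposed each."""
    provenance = defaultdict(set)
    for name, instances in extractor_outputs.items():
        for inst in instances:
            provenance[inst.strip().lower()].add(name)
    return dict(provenance)

def feature_vectors(provenance, generators):
    """Build each vector by concatenating the output of every feature generator."""
    return {inst: [x for gen in generators for x in gen(inst, srcs)]
            for inst, srcs in provenance.items()}

def auto_training_set(vectors, positives, negatives):
    """Label instances automatically from existing lists (hypothetical labeling rule)."""
    labeled = []
    for inst, vec in vectors.items():
        if inst in positives:
            labeled.append((vec, 1))
        elif inst in negatives:
            labeled.append((vec, 0))
    return labeled

def train_centroid_model(labeled):
    """Toy modeler: classify by the nearest class centroid (squared Euclidean distance)."""
    sums, counts = {}, defaultdict(int)
    for vec, label in labeled:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] += 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(vec):
        return min(centroids, key=lambda lab: sum(
            (a - b) ** 2 for a, b in zip(vec, centroids[lab])))
    return predict

# Hypothetical feature generators: extractor membership and token count.
def source_features(inst, srcs):
    return [float("pattern" in srcs), float("distributional" in srcs)]

def shape_features(inst, srcs):
    return [float(len(inst.split()))]

# Hypothetical outputs of two knowledge extractors for an "actors" class.
extractor_outputs = {
    "pattern": ["Brad Pitt", "Tom Hanks", "Paris"],
    "distributional": ["brad pitt", "Meryl Streep", "banana"],
}
provenance = aggregate(extractor_outputs)
vectors = feature_vectors(provenance, [source_features, shape_features])
labeled = auto_training_set(vectors,
                            positives={"brad pitt", "tom hanks"},
                            negatives={"paris", "banana"})
predict = train_centroid_model(labeled)
```

Calling `predict(vectors["meryl streep"])` then classifies an instance that appears in neither labeling list. A real system would likely swap the centroid modeler for a stronger learner; the sketch only shows how the extract, aggregate, featurize, and model-building steps connect.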
10. A computer system, comprising:
a processor; and
a memory, at least one of the processor or the memory being configured to:
extract instances from a plurality of sources using a plurality of knowledge extractors;
aggregate the instances;
extract a plurality of feature vectors using a plurality of feature generators, wherein one of the feature generators extracts contexts of a query log for a plurality of seeds;
calculates an association statistic between the contexts and seeds;
sorts the contexts by the calculated association statistics and selects a group of the sorted contexts;
for each selected context, generates a feature for a candidate instance comprising the association statistic between the candidate instance and the context; and
automatically generate labeled training sets based, at least in part, upon extracted feature vectors and one or more sources of negative instances.
Dependent claims: 11, 12, 13, 14, 15, 16, 17
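The query-log feature generator recited in claim 10 can be sketched as follows. The claim does not name the association statistic, so this sketch assumes pointwise mutual information (PMI); it also assumes that a context is the query with the matched term replaced by a slot, that seeds are pooled when scoring contexts, and that a missing co-occurrence scores 0.0. The slot marker, function names, and toy query log are all hypothetical.

```python
import math
from collections import Counter

SLOT = "<X>"  # hypothetical placeholder marking where the term occurred

def pair_counts(terms, queries):
    """Count (term, context) pairs; a context is the query with the term slotted out."""
    counts = Counter()
    for q in queries:
        padded = f" {q} "
        for t in terms:
            needle = f" {t} "
            if needle in padded:
                ctx = padded.replace(needle, f" {SLOT} ", 1).strip()
                counts[(t, ctx)] += 1
    return counts

def pmi(joint, term_total, ctx_total, n):
    """Pointwise mutual information from raw counts; 0.0 when never co-occurring."""
    if joint == 0:
        return 0.0
    return math.log(joint * n / (term_total * ctx_total))

def context_totals(counts):
    totals = Counter()
    for (_, ctx), c in counts.items():
        totals[ctx] += c
    return totals

def select_contexts(counts, seeds, k):
    """Score each context against the pooled seeds, sort, and keep the top k."""
    n = sum(counts.values())
    ctx_total = context_totals(counts)
    seed_joint = Counter()
    for (t, ctx), c in counts.items():
        if t in seeds:
            seed_joint[ctx] += c
    seed_total = sum(seed_joint.values())
    scored = [(pmi(j, seed_total, ctx_total[ctx], n), ctx)
              for ctx, j in seed_joint.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken alphabetically
    return [ctx for _, ctx in scored[:k]]

def candidate_features(candidate, selected, counts):
    """One feature per selected context: PMI between the candidate and that context."""
    n = sum(counts.values())
    ctx_total = context_totals(counts)
    term_total = sum(c for (t, _), c in counts.items() if t == candidate)
    return [pmi(counts[(candidate, ctx)], term_total, ctx_total[ctx], n)
            for ctx in selected]

# Toy query log and term inventory (illustrative data only).
queries = ["brad pitt movies", "tom hanks movies", "brad pitt age",
           "meryl streep movies", "banana recipes", "banana calories"]
terms = ["brad pitt", "tom hanks", "meryl streep", "banana"]
seeds = {"brad pitt", "tom hanks"}

counts = pair_counts(terms, queries)
selected = select_contexts(counts, seeds, k=2)
streep = candidate_features("meryl streep", selected, counts)
banana = candidate_features("banana", selected, counts)
```

On this toy log, a candidate that shares a selected context with the seeds gets a nonzero feature, while one that shares none gets all zeros. PMI is known to favor rare contexts, so a production system might prefer a smoothed or discounted statistic.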
18. A non-transitory computer-readable medium, comprising:
instructions for extracting instances from a plurality of sources using a plurality of knowledge extractors;
instructions for aggregating the instances from the plurality of sources;
instructions for extracting a plurality of feature vectors for a plurality of instances using a plurality of feature generators, each feature vector extracted by a plurality of feature generators; and
instructions for building a model by using a modeler based, at least in part, upon the feature vectors extracted by the plurality of feature generators and extracted instances, wherein building the model comprises automatically generating training sets of instances.
Dependent claims: 19, 20
Specification