Attribute extraction
Abstract
A data object submitted for storage is analyzed, and a set of values is extracted from the data object that can correspond to a set of attributes. The analysis of the data object can also identify possible new ontology terms. One or more extracted values are presented to the entity which submitted the data object for approval and feedback. This feedback can be used to characterize the data object with appropriate terms, train the extraction process for future extractions, and/or expand the set of known ontology terms.
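The feedback loop in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function name `apply_feedback`, the dictionary shapes, and the idea of storing reviewed values as labeled examples for later retraining are all assumptions made for the example.

```python
def apply_feedback(ontology, proposed, approvals):
    """Apply a submitter's review of extracted values (hypothetical helper).

    ontology:  {attribute: set of known terms}
    proposed:  {attribute: extracted value}
    approvals: {attribute: True or False}; unreviewed attributes are skipped
    Returns (accepted pairs, labeled examples usable for later retraining).
    """
    accepted, training_examples = {}, []
    for attribute, value in proposed.items():
        if attribute not in approvals:
            continue
        approved = approvals[attribute]
        # Every reviewed value becomes a labeled example, approved or not.
        training_examples.append((attribute, value, approved))
        if approved:
            accepted[attribute] = value
            # An approved value may be a term the ontology did not know yet.
            ontology.setdefault(attribute, set()).add(value)
    return accepted, training_examples


ontology = {"color": {"black"}}
proposed = {"color": "teal", "brand": "acme", "size": "large"}
approvals = {"color": True, "brand": False}
accepted, examples = apply_feedback(ontology, proposed, approvals)
print(accepted)
print(sorted(ontology["color"]))
```

Here the approved value `teal` both characterizes the data object and extends the known terms for `color`, while the rejected `brand` value is kept only as a negative example.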
16 Claims
1. A method comprising:
receiving a document from a user device;
analyzing the document to extract a plurality of values having corresponding attributes associated with attribute-value pairs in one or more domain models, wherein the analysis is based at least in part on associating the document with the one or more domain models of a plurality of domain models generated through analysis of a plurality of documents from one or more corpora and using an ontology associated with the one or more domain models to determine one or more attributes from the document, wherein the ontology includes terms relevant to particular domains corresponding to the one or more domain models, and wherein the attributes from the document are associated with particular terms of the ontology;
associating the determined one or more attributes with the document; and
using the associated attributes to generate search results responsive to a received search query.
Dependent claims: 2, 3, 4
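The steps of claim 1 can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: it assumes a single, already-selected domain model, matches ontology terms by exact token lookup, and uses hypothetical names throughout (`ONTOLOGY`, `extract_attributes`, `AttributeIndex`).

```python
import re
from collections import defaultdict

# Hypothetical ontology for one domain model: each attribute maps to the
# terms that signal one of its values. All names here are illustrative.
ONTOLOGY = {
    "brand": {"acme", "globex"},
    "color": {"silver", "black"},
}


def extract_attributes(text, ontology):
    """Return {attribute: value} for ontology terms found in the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    found = {}
    for attribute, terms in ontology.items():
        for token in tokens:
            if token in terms:
                found[attribute] = token  # keep the first matching term
                break
    return found


class AttributeIndex:
    """Inverted index from (attribute, value) pairs to document ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, attributes):
        """Associate the extracted attribute-value pairs with a document."""
        for pair in attributes.items():
            self._postings[pair].add(doc_id)

    def search(self, **query):
        """Return ids of documents matching every attribute in the query."""
        if not query:
            return set()
        # .get avoids creating empty postings for unknown pairs.
        sets = [self._postings.get(pair, set()) for pair in query.items()]
        return set.intersection(*sets)


index = AttributeIndex()
docs = {
    "d1": "Acme ultrabook in silver, barely used",
    "d2": "Black Globex laptop with charger",
    "d3": "Silver Globex notebook",
}
for doc_id, text in docs.items():
    index.add(doc_id, extract_attributes(text, ONTOLOGY))

print(sorted(index.search(color="silver")))
print(sorted(index.search(brand="globex", color="silver")))
```

The inverted index is one simple way to let associated attributes drive search results; the claim itself does not prescribe a particular index structure.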
5. A system comprising:
one or more computers including one or more storage devices and one or more processor devices configured to perform operations comprising:
receiving a document from a user device;
analyzing the document to extract a plurality of values having corresponding attributes associated with attribute-value pairs in one or more domain models, wherein the analysis is based at least in part on associating the document with the one or more domain models generated through analysis of a plurality of documents from one or more corpora and using an ontology associated with the one or more domain models to determine one or more attributes from the document, wherein the ontology includes terms relevant to particular domains corresponding to the one or more domain models, and wherein the attributes from the document are associated with particular terms of the ontology;
associating the determined one or more attributes with the document; and
using the associated attributes to generate search results responsive to a received search query.
Dependent claims: 6, 7, 8
9. A method comprising:
analyzing a document using one or more trained classifiers to extract a plurality of text values from the document and determine corresponding attributes for the text values based at least in part on an ontology containing a plurality of ontology terms relevant to a particular area of interest and that are associated with particular attributes, wherein each of the corresponding attributes is associated with a respective confidence value determined by one of the trained classifiers, wherein the confidence value represents a level of confidence in assigning the respective attribute to the document;
in response to a particular text value that is associated with a confidence value that does not satisfy a threshold but is greater than a minimum confidence value, presenting the corresponding attribute as a question to a client user;
assigning each of one or more of the plurality of text values to a corresponding attribute for the text value that is associated with a respective confidence value that satisfies the threshold;
associating the attributes with the document; and
using the associated attributes to generate search results responsive to a received search query.
Dependent claims: 10, 11, 12
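The confidence handling in claim 9 can be sketched roughly as a three-way triage. This is an illustration under stated assumptions: the cutoffs 0.8 and 0.4, the reading of "satisfies the threshold" as `confidence >= threshold`, the question wording, and the names `Candidate` and `triage` are all hypothetical, and the confidence scores are assumed to come from a trained classifier that is not shown.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    attribute: str
    value: str
    confidence: float  # assumed output of a trained classifier


def triage(candidates, threshold=0.8, minimum=0.4):
    """Split candidates into auto-assigned pairs and questions for the user."""
    assigned, questions = {}, []
    for c in candidates:
        if c.confidence >= threshold:
            # Confident enough: assign the value to the attribute directly.
            assigned[c.attribute] = c.value
        elif c.confidence > minimum:
            # Uncertain but plausible: ask the client user to confirm.
            questions.append(f"Is the {c.attribute} '{c.value}'?")
        # Anything at or below the minimum is dropped without asking.
    return assigned, questions


candidates = [
    Candidate("brand", "acme", 0.93),
    Candidate("color", "silver", 0.55),
    Candidate("material", "oak", 0.10),
]
assigned, questions = triage(candidates)
print(assigned)
print(questions)
```

With these example scores, `brand` is assigned automatically, `color` becomes a question, and `material` is discarded.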
13. A system comprising:
one or more computers including one or more storage devices and one or more processor devices configured to perform operations comprising:
analyzing a document using one or more trained classifiers to extract a plurality of text values from the document and determine corresponding attributes for the text values based at least in part on an ontology, wherein each of the corresponding attributes is associated with a respective confidence value determined by one of the trained classifiers;
in response to a particular text value that is associated with a confidence value that does not satisfy a threshold but is greater than a minimum confidence value, presenting the corresponding attribute as a question to a client user;
assigning each of one or more of the plurality of text values to a corresponding attribute for the text value that is associated with a respective confidence value that satisfies the threshold;
associating the attributes with the document; and
using the associated attributes to generate search results responsive to a received search query.
Dependent claims: 14, 15, 16
Specification