Automatic Crowd Sourcing for Machine Learning in Information Extraction
First Claim
1. A method for enabling machine learning from unstructured documents, the method comprising:
analyzing, at an electronic device, one or more structured databases, thereby providing a mapping between a plurality of referenced character strings and a corresponding plurality of type labels;
providing, at the electronic device, a first unstructured document comprising a plurality of unstructured character strings;
analyzing the first unstructured document to identify a first character string of the plurality of unstructured character strings which is associated with a first referenced character string of the plurality of referenced character strings;
annotating, within the first unstructured document, the first character string with a first type label which is mapped to the first referenced character string; and
determining a training set for machine learning from the first unstructured document comprising the annotation with the first type label.
Abstract
A method for enabling machine learning from unstructured documents is described. The method comprises analyzing, at an electronic device, one or more structured databases, thereby providing a mapping between a plurality of referenced character strings and a corresponding plurality of type labels; providing, at the electronic device, a first unstructured document comprising a plurality of unstructured character strings; analyzing the first unstructured document to identify a first character string of the plurality of unstructured character strings which is associated with a first referenced character string of the plurality of referenced character strings; associating, within the first unstructured document, a first type label which is mapped to the first referenced character string to the first character string; and determining a training set for machine learning from the first unstructured document comprising the association to the first type label.
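The steps described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the in-memory `STRUCTURED_DB` dictionary, the function names, and the exact, whole-word string matching are all assumptions made for illustration. The source does not specify how a character string is "associated with" a referenced character string.

```python
import re
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, type label)

# Hypothetical stand-in for "one or more structured databases":
# each key plays the role of a type label, each value list holds known entries.
STRUCTURED_DB: Dict[str, List[str]] = {
    "CITY": ["Berlin", "New York"],
    "COMPANY": ["Acme Corp"],
}


def build_mapping(db: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each referenced character string to its type label."""
    return {value: label for label, values in db.items() for value in values}


def annotate(document: str, mapping: Dict[str, str]) -> List[Span]:
    """Find referenced strings in the document and label them.

    Assumption: association means an exact, whole-word match.
    Longer referenced strings are tried first, and any match that
    overlaps an already accepted span is skipped.
    """
    spans: List[Span] = []
    for ref in sorted(mapping, key=len, reverse=True):
        pattern = r"\b" + re.escape(ref) + r"\b"
        for match in re.finditer(pattern, document):
            overlaps = any(
                match.start() < end and match.end() > start
                for start, end, _ in spans
            )
            if not overlaps:
                spans.append((match.start(), match.end(), mapping[ref]))
    return sorted(spans)


def build_training_set(documents: List[str], mapping: Dict[str, str]):
    """Pair each document with its automatically derived annotations."""
    return [(doc, annotate(doc, mapping)) for doc in documents]


if __name__ == "__main__":
    mapping = build_mapping(STRUCTURED_DB)
    docs = ["Acme Corp opened an office in Berlin."]
    for text, spans in build_training_set(docs, mapping):
        for start, end, label in spans:
            print(text[start:end], label)
```

The resulting list of (document, spans) pairs could then be passed to any sequence-labeling learner as automatically labeled training data; the choice of learner is outside the scope of this sketch.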
15 Claims
1. A method for enabling machine learning from unstructured documents, the method comprising:
analyzing, at an electronic device, one or more structured databases, thereby providing a mapping between a plurality of referenced character strings and a corresponding plurality of type labels;
providing, at the electronic device, a first unstructured document comprising a plurality of unstructured character strings;
analyzing the first unstructured document to identify a first character string of the plurality of unstructured character strings which is associated with a first referenced character string of the plurality of referenced character strings;
annotating, within the first unstructured document, the first character string with a first type label which is mapped to the first referenced character string; and
determining a training set for machine learning from the first unstructured document comprising the annotation with the first type label.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
14. A system configured for enabling machine learning from unstructured documents, the system comprising an electronic device configured to:
analyze one or more structured databases, thereby providing a mapping between a plurality of referenced character strings and a corresponding plurality of type labels;
provide a first unstructured document comprising a plurality of unstructured character strings;
analyze the first unstructured document to identify a first plurality of character strings of the plurality of unstructured character strings which is associated with a first plurality of referenced character strings of the plurality of referenced character strings;
associate, within the first unstructured document, the first plurality of type labels which is mapped to the first plurality of referenced character strings to the corresponding first plurality of character strings;
determine a training set for machine learning from the first unstructured document comprising the association to the first type label.
Dependent Claims (15)
Specification