Information extraction apparatus and methods
First Claim
1. A non-transitory information extraction computing apparatus including at least one processor with accessible input/output and at least one data store, said at least one processor being programmed for extracting data for review by a human curator from digital representations of documents comprising natural language text, the information extraction computing apparatus, in use, executing computer program instructions that cause the apparatus to provide a plurality of selectable operating modes in which the automatic information extraction apparatus extracts different data for review by a human curator, and, in at least two of the plurality of selectable operating modes, to extract data with different balances between precision and recall, whereby a balance that favours precision over recall will lead to fewer incorrect instances of data being extracted than a balance that favours recall over precision, but will omit more data which should have been extracted.
Abstract
Automatic information extraction apparatus for extracting data for review by a human curator from digital representations of documents comprising natural language text, the automatic information extraction apparatus having a plurality of selectable operating modes in which the automatic information extraction apparatus is operable to extract different data for review by a human curator. In the different operating modes, the information extraction apparatus may extract data with a different balance between recall and precision.
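The trade-off between operating modes can be illustrated with a minimal sketch. It assumes, purely for illustration, that the extractor assigns each candidate a confidence score and that a mode is simply a score threshold; the mode names, thresholds, and sample candidates below are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    score: float      # hypothetical extractor confidence in [0, 1]
    is_correct: bool  # gold label, only available when evaluating


# Hypothetical modes: each one is just a different confidence threshold.
MODES = {"high_precision": 0.8, "high_recall": 0.3}


def extract(candidates, mode):
    """Return the candidates that the given mode would show to the curator."""
    threshold = MODES[mode]
    return [c for c in candidates if c.score >= threshold]


def precision_recall(extracted, candidates):
    """Compute precision and recall of `extracted` against the gold labels."""
    true_pos = sum(c.is_correct for c in extracted)
    total_correct = sum(c.is_correct for c in candidates)
    precision = true_pos / len(extracted) if extracted else 1.0
    recall = true_pos / total_correct if total_correct else 1.0
    return precision, recall


# Invented sample data: four correct and three incorrect candidates.
CANDIDATES = [
    Candidate("A binds B", 0.95, True),
    Candidate("C inhibits D", 0.85, True),
    Candidate("E activates F", 0.60, True),
    Candidate("G binds H", 0.50, False),
    Candidate("I binds J", 0.40, True),
    Candidate("K inhibits L", 0.35, False),
    Candidate("M binds N", 0.10, False),
]

if __name__ == "__main__":
    for mode in MODES:
        shown = extract(CANDIDATES, mode)
        p, r = precision_recall(shown, CANDIDATES)
        print(f"{mode}: {len(shown)} shown, precision={p:.2f}, recall={r:.2f}")
```

On this sample data the precision-favouring mode shows the curator no incorrect items but omits two correct ones, while the recall-favouring mode finds every correct item at the cost of two incorrect ones.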
24 Claims
19. A non-transitory computer readable carrier tangibly embodying computer program instructions which, when executed by a computing apparatus, cause the computing apparatus to carry out a method of optimising an automatic information extraction apparatus that extracts data for review by a human curator from digital representations of documents comprising natural language text, the method comprising the steps of:
(i) extracting data from at least one digital representation of a document comprising natural language text, using an information extraction module;
(ii) providing a computer-user interface which presents the extracted data to a human curator for review and which analyses the interactions between the human curator and the computer-user interface; and
(iii) modifying the information extraction module responsive to the analysed interactions to facilitate an improvement in the subsequent performance of a or the human curator using the information extraction module and computer-user interface to review extracted data, the information extraction module being modified so as to change the balance between precision and recall of the information extraction module, whereby a balance that favours precision over recall will lead to fewer incorrect instances of data being extracted than a balance that favours recall over precision, but will omit more data which should have been extracted.
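Steps (ii) and (iii) can be sketched as a simple feedback rule. The following is an illustrative assumption, not the claimed method: it supposes the interface logs how many extracted items the curator accepted, rejected, or had to add by hand, and that the extraction module exposes a single confidence threshold. `ReviewLog`, `adjust_threshold`, and the step size are hypothetical names and values.

```python
from dataclasses import dataclass


@dataclass
class ReviewLog:
    """Hypothetical summary of one curation session."""
    accepted: int        # extracted items the curator kept
    rejected: int        # extracted items the curator discarded
    added_manually: int  # items the curator had to add because they were missed


def adjust_threshold(threshold, log, step=0.05, lo=0.05, hi=0.95):
    """Shift the extraction threshold based on analysed curator interactions.

    Many rejections suggest too many incorrect extractions, so the threshold
    is raised (favouring precision). Many manual additions suggest too many
    omissions, so the threshold is lowered (favouring recall).
    """
    reviewed = log.accepted + log.rejected
    reject_rate = log.rejected / reviewed if reviewed else 0.0

    found = log.accepted + log.added_manually
    miss_rate = log.added_manually / found if found else 0.0

    if reject_rate > miss_rate:
        threshold += step
    elif miss_rate > reject_rate:
        threshold -= step

    # Keep the threshold inside a sane range.
    return min(hi, max(lo, threshold))


if __name__ == "__main__":
    noisy = ReviewLog(accepted=10, rejected=30, added_manually=0)
    sparse = ReviewLog(accepted=10, rejected=0, added_manually=15)
    print(adjust_threshold(0.5, noisy))   # moves above 0.5
    print(adjust_threshold(0.5, sparse))  # moves below 0.5
```

A real system could change the balance in other ways, for example by retraining a model or switching rule sets; a single threshold is used here only because it makes the direction of the adjustment easy to see.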
Specification