Automated unit finding for numeric information retrieval
First Claim
1. A method using a computer system for automated retrieval of numeric information from a set of electronic source documents, comprising:
receiving an electronic query comprising one or more keywords from a remote user computer system via a telecommunications network;
accessing a computer-searchable index comprising a plurality of searchable index entries, at least a plurality of the entries each including a representation of an associated unit of measure extracted from the electronic source documents;
automatically determining a query unit, at least partly by identifying one or more related occurrences within the electronic source documents of said one or more keywords and one or more of said units of measure and by scoring the one or more related occurrences to reflect how closely related each of said one or more keywords and one or more of said units of measure appear to be within the source document; and
assessing the relevance of one or more of the index entries to the electronic query, at least partly based on a comparison of the query unit and the unit associated with one or more of said index entries.
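The query-unit step recited above (finding related occurrences of keywords and units, then scoring how closely related they appear) could be sketched roughly as follows. This is only an illustrative reading of the claim language, not the patented implementation: the function name `suggest_query_unit`, the `KNOWN_UNITS` vocabulary, the token window, and the inverse-distance score are all assumptions introduced for the example.

```python
import re
from collections import defaultdict

# Hypothetical unit vocabulary; a real system would draw units from its index.
KNOWN_UNITS = {"km", "kg", "mph", "years"}


def suggest_query_unit(keywords, documents, window=5):
    """Return the unit that co-occurs most closely with the keywords, or None.

    Assumed scoring rule: each keyword/unit pair found within `window`
    tokens of each other adds 1/distance to that unit's score.
    """
    keywords = {k.lower() for k in keywords}
    scores = defaultdict(float)
    for text in documents:
        # Crude tokenizer: alphabetic runs only, so numbers are skipped.
        tokens = re.findall(r"[a-z]+", text.lower())
        keyword_positions = [i for i, t in enumerate(tokens) if t in keywords]
        for j, token in enumerate(tokens):
            if token not in KNOWN_UNITS:
                continue
            for i in keyword_positions:
                distance = abs(i - j)
                if 0 < distance <= window:
                    scores[token] += 1.0 / distance
    if not scores:
        return None
    return max(scores, key=scores.get)
```

With a few sentences that mention "marathon" near "km" and, farther away, "kg", the sketch would favor "km" as the query unit because the closer occurrences contribute larger scores.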
Abstract
The present invention is related to the task of retrieving numeric information in response to a textual keyword-based query by automatically associating a unit to the type of data being retrieved. An information retrieval system is presented which suggests a unit for data exploration by leveraging the local environment of numeric data across the corpus. This local environment is parsed, including through natural language processing and proximity-based techniques, to determine units relevant to particular keyword phrases. The system also relies on knowledge of semantically and scientifically related units to optimize their binning for suggested unit scoring.
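The binning of related units mentioned in the abstract might look something like the sketch below, which pools the scores of units that measure the same kind of quantity before choosing a suggestion. The `UNIT_BINS` table, the function names, and the rule of picking the strongest unit within the strongest bin are assumptions made for illustration; the abstract does not specify how bins are built or used.

```python
from collections import defaultdict

# Hypothetical mapping from a unit to a shared bin (here, a physical dimension).
UNIT_BINS = {
    "km": "length", "m": "length", "mile": "length",
    "kg": "mass", "lb": "mass",
}


def bin_unit_scores(unit_scores):
    """Sum per-unit scores within each bin; unmapped units form their own bin."""
    binned = defaultdict(float)
    for unit, score in unit_scores.items():
        binned[UNIT_BINS.get(unit, unit)] += score
    return dict(binned)


def best_unit(unit_scores):
    """Pick the top-scoring unit inside the top-scoring bin, or None if empty."""
    if not unit_scores:
        return None
    binned = bin_unit_scores(unit_scores)
    top_bin = max(binned, key=binned.get)
    members = {
        unit: score
        for unit, score in unit_scores.items()
        if UNIT_BINS.get(unit, unit) == top_bin
    }
    return max(members, key=members.get)
```

The point of pooling is that evidence split across related units (for example "km" and "mile") is not outvoted by a single unrelated unit that happens to have the highest individual score.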
22 Claims
1. A method using a computer system for automated retrieval of numeric information from a set of electronic source documents, comprising:
receiving an electronic query comprising one or more keywords from a remote user computer system via a telecommunications network;
accessing a computer-searchable index comprising a plurality of searchable index entries, at least a plurality of the entries each including a representation of an associated unit of measure extracted from the electronic source documents;
automatically determining a query unit, at least partly by identifying one or more related occurrences within the electronic source documents of said one or more keywords and one or more of said units of measure and by scoring the one or more related occurrences to reflect how closely related each of said one or more keywords and one or more of said units of measure appear to be within the source document; and
assessing the relevance of one or more of the index entries to the electronic query, at least partly based on a comparison of the query unit and the unit associated with one or more of said index entries.
Dependent Claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. Apparatus for numeric information retrieval, comprising:
electronic memory storing a computer-searchable index, the index comprising a plurality of searchable index entries, at least a plurality of the entries each including a representation of an associated unit of measure extracted from the electronic source documents;
a computer system, being programmed to receive a query, said query comprising one or more keywords;
said computer system being further programmed to automatically generate a query unit, at least partly by determining a correlation within the electronic source documents between the keywords and one or more of the units included in the index, wherein determining the correlation includes scoring how closely related each of said one or more keywords and one or more of said units appear to be within the electronic source documents; and
said computer system being further programmed to assess the relevance of one or more of the index entries to the query, based at least partly on a comparison of the query unit and the unit associated with each of said index entries.
Dependent Claims: 17, 18, 19, 20, 21, 22
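The final element, assessing relevance by comparing the query unit with the unit stored in each index entry, could be approximated as in the sketch below. The `IndexEntry` fields, the keyword-overlap base score, and the multiplicative `unit_boost` are all hypothetical choices for the example; the claim only requires that the comparison of units contribute to the relevance assessment.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    """Hypothetical index entry: surrounding text, a numeric value, and its unit."""
    text: str
    value: float
    unit: str


def rank_entries(keywords, query_unit, entries, unit_boost=2.0):
    """Order entries by keyword overlap, boosting those whose unit matches."""
    keywords = {k.lower() for k in keywords}
    scored = []
    for entry in entries:
        words = set(entry.text.lower().split())
        score = float(len(keywords & words))
        if score == 0:
            continue  # no keyword match, so the entry is not returned
        if entry.unit == query_unit:
            score *= unit_boost  # assumed rule: matching units raise relevance
        scored.append((score, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored]
```

Under these assumptions, two entries that both mention "marathon" would be ordered so that the one recorded in the query unit (say "km") ranks above the one recorded in another unit (say "kg").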
Specification