System and Methods for Units-Based Numeric Information Retrieval
Abstract
An information retrieval and analysis system for numeric data provides high precision and recall for numeric search and uses a methodology for determining the contextualization of the extracted data. Its capabilities include extracting, parsing, and contextualizing numeric data comprising both a numeric value and an accompanying unit. The system facilitates the organization of largely unstructured numeric data into an inverted index and other database formats. It also enables the exploration and refinement of an extracted numeric data set defined by a search input that may be precise or initially vague. The system further facilitates analyzing and portraying numeric data graphically, creating knowledge by combining data from multiple sources, extracting correlations between seemingly disparate variables, and recognizing numeric data trends. It uses local natural language processing, mathematical analysis, and expert-based scientific heuristics to score the numeric and contextual relevancy of the data to the query parameters.
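As an illustration of the extraction step the abstract mentions (a numeric value together with its accompanying unit), here is a minimal Python sketch. It is not the patented method: the regular expression, the tiny `UNITS` list, and the names `Quantity` and `extract_quantities` are all assumptions made for this example.

```python
import re
from typing import List, NamedTuple


class Quantity(NamedTuple):
    value: float  # parsed numeric value
    unit: str     # unit symbol that followed the number
    start: int    # character offset of the number in the source text


# Hypothetical, deliberately small unit list (an assumption for this sketch).
UNITS = sorted(["km", "kg", "mm", "cm", "m", "g", "s", "K"], key=len, reverse=True)

# A number (optional sign, decimals, exponent) followed by a known unit symbol.
QUANTITY_RE = re.compile(
    r"(?P<value>[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*"
    r"(?P<unit>" + "|".join(map(re.escape, UNITS)) + r")\b"
)


def extract_quantities(text: str) -> List[Quantity]:
    """Return every number-plus-unit pair found in free text."""
    return [
        Quantity(float(m.group("value")), m.group("unit"), m.start("value"))
        for m in QUANTITY_RE.finditer(text)
    ]


sample = "The rod is 12.5 mm long and weighs 3 kg at 300 K."
quantities = extract_quantities(sample)
```

A production extractor would need unit normalization, prefixes, and ranges such as "10 to 20 mm"; this sketch only shows the value-and-unit pairing.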
84 Citations
44 Claims
1. An apparatus for information retrieval, comprising:
a computer system being programmed to receive a query from a remotely located user system via a network, the query specifying one or more numerical data constraints and contextual constraints;
said computer system including a memory storing a computer-searchable electronic index, the index comprising a plurality of entries, each of the index entries representing an item of numerical data extracted from at least one of a plurality of electronic source documents including one or more natural language documents;
said computer system being further programmed to determine the relevancy of the query to each of one or more of the index entries based at least partly on a comparison between the numerical data constraint and the index entry's item of numerical data, and at least partly on a comparison between the contextual constraint and contextual information extracted from the corresponding source document from which the index entry's item of numerical data was extracted.
Dependent claims: 2-30
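Claim 1 requires only that relevancy depend "at least partly" on a numeric comparison and "at least partly" on a contextual comparison. One way such a combination could look is sketched below in Python. The weighted sum, the distance-based decay, the keyword-overlap measure, and every name (`Query`, `IndexEntry`, `relevancy`, `numeric_weight`) are assumptions for illustration, not details taken from the claim.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class IndexEntry:
    value: float
    unit: str
    context: FrozenSet[str]  # keywords extracted from the source document


@dataclass(frozen=True)
class Query:
    low: float               # numerical constraint: lower bound
    high: float              # numerical constraint: upper bound
    unit: str
    keywords: FrozenSet[str]  # contextual constraint


def numeric_score(query: Query, entry: IndexEntry) -> float:
    """1.0 inside the range, decaying with distance outside it (assumed form)."""
    if entry.unit != query.unit:
        return 0.0
    if query.low <= entry.value <= query.high:
        return 1.0
    width = max(query.high - query.low, 1e-12)
    distance = query.low - entry.value if entry.value < query.low else entry.value - query.high
    return 1.0 / (1.0 + distance / width)


def context_score(query: Query, entry: IndexEntry) -> float:
    """Fraction of query keywords present in the entry's context (assumed form)."""
    if not query.keywords:
        return 1.0
    return len(query.keywords & entry.context) / len(query.keywords)


def relevancy(query: Query, entry: IndexEntry, numeric_weight: float = 0.5) -> float:
    """Weighted sum of the numeric and contextual comparisons."""
    return (numeric_weight * numeric_score(query, entry)
            + (1.0 - numeric_weight) * context_score(query, entry))


query = Query(10.0, 20.0, "mm", frozenset({"steel", "rod"}))
in_range = IndexEntry(15.0, "mm", frozenset({"steel", "rod", "length"}))
near_miss = IndexEntry(30.0, "mm", frozenset({"steel"}))
wrong_unit = IndexEntry(15.0, "kg", frozenset({"rod"}))
```

With equal weights, an entry that misses the range but shares some context can still outrank an entry with the wrong unit, which is the behavior a two-part relevancy measure is meant to allow.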
31. A computer memory apparatus storing a computer-searchable electronic index, the index comprising:
a plurality of searchable index entries, each of the entries representing an associated item of numerical data as a set of one or more tokens specifying a number and a unit of measure and being searchable thereby, the items of numerical data being extracted from a plurality of electronic source documents, each of the index entries being associated with an identifier for accessing the document from which the numerical data represented by said index entry was extracted;
information regarding the position of at least one of the extracted items of numerical data within text of the source document in which said item occurs; and
contextual information comprising one or more keywords occurring in the source documents, and further comprising information regarding the position of at least one of the keywords within text of the source document in which said keyword occurs.
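The index elements recited in claim 31 (number and unit tokens, a document identifier, item positions, and keywords with their positions) can be pictured with a small in-memory structure. The Python sketch below is one hypothetical layout; the class names, the grouping of entries by unit, and the `lookup` method are assumptions, not the storage format described by the patent.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class NumericEntry:
    tokens: Tuple[str, str]  # (number token, unit token), e.g. ("12.5", "mm")
    doc_id: str              # identifier for accessing the source document
    position: int            # offset of the item within the document text


@dataclass
class NumericIndex:
    # Entries grouped by unit token so they are searchable by unit (assumed layout).
    by_unit: Dict[str, List[NumericEntry]] = field(
        default_factory=lambda: defaultdict(list))
    # keyword -> doc_id -> positions of that keyword in the document text.
    keyword_positions: Dict[str, Dict[str, List[int]]] = field(
        default_factory=lambda: defaultdict(lambda: defaultdict(list)))

    def add_entry(self, number: str, unit: str, doc_id: str, position: int) -> None:
        self.by_unit[unit].append(NumericEntry((number, unit), doc_id, position))

    def add_keyword(self, keyword: str, doc_id: str, position: int) -> None:
        self.keyword_positions[keyword][doc_id].append(position)

    def lookup(self, unit: str, low: float, high: float) -> List[NumericEntry]:
        """Entries with the given unit whose number lies in [low, high]."""
        return [e for e in self.by_unit.get(unit, [])
                if low <= float(e.tokens[0]) <= high]


index = NumericIndex()
index.add_entry("12.5", "mm", "doc-1", 4)
index.add_entry("3", "kg", "doc-1", 9)
index.add_entry("40", "mm", "doc-2", 2)
index.add_keyword("rod", "doc-1", 1)
hits = index.lookup("mm", 10.0, 20.0)
```

Keeping positions for both numbers and keywords is what would let a scorer measure how close a keyword sits to a numeric item in the same document.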
32. A computer-automated method of information retrieval, comprising:
on a computing system, receiving a user-defined query specifying one or more numerical data constraints and contextual constraints, said query transmitted from a user system to the computing system via a network;
on said computing system, automatically determining the relevancy of the query to each of one or more entries in a computer-searchable electronic index, each of the index entries representing an item of numerical data extracted from at least one of a plurality of electronic source documents, including one or more natural language documents;
said determination of relevancy for each of the entries based at least partly on a comparison between the numerical data constraint and the entry's item of numerical data, and at least partly on a comparison between the contextual constraint and contextual information extracted from the corresponding source document from which the entry's item of numerical data was extracted;
on said computing system, generating a response to the query based on the relevancy determination;
transmitting the response from the computing system to the user system.
33. A computer-automated method of information retrieval, comprising:
on a user computing system, interactively specifying a query comprising one or more numerical data constraints and contextual constraints;
transmitting said query from the user system to a remote computing system via a network;
receiving a response to the query via the network from the remote computing system describing a plurality of responsive items of numerical data occurring in one or more of a plurality of electronic source documents, including one or more natural language documents, said responsive items determined by the remote system to be relevant to the query based at least partly on a comparison between the numerical data constraint and the items of numerical data, and at least partly on a comparison between the contextual constraint and contextual information extracted from the corresponding source document in which the item of numerical data occurred; and
displaying an interactive graph on the user system based on the response.
34. A computer apparatus for information retrieval, comprising:
a computer system being programmed to receive a user-defined query;
said computer system automatically determining the relevancy of the query to each of one or more entries in a computer-searchable electronic index, each of the index entries representing an item of data extracted from at least one of a plurality of electronic source documents;
said computer system being further programmed to generate a graph, based on the relevancy determination, comprising a separate data point for each of the index entries determined to be most relevant to the query, wherein a plurality of the most relevant index entries can be extracted from a single one of the source documents.
Dependent claims: 35-43
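Claim 34 plots a separate data point per relevant index entry, so a single source document may contribute several points. The Python sketch below shows that selection step under stated assumptions: the claim does not say what the graph axes are, so plotting the extracted value against the relevancy score is a choice made here, and `ScoredEntry`, `DataPoint`, `graph_points`, and `top_k` are hypothetical names.

```python
from typing import List, NamedTuple


class ScoredEntry(NamedTuple):
    doc_id: str   # source document the item was extracted from
    value: float  # extracted data item
    score: float  # relevancy of the entry to the query


class DataPoint(NamedTuple):
    x: float      # assumed axis: extracted value
    y: float      # assumed axis: relevancy score
    doc_id: str


def graph_points(entries: List[ScoredEntry], top_k: int = 3) -> List[DataPoint]:
    """One point per top-ranked entry; entries are NOT merged per document."""
    ranked = sorted(entries, key=lambda e: e.score, reverse=True)[:top_k]
    return [DataPoint(e.value, e.score, e.doc_id) for e in ranked]


scored = [
    ScoredEntry("doc-1", 12.5, 0.9),
    ScoredEntry("doc-1", 14.0, 0.8),
    ScoredEntry("doc-2", 40.0, 0.3),
    ScoredEntry("doc-3", 13.0, 0.7),
]
points = graph_points(scored)
```

Ranking entries rather than documents is the design point: a document-level result list would collapse the two `doc-1` items into one hit, while this graph keeps both.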
44. A computer-automated method of information retrieval, comprising:
on a user computing system, interactively specifying a query comprising one or more numerical data constraints;
transmitting said query from the user system to a remote computing system via a network;
receiving a response to the query via the network from the remote computing system, said response defining a graph comprising a separate data point for each of a plurality of responsive items of numerical data occurring in one or more of a plurality of electronic source documents, said responsive items determined by the remote system to be relevant to the query and wherein a plurality of said relevant items can occur in a single one of the source documents;
interactively displaying the graph on the user system.
Specification