System and methods for units-based numeric information retrieval
First Claim
1. A computer apparatus, comprising:
a processor;
a memory communicatively coupled to the processor and storing program instructions causing the processor to:
receive a user-defined query comprising one or more numerical data constraints;
determine the relevancy of the query to each of one or more entries in a computer-searchable electronic index, each of the index entries representing an item of data extracted from at least one of a plurality of electronic source documents, wherein the determining further comprises:
determine a spectral vector for at least a part of the query across the plurality of electronic source documents, the spectral vector comprising a first column of binned numerical values associated with the query and a second column of frequencies of occurrence of the binned numerical values within the plurality of electronic source documents,
identify at least one spectral feature comprising a local maximum or local minimum in the frequencies of occurrence within the spectral vector, and
limit the computer-searchable electronic index to those documents of the plurality of electronic source documents that are associated with the at least one spectral feature, the relevancy being based on association of documents with the at least one spectral feature;
generate a graph, based on the relevancy determination, comprising a separate data point for each of the index entries determined to be most relevant to the query, wherein a plurality of the most relevant index entries can be extracted from a single one of the source documents; and
provide an interactive display of the graph.
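The spectral-vector steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the fixed-width binning, the strict interior definition of a local extremum, and the sample data are all assumptions made for illustration.

```python
from collections import Counter

def spectral_vector(values, bin_width):
    """Return [(bin_start, frequency), ...] over a contiguous range of bins.

    Assumption: fixed-width bins; empty bins between the lowest and highest
    occupied bin are kept so that local minima are visible.
    """
    counts = Counter(int(v // bin_width) for v in values)
    lo, hi = min(counts), max(counts)
    return [(k * bin_width, counts.get(k, 0)) for k in range(lo, hi + 1)]

def spectral_features(vector):
    """Return (row_index, kind) for strict interior local maxima and minima."""
    feats = []
    for i in range(1, len(vector) - 1):
        prev, cur, nxt = vector[i - 1][1], vector[i][1], vector[i + 1][1]
        if cur > prev and cur > nxt:
            feats.append((i, "max"))
        elif cur < prev and cur < nxt:
            feats.append((i, "min"))
    return feats

def documents_for_feature(entries, bin_start, bin_width):
    """Limit to documents having at least one value inside the feature's bin."""
    return {doc for doc, v in entries if bin_start <= v < bin_start + bin_width}

# Hypothetical index entries: (source document, extracted numeric value).
entries = [("docA", 12), ("docA", 14), ("docB", 13), ("docC", 31),
           ("docB", 33), ("docD", 35), ("docD", 36), ("docE", 52)]
vector = spectral_vector([v for _, v in entries], bin_width=10)
features = spectral_features(vector)
peak_row = next(i for i, kind in features if kind == "max")
peak_docs = documents_for_feature(entries, vector[peak_row][0], 10)
```

Here the first element of each row plays the role of the "first column" of binned values and the second element the "second column" of frequencies. Note that docD contributes two separate entries to the peak, consistent with the claim's allowance for several index entries from a single source document.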
Abstract
An information retrieval and analysis system for numeric data provides high precision and recall for numeric search and uses a methodology for determining the context of the extracted data. Its capabilities include extracting, parsing, and contextualizing numeric data comprising both a numeric value and an accompanying unit. The system facilitates organizing largely unstructured numeric data into an inverted index and other database formats. It enables the exploration and refinement of an extracted numeric data set defined by a search input that may be precise or initially vague. The system also facilitates analyzing and portraying numeric data graphically, creating knowledge by combining data from multiple sources, extracting correlations between seemingly disparate variables, and recognizing numeric data trends. It uses local natural language processing, mathematical analysis, and expert-based scientific heuristics to score the numeric and contextual relevancy of the data to the query parameters.
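As a rough illustration of the extraction and indexing described in the abstract, the sketch below pulls value-plus-unit pairs out of free text and organizes them into a unit-keyed inverted index. The regular expression, the short unit list, the function names, and the sample documents are hypothetical; a real system would need far richer unit handling and the contextual scoring the abstract mentions.

```python
import re
from collections import defaultdict

# Assumption: a tiny, hard-coded unit vocabulary; ranges, negative values,
# and scientific notation are not handled.
QUANTITY = re.compile(r"(\d+(?:\.\d+)?)\s*(nm|mm|cm|kg|g|K)\b")

def extract_quantities(text):
    """Return [(value, unit), ...] for each numeric value followed by a known unit."""
    return [(float(m.group(1)), m.group(2)) for m in QUANTITY.finditer(text)]

def build_inverted_index(docs):
    """Map each unit to a sorted posting list of (value, doc_id)."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for value, unit in extract_quantities(text):
            index[unit].append((value, doc_id))
    for postings in index.values():
        postings.sort()
    return dict(index)

def range_query(index, unit, low, high):
    """Return postings for `unit` whose value lies in [low, high]."""
    return [(v, d) for v, d in index.get(unit, []) if low <= v <= high]

docs = {
    "docA": "Particles of 20 nm and 35.5 nm were annealed at 450 K.",
    "docB": "A 2 kg sample yielded 30 nm grains.",
}
index = build_inverted_index(docs)
hits = range_query(index, "nm", 25, 40)
```

Keying the index by unit keeps a numeric constraint such as "25 to 40 nm" from matching values that share the number but not the unit.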
12 Claims
Claims 2 through 11 depend on claim 1, which is recited in full above.
12. A processor-implemented method, comprising:
interactively specifying a query comprising one or more numerical data constraints;
transmitting said query to a remote computing system via a network;
receiving a response to the query via the network from the remote computing system, said response defining a graph comprising a separate data point for each of a plurality of responsive items of numerical data occurring in one or more of a plurality of electronic source documents, said responsive items determined by the remote system to be relevant to the query and wherein a plurality of said relevant items can occur in a single one of the source documents, wherein determining relevance further comprises:
determining a spectral vector for at least a part of the query across the plurality of electronic source documents, the spectral vector comprising a first column of binned numerical values associated with the query and a second column of frequencies of occurrence of the binned numerical values within the plurality of electronic source documents,
identifying at least one spectral feature comprising a local maximum or local minimum in the frequencies of occurrence within the spectral vector, and
limiting the computer-searchable electronic index to those documents of the plurality of electronic source documents that are associated with the at least one spectral feature, the relevance being based on association of documents with the at least one spectral feature; and
interactively displaying the graph.
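The client-side exchange in claim 12 might look roughly like the sketch below, which serializes a query with numerical constraints and decodes a graph response into individual data points. The JSON field names, the function names, and the sample response are assumptions for illustration only; the claim does not specify a wire format, and the network transport is omitted.

```python
import json
from collections import Counter

def encode_query(constraints):
    """Serialize (quantity, unit, low, high) constraints into a JSON query."""
    return json.dumps({"constraints": [
        {"quantity": q, "unit": u, "min": lo, "max": hi}
        for q, u, lo, hi in constraints
    ]})

def decode_graph(response_text):
    """Return one (x, y, source_document) tuple per responsive item."""
    payload = json.loads(response_text)
    return [(p["x"], p["y"], p["source"]) for p in payload["points"]]

def points_per_document(points):
    """Count data points per source document."""
    return Counter(src for _, _, src in points)

query = encode_query([("particle size", "nm", 25, 40)])

# Hypothetical response from the remote system.
response = json.dumps({"points": [
    {"x": 30.0, "y": 450.0, "source": "docB"},
    {"x": 35.5, "y": 450.0, "source": "docA"},
    {"x": 38.0, "y": 500.0, "source": "docA"},
]})
points = decode_graph(response)
per_doc = points_per_document(points)
```

Each responsive item stays a separate point rather than being aggregated per document, so a single source document (docA here) can contribute several points to the displayed graph.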
Specification