Method and system for searching documents with numbers
Abstract
A system and method for using numbers to query a corpus of documents, particularly but not exclusively for data spaces that have low reflectivity, i.e., for a point xi described by one or more numbers, the data space does not contain very many permutations of the numbers. For each document to be searched, each query number is matched with one and only one document number, preferably using a bipartite graph or heuristic rule, such that a distance function is minimized. The distance function can, but need not, take into account attribute names and unit names. A limiting algorithm can be used to limit the number of documents that must be searched.
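The matching step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the relative distance in `number_distance`, the use of a plain sum as the document score, and all function names are assumptions made for illustration. The one-to-one matching is found by exhaustive search, which is only practical for a handful of numbers; a bipartite minimum-cost matching algorithm would take its place at scale.

```python
from itertools import permutations


def number_distance(q, n):
    """Hypothetical relative distance in [0, 1]; 0 means the numbers are equal."""
    denom = abs(q) + abs(n)
    return 0.0 if denom == 0 else abs(q - n) / denom


def best_matching(query, doc_numbers):
    """Return (score, pairs) for the cheapest one-to-one assignment.

    Each query number is paired with a distinct document number.
    Returns (inf, []) when the document has fewer numbers than the query.
    """
    if len(doc_numbers) < len(query):
        return float("inf"), []
    best_score, best_pairs = float("inf"), []
    # Exhaustive search over ordered selections of document numbers.
    for chosen in permutations(doc_numbers, len(query)):
        score = sum(number_distance(q, n) for q, n in zip(query, chosen))
        if score < best_score:
            best_score, best_pairs = score, list(zip(query, chosen))
    return best_score, best_pairs


def rank_documents(query, docs):
    """Rank document ids by ascending matching score (smaller is closer)."""
    scored = [(best_matching(query, nums)[0], doc_id) for doc_id, nums in docs.items()]
    return [doc_id for _, doc_id in sorted(scored)]
```

For example, with the query `[256, 3.0]` and a document containing the numbers `[3.2, 250, 40]`, `best_matching` pairs 256 with 250 and 3.0 with 3.2, without the query naming any attribute or unit.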
43 Claims
1. A computer-implemented method for searching documents comprising:
- receiving a query having at least one query number qi but not necessarily specifying an attribute name or unit for the query number, the query number qi being only an approximation of a desired attribute numeric value;
- for at least one document, matching each number qi with a document number nm, each document number being a number contained in the document, such that a distance score is minimized; and
- based on the distance scores, returning at least one document in response to the query, wherein the documents are selected from a set of documents, and the method further comprises limiting the number of documents in the set that are processed in the matching act using at least one lower bound on at least one distance score.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 43
14. A computer programmed with instructions for retrieving numbers from a corpus of documents, the instructions comprising:
- in response to a numeric query containing at least one numeric query string not necessarily associated with an attribute name or unit name, accessing at least some documents in the corpus, the string being only an approximation of a desired attribute numeric value;
- comparing each numeric query string with one or more document strings;
- associating each numeric query string with one and only one document string to optimize at least one distance function; and
- returning at least a portion of at least one document based on the associating instruction, wherein the documents are selected from a set of documents, and the computer further comprises instructions for limiting the number of documents in the set that are processed using lower bounds on the distance scores.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
27. A computer program device having computer readable code thereon for searching a set of documents, the code comprising:
- means for receiving a user query including at least one query number corresponding to respective desired attribute numeric values, the query not necessarily including attribute names or unit names, the query including at least one numeric value that is only an approximation of a desired attribute numeric value; and
- means for returning documents with values close to the query number, using only the query number, wherein the means for returning includes means for returning a ranked list of documents, each document having a value associated with each query number, such that a distance function between the query numbers and corresponding values is minimized.
Dependent claims: 28, 29, 30
31. A computer for searching a set of unstructured and/or semi-structured and/or structured documents, comprising:
- instructions for receiving a user query consisting of a set of numbers corresponding to desired attribute values and being only approximations of desired attribute numeric values, the query also consisting of respective attribute names of the numbers; and
- instructions for returning documents containing document values close to the set of numbers in the query using the query numbers and attribute names, wherein the means for returning includes means for returning a ranked list of documents, each document having a value associated with each query number, such that a distance function between the query numbers and corresponding values is minimized.
Dependent claims: 32, 33, 34
35. A computer-implemented method for searching a set of documents in response to a user query including a set of query numbers corresponding to desired attribute values together with units of the values, the query numbers being only approximations of desired attribute numeric values, comprising:
- returning documents with values approximating the set of query numbers, using the query numbers and units, wherein the act of returning includes returning a ranked list of documents, each document having at least one value associated with each query number, such that a distance function between the query numbers and corresponding values is minimized.
Dependent claims: 36, 37, 38
39. A system for searching a set of documents, comprising:
- means for receiving a query comprising at least one query number, at least one unit of the query number, and at least one attribute name associated with the query number, the query number being an approximation of a desired attribute numeric value;
- means for returning documents with at least one document value approximating but not necessarily equalling the query number, using the query number, the attribute name, and the unit, wherein the means for returning includes means for returning a ranked list of documents, each document having a value associated with each query number, such that a distance function between the query numbers and corresponding values is minimized.
Dependent claims: 40, 41, 42
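Several of the claims above recite limiting the documents processed by means of lower bounds on distance scores, and returning a ranked list. The following is a minimal sketch of one generic way to do this, and is not the specific limiting algorithm of the specification. It assumes a hypothetical relative distance, a summed score, and a relaxed bound in which each query number simply takes its nearest document number, ignoring the one-to-one constraint. Because the relaxation can only lower the score, it is a valid lower bound on the exact matching score, and documents whose bound cannot beat the current k-th best result are never matched exactly.

```python
import heapq
from itertools import permutations


def number_distance(q, n):
    """Hypothetical relative distance in [0, 1]; 0 means the numbers are equal."""
    denom = abs(q) + abs(n)
    return 0.0 if denom == 0 else abs(q - n) / denom


def exact_score(query, nums):
    """Minimum total distance over one-to-one assignments (exhaustive search)."""
    if len(nums) < len(query):
        return float("inf")
    return min(
        sum(number_distance(q, n) for q, n in zip(query, chosen))
        for chosen in permutations(nums, len(query))
    )


def lower_bound(query, nums):
    """Relaxed score: nearest document number per query number, reuse allowed."""
    if len(nums) < len(query):
        return float("inf")
    return sum(min(number_distance(q, n) for n in nums) for q in query)


def top_k(query, docs, k):
    """Return (ranked list of (score, doc_id), count of exact matchings run)."""
    heap = []  # max-heap via negated scores; holds the k best documents so far
    exact_runs = 0
    bounds = sorted((lower_bound(query, nums), doc_id) for doc_id, nums in docs.items())
    for bound, doc_id in bounds:  # ascending lower bound
        if len(heap) == k and bound >= -heap[0][0]:
            break  # no remaining document can beat the current k-th best
        exact_runs += 1
        score = exact_score(query, docs[doc_id])
        if len(heap) < k:
            heapq.heappush(heap, (-score, doc_id))
        elif score < -heap[0][0]:
            heapq.heapreplace(heap, (-score, doc_id))
    return sorted((-neg, doc_id) for neg, doc_id in heap), exact_runs
```

Visiting documents in ascending order of their bound lets the loop stop at the first document whose bound reaches the current k-th best score, since every later document has an equal or larger bound.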
Specification