System and method for topical document searching
Abstract
Systems and methods are provided for searching for documents within topically-defined clusters. A search space is defined, starting with one or more source documents, by examining references from one document to another and following the networks of references to some level of indirection. Depending on the embodiment, references may be followed from a document containing a reference to a referred-to document, or from a referred-to document to a document containing a reference, or both. Once a search space has been defined, a query is executed, and documents within the search space that satisfy the query parameters are identified.
In certain embodiments of the invention, the documents primarily relate to legal materials, and one or more source documents are associated with one or more topics within a topic directory. In such embodiments, a search query may be limited to one or more selected topics by executing the search query within a search space defined using the associated document or documents as the source.
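The topic-limited search described above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the names `TOPIC_SOURCES`, `CITES`, `TEXTS`, `expand`, and `topical_search` are hypothetical, documents are represented by made-up identifiers, and "satisfying the query" is simplified to plain substring matching on every term.

```python
# Hypothetical topic directory: topic name -> source document identifiers.
TOPIC_SOURCES = {"fair use": ["treatise-A"], "patents": ["treatise-B"]}

# Hypothetical reference graph: document -> documents it refers to.
CITES = {
    "treatise-A": ["case-1"],
    "case-1": ["case-2"],
    "case-2": ["case-3"],
    "treatise-B": ["case-9"],
}

# Hypothetical full text of each document.
TEXTS = {
    "treatise-A": "fair use of a work",
    "case-1": "parody as fair use",
    "case-2": "parody and satire",
    "case-3": "parody again",
    "case-9": "parody of a patent",
}


def expand(sources, cites, level):
    """Follow references outward from the sources for `level` hops."""
    space = set(sources)
    frontier = set(sources)
    for _ in range(level):
        found = {target for doc in frontier for target in cites.get(doc, ())}
        frontier = found - space  # only documents not seen before
        space |= frontier
    return space


def topical_search(topics, terms, topic_sources, cites, texts, level=2):
    """Run the query only inside the search space seeded by the chosen topics."""
    sources = [doc for topic in topics for doc in topic_sources[topic]]
    space = expand(sources, cites, level)
    return sorted(d for d in space if all(t in texts.get(d, "") for t in terms))


RESULT = topical_search(["fair use"], ["parody"], TOPIC_SOURCES, CITES, TEXTS)
print(RESULT)
```

In this toy data, `case-3` is three hops from the source and `case-9` belongs to a different topic, so neither is searched even though both contain the query term.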
37 Citations
24 Claims
1. A computer system for identifying one or more electronic documents within a collection of electronic documents, the system comprising:
one or more processors programmed at least to
(1) store, in a memory operatively coupled to at least one of the processors, a search level that is a whole number that is at least two,
(2) accept a search query through an interface operatively coupled to at least one of the processors, the search query comprising one or more criteria that a user has explicitly entered, and the search query having an association with a topical area for a search,
(3) define a subset of a collection of electronic documents, the subset comprising a plurality of electronic documents,
(4) execute the search query against all documents in the subset, thereby identifying as responsive documents all documents in the subset that satisfy the entire query such that each responsive document includes each of the one or more criteria of the search query,
(5) retrieve a definition of a search space, the definition of the search space comprising one or more normalized citations to every document within the search space, and the search space having an association with the topical area for the search;
(6) filter the responsive documents resulting from the execution of the search query by checking each responsive document against the definition of the search space and removing from further consideration any responsive document not found in the definition of the search space; and
(7) provide information that identifies one or more of the remaining responsive documents through an interface operatively coupled to at least one of the processors;
wherein the subset comprises one or more source documents within the collection and one or more additional documents within the collection, the one or more additional documents being identifiable by a process carried out for a number of iterations equal to the search level and comprising:
(1) a first iteration that comprises finding one or more references in one or more of the electronic source documents, each of the references identifying a respective document in the collection, and adding to the subset each document in the collection that is identified by any of the found references but is not already in the subset, and
(2) one or more subsequent iterations, each of which comprises finding one or more references in one or more of the documents added to the subset in the immediately previous iteration, each of the references identifying a respective document in the collection, and adding to the subset each document in the collection that is identified by any of the found references but is not already in the subset.
Dependent claims: 2, 3, 4
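The iterative process recited at the end of claim 1 can be read as a breadth-first expansion in which each iteration examines only the documents added in the immediately previous iteration. The sketch below is one possible reading under stated assumptions, not the claimed system itself: `build_subset` and `REFERENCES` are hypothetical names, and the reference graph is a made-up mapping from each document to the documents it refers to.

```python
def build_subset(sources, references, search_level):
    """Return the documents added per iteration; layers[0] holds the sources."""
    if search_level < 2:
        raise ValueError("the claim recites a search level of at least two")
    subset = set(sources)
    layers = [set(sources)]
    for _ in range(search_level):
        added = set()
        # Only documents added in the immediately previous iteration are examined.
        for doc in layers[-1]:
            for ref in references.get(doc, ()):
                if ref not in subset:  # skip documents already in the subset
                    subset.add(ref)
                    added.add(ref)
        layers.append(added)
    return layers


# Hypothetical reference graph (assumption): document -> referenced documents.
REFERENCES = {
    "S": ["A", "B"],
    "A": ["C", "S"],  # the back-reference to S is ignored: S is already present
    "B": ["C"],
    "C": ["D"],
    "D": ["E"],
}

LAYERS = build_subset(["S"], REFERENCES, 2)
SUBSET = set().union(*LAYERS)
print(LAYERS, SUBSET)
```

Keeping the per-iteration layers separate makes it easy to see that a search level of two stops at `C`, while `D` and `E` would only be reached at levels three and four.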
5. A computer system for identifying one or more electronic documents within a collection of electronic documents, the system comprising:
one or more processors programmed at least to
(1) store, in a memory operatively coupled to at least one of the processors, a search level that is a whole number that is at least two,
(2) accept a search query through an interface operatively coupled to at least one of the processors, the search query comprising one or more criteria that a user has explicitly entered, and the search query having an association with a topical area for a search,
(3) define a subset of a collection of electronic documents that comprises a plurality of electronic documents,
(4) execute the search query against all documents in the subset, thereby identifying as responsive documents all documents in the subset that satisfy the query such that each responsive document includes each of the one or more criteria of the search query,
(5) retrieve a definition of a search space, the definition of the search space comprising one or more normalized citations to every document within the search space, and the search space having an association with the topical area for the search;
(6) filter the responsive documents resulting from the execution of the search query by checking each responsive document against the definition of the search space and removing from further consideration any responsive document not found in the definition of the search space; and
(7) provide information that identifies one or more of the remaining responsive documents through an interface operatively coupled to at least one of the processors;
wherein the subset comprises one or more source documents within the collection and one or more additional documents within the collection, the one or more additional documents being identifiable by a process carried out for a number of iterations equal to the search level and comprising:
(1) a first iteration that comprises finding one or more citing documents within the collection, each citing document comprising at least one reference to at least one of the electronic source documents, and adding to the subset each of the found citing documents that is not already in the subset, and
(2) one or more subsequent iterations, each of which comprises finding one or more citing documents within the collection, each citing document comprising at least one reference to at least one of the documents added to the subset in the immediately previous iteration, and adding to the subset each of the found citing documents that is not already in the subset.
Dependent claims: 6, 7, 8
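Claim 5 differs from claim 1 in the direction of travel: each iteration collects documents that cite the previously added documents, rather than documents cited by them. A minimal sketch of that reverse expansion, assuming a hypothetical in-memory reference map that is first inverted into a "cited by" index (the names `invert`, `build_citing_subset`, and `REFERENCES` are illustrative and not taken from the patent):

```python
from collections import defaultdict


def invert(references):
    """Turn document -> referenced documents into document -> citing documents."""
    cited_by = defaultdict(set)
    for doc, refs in references.items():
        for ref in refs:
            cited_by[ref].add(doc)
    return cited_by


def build_citing_subset(sources, references, search_level):
    """Add, per iteration, the documents citing those added in the previous one."""
    cited_by = invert(references)
    subset = set(sources)
    previous = set(sources)
    for _ in range(search_level):
        citing = {c for doc in previous for c in cited_by.get(doc, ())}
        previous = citing - subset  # keep only documents not already in the subset
        subset |= previous
    return subset


# Hypothetical reference graph (assumption): document -> referenced documents.
REFERENCES = {"A": ["Z"], "B": ["A"], "C": ["B"], "D": ["C"]}

CITING_SUBSET = build_citing_subset(["A"], REFERENCES, 2)
print(sorted(CITING_SUBSET))
```

Here `Z` is excluded because it is cited by `A` rather than citing it, and `D` is excluded because it is three citing steps away from the source.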
9. A method of identifying one or more documents within a collection of documents, the method being performed by a computer system that comprises one or more processors, a memory operatively coupled to at least one of the processors, and a computer-readable storage medium encoded with instructions executable by at least one of the processors and operatively coupled to at least one of the processors, the method comprising:
storing in the memory a search level that is a whole number that is at least two;
storing in the memory a definition of a subset of a collection of electronic documents, the collection of documents comprising a plurality of documents, and the subset comprising one or more source documents within the collection and one or more additional documents within the collection, the one or more additional documents being identifiable by a process carried out for a number of iterations equal to the search level and comprising:
(1) a first iteration that comprises finding one or more references in one or more of the electronic source documents, each of the references identifying a respective document in the collection, and adding to the subset each document in the collection that is identified by any of the found references but is not already in the subset, and
(2) one or more subsequent iterations, each of which comprises finding one or more references in one or more of the documents added to the subset in the immediately previous iteration, each of the references identifying a respective document in the collection, and adding to the subset each document in the collection that is identified by any of the found references but is not already in the subset;
at least one of the processors receiving through at least one interface operatively coupled to the processor a definition of a search query through an interface operatively coupled to at least one of the processors, the search query comprising one or more criteria that a user has explicitly entered, and the search query having an association with a topical area for a search;
at least one of the processors executing instructions retrieved from the computer-readable storage medium to
(i) identify all responsive documents within the subset that satisfy the one or more criteria comprised by the search query such that each responsive document includes each of the one or more criteria of the search query,
(ii) retrieve a definition of a search space, the definition of the search space comprising one or more normalized citations to every document within the search space, and the search space having an association with the topical area for the search,
(iii) filter the responsive documents resulting from the execution of the search query by checking each responsive document against the definition of the search space and
(iv) removing from further consideration any responsive document not found in the definition of the search space; and
at least one of the processors executing instructions retrieved from the computer-readable storage medium to transmit through the at least one interface information for display to the user that identifies one or more of the remaining responsive documents.
Dependent claims: 10, 11, 12
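Steps (ii) through (iv) filter the responsive documents against a search-space definition made up of normalized citations. The claims do not say how citations are normalized, so the sketch below simply assumes one plausible scheme (drop periods, collapse whitespace, uppercase). The functions `normalize_citation` and `filter_by_search_space` and the citation strings are all made up for illustration.

```python
import re


def normalize_citation(citation):
    """Assumed normalization: drop periods, collapse whitespace, uppercase."""
    without_periods = citation.replace(".", "")
    return re.sub(r"\s+", " ", without_periods).strip().upper()


def filter_by_search_space(responsive, search_space_citations):
    """Keep only responsive documents whose citation appears in the search space."""
    allowed = {normalize_citation(c) for c in search_space_citations}
    return [c for c in responsive if normalize_citation(c) in allowed]


# Hypothetical search-space definition and query results (made-up citations).
SEARCH_SPACE = ["123 F.3d 456", "77 U.S. 10"]
RESPONSIVE = ["123 F3d  456", "9 U.S. 99", "77 u.s. 10"]

REMAINING = filter_by_search_space(RESPONSIVE, SEARCH_SPACE)
print(REMAINING)
```

Comparing normalized forms on both sides means that differences in spacing, punctuation, or case between a responsive document's citation and the stored definition do not cause a document to be dropped by mistake.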
13. A method of identifying one or more documents within a collection of documents, the method being performed by a computer system that comprises one or more processors, a memory operatively coupled to at least one of the processors, and a computer-readable storage medium encoded with instructions executable by at least one of the processors and operatively coupled to at least one of the processors, the method comprising:
storing in the memory a search level that is a whole number that is at least two;
storing in the memory a definition of a subset of a collection of electronic documents, the collection of documents comprising a plurality of documents, and the subset comprising one or more source documents within the collection and one or more additional documents within the collection, the one or more additional documents being identifiable by a process carried out for a number of iterations equal to the search level and comprising:
(1) a first iteration that comprises finding one or more citing documents within the collection, each citing document comprising at least one reference to at least one of the electronic source documents, and adding to the subset each of the found citing documents that is not already in the subset, and
(2) one or more subsequent iterations, each of which comprises finding one or more citing documents within the collection, each citing document comprising at least one reference to at least one of the documents added to the subset in the immediately previous iteration, and adding to the subset each of the found citing documents that is not already in the subset;
at least one of the processors receiving through at least one interface operatively coupled to the processor a definition of a search query through an interface operatively coupled to at least one of the processors, the search query comprising one or more criteria that a user has explicitly entered, and the search query having an association with a topical area for a search;
at least one of the processors executing instructions retrieved from the computer-readable storage medium to
(i) identify all responsive documents within the subset that satisfy the one or more criteria comprised by the search query such that each responsive document includes each of the one or more criteria of the search query,
(ii) retrieve a definition of a search space, the definition of the search space comprising one or more normalized citations to every document within the search space, and the search space having an association with the topical area for the search,
(iii) filter the responsive documents resulting from the execution of the search query by checking each responsive document against the definition of the search space and
(iv) removing from further consideration any responsive document not found in the definition of the search space; and
at least one of the processors executing instructions retrieved from the computer-readable storage medium to transmit through the at least one interface information for display to the user that identifies one or more of the remaining responsive documents.
Dependent claims: 14, 15, 16
17. A computer program product comprising a computer-readable storage medium encoded with instructions that, when executed by at least one processor within a computer system that comprises one or more processors and a memory operatively coupled to at least one of the processors, cause the computer system at least to:
store in the memory a search level that is a whole number that is at least two;
store in the memory a definition of a subset of a collection of electronic documents, the collection of documents comprising a plurality of documents, and the subset comprising one or more source documents within the collection and one or more additional documents within the collection, the one or more additional documents being identifiable by a process carried out for a number of iterations equal to the search level and comprising:
(1) a first iteration that comprises finding one or more references in one or more of the electronic source documents, each of the references identifying a respective document in the collection, and adding to the subset each document in the collection that is identified by any of the found references but is not already in the subset, and
(2) one or more subsequent iterations, each of which comprises finding one or more references in one or more of the documents added to the subset in the immediately previous iteration, each of the references identifying a respective document in the collection, and adding to the subset each document in the collection that is identified by any of the found references but is not already in the subset;
receive through at least one interface operatively coupled to at least one of the processors a definition of a search query through an interface operatively coupled to at least one of the processors, the search query comprising one or more criteria that a user has explicitly entered, and the search query having an association with a topical area for a search;
identify all responsive documents within the subset that satisfy the one or more criteria comprised by the search query such that each responsive document includes each of the one or more criteria of the search query, and retrieve a definition of a search space, the definition of the search space comprising one or more normalized citations to every document within the search space, and the search space having an association with the topical area for the search;
filter the responsive documents resulting from the execution of the search query by checking each responsive document against the definition of the search space and removing from further consideration any responsive document not found in the definition of the search space; and
transmit through the at least one interface information for display to the user that identifies one or more of the remaining responsive documents.
Dependent claims: 18, 19, 20
21. A computer program product comprising a computer-readable storage medium encoded with instructions that, when executed by at least one processor within a computer system that comprises one or more processors and a memory operatively coupled to at least one of the processors, cause the computer system at least to:
store in the memory a search level that is a whole number that is at least two;
store in the memory a definition of a subset of a collection of electronic documents, the collection of documents comprising a plurality of documents, and the subset comprising one or more source documents within the collection and one or more additional documents within the collection, the one or more additional documents being identifiable by a process carried out for a number of iterations equal to the search level and comprising:
(1) a first iteration that comprises finding one or more citing documents within the collection, each citing document comprising at least one reference to at least one of the electronic source documents, and adding to the subset each of the found citing documents that is not already in the subset, and
(2) one or more subsequent iterations, each of which comprises finding one or more citing documents within the collection, each citing document comprising at least one reference to at least one of the documents added to the subset in the immediately previous iteration, and adding to the subset each of the found citing documents that is not already in the subset;
receive through at least one interface operatively coupled to at least one of the processors a definition of a search query through an interface operatively coupled to at least one of the processors, the search query comprising one or more criteria that a user has explicitly entered, and the search query having an association with a topical area for a search;
identify all responsive documents within the subset that satisfy the one or more criteria comprised by the search query such that each responsive document includes each of the one or more criteria of the search query, and retrieve a definition of a search space, the definition of the search space comprising one or more normalized citations to every document within the search space, and the search space having an association with the topical area for the search;
filter the responsive documents resulting from the execution of the search query by checking each responsive document against the definition of the search space and removing from further consideration any responsive document not found in the definition of the search space; and
transmit through the at least one interface information for display to the user that identifies one or more of the remaining responsive documents.
Dependent claims: 22, 23, 24
Specification