System and method for topical document searching

US 9,519,707 B2
Filed: 06/14/2010
Issued: 12/13/2016
Est. Priority Date: 04/26/2006
Status: Active Grant

Abstract

Systems and methods are provided for searching for documents within topically-defined clusters. A search space is defined, starting with one or more source documents, by examining references from one document to another and following the networks of references to some level of indirection. Depending on the embodiment, references may be followed from a document containing a reference to a referred-to document, or from a referred-to document to a document containing a reference, or both. Once a search space has been defined, a query is executed, and documents within the search space that satisfy the query parameters are identified.

In certain embodiments of the invention, the documents primarily relate to legal materials, and one or more source documents are associated with one or more topics within a topic directory. In such embodiments, a search query may be limited to one or more selected topics by executing the search query within a search space defined using the associated document or documents as the source.
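
The search-space construction described in the abstract can be pictured as a level-limited traversal of the reference network. The following is only a minimal sketch under stated assumptions, not the patented implementation: the function name `build_search_space`, the `references` dictionary, and the `direction` parameter are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def build_search_space(sources, references, level, direction="forward"):
    """Return {doc_id: iteration at which the document was first added}.

    sources    -- iterable of source document ids (recorded as iteration 0)
    references -- dict mapping a document id to the ids it refers to
    level      -- number of expansion iterations (the level of indirection)
    direction  -- "forward" (follow references out of a document),
                  "backward" (find documents that refer to it), or "both"
    All of these names are illustrative assumptions.
    """
    if direction not in ("forward", "backward", "both"):
        raise ValueError(f"unknown direction: {direction!r}")

    # Invert the reference map once so citing documents can be looked up.
    cited_by = defaultdict(set)
    for doc, targets in references.items():
        for target in targets:
            cited_by[target].add(doc)

    added_at = {doc: 0 for doc in sources}
    frontier = set(added_at)
    for iteration in range(1, level + 1):
        found = set()
        for doc in frontier:
            if direction in ("forward", "both"):
                found.update(references.get(doc, ()))
            if direction in ("backward", "both"):
                found.update(cited_by.get(doc, ()))
        # Only documents not already in the space are added, and only
        # those newly added documents feed the next iteration.
        frontier = {doc for doc in found if doc not in added_at}
        for doc in frontier:
            added_at[doc] = iteration
    return added_at


refs = {"A": ["B"], "B": ["C"], "C": ["D"], "E": ["A"]}
space = build_search_space(["A"], refs, level=2)
print(sorted(space.items()))  # [('A', 0), ('B', 1), ('C', 2)]
```

With a level of 2, document D is three references away from the source A and therefore stays outside the space; E is reached only when references are followed backward.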

24 Claims

1. A computer system for identifying one or more electronic documents within a collection of electronic documents, the system comprising:
- one or more processors programmed at least to
  
  (1) store, in a memory operatively coupled to at least one of the processors, a search level that is a whole number that is at least two,
  
  (2) accept a search query through an interface operatively coupled to at least one of the processors, the search query comprising one or more criteria that a user has explicitly entered, and the search query having an association with a topical area for a search,
  
  (3) define a subset of a collection of electronic documents, the subset comprising a plurality of electronic documents,
  
  (4) execute the search query against all documents in the subset, thereby identifying as responsive documents all documents in the subset that satisfy the entire query such that each responsive document includes each of the one or more criteria of the search query,
  
  (5) retrieving a definition of a search space, the definition of the search space comprising one or more normalized citations to every document within the search space, and the search space having an association with the topical area for the search;
  
  (6) filtering the responsive documents resulting from the execution of the search query by checking each responsive document against the definition of the search space and removing from further consideration any responsive document not found in the definition of the search space; and
  
  (7) provide information that identifies one or more of the remaining responsive documents through an interface operatively coupled to at least one of the processors;
  
  wherein the subset comprises one or more source documents within the collection and one or more additional documents within the collection, the one or more additional documents being identifiable by a process carried out for a number of iterations equal to the search level and comprising:
  
  (1) a first iteration that comprises finding one or more references in one or more of the electronic source documents, each of the references identifying a respective document in the collection, and adding to the subset each document in the collection that is identified by any of the found references but is not already in the subset, and
  
  (2) one or more subsequent iterations, each of which comprises finding one or more references in one or more of the documents added to the subset in the immediately previous iteration, each of the references identifying a respective document in the collection, and adding to the subset each document in the collection that is identified by any of the found references but is not already in the subset.
- - 2. The computer system of claim 1, wherein:
    - the one or more processors are programmed at least to compute rankings of the identified responsive documents, the computed ranking of each respective identified responsive document depending at least upon the iteration in which that document would first be added to the subset, and provide information that identifies a plurality of the responsive documents through an interface operatively coupled to at least one of the processors; and
      
      the provided information that identifies a plurality of the responsive documents comprises information about the ranking of the identified documents.
  - 3. The computer system of claim 1, wherein the search query is a Boolean query that the user has entered.
  - 4. The computer system of claim 1, wherein the search query is a natural-language query that the user has entered.
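
Steps (4) through (6) of claim 1 (execute the query, retrieve the search-space definition, filter against it) can be illustrated with a short sketch. This is a hedged illustration only: the claims do not say how citations are normalized or how criteria are matched, so `normalize_citation`, the substring matching in `search`, and the sample citations are all hypothetical.

```python
import re


def normalize_citation(citation):
    """Assumed normalization: lowercase, drop punctuation, squeeze spaces."""
    cleaned = re.sub(r"[^\w\s]", "", citation.lower())
    return " ".join(cleaned.split())


def search(subset, criteria, search_space_citations):
    """Return citations of responsive documents that lie in the search space.

    subset                 -- dict mapping a citation to the document text
    criteria               -- terms that must all appear in a responsive document
    search_space_citations -- normalized citations that define the search space
    """
    terms = [term.lower() for term in criteria]
    # Step (4): a document is responsive only if it contains every criterion.
    responsive = [
        cite for cite, text in subset.items()
        if all(term in text.lower() for term in terms)
    ]
    # Steps (5)-(6): remove responsive documents absent from the definition.
    allowed = set(search_space_citations)
    return [cite for cite in responsive if normalize_citation(cite) in allowed]


# Fictional citations and texts, used purely as sample data.
subset = {
    "Alpha v. Beta, 1 Ex. 1": "overtime pay owed to hourly employees",
    "Gamma v. Delta, 2 Ex. 2": "overtime pay dispute over a lease",
    "Epsilon v. Zeta, 3 Ex. 3": "hourly parking rates",
}
space = {
    normalize_citation("ALPHA v. BETA, 1 Ex. 1"),
    normalize_citation("Epsilon v Zeta 3 Ex 3"),
}
print(search(subset, ["overtime", "pay"], space))  # ['Alpha v. Beta, 1 Ex. 1']
```

Two documents satisfy both criteria, but only the one whose normalized citation appears in the search-space definition survives the filter.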

5. A computer system for identifying one or more electronic documents within a collection of electronic documents, the system comprising:
- one or more processors programmed at least to
  
  (1) store in a memory operatively coupled to at least one of the processors, a search level that is a whole number that is at least two,
  
  (2) accept a search query through an interface operatively coupled to at least one of the processors, the search query comprising one or more criteria that a user has explicitly entered, and the search query having an association with a topical area for a search,
  
  (3) define a subset of a collection of electronic documents that comprises a plurality of electronic documents,
  
  (4) execute the search query against all documents in the subset, thereby identifying as responsive documents all documents in the subset that satisfy the query such that each responsive document includes each of the one or more criteria of the search query,
  
  (5) retrieve a definition of a search space, the definition of the search space comprising one or more normalized citations to every document within the search space, and the search space having an association with the topical area for the search;
  
  (6) filter the responsive documents resulting from the execution of the search query by checking each responsive document against the definition of the search space and removing from further consideration any responsive document not found in the definition of the search space;
  
  (7) provide information that identifies one or more of the remaining responsive documents through an interface operatively coupled to at least one of the processors;
  
  wherein the subset comprises one or more source documents within the collection and one or more additional documents within the collection, the one or more additional documents being identifiable by a process carried out for a number of iterations equal to the search level and comprising:
  
  (1) a first iteration that comprises finding one or more citing documents within the collection, each citing document comprising at least one reference to at least one of the electronic source documents, and adding to the subset each of the found citing documents that is not already in the subset, and
  
  (2) one or more subsequent iterations, each of which comprises finding one or more citing documents within the collection, each citing document comprising at least one reference to at least one of the documents added to the subset in the immediately previous iteration, and adding to the subset each of the found citing documents that is not already in the subset.
- - 6. The computer system of claim 5, wherein:
    - the one or more processors are programmed at least to compute rankings of the identified responsive documents, the computed ranking of each respective identified responsive document depending at least upon the iteration in which that document would first be added to the subset, and provide information that identifies a plurality of the responsive documents through an interface operatively coupled to at least one of the processors; and
      
      the provided information that identifies a plurality of the responsive documents comprises information about the ranking of the identified documents.
  - 7. The computer system of claim 5, wherein the search query is a Boolean query that the user has entered.
  - 8. The computer system of claim 5, wherein the search query is a natural language query that the user has entered.
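
The citing-document iterations of claim 5 and the iteration-dependent ranking of claim 6 can be sketched together. This is an illustrative sketch under assumptions, not the claimed system: the names `citing_expansion` and `rank_by_iteration` are hypothetical, and ranking earlier-added documents first is just one possible way for a ranking to depend on the iteration.

```python
def citing_expansion(sources, references, level):
    """Map each document in the subset to the iteration that first added it.

    sources    -- iterable of source document ids (recorded as iteration 0)
    references -- dict mapping a document id to the ids it refers to
    level      -- number of iterations (the claims require at least two)
    """
    added_at = {doc: 0 for doc in sources}
    frontier = set(added_at)
    for iteration in range(1, level + 1):
        # Citing documents: not yet in the subset, with at least one
        # reference to a document added in the previous iteration.
        citing = {
            doc for doc, targets in references.items()
            if doc not in added_at and frontier.intersection(targets)
        }
        for doc in citing:
            added_at[doc] = iteration
        frontier = citing
    return added_at


def rank_by_iteration(responsive, added_at):
    """Assumed ranking rule: documents added in earlier iterations come first."""
    return sorted(responsive, key=lambda doc: (added_at[doc], doc))


references = {"B": ["A"], "C": ["B"], "D": ["A", "C"], "E": ["Z"]}
added_at = citing_expansion(["A"], references, level=2)
print(rank_by_iteration(["C", "D", "B"], added_at))  # ['B', 'D', 'C']
```

B and D cite the source directly and are added in the first iteration; C cites B and is added in the second; E cites nothing in the subset and is never added.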

9. A method of identifying one or more documents within a collection of documents, the method being performed by a computer system that comprises one or more processors, a memory operatively coupled to at least one of the processors, and a computer-readable storage medium encoded with instructions executable by at least one of the processors and operatively coupled to at least one of the processors, the method comprising:
- storing in the memory a search level that is a whole number that is at least two;
  
  storing in the memory a definition of a subset of a collection of electronic documents, the collection of documents comprising a plurality of documents, and the subset comprising one or more source documents within the collection and one or more additional documents within the collection, the one or more additional documents being identifiable by a process carried out for a number of iterations equal to the search level and comprising:
  
  (1) a first iteration that comprises finding one or more references in one or more of the electronic source documents, each of the references identifying a respective document in the collection, and adding to the subset each document in the collection that is identified by any of the found references but is not already in the subset, and
  
  (2) one or more subsequent iterations, each of which comprises finding one or more references in one or more of the documents added to the subset in the immediately previous iteration, each of the references identifying a respective document in the collection, and adding to the subset each document in the collection that is identified by any of the found references but is not already in the subset;
  
  at least one of the processors receiving through at least one interface operatively coupled to the processor a definition of a search query through an interface operatively coupled to at least one of the processors, the search query comprising one or more criteria that a user has explicitly entered, and the search query having an association with a topical area for a search;
  
  at least one of the processors executing instructions retrieved from the computer-readable storage medium to (i) identify all responsive documents within the subset that satisfy the one or more criteria comprised by the search query such that each responsive document includes each of the one or more criteria of the search query, (ii) retrieve a definition of a search space, the definition of the search space comprising one or more normalized citations to every document within the search space, and the search space having an association with the topical area for the search, (iii) filter the responsive documents resulting from the execution of the search query by checking each responsive document against the definition of the search space and (iv) removing from further consideration any responsive document not found in the definition of the search space; and
  
  at least one of the processors executing instructions retrieved from the computer-readable storage medium to transmit through the at least one interface information for display to the user that identifies one or more of the remaining responsive documents.
- - 10. The method of claim 9, comprising:
    - at least one of the processors executing instructions retrieved from the computer-readable storage medium to compute rankings of the identified responsive documents, the computed ranking of each respective identified responsive document depending at least upon the iteration in which that document would first be added to the subset; and
      
      at least one of the processors executing instructions retrieved from the computer-readable storage medium to transmit through one of the interfaces information that identifies a plurality of the responsive documents;
      
      wherein the provided information that identifies a plurality of the responsive documents comprises information about the ranking of the identified documents.
  - 11. The method of claim 9, wherein the search query is a Boolean query that the user has entered.
  - 12. The method of claim 9, wherein the search query is a natural-language query that the user has entered.

13. A method of identifying one or more documents within a collection of documents, the method being performed by a computer system that comprises one or more processors, a memory operatively coupled to at least one of the processors, and a computer-readable storage medium encoded with instructions executable by at least one of the processors and operatively coupled to at least one of the processors, the method comprising:
- storing in the memory a search level that is a whole number that is at least two;
  
  storing in the memory a definition of a subset of a collection of electronic documents, the collection of documents comprising a plurality of documents, and the subset comprising one or more source documents within the collection and one or more additional documents within the collection, the one or more additional documents being identifiable by a process carried out for a number of iterations equal to the search level and comprising:
  
  (1) a first iteration that comprises finding one or more citing documents within the collection, each citing document comprising at least one reference to at least one of the electronic source documents, and adding to the subset each of the found citing documents that is not already in the subset, and
  
  (2) one or more subsequent iterations, each of which comprises finding one or more citing documents within the collection, each citing document comprising at least one reference to at least one of the documents added to the subset in the immediately previous iteration, and adding to the subset each of the found citing documents that is not already in the subset;
  
  at least one of the processors receiving through at least one interface operatively coupled to the processor a definition of a search query through an interface operatively coupled to at least one of the processors, the search query comprising one or more criteria that a user has explicitly entered, and the search query having an association with a topical area for a search;
  
  at least one of the processors executing instructions retrieved from the computer-readable storage medium to (i) identify all responsive documents within the subset that satisfy the one or more criteria comprised by the search query such that each responsive document includes each of the one or more criteria of the search query, (ii) retrieve a definition of a search space, the definition of the search space comprising one or more normalized citations to every document within the search space, and the search space having an association with the topical area for the search, (iii) filter the responsive documents resulting from the execution of the search query by checking each responsive document against the definition of the search space and (iv) removing from further consideration any responsive document not found in the definition of the search space;
  
  at least one of the processors executing instructions retrieved from the computer-readable storage medium to transmit through the at least one interface information for display to the user that identifies one or more of the remaining responsive documents.
- - 14. The method of claim 13, comprising:
    - at least one of the processors executing instructions retrieved from the computer-readable storage medium to compute rankings of the identified responsive documents, the computed ranking of each respective identified responsive document depending at least upon the iteration in which that document would first be added to the subset; and
      
      at least one of the processors executing instructions retrieved from the computer-readable storage medium to transmit through one of the interfaces information that identifies a plurality of the responsive documents;
      
      wherein the provided information that identifies a plurality of the responsive documents comprises information about the ranking of the identified documents.
  - 15. The method of claim 13, wherein the search query is a Boolean query that the user has entered.
  - 16. The method of claim 13, wherein the search query is a natural-language query that the user has entered.

17. A computer program product comprising a computer-readable storage medium encoded with instructions that, when executed by at least one processor within a computer system that comprises one or more processors and a memory operatively coupled to at least one of the processors, cause the computer system at least to:
- store in the memory a search level that is a whole number that is at least two;
  
  store in the memory a definition of a subset of a collection of electronic documents, the collection of documents comprising a plurality of documents, and the subset comprising one or more source documents within the collection and one or more additional documents within the collection, the one or more additional documents being identifiable by a process carried out for a number of iterations equal to the search level and comprising:
  
  (1) a first iteration that comprises finding one or more references in one or more of the electronic source documents, each of the references identifying a respective document in the collection, and adding to the subset each document in the collection that is identified by any of the found references but is not already in the subset, and
  
  (2) one or more subsequent iterations, each of which comprises finding one or more references in one or more of the documents added to the subset in the immediately previous iteration, each of the references identifying a respective document in the collection, and adding to the subset each document in the collection that is identified by any of the found references but is not already in the subset;
  
  receive through at least one interface operatively coupled to at least one of the processors a definition of a search query through an interface operatively coupled to at least one of the processors, the search query comprising one or more criteria that a user has explicitly entered, and the search query having an association with a topical area for a search;
  
  identify all responsive documents within the subset that satisfy the one or more criteria comprised by the search query such that each responsive document includes each of the one or more criteria of the search query, and retrieve a definition of a search space, the definition of the search space comprising one or more normalized citations to every document within the search space, and the search space having an association with the topical area for the search;
  
  filter the responsive documents resulting from the execution of the search query by checking each responsive document against the definition of the search space and removing from further consideration any responsive document not found in the definition of the search space;
  
  transmit through the at least one interface information for display to the user that identifies one or more of the remaining responsive documents.
- - 18. The computer program product of claim 17, wherein the instructions comprise instructions that, when executed by at least one of the processors, cause the computer system at least to:
    - compute rankings of the identified responsive documents, the computed ranking of each respective identified responsive document depending at least upon the iteration in which that document would first be added to the subset; and
      
      transmit through one of the interfaces information that identifies a plurality of the responsive documents;
      
      wherein the provided information that identifies a plurality of the responsive documents comprises information about the ranking of the identified documents.
  - 19. The computer-readable storage medium of claim 17, wherein the search query is a Boolean query that the user has entered.
  - 20. The computer-readable storage medium of claim 17, wherein the search query is a natural language query that the user has entered.

21. A computer program product comprising a computer-readable storage medium encoded with instructions that, when executed by at least one processor within a computer system that comprises one or more processors and a memory operatively coupled to at least one of the processors, cause the computer system at least to:
- store in the memory a search level that is a whole number that is at least two;
  
  store in the memory a definition of a subset of a collection of electronic documents, the collection of documents comprising a plurality of documents, and the subset comprising one or more source documents within the collection and one or more additional documents within the collection, the one or more additional documents being identifiable by a process carried out for a number of iterations equal to the search level and comprising:
  
  (1) a first iteration that comprises finding one or more citing documents within the collection, each citing document comprising at least one reference to at least one of the electronic source documents, and adding to the subset each of the found citing documents that is not already in the subset, and
  
  (2) one or more subsequent iterations, each of which comprises finding one or more citing documents within the collection, each citing document comprising at least one reference to at least one of the documents added to the subset in the immediately previous iteration, and adding to the subset each of the found citing documents that is not already in the subset;
  
  receive through at least one interface operatively coupled to at least one of the processors a definition of a search query through an interface operatively coupled to at least one of the processors, the search query comprising one or more criteria that a user has explicitly entered, and the search query having an association with a topical area for a search;
  
  identify all responsive documents within the subset that satisfy the one or more criteria comprised by the search query such that each responsive document includes each of the one or more criteria of the search query, and retrieve a definition of a search space, the definition of the search space comprising one or more normalized citations to every document within the search space, and the search space having an association with the topical area for the search;
  
  filter the responsive documents resulting from the execution of the search query by checking each responsive document against the definition of the search space and removing from further consideration any responsive document not found in the definition of the search space;
  
  transmit through the at least one interface information for display to the user that identifies one or more of the remaining responsive documents.
- - 22. The computer program product of claim 21, wherein the instructions comprise instructions that, when executed by at least one of the processors, cause the computer system at least to:
    - compute rankings of the identified responsive documents, the computed ranking of each respective identified responsive document depending at least upon the iteration in which that document would first be added to the subset; and
      
      transmit through one of the interfaces information that identifies a plurality of the responsive documents;
      
      wherein the provided information that identifies a plurality of the responsive documents comprises information about the ranking of the identified documents.
  - 23. The computer-readable storage medium of claim 21, wherein the search query is a Boolean query that the user has entered.
  - 24. The computer-readable storage medium of claim 21, wherein the search query is a natural language query that the user has entered.

Current Assignee: Bloomberg Finance LP (Bloomberg LP), Bloomberg, Inc. (Bloomberg LP), Bloomberg LP, Bureau of National Affairs Incorporated (Bloomberg LP)
Original Assignee: Bureau of National Affairs Incorporated (Bloomberg LP)
Inventors: Kemp, Richard Douglas; Grenet, Philippe
Primary Examiner(s): Chbouki, Tarek
Application Number: US12/814,729
Publication Number: US 20100257161A1
Time in Patent Office: 2,374 Days
Field of Search
US Class Current: 1/1
CPC Class Codes:
G06F 16/382 using citations hypermedia ...
G06F 16/9535 Search customisation based ...
