Finding relevant documents

US 8,515,972 B1
Filed: 02/10/2010
Issued: 08/20/2013
Est. Priority Date: 02/10/2010
Status: Active Grant

Abstract

A programmed computer receives one or more documents that contain text that is relevant to a user (“interest documents”). The programmed computer automatically identifies groups of words that match the interest documents. The matching word groups are ranked by a weight that is assigned based on how infrequently a word group matches a reference corpus and how frequently the word group matches one or more interest document(s), in comparison to other word groups. A set of word groups is automatically identified based on ranking, and displayed to a user to select documents from a corpus. Selected documents are displayed to the user, e.g. with one or more groups of words used in selecting the documents.
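
The last two steps of the abstract (picking the top-ranked word groups and using them to select documents) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `normalize` and `select_documents`, the `top_k` parameter, and the whole-word substring matching rule are all assumptions made for illustration, and the weights are taken as given.

```python
import re

def normalize(text):
    """Lowercase the text and join its word tokens with single spaces."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))

def select_documents(weights, corpus, top_k=3):
    """Return (document, matched_groups) pairs for documents that contain
    at least one of the top_k highest-weighted word groups.

    weights: dict mapping a word group (space-separated words) to its weight.
    corpus:  iterable of document strings.
    """
    # Rank word groups by weight, highest first, and keep the best top_k.
    top_groups = sorted(weights, key=weights.get, reverse=True)[:top_k]
    selected = []
    for doc in corpus:
        # Pad with spaces so that a group only matches on word boundaries.
        padded = f" {normalize(doc)} "
        matched = [g for g in top_groups if f" {g} " in padded]
        if matched:
            selected.append((doc, matched))
    return selected

# Hypothetical weights and corpus, for illustration only.
example_weights = {"quantile regression": 0.9, "robust": 0.4, "the": -0.5}
example_corpus = ["Quantile regression, explained.", "The cat sat.", "A robust estimator"]
example_selection = select_documents(example_weights, example_corpus, top_k=2)
```

Returning the matched groups alongside each document mirrors the abstract's note that selected documents may be shown together with the word groups used to select them.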

20 Claims

1. A computer-implemented method comprising:
- automatically extracting a plurality of groups of words from a set comprising a first document;
  
  wherein in the plurality of groups, each group comprises a word;
  
  automatically determining a plurality of first counts of a number of times said each group of words in said plurality matches said set;
  
  automatically determining a plurality of second counts of the number of times said each group of words in said plurality matches a corpus of second documents;
  
  automatically performing function fitting on at least first counts of said plurality of groups of words and corresponding second counts of said plurality of groups of words, to obtain a fitted function;
  
  using at least one processor in automatically comparing a first count of said each group of words in the plurality of first counts to an evaluation of said fitted function at a second count of said each group of words in the plurality of second counts, to obtain a weight of said each group of words; and
  
  automatically storing at least said weight in a computer memory coupled to said at least one processor.
  - 2. The computer-implemented method of claim 1 wherein:
    - multiple ordered pairs are fitted by said automatically performing function fitting, each pair comprising the first count and the second count of each group of words in the plurality of groups.
  - 3. The computer-implemented method of claim 1 wherein:
    - said automatically comparing comprises subtracting from said first count a value of said fitted function at said second count.
  - 4. The computer-implemented method of claim 1 wherein:
    - said automatically comparing comprises dividing said first count by a value of said fitted function at said second count.
  - 5. The computer-implemented method of claim 1 wherein:
    - said fitted function is automatically identified from among a predetermined family of functions to minimize a sum of deviations between the first counts in ordered pairs of said plurality of groups of words and corresponding values of said functions in said predetermined family evaluated at the respective second counts in ordered pairs of said plurality of groups of words.
  - 6. The computer-implemented method of claim 1 wherein:
    - said fitted function is identified from among a predetermined family of functions based on having the minimum sum of deviation of ordered pairs of said plurality of groups of words from each function in said predetermined family, such that each deviation has a value equal to (i) a predetermined multiple “tau” of the difference r of the first count from the corresponding value of said each function if r is non-negative, and (ii) a complement of the predetermined multiple (tau-1) of the difference r if r is negative, wherein 0<tau<1.
  - 7. The computer-implemented method of claim 1 wherein:
    - said fitted function is identified from among a predetermined family of functions based on the deviations of ordered pairs of said plurality of groups of words from the corresponding values of said functions in said predetermined family, such that each deviation has a value based on a difference of the first count from a corresponding value of each function in said predetermined family.
  - 8. The computer-implemented method of claim 1 wherein:
    - said fitted function is identified by use of at least quantile regression.
  - 9. The computer-implemented method of claim 1 wherein:
    - said fitted function is identified by use of at least linear regression.
  - 10. The computer-implemented method of claim 1 wherein:
    - said weight depends on a logarithmic function of the first count and a logarithmic function of the second count.
  - 11. The computer-implemented method of claim 1 further comprising:
    - automatically ranking based on said weight, said at least one group of words relative to another group of words in said multiple groups; and
      
      automatically storing in said computer memory coupled to said at least one processor, a sorted list resulting from said automatically ranking.
  - 12. The computer-implemented method of claim 1 further comprising:
    - automatically selecting a subset from a set of third documents based at least partially on matching said at least one group of words; and
      
      automatically storing in said computer memory coupled to said at least one processor, said subset.
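
As an illustration of how the steps of claim 1 fit together, here is a minimal Python sketch. It is one possible reading, not the patentee's implementation: it assumes word groups are n-grams of one or two words, uses ordinary least-squares linear regression (claim 9) on `log(1 + count)` values (a logarithmic form in the spirit of claim 10), and takes the weight as the first count minus the fitted value (claim 3). All function names and the `max_len` parameter are hypothetical.

```python
import math
import re
from collections import Counter

def word_groups(text, max_len=2):
    """Yield word groups, here assumed to be n-grams of 1..max_len words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])

def count_groups(documents, max_len=2):
    """Count how many times each word group occurs across the documents."""
    counts = Counter()
    for doc in documents:
        counts.update(word_groups(doc, max_len))
    return counts

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # all x equal: fall back to a constant function
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def weigh_groups(interest_docs, corpus_docs, max_len=2):
    """Weight each word group by how far its first count sits above the fit."""
    first = count_groups(interest_docs, max_len)    # first counts
    second = count_groups(corpus_docs, max_len)     # second counts
    groups = list(first)
    # Assumption: log(1 + count) keeps groups absent from the corpus finite.
    xs = [math.log1p(second[g]) for g in groups]
    ys = [math.log1p(first[g]) for g in groups]
    a, b = fit_line(xs, ys)                         # fitted function
    # Weight = first count minus fitted value at the second count (claim 3).
    return {g: y - (a + b * x) for g, x, y in zip(groups, xs, ys)}

def rank(weights):
    """Sorted list of word groups, highest weight first (claim 11)."""
    return sorted(weights, key=weights.get, reverse=True)
```

A word group that is frequent in the interest documents but rare in the corpus lies above the fitted curve and receives a positive weight; very common words tend to lie near or below it. The division variant of claim 4 would replace the subtraction in `weigh_groups` with a ratio of the first count to the fitted value.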

13. A non-transitory computer-readable storage medium comprising a plurality of instructions, said instructions comprising:
- instructions to automatically extract multiple groups of words from a set comprising a first document;
  
  wherein in the multiple groups, each group comprises a word;
  
  instructions to automatically determine a plurality of first counts of a number of times said each group of words matches said set;
  
  instructions to automatically determine a plurality of second counts of the number of times said each group of words matches a corpus of second documents;
  
  instructions to automatically perform function fitting on at least first counts of said multiple groups of words and corresponding second counts of said multiple groups of words, to obtain a fitted function;
  
  instructions to at least one processor to automatically compare a first count of said each group of words in the plurality of first counts to an evaluation of said fitted function at a second count of said each group of words in the plurality of second counts, to obtain a weight of said each group; and
  
  instructions to automatically store at least said weight in a computer memory coupled to said at least one processor.
  - 14. The non-transitory computer-readable storage medium of claim 13 wherein:
    - multiple ordered pairs are fitted by execution of said instructions to automatically perform function fitting, each pair comprising the first count and the second count of each group in the multiple groups.
  - 15. The non-transitory computer-readable storage medium of claim 13 wherein:
    - said fitted function is automatically identified by use of a predetermined family of functions, based on minimization of a sum of deviations between the first counts in ordered pairs of said multiple groups and corresponding values of said functions in said predetermined family evaluated at the respective second counts in ordered pairs of said multiple groups.
  - 16. The non-transitory computer-readable storage medium of claim 13 wherein:
    - said fitted function is identified by use of a predetermined family of functions, based on minimization of a sum of deviations of ordered pairs of said multiple groups from each function in said predetermined family, such that each deviation has a value equal to (i) a predetermined multiple “tau” of the difference r of the first count from the corresponding value of said each function if r is non-negative, and (ii) a complement of the predetermined multiple (tau-1) of the difference r if r is negative, wherein 0<tau<1.
  - 17. The non-transitory computer-readable storage medium of claim 13 wherein:
    - said fitted function is identified by use of a predetermined family of functions, based on deviations of ordered pairs of said multiple groups of words from the corresponding values of said functions in said predetermined family, such that each deviation has a value based on a difference of the first count from a corresponding value of a function in said predetermined family.
  - 18. The non-transitory computer-readable storage medium of claim 13 wherein:
    - said weight depends on a logarithmic function of the first count and a logarithmic function of the second count.
  - 19. The non-transitory computer-readable storage medium of claim 13 further comprising:
    - instructions to automatically select a subset from a set of third documents based at least partially on matching said at least one group of words; and
      
      instructions to automatically store said subset in computer memory.
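
The tau-weighted deviation recited in claims 6 and 16 matches the asymmetric loss commonly used in quantile regression (claim 8). The sketch below shows that deviation and a brute-force search over a small family of straight lines. The choice of lines as the "predetermined family", the grid of intercepts and slopes, and the function names are assumptions for illustration only; a practical system would more likely use a dedicated quantile-regression solver.

```python
def tau_deviation(r, tau):
    """Deviation for a difference r = first_count - fitted_value.

    Equals tau * r when r is non-negative and (tau - 1) * r when r is
    negative, so it is never negative for 0 < tau < 1.
    """
    if not 0 < tau < 1:
        raise ValueError("tau must satisfy 0 < tau < 1")
    return tau * r if r >= 0 else (tau - 1) * r

def fit_quantile_line(pairs, tau, intercepts, slopes):
    """Pick the line y = a + b * x with the smallest sum of deviations.

    pairs: (second_count, first_count) ordered pairs, one per word group.
    intercepts, slopes: candidate values defining the (hypothetical) family.
    Returns the (a, b) of the best candidate; ties keep the first one found.
    """
    best = None
    for a in intercepts:
        for b in slopes:
            loss = sum(tau_deviation(y - (a + b * x), tau) for x, y in pairs)
            if best is None or loss < best[0]:
                best = (loss, a, b)
    return best[1], best[2]

# Hypothetical data: four points on y = 1 + x plus one large outlier.
example_pairs = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 20)]
example_fit = fit_quantile_line(example_pairs, tau=0.5,
                                intercepts=(0, 1, 2), slopes=(0, 1, 2))
```

With tau = 0.5 the deviation is half the absolute difference, so the fit behaves like a median fit and is less sensitive to the outlier than a least-squares fit would be. Values of tau closer to 1 penalize points above the curve more heavily and push the fitted function upward.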

20. An apparatus comprising:
- means for automatically extracting multiple groups of words from a set comprising a first document;
  
  wherein in the multiple groups, each group comprises a word;
  
  means for automatically determining a plurality of first counts of a number of times said each group of words matches said set;
  
  means for automatically determining a plurality of second counts of the number of times said each group of words matches a corpus of second documents;
  
  means for performing function fitting on at least first counts of said multiple groups of words and corresponding second counts of said multiple groups of words, to obtain a fitted function;
  
  means for automatically comparing a first count of said each group of words in the plurality of first counts to an evaluation of said fitted function at a second count of said each group of words in the plurality of second counts, to obtain a weight of said each group; and
  
  means for automatically storing at least said weight in a computer memory.

Current Assignee: Python4fun Inc
Original Assignee: Python 4 Fun Incorporated
Inventors: Srikrishna, Devabhaktuni; Coram, Marc
Primary Examiner(s): Trujillo, James
Assistant Examiner(s): Spieler, William
Application Number: US12/703,758
Time in Patent Office: 1,287 Days
Field of Search: 707/742, 707/750
US Class Current: 707/750
CPC Class Codes:
G06F 16/248   Presentation of query results
G06F 16/3322   using system suggestions G0...
G06F 16/951   Indexing; Web crawling tech...
