Method and system for probabilistically quantifying and visualizing relevance between two or more citationally or contextually related data objects

US 8,818,996 B2
Filed: 08/02/2013
Issued: 08/26/2014
Est. Priority Date: 09/27/2005
Status: Active Grant

Abstract

In one embodiment, a method for probabilistically quantifying a degree of relevance between two or more citationally or contextually related data objects, such as patent documents, non-patent documents, web pages, personal and corporate contact information, product information, consumer behavior, technical or scientific information, address information, and the like, is provided. In another embodiment, a method for visualizing and displaying relevance between two or more citationally or contextually related data objects is provided. In another embodiment, a search input/output interface that utilizes an iterative self-organizing mapping technique to automatically generate a visual map of relevant patents and/or other related documents desired to be explored, searched, or analyzed is provided. In another embodiment, a search input/output interface that displays and/or communicates search input criteria and corresponding search results in a way that facilitates intuitive understanding and visualization of the logical relationships between two or more related concepts being searched is provided.

19 Claims

1. A computer-implemented method for rapidly identifying and ranking relevant documents, said method comprising:
- receiving, by a computer system comprising one or more computing devices, a first set of identification information identifying one or more input documents for which relevant output documents are sought, wherein the one or more input documents are identified from a body of data, said body of data comprising identification information identifying multiple millions of citationally related documents;
  
  identifying, by said computer system, a second set of identification information identifying one or more output documents from said body of data that are citationally related to said one or more input documents through one or more direct or indirect citations;
  
  determining, by said computer system, a first numerical score that statistically correlates to a probability that a direct citation exists between each input document relative to each citationally related output document, said first numerical score being determined based at least in part on how many indirect citations exist between each input document and each output document and, for each indirect citation, how many citation links separate each input document from each output document;
  
  determining, by said computer system, a second numerical score that statistically correlates to a probability that a direct citation exists between any input document relative to each output document, said second numerical score being determined based at least in part on said first numerical score;
  
  ranking, by said computer system, said one or more output documents in accordance with said second numerical score; and
  
  displaying, by said computer system, a third set of identification information identifying a selected number of said one or more output documents selected or ranked in accordance with said second numerical score.
- Dependent claims (2-12):
  - 2. The computer-implemented method of claim 1 wherein identifying said second set of identification information comprises using computer database logic to extend multiple generations of citations from each input document to identify said one or more output documents.
  - 3. The computer-implemented method of claim 2 wherein identifying said second set of identification information comprises extending at least three generations of citations from each input document to identify said one or more output documents.
  - 4. The computer-implemented method of claim 1 wherein determining said first numerical score comprises calculating, by said computer system, the output of a multivariate regression model configured to estimate the probability that a direct citation exists between each input document and each output document, and wherein a first independent variable of said multivariate regression model comprises the number of indirect citations between each input document and each output document, and one or more additional independent variables of said multivariate regression model comprise, for each indirect citation, how many citation links separate each input document from each output document.
  - 5. The computer-implemented method of claim 1 wherein said body of data comprises a data repository comprising identification information identifying multiple millions of potential input documents and, for each said potential input document, identification information identifying a selected number of citationally related potential output documents.
  - 6. The computer-implemented method of claim 5 wherein identifying said second set of identification information comprises accessing said data repository and using said first set of identification information to retrieve, for each said input document, said identification information identifying said selected number of citationally related potential output documents.
  - 7. The computer-implemented method of claim 5 wherein said data repository further comprises, for each possible pair of citationally related potential input document and potential output document, a pre-generated numerical score estimating or representing a probability that a direct citation exists between each said corresponding pair of documents.
  - 8. The computer-implemented method of claim 7 wherein determining said first numerical score comprises accessing said data repository and using said first set of identification information to retrieve said pre-generated numerical score for each input document relative to each citationally related output document.
  - 9. The computer-implemented method of claim 7 wherein said pre-generated numerical score is determined using a multivariate probit regression model configured to estimate the probability that a direct citation exists between each potential input document and each potential output document.
  - 10. The computer-implemented method of claim 9 wherein a dependent variable of said multivariate probit regression model comprises the existence or non-existence of a direct citation between each potential input document and each potential output document, and wherein the independent variables comprise the number of indirect citations between each potential input document and each potential output document and, for each indirect citation, how many citation links separate each potential input document from each potential output document.
  - 11. The computer-implemented method of claim 1 wherein said body of data comprises identification information identifying multiple millions of citationally related patent documents.
  - 12. The computer-implemented method of claim 1 wherein said body of data comprises identification information identifying more than 80 million citationally related patent documents and related scientific literature.
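
The scoring step recited in claims 1, 4, 9 and 10 can be illustrated with a minimal sketch. It counts the indirect citation paths between an input document and each candidate output document, grouped by the number of citation links in each path, and passes those counts through a probit link (the standard normal CDF). Everything named here is an assumption made for illustration: the document IDs, the decision to treat citations as undirected links, the three-link limit, and above all the coefficient values, which in a real system would be fitted by regression rather than chosen by hand. The claims do not disclose a specific implementation.

```python
from collections import defaultdict
from math import erf, sqrt


def build_graph(citations):
    """Build an adjacency map from (citing_id, cited_id) pairs.

    Assumption: a citation links two documents in both directions.
    """
    adj = defaultdict(set)
    for citing, cited in citations:
        adj[citing].add(cited)
        adj[cited].add(citing)
    return adj


def count_paths_by_length(adj, source, max_links=3):
    """Return {target: {links: count}} for simple paths starting at source."""
    counts = defaultdict(lambda: defaultdict(int))

    def walk(node, visited, depth):
        if depth == max_links:
            return
        for nxt in adj[node]:
            if nxt in visited:
                continue
            counts[nxt][depth + 1] += 1
            walk(nxt, visited | {nxt}, depth + 1)

    walk(source, {source}, 0)
    return counts


def normal_cdf(z):
    """Standard normal CDF, used as the probit link."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Hypothetical coefficients, NOT taken from the patent.
INTERCEPT = -2.0
COEF_BY_LINKS = {2: 0.6, 3: 0.2}  # weight per indirect path of each length


def first_score(path_counts):
    """Probit-style score from indirect path counts (paths of 2+ links)."""
    z = INTERCEPT + sum(
        COEF_BY_LINKS.get(links, 0.0) * n
        for links, n in path_counts.items()
        if links >= 2
    )
    return normal_cdf(z)


# Toy citation data with made-up document IDs.
graph = build_graph([("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "E")])
paths = count_paths_by_length(graph, "A")
scores = {doc: first_score(by_len) for doc, by_len in paths.items()}
```

In this toy graph, document C is reached from A by two two-link paths and E by two three-link paths, so C receives the higher score under the assumed coefficients.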

13. A computer-system for rapidly identifying and ranking relevant documents from a body of citationally related documents, said computer system comprising:
- a computer-accessible index, stored in a physical data store, comprising identification information identifying multiple potential input documents from said body of citationally related documents and, for each said potential input document, identification information identifying a selected number of citationally related potential output documents from said body of citationally related documents, said computer-accessible index further comprising for each possible pair of citationally related potential input document and potential output document a first numerical score that is statistically correlated to the probability that a direct citation exists between said corresponding pair of citationally related documents;
  
  wherein said first numerical score is determined based at least in part on how many indirect citations exist between each potential input document and each potential output document and, for each indirect citation, how many citation links separate each potential input document from each potential output document;
  
  an input interface configured to enable a user to select a first set of identification information identifying one or more input documents from said body of citationally related documents for which relevant output documents are sought;
  
  a computer processor configured to:
  
  access, from said computer-accessible index, said first set of identification information to identify a selection of citationally related output documents; and
  
  calculate, for each identified output document, a second numerical score that is statistically correlated to the probability that a direct citation exists between any input document and each said corresponding output document, and wherein said second numerical score is determined based at least in part on said first numerical score; and
  
  an output interface configured to display a second set of identification information identifying a selected number of said identified output documents selected or ranked in accordance with said second numerical score.
- Dependent claims (14-19):
  - 14. The computer system of claim 13 wherein said computer-accessible index further comprises, for each said potential input document, identification information identifying citationally related potential output documents extending at least three generations from each said potential input document.
  - 15. The computer system of claim 13 wherein said computer processor is configured to calculate said second numerical score for each said corresponding output document by calculating the mathematical sum of said first numerical score for each said corresponding output document relative to each said input patent.
  - 16. The computer system of claim 13 wherein said body of citationally related documents comprises a data repository comprising identification information identifying multiple millions of potential input documents and citationally related potential output documents.
  - 17. The computer system of claim 13 wherein said body of citationally related documents comprises multiple millions of citationally related patent documents.
  - 18. The computer system of claim 13 wherein said body of citationally related documents comprises more than 80 million citationally related patent documents and related scientific literature.
  - 19. The computer system of claim 13 wherein said output interface is configured to visually display said second set of identification information in the form of a self-organizing map.
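
The aggregation and ranking steps of claims 13 and 15 can be sketched as follows, assuming the first numerical scores have already been pre-generated and stored per (input, output) pair. The function name, the dictionary layout, the tie-breaking rule, and the sample scores are all hypothetical. Because the second score here is a plain sum, it can exceed 1 and is best read as a ranking value that correlates with probability, not as a probability itself.

```python
from collections import defaultdict


def rank_outputs(first_scores, top_n=10):
    """Sum first scores per output document and return the top_n ranked.

    first_scores: {(input_id, output_id): score}, a hypothetical layout
    for the pre-generated index described in claim 13.
    """
    totals = defaultdict(float)
    for (_input_id, output_id), score in first_scores.items():
        totals[output_id] += score
    # Highest total first; ties broken by document ID for stable output.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Made-up scores for two input documents, P1 and P2.
index = {
    ("P1", "X"): 0.30,
    ("P2", "X"): 0.25,
    ("P1", "Y"): 0.40,
    ("P2", "Z"): 0.10,
}
top = rank_outputs(index, top_n=2)
```

Document X ranks first here even though Y has the highest single pairwise score, because X is related to both input documents.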

Current Assignee: PatentRatings, LLC
Original Assignee: PatentRatings, LLC
Inventors: Barney, Jonathan A.
Primary Examiner(s): Jami, Hares
Application Number: US13/958,386
Publication Number: US 20140067829A1
Time in Patent Office: 389 Days
Field of Search: 707/705, 707/722, 707/726, 707/728, 707/731, 707/923, 707/930, 707/933, 707/937, 707/999.1
US Class Current: 707/722

CPC Class Codes
G06F 16/14   Details of searching files ...
G06F 16/2228   Indexing structures
G06F 16/24578   using ranking
G06F 16/2465   Query processing support fo...
G06F 16/248   Presentation of query results
G06F 16/26   Visual data mining; Browsin...
G06F 16/334   Query execution G06F16/335 ...
G06F 16/3346   using probabilistic model
G06F 16/34   Browsing; Visualisation the...
G06F 16/382   using citations hypermedia ...
G06F 16/93   Document management systems
G06F 16/95   Retrieval from the web
G06F 16/951   Indexing; Web crawling tech...
G06F 2216/11   Patent retrieval
Y10S 707/912   Applications of a database
Y10S 707/923   Intellectual property
Y10S 707/93   intellectual property analysis
Y10S 707/933   Citation analysis
Y10S 707/937   intellectual property searc...
