Concept indexing among database of documents using machine learning techniques

US 9,898,528 B2
Filed: 05/19/2016
Issued: 02/20/2018
Est. Priority Date: 12/22/2014
Status: Active Grant

Abstract

Systems and techniques for indexing and/or querying a database are described herein. Discrete sections and/or segments from documents may be determined by a concept indexing system. The segments may be indexed by concept and/or higher-level category of interest to a user. A user may query the segments by one or more concepts. The segments may be analyzed to rank the segments by statistical accuracy and/or relatedness to one or more particular concepts. The rankings may be used for presentation of search results in a user interface. Furthermore, segments and/or documents may be ranked based on recency decay functions that distinguish, for example, between segments that maintain their relevance over time and temporal segments whose relevance decays more quickly.
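
The indexing and querying flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the concept names, the related terms, the paragraph-based segmentation, and the rule that a query returns only segments mentioning every requested concept are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical concepts, each associated with a set of related terms.
CONCEPTS = {
    "energy": {"oil", "gas", "solar"},
    "finance": {"bank", "loan", "credit"},
}


def split_into_segments(document):
    """Assumption: blank-line-separated paragraphs act as segments."""
    return [part.strip() for part in document.split("\n\n") if part.strip()]


def count_occurrences(segment, terms):
    """Count words in the segment that match any related term."""
    words = (w.strip(".,;:!?\"'()") for w in segment.lower().split())
    return sum(1 for w in words if w in terms)


def build_index(documents, concepts):
    """Map each concept to {segment_id: occurrence count}."""
    segments = []
    index = defaultdict(dict)
    for document in documents:
        for segment in split_into_segments(document):
            segment_id = len(segments)
            segments.append(segment)
            for concept, terms in concepts.items():
                count = count_occurrences(segment, terms)
                if count:
                    index[concept][segment_id] = count
    return segments, index


def query(index, concepts):
    """Return IDs of segments that mention every requested concept."""
    matches = None
    for concept in concepts:
        ids = set(index.get(concept, {}))
        matches = ids if matches is None else matches & ids
    return sorted(matches or [])
```

Storing the per-segment counts in the index keeps the later ranking steps cheap, since the occurrence quantities do not need to be recomputed at query time.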

17 Claims

1. A computer-implemented method comprising:
- receiving, in a user interface, a first concept and a second concept, wherein the first concept is associated with a first plurality of related terms and the second concept is associated with a second plurality of related terms;
  
  querying a data store comprising a plurality of segments based at least on the first concept and the second concept to retrieve a result set, the result set comprising a first segment and a second segment from the plurality of segments;
  
  determining a first quantity of occurrences of the first concept in the first segment, and a second quantity of occurrences of the second concept in the first segment;
  
  accessing first statistical distribution data associated with occurrences of the first concept within the plurality of segments, and second statistical distribution data associated with occurrences of the second concept within the plurality of segments;
  
  determining a ranking of the first segment relative to the second segment by at least:
  
  generating a first weight by comparing the first quantity of occurrences against the first statistical distribution data;
  
  generating a second weight by comparing the second quantity of occurrences against the second statistical distribution data; and
  
  combining the first weight with the first quantity of occurrences, and the second weight with the second quantity of occurrences;
  
  calculating a first recency score associated with the first segment, wherein the ranking is based at least on the first recency score; and
  
  causing presentation, in the user interface, of the first segment relative to the second segment according to the ranking.
  - 2. The computer-implemented method of claim 1, wherein determining the ranking of the first segment relative to the second segment further comprises:
    - calculating a first value by multiplying the first quantity of occurrences and the first weight; and
      
      calculating a second value by multiplying the second quantity of occurrences and the second weight, wherein the ranking is based at least on the first and second values.
  - 3. The computer-implemented method of claim 1, wherein calculating the first recency score comprises:
    - determining a time associated with the first segment; and
      
      applying a decay function to the time to determine the first recency score.
  - 4. The computer-implemented method of claim 3, further comprising:
    - determining a quantity of temporal words within the first segment; and
      
      determining a second recency score by adjusting the first recency score based on the quantity of temporal words, wherein the ranking is further based at least on the second recency score.
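
The weighting steps of claims 1 and 2 can be sketched as below. The claims do not say how a quantity of occurrences is compared against the statistical distribution data, so this example assumes an empirical percentile: the weight is the fraction of segments in the corpus whose count for that concept is at or below the segment's own count. Summing the per-concept values into one score is likewise an assumption, and all function names are hypothetical.

```python
from bisect import bisect_right


def percentile_weight(count, sorted_counts):
    """Fraction of corpus segments whose count is <= the given count.

    `sorted_counts` stands in for the statistical distribution data:
    the per-segment occurrence counts of one concept, sorted ascending.
    """
    if not sorted_counts:
        return 0.0
    return bisect_right(sorted_counts, count) / len(sorted_counts)


def segment_score(counts, distributions):
    """Sum of (quantity of occurrences * weight) over the queried concepts."""
    score = 0.0
    for concept, count in counts.items():
        weight = percentile_weight(count, distributions[concept])
        score += count * weight  # claim 2: multiply quantity by weight
    return score


def rank_segments(segment_counts, distributions):
    """Order segment IDs from highest to lowest score."""
    return sorted(
        segment_counts,
        key=lambda seg: segment_score(segment_counts[seg], distributions),
        reverse=True,
    )
```

Under this assumption, a count that is unusually high for its concept earns a weight close to 1, while a count that is common across the corpus is discounted.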

5. A non-transitory computer storage medium storing computer executable instructions that when executed by a computer hardware processor perform operations comprising:
- receiving, in a user interface, a first concept and a second concept, wherein the first concept is associated with a first plurality of related terms and the second concept is associated with a second plurality of related terms;
  
  querying a data store comprising a plurality of segments based at least on the first concept and the second concept to retrieve a result set, the result set comprising a first segment and a second segment from the plurality of segments;
  
  determining a first quantity of occurrences of the first concept in the first segment, and a second quantity of occurrences of the second concept in the first segment;
  
  accessing first statistical distribution data associated with occurrences of the first concept within the plurality of segments, and second statistical distribution data associated with occurrences of the second concept within the plurality of segments;
  
  determining a ranking of the first segment relative to the second segment by at least:
  
  generating a first weight by comparing the first quantity of occurrences against the first statistical distribution data;
  
  generating a second weight by comparing the second quantity of occurrences against the second statistical distribution data; and
  
  combining the first weight with the first quantity of occurrences, and the second weight with the second quantity of occurrences;
  
  calculating a first recency score associated with the first segment, wherein the ranking is based at least on the first recency score; and
  
  causing presentation, in the user interface, of the first segment relative to the second segment according to the ranking.
  - 6. The non-transitory computer storage medium of claim 5, wherein determining the first quantity of occurrences is based at least on a quantity of keywords associated with the first concept in the first segment.
  - 7. The non-transitory computer storage medium of claim 5, wherein determining the ranking for the first segment relative to the second segment further comprises:
    - combining the first quantity of occurrences, the first weight, the second quantity of occurrences, and the second weight.
  - 8. The non-transitory computer storage medium of claim 5, wherein calculating the first recency score further comprises:
    - determining a time associated with the first segment; and
      
      applying a decay function to the time to determine the first recency score.
  - 9. The non-transitory computer storage medium of claim 8, wherein determining the ranking for the first segment relative to the second segment further comprises:
    - determining a quantity of temporal words within the first segment; and
      
      determining a second recency score by adjusting the first recency score based on the quantity of temporal words, wherein the ranking is further based at least on the second recency score.
  - 10. The non-transitory computer storage medium of claim 8, wherein determining the ranking for the first segment relative to the second segment further comprises:
    - decreasing the ranking for the first segment relative to the second segment where the first recency score indicates the first segment is less recent.
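
The recency scoring of claims 8 and 9 can be sketched as below. The claims name neither a specific decay function nor a specific adjustment, so this example assumes exponential decay with a configurable half-life and assumes that each temporal word shortens the effective half-life. The word list, the default half-life, and the function names are hypothetical.

```python
# Hypothetical list of temporal words; a real system would likely use more.
TEMPORAL_WORDS = {"today", "yesterday", "tomorrow", "currently", "recently"}


def recency_score(age_days, half_life_days=365.0):
    """First recency score: exponential decay of the segment's age.

    Returns 1.0 for a brand-new segment and 0.5 after one half-life.
    """
    return 0.5 ** (age_days / half_life_days)


def count_temporal_words(segment):
    """Count words in the segment that appear in TEMPORAL_WORDS."""
    words = (w.strip(".,;:!?\"'()") for w in segment.lower().split())
    return sum(1 for w in words if w in TEMPORAL_WORDS)


def adjusted_recency_score(segment, age_days, half_life_days=365.0):
    """Second recency score: the first score adjusted by temporal words.

    Assumption: n temporal words divide the half-life by (1 + n), which
    is equivalent to raising the first score to the power (1 + n).
    """
    first = recency_score(age_days, half_life_days)
    n = count_temporal_words(segment)
    return first ** (1 + n)
```

With this choice a segment with no temporal words keeps its first recency score unchanged, while a segment full of words such as "today" loses relevance faster as it ages, which matches the distinction drawn in the abstract.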

11. A computer system comprising:
- one or more hardware computer processors programmed, via executable code instructions, to:
  
  receive, in a user interface, a first concept and a second concept, wherein the first concept is associated with a first plurality of related terms and the second concept is associated with a second plurality of related terms;
  
  query a data store comprising a plurality of segments based at least on the first concept and the second concept to retrieve a result set, the result set comprising a first segment and a second segment from the plurality of segments;
  
  determine a first quantity of occurrences of the first concept in the first segment, and a second quantity of occurrences of the second concept in the first segment;
  
  access first statistical distribution data associated with occurrences of the first concept within the plurality of segments, and second statistical distribution data associated with occurrences of the second concept within the plurality of segments;
  
  determine a ranking of the first segment relative to the second segment by at least:
  
  generating a first weight by comparing the first quantity of occurrences against the first statistical distribution data;
  
  generating a second weight by comparing the second quantity of occurrences against the second statistical distribution data; and
  
  combining the first weight with the first quantity of occurrences, and the second weight with the second quantity of occurrences;
  
  calculate a first recency score associated with the first segment, wherein the ranking is based at least on the first recency score; and
  
  cause presentation, in a user interface, of the first segment and the second segment, wherein the presentation indicates the ranking.
  - 12. The computer system of claim 11, wherein determining the first quantity of occurrences is based at least on a quantity of keywords associated with the first concept in the first segment.
  - 13. The computer system of claim 11, wherein determining the ranking for the first segment relative to the second segment further comprises:
    - combining the first quantity, the first weight, the second quantity, and the second weight.
  - 14. The computer system of claim 11, wherein calculating the first recency score further comprises:
    - determining a time associated with the first segment; and
      
      applying a decay function to the time to determine the first recency score.
  - 15. The computer system of claim 14, wherein determining the ranking for the first segment relative to the second segment further comprises:
    - determining a quantity of temporal words within the first segment; and
      
      determining a second recency score by lowering the first recency score by the quantity of temporal words, wherein the ranking is further based at least on the second recency score.
  - 16. The computer system of claim 14, wherein determining the ranking for the first segment relative to the second segment further comprises:
    - decreasing the ranking for the first segment relative to the second segment where the first recency score indicates the first segment is less recent.
  - 17. The computer system of claim 11, wherein determining the ranking for the first segment relative to the second segment is further based at least on a relationship R, wherein relationship R is defined substantially as:
    - $\alpha \cdot \text{geometric mean}(P) \cdot \text{quantity of } P - (1 - \alpha) \cdot \text{sum}(O)$,
      
      where $\alpha$ is a constant, $P$ comprises a first density of at least the first concept and the second concept in the first segment based on the first and second weights, and $O$ comprises a second density of one or more other concepts in the first segment, wherein the one or more other concepts do not include at least the first concept and the second concept.
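
The relationship R of claim 17 can be evaluated as in the sketch below. It assumes that P is supplied as a list of weighted densities for the queried concepts and O as a list of densities for the other concepts found in the segment. How those densities are computed is not specified here, and the function name and the default value of the constant (written `alpha`) are hypothetical.

```python
import math


def relationship_r(p, o, alpha=0.5):
    """alpha * geometric_mean(P) * quantity_of_P - (1 - alpha) * sum(O).

    p: weighted densities of the queried concepts in the segment.
    o: densities of other, non-queried concepts in the segment.
    """
    if p:
        geometric_mean = math.prod(p) ** (1.0 / len(p))
    else:
        geometric_mean = 0.0  # assumption: no queried concepts, no reward
    return alpha * geometric_mean * len(p) - (1.0 - alpha) * sum(o)


# Example: two queried concepts with densities 4.0 and 1.0,
# and two unrelated concepts with density 1.0 each.
score = relationship_r([4.0, 1.0], [1.0, 1.0], alpha=0.5)
```

Because a geometric mean is zero whenever any of its inputs is zero, the first term rewards only segments that cover every queried concept, while the second term penalizes segments crowded with unrelated concepts.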

Current Assignee
Palantir Technologies Incorporated
Original Assignee
Palantir Technologies Incorporated
Inventors
Kesin, Max
Primary Examiner(s)
ALAM, SHAHID AL

Application Number

US15/159,622
Publication Number

US 20160342681A1
Time in Patent Office

642 Days
Field of Search

707725
US Class Current
CPC Class Codes

G06F 16/248   Presentation of query results

G06F 16/282   Hierarchical databases, e.g...

G06F 16/31   Indexing; Data structures t...

G06F 16/334   Query execution G06F16/335 ...

G06F 16/338   Presentation of query results

G06F 16/353   into predefined classes

G06F 16/367   Ontology

G06F 16/40   of multimedia data, e.g. sl...

G06F 16/93   Document management systems

G06F 16/951   Indexing; Web crawling tech...

G06F 16/9535   Search customisation based ...

G06F 16/9538   Presentation of query results

G06N 20/00   Machine learning
