Concept indexing among database of documents using machine learning techniques
First Claim
1. A computer-implemented method comprising:
receiving, in a user interface, a first concept and a second concept, wherein the first concept is associated with a first plurality of related terms and the second concept is associated with a second plurality of related terms;
querying a data store comprising a plurality of segments based at least on the first concept and the second concept to retrieve a result set, the result set comprising a first segment and a second segment from the plurality of segments;
determining a first quantity of occurrences of the first concept in the first segment, and a second quantity of occurrences of the second concept in the first segment;
accessing first statistical distribution data associated with occurrences of the first concept within the plurality of segments, and second statistical distribution data associated with occurrences of the second concept within the plurality of segments;
determining a ranking of the first segment relative to the second segment by at least:
generating a first weight by comparing the first quantity of occurrences against the first statistical distribution data;
generating a second weight by comparing the second quantity of occurrences against the second statistical distribution data; and
combining the first weight with the first quantity of occurrences, and the second weight with the second quantity of occurrences;
calculating a first recency score associated with the first segment, wherein the ranking is based at least on the first recency score; and
causing presentation, in the user interface, of the first segment relative to the second segment according to the ranking.
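The weighting steps of claim 1 can be sketched in Python as follows. The claim does not say how a count is "compared against" the distribution data or how weight and count are "combined", so the percentile-rank weight and the weighted sum below are assumptions, as are all names (`count_occurrences`, `percentile_weight`, `score_segment`) and the sample data. The recency score is left out of this sketch.

```python
import re
from bisect import bisect_right

def count_occurrences(text, related_terms):
    """Count whole-word, case-insensitive matches of any related term."""
    total = 0
    for term in related_terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return total

def percentile_weight(count, distribution):
    """Assumed comparison: fraction of segments whose count is <= this count."""
    ordered = sorted(distribution)
    return bisect_right(ordered, count) / len(ordered)

def score_segment(text, concepts, distributions):
    """Assumed combination: sum over concepts of weight * count."""
    score = 0.0
    for name, terms in concepts.items():
        count = count_occurrences(text, terms)
        score += percentile_weight(count, distributions[name]) * count
    return score

# Hypothetical concepts, each with its plurality of related terms.
concepts = {
    "inflation": ["inflation", "price increases"],
    "employment": ["employment", "jobs"],
}
# Hypothetical segments standing in for the data store.
segments = {
    "seg_a": "Inflation rose while jobs growth slowed; inflation may persist.",
    "seg_b": "Employment was steady.",
    "seg_c": "No relevant terms here.",
}
# Per-concept distribution of occurrence counts across all segments.
distributions = {
    name: [count_occurrences(text, terms) for text in segments.values()]
    for name, terms in concepts.items()
}
# Highest combined score first.
ranking = sorted(
    segments,
    key=lambda sid: score_segment(segments[sid], concepts, distributions),
    reverse=True,
)
```

With this sample data, `seg_a` mentions both concepts and ranks first, `seg_b` mentions one and ranks second, and `seg_c` mentions neither.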
Abstract
Systems and techniques for indexing and/or querying a database are described herein. Discrete sections and/or segments from documents may be determined by a concept indexing system. The segments may be indexed by concept and/or higher-level category of interest to a user. A user may query the segments by one or more concepts. The segments may be analyzed to rank the segments by statistical accuracy and/or relatedness to one or more particular concepts. The rankings may be used for presentation of search results in a user interface. Furthermore, segments and/or documents may be ranked based on recency decay functions that distinguish, for example, between segments that maintain their relevance over time and temporal segments whose relevance decays more quickly.
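One way to picture a recency decay function of the kind the abstract mentions is exponential decay with a per-type half-life. This is a minimal sketch under that assumption; the abstract does not name a specific function, and `recency_score`, the `HALF_LIFE_DAYS` table, and the "enduring"/"temporal" labels are hypothetical.

```python
from datetime import date

def recency_score(published, today, half_life_days):
    """Exponential decay: the score halves every `half_life_days`."""
    age_days = max((today - published).days, 0)
    return 0.5 ** (age_days / half_life_days)

# Hypothetical half-lives: a long one for segments that stay relevant,
# a short one for temporal segments tied to a moment in time.
HALF_LIFE_DAYS = {"enduring": 730.0, "temporal": 30.0}

today = date(2024, 1, 1)
published = date(2023, 11, 2)  # 60 days before `today`

enduring = recency_score(published, today, HALF_LIFE_DAYS["enduring"])
temporal = recency_score(published, today, HALF_LIFE_DAYS["temporal"])
```

For the same 60-day age, the temporal segment has passed two half-lives and scores 0.25, while the enduring segment has lost only a small part of its score.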
17 Claims
1. A computer-implemented method comprising:
receiving, in a user interface, a first concept and a second concept, wherein the first concept is associated with a first plurality of related terms and the second concept is associated with a second plurality of related terms;
querying a data store comprising a plurality of segments based at least on the first concept and the second concept to retrieve a result set, the result set comprising a first segment and a second segment from the plurality of segments;
determining a first quantity of occurrences of the first concept in the first segment, and a second quantity of occurrences of the second concept in the first segment;
accessing first statistical distribution data associated with occurrences of the first concept within the plurality of segments, and second statistical distribution data associated with occurrences of the second concept within the plurality of segments;
determining a ranking of the first segment relative to the second segment by at least:
generating a first weight by comparing the first quantity of occurrences against the first statistical distribution data;
generating a second weight by comparing the second quantity of occurrences against the second statistical distribution data; and
combining the first weight with the first quantity of occurrences, and the second weight with the second quantity of occurrences;
calculating a first recency score associated with the first segment, wherein the ranking is based at least on the first recency score; and
causing presentation, in the user interface, of the first segment relative to the second segment according to the ranking.
Dependent claims: 2, 3, 4
5. A non-transitory computer storage medium storing computer executable instructions that when executed by a computer hardware processor perform operations comprising:
receiving, in a user interface, a first concept and a second concept, wherein the first concept is associated with a first plurality of related terms and the second concept is associated with a second plurality of related terms;
querying a data store comprising a plurality of segments based at least on the first concept and the second concept to retrieve a result set, the result set comprising a first segment and a second segment from the plurality of segments;
determining a first quantity of occurrences of the first concept in the first segment, and a second quantity of occurrences of the second concept in the first segment;
accessing first statistical distribution data associated with occurrences of the first concept within the plurality of segments, and second statistical distribution data associated with occurrences of the second concept within the plurality of segments;
determining a ranking of the first segment relative to the second segment by at least:
generating a first weight by comparing the first quantity of occurrences against the first statistical distribution data;
generating a second weight by comparing the second quantity of occurrences against the second statistical distribution data; and
combining the first weight with the first quantity of occurrences, and the second weight with the second quantity of occurrences;
calculating a first recency score associated with the first segment, wherein the ranking is based at least on the first recency score; and
causing presentation, in the user interface, of the first segment relative to the second segment according to the ranking.
Dependent claims: 6, 7, 8, 9, 10
11. A computer system comprising:
one or more hardware computer processors programmed, via executable code instructions, to:
receive, in a user interface, a first concept and a second concept, wherein the first concept is associated with a first plurality of related terms and the second concept is associated with a second plurality of related terms;
query a data store comprising a plurality of segments based at least on the first concept and the second concept to retrieve a result set, the result set comprising a first segment and a second segment from the plurality of segments;
determine a first quantity of occurrences of the first concept in the first segment, and a second quantity of occurrences of the second concept in the first segment;
access first statistical distribution data associated with occurrences of the first concept within the plurality of segments, and second statistical distribution data associated with occurrences of the second concept within the plurality of segments;
determine a ranking of the first segment relative to the second segment by at least:
generating a first weight by comparing the first quantity of occurrences against the first statistical distribution data;
generating a second weight by comparing the second quantity of occurrences against the second statistical distribution data; and
combining the first weight with the first quantity of occurrences, and the second weight with the second quantity of occurrences;
calculate a first recency score associated with the first segment, wherein the ranking is based at least on the first recency score; and
cause presentation, in a user interface, of the first segment and the second segment, wherein the presentation indicates the ranking.
Dependent claims: 12, 13, 14, 15, 16, 17
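The querying step shared by the independent claims (retrieving a result set of segments based on two concepts) could look roughly like the inverted index below. This is only a sketch: the claims do not describe the data store's structure or whether a segment must match both concepts or either one, so the `match_all` switch, the simple substring matching, and the names `build_concept_index` and `query` are assumptions.

```python
from collections import defaultdict

def build_concept_index(segments, concepts):
    """Map each concept name to the ids of segments mentioning any of its terms."""
    index = defaultdict(set)
    for seg_id, text in segments.items():
        lowered = text.lower()
        for name, terms in concepts.items():
            # Simplistic substring match; a real system would tokenize.
            if any(term.lower() in lowered for term in terms):
                index[name].add(seg_id)
    return index

def query(index, concept_names, match_all=False):
    """Return ids of segments matching any (or, with match_all, every) concept."""
    sets = [index.get(name, set()) for name in concept_names]
    if not sets:
        return set()
    return set.intersection(*sets) if match_all else set.union(*sets)

# Hypothetical concepts and segments.
concepts = {
    "rates": ["interest rate", "rates"],
    "housing": ["housing", "mortgage"],
}
segments = {
    1: "Mortgage rates climbed again.",
    2: "Housing starts fell.",
    3: "Rates were unchanged.",
}
index = build_concept_index(segments, concepts)
either = query(index, ["rates", "housing"])
both = query(index, ["rates", "housing"], match_all=True)
```

Here `either` contains all three segment ids, while `both` contains only segment 1, the one segment that mentions terms from both concepts.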
Specification