
Concept indexing among database of documents using machine learning techniques

  • US 9,348,920 B1
  • Filed: 06/22/2015
  • Issued: 05/24/2016
  • Est. Priority Date: 12/22/2014
  • Status: Active Grant
First Claim

1. A computing system for identifying concepts of interests to a user in specific segments of a plurality of documents each having one or more separate segments, the computing system including:

  • one or more hardware computer processors configured to execute software instructions; and

    one or more storage devices storing software instructions configured for execution by the one or more hardware computer processors in order to cause the computing system to:

    identify a plurality of segments within the plurality of documents, wherein at least some of the plurality of documents each include two or more segments, wherein identifying segments includes analyzing the plurality of documents for features indicative of possible section headings, including at least one of: casing, spacing, punctuation, common words, or groups of words;

    access a concept hierarchy including a plurality of concepts of interest to the user, the concept hierarchy further including concept keywords associated with respective concepts;

    for each concept, determine statistical likelihoods that respective identified segments are associated with the concept, the statistical likelihoods each based on at least one of, for each combination of a particular concept and a particular segment:

    a density of particular concept keywords in the particular segment, wherein the density is based at least on a ratio of a quantity of particular concept keywords in the particular segment to a quantity of words in the particular segment;

    or a distribution of particular concept keywords within the particular segment, wherein the distribution is based on at least one of a longest span in the particular segment without any mention of particular concept keywords or a median gap between consecutive mentions of respective concept keywords in the particular segment; and

    store in a concept indexing database the plurality of concepts and the statistical likelihoods that respective concepts are in each of the determined respective segments, wherein the concept indexing database is usable to identify, in response to a user query for a specific concept, a ranked listing of one or more segments having highest statistical likelihoods of being associated with the specific concept.
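
The keyword density and distribution features recited in the claim can be illustrated with a minimal Python sketch. This is not the patented implementation. The function names, the regex tokenizer, the restriction to single-word keywords, and the choice to measure spans and gaps in word positions are all assumptions made for illustration; the claim does not specify them, nor how the features are combined into a statistical likelihood.

```python
import re
from statistics import median


def tokenize(text):
    # Assumption: lowercase word tokens; the claim does not define tokenization.
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_positions(tokens, keywords):
    # Word indices at which any concept keyword occurs (single-word keywords only).
    wanted = {k.lower() for k in keywords}
    return [i for i, tok in enumerate(tokens) if tok in wanted]


def keyword_density(tokens, positions):
    # Ratio of keyword occurrences to total words in the segment.
    return len(positions) / len(tokens) if tokens else 0.0


def longest_span_without_keyword(tokens, positions):
    # Longest run of consecutive non-keyword words, counting the runs
    # before the first mention and after the last mention.
    bounds = [-1] + positions + [len(tokens)]
    return max(b - a - 1 for a, b in zip(bounds, bounds[1:]))


def median_gap(positions):
    # Median distance, in word positions, between consecutive mentions.
    # Undefined (None) when the segment has fewer than two mentions.
    if len(positions) < 2:
        return None
    return median(b - a for a, b in zip(positions, positions[1:]))


segment = "Credit risk rose. The bank cut credit lines as risk grew again today."
tokens = tokenize(segment)
positions = keyword_positions(tokens, ["credit", "risk"])

density = keyword_density(tokens, positions)
longest = longest_span_without_keyword(tokens, positions)
gap = median_gap(positions)
```

In this sketch, a segment that mentions the keywords often and evenly would show a high density, a short longest span, and a small median gap. How such values would be turned into a likelihood and ranked is left open here, as it is in the claim text above.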

  • 8 Assignments