System and methods for quantitative assessment of information in natural language contents and for determining relevance using association data

  • US 9,201,927 B1
  • Filed: 01/01/2013
  • Issued: 12/01/2015
  • Est. Priority Date: 01/07/2009
  • Status: Active Grant
First Claim

1. A method implemented on a computer comprising a processor, and for determining relevance between a text content and an object or a topic, the method comprising:

  • receiving a text content comprising one or more words or phrases or sentences as terms, and tokenizing the text content into one or more tokens, each being an instance of a term in the text content;

    identifying a grammatical attribute, or a semantic attribute, or an external term frequency associated with the one or more tokens or terms in the text content, wherein the grammatical attribute includes at least a subject, a predicate or part of a predicate, a modifier in a phrase, a head of a phrase, a sub-phrase of a phrase, an object, a noun, a verb, an adjective, or an adverb, wherein the semantic attribute includes at least semantic roles and attribute values, wherein the external term frequency is obtained from text contents other than the received text content;

    determining an importance measure for each token or term based on the grammatical attribute, or the semantic attribute, or the external term frequency;

    receiving one or more datasets, wherein each dataset is associated with a name or description representing an object, wherein the object comprises a physical or conceptual object, a topic, or a pre-defined attribute, and wherein each dataset comprises one or more words or phrases as names of properties associated with the corresponding object, wherein the names of properties represent other objects or concepts or topics or attributes related to the object, wherein the names of properties collectively represent a type of definition or representation of the object;

    matching at least two tokens or terms in the text content with at least two property names in each of the one or more datasets;

    for each of the one or more datasets, producing a score based at least on the importance measure of the token or term that matches a property name in the dataset, when the importance measure is in the form of a term importance score that is calculated based on the external frequency, or based on the grammatical attribute, or based on the semantic attribute or attribute value, and when the score based on the importance measure is in the form of a relevance score, the relevance score is produced as a function of the term importance score; and

    marking or selecting one or more of the names or descriptions representing the one or more objects as being relevant to the text content if the corresponding score is above a predefined threshold.
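
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the regular-expression tokenizer, the choice of external term frequency as the only importance signal, the formula `1 / (1 + frequency)`, and the summation used for the relevance score are all assumptions made for illustration. The claim itself also allows grammatical or semantic attributes as the basis for the importance measure and does not fix any formula.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a simplistic stand-in tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def term_importance(tokens, external_freq):
    """Assumed importance measure: terms that are rarer in external text score higher.

    external_freq maps a term to a frequency observed in text contents other
    than the received one; unseen terms default to 0.
    """
    return {t: 1.0 / (1.0 + external_freq.get(t, 0.0)) for t in set(tokens)}


def relevant_objects(text, datasets, external_freq, threshold, min_matches=2):
    """Return {object name: relevance score} for datasets scoring above threshold.

    datasets maps an object name to a set of property names. A dataset is only
    scored if at least min_matches terms in the text match its property names.
    """
    tokens = tokenize(text)
    importance = term_importance(tokens, external_freq)
    selected = {}
    for name, properties in datasets.items():
        matched = set(tokens) & set(properties)
        if len(matched) < min_matches:
            continue  # the claim requires at least two matching terms
        # Assumed relevance score: sum of importance scores of matched terms.
        score = sum(importance[t] for t in matched)
        if score > threshold:
            selected[name] = score
    return selected


if __name__ == "__main__":
    # Hypothetical data, invented for this example only.
    datasets = {
        "camera": {"lens", "sensor", "shutter"},
        "computer": {"cpu", "memory", "fast"},
    }
    external_freq = {"the": 9, "a": 9, "and": 9, "has": 4, "fast": 1, "large": 1}
    text = "The camera has a fast lens and a large sensor."
    print(relevant_objects(text, datasets, external_freq, threshold=1.5))
```

In this sketch the "camera" dataset is selected because two of its property names occur in the text and their combined importance exceeds the threshold, while "computer" is skipped because only one property name matches.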
