Concept identification system and method for use in reducing and/or representing text content of an electronic document

  • US 6,823,331 B1
  • Filed: 08/28/2000
  • Issued: 11/23/2004
  • Est. Priority Date: 08/28/2000
  • Status: Expired due to Fees
First Claim

1. A computer-readable concept identification system including modules executable by said computer's programmable processor for identifying a concept to which an electronic document relates, said concept identification system comprising:

  • (a) a concept knowledge base comprising a plurality of concept schemas wherein each said concept schema comprises;

    (i) a concept comprising concept terms, including synonyms, that represent said concept; and

    (ii) a plurality of subconcepts linked to said concept and/or to each other, on a hierarchical basis, and comprising subconcept terms, including synonyms, that represent said subconcept; and

    wherein said concept schemas comprise one or more sets of multi-relationship concepts wherein one or more subconcepts of a concept of said multi-relationship concepts of each said set is linked to another concept of said multi-relationship concepts of said set through said hierarchically linked subconcepts and concepts of said multi-relationship concepts of said set; and

  • (b) a concept matching module configured for;

    (i) comparing key word(s) and/or key phrase(s) and/or key sentence fragment(s) of said document to said concept terms and subconcept terms of said concept schemas and identifying matched terms from said comparing;

    (ii) counting said matched terms to determine a match count;

    (iii) identifying matched multi-relationship concepts from any subconcepts of multi-relationship concepts comprising said matched terms;

    (iv) firstly assigning threshold weights to only those of said matched terms which are not comprised in said matched multi-relationship concepts, wherein said firstly assigned threshold weight assigned to each said matched term is based on a level of inherent distinctiveness of said matched term to said concept of said concept schema comprising said matched term;

    (v) determining which of said multi-relationship concepts is more related to said document on the basis of said firstly assigned threshold weights;

    (vi) secondly assigning threshold weights to said matched key word(s) and/or key phrase(s) and/or key sentence fragment(s) which are matched to terms of subconcepts of said multi-relationship concepts on the basis of said multi-relationship concept determined to be more related to said document;

    (vii) for each said concept schema having said matched terms, calculating an overall matching weight representative of said match count and said assigned threshold weights; and

    (viii) comparing each said overall matching weight calculated for a concept schema to a predetermined matching weight for that concept schema, and from said comparing, determining whether said document is characterized by the concept of said that concept schema.
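
The matching steps (b)(i) through (b)(viii) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the claim does not specify a data layout, a weight scale, or a formula for the overall matching weight. The names `ConceptSchema`, `own_terms`, `shared_terms`, `group`, `required_weight` and `identify_concepts` are hypothetical. The subconcept hierarchy is flattened into two term tables per schema, the overall weight is assumed to be the plain sum of the assigned weights (which grows with the match count), and ties between multi-relationship concepts go to the schema listed first.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConceptSchema:
    """Flattened stand-in for one concept schema (hypothetical layout)."""
    name: str
    # Terms unique to this concept: term -> assumed distinctiveness weight.
    own_terms: dict[str, float]
    # Terms of subconcepts that are also linked to another concept
    # of the same multi-relationship set.
    shared_terms: dict[str, float] = field(default_factory=dict)
    # Hypothetical identifier of the multi-relationship set, if any.
    group: Optional[str] = None
    # The "predetermined matching weight" of step (viii).
    required_weight: float = 1.0


def identify_concepts(key_terms: set[str], schemas: list[ConceptSchema]) -> list[str]:
    counts: dict[str, int] = {}
    first: dict[str, float] = {}
    shared_hits: dict[str, set[str]] = {}
    for s in schemas:
        own = s.own_terms.keys() & key_terms          # (i) compare terms
        shared = s.shared_terms.keys() & key_terms
        counts[s.name] = len(own) + len(shared)       # (ii) match count
        shared_hits[s.name] = shared                  # (iii) multi-relationship matches
        # (iv) first weights go only to terms outside shared subconcepts
        first[s.name] = sum(s.own_terms[t] for t in own)

    # (v) within each set, pick the concept with the larger first weight;
    # the strict comparison means the earlier schema wins a tie (assumption)
    winners: dict[str, str] = {}
    for s in schemas:
        if s.group is None or not shared_hits[s.name]:
            continue
        best = winners.get(s.group)
        if best is None or first[s.name] > first[best]:
            winners[s.group] = s.name

    identified = []
    for s in schemas:
        if counts[s.name] == 0:
            continue
        overall = first[s.name]
        # (vi) second weights: shared terms count only for the winner
        if s.group is not None and winners.get(s.group) == s.name:
            overall += sum(s.shared_terms[t] for t in shared_hits[s.name])
        # (vii)-(viii) compare the overall weight with the schema's threshold
        if overall >= s.required_weight:
            identified.append(s.name)
    return identified


# Illustrative data only: "java" belongs to a subconcept shared by two concepts.
schemas = [
    ConceptSchema("Programming", {"compiler": 2.0, "source code": 3.0},
                  {"java": 1.0}, group="java", required_weight=4.0),
    ConceptSchema("Coffee", {"espresso": 3.0, "roast": 2.0},
                  {"java": 1.0}, group="java", required_weight=3.0),
]
print(identify_concepts({"java", "compiler", "source code"}, schemas))
# prints ['Programming']
```

In this example the ambiguous term "java" is resolved by the unambiguous terms "compiler" and "source code": they give "Programming" the larger first weight, so only that schema receives the second weight for "java" and passes its threshold.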
