Method and system relating to salient content extraction for electronic content

  • US 9,336,202 B2
  • Filed: 01/30/2013
  • Issued: 05/10/2016
  • Est. Priority Date: 05/15/2012
  • Status: Active Grant
First Claim

1. A method comprising:

    a) receiving an item of content;

    b) identifying within the item of content using a microprocessor a set of lexical pattern cues for core content of the item of content and selecting a segment of the item of content having a highest likelihood as being the core content based upon a structural analysis of the item of content in dependence upon at least the set of lexical pattern cues;

    c) parsing the item of content to generate a hierarchy of content within the item of content;

    d) ranking the hierarchy of content in dependence upon at least the lexical pattern cues and sorting the resulting ranking;

    e) identifying a gap when searching down the ranking meeting a predetermined threshold and removing those portions of the hierarchy of content below the gap to generate truncated content;

    f) finding all occurrences for portions of the hierarchy of content with closest match to the lexical pattern cues closest to the start of the item of content;

    g) determining whether multiple matches to the lexical pattern cues exist and establishing an action in dependence upon at least whether multiple matches exist or not;

    h) performing the action, wherein the action is at least one of:

    establishing the occurrence for the portion of the hierarchy of content as the core content of the item of content when the determination of multiple matches is negative; and

    establishing the occurrence for the portion of the hierarchy of content that at least one of contains the largest portion of the item of content and is the first occurrence as the core content of the item of content when the determination of multiple matches is positive.
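
Steps d), e), g) and h) of the claim can be illustrated with a minimal Python sketch. This is only one possible reading of the claim language, not the patented implementation. The `Node` record, its `score` field (standing in for how well a portion matches the lexical pattern cues), and the function names `truncate_at_gap` and `select_core` are all hypothetical. The sketch also assumes that "closest match" means the highest score, and that when multiple matches exist the largest portion is preferred, with the earliest occurrence used as a tie-breaker.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical record for one portion of the content hierarchy."""
    path: str     # location in the hierarchy, e.g. a tag or selector
    offset: int   # character offset from the start of the item of content
    length: int   # amount of the item's text contained in this portion
    score: float  # assumed score derived from the lexical pattern cues


def truncate_at_gap(nodes: List[Node], threshold: float) -> List[Node]:
    """Steps d) and e): rank, sort, and cut the ranking at the first large gap.

    Walks down the sorted ranking and drops everything below the first
    pair of neighbours whose score difference meets the threshold.
    """
    ranked = sorted(nodes, key=lambda n: n.score, reverse=True)
    for i in range(1, len(ranked)):
        if ranked[i - 1].score - ranked[i].score >= threshold:
            return ranked[:i]
    return ranked  # no gap found, so nothing is removed


def select_core(candidates: List[Node]) -> Optional[Node]:
    """Steps g) and h): choose the core content among the best matches."""
    if not candidates:
        return None
    best = max(n.score for n in candidates)
    matches = [n for n in candidates if n.score == best]
    if len(matches) == 1:
        # Determination of multiple matches is negative.
        return matches[0]
    # Determination is positive: assumed policy is largest portion first,
    # then the occurrence closest to the start of the item.
    return max(matches, key=lambda n: (n.length, -n.offset))


# Illustrative data only; the scores and paths are made up.
nodes = [
    Node("article", offset=120, length=900, score=0.92),
    Node("div.main", offset=40, length=1500, score=0.92),
    Node("aside", offset=2000, length=200, score=0.35),
    Node("footer", offset=2600, length=80, score=0.30),
]
kept = truncate_at_gap(nodes, threshold=0.3)
core = select_core(kept)
```

In this example the gap between 0.92 and 0.35 meets the threshold, so `aside` and `footer` are removed, and the two remaining equal-scoring portions are resolved in favour of the larger one. A different tie-breaking order would be an equally valid reading of the "at least one of" wording.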
