Database for mixed media document system

  • US 9,405,751 B2
  • Filed: 07/31/2006
  • Issued: 08/02/2016
  • Est. Priority Date: 08/23/2005
  • Status: Active Grant
First Claim

1. A database system for providing mixed media documents, comprising:

  • one or more processors;

    an index table, stored on a memory and accessible by the one or more processors, that stores electronic descriptions of features extracted from paper documents, wherein the features include word bounding boxes, feature location information for the features, and association information for each of the paper documents and locations with a mixed media document that combines printed and digital media;

    a feature extraction module, stored on the memory and executable by the one or more processors to:

    receive an image patch;

    determine word bounding boxes from the image patch by aligning the image patch with a horizontal axis, detecting text lines in the image patch based on the aligned image patch, locating an area within each text line that is above a threshold as a word, and identifying the bounding boxes for words within the text lines;

    generate a query from the image patch, at least one query term of the query comprising a two-dimensional geometric relationship between the word bounding boxes determined from the image patch, the two-dimensional geometric relationship specifying one or more of a direction, an angle, a distance between the word bounding boxes determined from the image patch, and geometric shape and contour of the word bounding boxes; and

    an accumulator module, stored on the memory and executable by the one or more processors to:

    locate at least one mixed media document that contains the word bounding boxes determined from the image patch; and

    determine that the at least one mixed media document is a potential match to the query based on determining a two-dimensional geometric relationship between the features stored in the index table, comparing the two-dimensional geometric relationship between the word bounding boxes determined from the image patch with the two-dimensional geometric relationship between the features stored in the index table, computing a matching score for the at least one mixed media document, and returning the at least one mixed media document as a match to the query if the matching score is above a threshold.
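
The steps recited in the claim (word bounding boxes, geometric query terms, accumulator scoring) can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`runs`, `word_boxes`, `query_terms`, `build_index`, `match`) is hypothetical, and so are the parameters `angle_bins`, `dist_step`, and `min_score`. The sketch assumes the image patch is already aligned with the horizontal axis and binarized into rows of 0/1 ink pixels, treats any blank column as a word gap, and describes only consecutive boxes in reading order by their widths, quantized direction, and quantized distance.

```python
import math
from collections import defaultdict


def runs(profile, threshold=0):
    """Return end-exclusive (start, end) spans where profile > threshold."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def word_boxes(image, threshold=0):
    """Find boxes (x0, y0, x1, y1) in a deskewed grid of 0/1 ink pixels."""
    boxes = []
    row_ink = [sum(row) for row in image]
    for y0, y1 in runs(row_ink, threshold):  # rows with ink form text lines
        col_ink = [sum(row[x] for row in image[y0:y1])
                   for x in range(len(image[0]))]
        for x0, x1 in runs(col_ink, threshold):  # inked columns form words
            boxes.append((x0, y0, x1, y1))
    return boxes


def query_terms(boxes, angle_bins=8, dist_step=4):
    """Describe consecutive boxes by widths, direction bin and distance bin."""
    ordered = sorted(boxes, key=lambda b: (b[1], b[0]))  # reading order
    terms = []
    for a, b in zip(ordered, ordered[1:]):
        dx = (b[0] + b[2]) / 2 - (a[0] + a[2]) / 2  # offset between centers
        dy = (b[1] + b[3]) / 2 - (a[1] + a[3]) / 2
        angle = math.atan2(dy, dx) % (2 * math.pi)
        angle_bin = int(angle / (2 * math.pi) * angle_bins) % angle_bins
        dist_bin = int(math.hypot(dx, dy) // dist_step)
        terms.append((a[2] - a[0], b[2] - b[0], angle_bin, dist_bin))
    return terms


def build_index(documents):
    """Map each geometric term to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, boxes in documents.items():
        for term in query_terms(boxes):
            index[term].add(doc_id)
    return index


def match(index, patch_boxes, min_score=0.5):
    """Vote per document; return (doc_id, score) pairs above min_score."""
    terms = query_terms(patch_boxes)
    if not terms:
        return []
    votes = defaultdict(int)
    for term in terms:
        for doc_id in index.get(term, ()):
            votes[doc_id] += 1
    scored = [(doc_id, count / len(terms)) for doc_id, count in votes.items()]
    return sorted((s for s in scored if s[1] > min_score),
                  key=lambda s: -s[1])
```

Because the terms depend only on box widths and on offsets between box centers, a patch that is a translated copy of an indexed region yields the same terms as the indexed document. A practical system would also need deskewing, tolerance to scale and noise, and a richer neighborhood than consecutive pairs.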
