Automatic metadata identification

  • US 8,510,312 B1
  • Filed: 09/28/2007
  • Issued: 08/13/2013
  • Est. Priority Date: 09/28/2007
  • Status: Active Grant
First Claim

1. A method performed by one or more processors associated with one or more network devices, the method comprising:

  • capturing text of a document;

    comparing the text of the document to content of each of a plurality of metadata records, each of the plurality of metadata records storing information associated with a particular one of a plurality of documents that differs from the document;

    selecting, based on comparing the text of the document to the content, one or more of the plurality of metadata records, where, for each of the selected metadata records, a portion of the associated content corresponds to at least a portion of the text of the document;

    scoring each of the selected metadata records, including calculating a score representing a correspondence between the text of the document and the content of the respective one of the selected metadata records, where scoring each of the selected metadata records further includes:

        calculating a first probability associated with a likelihood of one or more common phrases, that appear in both the text of the document and the content of the one of the selected metadata records, also appearing in the contents of the plurality of metadata records,

        calculating a second probability associated with a likelihood of the one or more common phrases appearing in text of the plurality of documents, and

        scoring the one of the selected metadata records based on the first probability and second probability;

    ranking the selected metadata records based on scoring the selected metadata records; and

    storing an association between the document and a particular number of highest ranking ones of the selected metadata records.
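
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not say how phrases are extracted, how the two probabilities are estimated, or how they are combined, so the choices below are assumptions made only for illustration: phrases are treated as word bigrams, each probability is a smoothed fraction of records (or documents) containing the phrase, and the score sums the negative log of their product so that rarer shared phrases count for more. All function and variable names (`phrases`, `fraction_containing`, `rank_records`, `associations`) are hypothetical.

```python
import math


def phrases(text, n=2):
    """Assumption: a 'phrase' is a lowercase word n-gram (bigram by default)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def fraction_containing(phrase, phrase_sets):
    """Smoothed fraction of phrase sets containing the phrase (always in (0, 1))."""
    hits = sum(1 for s in phrase_sets if phrase in s)
    return (hits + 1) / (len(phrase_sets) + 2)


def rank_records(doc_text, records, corpus_texts, top_k=2, n=2):
    """Compare, select, score, and rank metadata records for one document.

    records:      {record_id: content} for metadata records of other documents
    corpus_texts: texts of the plurality of documents
    """
    doc_phrases = phrases(doc_text, n)
    record_sets = {rid: phrases(content, n) for rid, content in records.items()}
    all_record_sets = list(record_sets.values())
    corpus_sets = [phrases(t, n) for t in corpus_texts]

    scored = []
    for rid, rset in record_sets.items():
        common = doc_phrases & rset  # phrases shared by document and record
        if not common:
            continue  # selection: keep only records that overlap the document
        score = 0.0
        for p in common:
            p1 = fraction_containing(p, all_record_sets)  # first probability
            p2 = fraction_containing(p, corpus_sets)      # second probability
            score += -math.log(p1 * p2)  # assumed way of combining the two
        scored.append((rid, score))

    # Rank by descending score; ties are broken by record id for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Made-up example data, for illustration only.
records = {
    "r1": "automatic metadata identification for scanned documents",
    "r2": "metadata identification",
    "r3": "cooking with garlic",
}
corpus_texts = ["metadata identification in libraries", "garlic soup recipes"]
doc_text = "A study of automatic metadata identification for scanned documents"

ranked = rank_records(doc_text, records, corpus_texts, top_k=2)

# Storing step: associate the document with its highest-ranking records.
associations = {"doc-1": [rid for rid, _ in ranked]}
print(associations)
```

In this toy data, r1 shares every phrase that r2 shares with the document plus several more, so it ranks above r2, and r3 is never selected because it shares no phrase with the document.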
