Generating a multi-use vocabulary based on image data

  • US 8,396,331 B2
  • Filed: 07/31/2007
  • Issued: 03/12/2013
  • Est. Priority Date: 02/26/2007
  • Status: Active Grant
First Claim

1. A method for generating a vocabulary for non-textual items, comprising:

  • under control of one or more processors configured with executable instructions:

    providing a source dataset of a first type comprising a plurality of items of the first type;

    identifying features in the source dataset; and

    generating a plurality of words associated with the features to form a single vocabulary, the single vocabulary serving as a mechanism for use in retrieving items from plural different target datasets of different types in response to queries made to the plural different target datasets, wherein:

    the different types comprise different themes or scenes,

    each word of the plurality of words is associated with a weight with respect to a particular document, the weight being determined based on multiplying a term frequency (TF) of the word with an inverse document frequency (IDF) of the word,

    the term frequency of each word with respect to the particular document comprises a normalized frequency of the word in the particular document,

    the inverse document frequency of each word determines whether the word is useful for distinguishing a relevant document from an irrelevant document based on how frequently the word appears in a plurality of documents, and

    the inverse document frequency of each word is determined based on a logarithmic function of a ratio between a total number of documents in a database and a total number of documents in which the word appears.
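
The TF-IDF weighting recited in the claim can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the patented implementation: the claim does not fix the normalization or the logarithm base, so the sketch assumes TF is a word's count divided by the total number of words in the document, and IDF is the natural logarithm of (total documents / documents containing the word). The function name `tf_idf_weights` and the representation of each document as a list of word identifiers are hypothetical.

```python
import math
from collections import Counter


def tf_idf_weights(documents):
    """Return one {word: weight} dict per document.

    `documents` is a list of documents, each a list of word identifiers
    (for image data, these would be the visual words assigned to features).
    Assumptions (not stated in the claim): TF is normalized by document
    length, and IDF uses the natural logarithm.
    """
    n_docs = len(documents)

    # Document frequency: number of documents in which each word appears.
    doc_freq = Counter()
    for words in documents:
        doc_freq.update(set(words))

    weights = []
    for words in documents:
        counts = Counter(words)
        total = len(words)
        doc_weights = {}
        for word, count in counts.items():
            tf = count / total                         # normalized term frequency
            idf = math.log(n_docs / doc_freq[word])    # log(N / n_word)
            doc_weights[word] = tf * idf               # weight = TF * IDF
        weights.append(doc_weights)
    return weights
```

Under these assumptions, a word that appears in every document receives an IDF of log(1) = 0 and therefore a weight of zero, which matches the claim's point that IDF indicates whether a word helps distinguish a relevant document from an irrelevant one.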
