Generating a multi-use vocabulary based on image data
First Claim
1. A method for generating a vocabulary for non-textual items, comprising:
under control of one or more processors configured with executable instructions:
providing a source dataset of a first type comprising a plurality of items of the first type;
identifying features in the source dataset; and
generating a plurality of words associated with the features to form a single vocabulary, the single vocabulary serving as a mechanism for use in retrieving items from plural different target datasets of different types in response to queries made to the plural different target datasets, wherein:
the different types comprise different themes or scenes,
each word of the plurality of words is associated with a weight with respect to a particular document, the weight being determined based on multiplying a term frequency (TF) of the word with an inverse document frequency (IDF) of the word,
the term frequency of each word with respect to the particular document comprises a normalized frequency of the word in the particular document,
the inverse document frequency of each word determines whether the word is useful for distinguishing a relevant document from an irrelevant document based on how frequently the word appears in a plurality of documents, and
the inverse document frequency of each word is determined based on a logarithmic function of a ratio between a total number of documents in a database and a total number of documents in which the word appears.
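The weighting recited in the claim can be illustrated with a short sketch. This is not taken from the specification: the function name `tf_idf_weights`, the choice of the natural logarithm, and the normalization of term frequency by the total number of words in the document are all assumptions made for illustration. The claim only requires a "normalized frequency" and "a logarithmic function" of the document ratio.

```python
import math
from collections import Counter

def tf_idf_weights(documents):
    """Return one {word: weight} dict per document.

    documents: list of lists of word identifiers (hypothetical visual-word ids).
    """
    total_docs = len(documents)
    # Document frequency: number of documents in which each word appears.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    weights = []
    for doc in documents:
        counts = Counter(doc)
        doc_weights = {}
        for word, count in counts.items():
            tf = count / len(doc)  # assumed normalization: share of the document
            idf = math.log(total_docs / doc_freq[word])  # log of the document ratio
            doc_weights[word] = tf * idf
        weights.append(doc_weights)
    return weights

# Three toy documents; word 3 occurs in every document.
docs = [[3, 3, 7], [3, 9], [3, 7, 7, 9]]
weights = tf_idf_weights(docs)
```

In this toy input, word 3 appears in all three documents, so its IDF is log(3/3) = 0 and it receives zero weight everywhere, which matches the claim's point that IDF indicates whether a word helps distinguish relevant from irrelevant documents.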
Abstract
Functionality is described for generating a vocabulary from a source dataset of image items or other non-textual items. The vocabulary serves as a tool for retrieving items from a target dataset in response to queries. The vocabulary has at least one characteristic that allows it to be used to retrieve items from multiple different target datasets. A target dataset can have a different size than the source dataset and/or a different type than the source dataset. The enabling characteristic may correspond to a size of the source dataset above a prescribed minimum number of items and/or a size of the vocabulary above a prescribed minimum number of words.
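The following sketch shows one way a single vocabulary built from a source dataset could be reused on several target datasets. The abstract does not say how words are derived from features; clustering feature vectors with plain k-means and treating each cluster center as a word is an assumption here, and the names `build_vocabulary` and `to_words` are hypothetical.

```python
import math

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def build_vocabulary(features, num_words, iterations=10):
    """Cluster source features with plain k-means; the centers act as words."""
    centers = [list(f) for f in features[:num_words]]  # naive deterministic init
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for f in features:
            groups[nearest(f, centers)].append(f)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster ends up empty
                centers[i] = [sum(dim) / len(group) for dim in zip(*group)]
    return centers

def to_words(item_features, vocabulary):
    """Map each feature of an item to the id of its nearest word."""
    return [nearest(f, vocabulary) for f in item_features]

# Toy 2-D features standing in for image descriptors from the source dataset.
source_features = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
vocabulary = build_vocabulary(source_features, num_words=2)

# Items from two different target datasets, described with the same vocabulary.
target_a_item = [(0.1, 0.2), (4.8, 5.0)]
target_b_item = [(5.5, 5.5), (5.1, 4.7), (0.3, 0.1)]
words_a = to_words(target_a_item, vocabulary)
words_b = to_words(target_b_item, vocabulary)
```

The vocabulary is built once and then applied unchanged to both targets. The abstract attributes this reusability to the source dataset and the vocabulary each exceeding a prescribed minimum size; the toy sizes here are far below anything realistic.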
19 Claims
1. A method for generating a vocabulary for non-textual items, comprising:
under control of one or more processors configured with executable instructions:
providing a source dataset of a first type comprising a plurality of items of the first type;
identifying features in the source dataset; and
generating a plurality of words associated with the features to form a single vocabulary, the single vocabulary serving as a mechanism for use in retrieving items from plural different target datasets of different types in response to queries made to the plural different target datasets, wherein:
the different types comprise different themes or scenes,
each word of the plurality of words is associated with a weight with respect to a particular document, the weight being determined based on multiplying a term frequency (TF) of the word with an inverse document frequency (IDF) of the word,
the term frequency of each word with respect to the particular document comprises a normalized frequency of the word in the particular document,
the inverse document frequency of each word determines whether the word is useful for distinguishing a relevant document from an irrelevant document based on how frequently the word appears in a plurality of documents, and
the inverse document frequency of each word is determined based on a logarithmic function of a ratio between a total number of documents in a database and a total number of documents in which the word appears.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. One or more memory devices configured with computer-executable instructions that, when executed by one or more processors, configure the one or more processors to perform acts comprising:
providing a source dataset of a first type comprising a plurality of items of the first type;
identifying features of the first type in the source dataset; and
generating a plurality of words associated with the features to form a single vocabulary, the single vocabulary serving as a mechanism for use in retrieving items from plural different target datasets of different types in response to queries made to the plural different target datasets, wherein:
the different types comprise different themes or scenes,
each word of the plurality of words is associated with a weight with respect to a particular document, the weight being determined based on multiplying a term frequency (TF) of the word with an inverse document frequency (IDF) of the word,
the term frequency of each word with respect to the particular document comprises a normalized frequency of the word in the particular document,
the inverse document frequency of each word determines whether the word is useful for distinguishing a relevant document from an irrelevant document based on how frequent the word appears in a plurality of documents, and
the inverse document frequency of each word is determined based on a logarithmic function of a ratio between a total number of documents in a database and a total number of documents in which the word appears.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
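The claims state that the vocabulary is used for retrieving items in response to queries, but they do not fix a ranking method. As a rough sketch only, the example below ranks target items by cosine similarity between TF-IDF vectors; the cosine measure and the helper names `idf_table`, `weigh`, `cosine`, and `rank` are assumptions, not language from the claims.

```python
import math
from collections import Counter

def idf_table(items):
    """IDF per word: log(total items / items containing the word)."""
    doc_freq = Counter()
    for words in items:
        doc_freq.update(set(words))
    return {w: math.log(len(items) / n) for w, n in doc_freq.items()}

def weigh(words, idf):
    """TF-IDF vector for one item; words unseen in the dataset get weight 0."""
    counts = Counter(words)
    return {w: (c / len(words)) * idf.get(w, 0.0) for w, c in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank(query_words, target_items):
    """Return target item indices ordered from most to least similar."""
    idf = idf_table(target_items)
    query_vec = weigh(query_words, idf)
    scores = [cosine(query_vec, weigh(words, idf)) for words in target_items]
    return sorted(range(len(target_items)), key=lambda i: scores[i], reverse=True)

# Each target item is a list of word ids drawn from the shared vocabulary.
target_items = [[0, 0, 1], [2, 2, 3], [0, 1, 1]]
ranking = rank([2, 3, 3], target_items)
```

Here only the second item shares words with the query, so it is ranked first. Because the word ids come from one shared vocabulary, the same `rank` function could be pointed at a different target dataset without rebuilding the vocabulary.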
19. One or more computing devices, comprising:
one or more processors; and
memory to store computer-executable instructions that, when executed by the one or more processors, perform acts comprising:
providing a source dataset of a first type comprising a plurality of items of the first type;
identifying features in the source dataset; and
generating a plurality of words associated with the features to form a single vocabulary, the single vocabulary serving as a mechanism for use in retrieving items from plural different target datasets of different types in response to queries made to the plural different target datasets, wherein:
the different types comprise different themes or scenes,
each word of the plurality of words is associated with a weight with respect to a particular document, the weight being determined based on multiplying a term frequency (TF) of the word with an inverse document frequency (IDF) of the word,
the term frequency of each word with respect to the particular document comprises a normalized frequency of the word in the particular document,
the inverse document frequency of each word determines whether the word is useful for distinguishing a relevant document from an irrelevant document based on how frequent the word appears in a plurality of documents, and
the inverse document frequency of each word is determined based on a logarithmic function of a ratio between a total number of documents in a database and a total number of documents in which the word appears.
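For reference, the weighting shared by the independent claims can be written compactly. The symbols below are introduced here for illustration and do not appear in the claims: n_{t,d} is the count of word t in document d, N is the total number of documents in the database, and N_t is the number of documents containing t. Dividing by the document's total word count is one common reading of "normalized frequency", and the natural logarithm is an assumed choice of "logarithmic function".

```latex
\[
  w_{t,d} = \mathrm{TF}_{t,d} \cdot \mathrm{IDF}_t,
  \qquad
  \mathrm{TF}_{t,d} = \frac{n_{t,d}}{\sum_{k} n_{k,d}},
  \qquad
  \mathrm{IDF}_t = \log \frac{N}{N_t}
\]
% Worked example (assumed numbers): a word occurring 3 times in a 100-word
% document, present in 10 of 1000 documents:
\[
  w_{t,d} = \frac{3}{100} \cdot \ln \frac{1000}{10}
          = 0.03 \cdot \ln 100 \approx 0.138
\]
```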
Specification