Non-text content item search
First Claim
1. A system, comprising:
- a data store storing label data that specifies a set of initial labels for a non-text content item and a resource identifier that specifies, for each initial label, a web page to which the initial label is associated, wherein the non-text content item is associated with each of a plurality of web pages, and wherein each initial label includes one or more words; and
one or more computers coupled to the data store, the one or more computers configured to:
generate initial label groups for sets of matching web pages, each set of matching web pages including two or more matching web pages;
group, for each set of matching web pages, initial labels that are associated with the set of matching web pages into a separate initial label group that corresponds to the set of matching web pages;
generate initial label groups for sets of matching labels, each set of matching labels including two or more initial labels;
group, for each set of matching labels, initial labels that are associated with the set of matching labels into a separate initial label group that corresponds to the set of matching labels; and
select, as final labels for the non-text content item, n-grams of one or more words that are included in at least a threshold number of the separate initial label groups.
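The final selection step of the claim can be illustrated with a minimal sketch. It assumes the initial label groups have already been built and are passed in as lists of label strings. The names `ngrams`, `select_final_labels`, and `max_n`, and the choice to lowercase and split labels on whitespace, are hypothetical; the claim itself only requires that a selected n-gram appear in at least a threshold number of separate groups.

```python
from collections import Counter


def ngrams(words, max_n):
    """Yield every n-gram (as a space-joined string) of length 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def select_final_labels(groups, threshold, max_n=3):
    """Return n-grams that occur in at least `threshold` separate groups.

    groups: list of initial label groups, each a list of label strings.
    Assumption: labels are lowercased and split on whitespace before counting.
    """
    group_counts = Counter()
    for group in groups:
        seen = set()
        for label in group:
            seen.update(ngrams(label.lower().split(), max_n))
        # A group contributes at most one count per n-gram, however many
        # of its labels contain that n-gram.
        group_counts.update(seen)
    return sorted(g for g, c in group_counts.items() if c >= threshold)
```

Counting per group rather than per label means that many near-identical labels collected in one group cannot, on their own, push an n-gram over the threshold.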
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting labels for a non-text content item. In one aspect, a method receives a set of initial labels for a non-text content item, wherein the set of initial labels specifies text that has been identified as descriptive of the non-text content item and a web page to which the text corresponds. Initial labels corresponding to sets of matching web pages are grouped into separate initial label groups that correspond to each set of matching web pages. Sets of matching labels are grouped into other separate initial label groups that correspond to the sets of matching labels. One or more words that are included in at least a threshold number of the separate label groups are selected as final labels for the non-text content item.
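The two grouping steps described in the abstract can be sketched as follows. This excerpt does not define when two web pages or two labels "match", so the sketch makes two loud assumptions: pages match when they share a host name, and labels match when they are equal after case-folding and whitespace normalization. The function names (`page_key`, `label_key`, `build_initial_label_groups`) are hypothetical, and the sketch builds both kinds of group independently of each other.

```python
from collections import defaultdict
from urllib.parse import urlparse


def page_key(url):
    # Assumption: two web pages "match" when their host names are equal.
    return urlparse(url).hostname


def label_key(label):
    # Assumption: two labels "match" when equal after lowercasing and
    # collapsing runs of whitespace.
    return " ".join(label.lower().split())


def build_initial_label_groups(label_data):
    """label_data: iterable of (initial_label, page_url) pairs for one item."""
    by_page = defaultdict(list)
    by_label = defaultdict(list)
    for label, url in label_data:
        by_page[page_key(url)].append((label, url))
        by_label[label_key(label)].append((label, url))

    groups = []
    # One group per set of matching pages spanning two or more distinct pages.
    for entries in by_page.values():
        if len({url for _, url in entries}) >= 2:
            groups.append([label for label, _ in entries])
    # One group per set of matching labels containing two or more labels.
    for entries in by_label.values():
        if len(entries) >= 2:
            groups.append([label for label, _ in entries])
    return groups
```

Labels that fall into neither kind of set are simply left out of the returned groups in this sketch; how such labels are treated is not stated in the text above.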
32 Claims
13. A non-transitory computer storage medium encoded with a computer program, the program comprising instructions that when executed by data processing apparatus cause the data processing apparatus to perform operations comprising:
receiving label data that specifies a set of initial labels for a non-text content item and a resource identifier for each initial label, wherein each initial label includes one or more words;
selecting one or more sets of matching web pages from a plurality of web pages, wherein each set of matching web pages includes two or more matching web pages;
grouping, for each set of matching web pages, initial labels that are associated with the set of matching web pages into a separate initial label group that corresponds to the set of matching web pages;
selecting one or more sets of matching labels, wherein each set of matching labels includes two or more initial labels;
grouping each set of matching labels into a separate initial label group that corresponds to the set of matching labels; and
selecting, as a final label for the non-text content item, an n-gram of one or more words that are included in at least a threshold number of separate initial label groups.
17. A method performed by data processing apparatus, the method comprising:
selecting a non-text content item that is associated with each of a plurality of web pages;
receiving label data that includes a set of initial labels for the non-text content item and a resource identifier for each initial label, wherein each initial label includes one or more words;
selecting one or more sets of matching web pages from the plurality of web pages, wherein each set of matching web pages includes two or more matching web pages;
grouping, for each set of matching web pages, initial labels that are associated with the set of matching web pages into a separate initial label group that corresponds to the set of matching web pages;
selecting one or more sets of matching labels, wherein each set of matching labels includes two or more initial labels;
grouping each set of matching labels into a separate initial label group that corresponds to the set of matching labels; and
selecting, as a final label for the non-text content item, an n-gram of one or more words that are included in at least a threshold number of separate initial label groups.
Specification