Non-text content item search
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting labels for a non-text content item. In one aspect, a method receives a set of initial labels for a non-text content item, wherein the set of initial labels specifies text that has been identified as descriptive of the non-text content item and a web page to which the text corresponds. Initial labels corresponding to sets of matching web pages are grouped into separate initial label groups that correspond to each set of matching web pages. Sets of matching labels are grouped into other separate initial label groups that correspond to the sets of matching labels. One or more words that are included in at least a threshold number of the separate label groups are selected as final labels for the non-text content item.
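The grouping and selection steps summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: the function names `ngrams` and `select_final_labels` are hypothetical, web pages are treated as "matching" when they share a host name, labels are treated as "matching" when they are equal after lowercasing and whitespace normalization, and n-grams are limited to `max_n` words. The source does not define any of these criteria.

```python
from collections import defaultdict
from urllib.parse import urlparse


def ngrams(text, max_n=2):
    """Return the set of word n-grams (n = 1..max_n) found in one label."""
    words = text.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }


def select_final_labels(initial_labels, threshold, max_n=2):
    """initial_labels: iterable of (label_text, page_url) pairs."""
    page_groups = defaultdict(list)   # one group per set of matching pages
    label_groups = defaultdict(list)  # one group per set of matching labels

    for text, url in initial_labels:
        # Assumption: pages "match" when they share a host name.
        page_groups[urlparse(url).netloc].append(text)
        # Assumption: labels "match" when equal after normalization.
        label_groups[" ".join(text.lower().split())].append(text)

    groups = list(page_groups.values()) + list(label_groups.values())

    # Count, for each n-gram, how many distinct label groups contain it.
    counts = defaultdict(int)
    for group in groups:
        seen = set()
        for text in group:
            seen |= ngrams(text, max_n)
        for gram in seen:
            counts[gram] += 1

    # Keep n-grams that appear in at least `threshold` label groups.
    return {gram for gram, count in counts.items() if count >= threshold}


labels = [
    ("Eiffel Tower", "http://a.example/1"),
    ("eiffel tower at night", "http://a.example/2"),
    ("Eiffel Tower", "http://b.example/x"),
    ("Paris trip", "http://c.example/y"),
]
final_labels = select_final_labels(labels, threshold=3)
```

In this sketch each initial label contributes to one page-based group and one label-based group, so an n-gram is counted once per group regardless of how many labels in that group contain it. With the sample data, only "eiffel", "tower", and "eiffel tower" reach the threshold of three groups.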
24 Claims
1. A method performed by data processing apparatus, the method comprising:
identifying a non-text content item that is associated with each of a plurality of web pages;
receiving label data that includes a set of initial labels for the non-text content item, wherein each initial label includes one or more words;
grouping, for each of two or more sets of matching web pages among the plurality of web pages, initial labels that are associated with the set of matching web pages into a label group, the initial labels for different sets of matching web pages being grouped to different label groups;
grouping different sets of matching labels from the set of initial labels into different label groups; and
selecting, as a final label for the non-text content item, an n-gram of one or more words that is included in at least a threshold number of different label groups.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A non-transitory computer storage medium encoded with a computer program, the program comprising instructions that when executed by data processing apparatus cause the data processing apparatus to perform operations comprising:
identifying a non-text content item that is associated with each of a plurality of web pages;
receiving label data that includes a set of initial labels for the non-text content item, wherein each initial label includes one or more words;
grouping, for each of two or more sets of matching web pages among the plurality of web pages, initial labels that are associated with the set of matching web pages into a label group, the initial labels for different sets of matching web pages being grouped to different label groups;
grouping different sets of matching labels from the set of initial labels into different label groups; and
selecting, as a final label for the non-text content item, an n-gram of one or more words that is included in at least a threshold number of different label groups.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A system comprising:
a data store storing label data that specifies a set of initial labels for a non-text content item, wherein the non-text content item is associated with each of a plurality of web pages, and wherein each initial label includes one or more words; and
one or more computers coupled to the data store, the one or more computers storing instructions that cause the one or more computers to interact with the data store and perform operations comprising:
identifying a non-text content item that is associated with each of a plurality of web pages;
receiving label data that includes a set of initial labels for the non-text content item, wherein each initial label includes one or more words;
grouping, for each of two or more sets of matching web pages among the plurality of web pages, initial labels that are associated with the set of matching web pages into a label group, the initial labels for different sets of matching web pages being grouped to different label groups;
grouping different sets of matching labels from the set of initial labels into different label groups; and
selecting, as a final label for the non-text content item, an n-gram of one or more words that is included in at least a threshold number of different label groups.
Dependent claims: 18, 19, 20, 21, 22, 23, 24
Specification