Scalable metadata extraction for video search
Abstract
Video entity templates defining common features that relate to various metadata types shared among a group of video Web pages are generated for target Web sites. Metadata associated with videos contained within Web pages belonging to a particular target Web site can then be automatically and accurately extracted using a video entity template generated for the particular target Web site. This metadata can then be indexed for use by video search applications in providing video search results.
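The abstract describes one template per target Web site, which is then applied to that site's pages. A minimal Python sketch of that per-site dispatch step follows. It assumes, hypothetically, that a page has already been parsed into a mapping from region path to text and that a template maps each metadata type to a region path. The names `site_of` and `extract_for_url` and the sample paths are illustrative and are not taken from the patent.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical page model: region path (e.g. a simplified DOM path) -> rendered text.
Page = dict[str, str]
# Hypothetical template model: metadata type (e.g. "title") -> region path.
Template = dict[str, str]


def site_of(url: str) -> Optional[str]:
    """Return the host name that identifies the target Web site."""
    return urlsplit(url).hostname


def extract_for_url(url: str, page: Page, templates_by_site: dict[str, Template]) -> dict[str, str]:
    """Look up the template for the page's site and read each metadata value it names."""
    template = templates_by_site.get(site_of(url))
    if template is None:
        return {}  # no template has been generated for this site yet
    return {kind: page[path] for kind, path in template.items() if path in page}


if __name__ == "__main__":
    templates = {"videos.example.com": {"title": "h1", "duration": "span.duration"}}
    page = {"h1": "Cat video", "span.duration": "3:12", "footer": "(c) Example"}
    print(extract_for_url("https://videos.example.com/watch?v=1", page, templates))
```

Keying templates by host name is one simple reading of "a video entity template generated for the particular target Web site"; the extracted values would then be handed to the indexer.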
18 Claims

1. A computer-implemented method for extracting metadata, the computer-implemented method comprising performing computer-implemented operations for:
grouping, by a computer comprising one or more processors, a plurality of Web pages into a group based upon a common visual layout shared among the plurality of Web pages, wherein the group is one of a plurality of groups formed according to common visual layout shared among additional Web pages considered for grouping;
removing, by the computer, one or more regions shared among the plurality of Web pages from each of the plurality of Web pages in the group to define a candidate region, and removing one or more regions each containing common elements shared among the plurality of Web pages in the group;
extracting, by the computer, one or more candidate features from the candidate region; and
selecting, by the computer, one of the one or more candidate features for use in a video entity template.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
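The four operations of claim 1 can be illustrated with a small Python sketch. It rests on simplifying assumptions that are not in the claim: a page is a mapping from region path to text; "common visual layout" is approximated by an identical set of region paths; a region counts as shared when its text is the same on every page of the group (which folds both removal steps into one test); and a feature is selected by counting matches against a few known values for the target entity, such as titles from a site's feed. All function names are hypothetical.

```python
from collections import defaultdict

Page = dict[str, str]  # hypothetical page model: region path -> rendered text


def layout_signature(page: Page) -> frozenset[str]:
    """Stand-in for visual layout: the set of region paths the page contains."""
    return frozenset(page)


def group_by_layout(pages: list[Page]) -> list[list[Page]]:
    """Put pages with the same layout signature into the same group."""
    groups: dict[frozenset[str], list[Page]] = defaultdict(list)
    for page in pages:
        groups[layout_signature(page)].append(page)
    return list(groups.values())


def candidate_region(group: list[Page]) -> set[str]:
    """Keep regions whose text varies across the group; drop ones identical on every page."""
    paths = set.intersection(*(set(page) for page in group))
    return {path for path in paths if len({page[path] for page in group}) > 1}


def select_feature(group: list[Page], region: set[str], known_values: set[str]) -> str:
    """Choose the candidate region whose text most often equals a known value of the entity."""
    # Sorting first makes ties resolve deterministically (alphabetically first path wins).
    return max(sorted(region), key=lambda path: sum(page[path] in known_values for page in group))
```

With two video pages that share a header and footer, `candidate_region` keeps only the title and view-count regions, and seeding `select_feature` with known titles picks the title region.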

10. A computer-implemented method for extracting metadata, the computer-implemented method comprising performing computer-implemented operations for:
generating, by a computer comprising one or more processors, a page group comprising a plurality of Web pages, each of which contains a video, wherein the page group is one of a plurality of page groups formed according to common visual layout shared among additional Web pages considered for grouping;
removing one or more regions shared among the plurality of Web pages from each of the plurality of Web pages in the page group to define a candidate region, and removing one or more regions each containing common elements shared among the plurality of Web pages in the page group;
generating, by the computer, a video entity template from the page group, the video entity template comprising identification of an entity;
matching, by the computer, a target page to the video entity template;
extracting, by the computer, metadata associated with the entity from the target page utilizing the video entity template; and
indexing, by the computer, the metadata extracted from the target page in a video search index.
Dependent claims: 11, 12, 13, 14, 15.
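The last three operations of claim 10 (matching, extracting, indexing) can be sketched in Python as follows. The assumptions are hypothetical: a template records the layout (set of region paths) it was learned from; a target page matches when the Jaccard similarity of region-path sets reaches an arbitrary 0.8 threshold; and the "video search index" is a toy in-memory inverted index, not any particular search engine's API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

Page = dict[str, str]  # hypothetical page model: region path -> rendered text


@dataclass
class VideoEntityTemplate:
    """Hypothetical template: the layout it was learned from plus one region per metadata type."""
    layout: frozenset[str]
    regions: dict[str, str]  # metadata type -> region path


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Overlap of two region-path sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def match_template(page: Page, templates: list[VideoEntityTemplate],
                   threshold: float = 0.8) -> Optional[VideoEntityTemplate]:
    """Return the template whose layout best matches the page, or None if none is close enough."""
    layout = frozenset(page)
    best = max(templates, key=lambda t: jaccard(t.layout, layout), default=None)
    if best is None or jaccard(best.layout, layout) < threshold:
        return None
    return best


def extract_metadata(page: Page, template: VideoEntityTemplate) -> dict[str, str]:
    """Read each metadata value from the region the template assigns to it."""
    return {kind: page[path] for kind, path in template.regions.items() if path in page}


class VideoSearchIndex:
    """Toy inverted index: lowercase term -> URLs of pages whose metadata contains it."""

    def __init__(self) -> None:
        self.postings: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, url: str, metadata: dict[str, str]) -> None:
        for value in metadata.values():
            for term in value.lower().split():
                self.postings[term].add(url)

    def search(self, term: str) -> set[str]:
        return self.postings.get(term.lower(), set())
```

A similarity threshold rather than exact equality lets a template tolerate small layout differences, such as an optional comments region, between pages of the same site.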

16. A computer storage medium, which does not include signals, having computer-readable instructions stored thereupon that, when executed by a computer, cause the computer to:
group Web pages of a Web site by visual layout into a plurality of groups, wherein each group of the plurality of groups is one of a plurality of groups formed according to common visual layout shared among additional Web pages considered for grouping;
remove one or more regions shared among the Web pages from each of the Web pages in the plurality of groups to define a candidate region for each group of the plurality of groups, and remove one or more regions each containing common elements shared among the Web pages in each group;
select a target group from the plurality of groups from which to generate a video entity template;
remove common elements of the Web pages in the target group;
remove repeat regions of the Web pages in the target group;
extract one or more candidate features from a remaining candidate region for the Web pages in the target group, the one or more candidate features being candidate features for a particular target entity;
select a particular candidate feature of the one or more candidate features for the particular target entity;
cross-validate the particular candidate feature to previously selected candidate features from one or more other groups;
if cross-validation fails, return to candidate feature extraction;
if cross-validation is successful, generate the video entity template; and
output the video entity template.
Dependent claims: 17, 18.
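The select, cross-validate, and retry loop of claim 16 can be sketched in Python. The claim does not say how cross-validation is performed, so this sketch assumes a hypothetical criterion: previously selected features from other groups are represented by their extracted values, and a candidate passes when at least one of its values has the same coarse "shape" (digit runs become `9`, letter runs become `a`) as an earlier value. "Returning to candidate feature extraction" is modeled as moving on to the next-ranked candidate. All names are illustrative.

```python
import re
from typing import Optional

Page = dict[str, str]  # hypothetical page model: region path -> rendered text


def value_shape(text: str) -> str:
    """Coarse shape of a value: each run of digits becomes 9, each run of letters becomes a."""
    return re.sub(r"[A-Za-z]+", "a", re.sub(r"[0-9]+", "9", text))


def cross_validate(values: list[str], previous_values: list[str]) -> bool:
    """Pass if the candidate's values share at least one shape with earlier selections."""
    if not previous_values:
        return True  # first group: nothing to validate against yet
    return bool({value_shape(v) for v in values} & {value_shape(v) for v in previous_values})


def choose_validated_feature(ranked_candidates: list[str], group: list[Page],
                             previous_values: list[str]) -> Optional[str]:
    """Try candidates in rank order; a failed cross-validation falls through to the next one."""
    for path in ranked_candidates:
        values = [page[path] for page in group]
        if cross_validate(values, previous_values):
            return path  # validated: this region goes into the template
    return None  # every candidate failed; no template for this entity and group
```

For example, if an earlier group's view-count feature produced values like `"1024 views"`, a date region such as `"Jan 5"` (shape `a 9`) is rejected and a region such as `"10 views"` (shape `9 a`) is accepted; the template would then map the entity to that region.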
Specification