Methods and apparatus for indexing and searching of multi-media web pages
Abstract
A system for automatically enhancing Web pages with annotations expressed in Extensible Markup Language (XML) which describes the pages' multimedia content. Each Web page is parsed or scanned to identify markup tags which contain the URLs of separately stored multimedia data (e.g. image, audio or video files). Each referenced multimedia data entity is then retrieved and analyzed by a type-specific process to extract metadata which describes its content. Additional descriptive metadata may be obtained from the referencing markup tag, accepted from a human editor, or fetched from operating system directories which provide access to the multimedia files. The resulting metadata is expressed in text-based XML format and inserted into a copy of the Web page to form an enhanced Web page whose multimedia content may then be processed by conventional text-based indexing and searching facilities.
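The tag-identification step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `MediaTagFinder`, the function `find_media_references`, and the `MEDIA_ATTRS` table of which tags count as multimedia references are all assumptions made for illustration. It uses only the standard-library `html.parser` and `urllib.parse` modules, and it also collects the tag's `alt` text as an example of metadata taken from the referencing markup tag.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed mapping from tag name to the attribute that holds the media URL.
# The source does not enumerate tags; this table is illustrative only.
MEDIA_ATTRS = {"img": "src", "audio": "src", "video": "src",
               "source": "src", "embed": "src"}


class MediaTagFinder(HTMLParser):
    """Collects (tag, absolute_url, alt_text) for each multimedia reference."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.references = []

    def handle_starttag(self, tag, attrs):
        attr_name = MEDIA_ATTRS.get(tag)
        if attr_name is None:
            return  # not a tag we treat as a multimedia reference
        attr_map = dict(attrs)
        url = attr_map.get(attr_name)
        if url:
            # Resolve relative URLs against the page's own address.
            absolute = urljoin(self.base_url, url)
            self.references.append((tag, absolute, attr_map.get("alt")))


def find_media_references(html, base_url):
    """Return the multimedia references found in an HTML page."""
    finder = MediaTagFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.references
```

Each returned URL would then be handed to a type-specific analysis routine (image, audio, or video), which this sketch deliberately leaves out.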
17 Claims
1. Apparatus for indexing a Web page which incorporates multimedia data by reference to one or more resources which supply said multimedia data, said method comprising, in combination:
means for analyzing said web page to identify at least one markup tag containing a reference to a given one of said resources,
means for selecting and executing a media processing program for analyzing the content of the multimedia data supplied by said given one of said resources to generate metadata describing said content,
means for formatting said metadata into a character-based text annotation,
means for combining said Web page and said annotation to form an enhanced Web page, and
means for indexing said enhanced Web page.
Dependent claims: 2, 3, 4, 5, 6, 7
8. Apparatus for collecting and storing metadata describing a hypertext Web page, said Web page including markup tags which identify multimedia data from one or more different external resources, said apparatus comprising, in combination,
a parser for identifying said markup tags in said Web page,
processing means for analyzing the content of said multimedia data identified by said markup tags to generate metadata describing said multimedia data,
means for translating said metadata into a character-based text annotation describing said multimedia data, and
means for storing the combination of a copy of said Web page and said annotation to form an enhanced Web page suitable for processing by text-based indexing and searching facilities.
10. The method of automatically enhancing the content of a Web page which contains multimedia data incorporated by reference which comprises, in combination, the steps of:
identifying one or more markup tags in said Web page which respectively identify one or more external resources which provide said multimedia data;
generating metadata which describes said multimedia data,
translating said metadata into a character-based text annotation, and
inserting said annotation into said Web page to form an enhanced Web page suitable for processing by a character-based text processing system.
Dependent claims: 11, 12, 13, 14, 15, 16, 17
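The translating and inserting steps of claim 10 might look roughly like the following Python sketch. It rests on several assumptions: the metadata is already available as a dictionary (the media analysis itself is not shown), the element name `media-annotation` and the function names `format_annotation` and `enhance_page` are hypothetical, dictionary keys are valid XML element names, and placing the annotation just before `</body>` is an illustrative choice rather than anything the claims specify.

```python
import xml.etree.ElementTree as ET


def format_annotation(url, metadata):
    """Translate a metadata dict into a character-based XML annotation.

    Assumes every key in `metadata` is a valid XML element name.
    """
    root = ET.Element("media-annotation", {"src": url})  # hypothetical element name
    for key, value in metadata.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def enhance_page(html, annotations):
    """Insert the annotations into a copy of the page, before </body> if present."""
    block = "\n".join(annotations)
    idx = html.lower().rfind("</body>")
    if idx == -1:
        # No closing body tag: append the annotations at the end (assumed fallback).
        return html + "\n" + block
    return html[:idx] + block + "\n" + html[idx:]
```

Because the result is ordinary text, the enhanced page can be passed unchanged to a conventional text indexer, which is the point of expressing the metadata in a character-based form.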
Specification