Software web crawler and method therefor
First Claim
1. A computerized system comprising:
a crawler obtaining multimedia files from a network, the crawler comprising:
a multi-threaded downloader that downloads web pages;
a queue storing links corresponding to the downloaded web pages;
a scheduler obtaining the stored links from the queue and passing the obtained links to the multi-threaded downloader, wherein the multi-threaded downloader downloads multiple multimedia files concurrently from said links;
a multimedia processor receiving said multimedia files from the crawler and processing said multimedia files by translating speech in the multimedia files into a textual representation, wherein said multimedia processor determines sound effects in said multimedia files by comparing said sound effects in said multimedia files against a predetermined set of sounds, wherein generated metadata is determined by the comparison, and wherein said metadata comprises keywords identifying a type of said sound effects;
a data mining module that extracts text information from the textual representation; and
an indexer that indexes the multimedia files based on said keywords and said text information.
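The crawler elements recited in claim 1 (a queue of stored links, a scheduler that passes them on, and a multi-threaded downloader that fetches several files concurrently) can be illustrated with a minimal sketch. It is not the patented implementation: the function name `crawl`, the `fetch` callback, the `max_workers` parameter, and the `MEDIA_EXTENSIONS` filter are all assumptions introduced for illustration, and the network is replaced by a caller-supplied function so the example runs offline.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# Assumed filter for "multimedia" links; the claim does not name file types.
MEDIA_EXTENSIONS = (".mp3", ".wav", ".mp4")


def crawl(links, fetch, max_workers=4):
    """Download multimedia links concurrently and return {link: payload}.

    `fetch` is a hypothetical caller-supplied function that maps a link
    to its content; it stands in for a real HTTP download.
    """
    # Queue: store the links found on downloaded web pages.
    link_queue = queue.Queue()
    for link in links:
        if link.lower().endswith(MEDIA_EXTENSIONS):
            link_queue.put(link)

    # Scheduler: take the stored links off the queue in FIFO order.
    scheduled = []
    while not link_queue.empty():
        scheduled.append(link_queue.get())

    # Multi-threaded downloader: fetch the scheduled links concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        payloads = list(pool.map(fetch, scheduled))

    return dict(zip(scheduled, payloads))
```

A call such as `crawl(found_links, fetch=my_download_function)` would return the downloaded payloads keyed by link. Separating the scheduler from the thread pool keeps the download order decision independent of how many worker threads are used.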
Abstract
System for crawling the web for multimedia files and indexing the files based on sound analysis and algorithmic translation.
10 Claims
6. A computer implemented method comprising:
obtaining multimedia files from a network by utilizing a multi-threaded downloader that downloads multiple multimedia files concurrently from links acquired from downloaded web pages;
processing, using a processor, the obtained multimedia files by translating speech in the multimedia files into a textual representation, wherein said processing further determines sound effects in said multimedia files by comparing said sound effects in said multimedia files against a predetermined set of sounds, wherein generated metadata is determined by the comparison, and wherein said metadata comprises keywords identifying a type of said sound effects;
extracting text information from the textual representation; and
indexing the multimedia files based on said keywords and said text information.
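The processing and indexing steps of the method claim can be sketched as follows. The claim does not say how sound effects are compared against the predetermined set, so this example assumes cosine similarity over small feature vectors with a fixed threshold. The reference vectors in `REFERENCE_SOUNDS`, the `threshold` value, and the function names are hypothetical, and speech translation is represented by an already available transcript string rather than a real speech recognizer.

```python
import math
from collections import defaultdict

# Hypothetical predetermined set of sounds: keyword -> feature vector.
REFERENCE_SOUNDS = {
    "applause": [0.9, 0.1, 0.0],
    "gunshot": [0.1, 0.9, 0.2],
    "siren": [0.0, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sound_keywords(segments, threshold=0.9):
    """Return keywords for segments that closely match a reference sound."""
    keywords = set()
    for segment in segments:
        name, score = max(
            ((name, cosine(segment, ref)) for name, ref in REFERENCE_SOUNDS.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            keywords.add(name)
    return keywords


def build_index(files):
    """Build an inverted index from {file_id: (transcript, segments)}.

    Each file is indexed under the words of its transcript and under
    the sound-effect keywords derived from its audio segments.
    """
    index = defaultdict(set)
    for file_id, (transcript, segments) in files.items():
        terms = set(transcript.lower().split()) | sound_keywords(segments)
        for term in terms:
            index[term].add(file_id)
    return index
```

With this sketch, looking up `index["siren"]` returns the identifiers of files whose audio matched the assumed siren reference, while looking up a transcript word returns files in which that word was spoken.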
Specification