SYSTEM AND METHOD FOR PHONETIC SEARCHING OF DATA
Abstract
A method of phonetically searching media information comprises receiving a plurality of search queries from one or more client systems and providing a phonetic representation of each search query. One or more search jobs are instantiated, each search job comprising a plurality of tasks, each task being arranged to sequentially read a block from an archive file. The archive file is stored within a distributed filing system (DFS) in which sequential blocks of data comprising the archive file are replicated to be locally available to one or more processors from a cluster of processors for executing the tasks. Each block stores index files corresponding to a plurality of source media files, each index file containing a phonetic stream corresponding to audio information for a given source media file. Each task obtains phonetic representations of outstanding search queries for a block and sequentially searches the block for each outstanding search query.
20 Claims
1. A method of indexing media information for phonetic searching comprising:
providing pointers to respective locations of source media files including audio information which is to be made searchable;
providing subsets of one or more pointers to respective ones of a plurality of parallel processing indexing jobs;
each indexing job, reading each media file from a pointer location of said subset of pointers, providing a phonetic stream corresponding to the audio information for said media file and generating an index file for said media file containing said phonetic stream; and
appending each index file to an archive file, said archive file stored within a distributed filing system (DFS) in which sequential blocks of data comprising said archive file are replicated to be locally available to one or more processors from a cluster of processors for sequential reading of said blocks, each block storing index files corresponding to a plurality of said source media files.
Dependent claims: 2-15.
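The indexing steps of claim 1 can be illustrated with a minimal single-machine sketch. It is not the patented implementation: the `MEDIA` dictionary stands in for source media files, `phonetic_stream` is a toy placeholder for a real phonetic recognizer, a thread pool stands in for the parallel indexing jobs, and the archive is modelled as an in-memory list of blocks rather than a file in a DFS. All names and the `files_per_block` parameter are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for source media: pointer -> "audio" (here, plain text).
MEDIA = {
    "media/a.wav": "hello world",
    "media/b.wav": "phonetic search",
    "media/c.wav": "parallel index",
    "media/d.wav": "archive block",
}

def phonetic_stream(audio):
    """Toy placeholder for a phonetic recognizer: one symbol per letter."""
    return [ch.upper() for ch in audio if ch.isalpha()]

def indexing_job(pointers):
    """One indexing job: read each media file in its subset, build an index file."""
    return [{"media": p, "phonemes": phonetic_stream(MEDIA[p])} for p in pointers]

def split_pointers(pointers, n_jobs):
    """Deal the pointers out into n_jobs subsets, one subset per job."""
    return [pointers[i::n_jobs] for i in range(n_jobs)]

def append_to_archive(archive, index_file, files_per_block):
    """Append an index file, opening a new block when the last one is full."""
    if not archive or len(archive[-1]) >= files_per_block:
        archive.append([])
    archive[-1].append(index_file)

def build_archive(pointers, n_jobs=2, files_per_block=2):
    """Run the indexing jobs in parallel and append their index files to the archive."""
    archive = []
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        for index_files in pool.map(indexing_job, split_pointers(pointers, n_jobs)):
            for index_file in index_files:
                append_to_archive(archive, index_file, files_per_block)
    return archive

archive = build_archive(sorted(MEDIA))
```

Each block ends up holding index files for several media files, which is the property the claim relies on so that a block can later be read sequentially by a processor that holds a local replica.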
16. A method of phonetically searching media information comprising:
receiving a plurality of search queries from one or more client systems;
providing a phonetic representation of each search query;
instantiating one or more search jobs, each search job comprising a plurality of tasks, each task being arranged to sequentially read a block from an archive file, said archive file stored within a distributed filing system (DFS) in which sequential blocks of data comprising said archive file are replicated to be locally available to one or more processors from a cluster of processors for executing said tasks, each block storing index files corresponding to a plurality of said source media files, each index file containing a phonetic stream corresponding to audio information for a given source media file;
each task, obtaining phonetic representations of outstanding search queries for a block and sequentially searching said block for each outstanding search query; and
responsive to matching a search query with a location within said phonetic stream for an index file, returning said location and an identifier of said source media for responding to said search query.
Dependent claims: 17-20.
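The search flow of claim 16 can be sketched in the same simplified way. This is an illustration under stated assumptions, not the patented method: `ARCHIVE` is a hypothetical in-memory archive of two blocks, `to_phonemes` is a toy phonetic conversion (one symbol per letter), matching is exact subsequence matching where a real phonetic search would score approximate matches, and a thread pool stands in for tasks running on cluster processors.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical archive: a list of blocks, each holding index files for several media files.
ARCHIVE = [
    [{"media": "media/a.wav", "phonemes": list("HELLOWORLD")},
     {"media": "media/b.wav", "phonemes": list("PHONETICSEARCH")}],
    [{"media": "media/c.wav", "phonemes": list("PARALLELINDEX")},
     {"media": "media/d.wav", "phonemes": list("ARCHIVEBLOCK")}],
]

def to_phonemes(query):
    """Toy phonetic representation of a query: one symbol per letter."""
    return [ch.upper() for ch in query if ch.isalpha()]

def find_all(stream, pattern):
    """Return every offset at which pattern occurs exactly in stream."""
    if not pattern:
        return []
    n = len(pattern)
    return [i for i in range(len(stream) - n + 1) if stream[i:i + n] == pattern]

def search_task(block, queries):
    """One task: read a block sequentially and search it for each outstanding query."""
    hits = []
    for index_file in block:
        for text, pattern in queries.items():
            for location in find_all(index_file["phonemes"], pattern):
                hits.append((text, index_file["media"], location))
    return hits

def search_job(archive, query_texts):
    """One search job: one task per block, all tasks sharing the same query batch."""
    queries = {q: to_phonemes(q) for q in query_texts}
    with ThreadPoolExecutor() as pool:
        per_block = pool.map(lambda block: search_task(block, queries), archive)
        return [hit for hits in per_block for hit in hits]

results = search_job(ARCHIVE, ["search", "lel", "block"])
```

Batching the outstanding queries per block means each block is read once per job regardless of how many queries are pending. Each hit carries the query, the source media identifier, and the location within the phonetic stream, mirroring the final step of the claim.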
Specification