System and method for phonetic searching of data

  • US 9,405,828 B2
  • Filed: 09/06/2012
  • Issued: 08/02/2016
  • Est. Priority Date: 09/06/2012
  • Status: Active Grant
First Claim

1. A multiprocessor-implemented method of indexing media information within a Hadoop framework for phonetic searching, the method comprising:

  • providing, within a Hadoop framework of processors, pointers to respective locations of source media files including audio information which is to be made searchable;

    wherein each pointer corresponds to a respective source media file;

    providing, within the Hadoop framework of processors, a respective set of one or more of the pointers to respective ones of a plurality of Hadoop Map Reduce Framework (MR) jobs, wherein each respective set comprises one or more subsets of the one or more of the pointers;

    wherein each MR job instantiates concurrently executing Map tasks, each Map task associated with one of the subsets of the one or more pointers and wherein each Map task:

    processes each of the corresponding source media files corresponding to the associated one of the subsets of the one or more pointers, and reads each of the corresponding source media files and generates a respective binary index file corresponding to a probabilistic phonetic stream of audio information for that corresponding source media file;

    appending, within the Hadoop framework of processors, each of the respective binary index files to a respective associated one of a plurality of different archive files;

    each respective archive file comprising a searchable phonetic representation of the audio information appended thereto; and

    appending, within the Hadoop framework of processors, the respective binary index file of the concurrently executing Map tasks to different ones of the plurality of different archive files in order for the concurrently executing Map tasks to run in parallel using separate processors, said plurality of different archive files stored within a Hadoop distributed filing system (DFS) in which sequential blocks of data comprising each respective archive file are replicated to be locally available to one or more processors from a cluster of processors for sequential reading of said sequential blocks, each block storing a plurality of the respective binary index files, wherein each respective binary index file is formatted to be compatible with search tasks running a phonetic speech search engine.
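The claim states the indexing flow only in claim language. The following is a minimal single-machine sketch in Python of that flow, under loudly stated assumptions: a thread pool stands in for the concurrently executing Map tasks, plain local files stand in for the archive files on the Hadoop DFS (block replication and data locality are not modeled), and `make_index` is a hypothetical placeholder that hashes the file bytes instead of producing a probabilistic phonetic stream. The function names, the round-robin split of pointers into job sets and task subsets, and the length-prefixed record layout are all illustrative assumptions, not the patentee's implementation or any Hadoop API.

```python
"""Minimal single-machine sketch of the claimed indexing flow.

Assumptions: threads stand in for concurrent Hadoop Map tasks, local files
stand in for archive files on the Hadoop DFS, and make_index() is a
placeholder, NOT a phonetic speech engine.
"""
import hashlib
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor


def partition(items, n):
    """Split items into n round-robin subsets (hypothetical assignment policy)."""
    return [items[i::n] for i in range(n)]


def make_index(media_path):
    """Hypothetical stand-in for the phonetic indexer.

    A real engine would decode the audio and emit a binary index of a
    probabilistic phonetic stream; this just hashes the file bytes.
    """
    with open(media_path, "rb") as f:
        return hashlib.sha256(f.read()).digest()


def map_task(subset, archive_path):
    """One Map task: index each file in its subset, append to its own archive."""
    with open(archive_path, "ab") as archive:
        for pointer in subset:
            blob = make_index(pointer)
            name = os.path.basename(pointer).encode("utf-8")
            # Length-prefixed (name, index) records keep the archive
            # readable front to back in one sequential pass.
            archive.write(struct.pack(">I", len(name)) + name)
            archive.write(struct.pack(">I", len(blob)) + blob)
    return archive_path


def run_job(job_pointers, n_tasks, out_dir, job_id):
    """One MR job: n_tasks concurrent Map tasks, each with a different archive."""
    subsets = partition(job_pointers, n_tasks)
    archives = [os.path.join(out_dir, f"job{job_id}-task{t}.idx")
                for t in range(n_tasks)]
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        return list(pool.map(map_task, subsets, archives))


def run_all(pointers, n_jobs, n_tasks, out_dir):
    """Give each MR job its own set of pointers (jobs run one after another here)."""
    archives = []
    for job_id, job_set in enumerate(partition(pointers, n_jobs)):
        archives.extend(run_job(job_set, n_tasks, out_dir, job_id))
    return archives


def read_archive(archive_path):
    """Sequentially read the (name, index) records back out of one archive."""
    records = []
    with open(archive_path, "rb") as f:
        while header := f.read(4):
            name = f.read(struct.unpack(">I", header)[0]).decode("utf-8")
            size = struct.unpack(">I", f.read(4))[0]
            records.append((name, f.read(size)))
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pointers = []
        for i in range(5):
            path = os.path.join(tmp, f"call{i}.wav")
            with open(path, "wb") as f:
                f.write(os.urandom(64))  # placeholder bytes, not real audio
            pointers.append(path)
        for archive in run_all(pointers, n_jobs=2, n_tasks=2, out_dir=tmp):
            print(os.path.basename(archive),
                  [name for name, _ in read_archive(archive)])
```

Running the module prints each archive file's name together with the media files whose index records it holds. Because every concurrent task appends only to its own archive, the writers need no locking between them, which mirrors the claim's statement that the Map tasks append to different archive files so that they can run in parallel on separate processors.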

  • 21 Assignments