Methods and apparatus relating to searching of spoken audio data
Abstract
Methods for processing audio data containing speech to produce a searchable index file and for subsequently searching such an index file are provided. The processing method uses a phonetic approach and models each frame of the audio data with a set of reference phones. A score for each of the reference phones, representing the difference of the audio from the phone model, is stored in the searchable data file for each of the phones in the reference set. A consequence of storing information regarding each of the reference phones is that the accuracy of searches carried out on the index file is not compromised by the rejection of information about particular phones. A subsequent search method is also provided which uses a simple and efficient dynamic programming search to locate instances of a search term in the audio. The methods of the present invention have particular application to the field of audio data mining.
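The indexing step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the phone set, the single scalar feature per frame, and the one-dimensional Gaussian phone models (`PHONE_MODELS`) are hypothetical stand-ins chosen to keep the example small. The score used here is a negative log-likelihood, so a smaller value means the frame lies closer to the phone model.

```python
import math

# Hypothetical toy phone models: each phone is a 1-D Gaussian (mean, variance)
# over a single acoustic feature. Real recognizers use richer features/models.
PHONE_MODELS = {
    "a": (1.0, 0.25),
    "b": (3.0, 0.25),
    "k": (5.0, 0.25),
}
PHONES = sorted(PHONE_MODELS)  # fixed column order of the index


def frame_scores(feature):
    """Score one frame against every reference phone, using no other frame.

    Each score is a negative log-likelihood: smaller means a closer match.
    """
    row = []
    for phone in PHONES:
        mean, var = PHONE_MODELS[phone]
        nll = 0.5 * math.log(2 * math.pi * var) + (feature - mean) ** 2 / (2 * var)
        row.append(nll)
    return row


def build_index(features):
    """Return a frames-by-phones matrix; no phone's score is discarded."""
    return [frame_scores(f) for f in features]


index = build_index([1.1, 0.9, 3.2, 5.0])
# Closest phone per frame, shown only to sanity-check the toy index.
best = [PHONES[row.index(min(row))] for row in index]
```

Keeping a full row per frame, rather than only the best-scoring phone, is what lets a later search weigh every phone without losing information at indexing time.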
32 Citations
16 Claims
1. A method of indexing and searching both live audio data and prerecorded audio data, said method comprising the steps of:
analyzing the audio data with a phonetic recognizer wherein the phonetic recognizer acts on frames of the audio data; and
determining, for each of said frames and independently of the others of said frames, a score for each of a set of all reference phones of a language or dialect based on one or more features of said set of reference phones, the independently-determined score indicating the likelihood that said frame corresponds to said phone, wherein for each of said frames, the independently-determined score indicates a probability of each phone from the set of reference phones appearing in the audio data;
generating index data for each of said frames corresponding to said independently-determined scores for each of said set of reference phones;
forming said index data into a data stream directed to a search engine, said engine uses a dynamic programming method to combine said independently-determined scores with phone sequence information derived from a user inputted query for searching the audio data, thereby enabling said audio data to be indexed and searched only once and in sequence as a one pass search; and
presenting search results from the audio data in response to the user query.
Dependent claims: 2, 3, 4, 5, 6
7. A method of indexing and searching both live audio data and prerecorded audio data, said method comprising the steps of:
analyzing the audio data with a phonetic recognizer wherein the phonetic recognizer acts on frames of the audio data and determining, for each of said frames and independently of the others of said frames, a score for each of a plurality of reference phones that comprise a complete set of phones of a particular language or dialect, based on one or more features corresponding to said plurality of reference phones, the independently-determined score indicating the likelihood that said frame corresponds to a specific phone of said reference phones and indicating a probability of each phone appearing in the audio data, wherein a searchable data file stores independently-determined scores for each said audio frame in a simple matrix format, and generating, for each of said audio frames, indexing data corresponding to the said independently-determined scores for a plurality of the phones;
forming said index data into a data stream directed to a search engine, said engine using a dynamic programming method to combine said independently-determined scores with phone sequence information derived from a user inputted query, searched only once and in sequence as a one pass search; and
presenting search results from the audio data in response to the user query.
Dependent claims: 8, 9
10. A method of searching both live audio data and prerecorded audio data for a phonetic search sequence, said method comprising the steps of:
(i) directing a data stream to a search engine, said data stream comprising index data for each of a plurality of audio frames and independently of the other frames, said index data corresponding to the likelihood of a match for a plurality of reference phones that comprise a complete set of phones of a particular language or dialect to a user inputted query for searching the audio data;
(ii) searching said data stream to find likely matches to a phonetic search sequence in response to the user inputted query, using a dynamic programming method wherein frame-independent scores for the reference phones contained in the data stream, based on one or more features of the reference phones, for each audio frame are used to determine the likely matches using the indexed data searched only once and in sequence as a one pass search, wherein for each audio frame, each frame independent score indicates a probability of each phone from the complete set of reference phones appearing in the input data and presenting search results from the audio data in response to the user query.
Dependent claims: 11, 12, 13, 14
15. An apparatus for acting on live audio data and prerecorded audio data to create a searchable data file comprising:
a complete reference set of phones of a particular language or dialect having one or more features corresponding thereto;
a phonetic recognizer, implemented by a processor, adapted to compare a frame of audio data with the reference set of phones based on said one or more features and to output a score indicative of the likelihood that said frame corresponds to each phone for each of said frames and independently of the other said frames, wherein each score indicates a probability of each phone appearing in the audio data;
a data output store for creating a searchable data file comprising, for each audio frame, the frame-independent score for each of the set of reference phones, said data output store directing the searchable data file to a search engine, said engine using a dynamic programming method to combine said frame-independent scores with phone sequence information derived from a user inputted query for searching the audio data, thereby enabling said audio data to be searched only once and in sequence as a one pass search; and
a display for presenting search results from the audio data in response to the user query.
16. An apparatus for acting on live audio data and prerecorded audio data to create a searchable data file comprising:
a reference set of phones having one or more features corresponding thereto;
a phonetic recogniser adapted to compare a frame of audio data with the reference set of phones based on said one or more features and to output a score indicative of the likelihood that said frame corresponds to each phone for each of said frames and independently of the other said frames, wherein each score indicates a probability of each phone appearing in the audio data; and
a data output store for creating a searchable data file comprising, for each audio frame, the frame-independent score for each of the set of reference phones, said data output store directing the searchable data file to a search engine, said engine using a dynamic programming method to combine said frame-independent scores with model connectivity information derived from the search term thereby enabling said audio data to be searched only once and in sequence as a one pass search.
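The one-pass dynamic programming search recited in the claims can be illustrated with a short sketch. It is a hedged example under assumptions rather than the claimed method itself: the function name `one_pass_search`, the cost convention (each frame's row holds a cost per phone, lower is better), the per-frame normalization by the row minimum, and the fixed `threshold` are all hypothetical choices made for the illustration. Each frame is read exactly once and only one column of dynamic programming state is kept, so the same loop works on a stored index or on a live stream.

```python
def one_pass_search(frame_stream, phones, query, threshold):
    """Yield (start_frame, end_frame, cost) for likely matches of `query`.

    frame_stream: iterable of per-frame cost rows, one cost per phone.
    phones: phone labels in the column order used by each row.
    query: phone sequence derived from the user's search term.
    """
    column = {phone: i for i, phone in enumerate(phones)}
    q_cols = [column[phone] for phone in query]
    inf = float("inf")
    cost = [inf] * len(q_cols)    # best path cost ending in each query phone
    start = [None] * len(q_cols)  # frame at which that best path began

    for t, row in enumerate(frame_stream):
        floor = min(row)  # assumption: costs are taken relative to best phone
        new_cost, new_start = [], []
        for j, c in enumerate(q_cols):
            best, origin = cost[j], start[j]  # stay in the same query phone
            if j == 0:
                entry, entry_start = 0.0, t   # a new match may begin here
            else:
                entry, entry_start = cost[j - 1], start[j - 1]  # advance
            if entry < best:  # ties keep the longer, earlier-starting path
                best, origin = entry, entry_start
            new_cost.append(best + (row[c] - floor))
            new_start.append(origin)
        cost, start = new_cost, new_start
        if cost[-1] <= threshold:
            yield (start[-1], t, cost[-1])


PHONES = ["a", "b", "k"]
stream = [
    [0, 5, 5],  # frame 0 looks like "a"
    [5, 0, 5],  # frame 1 looks like "b"
    [5, 5, 0],  # frame 2 looks like "k"
    [0, 5, 5],  # frame 3 looks like "a"
    [0, 5, 5],  # frame 4 looks like "a"
    [5, 0, 5],  # frame 5 looks like "b"
]
hits = list(one_pass_search(iter(stream), PHONES, ["a", "b"], threshold=1.0))
```

On this toy stream the query "a" followed by "b" is reported at frames 0 to 1 and again at frames 3 to 5, the second match spanning two "a" frames. A practical system would likely normalize costs by match length and merge overlapping hits; both are left out here for brevity.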
Specification