Methods and apparatus relating to searching of spoken audio data

US 8,694,317 B2
Filed: 02/06/2006
Issued: 04/08/2014
Est. Priority Date: 02/05/2005
Status: Active Grant

3 Assignments

0 Petitions

Abstract

Methods for processing audio data containing speech to produce a searchable index file and for subsequently searching such an index file are provided. The processing method uses a phonetic approach and models each frame of the audio data with a set of reference phones. A score for each of the reference phones, representing the difference of the audio from the phone model, is stored in the searchable data file for each of the phones in the reference set. A consequence of storing information regarding each of the reference phones is that the accuracy of searches carried out on the index file is not compromised by the rejection of information about particular phones. A subsequent search method is also provided which uses a simple and efficient dynamic programming search to locate instances of a search term in the audio. The methods of the present invention have particular application to the field of audio data mining.
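
As an illustration of the indexing idea in the abstract, the following is a minimal sketch in Python, not the patented implementation. It assumes each reference phone is modelled by a single mean feature vector and that the stored score is the squared Euclidean distance between a frame and that mean (lower means closer). The names `PHONE_MODELS`, `frame_scores` and `build_index` are hypothetical. The point it shows is that every frame keeps a score for every phone, so no phone is discarded at indexing time.

```python
from typing import Dict, List, Sequence

# Hypothetical phone models: one mean feature vector per reference phone.
PHONE_MODELS: Dict[str, List[float]] = {
    "a": [0.0, 0.0],
    "b": [1.0, 1.0],
}


def frame_scores(frame: Sequence[float],
                 models: Dict[str, List[float]]) -> Dict[str, float]:
    """Score one frame against every reference phone, independently of other frames.

    The score is the squared distance to the phone's mean vector (an assumption
    made for this sketch), so a lower value means a closer match.
    """
    return {
        phone: sum((x - m) ** 2 for x, m in zip(frame, mean))
        for phone, mean in models.items()
    }


def build_index(frames: Sequence[Sequence[float]],
                models: Dict[str, List[float]]) -> List[Dict[str, float]]:
    """Build the searchable index: one full set of phone scores per frame."""
    return [frame_scores(frame, models) for frame in frames]


index = build_index([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]], PHONE_MODELS)
```

A real recognizer would use far richer acoustic models; the sketch only fixes the shape of the index, which is a frame-by-phone table of scores.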

32 Citations

16 Claims

1. A method of indexing and searching both live audio data and prerecorded audio data, said method comprising the steps of:
- analyzing the audio data with a phonetic recognizer wherein the phonetic recognizer acts on frames of the audio data; and
- determining, for each of said frames and independently of the others of said frames, a score for each of a set of all reference phones of a language or dialect based on one or more features of said set of reference phones, the independently-determined score indicating the likelihood that said frame corresponds to said phone, wherein for each of said frames, the independently-determined score indicates a probability of each phone from the set of reference phones appearing in the audio data;
- generating index data for each of said frames corresponding to said independently-determined scores for each of said set of reference phones;
- forming said index data into a data stream directed to a search engine, said engine uses a dynamic programming method to combine said independently-determined scores with phone sequence information derived from a user inputted query for searching the audio data, thereby enabling said audio data to be indexed and searched only once and in sequence as a one pass search; and
- presenting search results from the audio data in response to the user query.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. A method according to claim 1 wherein the phonetic recognizer determines an interim score for each phone in a particular audio frame and modifies the interim score for each phone based on the scores for the phones determined for one or more of the audio frames immediately preceding and/or following said particular frame.
  - 3. A method according to claim 1 wherein the method includes the step of storing the data in a searchable data output store.
  - 4. A method according to claim 3 wherein the stored data contains information about the relative position of at least some of the audio frames in the audio data.
  - 5. A method according to claim 3 wherein the method includes the step of processing the audio to calculate features relevant to speaker identification and storing these features in the searchable data output store.
  - 6. A method according to claim 5 wherein the method includes the step of storing time references along with the features relevant to speaker identification in the searchable data file.
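
The one pass dynamic programming search recited in claim 1 can be sketched as follows. This is an illustrative Python example under stated assumptions, not the claimed implementation: the index is assumed to be a list with one entry per frame, each entry mapping every phone to a cost (lower is better), and the function name `search` and the toy data are hypothetical. The sketch reads each frame once, in order, and keeps only the previous frame's partial costs.

```python
import math
from typing import Dict, List, Sequence, Tuple


def search(index: Sequence[Dict[str, float]],
           query: Sequence[str]) -> Tuple[float, int, int]:
    """Return (cost, start_frame, end_frame) of the cheapest match of `query`.

    cost[j] is the cheapest alignment of query[0..j] ending at the current
    frame. A match may begin at any frame; each frame is visited exactly once.
    """
    n = len(query)
    prev_cost = [math.inf] * n
    prev_start = [-1] * n
    best = (math.inf, -1, -1)
    for t, scores in enumerate(index):
        cost = [math.inf] * n
        start = [-1] * n
        for j, phone in enumerate(query):
            local = scores[phone]
            # Option 1: stay in the same phone as on the previous frame.
            stay, stay_start = prev_cost[j], prev_start[j]
            # Option 2: enter this phone now, either by starting a new match
            # (first phone) or by advancing from the previous query phone.
            if j == 0:
                enter, enter_start = 0.0, t
            else:
                enter, enter_start = prev_cost[j - 1], prev_start[j - 1]
            if stay <= enter:
                cost[j], start[j] = stay + local, stay_start
            else:
                cost[j], start[j] = enter + local, enter_start
        if cost[-1] < best[0]:
            best = (cost[-1], start[-1], t)
        prev_cost, prev_start = cost, start
    return best


toy_index = [
    {"a": 5, "b": 5},
    {"a": 0, "b": 5},
    {"a": 0, "b": 4},
    {"a": 6, "b": 0},
    {"a": 5, "b": 5},
]
result = search(toy_index, ["a", "b"])
```

Because the sketch minimises a raw summed cost, it favours short matches; a practical system would likely normalise by match duration and report every hit under a threshold rather than only the single best one.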

7. A method of indexing and searching both live audio data and prerecorded audio data, said method comprising the steps of:
- analyzing the audio data with a phonetic recognizer wherein the phonetic recognizer acts on frames of the audio data and determining, for each of said frames and independently of the others of said frames, a score for each of a plurality of reference phones that comprise a complete set of phones of a particular language or dialect, based on one or more features corresponding to said plurality of reference phones, the independently-determined score indicating the likelihood that said frame corresponds to a specific phone of said reference phones and indicating a probability of each phone appearing in the audio data, wherein a searchable data file stores independently-determined scores for each said audio frame in a simple matrix format, and generating, for each of said audio frames, indexing data corresponding to the said independently-determined scores for a plurality of the phones;
- forming said index data into a data stream directed to a search engine, said engine using a dynamic programming method to combine said independently-determined scores with phone sequence information derived from a user inputted query, searched only once and in sequence as a one pass search; and
- presenting search results from the audio data in response to the user query.
- Dependent claims (8, 9):
  - 8. A method according to claim 7 wherein the set of reference phones is a preselected sub-set of a larger database of phones.
  - 9. A method according to claim 8 wherein the method comprises the step of using a language recogniser to identify the language/dialect being spoken in the audio data and select an appropriate sub-set of phones as the reference phone set.
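
One way to picture the "simple matrix format" of claim 7 together with the phone sub-set selection of claims 8 and 9 is sketched below in Python. Everything here is an assumption made for illustration: the `PHONE_DATABASE` contents, the CSV layout (a header row naming the phones, then one row of scores per frame), and the function names. The language label is taken as given; the language recogniser of claim 9 is not modelled.

```python
import csv
import io
from typing import Dict, List, Sequence, TextIO, Tuple

# Hypothetical larger phone database, keyed by language/dialect label.
PHONE_DATABASE: Dict[str, List[str]] = {
    "en-GB": ["aa", "b", "k"],
    "fr-FR": ["a", "b", "r"],
}


def select_reference_phones(language: str) -> List[str]:
    """Pick the reference phone sub-set for an already identified language."""
    return PHONE_DATABASE[language]


def write_matrix(frames: Sequence[Dict[str, float]],
                 phones: Sequence[str], out: TextIO) -> None:
    """Write one row per frame and one column per phone, in a fixed order."""
    writer = csv.writer(out)
    writer.writerow(phones)
    for scores in frames:
        writer.writerow([scores[p] for p in phones])


def read_matrix(src: TextIO) -> Tuple[List[str], List[List[float]]]:
    """Read the matrix back as (phone order, rows of per-frame scores)."""
    reader = csv.reader(src)
    phones = next(reader)
    rows = [[float(v) for v in row] for row in reader]
    return phones, rows


phones = select_reference_phones("en-GB")
buffer = io.StringIO()
write_matrix([{"aa": 0.5, "b": 2.0, "k": 3.0}], phones, buffer)
buffer.seek(0)
loaded = read_matrix(buffer)
```

Fixing the column order once in the header keeps each frame row a plain list of numbers, so a search engine can look up a phone's score by column position.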

10. A method of searching both live audio data and prerecorded audio data for a phonetic search sequence, said method comprising the steps of:
- (i) directing a data stream to a search engine, said data stream comprising index data for each of a plurality of audio frames and independently of the other frames, said index data corresponding to the likelihood of a match for a plurality of reference phones that comprise a complete set of phones of a particular language or dialect to a user inputted query for searching the audio data;
- (ii) searching said data stream to find likely matches to a phonetic search sequence in response to the user inputted query, using a dynamic programming method wherein frame-independent scores for the reference phones contained in the data stream, based on one or more features of the reference phones, for each audio frame are used to determine the likely matches using the indexed data searched only once and in sequence as a one pass search, wherein for each audio frame, each frame independent score indicates a probability of each phone from the complete set of reference phones appearing in the input data and presenting search results from the audio data in response to the user query.
- Dependent claims (11, 12, 13, 14):
  - 11. A method as claimed in claim 10 comprising the step of determining at least one phonetic search sequence from a defined search term.
  - 12. A method as claimed in claim 11 wherein the method comprises the step of converting a text search term into one or more phonetic search sequences using a processor.
  - 13. A method as claimed in claim 12 wherein the processor uses letter-to-sound trees and/or phonetic dictionaries to create the one or more search phonetic sequences.
  - 14. A method as claimed in claim 11 wherein the search term is supplied as audio data and wherein the method comprises the step of using a phonetic recogniser/speech recogniser to determine the phonetic search sequence.
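
Claims 12 and 13 describe turning a text search term into one or more phonetic search sequences. The Python sketch below illustrates that step with a tiny phonetic dictionary and a crude per-letter fallback. The dictionary entries, the phone symbols, and the `LETTER_TO_PHONE` table are all hypothetical, and the fallback is only a stand-in for the letter-to-sound trees named in claim 13, not an implementation of them.

```python
from itertools import product
from typing import Dict, List

# Hypothetical phonetic dictionary: a word may have several pronunciations.
PHONETIC_DICT: Dict[str, List[List[str]]] = {
    "data": [["d", "ey", "t", "ax"], ["d", "aa", "t", "ax"]],
    "mining": [["m", "ay", "n", "ih", "ng"]],
}

# Crude stand-in for letter-to-sound rules; unlisted letters map to themselves.
LETTER_TO_PHONE: Dict[str, str] = {"c": "k", "q": "k", "y": "ih"}


def pronunciations(word: str) -> List[List[str]]:
    """Look the word up in the dictionary, else fall back to per-letter rules."""
    word = word.lower()
    if word in PHONETIC_DICT:
        return PHONETIC_DICT[word]
    return [[LETTER_TO_PHONE.get(ch, ch) for ch in word if ch.isalpha()]]


def phonetic_sequences(term: str) -> List[List[str]]:
    """Expand a search term into every combination of word pronunciations."""
    per_word = [pronunciations(w) for w in term.split()]
    return [
        [phone for word in combo for phone in word]
        for combo in product(*per_word)
    ]


sequences = phonetic_sequences("data mining")
```

Each returned sequence could then be searched for separately, so a term with alternative pronunciations is found whichever way it was spoken.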

15. An apparatus for acting on live audio data and prerecorded audio data to create a searchable data file comprising:
- a complete reference set of phones of a particular language or dialect having one or more features corresponding thereto;
- a phonetic recognizer, implemented by a processor, adapted to compare a frame of audio data with the reference set of phones based on said one or more features and to output a score indicative of the likelihood that said frame corresponds to each phone for each of said frames and independently of the other said frames, wherein each score indicates a probability of each phone appearing in the audio data;
- a data output store for creating a searchable data file comprising, for each audio frame, the frame-independent score for each of the set of reference phones, said data output store directing the searchable data file to a search engine, said engine using a dynamic programming method to combine said frame-independent scores with phone sequence information derived from a user inputted query for searching the audio data, thereby enabling said audio data to be searched only once and in sequence as a one pass search; and
- a display for presenting search results from the audio data in response to the user query.

16. An apparatus for acting on live audio data and prerecorded audio data to create a searchable data file comprising:
- a reference set of phones having one or more features corresponding thereto;
- a phonetic recogniser adapted to compare a frame of audio data with the reference set of phones based on said one or more features and to output a score indicative of the likelihood that said frame corresponds to each phone for each of said frames and independently of the other said frames, wherein each score indicates a probability of each phone appearing in the audio data; and
- a data output store for creating a searchable data file comprising, for each audio frame, the frame-independent score for each of the set of reference phones, said data output store directing the searchable data file to a search engine, said engine using a dynamic programming method to combine said frame-independent scores with model connectivity information derived from the search term thereby enabling said audio data to be searched only once and in sequence as a one pass search.

Current Assignee: Aurix Ltd. (Avaya Incorporated), Avaya LLC (Avaya Incorporated), Avaya Management L.P. (Avaya Incorporated)
Original Assignee: Aurix Ltd. (Avaya Incorporated)
Inventors: Skilling, Adrian I; Wright, Howard A K
Primary Examiner(s): Shah, Paras D
Application Number: US11/347,313
Publication Number: US 20060206324A1
Time in Patent Office: 2,983 Days
Field of Search: 704/243, 704231-2568
US Class Current: 704/254
CPC Class Codes:
- G06F 16/68   Retrieval characterised by ...
- G06F 16/685   using automatically derived...
- G10L 15/26   Speech to text systems G10L...
- G10L 2015/025   Phonemes, fenemes or fenone...
