AUDIO CLASSIFICATION FOR INFORMATION RETRIEVAL USING SPARSE FEATURES

US 20100257129A1
Filed: 03/11/2010
Published: 10/07/2010
Est. Priority Date: 03/11/2009
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, are provided for using audio features to classify audio for information retrieval. In general, one aspect of the subject matter described in this specification can be embodied in methods that include the actions of generating a collection of auditory images, each auditory image being generated from respective audio files according to an auditory model; extracting sparse features from each auditory image in the collection to generate a sparse feature vector representing the corresponding audio file; and ranking the audio files in response to a query including one or more words using the sparse feature vectors and a matching function relating sparse feature vectors to words in the query.

33 Claims

1. A computer-implemented method comprising:
- generating a collection of auditory images, each auditory image being generated from respective audio files according to an auditory model;
  
  extracting sparse features from each auditory image in the collection to generate a sparse feature vector representing the corresponding audio file; and
  
  ranking the audio files in response to a query including one or more words using the sparse feature vectors and a matching function relating sparse feature vectors to words in the query.
- Dependent claims (2 to 10):
  - 2. The method of claim 1, where extracting sparse features from each auditory image comprises:
    - dividing an auditory image into multiple sub-images;
      
      applying a feature extractor to each sub-image to generate corresponding local sparse codes; and
      
      combining the sparse codes from each sub-image to form a sparse vector for the auditory image.
  - 3. The method of claim 1, where the matching function is generated using a training collection of annotated audio files, and where generating the matching function includes:
    - receiving the collection of annotated audio files, each annotated audio file having an auditory image and one or more keywords associated with the content of the audio file;
      
      generating a sparse feature vector for each audio file in the collection; and
      
      training the matching function using the sparse feature vectors and the one or more keywords for the collection of annotated audio files to determine a matrix of weights matching sparse features and keywords.
  - 4. The method of claim 1, further comprising:
    - training the matching function using a passive-aggressive model using extracted audio features.
  - 5. The method of claim 4, where the training learns a matrix W representing a mapping between sparse features and keywords such that F_W(q_k, a_k⁺) > F_W(q_k, a_k⁻) for all k.
  - 6. The method of claim 1, where ranking the audio files further comprises:
    - scoring each query word relative to each sparse feature vector and combining the scores across words to rank audio files relative to the query.
  - 7. The method of claim 6, where scoring each query word includes calculating a dot product between a set of weights for that word and a representation of the audio file with a particular sparse feature vector.
  - 8. The method of claim 1, where the auditory model is a cochlear model that mimics the behavior of a cochlea.
  - 9. The method of claim 1, where the auditory image is a stabilized auditory image.
  - 10. The method of claim 1, where the auditory image is an auditory correlogram.
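
Claims 4 and 5 name a passive-aggressive model and the ranking constraint F_W(q_k, a_k⁺) > F_W(q_k, a_k⁻), but the claims do not spell out an update rule. The following is a minimal Python sketch of one common passive-aggressive ranking update, assuming a bilinear score F_W(q, a) = qᵀWa, a unit margin, and an aggressiveness cap `C`. The dict-based sparse vectors and the names `sparse_dot`, `score`, and `pa_update` are illustrative assumptions, not taken from the patent.

```python
from collections import defaultdict


def sparse_dot(u, v):
    """Dot product of two sparse vectors stored as {index: value} dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(val * v.get(idx, 0.0) for idx, val in u.items())


def score(W, query, audio):
    """F_W(q, a): weighted sum over query words of (row of W) . (audio vector)."""
    return sum(q_w * sparse_dot(W[word], audio) for word, q_w in query.items())


def pa_update(W, query, a_pos, a_neg, C=0.1):
    """One passive-aggressive step on a (query, relevant, irrelevant) triplet.

    W is a defaultdict(dict) mapping word -> {feature index: weight}.
    Returns the hinge loss measured before the update.
    """
    loss = max(0.0, 1.0 - score(W, query, a_pos) + score(W, query, a_neg))
    if loss == 0.0:
        return 0.0  # passive: the margin constraint already holds

    # Sparse difference a_pos - a_neg.
    diff = dict(a_pos)
    for idx, val in a_neg.items():
        diff[idx] = diff.get(idx, 0.0) - val

    # Squared Frobenius norm of the outer product q (a_pos - a_neg)^T.
    norm_sq = sum(q * q for q in query.values()) * sum(d * d for d in diff.values())
    if norm_sq == 0.0:
        return loss  # identical examples: no direction to move in

    tau = min(C, loss / norm_sq)  # aggressive: step size capped by C
    for word, q_w in query.items():
        row = W[word]
        for idx, d in diff.items():
            row[idx] = row.get(idx, 0.0) + tau * q_w * d
    return loss
```

Under these assumptions, training would loop over sampled triplets (a query, an audio file annotated with its words, and one that is not) and call `pa_update` on each. Only the rows of W for words in the query and the columns touched by the two sparse vectors change, which is what keeps each step cheap when the feature vectors are sparse.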

11. A computer-implemented method comprising:
- receiving a text query, the query including one or more query terms;
  
  retrieving a matching function that relates keywords and sparse feature vectors, each sparse feature vector being derived from a particular audio file;
  
  identifying one or more keywords from the query terms;
  
  identifying one or more audio files responsive to the query using the matching function; and
  
  presenting search results identifying the one or more audio files.
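
The query-side steps of claim 11 can be sketched in Python as follows. This is a hedged illustration only: it assumes the matching function is a word-to-weights table `W`, that keywords are the whitespace-separated query terms found in that table, and that per-word dot-product scores are summed. The names `keywords_from_query`, `search`, and `index` are hypothetical.

```python
def sparse_dot(weights, features):
    """Dot product of two sparse vectors stored as {index: value} dicts."""
    return sum(w * features.get(i, 0.0) for i, w in weights.items())


def keywords_from_query(text, vocabulary):
    """Keep the query terms that the matching function knows about."""
    return [term for term in text.lower().split() if term in vocabulary]


def search(text, W, index, top_k=10):
    """Rank audio files for a text query.

    W:     {word: {feature index: weight}}, the trained matching function.
    index: {file id: {feature index: value}}, one sparse vector per audio file.
    Returns up to top_k (file id, score) pairs, best first.
    """
    keywords = keywords_from_query(text, W.keys())
    if not keywords:
        return []  # nothing in the query maps to a known keyword

    scored = []
    for file_id, features in index.items():
        total = sum(sparse_dot(W[word], features) for word in keywords)
        scored.append((file_id, total))

    # Highest score first; ties broken by file id for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]
```

For example, `search("dog bark", W, index)` would return the indexed files ordered by their summed scores for "dog" and "bark", which a front end could then present as search results.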

12. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by data processing apparatus cause the data processing apparatus to perform operations comprising:
- generating a collection of auditory images, each auditory image being generated from respective audio files according to an auditory model;
  
  extracting sparse features from each auditory image in the collection to generate a sparse feature vector representing the corresponding audio file; and
  
  ranking the audio files in response to a query including one or more words using the sparse feature vectors and a matching function relating sparse feature vectors to words in the query.
- Dependent claims (13 to 21):
  - 13. The computer storage medium of claim 12, where extracting sparse features from each auditory image comprises:
    - dividing an auditory image into multiple sub-images;
      
      applying a feature extractor to each sub-image to generate corresponding local sparse codes; and
      
      combining the sparse codes from each sub-image to form a sparse vector for the auditory image.
  - 14. The computer storage medium of claim 12, where the matching function is generated using a training collection of annotated audio files, and where generating the matching function includes:
    - receiving the collection of annotated audio files, each annotated audio file having an auditory image and one or more keywords associated with the content of the audio file;
      
      generating a sparse feature vector for each audio file in the collection; and
      
      training the matching function using the sparse feature vectors and the one or more keywords for the collection of annotated audio files to determine a matrix of weights matching sparse features and keywords.
  - 15. The computer storage medium of claim 12, further comprising instructions that when executed by data processing apparatus cause the data processing apparatus to perform operations including:
    - training the matching function using a passive-aggressive model using extracted audio features.
  - 16. The computer storage medium of claim 15, where the training learns a matrix W representing a mapping between sparse features and keywords such that F_W(q_k, a_k⁺) > F_W(q_k, a_k⁻) for all k.
  - 17. The computer storage medium of claim 12, where ranking the audio files further comprises:
    - scoring each query word relative to each sparse feature vector and combining the scores across words to rank audio files relative to the query.
  - 18. The computer storage medium of claim 17, where scoring each query word includes calculating a dot product between a set of weights for that word and a representation of the audio file with a particular sparse feature vector.
  - 19. The computer storage medium of claim 12, where the auditory model is a cochlear model that mimics the behavior of a cochlea.
  - 20. The computer storage medium of claim 12, where the auditory image is a stabilized auditory image.
  - 21. The computer storage medium of claim 12, where the auditory image is an auditory correlogram.
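
The sub-image step recited in claims 2, 13, and 24 (divide, extract local sparse codes, combine) can be illustrated with a short Python sketch. The claims do not fix the box layout or the feature extractor, so this example assumes a non-overlapping grid of equal boxes and nearest-codeword vector quantization that yields a one-hot code per box. The function names and the `codebooks` argument are hypothetical.

```python
def split_into_boxes(image, box_h, box_w):
    """Yield flattened, non-overlapping box_h x box_w sub-images, row-major."""
    rows, cols = len(image), len(image[0])
    for top in range(0, rows - box_h + 1, box_h):
        for left in range(0, cols - box_w + 1, box_w):
            yield [image[r][c]
                   for r in range(top, top + box_h)
                   for c in range(left, left + box_w)]


def nearest_codeword(patch, codebook):
    """Index of the codeword closest to patch in squared Euclidean distance."""
    def dist(codeword):
        return sum((p - c) ** 2 for p, c in zip(patch, codeword))
    return min(range(len(codebook)), key=lambda k: dist(codebook[k]))


def sparse_vector(image, codebooks, box_h, box_w):
    """Concatenate the one-hot code of every box into one sparse vector.

    image:     2D list of numbers (the auditory image).
    codebooks: one list of codewords per box, in the order the boxes are yielded.
    Returns {index: 1.0} with exactly one active entry per box.
    """
    features = {}
    offset = 0
    boxes = split_into_boxes(image, box_h, box_w)
    for patch, codebook in zip(boxes, codebooks):
        features[offset + nearest_codeword(patch, codebook)] = 1.0
        offset += len(codebook)  # each box owns its own index range
    return features
```

With B boxes and codebooks of K codewords each, the combined vector has B x K dimensions but only B nonzero entries, which is the sense in which it is sparse. A per-file vector could then be formed by accumulating these codes over the auditory images of successive frames; that aggregation is left out here.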

22. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by data processing apparatus cause the data processing apparatus to perform operations comprising:
- receiving a text query, the query including one or more query terms;
  
  retrieving a matching function that relates keywords and sparse feature vectors, each sparse feature vector being derived from a particular audio file;
  
  identifying one or more keywords from the query terms;
  
  identifying one or more audio files responsive to the query using the matching function; and
  
  presenting search results identifying the one or more audio files.

23. A system comprising:
- one or more computers configured to perform operations including:
  
  generating a collection of auditory images, each auditory image being generated from respective audio files according to an auditory model;
  
  extracting sparse features from each auditory image in the collection to generate a sparse feature vector representing the corresponding audio file; and
  
  ranking the audio files in response to a query including one or more words using the sparse feature vectors and a matching function relating sparse feature vectors to words in the query.
- Dependent claims (24 to 32):
  - 24. The system of claim 23, where extracting sparse features from each auditory image comprises:
    - dividing an auditory image into multiple sub-images;
      
      applying a feature extractor to each sub-image to generate corresponding local sparse codes; and
      
      combining the sparse codes from each sub-image to form a sparse vector for the auditory image.
  - 25. The system of claim 23, where the matching function is generated using a training collection of annotated audio files, and where generating the matching function includes:
    - receiving the collection of annotated audio files, each annotated audio file having an auditory image and one or more keywords associated with the content of the audio file;
      
      generating a sparse feature vector for each audio file in the collection; and
      
      training the matching function using the sparse feature vectors and the one or more keywords for the collection of annotated audio files to determine a matrix of weights matching sparse features and keywords.
  - 26. The system of claim 23, further configured to perform operations comprising:
    - training the matching function using a passive-aggressive model using extracted audio features.
  - 27. The system of claim 26, where the training learns a matrix W representing a mapping between sparse features and keywords such that F_W(q_k, a_k⁺) > F_W(q_k, a_k⁻) for all k.
  - 28. The system of claim 23, where ranking the audio files further comprises:
    - scoring each query word relative to each sparse feature vector and combining the scores across words to rank audio files relative to the query.
  - 29. The system of claim 28, where scoring each query word includes calculating a dot product between a set of weights for that word and a representation of the audio file with a particular sparse feature vector.
  - 30. The system of claim 23, where the auditory model is a cochlear model that mimics the behavior of a cochlea.
  - 31. The system of claim 23, where the auditory image is a stabilized auditory image.
  - 32. The system of claim 23, where the auditory image is an auditory correlogram.
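
The scoring language in claims 6 and 7 (and their counterparts 17, 18, 28, and 29) can be written compactly. The claims only say the per-word scores are "combined", so the weighted sum below is an assumption, as are the symbols q_t (weight of word t in the query) and W_t (the set of weights for word t, read as one row of the matrix W):

```latex
F_W(q, a) \;=\; q^{\top} W a \;=\; \sum_{t \in q} q_t \,\bigl(W_t \cdot a\bigr),
\qquad
F_W(q_k, a_k^{+}) \;>\; F_W(q_k, a_k^{-}) \quad \text{for all } k .
```

Here a is the sparse feature vector of an audio file, and a_k⁺ and a_k⁻ are files that are, respectively, relevant and not relevant to query q_k. Ranking by F_W(q, a) then orders files so that, when the constraint holds, relevant files score above irrelevant ones for every training query.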

33. A system comprising:
- one or more computers configured to perform operations including:
  
  receiving a text query, the query including one or more query terms;
  
  retrieving a matching function that relates keywords and sparse feature vectors, each sparse feature vector being derived from a particular audio file;
  
  identifying one or more keywords from the query terms;
  
  identifying one or more audio files responsive to the query using the matching function; and
  
  presenting search results identifying the one or more audio files.

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Chechik, Gal; Walters, Thomas; Rehn, Martin; Lyon, Robert F.; Bengio, Samy

Granted Patent

US 8,463,719 B2
US Class Current

706/12
CPC Class Codes

G06F 16/683 using metadata automaticall...

G10L 25/48 specially adapted for parti...
