Multimedia data management by speech recognizer annotation
First Claim
1. A method for multimedia data management, comprising the steps of:
generating a recognition result for a speech-annotation feature through an automatic speech recognizer;
obtaining n-best syllable candidates of said speech-annotation feature from said recognition result, said n being a natural number;
establishing a confusion matrix for said n-best syllable candidates, said confusion matrix comprising a matrix of similarity scores, each similarity score measuring similarity of each syllable candidate against one of said n-best syllable candidates;
transforming said confusion matrix into an image with each pixel of said image having a gray scale representing the similarity measured by a corresponding similarity score;
constructing an index database by capturing eigen-images from said image using an eigen-image processing method; and
searching for said multimedia data using said index database with natural speech as input.
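The confusion-matrix and gray-scale image steps of claim 1 can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the patented implementation. The claim does not say how similarity scores are computed, so the sketch assumes romanized syllable strings and a hypothetical normalized edit-distance similarity in [0, 1]. It also assumes an n x n layout that compares the n-best candidates with one another. The function names `edit_distance`, `similarity`, `confusion_matrix` and `to_gray_image` are illustrative only.

```python
import numpy as np


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two syllable strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute ca with cb
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical similarity score: 1.0 for identical syllables, lower as they differ."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - edit_distance(a, b) / longest


def confusion_matrix(candidates: list[str]) -> np.ndarray:
    """Assumed n x n layout: entry (i, j) scores candidate i against candidate j."""
    n = len(candidates)
    matrix = np.empty((n, n))
    for i, a in enumerate(candidates):
        for j, b in enumerate(candidates):
            matrix[i, j] = similarity(a, b)
    return matrix


def to_gray_image(matrix: np.ndarray) -> np.ndarray:
    """Map each similarity score in [0, 1] to an 8-bit gray level (255 = most similar)."""
    return np.rint(np.clip(matrix, 0.0, 1.0) * 255).astype(np.uint8)


# Example: three hypothetical n-best candidates for one spoken syllable.
image = to_gray_image(confusion_matrix(["ma", "ba", "man"]))
print(image)
```

Identical candidates land on the bright diagonal, and acoustically confusable candidates produce mid-gray pixels. That gray-level texture is what the later eigen-image step would operate on.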
Abstract
A method and an apparatus for multimedia data management are disclosed. The method provides an indexing and retrieval scheme for digital photos with speech annotations, based on image-like patterns transformed from the recognized syllable candidates. For annotated spoken content, the recognized n-best syllable candidates are transformed into a sequence of syllable-transformed patterns. Eigen-image analysis is then applied to extract the most significant information and reduce the dimensionality. Vector quantization is applied to quantize the syllable-transformed patterns into feature vectors for indexing. The proposed indexing scheme reduces both the dimensionality and the noise of the data, and improves speech-annotated photo retrieval performance by 16.26%.
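The eigen-image analysis and vector quantization described in the abstract might look roughly like the following sketch. It assumes two things that neither the abstract nor the claims confirm. First, that "eigen-image processing" means principal component analysis over flattened gray-scale images, computed here with NumPy's SVD. Second, that the codebook is trained with plain k-means. The function names, the image size and the codebook size are all hypothetical.

```python
import numpy as np


def eigen_images(images: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """PCA over flattened images: returns the mean image and the top-k eigen-images."""
    x = images.reshape(len(images), -1).astype(float)
    mean = x.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:k]


def project(images: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Reduce each image to k coefficients on the eigen-image basis."""
    x = images.reshape(len(images), -1).astype(float)
    return (x - mean) @ basis.T


def quantize(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Index of the nearest codeword (squared Euclidean distance) for each feature vector."""
    dist = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def train_codebook(features: np.ndarray, size: int, iters: int = 20) -> np.ndarray:
    """Plain k-means, initialised with the first `size` vectors for reproducibility."""
    codebook = features[:size].copy()
    for _ in range(iters):
        labels = quantize(features, codebook)
        for c in range(size):
            members = features[labels == c]
            if len(members):  # keep the old codeword if its cluster is empty
                codebook[c] = members.mean(axis=0)
    return codebook


# Example with random stand-ins for 12 syllable-transformed 4x4 images.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(12, 4, 4))
mean, basis = eigen_images(images, k=3)
features = project(images, mean, basis)
codebook = train_codebook(features, size=4)
codes = quantize(features, codebook)
print(codes)
```

In this sketch each 16-pixel pattern is reduced to 3 coefficients and then to a single codeword index, so an annotation could be indexed as a short integer sequence. The abstract does not state the number of eigen-images or the codebook size.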
9 Claims
7. An apparatus for multimedia data management, comprising:
a multimedia data index production module having:
an automatic speech recognizer for generating a recognition result from a speech-annotation feature, said recognition result including n-best syllable candidates of said speech-annotation feature, said n being a natural number;
an image simulation module for establishing a confusion matrix for said n-best syllable candidates and transforming said confusion matrix into an image, said confusion matrix comprising a matrix of similarity scores, each similarity score measuring similarity of each syllable candidate against one of said n-best syllable candidates, and each pixel of said image having a gray scale representing the similarity measured by a corresponding similarity score; and
an eigen-image capture and index construction module for constructing an index database by capturing eigen-images from said image using an eigen-image processing method; and
a multimedia data index searching module for searching for said multimedia data using said index database with natural speech as input.
Specification