Concept based cross media indexing and retrieval of speech documents
Abstract
Indexing, searching, and retrieving the content of speech documents (including but not limited to recorded books, audio broadcasts, and recorded conversations) is accomplished by finding and retrieving speech documents that are related to a query term at a conceptual level, even if the speech documents do not contain the spoken (or textual) query terms. Concept-based cross-media information retrieval is used. A term-phoneme/document matrix is constructed from a training set of documents. Documents are then added to the matrix constructed from the training data. Singular Value Decomposition is used to compute a vector space from the term-phoneme/document matrix. The result is a lower-dimensional numerical space where term-phoneme and document vectors are related conceptually as nearest neighbors. A query engine computes a cosine value between the query vector and all other vectors in the space and returns a list of those term-phonemes and/or documents with the highest cosine values.
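The pipeline described in the abstract can be illustrated with a minimal sketch, assuming NumPy is available. The row labels, document names, and counts below are invented for illustration, and the helper names (`concept_space`, `cosine`, `rank_documents`) are hypothetical, not taken from the patent. The sketch truncates the Singular Value Decomposition to `k` dimensions and ranks documents by cosine against the vector of a query row.

```python
import numpy as np

# Toy term-phoneme/document matrix (invented data).
# Rows are terms or phonemes, columns are documents.
labels = ["ship", "boat", "/sh/", "tree", "/t/"]
docs = ["d0", "d1", "d2", "d3", "d4"]
A = np.array([
    [1, 0, 1, 0, 0],   # "ship"
    [0, 1, 1, 0, 0],   # "boat"
    [1, 1, 0, 0, 0],   # "/sh/"
    [0, 0, 0, 2, 1],   # "tree"
    [0, 0, 0, 1, 1],   # "/t/"
], dtype=float)

def concept_space(matrix, k):
    """Return k-dimensional row vectors and document vectors via truncated SVD."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    row_vecs = U[:, :k] * s[:k]        # one vector per term or phoneme
    doc_vecs = Vt[:k, :].T * s[:k]     # one vector per document
    return row_vecs, doc_vecs

def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is zero)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def rank_documents(query_label, k=2):
    """Rank all documents by cosine against the vector of one query row."""
    row_vecs, doc_vecs = concept_space(A, k)
    q = row_vecs[labels.index(query_label)]
    scores = [(docs[j], cosine(q, doc_vecs[j])) for j in range(len(docs))]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

ranking = rank_documents("ship")
```

In this toy data, document `d1` does not contain the row "ship", yet it scores as highly as `d0` and `d2` because it shares other rows with them, which is the conceptual matching the abstract describes.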
13 Claims
1. A method of cross media indexing, registering and retrieving speech documents comprising the steps of:
registering a set of training documents;
pre-processing each training document;
constructing a terms-phonemes/document matrix from the training document metadata where a row is created for each term and each phoneme in the training documents and a column is created for each training document;
normalizing entries in the terms-phonemes/document matrix;
computing a concept vector space from the training documents by computing from the terms-phonemes/document matrix;
computing vectors for new documents and adding the vectors to the vector space;
searching the computed vector space for vectors that are close to a vector computed for a query term or phoneme; and
providing a list of those speech and/or text documents with the highest values.
(Dependent claims: 2, 3, 4, 5, 6, 7)
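The matrix-construction and normalization steps of claim 1 can be sketched as follows, using only the Python standard library. The metadata format (a dict with `terms` and `phonemes` lists), the function names, and the sample data are hypothetical. The claim does not say which normalization is used; scaling each document column to unit Euclidean length is an assumption made here purely for illustration.

```python
import math
from collections import Counter

def build_matrix(documents):
    """Build a terms-phonemes/document matrix.

    Returns (row_labels, matrix): one row per distinct term or phoneme,
    one column per document, entries are raw occurrence counts.
    """
    counts = [
        Counter(("term", t) for t in d["terms"])
        + Counter(("phoneme", p) for p in d["phonemes"])
        for d in documents
    ]
    row_labels = sorted(set().union(*counts))
    matrix = [[float(c[label]) for c in counts] for label in row_labels]
    return row_labels, matrix

def normalize_columns(matrix):
    """Scale each document column to unit length (assumed scheme, see lead-in)."""
    n_docs = len(matrix[0]) if matrix else 0
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_docs)]
    return [
        [row[j] / norms[j] if norms[j] else 0.0 for j in range(n_docs)]
        for row in matrix
    ]

# Invented training metadata: each document lists its terms and phonemes.
training = [
    {"terms": ["ship", "ocean"], "phonemes": ["SH", "IH", "P"]},
    {"terms": ["boat", "ocean", "boat"], "phonemes": ["B", "OW", "T"]},
]

row_labels, matrix = build_matrix(training)
normalized = normalize_columns(matrix)
```

Tagging each row label as either a term or a phoneme keeps a term and a phoneme with the same spelling in separate rows.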
8. A system for cross media indexing, registering and retrieving speech documents comprising the steps of:
document collection means for registering a set of training documents, preparing the set of training documents for cataloging; and
indexing the set of training documents, including document terms and phonemes;
pre-processor for pre-processing each training document and computing vectors forming a concept-vector space from the training documents by computing vectors from the set of training documents;
terms-phonemes/document matrix constructed from the training document metadata where a row is created for each term and each phoneme in the training documents and a column is created for each training document, and entries are normalized in the terms-phonemes/document matrix;
singular value decomposition means for computing a vector space from the terms-phonemes/document matrix;
said pre-processor also pre-processing each new document and computing vectors from the new documents and adding the vectors to the vector space; and
query engine for searching the computed vector space for vectors that are close to a vector computed for one or more query terms or phonemes; and
providing a list of those textual and/or speech documents with the highest values.
(Dependent claims: 9, 10, 11, 12, 13)
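The claim does not specify how vectors for new documents are computed and added to the existing space. One common approach in latent semantic indexing is "folding in", which projects a new document's column onto the singular vectors already computed from the training set. The sketch below assumes NumPy and uses that technique as an illustration only; the matrix values and the `fold_in` helper are hypothetical.

```python
import numpy as np

# Invented training matrix: rows are terms or phonemes, columns are documents.
A = np.array([
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 2],
    [0, 0, 1, 1],
], dtype=float)

k = 2  # number of concept dimensions kept (assumed)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k = U[:, :k], s[:k]
doc_vecs = Vt[:k, :].T   # one k-dimensional vector per training document

def fold_in(column):
    """Project a new document column into the existing k-dimensional space.

    Computes column^T @ U_k @ inv(diag(s_k)); applying this to a training
    column reproduces that document's existing vector.
    """
    return (column @ U_k) / s_k

# A new document expressed over the same rows as the training matrix.
new_doc = np.array([1, 1, 0, 0], dtype=float)
new_vec = fold_in(new_doc)

# Add the new vector to the space without recomputing the decomposition.
doc_vecs = np.vstack([doc_vecs, new_vec])
```

Folding in avoids recomputing the decomposition for every new document, at the cost of leaving the concept dimensions fixed to those learned from the training set.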
Specification