Concept based cross media indexing and retrieval of speech documents
Abstract
Indexing, searching, and retrieving the content of speech documents (including but not limited to recorded books, audio broadcasts, and recorded conversations) is accomplished by finding and retrieving speech documents that are related to a query term at a conceptual level, even if the speech documents do not contain the spoken (or textual) query terms. Concept-based cross-media information retrieval is used. A term-phoneme/document matrix is constructed from a training set of documents. Documents are then added to the matrix constructed from the training data. Singular Value Decomposition is used to compute a vector space from the term-phoneme/document matrix. The result is a lower-dimensional numerical space where term-phoneme and document vectors are related conceptually as nearest neighbors. A query engine computes a cosine value between the query vector and all other vectors in the space and returns a list of those term-phonemes and/or documents with the highest cosine values.
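The pipeline summarized in the abstract (matrix, Singular Value Decomposition, cosine ranking) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy counts, the `ph:` prefix used to label phoneme rows, the choice of k = 2 dimensions, and the function names `build_space`, `cosine`, and `query`. The query is projected with the standard latent-semantic formula q^T U_k S_k^{-1}; the patent text shown here does not state which projection it uses.

```python
import numpy as np

# Hypothetical toy matrix: rows are terms and phoneme strings, columns are documents.
# The "ph:" prefix is an illustrative labeling convention, not taken from the patent.
row_labels = ["storm", "rain", "ph:K L AW D", "market", "ph:S T AA K"]
doc_labels = ["d0", "d1", "d2", "d3"]
counts = np.array([
    [1, 0, 0, 0],   # "storm" occurs only in d0
    [1, 1, 0, 0],   # "rain" links d0 and d1
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

def build_space(matrix, k):
    """Truncated SVD: return row vectors U_k, singular values s_k, document vectors V_k."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T

def cosine(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def query(rows, U_k, s_k, doc_vecs, top_n=2):
    """Project a bag of query rows into the space and rank documents by cosine."""
    q = np.zeros(U_k.shape[0])
    q[rows] = 1.0
    q_vec = (q @ U_k) / s_k  # q^T U_k S_k^{-1}
    scored = [(cosine(q_vec, d), label) for d, label in zip(doc_vecs, doc_labels)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_n]

U_k, s_k, doc_vecs = build_space(counts, k=2)
results = query([row_labels.index("storm")], U_k, s_k, doc_vecs)
for score, label in results:
    print(f"{label}: {score:.3f}")
```

In this toy data, d1 never contains "storm" but shares "rain" with d0, so it ranks alongside d0 and above the two finance documents. This is the kind of conceptual match the abstract describes.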
32 Claims
1. A method of cross media indexing, registering and retrieving speech documents, the method comprising:
a computing device pre-processing a set of training documents, including at least creating training document metadata;
the computing device constructing a terms-phonemes/document matrix from the training document metadata where rows are created for the terms and phonemes contained in the set of training documents and columns are created for each training document;
the computing device normalizing entries in the terms-phonemes/document matrix;
the computing device computing a vector space from the training documents by computing from the terms-phonemes/document matrix and storing the vector space in a catalog; and
the computing device computing vectors for new documents and adding the vectors to the vector space without computing a new vector space in response to adding the vectors.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
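The final element of claim 1, adding vectors for new documents without computing a new vector space, resembles what the latent semantic indexing literature calls "folding in". The sketch below is one possible reading, not the patented implementation: it assumes the vector space is a truncated SVD and that a new document column d is projected as d^T U_k S_k^{-1}. The names `train_space`, `fold_in`, and `catalog`, and the toy matrix, are hypothetical.

```python
import numpy as np

def train_space(matrix, k):
    """Compute the space once from training data (truncated SVD, assumed here)."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in(new_columns, U_k, s_k):
    """Project new document columns into the existing space: d^T U_k S_k^{-1}.

    U_k and s_k are only read, never recomputed, so the space itself is unchanged.
    """
    return (new_columns.T @ U_k) / s_k

# Hypothetical training matrix: 5 term/phoneme rows, 4 training documents.
training = np.array([
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

U_k, s_k, doc_vecs = train_space(training, k=2)

# A new document that uses rows 0 and 2, given as a single column.
new_doc = np.array([[1.0], [0.0], [1.0], [0.0], [0.0]])
new_vec = fold_in(new_doc, U_k, s_k)

# The catalog grows by one row; the existing document vectors are untouched.
catalog = np.vstack([doc_vecs, new_vec])
print(catalog.shape)
```

One consistency check on this reading: folding in a training column reproduces that document's original vector, because A^T U_k S_k^{-1} = V_k for the truncated decomposition.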
9. An apparatus comprising:
a pre-processor configured to register a set of training documents, including creating metadata comprising at least document terms and phonemes counts;
a computing device configured to:
compute a terms-phonemes/document matrix from the metadata;
normalize the terms-phonemes/document matrix; and
compute a vector space from the normalized terms-phonemes/document matrix;
the pre-processor further configured to compute vectors for new documents and to add the vectors to the vector space without computing a new vector space in response to adding the vectors to the vector space;
a query engine configured to search the vector space for vectors that are close to a vector computed for one or more query terms or phonemes; and
an interface configured to provide a list of documents associated with vectors in the vector space that are closest to the vector computed for the one or more query terms or phonemes.
Dependent claims: 10, 11, 12, 13, 14, 15
16. A tangible computer readable medium having instructions stored thereon, the instructions configured to cause a computing device to:
pre-process a set of training documents, including at least creating training document metadata;
construct a terms-phonemes/document matrix from the training document metadata where rows are created for the terms and phonemes contained in the set of training documents and a column is created for each training document;
normalize entries in the terms-phonemes/document matrix;
compute a vector space from the training documents by computing from the terms-phonemes/document matrix; and
compute vectors for new documents and adding the vectors to the vector space without computing a new vector space in response to adding the vectors.
Dependent claims: 17, 18, 19, 20, 21
22. An apparatus comprising:
a memory;
a processor device in communication with the memory and configured, in combination with the memory, to:
pre-process a set of training documents, including at least creating training document metadata;
construct a terms-phonemes/document matrix from the training document metadata where rows are created for the terms and phonemes contained in the set of training documents and columns are created for each training document;
normalize entries in the terms-phonemes/document matrix;
compute a vector space from the training documents by computing from the terms-phonemes/document matrix; and
compute vectors for new documents and adding the vectors to the vector space without computing a new vector space in response to adding the vectors.
Dependent claims: 23, 24, 25
26. A method comprising:
a computing device pre-processing a set of training documents, including at least tokenizing the set of training documents to create counts for index terms and phonemes;
the computing device constructing a terms-phonemes/document matrix from the counts;
the computing device normalizing entries in the terms-phonemes/document matrix;
the computing device computing a vector space from the normalized terms-phonemes/document matrix;
the computing device computing vectors for new documents and adding the vectors to the vector space.
Dependent claims: 27, 28, 29, 30
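The first three elements of claim 26 (tokenizing to counts, constructing the matrix, normalizing entries) could be sketched as follows. The claim does not say how phonemes are tokenized or which normalization is applied, so this sketch makes several assumptions: each document arrives with a hypothetical text transcript and a phoneme transcript, phoneme tokens are tagged with a `ph:` prefix so they occupy their own rows, and normalization scales each document column to unit Euclidean length. The function names and sample transcripts are illustrative only.

```python
from collections import Counter
import numpy as np

# Hypothetical inputs: one text transcript and one phoneme transcript per document.
docs = {
    "doc_a": {"terms": "storm warning storm", "phonemes": "S-T-AO-R-M W-AO-R-N-IH-NG"},
    "doc_b": {"terms": "market report", "phonemes": "M-AA-R-K-AH-T"},
}

def tokenize(doc):
    """Count index terms and phoneme tokens; phoneme rows carry a 'ph:' prefix."""
    terms = doc["terms"].lower().split()
    phonemes = ["ph:" + p for p in doc["phonemes"].split()]
    return Counter(terms + phonemes)

def build_matrix(documents):
    """One row per distinct term or phoneme token, one column per document."""
    counts = {name: tokenize(d) for name, d in documents.items()}
    rows = sorted(set().union(*counts.values()))
    cols = list(documents)
    matrix = np.array([[counts[c][r] for c in cols] for r in rows], dtype=float)
    return rows, cols, matrix

def normalize_columns(matrix):
    """Scale each document column to unit length (an assumed normalization)."""
    norms = np.linalg.norm(matrix, axis=0)
    norms[norms == 0] = 1.0  # leave empty documents as all-zero columns
    return matrix / norms

rows, cols, matrix = build_matrix(docs)
normalized = normalize_columns(matrix)
print(rows)
print(normalized.round(3))
```

Other weightings, such as log-entropy or TF-IDF, would fit the same slot; the unit-length scaling is used here only because it is easy to verify.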
31. An apparatus for cross media indexing, registering, and retrieving speech documents, the apparatus comprising:
an interface configured to receive a set of documents;
a pre-processor configured to transcribe the set of documents and tokenize the transcriptions to create counts for index terms and phonemes;
a computing device configured to compute a term-phoneme/document matrix, normalize the term-phoneme/document matrix, and compute a vector space from the normalized term-phoneme/document matrix;
a database configured to store the vector space and the counts;
a query engine configured to search the vector space for documents closest to a query vector.
Dependent claims: 32
Specification