Method and System of Indexing Speech Data
Abstract
A method and system of indexing speech data. The method includes indexing word transcripts including a timestamp for a word occurrence; and indexing sub-word transcripts including a timestamp for a sub-word occurrence. A timestamp in the index indicates the time and duration of occurrence of the word or sub-word in the speech data, and word and sub-word occurrences can be correlated using the timestamps. A method of searching speech transcripts is also provided in which a search query in the form of a phrase to be searched includes at least one in-vocabulary word and at least one out-of-vocabulary word. The method of searching includes extracting the search terms from the phrase, retrieving a list of occurrences of words for an in-vocabulary search term from an index of words having timestamps, retrieving a list of occurrences of sub-words for an out-of-vocabulary search term from an index of sub-words having timestamps, and merging the retrieved lists of occurrences of words and sub-words according to their timestamps.
20 Claims
1. A method of indexing speech data, the method comprising:
indexing word transcripts, including a timestamp for a word occurrence;
indexing sub-word transcripts, including a timestamp for a sub-word occurrence;
wherein a timestamp in the index indicates the time of occurrence of the word or sub-word in the speech data;
and wherein word and sub-word occurrences can be correlated using the timestamps.
Dependent claims: 2, 3, 4, 5, 6, 7

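
As an illustration only, the two timestamped indexes of claim 1 might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the names `Occurrence`, `TimestampedIndex` and `correlate`, the recording identifier, and the use of seconds are all hypothetical, and the `duration` field follows the abstract rather than the claim text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """One indexed occurrence of a word or sub-word (hypothetical layout)."""
    unit: str        # the word or sub-word that was recognised
    doc_id: str      # assumed identifier of the speech recording
    start: float     # timestamp of the occurrence, assumed to be in seconds
    duration: float  # duration, as mentioned in the abstract


class TimestampedIndex:
    """Maps each unit (word or sub-word) to its timestamped occurrences."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, unit, doc_id, start, duration):
        self._postings[unit].append(Occurrence(unit, doc_id, start, duration))

    def lookup(self, unit):
        # Return occurrences ordered by recording and then by time.
        return sorted(self._postings.get(unit, []),
                      key=lambda o: (o.doc_id, o.start))


def correlate(word_occurrence, subword_occurrences):
    """Return the sub-word occurrences that start inside the word's time span.

    This is one possible reading of "correlated using the timestamps".
    """
    end = word_occurrence.start + word_occurrence.duration
    return [s for s in subword_occurrences
            if s.doc_id == word_occurrence.doc_id
            and word_occurrence.start <= s.start < end]
```

One instance of `TimestampedIndex` would hold the word transcript and a second instance the sub-word transcript; because both store timestamps on the same time axis, `correlate` can line them up without any other shared key.
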
8. A method of searching speech data including in-vocabulary and out-of-vocabulary words, comprising:
receiving a search query in the form of a phrase to be searched including at least one in-vocabulary word and at least one out-of-vocabulary word;
extracting the search terms from the phrase;
retrieving a list of occurrences of words for an in-vocabulary search term from an index of words having timestamps;
retrieving a list of occurrences of sub-words for an out-of-vocabulary search term from an index of sub-words having timestamps;
and merging the retrieved lists of occurrences of words and sub-words according to their timestamps.
Dependent claims: 9, 10, 11, 12, 13

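
The search steps of claim 8 can be sketched roughly as below. This is a hedged illustration under several assumptions that the claim does not state: each index is a plain dictionary from a unit to a list of `(doc_id, start_seconds)` pairs, the caller supplies a hypothetical `to_subwords` function that converts an out-of-vocabulary term into sub-word units, consecutive sub-words must follow each other within an assumed `max_gap`, and "merging" is read as a time-ordered merge of the per-term hit lists.

```python
import heapq
from typing import NamedTuple


class Hit(NamedTuple):
    doc_id: str
    start: float
    term: str
    source: str  # "word" or "subword"


def match_sequence(units, subword_index, max_gap=0.3):
    """Find start times where the sub-word units occur in order, close in time."""
    if not units:
        return []
    starts = []
    for doc_id, t0 in subword_index.get(units[0], []):
        prev = t0
        for unit in units[1:]:
            # Candidate times for the next unit, shortly after the previous one.
            later = [t for d, t in subword_index.get(unit, [])
                     if d == doc_id and prev < t <= prev + max_gap]
            if not later:
                break
            prev = min(later)
        else:
            starts.append((doc_id, t0))
    return starts


def search(phrase, vocabulary, word_index, subword_index, to_subwords):
    """Retrieve hits per term from the matching index, then merge by timestamp."""
    per_term = []
    for term in phrase.lower().split():          # extract the search terms
        if term in vocabulary:                   # in-vocabulary: word index
            hits = [Hit(d, t, term, "word")
                    for d, t in word_index.get(term, [])]
        else:                                    # out-of-vocabulary: sub-word index
            hits = [Hit(d, t, term, "subword")
                    for d, t in match_sequence(to_subwords(term), subword_index)]
        per_term.append(sorted(hits))
    # Merge the sorted per-term lists into one list ordered by (doc_id, start).
    return list(heapq.merge(*per_term))
```

A caller could then scan the merged list for hits that follow one another closely in the same recording in order to score candidate phrase matches; that scoring step is left out here because the claim text above does not describe it.
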
14. A system for searching speech transcripts from speech recognition systems, wherein a speech recognition system has a vocabulary of words, the system including an indexing system comprising:
indexing word transcripts, including a timestamp for a word occurrence;
indexing sub-word transcripts, including a timestamp for a sub-word occurrence;
wherein a timestamp in the index indicates the time of occurrence of the word or sub-word in the speech data;
and wherein word and sub-word occurrences can be correlated using the timestamps.
Dependent claims: 15, 16, 17, 18, 19

20. A computer program product stored on a computer readable storage medium for indexing speech data, comprising computer readable program code means for performing the steps of:
indexing word transcripts, including a timestamp for a word occurrence;
indexing sub-word transcripts, including a timestamp for a sub-word occurrence;
wherein a timestamp in the index indicates the time of occurrence of the word or sub-word in the speech data;
and wherein word and sub-word occurrences can be correlated using the timestamps.

Specification