Method and system of indexing speech data
Abstract
A method and system of indexing speech data. The method includes indexing word transcripts including a timestamp for a word occurrence; and indexing sub-word transcripts including a timestamp for a sub-word occurrence. A timestamp in the index indicates the time and duration of occurrence of the word or sub-word in the speech data, and word and sub-word occurrences can be correlated using the timestamps. A method of searching speech transcripts is also provided in which a search query in the form of a phrase to be searched includes at least one in-vocabulary word and at least one out-of-vocabulary word. The method of searching includes extracting the search terms from the phrase, retrieving a list of occurrences of words for an in-vocabulary search term from an index of words having timestamps, retrieving a list of occurrences of sub-words for an out-of-vocabulary search term from an index of sub-words having timestamps, and merging the retrieved lists of occurrences of words and sub-words according to their timestamps.
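The two-index layout described in the abstract might be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the `Occurrence` record, the `build_index` helper, the recording identifier, and the sample transcripts (including the phone-like sub-word units) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """One posting: where a unit (word or sub-word) occurs in the speech data."""
    doc_id: str
    start: float     # start time in seconds
    duration: float  # duration in seconds

    @property
    def end(self) -> float:
        return self.start + self.duration


def build_index(transcript, doc_id):
    """Build an inverted index from (unit, start, duration) transcript entries."""
    index = defaultdict(list)
    for unit, start, duration in transcript:
        index[unit].append(Occurrence(doc_id, start, duration))
    return index


# Hypothetical transcripts of the same audio: one word-level, one sub-word-level.
word_transcript = [("prosecutor", 0.0, 0.6), ("said", 2.0, 0.3)]
subword_transcript = [("m", 0.7, 0.1), ("ih", 0.8, 0.1), ("l", 0.9, 0.1)]

word_index = build_index(word_transcript, "call-001")
subword_index = build_index(subword_transcript, "call-001")

# Both indices share one time axis, so postings can be correlated by timestamp.
print(word_index["prosecutor"][0])
print(subword_index["m"][0])
```

Keeping the start time and the duration in every posting is what lets a later search step compare a word occurrence with a sub-word occurrence from the other index.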
24 Claims
1. A computer-implemented method of searching speech data comprising in-vocabulary and out-of-vocabulary words, the method comprising, via a computer processor executing stored program instructions:
receiving, by the computer processor, a search query comprising a phrase comprising at least one in-vocabulary word and at least one out-of-vocabulary word;
extracting, by the computer processor, search terms from the phrase, the search terms comprising at least one in-vocabulary search term and at least one out-of-vocabulary search term;
retrieving, by the computer processor, a first list of occurrences of words for the at least one in-vocabulary search term, the first list retrieved from a first index of words having first timestamps;
retrieving, by the computer processor, a second list of occurrences of sub-words for the at least one out-of-vocabulary search term, the second list retrieved from a second index of sub-words having second timestamps; and
merging, by the computer processor, the first list of occurrences of words and the second list of occurrences of sub-words to create a merged list, wherein merging the first list and the second list comprises evaluating the first timestamps and the second timestamps and adding to the merged list occurrences of combinations of words from the first list and sub-words from the second list that satisfy at least one evaluation criterion, wherein the at least one evaluation criterion comprises a threshold for a difference between first and second timestamps.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 20
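The merging step recited in claim 1 could look roughly like the sketch below. It is one possible reading, not the claimed implementation: the `Hit` record, the `merge_by_timestamp` function, the `max_gap` default of 0.5 seconds, and the sample hits are assumptions made for illustration. Each sub-word hit is assumed to stand for an already matched sub-word sequence of the out-of-vocabulary term, and the gap is measured from the end of the earlier hit to the start of the later one.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    doc_id: str
    start: float  # seconds
    end: float    # seconds


def merge_by_timestamp(
    word_hits: List[Hit],
    subword_hits: List[Hit],
    max_gap: float = 0.5,
) -> List[Tuple[Hit, Hit]]:
    """Pair word hits and sub-word hits whose timestamps are close enough.

    A pair is kept when both hits come from the same recording and the gap
    between the end of the earlier hit and the start of the later hit does
    not exceed max_gap (the threshold on the timestamp difference).
    """
    merged = []
    for w in word_hits:
        for s in subword_hits:
            if w.doc_id != s.doc_id:
                continue
            first, second = (w, s) if w.start <= s.start else (s, w)
            gap = second.start - first.end  # negative when the hits overlap
            if gap <= max_gap:
                merged.append((w, s))
    return merged


# Hypothetical retrieval results for one in-vocabulary and one OOV term.
word_hits = [Hit("call-001", 0.0, 0.5), Hit("call-001", 10.0, 10.5)]
subword_hits = [Hit("call-001", 0.75, 1.0), Hit("call-002", 0.75, 1.0)]

merged = merge_by_timestamp(word_hits, subword_hits)
print(merged)
```

With these sample values only the first word hit pairs with the first sub-word hit: the second word hit is nine seconds away, and the second sub-word hit belongs to a different recording. The nested loop is quadratic; a real system would more likely walk two time-sorted posting lists in step.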
9. A system comprising:
at least one processor programmed to:
receive a search query comprising a phrase comprising at least one in-vocabulary word and at least one out-of-vocabulary word;
extract search terms from the phrase, the search terms comprising at least one in-vocabulary search term and at least one out-of-vocabulary search term;
retrieve a first list of occurrences of words for the at least one in-vocabulary search term, the first list retrieved from a first index of words having first timestamps;
retrieve a second list of occurrences of sub-words for the at least one out-of-vocabulary search term, the second list retrieved from a second index of sub-words having second timestamps; and
merge the first list of occurrences of words and the second list of occurrences of sub-words to create a merged list, wherein merging the first list and the second list comprises evaluating the first timestamps and the second timestamps and adding to the merged list occurrences of combinations of words from the first list and sub-words from the second list that satisfy at least one evaluation criterion, wherein the at least one evaluation criterion comprises a threshold for a difference between first and second timestamps.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
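The term-extraction step that the processor is programmed to perform might be sketched as below. The claim does not say how terms are classified or how an out-of-vocabulary term is turned into sub-words, so everything here is an assumption: `extract_search_terms`, the vocabulary set, and `letters_as_subwords` (a stand-in for a real grapheme-to-phoneme converter) are hypothetical names.

```python
def extract_search_terms(phrase, vocabulary, to_subwords):
    """Split a query phrase into in-vocabulary and out-of-vocabulary terms.

    Each term keeps its position in the phrase so that later steps can
    still tell which term came first.
    """
    in_vocab, out_of_vocab = [], []
    for position, token in enumerate(phrase.lower().split()):
        if token in vocabulary:
            in_vocab.append((position, token))
        else:
            # An OOV term is looked up in the sub-word index, so it is
            # converted to a sub-word sequence here.
            out_of_vocab.append((position, to_subwords(token)))
    return in_vocab, out_of_vocab


def letters_as_subwords(token):
    # Placeholder assumption: one letter per sub-word unit. A real system
    # would use phones or another sub-word inventory.
    return tuple(token)


# Hypothetical recognizer vocabulary and query phrase.
vocabulary = {"prosecutor", "said"}
iv_terms, oov_terms = extract_search_terms(
    "prosecutor Milosevic", vocabulary, letters_as_subwords
)
print(iv_terms)
print(oov_terms)
```

The in-vocabulary terms would then be looked up in the word index and the out-of-vocabulary terms in the sub-word index before the two result lists are merged.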
17. At least one non-transitory computer-readable storage medium comprising instructions that, when executed by at least one processor, execute a method comprising:
receiving a search query comprising a phrase comprising at least one in-vocabulary word and at least one out-of-vocabulary word;
extracting search terms from the phrase, the search terms comprising at least one in-vocabulary search term and at least one out-of-vocabulary search term;
retrieving a first list of occurrences of words for the at least one in-vocabulary search term, the first list retrieved from a first index of words having first timestamps;
retrieving a second list of occurrences of sub-words for the at least one out-of-vocabulary search term, the second list retrieved from a second index of sub-words having second timestamps; and
merging the first list of occurrences of words and the second list of occurrences of sub-words to create a merged list, wherein merging the first list and the second list comprises evaluating the first timestamps and the second timestamps and adding to the merged list occurrences of combinations of words from the first list and sub-words from the second list that satisfy at least one evaluation criterion, wherein the at least one evaluation criterion comprises a threshold for a difference between first and second timestamps.
Dependent claims: 18, 19, 21, 22, 23, 24
Specification