System and method of lattice-based search for spoken utterance retrieval
First Claim
1. A method comprising:
receiving a query from a user, the query comprising a query word;
retrieving, based on the query, a spoken document;
converting, via a processor, the query word into query word phoneme strings based on a query word pronunciation of the query word;
searching a phoneme-based index of a lattice representation of the spoken document for phoneme strings that correspond to the query word phoneme strings to yield search results, the phoneme-based index comprising an index for each arc label that records a lattice number, an input-state of each labeled arc, a probability mass leading to each state, a probability associated with each arc, and an index for a next state; and
returning audio segments from the spoken document that correspond to the query based on the search results.
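The index described in the claim can be illustrated with a minimal sketch. The record fields follow the claim language (lattice number, input state, probability mass leading to the state, arc probability, next state). Everything else is an assumption made for illustration: the names `ArcEntry`, `build_index`, `search`, `search_word`, and `LEXICON` are hypothetical, states are assumed to be numbered in topological order with state 0 as the start state, and scoring a match by multiplying the leading mass with the arc probabilities along the path is one plausible reading, not necessarily the patented procedure.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class ArcEntry:
    lattice: int      # lattice number
    state: int        # input state of the labeled arc
    mass: float       # probability mass leading to the input state
    prob: float       # probability of the arc itself
    next_state: int   # index of the next state

# Hypothetical pronunciation lexicon: word -> list of phoneme strings.
LEXICON = {"cat": [["k", "ae", "t"]]}

def build_index(lattices):
    """Build a phoneme -> [ArcEntry] index (the off-line step).

    lattices: {lattice_id: [(src, dst, phoneme, arc_prob), ...]}
    Assumes state 0 is the start state and src < dst for every arc.
    """
    index = defaultdict(list)
    for lat_id, arcs in lattices.items():
        mass = defaultdict(float)
        mass[0] = 1.0
        # Sorting by source state makes mass[src] final before it is used.
        for src, dst, label, prob in sorted(arcs):
            mass[dst] += mass[src] * prob
            index[label].append(ArcEntry(lat_id, src, mass[src], prob, dst))
    return index

def search(index, phonemes):
    """Return {lattice_id: score} for one phoneme string."""
    # Partial matches keyed by (lattice, state reached so far).
    partial = defaultdict(float)
    for e in index.get(phonemes[0], []):
        partial[(e.lattice, e.next_state)] += e.mass * e.prob
    for ph in phonemes[1:]:
        extended = defaultdict(float)
        for e in index.get(ph, []):
            p = partial.get((e.lattice, e.state))
            if p:
                extended[(e.lattice, e.next_state)] += p * e.prob
        partial = extended
    scores = defaultdict(float)
    for (lat_id, _), p in partial.items():
        scores[lat_id] += p
    return dict(scores)

def search_word(index, word):
    """Convert a query word to phoneme strings, then search each one."""
    totals = defaultdict(float)
    for pron in LEXICON.get(word, []):
        for lat_id, score in search(index, pron).items():
            totals[lat_id] += score
    return dict(totals)

# Toy lattice 0: "k" then either "ae" (0.6) or "ah" (0.4), then "t".
toy = {0: [(0, 1, "k", 1.0), (1, 2, "ae", 0.6), (1, 2, "ah", 0.4), (2, 3, "t", 1.0)]}
index = build_index(toy)
hits = search_word(index, "cat")
```

In this toy example `hits` maps lattice 0 to a score of about 0.6, the weight of the path that spells "k ae t". Mapping a scoring lattice back to an audio segment would need timing information per lattice, which this sketch leaves out.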
5 Assignments
0 Petitions
Abstract
A system and method are disclosed for retrieving audio segments from a spoken document. The spoken document is preferably one with a moderate word error rate, such as a telephone call or teleconference. The method comprises converting speech associated with a spoken document into a lattice representation and indexing the lattice representation of speech. These steps are typically performed off-line. Upon receiving a query from a user, the method further comprises searching the indexed lattice representation of speech and returning retrieved audio segments from the spoken document that match the user query.
31 Citations
20 Claims
1. A method comprising:
receiving a query from a user, the query comprising a query word;
retrieving, based on the query, a spoken document;
converting, via a processor, the query word into query word phoneme strings based on a query word pronunciation of the query word;
searching a phoneme-based index of a lattice representation of the spoken document for phoneme strings that correspond to the query word phoneme strings to yield search results, the phoneme-based index comprising an index for each arc label that records a lattice number, an input-state of each labeled arc, a probability mass leading to each state, a probability associated with each arc, and an index for a next state; and
returning audio segments from the spoken document that correspond to the query based on the search results.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system comprising:
a processor; and
a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
receiving a query from a user, the query comprising a query word;
retrieving, based on the query, a spoken document;
converting the query word into phoneme strings based on a word pronunciation of the query word;
searching a phoneme-based indexed lattice representation of the spoken document for phoneme strings that correspond to the phoneme strings to yield search results, the phoneme-based indexed lattice representation comprising an index for each arc label that records a lattice number, an input-state of each labeled arc, a probability mass leading to each state, a probability of each arc, and an index for a next state; and
returning audio segments from the spoken document that match the query based on the search results.
Dependent claims: 11, 12, 13, 14, 15
16. A computer-readable storage medium device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
receiving a query from a user, the query comprising a query word;
retrieving, based on the query, a spoken document;
converting the query word into phoneme strings based on a query word pronunciation of the query word;
searching a phoneme-based indexed lattice representation of the spoken document for phoneme strings that correspond to the query word phoneme strings, to yield search results, the phoneme-based index comprising an index for each arc label that records a lattice number, an input-state of each labeled arc, a probability mass leading to each state, a probability of the arc itself, and an index for a next state; and
returning audio segments from the spoken document that correspond to the query based on the search results.
Dependent claims: 17, 18, 19, 20
Specification