System and method of lattice-based search for spoken utterance retrieval
Abstract
A system and method are disclosed for retrieving audio segments from a spoken document. The spoken document is preferably one with a moderate word error rate, such as a telephone call or a teleconference. The method comprises converting speech associated with the spoken document into a lattice representation and indexing the lattice representation of speech. These steps are typically performed off-line. Upon receiving a query from a user, the method further comprises searching the indexed lattice representation of speech and returning audio segments from the spoken document that match the user query.
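The off-line indexing step can be illustrated with a minimal sketch. It assumes a lattice given as a list of arcs over topologically numbered states (every arc goes from a lower-numbered to a higher-numbered state) and uses plain probabilities rather than the negative-log weights a weighted finite state machine would normally carry. Forward-backward gives each arc's posterior probability, and those posteriors are accumulated into an inverted index of expected label counts per document. This is one common way to index lattices, offered as an illustration rather than as the patented procedure; the names `arc_posteriors`, `build_index`, and `search` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A lattice arc: (source state, destination state, label, probability).
# States are assumed to be numbered in topological order (src < dst).
Arc = Tuple[int, int, str, float]
Lattice = Tuple[List[Arc], int, int]  # (arcs, start state, final state)


def arc_posteriors(arcs: List[Arc], start: int, final: int) -> List[float]:
    """Posterior probability of each arc, via the forward-backward algorithm."""
    alpha: Dict[int, float] = defaultdict(float)  # forward probabilities
    beta: Dict[int, float] = defaultdict(float)   # backward probabilities
    alpha[start] = 1.0
    beta[final] = 1.0
    # Forward pass: visiting arcs in increasing source-state order guarantees
    # every arc entering a state is processed before any arc leaving it.
    for src, dst, _, p in sorted(arcs, key=lambda a: a[0]):
        alpha[dst] += alpha[src] * p
    # Backward pass: visit arcs in decreasing destination-state order.
    for src, dst, _, p in sorted(arcs, key=lambda a: a[1], reverse=True):
        beta[src] += p * beta[dst]
    total = alpha[final]  # total probability of all complete paths
    return [alpha[s] * p * beta[d] / total for s, d, _, p in arcs]


def build_index(lattices: Dict[str, Lattice]) -> Dict[str, Dict[str, float]]:
    """Inverted index: label -> {document id: expected count of the label}."""
    index: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for doc_id, (arcs, start, final) in lattices.items():
        for arc, post in zip(arcs, arc_posteriors(arcs, start, final)):
            index[arc[2]][doc_id] += post
    return index


def search(index: Dict[str, Dict[str, float]], label: str,
           threshold: float = 0.0) -> List[Tuple[str, float]]:
    """Documents whose expected count for `label` exceeds `threshold`, best first."""
    hits = index.get(label, {})
    return sorted(((d, c) for d, c in hits.items() if c > threshold),
                  key=lambda kv: -kv[1])


if __name__ == "__main__":
    # Toy lattice: "hello world" (0.6) competes with "yellow world" (0.4).
    call1 = ([(0, 1, "hello", 0.6), (0, 1, "yellow", 0.4), (1, 2, "world", 1.0)], 0, 2)
    index = build_index({"call1": call1})
    print(search(index, "hello"))  # [('call1', 0.6)]
```

The usual motivation for indexing expected counts instead of only the 1-best transcript is that a word the recognizer missed on its best path can still be found on a competing path, which matters when word error rates are high.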
23 Claims
1. A method of retrieving a spoken document, the method being performed by a computing device, the method comprising:
converting via a processor high word error rate speech from the spoken document into a lattice representation related to a recognition network represented as weighted finite state machines, the spoken document generated from a telephone call;
indexing via the processor the lattice representation of speech based on phones;
upon receiving a query from a user, converting each query word into phone strings based on the query word pronunciation;
searching the phone-based index of the lattice representation of speech for each phone string of the phone strings having a minimum pronunciation length;
searching the indexed lattice representation of speech; and
returning audio segments from the spoken document that match the user query.
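The query-side steps of claim 1 (convert each query word into phone strings, keep only those meeting a minimum pronunciation length, search a phone-based index) can be sketched as follows. Everything concrete here is an assumption for illustration: the toy lexicon, `MIN_PRON_LEN = 3`, an index keyed by phone trigrams with precomputed per-document scores, and the rule that a longer phone string matches a document only if all of its overlapping trigrams occur there, scored by the weakest trigram. That rule is an approximation (it can accept trigrams occurring in the wrong order) and is not claimed to be the patent's matching procedure.

```python
from typing import Dict, List, Optional, Tuple

Phones = Tuple[str, ...]
Postings = Dict[str, float]  # document id -> score

# Hypothetical pronunciation lexicon: word -> alternative phone strings.
LEXICON: Dict[str, List[Phones]] = {
    "a": [("ax",), ("ey",)],
    "cat": [("k", "ae", "t")],
    "data": [("d", "ey", "t", "ax"), ("d", "ae", "t", "ax")],
}
MIN_PRON_LEN = 3  # hypothetical minimum pronunciation length, in phones
N = 3             # hypothetical order of the phone n-gram index


def query_phone_strings(query: str, lexicon: Dict[str, List[Phones]] = LEXICON,
                        min_len: int = MIN_PRON_LEN) -> List[Phones]:
    """Convert each query word into its phone strings, keeping only those
    with at least `min_len` phones."""
    return [pron
            for word in query.lower().split()
            for pron in lexicon.get(word, [])
            if len(pron) >= min_len]


def search_phone_string(index: Dict[Phones, Postings], phones: Phones,
                        n: int = N) -> Postings:
    """Documents containing every overlapping n-gram of `phones`; a document's
    score is its smallest n-gram score (an approximation, not an exact match)."""
    result: Optional[Postings] = None
    for i in range(len(phones) - n + 1):
        postings = index.get(phones[i:i + n], {})
        if result is None:
            result = dict(postings)
        else:
            result = {d: min(s, postings[d]) for d, s in result.items() if d in postings}
    return result or {}


def retrieve(index: Dict[Phones, Postings], query: str) -> List[Tuple[str, float]]:
    """Best score per document over all qualifying phone strings, best first."""
    hits: Postings = {}
    for phones in query_phone_strings(query):
        for doc, score in search_phone_string(index, phones).items():
            hits[doc] = max(hits.get(doc, 0.0), score)
    return sorted(hits.items(), key=lambda kv: -kv[1])


if __name__ == "__main__":
    index = {
        ("d", "ey", "t"): {"call1": 0.9, "call2": 0.2},
        ("ey", "t", "ax"): {"call1": 0.8},
        ("k", "ae", "t"): {"call2": 0.7},
    }
    print(retrieve(index, "a data"))  # [('call1', 0.8)]
```

Here the one-phone pronunciations of "a" are discarded before searching; presumably the minimum length exists for this reason, since very short phone strings match almost anywhere in a phone lattice.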
18. A method of retrieving a spoken document wherein a word index and a sub-word index related to the spoken document exist and are related to a recognition network represented as weighted finite state machines, wherein the word index and sub-word index are generated based on high word error rate speech from the spoken document, the method comprising, upon receiving a query from a user:
converting via a processor each query word into phone strings based on the query word pronunciation, the spoken document generated from a telephone call;
searching via the processor the phone-based index of the lattice representation of speech for each phone string of the phone strings having a minimum pronunciation length;
searching the word index based on the user query;
searching the sub-word index based on the user query; and
combining the results to retrieve the audio segments from the spoken document that match the user query.
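Claim 18 does not say how the word-index and sub-word-index results are combined. A minimal sketch, assuming both searches return per-document scores on a comparable scale, is a linear interpolation with a hypothetical `word_weight` parameter; taking the maximum of the two scores, or simply the union of the two hit lists, would be equally plausible choices.

```python
from typing import Dict, List, Tuple

Postings = Dict[str, float]  # document id -> score


def combine_results(word_hits: Postings, subword_hits: Postings,
                    word_weight: float = 0.5) -> List[Tuple[str, float]]:
    """Linearly interpolate word-index and sub-word-index scores.

    A document missing from one index contributes 0.0 from that index.
    `word_weight` is a hypothetical tuning parameter in [0, 1].
    """
    if not 0.0 <= word_weight <= 1.0:
        raise ValueError("word_weight must be in [0, 1]")
    docs = set(word_hits) | set(subword_hits)
    combined = {
        d: word_weight * word_hits.get(d, 0.0)
           + (1.0 - word_weight) * subword_hits.get(d, 0.0)
        for d in docs
    }
    # Highest score first; ties broken by document id for a stable order.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    word_hits = {"call1": 0.8}
    subword_hits = {"call1": 0.4, "call2": 0.6}
    for doc, score in combine_results(word_hits, subword_hits):
        print(doc, round(score, 2))
```

A document found by both searches ("call1") outranks one found only by the sub-word search, which is the behavior one would typically want when the two indexes agree.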
21. A method of retrieving a spoken document wherein a word index and a sub-word index related to the spoken document exist and are related to a recognition network represented as weighted finite state machines, the method comprising, upon receiving a user query from a user:
converting via a processor each query word from the user into phone strings based on a query word pronunciation, the spoken document generated from a telephone call;
searching the word index based on the user query if the user query is in-vocabulary;
searching the sub-word index based on the user query, for each phone string of the phone strings having a minimum pronunciation length, if the user query is out of vocabulary; and
retrieving the spoken document based on one of the searched word index or the searched sub-word index.
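Claim 21 chooses between the two indexes according to whether the query is in-vocabulary. The sketch below is hypothetical throughout: it treats a query as in-vocabulary only when every word is in the recognizer vocabulary, takes a `pronounce` callable standing in for a lexicon or letter-to-sound model, keys the sub-word index by whole phone strings for brevity, and, unlike the claim, converts words to phone strings only on the out-of-vocabulary branch, where they are actually used.

```python
from typing import Callable, Dict, List, Set, Tuple

Phones = Tuple[str, ...]
Postings = Dict[str, float]  # document id -> score


def _merge_max(hits: Postings, postings: Postings) -> None:
    """Merge `postings` into `hits`, keeping each document's best score."""
    for doc, score in postings.items():
        hits[doc] = max(hits.get(doc, 0.0), score)


def route_and_search(query: str,
                     vocabulary: Set[str],
                     word_index: Dict[str, Postings],
                     subword_index: Dict[Phones, Postings],
                     pronounce: Callable[[str], List[Phones]],
                     min_len: int = 3) -> Tuple[str, Postings]:
    """Search the word index if every query word is in-vocabulary; otherwise
    search the sub-word index with each phone string of at least `min_len` phones."""
    words = query.lower().split()
    hits: Postings = {}
    if words and all(w in vocabulary for w in words):
        for w in words:
            _merge_max(hits, word_index.get(w, {}))
        return "word", hits
    for w in words:
        for phones in pronounce(w):
            if len(phones) >= min_len:
                _merge_max(hits, subword_index.get(phones, {}))
    return "subword", hits


if __name__ == "__main__":
    vocab = {"billing", "account"}
    word_index = {"billing": {"call1": 0.9}, "account": {"call2": 0.7}}
    # Hypothetical letter-to-sound output for an out-of-vocabulary name.
    prons = {"acme": [("ae", "k", "m", "iy")]}
    subword_index = {("ae", "k", "m", "iy"): {"call3": 0.5}}
    print(route_and_search("billing", vocab, word_index, subword_index,
                           lambda w: prons.get(w, [])))
    print(route_and_search("acme", vocab, word_index, subword_index,
                           lambda w: prons.get(w, [])))
```

Routing per word instead of per query (sending in-vocabulary words to the word index and the rest to the sub-word index) would be a reasonable alternative; the whole-query rule here simply mirrors the claim's "if the user query is in-vocabulary" wording.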
22. A system for retrieving a spoken document, the system comprising:
a processor;
a first module controlling the processor to convert high word error rate speech from the spoken document into a lattice representation related to a recognition network represented as weighted finite state machines, the spoken document generated from a telephone call;
a second module controlling the processor to index the lattice representation of speech;
a third module controlling the processor, upon receiving a query from a user, to convert each query word into phone strings based on the query word pronunciation;
a fourth module controlling the processor to search the phone-based index of the lattice representation of speech for each phone string of the phone strings having a minimum pronunciation length;
a fifth module controlling the processor to search the indexed lattice representation of speech; and
a sixth module controlling the processor to return audio segments from the spoken document that match the user query.
23. A computer-readable storage medium storing a computer program having instructions for controlling a computing device to retrieve a spoken document, the instructions causing the computing device to perform a method comprising:
converting high word error rate speech from the spoken document into a lattice representation related to a recognition network represented as weighted finite state machines, the spoken document generated from a telephone call;
indexing the lattice representation of speech;
upon receiving a query from a user, converting each query word into phone strings based on the query word pronunciation;
searching the phone-based index of the lattice representation of speech for each phone string of the phone strings having a minimum pronunciation length;
searching the indexed lattice representation of speech; and
returning audio segments from the spoken document that match the user query.
Specification