Method and system for efficient spoken term detection using confusion networks
First Claim
1. A method for spoken term detection, comprising:
receiving phone level out-of-vocabulary (OOV) keyword queries;
converting the phone level OOV keyword queries to words;
generating a confusion network (CN) based keyword searching (KWS) index; and
using the CN based KWS index for both in-vocabulary (IV) keyword queries and the OOV keyword queries;
wherein converting the phone level OOV keyword queries to words comprises:
converting the phone level OOV keyword queries to phonetic finite state acceptors, wherein phone sequences for IV terms are looked up in a recognition lexicon and phone sequences for OOV terms are generated with a grapheme-to-phoneme model;
expanding the phone level OOV keyword queries through composition with a weighted finite state transducer (WFST) that models probabilities of confusions between different phones;
extracting N-best hypotheses represented by each expanded WFST; and
mapping back the N-best hypotheses to a set of N or fewer word sequences through composition with a finite state transducer that maps from phone sequences to word sequences; and
wherein the receiving, converting, generating and using steps are performed by a computer system comprising a memory and at least one processor coupled to the memory.
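The converting steps recited above can be illustrated with a small sketch. It is not the claimed implementation: it replaces WFST composition with brute-force enumeration over a substitution-only confusion table, and it takes the query phones as given instead of running a grapheme-to-phoneme model. The lexicon, the confusion probabilities, and all function names are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical toy recognition lexicon: word -> phone sequence.
LEXICON = {
    "cat": ("k", "ae", "t"),
    "cap": ("k", "ae", "p"),
    "cut": ("k", "ah", "t"),
}

# Hypothetical confusion model: query phone -> {confusable phone: probability}.
# Substitutions only; a real confusion WFST would also model insertions and deletions.
CONFUSION = {
    "k": {"k": 0.9, "g": 0.1},
    "ae": {"ae": 0.7, "ah": 0.3},
    "t": {"t": 0.8, "p": 0.2},
}

# Inverse lexicon, standing in for the phone-to-word transducer.
PHONES_TO_WORDS = {}
for word, phones in LEXICON.items():
    PHONES_TO_WORDS.setdefault(phones, []).append(word)


def n_best_phone_hypotheses(query_phones, n):
    """Expand the query with the confusion table and keep the n most likely phone sequences."""
    options = [sorted(CONFUSION.get(p, {p: 1.0}).items()) for p in query_phones]
    scored = []
    for combo in product(*options):  # brute force; fine for short queries only
        phones = tuple(p for p, _ in combo)
        score = prod(w for _, w in combo)
        scored.append((score, phones))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:n]


def segmentations(phones):
    """Return every way to split a phone sequence into in-vocabulary words."""
    if not phones:
        return [()]
    results = []
    for end in range(1, len(phones) + 1):
        for word in PHONES_TO_WORDS.get(phones[:end], ()):
            for rest in segmentations(phones[end:]):
                results.append((word,) + rest)
    return results


def expand_query(query_phones, n):
    """Map an OOV phone query to at most n in-vocabulary word sequences."""
    best = {}
    for score, phones in n_best_phone_hypotheses(tuple(query_phones), n):
        for words in segmentations(phones):
            if score > best.get(words, 0.0):
                best[words] = score
    return sorted(best.items(), key=lambda kv: -kv[1])[:n]


result = expand_query(("k", "ae", "t"), 4)
```

With N = 4, one of the four phone hypotheses ("g ae t") has no word segmentation in this toy lexicon, so only three word sequences come back. That is one way to read "a set of N or fewer word sequences".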
Abstract
Systems and methods for spoken term detection are provided. A method for spoken term detection comprises receiving phone level out-of-vocabulary (OOV) keyword queries, converting the phone level OOV keyword queries to words, generating a confusion network (CN) based keyword searching (KWS) index, and using the CN based KWS index for both in-vocabulary (IV) keyword queries and the OOV keyword queries.
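As a rough illustration of why one index can serve both query types, the sketch below builds a plain inverted index over toy confusion networks and searches it with word sequences. Once an OOV query has been converted to in-vocabulary words, it goes through the same lookup as an IV query. The data layout, the scoring by product of posteriors, and the function names are assumptions for illustration; the sketch ignores epsilon (skip) arcs and timing information and is not the index structure described in the specification.

```python
from collections import defaultdict

# Hypothetical confusion networks: utterance id -> list of slots,
# each slot mapping competing word hypotheses to posterior probabilities.
CONFUSION_NETWORKS = {
    "utt1": [{"the": 0.9, "a": 0.1}, {"cat": 0.6, "cut": 0.4}, {"sat": 1.0}],
    "utt2": [{"cut": 0.7, "cap": 0.3}],
}


def build_index(confusion_networks):
    """Inverted index: word -> list of (utterance id, slot position, posterior)."""
    index = defaultdict(list)
    for utt_id, slots in confusion_networks.items():
        for position, alternatives in enumerate(slots):
            for word, posterior in alternatives.items():
                index[word].append((utt_id, position, posterior))
    return index


def search(index, words):
    """Find the word sequence in consecutive slots; score is the product of posteriors."""
    if not words:
        return []
    # Each partial hit is (utterance id, first slot, last slot, score).
    partial = [(utt, pos, pos, p) for utt, pos, p in index.get(words[0], [])]
    for word in words[1:]:
        following = {(utt, pos): p for utt, pos, p in index.get(word, [])}
        partial = [
            (utt, start, end + 1, score * following[(utt, end + 1)])
            for utt, start, end, score in partial
            if (utt, end + 1) in following
        ]
    return sorted(partial, key=lambda hit: -hit[3])


index = build_index(CONFUSION_NETWORKS)
iv_hits = search(index, ("the", "cat"))   # in-vocabulary query
oov_hits = search(index, ("cut",))        # e.g. a word sequence produced from an OOV query
```

Here `oov_hits` ranks the hit in `utt2` above the one in `utt1` because its posterior is higher; a real system would apply its own score normalization and thresholding.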
20 Claims
12. A system for spoken term detection, comprising:
a query module capable of receiving phone level out-of-vocabulary (OOV) keyword queries;
a mapping module capable of:
converting the phone level OOV keyword queries to words;
converting the phone level OOV keyword queries to phonetic finite state acceptors, wherein phone sequences for IV terms are looked up in a recognition lexicon and phone sequences for OOV terms are generated with a grapheme-to-phoneme model;
expanding the phone level OOV keyword queries through composition with a weighted finite state transducer (WFST) that models probabilities of confusions between different phones;
extracting N-best hypotheses represented by each expanded WFST; and
mapping back the N-best hypotheses to a set of N or fewer word sequences through composition with a finite state transducer that maps from phone sequences to word sequences;
an indexing module capable of generating a confusion network (CN) based keyword searching (KWS) index; and
a search module capable of using the CN based KWS index for both in-vocabulary (IV) keyword queries and the OOV keyword queries;
wherein the query module, the mapping module, the indexing module, and the search module are implemented in at least one processor device coupled to a memory.
14. A computer program product for spoken term detection, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform a method comprising:
receiving phone level out-of-vocabulary (OOV) keyword queries;
converting the phone level OOV keyword queries to words;
generating a confusion network (CN) based keyword searching (KWS) index; and
using the CN based KWS index for both in-vocabulary (IV) keyword queries and the OOV keyword queries;
wherein converting the phone level OOV keyword queries to words comprises:
converting the phone level OOV keyword queries to phonetic finite state acceptors, wherein phone sequences for IV terms are looked up in a recognition lexicon and phone sequences for OOV terms are generated with a grapheme-to-phoneme model;
expanding the phone level OOV keyword queries through composition with a weighted finite state transducer (WFST) that models probabilities of confusions between different phones;
extracting N-best hypotheses represented by each expanded WFST; and
mapping back the N-best hypotheses to a set of N or fewer word sequences through composition with a finite state transducer that maps from phone sequences to word sequences.
Specification