Vocabulary independent speech decoder system and method using subword units
Abstract
A speech recognition system provides a subword decoder and a dictionary lookup to process a spoken input. In a first stage of processing, the subword decoder decodes the speech input based on subword units or particles and identifies hypothesized subword sequences using a particle dictionary and particle language model, but independently of a word dictionary or word vocabulary. Further stages of processing involve a particle to word graph expander and a word decoder. The particle to word graph expander expands the subword representation produced by the subword decoder into a word graph of word candidates using a word dictionary. The word decoder uses the word dictionary and a word language model to determine a best sequence of word candidates from the word graph that is most likely to match the words of the spoken input.
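The first two stages described in the abstract can be illustrated with a minimal sketch. It assumes the subword decoder has already emitted particle tokens separated by end-of-word markers, and it shows only the grouping of those tokens and the dictionary lookup. The marker string `</w>`, the hyphen-joined phone notation, the toy `WORD_DICT`, and the use of Levenshtein distance as the measure of phonetic similarity are all assumptions made for illustration; the abstract does not specify any of them.

```python
from typing import Dict, List, Sequence, Tuple

EOW = "</w>"  # hypothetical end-of-word marker token

# Hypothetical word dictionary: word -> pronunciation as a phone sequence.
WORD_DICT: Dict[str, Tuple[str, ...]] = {
    "recognize": ("r", "eh", "k", "ah", "g", "n", "ay", "z"),
    "speech": ("s", "p", "iy", "ch"),
    "speed": ("s", "p", "iy", "d"),
    "beach": ("b", "iy", "ch"),
}


def split_at_markers(tokens: Sequence[str]) -> List[List[str]]:
    """Group particle tokens into hypothesized sequences, one per end-of-word marker."""
    sequences: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        if tok == EOW:
            if current:
                sequences.append(current)
            current = []
        else:
            current.append(tok)
    if current:  # trailing particles with no closing marker
        sequences.append(current)
    return sequences


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (assumed similarity measure)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def expand(
    sequences: Sequence[Sequence[str]],
    word_dict: Dict[str, Tuple[str, ...]],
    max_dist: int = 1,
) -> List[List[Tuple[str, int]]]:
    """For each hypothesized sequence, list dictionary words within max_dist phone edits."""
    graph = []
    for seq in sequences:
        # Flatten particles such as "k-ah-g" into individual phones.
        phones = [p for particle in seq for p in particle.split("-")]
        scored = [(w, edit_distance(phones, pron)) for w, pron in word_dict.items()]
        kept = [c for c in scored if c[1] <= max_dist]
        graph.append(sorted(kept, key=lambda c: (c[1], c[0])))
    return graph


tokens = ["r-eh", "k-ah-g", "n-ay-z", EOW, "s-p", "iy-ch", EOW]
graph = expand(split_at_markers(tokens), WORD_DICT)
```

With this toy input, the second hypothesized sequence keeps both "speech" (exact match) and "speed" (one phone substituted) as candidates, which is the kind of ambiguity the later word decoding stage is meant to resolve.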
15 Claims
1. A method for detecting an input sequence of input words in a spoken input, comprising computer implemented steps of:
generating a subword representation of the spoken input, the subword representation including (i) subword unit tokens based on the spoken input and (ii) end of word markers that identify boundaries of hypothesized subword sequences that potentially match the input words in the spoken input;
expanding the subword representation into a word graph of word candidates for the input words in the spoken input, each word candidate being phonetically similar to one of the hypothesized subword sequences; and
determining a preferred sequence of word candidates based on the word graph, the preferred sequence of word candidates representing a most likely match to the spoken sequence of the input words.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A speech detection system for detecting an input sequence of input words in a spoken input, the system comprising:
a subword decoder for generating a subword representation of the spoken input, the subword representation including (i) subword unit tokens based on the spoken input and (ii) end of word markers that identify boundaries of hypothesized subword sequences that potentially match the input words in the spoken input; and
a dictionary lookup module for expanding the subword representation into a word graph of word candidates for the input words in the spoken input, each word candidate being phonetically similar to one of the hypothesized subword sequences, the dictionary lookup determining a preferred sequence of word candidates based on the word graph, the preferred sequence of word candidates representing a most likely match to the spoken sequence of the input words.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer program product comprising:
a computer usable medium for detecting an input sequence of input words in a spoken input; and
a set of computer program instructions embodied on the computer usable medium, including instructions to:
generate a subword representation of the spoken input, the subword representation including (i) subword unit tokens based on the spoken input and (ii) end of word markers that identify boundaries of hypothesized subword sequences that potentially match the input words in the spoken input;
expand the subword representation into a word graph of word candidates for the input words in the spoken input, each word candidate being phonetically similar to one of the hypothesized subword sequences; and
determine a preferred sequence of word candidates based on the word graph, the preferred sequence of word candidates representing a most likely match to the spoken sequence of the input words.
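The final step recited in each independent claim, determining a preferred sequence of word candidates from the word graph, can be sketched as a Viterbi-style search. The sketch below makes several simplifying assumptions that are not taken from the claims: the word graph is reduced to a linear list of slots (one set of candidates per hypothesized word), each candidate carries a hypothetical log-probability for its phonetic match, and the word language model is a small bigram table with a fixed penalty `unseen_logp` for unlisted word pairs.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[str, float]  # (word, log-probability of the phonetic match)


def best_path(
    slots: List[List[Candidate]],
    bigram_logp: Dict[Tuple[str, str], float],
    unseen_logp: float = -10.0,
) -> List[str]:
    """Return the highest-scoring word sequence through a linear word graph."""
    slots = [s for s in slots if s]  # ignore slots with no candidates
    if not slots:
        return []
    # best[w] = (total score, path) for the best path ending in word w.
    best = {
        w: (s + bigram_logp.get(("<s>", w), unseen_logp), [w])
        for w, s in slots[0]
    }
    for slot in slots[1:]:
        nxt = {}
        for w, s in slot:
            # Pick the predecessor that maximizes the accumulated score.
            score, path = max(
                (
                    (ps + bigram_logp.get((pw, w), unseen_logp) + s, ppath)
                    for pw, (ps, ppath) in best.items()
                ),
                key=lambda t: t[0],
            )
            nxt[w] = (score, path + [w])
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]


# Toy word graph: "speed" matches slightly better phonetically than "speech".
slots = [
    [("recognize", -0.1)],
    [("speech", -0.7), ("speed", -0.6)],
]
# Hypothetical bigram language model scores.
bigram = {
    ("<s>", "recognize"): -1.0,
    ("recognize", "speech"): -0.5,
    ("recognize", "speed"): -3.0,
}
preferred = best_path(slots, bigram)
```

In this example the phonetic scores alone would favor "speed", but the language model score for "recognize speech" outweighs that difference, so the search returns `["recognize", "speech"]`. A general word graph with arbitrary arcs would need a search over graph nodes rather than slots.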
Specification