Vocabulary independent speech decoder system and method using subword units

US 20030187643A1
Filed: 03/27/2002
Published: 10/02/2003
Est. Priority Date: 03/27/2002
Status: Active Grant

Abstract

A speech recognition system provides a subword decoder and a dictionary lookup to process a spoken input. In a first stage of processing, the subword decoder decodes the speech input based on subword units or particles and identifies hypothesized subword sequences using a particle dictionary and particle language model, but independently of a word dictionary or word vocabulary. Further stages of processing involve a particle to word graph expander and a word decoder. The particle to word graph expander expands the subword representation produced by the subword decoder into a word graph of word candidates using a word dictionary. The word decoder uses the word dictionary and a word language model to determine a best sequence of word candidates from the word graph that is most likely to match the words of the spoken input.
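The first-stage output described in the abstract can be pictured as a flat stream of particle tokens interleaved with end of word markers. The following is a minimal sketch, not the patented implementation: the marker symbol `<w>`, the underscore-joined particle spelling, and the function name `split_hypothesized_sequences` are all assumptions made for illustration.

```python
from typing import List

END_OF_WORD = "<w>"  # hypothetical marker symbol; the source does not name one


def split_hypothesized_sequences(tokens: List[str]) -> List[List[str]]:
    """Group a flat particle stream into hypothesized subword sequences.

    Each end of word marker closes the sequence collected so far.
    """
    sequences: List[List[str]] = []
    current: List[str] = []
    for token in tokens:
        if token == END_OF_WORD:
            if current:  # ignore repeated or leading markers
                sequences.append(current)
                current = []
        else:
            current.append(token)
    if current:  # trailing particles with no closing marker
        sequences.append(current)
    return sequences


# Assumed example: particles written as underscore-joined phonemes.
decoded = ["HH_AH", "L_OW", "<w>", "W_ER", "L_D", "<w>"]
print(split_hypothesized_sequences(decoded))
```

Each inner list is one hypothesized subword sequence that the later stages would try to match against dictionary words.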

15 Claims

1. A method for detecting an input sequence of input words in a spoken input, comprising computer implemented steps of:
- generating a subword representation of the spoken input, the subword representation including (i) subword unit tokens based on the spoken input and (ii) end of word markers that identify boundaries of hypothesized subword sequences that potentially match the input words in the spoken input;
- expanding the subword representation into a word graph of word candidates for the input words in the spoken input, each word candidate being phonetically similar to one of the hypothesized subword sequences; and
- determining a preferred sequence of word candidates based on the word graph, the preferred sequence of word candidates representing a most likely match to the spoken sequence of the input words.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein the step of generating the subword representation includes:
    - identifying the subword unit tokens based on the spoken input to produce the subword representation of the spoken input, and inserting end of word markers in the subword representation, each end of word marker indicating each terminating subword unit token that identifies an end of a hypothesized subword sequence.
  - 3. The method of claim 2, wherein the step of identifying the subword unit tokens includes determining the subword unit tokens based on a subword unit dictionary and a subword unit language model.
  - 4. The method of claim 3, wherein the subword unit language model is a statistical language model.
  - 5. The method of claim 1, wherein the step of expanding the subword representation into the word graph includes:
    - generating a sequence of phonemes by expanding the subword unit tokens in the subword representation, the sequence of phonemes including end of word delimiters, each end of word delimiter based on the respective end of word marker in the respective subword representation and each end of word delimiter indicating a termination of a word phoneme string within the sequence of phonemes; and
    - expanding each word phoneme string into a list of phonetically similar word candidates based on a word vocabulary to form the word graph.
  - 6. The method of claim 1, wherein the step of determining the preferred sequence of word candidates includes decoding the word graph using a word decoder and a language model based on a word vocabulary.
  - 7. The method of claim 1, wherein the subword unit tokens are particles, each particle including at least one phoneme.

8. A speech detection system for detecting an input sequence of input words in a spoken input, the system comprising:
- a subword decoder for generating a subword representation of the spoken input, the subword representation including (i) subword unit tokens based on the spoken input and (ii) end of word markers that identify boundaries of hypothesized subword sequences that potentially match the input words in the spoken input; and
- a dictionary lookup module for expanding the subword representation into a word graph of word candidates for the input words in the spoken input, each word candidate being phonetically similar to one of the hypothesized subword sequences, the dictionary lookup determining a preferred sequence of word candidates based on the word graph, the preferred sequence of word candidates representing a most likely match to the spoken sequence of the input words.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The speech detection system of claim 8, wherein the subword decoder:
    - identifies the subword unit tokens based on the spoken input to produce the subword representation of the spoken input, and inserts end of word markers in the subword representation, each end of word marker indicating each terminating subword unit token that identifies an end of each hypothesized subword sequence.
  - 10. The speech detection system of claim 9, wherein the subword decoder determines the subword unit tokens based on a subword unit dictionary and a subword unit language model.
  - 11. The speech detection system of claim 10, wherein the subword unit language model is a statistical language model.
  - 12. The speech detection system of claim 8, wherein the dictionary lookup module expands the subword representation into the word graph by:
    - generating a sequence of phonemes by expanding the subword unit tokens in the subword representation, the sequence of phonemes including end of word delimiters, each end of word delimiter based on the respective end of word marker in the respective subword representation and each end of word delimiter indicating a termination of a word phoneme string within the sequence of phonemes; and
    - expanding each word phoneme string into a list of phonetically similar word candidates based on a word vocabulary to form the word graph.
  - 13. The speech detection system of claim 8, wherein the dictionary lookup module determines the preferred sequence of word candidates by decoding the word graph using a word decoder and a word language model based on a word vocabulary.
  - 14. The speech detection system of claim 8, wherein the subword unit tokens are particles, each particle including at least one phoneme.
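The word decoding recited in claims 6 and 13 can be sketched as a best-path search over the word graph scored by a word language model. This is an illustrative sketch under stated assumptions, not the patented decoder: the word graph is simplified to one candidate list per word position, the language model is a toy bigram table of log-probabilities with a fixed penalty for unseen pairs, and the names `best_word_sequence`, `BIGRAMS`, and `unseen_logprob` are hypothetical.

```python
from typing import Dict, List, Tuple

Bigram = Tuple[str, str]


def best_word_sequence(word_graph: List[List[str]],
                       bigram_logprob: Dict[Bigram, float],
                       unseen_logprob: float = -10.0) -> List[str]:
    """Viterbi search over a linear word graph using bigram log-probabilities."""
    # best maps the last word of a partial path to (path score, path so far).
    best: Dict[str, Tuple[float, List[str]]] = {"<s>": (0.0, [])}
    for candidates in word_graph:
        if not candidates:  # no candidate survived lookup; skip this position
            continue
        step: Dict[str, Tuple[float, List[str]]] = {}
        for word in candidates:
            # Keep only the highest-scoring way of reaching this word.
            step[word] = max(
                (score + bigram_logprob.get((prev, word), unseen_logprob),
                 path + [word])
                for prev, (score, path) in best.items()
            )
        best = step
    return max(best.values())[1]


# Toy bigram log-probabilities (assumed values, for illustration only).
BIGRAMS: Dict[Bigram, float] = {
    ("<s>", "hello"): -1.0,
    ("<s>", "hollow"): -4.0,
    ("hello", "world"): -1.0,
    ("hello", "word"): -3.0,
    ("hello", "whirled"): -6.0,
    ("hollow", "world"): -5.0,
}

graph = [["hello", "hollow"], ["whirled", "world", "word"]]
print(best_word_sequence(graph, BIGRAMS))
```

A fuller decoder would likely also weigh how closely each candidate matched its phoneme string; that score is omitted here to keep the sketch short.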

15. A computer program product comprising:
- a computer usable medium for detecting an input sequence of input words in a spoken input; and
- a set of computer program instructions embodied on the computer usable medium, including instructions to:
  - generate a subword representation of the spoken input, the subword representation including (i) subword unit tokens based on the spoken input and (ii) end of word markers that identify boundaries of hypothesized subword sequences that potentially match the input words in the spoken input;
  - expand the subword representation into a word graph of word candidates for the input words in the spoken input, each word candidate being phonetically similar to one of the hypothesized subword sequences; and
  - determine a preferred sequence of word candidates based on the word graph, the preferred sequence of word candidates representing a most likely match to the spoken sequence of the input words.

Current Assignee: Hewlett-Packard Development Company, L.P. (HP Inc.)
Original Assignee: Compaq Computer Corporation (HP Inc.)
Inventors: Whittaker, Edward; Van Thong, Jean-Manuel; Moreno, Pedro

Granted Patent: US 7,181,398 B2
US Class Current: 704/254
CPC Class Codes:
- G10L 15/02 Feature extraction for spee...
- G10L 15/08 Speech classification or se...
