GENERATING REPRESENTATIONS OF ACOUSTIC SEQUENCES
First Claim
1. A method comprising:
- receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps;
- processing the acoustic feature representation at an initial time step using an acoustic modeling neural network to generate an output for the initial time step;
- for each subsequent time step of the plurality of time steps:
  - receiving the acoustic representation for the time step,
  - receiving an output generated by the acoustic modeling neural network for a preceding time step,
  - generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and
  - processing the modified input using the acoustic modeling neural network to generate an output for the time step; and
- generating a phoneme representation for the utterance from the outputs for each of the time steps.
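The steps of claim 1 can be read as a recurrence. The notation below is an illustrative assumption and does not appear in the claim: x_t is the acoustic feature representation at time step t, f is the acoustic modeling neural network, g is whatever operation builds the modified input, h is whatever operation maps the per-step outputs to a phoneme representation, and T is the number of time steps.

$$
\begin{aligned}
y_1 &= f(x_1), \\
\tilde{x}_t &= g(y_{t-1},\, x_t), \qquad t = 2, \dots, T, \\
y_t &= f(\tilde{x}_t), \qquad\quad\;\; t = 2, \dots, T, \\
p &= h(y_1, \dots, y_T).
\end{aligned}
$$

The claim does not fix the form of g or h; it only requires that the modified input be generated from both the preceding output and the current acoustic representation.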
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representations of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.
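The feedback loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: `toy_acoustic_model` is a hypothetical stand-in for the acoustic modeling neural network, the modified input is assumed to be a simple concatenation of the current features and the preceding output, and the phoneme representation is assumed to be the highest-scoring phoneme index at each time step. None of these choices is specified in the text above.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def toy_acoustic_model(inputs: Sequence[float], num_phonemes: int = 3) -> Vector:
    """Hypothetical stand-in for the acoustic modeling neural network.

    Maps an input vector of any length to a softmax distribution over
    `num_phonemes` classes using fixed, made-up weights.
    """
    logits = [
        sum(x * (((i + k) % 3) - 1) for i, x in enumerate(inputs))
        for k in range(num_phonemes)
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_modified_input(prev_output: Sequence[float], features: Sequence[float]) -> Vector:
    """Assumed combination rule: concatenate current features and preceding output."""
    return list(features) + list(prev_output)


def run_with_feedback(
    acoustic_sequence: Sequence[Sequence[float]],
    model: Callable[[Sequence[float]], Vector] = toy_acoustic_model,
) -> List[Vector]:
    """Process each time step, feeding the preceding output back into the input."""
    outputs: List[Vector] = []
    for t, features in enumerate(acoustic_sequence):
        if t == 0:
            # Initial time step: the model sees the acoustic features only.
            model_input = list(features)
        else:
            # Subsequent steps: build the modified input from the preceding output.
            model_input = make_modified_input(outputs[-1], features)
        outputs.append(model(model_input))
    return outputs


def phoneme_representation(outputs: Sequence[Sequence[float]]) -> List[int]:
    """Assumed decoding rule: pick the highest-scoring phoneme index per step."""
    return [max(range(len(o)), key=o.__getitem__) for o in outputs]
```

For example, `phoneme_representation(run_with_feedback([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]))` returns one phoneme index per time step. A real system would replace the toy model with a trained network and would likely use a fixed-size input at every step, for instance by padding the initial step.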
20 Claims
1. A method comprising:
- receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps;
- processing the acoustic feature representation at an initial time step using an acoustic modeling neural network to generate an output for the initial time step;
- for each subsequent time step of the plurality of time steps:
  - receiving the acoustic representation for the time step,
  - receiving an output generated by the acoustic modeling neural network for a preceding time step,
  - generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and
  - processing the modified input using the acoustic modeling neural network to generate an output for the time step; and
- generating a phoneme representation for the utterance from the outputs for each of the time steps.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising:
- receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps;
- processing the acoustic feature representation at an initial time step using an acoustic modeling neural network to generate an output for the initial time step;
- for each subsequent time step of the plurality of time steps:
  - receiving the acoustic representation for the time step,
  - receiving an output generated by the acoustic modeling neural network for a preceding time step,
  - generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and
  - processing the modified input using the acoustic modeling neural network to generate an output for the time step; and
- generating a phoneme representation for the utterance from the outputs for each of the time steps.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A computer storage medium encoded with a computer program, the computer program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
- receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps;
- processing the acoustic feature representation at an initial time step using an acoustic modeling neural network to generate an output for the initial time step;
- for each subsequent time step of the plurality of time steps:
  - receiving the acoustic representation for the time step,
  - receiving an output generated by the acoustic modeling neural network for a preceding time step,
  - generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and
  - processing the modified input using the acoustic modeling neural network to generate an output for the time step; and
- generating a phoneme representation for the utterance from the outputs for each of the time steps.
Dependent claims: 18, 19, 20
Specification