GENERATING REPRESENTATIONS OF ACOUSTIC SEQUENCES

US 20150170640A1
Filed: 12/03/2014
Published: 06/18/2015
Est. Priority Date: 12/17/2013
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representations of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence representing an utterance and comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.
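The loop described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: `toy_network` stands in for the acoustic modeling neural network, the initial time step is handled by zero-padding the missing previous output (an assumption; the abstract only says the initial frame is processed by the network), and the phoneme representation is taken as a per-frame best label, which is a simplification of what a real recognizer would do.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Turn raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_network(weights: Sequence[Sequence[float]]) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for the acoustic model: one linear layer + softmax."""
    def forward(x: Vector) -> Vector:
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
        return softmax(logits)
    return forward


def run_with_output_feedback(
    features: Sequence[Vector],
    network: Callable[[Vector], Vector],
    num_classes: int,
) -> List[Vector]:
    """Feed each frame to the network together with the previous output."""
    outputs: List[Vector] = []
    # Assumption: the initial time step has no preceding output, so zeros are used.
    previous = [0.0] * num_classes
    for frame in features:
        modified_input = list(frame) + previous  # features + preceding output
        current = network(modified_input)
        outputs.append(current)
        previous = current
    return outputs


def phoneme_representation(outputs: Sequence[Vector], labels: Sequence[str]) -> List[str]:
    """Assumption: pick the highest-scoring label independently at each time step."""
    return [labels[max(range(len(o)), key=o.__getitem__)] for o in outputs]


if __name__ == "__main__":
    # Two-dimensional features and three output classes, so each weight row has 2 + 3 entries.
    net = toy_network([
        [0.1, 0.2, 0.0, 0.0, 0.0],
        [0.0, 0.1, 0.3, 0.0, 0.0],
        [0.2, 0.0, 0.0, 0.0, 0.5],
    ])
    frames = [[1.0, 2.0], [0.5, -1.0], [0.0, 0.3]]
    scores = run_with_output_feedback(frames, net, num_classes=3)
    print(phoneme_representation(scores, ["a", "b", "c"]))
```

Because the network sees its own preceding output, the input width is the feature width plus the number of output classes, which is why each weight row in the usage example has five entries.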

20 Claims

1. A method comprising:
- receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps;
  
  processing the acoustic feature representation at an initial time step using an acoustic modeling neural network to generate an output for the initial time step;
  
  for each subsequent time step of the plurality of time steps:
  
  receiving the acoustic representation for the time step,
  
  receiving an output generated by the acoustic modeling neural network for a preceding time step,
  
  generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and
  
  processing the modified input using the acoustic modeling neural network to generate an output for the time step; and
  
  generating a phoneme representation for the utterance from the outputs for each of the time steps.
- Dependent claims (2, 3, 4, 5, 6, 7, 8)
- - 2. The method of claim 1, wherein the acoustic modeling neural network is a feed-forward neural network.
  - 3. The method of claim 1, wherein the acoustic modeling neural network is a recurrent neural network.
  - 4. The method of claim 3, wherein the acoustic modeling neural network is a long short-term memory (LSTM) neural network.
  - 5. The method of claim 1, wherein the output generated by the acoustic modeling neural network for each time step is a set of scores for a set of phonemes or phoneme subdivisions, wherein the score for each phoneme or phoneme subdivision represents a likelihood that the phoneme or phoneme subdivision is a representation of the utterance at the time step.
  - 6. The method of claim 5, wherein generating the modified input comprises appending the set of scores for the preceding time step to the acoustic feature representation for the time step.
  - 7. The method of claim 5, wherein generating the modified input comprises appending data identifying a highest-scoring phoneme or phoneme subdivision according to the set of scores for the preceding time step to the acoustic feature representation for the time step.
  - 8. The method of claim 5, wherein the set of scores defines a probability distribution over a set of Hidden Markov Model (HMM) states.
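Claims 6 and 7 describe two ways of building the modified input from the preceding scores. The sketch below shows one possible reading of each. The function names are hypothetical, and encoding the "data identifying a highest-scoring phoneme" as a one-hot vector is an assumption; the claim does not specify the encoding.

```python
from typing import List, Sequence


def append_scores(frame: Sequence[float], prev_scores: Sequence[float]) -> List[float]:
    """Claim 6 reading: append the full set of preceding scores to the features."""
    return list(frame) + list(prev_scores)


def append_top_label(frame: Sequence[float], prev_scores: Sequence[float]) -> List[float]:
    """Claim 7 reading: append only the identity of the highest-scoring entry.

    Assumption: the identity is encoded as a one-hot vector.
    """
    best = max(range(len(prev_scores)), key=prev_scores.__getitem__)
    one_hot = [1.0 if i == best else 0.0 for i in range(len(prev_scores))]
    return list(frame) + one_hot


if __name__ == "__main__":
    frame = [0.5, -1.0]
    prev = [0.1, 0.7, 0.2]
    print(append_scores(frame, prev))     # [0.5, -1.0, 0.1, 0.7, 0.2]
    print(append_top_label(frame, prev))  # [0.5, -1.0, 0.0, 1.0, 0.0]
```

The first variant passes on the full distribution, including the model's uncertainty; the second passes on only the most likely label from the preceding step.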

9. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising:
- receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps;
  
  processing the acoustic feature representation at an initial time step using an acoustic modeling neural network to generate an output for the initial time step;
  
  for each subsequent time step of the plurality of time steps:
  
  receiving the acoustic representation for the time step,
  
  receiving an output generated by the acoustic modeling neural network for a preceding time step,
  
  generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and
  
  processing the modified input using the acoustic modeling neural network to generate an output for the time step; and
  
  generating a phoneme representation for the utterance from the outputs for each of the time steps.
- Dependent claims (10, 11, 12, 13, 14, 15, 16)
- - 10. The system of claim 9, wherein the acoustic modeling neural network is a feed-forward neural network.
  - 11. The system of claim 9, wherein the acoustic modeling neural network is a recurrent neural network.
  - 12. The system of claim 11, wherein the acoustic modeling neural network is a long short-term memory (LSTM) neural network.
  - 13. The system of claim 9, wherein the output generated by the acoustic modeling neural network for each time step is a set of scores for a set of phonemes or phoneme subdivisions, wherein the score for each phoneme or phoneme subdivision represents a likelihood that the phoneme or phoneme subdivision is a representation of the utterance at the time step.
  - 14. The system of claim 13, wherein generating the modified input comprises appending the set of scores for the preceding time step to the acoustic feature representation for the time step.
  - 15. The system of claim 13, wherein generating the modified input comprises appending data identifying a highest-scoring phoneme or phoneme subdivision according to the set of scores for the preceding time step to the acoustic feature representation for the time step.
  - 16. The system of claim 13, wherein the set of scores defines a probability distribution over a set of Hidden Markov Model (HMM) states.

17. A computer storage medium encoded with a computer program, the computer program comprising instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising:
- receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps;
  
  processing the acoustic feature representation at an initial time step using an acoustic modeling neural network to generate an output for the initial time step;
  
  for each subsequent time step of the plurality of time steps:
  
  receiving the acoustic representation for the time step,
  
  receiving an output generated by the acoustic modeling neural network for a preceding time step,
  
  generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and
  
  processing the modified input using the acoustic modeling neural network to generate an output for the time step; and
  
  generating a phoneme representation for the utterance from the outputs for each of the time steps.
- Dependent claims (18, 19, 20)
- - 18. The computer storage medium of claim 17, wherein the output generated by the acoustic modeling neural network for each time step is a set of scores for a set of phonemes or phoneme subdivisions, wherein the score for each phoneme or phoneme subdivision represents a likelihood that the phoneme or phoneme subdivision is a representation of the utterance at the time step.
  - 19. The computer storage medium of claim 18, wherein generating the modified input comprises appending the set of scores for the preceding time step to the acoustic feature representation for the time step.
  - 20. The computer storage medium of claim 18, wherein generating the modified input comprises appending data identifying a highest-scoring phoneme or phoneme subdivision according to the set of scores for the preceding time step to the acoustic feature representation for the time step.

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Sak, Hasim; Senior, Andrew W.

Granted Patent

US 9,721,562 B2
US Class Current

1/1
CPC Class Codes

G10L 15/02   Feature extraction for spee...

G10L 15/142   Hidden Markov Models [HMMs]

G10L 15/16   using artificial neural net...

G10L 2015/025   Phonemes, fenemes or fenone...
