CONTEXT-DEPENDENT STATE TYING USING A NEURAL NETWORK
First Claim
1. A computer-implemented method comprising:
receiving an audio signal encoding a portion of an utterance;
providing, to a first neural network, data corresponding to the audio signal; and
generating, by a processor, data representing a transcription for the utterance based on an output of the first neural network, wherein the first neural network is trained using features of multiple context-dependent states, the context-dependent states being derived from a plurality of context-independent states provided by a second neural network.
Abstract
The technology described herein can be embodied in a method that includes receiving an audio signal encoding a portion of an utterance, and providing, to a first neural network, data corresponding to the audio signal. The method also includes generating, by a processor, data representing a transcription for the utterance based on an output of the first neural network. The first neural network is trained using features of multiple context-dependent states, the context-dependent states being derived from a plurality of context-independent states provided by a second neural network.
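The pipeline the abstract describes can be sketched in code. The following is an illustrative sketch only, not the patent's actual implementation: a stand-in "second" network produces context-independent (CI) state posteriors, contexts with similar CI-posterior profiles are tied into shared context-dependent (CD) states, and those CD state IDs would serve as training targets for the "first" network. All names, the linear stand-in network, and the greedy distance-threshold tying rule are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CI = 4   # context-independent states (e.g., monophone HMM states)
DIM = 8    # acoustic feature dimension

# Stand-in for the second neural network: a fixed linear map from
# acoustic frames to CI-state posteriors.
W2 = rng.normal(size=(DIM, N_CI))

def second_network(frames):
    logits = frames @ W2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)  # CI posteriors per frame

# Each context is a (left, center, right) phone triple; frames observed
# in that context are summarized by their mean CI posterior vector.
contexts = [("a", "x", "b"), ("a", "x", "c"), ("d", "x", "b"), ("d", "y", "c")]
profiles = {ctx: second_network(rng.normal(size=(20, DIM))).mean(axis=0)
            for ctx in contexts}

def tie_states(profiles, threshold=0.1):
    """Greedy tying: a context whose CI-posterior profile lies within
    `threshold` (Euclidean) of an existing cluster centroid shares that
    cluster's CD state; otherwise a new CD state is created."""
    centroids, mapping = [], {}
    for ctx, p in profiles.items():
        for sid, c in enumerate(centroids):
            if np.linalg.norm(p - c) < threshold:
                mapping[ctx] = sid
                break
        else:
            mapping[ctx] = len(centroids)
            centroids.append(p)
    return mapping

cd_state_of = tie_states(profiles)
# The first network would then be trained on (frame, cd_state_of[context])
# pairs rather than on raw CI-state targets.
print(cd_state_of)
```

In a production acoustic model the tying step is typically a decision-tree clustering over many thousands of triphone contexts; the threshold-based grouping above only illustrates that CD states are derived from, and fewer in number than, the raw contexts.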
63 Citations
20 Claims
1. A computer-implemented method comprising:
receiving an audio signal encoding a portion of an utterance;
providing, to a first neural network, data corresponding to the audio signal; and
generating, by a processor, data representing a transcription for the utterance based on an output of the first neural network, wherein the first neural network is trained using features of multiple context-dependent states, the context-dependent states being derived from a plurality of context-independent states provided by a second neural network.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system comprising:
a speech recognition engine comprising a processor, the speech recognition engine configured to:
receive an audio signal encoding a portion of an utterance; and
generate data representing a transcription for the utterance based on an output of a first neural network, which accepts as input data corresponding to the audio signal, wherein the first neural network is trained using features of multiple context-dependent states, the context-dependent states being derived from a plurality of context-independent states provided by a second neural network.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory computer readable storage device encoding one or more computer readable instructions, which upon execution by one or more processors cause operations comprising:
receiving an audio signal encoding a portion of an utterance;
providing, to a first neural network, data corresponding to the audio signal; and
generating data representing a transcription for the utterance based on an output of the first neural network, wherein the first neural network is trained using features of multiple context-dependent states, the context-dependent states being derived from a plurality of context-independent states provided by a second neural network.
Dependent claims: 16, 17, 18, 19, 20
Specification