Method for Automated Training of a Plurality of Artificial Neural Networks

US 20100217589A1
Filed: 02/17/2010
Published: 08/26/2010
Est. Priority Date: 02/20/2009
Status: Active Grant

First Claim

1. A computer implemented method, operational on at least one processor, for automated training of a plurality of artificial neural networks for phoneme recognition using training data, wherein the training data comprises speech signals subdivided into frames, each frame associated with a phoneme label, wherein the phoneme label indicates a phoneme associated with the frame, the method comprising:

a computer process for providing a sequence of frames from the training data, wherein the number of frames in the sequence of frames is at least equal to the number of artificial neural networks;

a computer process for assigning to each of the artificial neural networks a different subsequence of the provided sequence, wherein each subsequence comprises a predetermined number of frames;

a computer process for determining a common phoneme label for the sequence of frames based on the phoneme labels of one or more frames of one or more subsequences of the provided sequence; and

a computer process for training each artificial neural network using the common phoneme label.

Abstract

The invention provides a method for automated training of a plurality of artificial neural networks for phoneme recognition using training data, wherein the training data comprises speech signals subdivided into frames, each frame associated with a phoneme label, wherein the phoneme label indicates a phoneme associated with the frame. A sequence of frames from the training data is provided, wherein the number of frames in the sequence of frames is at least equal to the number of artificial neural networks. Each of the artificial neural networks is assigned a different subsequence of the provided sequence, wherein each subsequence comprises a predetermined number of frames. A common phoneme label for the sequence of frames is determined based on the phoneme labels of one or more frames of one or more subsequences of the provided sequence. Each artificial neural network is then trained using the common phoneme label.
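
As a rough illustration of the data preparation summarized in the abstract, the sketch below splits a provided sequence of labeled frames into adjacent, equal-length subsequences (one per network) and uses the label of the central frame of the central subsequence as the common label. The function names, the adjacent non-overlapping layout, and the central-frame rule are assumptions made for illustration; the claims also permit overlapping or separated subsequences and other labelling rules.

```python
from typing import List, Sequence, Tuple

Frame = Tuple[List[float], str]  # (feature vector, phoneme label)


def assign_subsequences(frames: Sequence[Frame], num_networks: int,
                        frames_per_network: int) -> List[List[Frame]]:
    """Give each network its own adjacent, non-overlapping window of frames."""
    needed = num_networks * frames_per_network
    if len(frames) < needed:
        raise ValueError(f"need at least {needed} frames, got {len(frames)}")
    return [list(frames[i * frames_per_network:(i + 1) * frames_per_network])
            for i in range(num_networks)]


def common_label(subsequences: List[List[Frame]]) -> str:
    """Assumed rule: label of the central frame of the central subsequence."""
    central = subsequences[len(subsequences) // 2]
    return central[len(central) // 2][1]


def training_examples(frames, num_networks, frames_per_network):
    """Pair every network's window with the same (common) target label."""
    subs = assign_subsequences(frames, num_networks, frames_per_network)
    label = common_label(subs)
    return [([vec for vec, _ in sub], label) for sub in subs]


# Toy data: 9 frames with 1-dimensional "features", 3 networks, 3 frames each.
labels = ["s", "s", "s", "ih", "ih", "ih", "ih", "t", "t"]
frames = [([float(i)], lab) for i, lab in enumerate(labels)]
examples = training_examples(frames, num_networks=3, frames_per_network=3)
```

Each entry of `examples` is a (window, label) pair for one network. All three networks receive the same target even though the first window contains only "s" frames, which is the point of using a common label.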

28 Claims

1. A computer implemented method, operational on at least one processor, for automated training of a plurality of artificial neural networks for phoneme recognition using training data, wherein the training data comprises speech signals subdivided into frames, each frame associated with a phoneme label, wherein the phoneme label indicates a phoneme associated with the frame, the method comprising:
- a computer process for providing a sequence of frames from the training data, wherein the number of frames in the sequence of frames is at least equal to the number of artificial neural networks;
  
  a computer process for assigning to each of the artificial neural networks a different subsequence of the provided sequence, wherein each subsequence comprises a predetermined number of frames;
  
  a computer process for determining a common phoneme label for the sequence of frames based on the phoneme labels of one or more frames of one or more subsequences of the provided sequence; and
  
  a computer process for training each artificial neural network using the common phoneme label.
  - 2. The computer implemented method according to claim 1, wherein one or more frames of the provided sequence of frames are part of one, more than one or no subsequence assigned to an artificial neural network.
  - 3. The computer implemented method according to claim 1, wherein the concatenation or combination of the subsequences corresponds to the provided sequence of frames or wherein the concatenation or combination of the subsequences corresponds to a sequence of frames different from the provided sequence of frames.
  - 4. The computer implemented method according to claim 1, wherein the common phoneme label is based on one or more phoneme labels of the frames from one or more central subsequences or artificial neural networks.
  - 5. The computer implemented method according to claim 1, wherein the predetermined number of frames is the same for each subsequence of frames.
  - 6. The computer implemented method according to claim 1, wherein the frames of the provided sequence of frames are subsequent or adjacent in time and/or wherein the subsequences of the provided sequence are subsequent or adjacent in time.
  - 7. The computer implemented method according to claim 1, wherein the plurality of artificial neural networks comprises two subsets, wherein a subsequence assigned to an artificial neural network of a second subset comprises at least one frame which is also part of a subsequence assigned to an artificial neural network of a first subset.
  - 8. The computer implemented method according to claim 1, wherein the subsequences are separated from each other in time, in particular, separated by one or more frames comprised in the provided sequence of frames.
  - 9. The computer implemented method according to claim 8, wherein the plurality of artificial neural networks comprises two subsets, wherein subsequences assigned to artificial neural networks of a first subset are separated from each other in time, and wherein each of the subsequences assigned to artificial neural networks of a second subset comprises at least one frame separating two subsequences assigned to artificial neural networks of the first subset.
  - 10. The computer implemented method according to claim 1, wherein the predetermined number of frames corresponds to the number of frames, which yields, when using only one artificial neural network for phoneme recognition, a predetermined phoneme recognition accuracy, in particular according to a predetermined criterion.
  - 11. The computer implemented method according to claim 10, wherein the predetermined phoneme recognition accuracy corresponds to the maximum phoneme recognition accuracy as a function of the number of frames.
  - 12. The computer implemented method according to claim 1, wherein the predetermined number of frames corresponds to the average phoneme length in the training data, wherein the average is calculated using all phonemes of the training data.
  - 13. The computer implemented method according to claim 1, wherein for each frame of each subsequence of frames a feature vector is provided, in particular comprising a predetermined number of Mel Frequency Cepstral Coefficients.
  - 14. The computer implemented method according to claim 1, further comprising:
    - a computer process for receiving a sequence of frames from a speech signal;
      
      a computer process for assigning to each of the artificial neural networks a different subsequence of the received sequence, wherein each subsequence comprises a predetermined number of frames; and
      
      a computer process for combining the output of the artificial neural networks for estimating posterior probabilities of phonemes, phoneme classes and/or phoneme states.
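
The recognition step of claim 14 can be sketched as follows: each network receives its own window of the incoming frames, and the per-network outputs are combined into one posterior estimate. The claim does not say how the outputs are combined, so the averaging rule, the function names, and the fixed-output stand-in networks below are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence

Posterior = Dict[str, float]
Network = Callable[[Sequence[List[float]]], Posterior]


def combine_posteriors(outputs: List[Posterior]) -> Posterior:
    """Average per-network posteriors and renormalise (assumed combiner)."""
    phonemes = sorted({p for out in outputs for p in out})
    mean = {p: sum(out.get(p, 0.0) for out in outputs) / len(outputs)
            for p in phonemes}
    total = sum(mean.values())
    return {p: v / total for p, v in mean.items()}


def recognise(frames: Sequence[List[float]], networks: List[Network],
              frames_per_network: int) -> Posterior:
    """Feed each network its own adjacent window, then combine the outputs."""
    outputs = []
    for i, net in enumerate(networks):
        window = frames[i * frames_per_network:(i + 1) * frames_per_network]
        outputs.append(net(window))
    return combine_posteriors(outputs)


# Stand-ins for trained networks: each returns a fixed distribution.
networks = [
    lambda w: {"ih": 0.6, "s": 0.4},
    lambda w: {"ih": 0.8, "t": 0.2},
    lambda w: {"ih": 0.4, "t": 0.6},
]
frames = [[float(i)] for i in range(9)]
posterior = recognise(frames, networks, frames_per_network=3)
best = max(posterior, key=posterior.get)
```

A trained merging network or a product of posteriors would fit the same interface; only `combine_posteriors` would need to change.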

15. A computer program product including a computer readable storage medium having computer executable code thereon for automated training of a plurality of artificial neural networks for phoneme recognition using training data, wherein the training data comprises speech signals subdivided into frames, each frame associated with a phoneme label, wherein the phoneme label indicates a phoneme associated with the frame, the computer code comprising:
- computer code for providing a sequence of frames from the training data, wherein the number of frames in the sequence of frames is at least equal to the number of artificial neural networks;
  
  computer code for assigning to each of the artificial neural networks a different subsequence of the provided sequence, wherein each subsequence comprises a predetermined number of frames;
  
  computer code for determining a common phoneme label for the sequence of frames based on the phoneme labels of one or more frames of one or more subsequences of the provided sequence; and
  
  computer code for training each artificial neural network using the common phoneme label.
  - 16. The computer program product according to claim 15, wherein one or more frames of the provided sequence of frames are part of one, more than one or no subsequence assigned to an artificial neural network.
  - 17. The computer program product according to claim 15, wherein the concatenation or combination of the subsequences corresponds to the provided sequence of frames or wherein the concatenation or combination of the subsequences corresponds to a sequence of frames different from the provided sequence of frames.
  - 18. The computer program product according to claim 15, wherein the common phoneme label is based on one or more phoneme labels of the frames from one or more central subsequences or artificial neural networks, in particular, on a central frame of a central subsequence.
  - 19. The computer program product according to claim 15, wherein the predetermined number of frames is the same for each subsequence of frames.
  - 20. The computer program product according to claim 15, wherein the frames of the provided sequence of frames are subsequent or adjacent in time and/or wherein the subsequences of the provided sequence are subsequent or adjacent in time.
  - 21. The computer program product according to claim 15, wherein the plurality of artificial neural networks comprises two subsets, wherein a subsequence assigned to an artificial neural network of a second subset comprises at least one frame which is also part of a subsequence assigned to an artificial neural network of a first subset.
  - 22. The computer program product according to claim 15, wherein the subsequences are separated from each other in time, in particular, separated by one or more frames comprised in the provided sequence of frames.
  - 23. The computer program product according to claim 22, wherein the plurality of artificial neural networks comprises two subsets, wherein subsequences assigned to artificial neural networks of a first subset are separated from each other in time, and wherein each of the subsequences assigned to artificial neural networks of a second subset comprises at least one frame separating two subsequences assigned to artificial neural networks of the first subset.
  - 24. The computer program product according to claim 15, wherein the predetermined number of frames corresponds to the number of frames, which yields, when using only one artificial neural network for phoneme recognition, a predetermined phoneme recognition accuracy, in particular according to a predetermined criterion.
  - 25. The computer program product according to claim 24, wherein the predetermined phoneme recognition accuracy corresponds to the maximum phoneme recognition accuracy as a function of the number of frames.
  - 26. The computer program product according to claim 15, wherein the predetermined number of frames corresponds to the average phoneme length in the training data, wherein the average is calculated using all phonemes of the training data.
  - 27. The computer program product according to claim 15, wherein for each frame of each subsequence of frames a feature vector is provided, in particular comprising a predetermined number of Mel Frequency Cepstral Coefficients.
  - 28. A computer program product according to claim 15, further comprising:
    - computer code for receiving a sequence of frames from a speech signal;
      
      computer code for assigning to each of the artificial neural networks a different subsequence of the received sequence, wherein each subsequence comprises a predetermined number of frames; and
      
      computer code for combining the output of the artificial neural networks for estimating posterior probabilities of phonemes, phoneme classes and/or phoneme states.
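
Claims 12 and 26 tie the predetermined number of frames per subsequence to the average phoneme length in the training data. A minimal sketch of that calculation is shown below, assuming that each run of identical consecutive frame labels is one phoneme segment (so two back-to-back instances of the same phoneme would be counted as one) and that the average is rounded to a whole number of frames. The function names are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence


def phoneme_run_lengths(frame_labels: Sequence[str]) -> List[int]:
    """Length, in frames, of each run of identical consecutive labels."""
    return [sum(1 for _ in run) for _, run in groupby(frame_labels)]


def average_phoneme_length(utterances: Sequence[Sequence[str]]) -> float:
    """Mean run length over all phoneme segments of all utterances."""
    lengths = [n for labels in utterances for n in phoneme_run_lengths(labels)]
    if not lengths:
        raise ValueError("no labelled frames given")
    return sum(lengths) / len(lengths)


# Toy training set: two utterances given as per-frame phoneme labels.
utterances = [
    ["s", "s", "s", "ih", "ih", "ih", "ih", "t", "t"],   # runs of 3, 4, 2
    ["ae", "ae", "ae", "ae", "t", "t"],                  # runs of 4, 2
]
avg = average_phoneme_length(utterances)
# Assumed rounding rule: nearest whole frame, never fewer than one.
frames_per_network = max(1, round(avg))
```

With the toy data the five segments total 15 frames, so the average is 3 frames and each network would be given 3-frame subsequences.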

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Vasquez, Daniel; Aradilla, Guillermo; Gruhn, Rainer

Granted Patent: US 8,554,555 B2
US Class Current: 704/232
CPC Class Codes:
G06N 3/045   Combinations of networks
G10L 15/063   Training
G10L 15/16   using artificial neural net...
