METHOD AND DEVICE FOR ACOUSTIC LANGUAGE MODEL TRAINING

US 20140222417A1
Filed: 12/17/2013
Published: 08/07/2014
Est. Priority Date: 02/01/2013
Status: Active Grant

Abstract

A method and a device for training an acoustic language model, include: conducting word segmentation for training samples in a training corpus using an initial language model containing no word class labels, to obtain initial word segmentation data containing no word class labels; performing word class replacement for the initial word segmentation data containing no word class labels, to obtain first word segmentation data containing word class labels; using the first word segmentation data containing word class labels to train a first language model containing word class labels; using the first language model containing word class labels to conduct word segmentation for the training samples in the training corpus, to obtain second word segmentation data containing word class labels; and in accordance with the second word segmentation data meeting one or more predetermined criteria, using the second word segmentation data containing word class labels to train the acoustic language model.
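
The first three steps of the abstract can be illustrated with a toy sketch. It assumes a unigram language model with add-one smoothing, a dynamic-programming segmenter that picks the highest-probability split, a small hand-segmented seed set for the initial model, and a hypothetical glossary with a `<NAME>` class label. None of these specifics are stated in the text above; they are stand-ins chosen to keep the example short.

```python
import math
from collections import Counter

def train_unigram(segmented):
    """Build a unigram model (token counts, total) from lists of tokens."""
    counts = Counter(tok for sent in segmented for tok in sent)
    return counts, sum(counts.values())

def segment(text, model, max_len=4):
    """Split `text` into the most probable token sequence under `model`."""
    counts, total = model
    vocab = len(counts) + 1

    def logp(word):
        if word in counts:
            return math.log((counts[word] + 1) / (total + vocab))
        # Assumed penalty: unseen strings get less likely as they get longer.
        return math.log(1 / (total + vocab)) - 5.0 * (len(word) - 1)

    # best[i] = (best log-probability of text[:i], start of the last token)
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start][0] + logp(text[start:end])
            if score > best[end][0]:
                best[end] = (score, start)

    tokens, end = [], len(text)
    while end > 0:  # walk the back-pointers from the end of the text
        start = best[end][1]
        tokens.append(text[start:end])
        end = start
    return tokens[::-1]

def replace_classes(tokens, glossary):
    """Swap each token found in the glossary for its word class label."""
    return [glossary.get(tok, tok) for tok in tokens]

# Hypothetical data: seed segmentation, unsegmented corpus, glossary.
seed = [["call", "bob"], ["call", "amy"]]
corpus = ["callbob", "callamy"]
glossary = {"bob": "<NAME>", "amy": "<NAME>"}

initial_model = train_unigram(seed)                       # no class labels
initial = [segment(s, initial_model) for s in corpus]     # step 1
first = [replace_classes(t, glossary) for t in initial]   # step 2
first_model = train_unigram(first)                        # step 3
```

Here `first` holds `["call", "<NAME>"]` for both samples, so the first language model sees the class label twice instead of two rare names once each.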

20 Claims

1. A method of training an acoustic language model, comprising:
- at a device having one or more processors and memory:
  
  conducting word segmentation for training samples in a training corpus using an initial language model containing no word class labels, to obtain initial word segmentation data containing no word class labels;
  
  performing word class replacement for the initial word segmentation data containing no word class labels, to obtain first word segmentation data containing word class labels;
  
  using the first word segmentation data containing word class labels to train a first language model containing word class labels;
  
  using the first language model containing word class labels to conduct word segmentation for the training samples in the training corpus, to obtain second word segmentation data containing word class labels; and
  
  in accordance with the second word segmentation data meeting one or more predetermined criteria, using the second word segmentation data containing word class labels to train the acoustic language model.
- - 2. The method of claim 1, wherein performing word class replacement for the initial word segmentation data containing no word class labels, to obtain first word segmentation data containing word class labels further comprises:
    - identifying, in a classification glossary, respective word class labels for one or more respective words in the initial word segmentation data containing no word class labels; and
      
      replacing the one or more respective words in the initial word segmentation data containing no word class labels with the identified respective word class labels to obtain the first word segmentation data containing word class labels.
  - 3. The method of claim 1, wherein using the first language model containing word class labels to conduct word segmentation for the training samples in the training corpus, to obtain second word segmentation data containing word class labels further comprises:
    - identifying, in a classification glossary, respective word class labels for one or more respective words in the training samples in the training corpus;
      
      replacing the one or more respective words in the training samples with the identified respective word class labels to obtain new training samples containing word class labels; and
      
      conducting word segmentation for the new training samples using the first language model containing word class labels, to obtain the second word segmentation data containing word class labels.
  - 4. The method of claim 3, further comprising:
    - after obtaining the second word segmentation data containing word class labels;
      
      comparing segmentation results of corresponding training samples in the first and the second word segmentation data; and
      
      in accordance with a determination that the first word segmentation data is consistent with the second word segmentation data, approving the second word segmentation data for use in the training of the acoustic language model.
  - 5. The method of claim 4, further comprising:
    - after obtaining the second word segmentation data containing word class labels;
      
      in accordance with a determination that the first word segmentation data is inconsistent with the second word segmentation data, retrain the first language model using the second word segmentation data.
  - 6. The method of claim 5, further comprising:
    - after the first language model is retrained, repeating the word segmentation for the second training sample using the first language model containing word class labels, to obtain revised second word segmentation data; and
      
      in accordance with a determination that the revised second word segmentation data is consistent with the second word segmentation data, approving the revised second word segmentation data for use in the training of the acoustic language model.
  - 7. The method of claim 4, wherein a determining that the first word segmentation data is consistent with the second word segmentation data further comprises a determination that respective word class label replacements in the first word segmentation data are identical to respective word class label replacements in the second word segmentation data.
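
The compare-and-retrain cycle of claims 4 to 6 can be sketched as a loop that stops once two successive segmentations agree. This is a minimal illustration, not the claimed implementation: `train` and `resegment` are hypothetical callables supplied by the caller, the `max_rounds` cap is an added assumption, and plain equality stands in for the consistency determination.

```python
def train_until_consistent(first_data, train, resegment, max_rounds=10):
    """Retrain and re-segment until the segmentation stops changing.

    first_data: segmentation data containing word class labels.
    train(data) -> language model trained on that data.
    resegment(model) -> new segmentation of the same training corpus.
    Returns (approved_data, rounds_used).
    """
    previous = first_data
    for round_no in range(1, max_rounds + 1):
        model = train(previous)        # train or retrain the first model
        current = resegment(model)     # second (or revised) segmentation
        if current == previous:        # consistent: approve for training
            return current, round_no
        previous = current             # inconsistent: retrain on new data
    raise RuntimeError("segmentation did not stabilize")

# Toy stand-ins (hypothetical): the model is just the set of known tokens,
# and the corpus is one sample that always segments the same way.
def toy_train(data):
    return {tok for sent in data for tok in sent}

def toy_resegment(model):
    return [["call", "<NAME>"]]

approved, rounds = train_until_consistent(
    [["ca", "ll", "<NAME>"]], toy_train, toy_resegment
)
```

In this toy run the first comparison fails because `["ca", "ll", "<NAME>"]` differs from `["call", "<NAME>"]`, so the model is retrained once and the data is approved in the second round.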

8. A system for training an acoustic language model, comprising:
- one or more processors; and
  
  memory having instructions stored thereon, the instructions, when executed by the one or more processors, cause the processors to perform operations comprising:
  
  conducting word segmentation for training samples in a training corpus using an initial language model containing no word class labels, to obtain initial word segmentation data containing no word class labels;
  
  performing word class replacement for the initial word segmentation data containing no word class labels, to obtain first word segmentation data containing word class labels;
  
  using the first word segmentation data containing word class labels to train a first language model containing word class labels;
  
  using the first language model containing word class labels to conduct word segmentation for the training samples in the training corpus, to obtain second word segmentation data containing word class labels; and
  
  in accordance with the second word segmentation data meeting one or more predetermined criteria, using the second word segmentation data containing word class labels to train the acoustic language model.
- - 9. The system of claim 8, wherein performing word class replacement for the initial word segmentation data containing no word class labels, to obtain first word segmentation data containing word class labels further comprises:
    - identifying, in a classification glossary, respective word class labels for one or more respective words in the initial word segmentation data containing no word class labels; and
      
      replacing the one or more respective words in the initial word segmentation data containing no word class labels with the identified respective word class labels to obtain the first word segmentation data containing word class labels.
  - 10. The system of claim 8, wherein using the first language model containing word class labels to conduct word segmentation for the training samples in the training corpus, to obtain second word segmentation data containing word class labels further comprises:
    - identifying, in a classification glossary, respective word class labels for one or more respective words in the training samples in the training corpus;
      
      replacing the one or more respective words in the training samples with the identified respective word class labels to obtain new training samples containing word class labels; and
      
      conducting word segmentation for the new training samples using the first language model containing word class labels, to obtain the second word segmentation data containing word class labels.
  - 11. The system of claim 10, wherein the operations further comprise:
    - after obtaining the second word segmentation data containing word class labels;
      
      comparing segmentation results of corresponding training samples in the first and the second word segmentation data; and
      
      in accordance with a determination that the first word segmentation data is consistent with the second word segmentation data, approving the second word segmentation data for use in the training of the acoustic language model.
  - 12. The system of claim 11, wherein the operations further comprise:
    - after obtaining the second word segmentation data containing word class labels;
      
      in accordance with a determination that the first word segmentation data is inconsistent with the second word segmentation data, retrain the first language model using the second word segmentation data.
  - 13. The system of claim 12, wherein the operations further comprise:
    - after the first language model is retrained, repeating the word segmentation for the second training sample using the first language model containing word class labels, to obtain revised second word segmentation data; and
      
      in accordance with a determination that the revised second word segmentation data is consistent with the second word segmentation data, approving the revised second word segmentation data for use in the training of the acoustic language model.
  - 14. The system of claim 11, wherein a determining that the first word segmentation data is consistent with the second word segmentation data further comprises a determination that respective word class label replacements in the first word segmentation data are identical to respective word class label replacements in the second word segmentation data.
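
One possible reading of the consistency test in claim 14 is to compare only where class labels appear, ignoring how the remaining words are split. The sketch below follows that reading. It assumes labels are written as `<...>` tokens and locates each label by the number of ordinary characters that precede it; both choices are illustrative and are not taken from the claim text.

```python
def is_label(token):
    """Assumed convention: class labels look like <NAME>, <CITY>, ..."""
    return token.startswith("<") and token.endswith(">")

def label_positions(tokens):
    """Return (offset, label) pairs, where offset counts non-label characters
    before the label, so it does not depend on how other words are split."""
    offset, found = 0, []
    for tok in tokens:
        if is_label(tok):
            found.append((offset, tok))
        else:
            offset += len(tok)
    return found

def consistent(first_data, second_data):
    """True when every sample has identical class label replacements."""
    return len(first_data) == len(second_data) and all(
        label_positions(a) == label_positions(b)
        for a, b in zip(first_data, second_data)
    )

# Hypothetical segmentations of the same sample.
first = [["call", "<NAME>", "now"]]
second = [["ca", "ll", "<NAME>", "now"]]   # different split, same label
third = [["call", "bob", "now"]]           # label replacement missing

same_labels = consistent(first, second)
missing_label = consistent(first, third)
```

Under this reading `first` and `second` count as consistent even though "call" is split differently, while `third` does not because the name was never replaced.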

15. A non-transitory computer-readable medium for training an acoustic language model, having instructions stored thereon, the instructions, when executed by one or more processors, cause the processors to perform operations comprising:
- conducting word segmentation for training samples in a training corpus using an initial language model containing no word class labels, to obtain initial word segmentation data containing no word class labels;
  
  performing word class replacement for the initial word segmentation data containing no word class labels, to obtain first word segmentation data containing word class labels;
  
  using the first word segmentation data containing word class labels to train a first language model containing word class labels;
  
  using the first language model containing word class labels to conduct word segmentation for the training samples in the training corpus, to obtain second word segmentation data containing word class labels; and
  
  in accordance with the second word segmentation data meeting one or more predetermined criteria, using the second word segmentation data containing word class labels to train the acoustic language model.
- - 16. The computer-readable medium of claim 15, wherein performing word class replacement for the initial word segmentation data containing no word class labels, to obtain first word segmentation data containing word class labels further comprises:
    - identifying, in a classification glossary, respective word class labels for one or more respective words in the initial word segmentation data containing no word class labels; and
      
      replacing the one or more respective words in the initial word segmentation data containing no word class labels with the identified respective word class labels to obtain the first word segmentation data containing word class labels.
  - 17. The computer-readable medium of claim 15, wherein using the first language model containing word class labels to conduct word segmentation for the training samples in the training corpus, to obtain second word segmentation data containing word class labels further comprises:
    - identifying, in a classification glossary, respective word class labels for one or more respective words in the training samples in the training corpus;
      
      replacing the one or more respective words in the training samples with the identified respective word class labels to obtain new training samples containing word class labels; and
      
      conducting word segmentation for the new training samples using the first language model containing word class labels, to obtain the second word segmentation data containing word class labels.
  - 18. The computer-readable medium of claim 17, wherein the operations further comprise:
    - after obtaining the second word segmentation data containing word class labels;
      
      comparing segmentation results of corresponding training samples in the first and the second word segmentation data; and
      
      in accordance with a determination that the first word segmentation data is consistent with the second word segmentation data, approving the second word segmentation data for use in the training of the acoustic language model.
  - 19. The computer-readable medium of claim 18, wherein the operations further comprise:
    - after obtaining the second word segmentation data containing word class labels;
      
      in accordance with a determination that the first word segmentation data is inconsistent with the second word segmentation data, retrain the first language model using the second word segmentation data.
  - 20. The computer-readable medium of claim 19, wherein the operations further comprise:
    - after the first language model is retrained, repeating the word segmentation for the second training sample using the first language model containing word class labels, to obtain revised second word segmentation data; and
      
      in accordance with a determination that the revised second word segmentation data is consistent with the second word segmentation data, approving the revised second word segmentation data for use in the training of the acoustic language model.
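
The glossary lookup of claim 17, applied to training samples that have not yet been segmented, could be sketched as a greedy longest-match scan. The glossary contents, the `<NAME>` label, and the choice of greedy longest match are assumptions made for illustration; the claim only requires that glossary words be identified and replaced with their class labels.

```python
def label_raw_sample(text, glossary):
    """Replace glossary words in an unsegmented string with class labels.

    Returns a list of units: single characters, or a class label wherever
    a glossary word was found (longest match wins at each position).
    """
    max_len = max(map(len, glossary), default=0)
    units, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            label = glossary.get(text[i:i + n])
            if label is not None:
                units.append(label)   # whole word becomes one label unit
                i += n
                break
        else:
            units.append(text[i])     # no glossary word starts here
            i += 1
    return units

# Hypothetical glossary with overlapping entries.
glossary = {"bob": "<NAME>", "bobby": "<NAME>"}
short_name = label_raw_sample("callbob", glossary)
long_name = label_raw_sample("callbobby", glossary)
```

Both samples come out as `["c", "a", "l", "l", "<NAME>"]`, which is the kind of new training sample that the label-aware first language model would then segment.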

Current Assignee
Tencent Technology Shenzhen Company Limited (Tencent Holdings Limited)
Original Assignee
Tencent Technology Shenzhen Company Limited (Tencent Holdings Limited)
Inventors
LU, Duling, LI, Lu, RAO, Feng, CHEN, Bo, LU, Li, ZHANG, Xiang, WANG, Eryu, YUE, Shuai

Granted Patent

US 9,396,723 B2
US Class Current

704/9
CPC Class Codes

G06F 40/40   Processing or translation o...

G10L 15/063   Training

G10L 15/183   using context dependencies,...
