SPEAKER ADAPTION METHOD AND APPARATUS, AND STORAGE MEDIUM

US 20180366109A1
Filed: 03/22/2018
Published: 12/20/2018
Est. Priority Date: 06/16/2017
Status: Active Grant

1 Assignment

0 Petitions

Abstract

A speaker adaption method and a speaker adaption apparatus, a device and a storage medium are provided. The method includes: acquiring first speech data of a target speaker; inputting the first speech data to a pre-trained batch normalization (BN) network to be subjected to an adaptive training to acquire a speech recognition model including a speech parameter of the target speaker.
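
The abstract does not say what the adaptive training actually changes inside the BN network. BN-based speaker adaptation is commonly read as follows, consistent with claims 4 and 6:

- the learned scale and shift stay fixed;
- each BN layer's per-dimension average and variance are re-estimated from the target speaker's speech;
- the re-estimated values replace the global ones.

The Python sketch that follows shows this for a single BN layer on toy feature frames. `BNLayer`, `estimate_stats`, and `adapt_layer` are hypothetical names. Treating adaptation as statistic re-estimation with no gradient updates is an assumption, not the patent's stated procedure.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, replace
from statistics import fmean, pvariance


@dataclass(frozen=True)
class BNLayer:
    """One batch-normalization layer: statistics plus learned scale/shift (hypothetical)."""
    mean: tuple[float, ...]   # per-dimension average ("speech parameter")
    var: tuple[float, ...]    # per-dimension variance ("speech parameter")
    gamma: tuple[float, ...]  # learned scale; assumed fixed during adaptation
    beta: tuple[float, ...]   # learned shift; assumed fixed during adaptation
    eps: float = 1e-5

    def __call__(self, frame: list[float]) -> list[float]:
        """Normalize one feature frame with the layer's current statistics."""
        return [
            g * (x - m) / math.sqrt(v + self.eps) + b
            for x, m, v, g, b in zip(frame, self.mean, self.var, self.gamma, self.beta)
        ]


def estimate_stats(frames: list[list[float]]) -> tuple[tuple[float, ...], tuple[float, ...]]:
    """Per-dimension population mean and variance over one speaker's frames."""
    dims = list(zip(*frames))  # transpose: one tuple of values per feature dimension
    return tuple(fmean(d) for d in dims), tuple(pvariance(d) for d in dims)


def adapt_layer(global_layer: BNLayer, target_frames: list[list[float]]) -> BNLayer:
    """Return a speaker-specific copy whose global statistics are replaced."""
    mean, var = estimate_stats(target_frames)
    return replace(global_layer, mean=mean, var=var)


# Toy usage: global statistics from reference data, then a target speaker's frames.
global_layer = BNLayer(mean=(0.0, 0.0), var=(1.0, 1.0), gamma=(1.0, 1.0), beta=(0.0, 0.0))
first_speech = [[1.0, 2.0], [3.0, 4.0]]  # stand-in for "first speech data" features
speaker_layer = adapt_layer(global_layer, first_speech)
print(speaker_layer.mean, speaker_layer.var)  # (2.0, 3.0) (1.0, 1.0)
normalized = speaker_layer([3.0, 4.0])
```

`BNLayer` is frozen, so `adapt_layer` returns a new speaker-specific layer and leaves the global one untouched. The global statistics therefore remain available for other speakers.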

11 Citations

18 Claims

1. A speaker adaption method, comprising:
    - acquiring first speech data of a target speaker;
    - inputting the first speech data to a pre-trained batch normalization (BN) network to be subjected to an adaptive training to acquire a speech recognition model comprising a speech parameter of the target speaker.

2. The method according to claim 1, further comprising:
    - acquiring the speech parameter of the target speaker according to second speech data of the target speaker;
    - inputting the speech parameter of the target speaker into the speech recognition model for recognition to acquire corresponding text information.

3. The method according to claim 2, further comprising:
    - acquiring speech data of a reference speaker;
    - performing a training according to the speech data of the reference speaker to acquire the BN network comprising a global speech parameter and the speech recognition model comprising the global speech parameter.

4. The method according to claim 3, wherein inputting the first speech data to the pre-trained BN network to be subjected to the adaptive training to acquire the speech recognition model comprising the speech parameter of the target speaker comprises:
    - inputting the first speech data to the BN network to acquire the speech parameter of the target speaker, replacing the global speech parameter with the speech parameter of the target speaker to acquire the speech recognition model comprising the speech parameter of the target speaker.

5. The method according to claim 3, wherein inputting the speech parameter of the target speaker into the speech recognition model for recognition to acquire the corresponding text information comprises:
    - calculating weights for the speech parameter of the target speaker and the global speech parameter;
    - inputting the weights into the speech recognition model for recognition to acquire the corresponding text information.

6. The method according to claim 1, wherein the speech parameter is at least one of a variance and an average.
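
Claim 5 calculates weights for the target-speaker parameter and the global parameter but does not give the weighting rule. One plausible reading, assumed here, is a linear interpolation of the two sets of BN statistics. Under this reading, the target speaker's share grows with the amount of adaptation data, in the spirit of MAP adaptation. `count_based_weight`, `interpolate_bn_stats`, and the constant `tau` are hypothetical and not taken from the patent.

```python
from __future__ import annotations


def count_based_weight(n_target_frames: int, tau: float = 100.0) -> float:
    """Hypothetical weight for the target-speaker statistics.

    alpha = n / (n + tau): with little target data the global statistics
    dominate; as n grows, alpha approaches 1. tau is an assumed constant.
    """
    if n_target_frames < 0 or tau <= 0:
        raise ValueError("need n_target_frames >= 0 and tau > 0")
    return n_target_frames / (n_target_frames + tau)


def interpolate_bn_stats(
    target_mean: list[float],
    target_var: list[float],
    global_mean: list[float],
    global_var: list[float],
    alpha: float,
) -> tuple[list[float], list[float]]:
    """Weighted combination of target-speaker and global BN statistics (assumed rule)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    mean = [alpha * t + (1.0 - alpha) * g for t, g in zip(target_mean, global_mean)]
    # Simplification: variances are interpolated directly, ignoring the extra
    # spread contributed by the difference between the two means.
    var = [alpha * t + (1.0 - alpha) * g for t, g in zip(target_var, global_var)]
    return mean, var


alpha = count_based_weight(100, tau=100.0)  # 0.5
mean, var = interpolate_bn_stats([2.0], [1.0], [0.0], [3.0], alpha)
print(mean, var)  # [1.0] [2.0]
```

With `alpha = 0` the model falls back to the global statistics. With `alpha = 1` it reduces to the straight replacement of claim 4.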

7. A speaker adaption apparatus, comprising:
    - one or more processors;
    - a memory;
    - one or more software modules stored in the memory and executable by the one or more processors, and comprising:
      - a speech data acquiring module configured to acquire first speech data of a target speaker;
      - a model training module configured to input the first speech data to a pre-trained batch normalization (BN) network to be subjected to an adaptive training to acquire a speech recognition model comprising a speech parameter of the target speaker.

8. The apparatus according to claim 7, further comprising:
    - a speech recognition module configured to acquire the speech parameter of the target speaker according to second speech data of the target speaker, and acquire the corresponding text information by inputting the speech parameter of the target speaker into the speech recognition model for recognition.

9. The apparatus according to claim 8, wherein:
    - the speech data acquiring module is further configured to acquire speech data of a reference speaker; and
    - the model training module is further configured to perform a training according to the speech data of the reference speaker to acquire the BN network comprising a global speech parameter and the speech recognition model comprising the global speech parameter.

10. The apparatus according to claim 9, wherein the model training module is specifically configured to:
    - input the first speech data into the BN network to acquire the speech parameter of the target speaker, replace the global speech parameter with the speech parameter of the target speaker to acquire the speech recognition model comprising the speech parameter of the target speaker.

11. The apparatus according to claim 9, wherein the speech recognition module is specifically configured to:
    - calculate weights for the speech parameter of the target speaker and the global speech parameter;
    - input the weights into the speech recognition model for recognition to acquire the corresponding text information.

12. The apparatus according to claim 7, wherein the speech parameter is at least one of a variance and an average.
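
Claim 9, like claims 3 and 15, has the model training module derive a global speech parameter from reference-speaker data. It does not say how that parameter is accumulated. Standard batch normalization training keeps an exponential moving average of mini-batch statistics, and the sketch that follows assumes that rule. `RunningBNStats` and the default `momentum=0.1` are illustrative assumptions.

```python
from __future__ import annotations

from statistics import fmean, pvariance


class RunningBNStats:
    """Global (reference-speaker) BN statistics accumulated during training.

    Assumption: exponential moving average of mini-batch statistics, the
    usual BN convention; the patent text does not state the update rule.
    """

    def __init__(self, dim: int, momentum: float = 0.1) -> None:
        if not 0.0 < momentum <= 1.0:
            raise ValueError("momentum must lie in (0, 1]")
        self.momentum = momentum
        self.mean = [0.0] * dim  # conventional BN initialisation
        self.var = [1.0] * dim

    def update(self, batch: list[list[float]]) -> None:
        """Fold one mini-batch of reference-speaker frames into the running stats."""
        m = self.momentum
        for i, values in enumerate(zip(*batch)):  # one tuple per feature dimension
            self.mean[i] = (1 - m) * self.mean[i] + m * fmean(values)
            self.var[i] = (1 - m) * self.var[i] + m * pvariance(values)


stats = RunningBNStats(dim=1)
for _ in range(200):
    stats.update([[1.0], [3.0]])  # toy reference mini-batch: mean 2, variance 1
print(stats.mean, stats.var)  # converges towards [2.0] and [1.0]
```

After training, `stats.mean` and `stats.var` play the role of the global speech parameter that claim 10 later replaces with target-speaker values.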

13. A computer-readable storage medium having stored therein computer programs that, when executed by a processor of a terminal, cause the terminal to perform a speaker adaption method, the method comprising:
    - acquiring first speech data of a target speaker;
    - inputting the first speech data to a pre-trained batch normalization (BN) network to be subjected to an adaptive training to acquire a speech recognition model comprising a speech parameter of the target speaker.

14. The computer-readable storage medium according to claim 13, wherein the method further comprises:
    - acquiring the speech parameter of the target speaker according to second speech data of the target speaker;
    - inputting the speech parameter of the target speaker into the speech recognition model for recognition to acquire corresponding text information.

15. The computer-readable storage medium according to claim 14, wherein the method further comprises:
    - acquiring speech data of a reference speaker;
    - performing a training according to the speech data of the reference speaker to acquire the BN network comprising a global speech parameter and the speech recognition model comprising the global speech parameter.

16. The computer-readable storage medium according to claim 15, wherein inputting the first speech data to the pre-trained BN network to be subjected to the adaptive training to acquire the speech recognition model comprising the speech parameter of the target speaker comprises:
    - inputting the first speech data to the BN network to acquire the speech parameter of the target speaker, replacing the global speech parameter with the speech parameter of the target speaker to acquire the speech recognition model comprising the speech parameter of the target speaker.

17. The computer-readable storage medium according to claim 15, wherein inputting the speech parameter of the target speaker into the speech recognition model for recognition to acquire the corresponding text information comprises:
    - calculating weights for the speech parameter of the target speaker and the global speech parameter;
    - inputting the weights into the speech recognition model for recognition to acquire the corresponding text information.

18. The computer-readable storage medium according to claim 13, wherein the speech parameter is at least one of a variance and an average.

Current Assignee: Baidu Online Network Technology (Beijing) Co., Ltd (Baidu Incorporated)
Original Assignee: Baidu Online Network Technology (Beijing) Co., Ltd (Baidu Incorporated)
Inventors: HUANG, Jun; LI, Xiangang; JIANG, Bing

Granted Patent: US 10,665,225 B2
CPC Class Codes

G10L 15/063   Training

G10L 15/07   to the speaker

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...
