Speaker adaption method and apparatus, and storage medium

US 10,665,225 B2
Filed: 03/22/2018
Issued: 05/26/2020
Est. Priority Date: 06/16/2017
Status: Active Grant

Abstract

A speaker adaption method, a speaker adaption apparatus, a device, and a storage medium are provided. The method includes: acquiring first speech data of a target speaker; and inputting the first speech data to a pre-trained batch normalization (BN) network for adaptive training, to acquire a speech recognition model including a speech parameter of the target speaker.
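
The adaptation described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the patent text: a single BN layer applied directly to input feature frames (a real BN network would normalize hidden-layer activations), the "speech parameter" taken to be a per-dimension average and variance, and hypothetical names (`BatchNormLayer`, `estimate_stats`, `adapt_to_speaker`) and toy data chosen only for illustration. It is not the patented implementation.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence, Tuple


@dataclass
class BatchNormLayer:
    """Inference-time batch normalization over one feature vector (hypothetical helper)."""
    mean: List[float]    # per-dimension average (global or speaker-specific)
    var: List[float]     # per-dimension variance (global or speaker-specific)
    gamma: List[float]   # learned scale, left unchanged by adaptation
    beta: List[float]    # learned shift, left unchanged by adaptation
    eps: float = 1e-5

    def __call__(self, x: Sequence[float]) -> List[float]:
        return [
            g * (xi - m) / sqrt(v + self.eps) + b
            for xi, m, v, g, b in zip(x, self.mean, self.var, self.gamma, self.beta)
        ]


def estimate_stats(frames: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-dimension average and population variance over a set of frames."""
    n = len(frames)
    dims = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dims)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dims)]
    return mean, var


def adapt_to_speaker(layer: BatchNormLayer,
                     target_frames: Sequence[Sequence[float]]) -> BatchNormLayer:
    """Copy the layer, replacing the global statistics with the target speaker's."""
    mean, var = estimate_stats(target_frames)
    return BatchNormLayer(mean, var, list(layer.gamma), list(layer.beta), layer.eps)


# Toy data (assumed): frames from reference speakers and from one target speaker.
reference_frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
target_frames = [[10.0, 0.0], [12.0, 2.0]]

g_mean, g_var = estimate_stats(reference_frames)
global_layer = BatchNormLayer(g_mean, g_var, gamma=[1.0, 1.0], beta=[0.0, 0.0])
speaker_layer = adapt_to_speaker(global_layer, target_frames)
```

In this sketch only the normalization statistics change; the learned scale and shift, standing in for the rest of the recognition model, are carried over untouched.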

12 Claims

1. A speaker adaption method, comprising:
- acquiring speech data of a reference speaker;
  
  performing a training according to the speech data of the reference speaker to acquire a batch normalization (BN) network comprising a global speech parameter and a speech recognition model comprising the global speech parameter;
  
  acquiring first speech data of a target speaker;
  
  inputting the first speech data to the BN network to acquire a speech parameter of the target speaker, and replacing the global speech parameter with the speech parameter of the target speaker to acquire a speech recognition model comprising the speech parameter of the target speaker.
- - 2. The method according to claim 1, further comprising:
    - acquiring the speech parameter of the target speaker according to second speech data of the target speaker;
      
      inputting the speech parameter of the target speaker into the speech recognition model for recognition to acquire corresponding text information.
  - 3. The method according to claim 1, wherein inputting the speech parameter of the target speaker into the speech recognition model for recognition to acquire the corresponding text information comprises:
    - calculating weights for the speech parameter of the target speaker and the global speech parameter;
      
      inputting the weights into the speech recognition model for recognition to acquire the corresponding text information.
  - 4. The method according to claim 1, wherein the speech parameter is at least one of a variance and an average.
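
Claim 3 recites calculating weights for the target speaker's speech parameter and the global speech parameter, but the claim text does not say how the weights are computed or applied. One possible reading, sketched below purely as an assumption, is a convex combination in which the speaker statistics count for more as more speaker data becomes available. The function names and the smoothing constant `tau` are hypothetical and do not come from the patent.

```python
from typing import List, Sequence


def interpolation_weight(num_frames: int, tau: float = 100.0) -> float:
    """Hypothetical weight for the speaker statistics; approaches 1 as data grows."""
    return num_frames / (num_frames + tau)


def blend(speaker: Sequence[float], global_: Sequence[float], w: float) -> List[float]:
    """Per-dimension convex combination: w * speaker + (1 - w) * global."""
    return [w * s + (1.0 - w) * g for s, g in zip(speaker, global_)]


# Toy values (assumed): per-dimension averages for the global and target-speaker cases.
global_mean = [2.0, 3.0]
speaker_mean = [10.0, 1.0]

w = interpolation_weight(num_frames=300, tau=100.0)
blended_mean = blend(speaker_mean, global_mean, w)
```

With no speaker data the weight is zero and the blend falls back to the global statistics, which is the design reason for choosing this form in the sketch.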

5. A speaker adaption apparatus, comprising:
- one or more processors;
  
  a memory;
  
  one or more software modules stored in the memory and executable by the one or more processors, and comprising:
  
  a speech data acquiring module configured to acquire speech data of a reference speaker; and
  
  a model training module configured to perform a training according to the speech data of the reference speaker to acquire a batch normalization (BN) network comprising a global speech parameter and a speech recognition model comprising the global speech parameter, wherein:
  
  the speech data acquiring module is further configured to acquire first speech data of a target speaker; and
  
  the model training module is further configured to input the first speech data to the BN network to acquire a speech parameter of the target speaker, and replace the global speech parameter with the speech parameter of the target speaker to acquire a speech recognition model comprising the speech parameter of the target speaker.
- - 6. The apparatus according to claim 5, further comprising:
    - a speech recognition module configured to acquire the speech parameter of the target speaker according to second speech data of the target speaker, and acquire the corresponding text information by inputting the speech parameter of the target speaker into the speech recognition model for recognition.
  - 7. The apparatus according to claim 5, wherein the speech recognition module is specifically configured to:
    - calculate weights for the speech parameter of the target speaker and the global speech parameter;
      
      input the weights into the speech recognition model for recognition to acquire the corresponding text information.
  - 8. The apparatus according to claim 5, wherein the speech parameter is at least one of a variance and an average.

9. A computer-readable storage medium having stored therein computer programs that, when executed by a processor of a terminal, cause the terminal to perform a speaker adaption method, the method comprising:
- acquiring speech data of a reference speaker;
  
  performing a training according to the speech data of the reference speaker to acquire a batch normalization (BN) network comprising a global speech parameter and a speech recognition model comprising the global speech parameter;
  
  acquiring first speech data of a target speaker;
  
  inputting the first speech data to the BN network to acquire a speech parameter of the target speaker, and replacing the global speech parameter with the speech parameter of the target speaker to acquire a speech recognition model comprising the speech parameter of the target speaker.
- - 10. The computer-readable storage medium according to claim 9, wherein the method further comprises:
    - acquiring the speech parameter of the target speaker according to second speech data of the target speaker;
      
      inputting the speech parameter of the target speaker into the speech recognition model for recognition to acquire corresponding text information.
  - 11. The computer-readable storage medium according to claim 9, wherein inputting the speech parameter of the target speaker into the speech recognition model for recognition to acquire the corresponding text information comprises:
    - calculating weights for the speech parameter of the target speaker and the global speech parameter;
      
      inputting the weights into the speech recognition model for recognition to acquire the corresponding text information.
  - 12. The computer-readable storage medium according to claim 9, wherein the speech parameter is at least one of a variance and an average.

Current Assignee: Baidu Online Network Technology (Beijing) Co., Ltd (Baidu Incorporated)
Original Assignee: Baidu Online Network Technology (Beijing) Co., Ltd (Baidu Incorporated)
Inventors: Huang, Jun; Li, Xiangang; Jiang, Bing
Primary Examiner(s): Colucci, Michael
Application Number: US15/933,064
Publication Number: US 20180366109A1
Time in Patent Office: 796 Days
Field of Search: 704234, 704 7, 704 2, 704226, 704233, 704235, 704224, 704201, 707713, 707722, 707758
CPC Class Codes

G10L 15/063   Training

G10L 15/07   to the speaker

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...
