SPEECH ENHANCEMENT METHOD, SPEECH RECOGNITION METHOD, CLUSTERING METHOD AND DEVICE

US 20160358599A1
Filed: 06/03/2016
Published: 12/08/2016
Est. Priority Date: 06/03/2015
Status: Abandoned Application

Abstract

The present invention discloses a speech enhancement method, a speech recognition method, a clustering method and a device. The method includes: selecting the feature vector clustering center best matched with the feature vector of the first frame speech part of a test speech; for each other frame speech part contained in the test speech, selecting the feature vector clustering center best matched with the feature vector of that speech part from among the feature vector clustering center best matched with the feature vector of the previous frame speech part and the feature vector clustering centers adjacent to it; and reconstructing the feature vector of the test speech according to the feature vectors of each frame speech part contained in the test speech and the selected feature vector clustering centers. Because a feature capable of representing speech continuity is used during speech enhancement, the invention can achieve a better speech enhancement effect than traditional speech enhancement models in the prior art.
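The selection and reconstruction steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not taken from the application: the 1-D chain adjacency between clustering centers, the Euclidean matching criterion, the function names, and the blending weight `alpha` are all assumptions made for illustration.

```python
import math

def nearest(frame, centers, candidates):
    """Return the candidate index whose center is closest to `frame`."""
    return min(candidates, key=lambda k: math.dist(frame, centers[k]))

def neighbors(k, n_centers):
    """Indices adjacent to center k on a 1-D chain (assumed topology)."""
    return [j for j in (k - 1, k + 1) if 0 <= j < n_centers]

def select_centers(frames, centers):
    """Pick one center per frame: search all centers for the first frame,
    then only the previous frame's center and its adjacent centers."""
    n = len(centers)
    path = [nearest(frames[0], centers, range(n))]
    for frame in frames[1:]:
        prev = path[-1]
        path.append(nearest(frame, centers, [prev] + neighbors(prev, n)))
    return path

def reconstruct(frames, centers, path, alpha=0.5):
    """Linearly interpolate each frame toward its selected center.
    `alpha` is a hypothetical weight, not a value from the application."""
    return [
        [(1 - alpha) * x + alpha * c for x, c in zip(frame, centers[k])]
        for frame, k in zip(frames, path)
    ]

centers = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
frames = [[0.1, 0.0], [2.9, 0.0], [2.9, 0.0]]
path = select_centers(frames, centers)
enhanced = reconstruct(frames, centers, path)
```

In this toy input the second frame lies closest to the last center, but the restricted search only allows a move to an adjacent center per frame, which is how the sketch models the continuity constraint.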

18 Claims

1. A speech enhancement method, comprising:
- selecting a feature vector clustering center best matched with the feature vector of a first frame speech part contained in a test speech from feature vector clustering centers obtained by training by a selection unit;
  
  performing the following for the feature vector of each other frame speech part contained in the test speech:
  
  selecting a feature vector clustering center best matched with the feature vector of the speech part from a feature vector clustering center best matched with the feature vector of a previous frame speech part to the speech part and obtained by training and a feature vector clustering center adjacent to the feature vector clustering center best matched with the feature vector of the previous frame speech part, wherein a set formed by each of the feature vector clustering centers obtained by training and at least one adjacent feature vector clustering center thereof has an ability to describe speech continuity;
  
  reconstructing the feature vector of the test speech according to the feature vectors of each frame speech part contained in the test speech and the selected feature vector clustering center by a reconstruction unit; and
  
  performing speech recognition on the reconstructed feature vector of the test speech by a speech recognition unit.
- Dependent claims (2, 3, 4)
- - 2. The method according to claim 1, wherein reconstructing the feature vector of the test speech according to the feature vectors of each frame speech part contained in the test speech and the selected feature vector clustering center comprises:
    - performing an interpolation operation on a vector set formed by the feature vectors of all the speech parts contained in the test speech according to the selected feature vector clustering center, so as to obtain the reconstructed feature vector of the test speech.
  - 3. The method according to claim 1, wherein the method, before selecting the feature vector clustering center best matched with the feature vector of the first frame speech part contained in the test speech from the feature vector clustering center obtained by training, further comprises:
    - respectively extracting feature vector samples from each frame speech part contained in a training corpus;
      
      determining the distribution information of the feature vector samples in a multidimensional space;
      
      determining initial clustering centers according to the distribution information;
      
      performing iterative clustering on each initial clustering center to obtain undetermined clustering centers according to the similarity between the feature vector samples and each initial clustering center; and
      
      performing iterative clustering on the undetermined clustering centers to obtain a feature vector clustering center according to given iterative clustering rules;
      
      wherein the given iterative clustering rules comprise:
      
      performing iterative clustering on the undetermined clustering centers according to the feature vectors of each speech part of the training corpus;
      
      the feature vector used in any single iteration of clustering on the undetermined clustering centers being the feature vector of a single speech part in the training corpus; and
      
      the feature vectors used in any two adjacent iterations of clustering on the undetermined clustering centers being the feature vectors of adjacent speech parts in the training corpus.
  - 4. The method according to claim 3, wherein performing iterative clustering on the undetermined clustering centers to obtain the feature vector clustering center according to the given iterative clustering rules comprises:
    - performing an iterative clustering operation on each training corpus according to the given iterative clustering rules, and when an iterative convergence condition is satisfied, determining each undetermined clustering center having the parameter value calculated when the iterative convergence condition is satisfied as the feature vector clustering center, wherein the iterative clustering operation comprises the following steps:
      
      determining the similarity between the feature vector of the first frame speech part of the training corpus and the undetermined clustering center best matched with the feature vector of the first frame speech part, and the similarity between the feature vector of the first frame speech part and the undetermined clustering center adjacent to the best matched undetermined clustering center;
      
      performing the following for each other frame speech part of the training corpus:
      
      determining, from among the undetermined clustering center best matched with the feature vector of the previous frame speech part adjacent to the speech part and the clustering center adjacent to that undetermined clustering center in the specific space, the undetermined clustering center best matched with the speech part, and determining the similarity between the feature vector of the speech part and the best matched undetermined clustering center and the similarity between the feature vector of the speech part and the undetermined clustering center adjacent to the best matched undetermined clustering center; and
      
      calculating the parameter values of each undetermined clustering center according to each similarity determined.
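The iterative clustering operation recited in claim 4 can be sketched roughly as follows. This is one possible reading, not the applicant's implementation: the Gaussian similarity, the 1-D chain adjacency, the similarity-weighted mean used as the "parameter value", and the convergence test on center movement are all assumptions, and every function name is hypothetical.

```python
import math

def similarity(x, c):
    """Assumed similarity: Gaussian kernel of the squared Euclidean distance."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, c)))

def chain_neighbors(k, n):
    """Indices adjacent to center k on a 1-D chain (assumed topology)."""
    return [j for j in (k - 1, k + 1) if 0 <= j < n]

def one_pass(corpus, centers):
    """One iteration: walk each utterance frame by frame, restrict the
    best-match search to the previous best center and its neighbors, and
    accumulate similarities for the best center and its neighbors."""
    n, dim = len(centers), len(centers[0])
    num = [[0.0] * dim for _ in range(n)]
    den = [0.0] * n
    for utterance in corpus:
        prev = None
        for frame in utterance:
            if prev is None:
                candidates = range(n)  # first frame: search every center
            else:
                candidates = [prev] + chain_neighbors(prev, n)
            best = max(candidates, key=lambda k: similarity(frame, centers[k]))
            for k in [best] + chain_neighbors(best, n):
                w = similarity(frame, centers[k])
                den[k] += w
                for d in range(dim):
                    num[k][d] += w * frame[d]
            prev = best
    # Assumed parameter update: similarity-weighted mean of the frames seen.
    return [
        [num[k][d] / den[k] for d in range(dim)] if den[k] > 0 else list(centers[k])
        for k in range(n)
    ]

def train(corpus, centers, tol=1e-6, max_iter=100):
    """Repeat passes until no center moves more than `tol` (assumed criterion)."""
    for _ in range(max_iter):
        updated = one_pass(corpus, centers)
        shift = max(math.dist(a, b) for a, b in zip(centers, updated))
        centers = updated
        if shift < tol:
            break
    return centers

corpus = [[[0.0], [0.1], [5.0], [5.1]]]
trained = train(corpus, [[0.0], [5.0]])
```

Updating the adjacent centers together with the best-matched one is what ties neighboring centers to consecutive frames in this sketch; a different neighborhood structure or update rule would be equally compatible with the claim language.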

5-12. (canceled)

13. An electrical apparatus, comprising:
- a processor; and
  
  a memory for storing commands executed by the processor;
  
  wherein the processor is configured to:
  
  selecting a feature vector clustering center best matched with the feature vector of a first frame speech part contained in a test speech from feature vector clustering centers obtained by training;
  
  performing the following for the feature vector of each other frame speech part contained in the test speech:
  
  selecting a feature vector clustering center best matched with the feature vector of the speech part from a feature vector clustering center best matched with the feature vector of a previous frame speech part to the speech part and obtained by training and a feature vector clustering center adjacent to the feature vector clustering center best matched with the feature vector of the previous frame speech part, wherein a set formed by each of the feature vector clustering centers obtained by training and at least one adjacent feature vector clustering center thereof has an ability to describe speech continuity;
  
  reconstructing the feature vector of the test speech according to the feature vectors of each frame speech part contained in the test speech and the selected feature vector clustering center; and
  
  performing speech recognition on the reconstructed feature vector of the test speech.
- Dependent claims (15, 16, 17)
- - 15. The apparatus according to claim 13, wherein the processor is configured to:
    - performing an interpolation operation on a vector set formed by the feature vectors of all the speech parts contained in the test speech according to the selected feature vector clustering center, so as to obtain the reconstructed feature vector of the test speech.
  - 16. The apparatus according to claim 13, wherein the processor is configured to:
    - respectively extracting feature vector samples from each frame speech part contained in a training corpus;
      
      determining the distribution information of the feature vector samples in a multidimensional space;
      
      determining initial clustering centers according to the distribution information;
      
      performing iterative clustering on each initial clustering center to obtain undetermined clustering centers according to the similarity between the feature vector samples and each initial clustering center; and
      
      performing iterative clustering on the undetermined clustering centers to obtain a feature vector clustering center according to given iterative clustering rules;
      
      wherein the given iterative clustering rules comprise:
      
      performing iterative clustering on the undetermined clustering centers according to the feature vectors of each speech part of the training corpus;
      
      the feature vector used in any single iteration of clustering on the undetermined clustering centers being the feature vector of a single speech part in the training corpus; and
      
      the feature vectors used in any two adjacent iterations of clustering on the undetermined clustering centers being the feature vectors of adjacent speech parts in the training corpus.
  - 17. The apparatus according to claim 13, wherein the processor is configured to:
    - performing an iterative clustering operation on each training corpus according to the given iterative clustering rules, and when an iterative convergence condition is satisfied, determining each undetermined clustering center having the parameter value calculated when the iterative convergence condition is satisfied as the feature vector clustering center, wherein the iterative clustering operation comprises the following steps:
      
      determining the similarity between the feature vector of the first frame speech part of the training corpus and the undetermined clustering center best matched with the feature vector of the first frame speech part, and the similarity between the feature vector of the first frame speech part and the undetermined clustering center adjacent to the best matched undetermined clustering center;
      
      performing the following for each other frame speech part of the training corpus:
      
      determining, from among the undetermined clustering center best matched with the feature vector of the previous frame speech part adjacent to the speech part and the clustering center adjacent to that undetermined clustering center in the specific space, the undetermined clustering center best matched with the speech part, and determining the similarity between the feature vector of the speech part and the best matched undetermined clustering center and the similarity between the feature vector of the speech part and the undetermined clustering center adjacent to the best matched undetermined clustering center; and
      
      calculating the parameter values of each undetermined clustering center according to each similarity determined.

14. (canceled)

18. A non-transitory computer storage media having computer-executable instructions stored thereon which, when executed by a computer, cause the computer to:
- respectively extracting feature vector samples from each frame speech part contained in a training corpus;
  
  determining the distribution information of the feature vector samples in a multidimensional space;
  
  determining initial clustering centers according to the distribution information;
  
  performing iterative clustering on each initial clustering center to obtain undetermined clustering centers according to the similarity between the feature vector samples and each initial clustering center; and
  
  performing iterative clustering on the undetermined clustering centers to obtain a feature vector clustering center according to given iterative clustering rules;
  
  wherein the given iterative clustering rules comprise:
  
  performing iterative clustering on the undetermined clustering centers according to the feature vectors of each speech part of the training corpus;
  
  the feature vector used in any single iteration of clustering on the undetermined clustering centers being the feature vector of a single speech part in the training corpus; and
  
  the feature vectors used in any two adjacent iterations of clustering on the undetermined clustering centers being the feature vectors of adjacent speech parts in the training corpus.
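The first stage of the clustering method in claim 18 (distribution information, initial clustering centers, then a similarity-based iterative clustering that yields the undetermined clustering centers) might look like the sketch below. The claim does not say what the distribution information is or how the initial centers are derived from it; using the per-dimension mean and standard deviation, spacing the centers between mean minus and mean plus one standard deviation, and refining them with plain k-means are assumptions for illustration, as are all function names.

```python
import math
import statistics

def distribution_info(samples):
    """Per-dimension mean and population standard deviation of the samples."""
    dims = list(zip(*samples))
    mean = [statistics.fmean(d) for d in dims]
    std = [statistics.pstdev(d) for d in dims]
    return mean, std

def initial_centers(samples, n_centers):
    """Spread centers evenly from mean - std to mean + std (assumed rule)."""
    mean, std = distribution_info(samples)
    centers = []
    for i in range(n_centers):
        t = 0.0 if n_centers == 1 else -1.0 + 2.0 * i / (n_centers - 1)
        centers.append([m + t * s for m, s in zip(mean, std)])
    return centers

def kmeans(samples, centers, n_iter=20):
    """Plain k-means refinement: assign each sample to its nearest center,
    then move every non-empty center to the mean of its assigned samples."""
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for x in samples:
            k = min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))
            groups[k].append(x)
        centers = [
            [statistics.fmean(d) for d in zip(*g)] if g else c
            for g, c in zip(groups, centers)
        ]
    return centers

samples = [[0.0], [0.2], [4.0], [4.2]]
undetermined = kmeans(samples, initial_centers(samples, 2))
```

The resulting centers would then be the starting point for the second, continuity-aware clustering stage governed by the given iterative clustering rules.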

Current Assignee: Le Shi Zhi Xin Electronic Technology (Tianjin) Limited
Original Assignee: Le Shi Zhi Xin Electronic Technology (Tianjin) Limited
Inventors: WANG, Yujun
Application Number: US15/173,579
Publication Number: US 20160358599A1
US Class Current: 1/1
CPC Class Codes:
- G06N 3/088   Non-supervised learning, e....
- G10L 15/02   Feature extraction for spee...
- G10L 15/063   Training
- G10L 15/20   Speech recognition techniqu...
- G10L 2015/0633   using lexical or orthograph...
