Speech data recognition method, apparatus, and server for distinguishing regional accent

US 9,928,831 B2
Filed: 12/18/2014
Issued: 03/27/2018
Est. Priority Date: 12/19/2013
Status: Active Grant


1 Assignment


0 Petitions


Abstract

A speech data recognition method, apparatus, and server for distinguishing regional accents are provided. The speech data recognition method includes: calculating a speech recognition confidence and/or a signal-to-noise ratio of the speech data; screening regional speech data from the speech data based on the speech recognition confidence and/or the signal-to-noise ratio of the speech data; and determining a region to which the regional speech data belongs based on a regional attribute of the regional speech data. By calculating the speech recognition confidence, the signal-to-noise ratio of the speech data, or a combination thereof, regional speech data are automatically recognized from mass speech data, thereby avoiding manual labeling of the speech data and improving the efficiency of speech data processing.
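The screening flow summarized in the abstract, in the order recited by claim 1 (an SNR gate first, then a confidence band, then grouping by regional attribute), can be sketched as a filter over per-utterance records. This is a minimal sketch under stated assumptions, not the patent's implementation: SNR and confidence are assumed to be precomputed; the `Utterance` record, its field names, the 10 dB threshold, and the region names are hypothetical; and the band is treated as inclusive of its "about 30%" and "about 80%" endpoints.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical per-utterance record; field names are illustrative only."""
    utt_id: str
    snr_db: float      # precomputed signal-to-noise ratio, in dB
    confidence: float  # precomputed recognition confidence in [0, 1]
    region: str        # regional attribute (e.g. location of the source IP)


def screen_regional_speech(utterances, snr_threshold_db=10.0,
                           conf_low=0.30, conf_high=0.80):
    """Keep utterances whose SNR exceeds the threshold and whose confidence
    lies in [conf_low, conf_high], then group their ids by region."""
    by_region = defaultdict(list)
    for utt in utterances:
        if utt.snr_db <= snr_threshold_db:
            continue  # skip: SNR not greater than the preset threshold
        if not (conf_low <= utt.confidence <= conf_high):
            continue  # skip: confidence outside the ~30% to ~80% band
        by_region[utt.region].append(utt.utt_id)
    return dict(by_region)


# Usage with made-up values.
batch = [
    Utterance("a", 25.0, 0.55, "Sichuan"),
    Utterance("b", 5.0, 0.50, "Sichuan"),    # dropped: SNR too low
    Utterance("c", 30.0, 0.95, "Beijing"),   # dropped: confidence too high
    Utterance("d", 18.0, 0.40, "Guangdong"),
]
print(screen_regional_speech(batch))  # → {'Sichuan': ['a'], 'Guangdong': ['d']}
```

Utterances recognized with very high confidence are excluded by the upper bound of the band, so only speech that the recognizer handles imperfectly, yet still above the lower bound, is kept as candidate regional speech.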

5 Citations

9 Claims

1. A speech data recognition method for distinguishing regional accents, comprising:
   - receiving speech data from a user device, wherein the user device comprises a processor and a memory;
   - calculating a signal-to-noise ratio of the received speech data, wherein calculating the signal-to-noise ratio of the received speech data comprises:
     - extracting a fundamental tone data of the received speech data by using a fundamental tone extracting algorithm;
     - obtaining a noise data of the received speech data based on the extracted fundamental tone data; and
     - calculating the signal-to-noise ratio of the received speech data by determining a ratio between signal power of the extracted fundamental tone data and signal power of the noise data in the received speech data;
   - selecting a portion of the received speech data having a signal-to-noise ratio greater than a preset threshold;
   - calculating a speech recognition confidence of the selected portion of the received speech data;
   - screening a regional speech data from the selected portion of the speech data based on the speech recognition confidence, wherein the screened regional speech data has a speech recognition confidence between about 30% and about 80%; and
   - determining a region to which the screened regional speech data belongs based on a regional attribute of the screened regional speech data.

2. The method according to claim 1, wherein the regional attribute includes a location corresponding to a source IP address of the received speech data, or a location corresponding to a source user of the received speech data.

3. The method according to claim 1, wherein the calculating the speech recognition confidence of the selected portion of the received speech data comprises:
   - calculating the speech recognition confidence of the selected portion of the received speech data based on likelihood, state residing information, likelihood ratio of the selected portion of the received speech data, or a combination thereof.

4. The method according to claim 1, wherein the fundamental tone extracting algorithm comprises at least one of a spectral subtraction, a Wiener-filtration, or a short-term spectrum minimum mean square error estimation method.
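Claim 4 lists spectral subtraction, Wiener filtering, and short-term spectrum MMSE estimation as options for the fundamental tone extracting algorithm, without giving parameters. The following is a minimal sketch of basic magnitude spectral subtraction with NumPy, not the patent's implementation. Assumptions: a noise-only recording is available for estimating the average noise magnitude spectrum; frames are rectangular and non-overlapping for brevity (practical systems usually use windowed, overlapping frames with overlap-add); and the function name, frame length, spectral floor, and synthetic test signal are hypothetical.

```python
import numpy as np


def spectral_subtraction(signal, noise_only, frame_len=256, floor=0.02):
    """Basic magnitude spectral subtraction on non-overlapping frames.

    signal:     1-D array of noisy samples
    noise_only: 1-D array assumed to contain noise only (used for the estimate)
    Samples after the last full frame are dropped.
    """
    # Average noise magnitude spectrum over the noise-only frames.
    n_noise = len(noise_only) // frame_len
    noise_frames = noise_only[: n_noise * frame_len].reshape(n_noise, frame_len)
    noise_mag = np.abs(np.fft.rfft(noise_frames, axis=1)).mean(axis=0)

    # Short-time spectra of the noisy signal.
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)

    # Subtract the noise estimate; keep a small floor to avoid negative magnitudes.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)

    # Resynthesize each frame with the noisy phase and concatenate the frames.
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    return clean.reshape(-1)


# Usage on a synthetic 1-second signal: a 200 Hz sine plus white noise.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 200 * t)
noise = 0.3 * rng.standard_normal(fs)
noise_ref = 0.3 * rng.standard_normal(fs)  # separate noise-only recording
enhanced = spectral_subtraction(clean + noise, noise_ref)
```

Reusing the noisy phase is the usual simplification in spectral subtraction, since the ear is less sensitive to phase errors than to magnitude errors; the floor limits the "musical noise" that appears when subtraction drives many bins to zero.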

5. A speech data recognition apparatus for distinguishing regional accents, comprising:
   - one or more hardware processors and a memory, the one or more hardware processors configured to:
     - receive speech data from a user device;
     - extract a fundamental tone data of the received speech data by using a fundamental tone extracting algorithm;
     - obtain a noise data of the received speech data based on the extracted fundamental tone data;
     - determine a ratio between signal power of the extracted fundamental tone data and signal power of the noise data in the received speech data to calculate a signal-to-noise ratio of the received speech data;
     - select a portion of the received speech data having a signal-to-noise ratio greater than a preset threshold;
     - calculate a speech recognition confidence of the selected portion of the received speech data;
     - screen a regional speech data from the selected portion of the speech data based on the speech recognition confidence, wherein the screened regional speech data has a speech recognition confidence between about 30% and about 80%; and
     - determine a region to which the screened regional speech data belongs based on a regional attribute of the screened regional speech data.

6. The apparatus according to claim 5, wherein the regional attribute comprises a location corresponding to a source IP address of the received speech data, or a location corresponding to a source user of the received speech data.

7. The apparatus according to claim 5, wherein the one or more hardware processors are configured to calculate the speech recognition confidence of the selected portion of the received speech data based on likelihood, state residing information, likelihood ratio of the selected portion of the received speech data, or a combination thereof.

8. The apparatus according to claim 5, wherein the fundamental tone extracting algorithm comprises at least one of a spectral subtraction, a Wiener-filtration, or a short-term spectrum minimum mean square error estimation method.
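Claims 1, 5, and 9 define the signal-to-noise ratio as the ratio between the signal power of the extracted fundamental tone data and the signal power of the noise data. A minimal sketch in plain Python, assuming that the tone component has already been extracted and is sample-aligned with the speech, that the "noise data" is taken as the residual speech minus tone (the claims say only that it is obtained based on the tone data), and that the ratio is reported in decibels. The names `mean_power` and `snr_from_tone` and the synthetic signal are hypothetical.

```python
import math


def mean_power(samples):
    """Average power of a sequence: the mean of the squared amplitudes."""
    return sum(x * x for x in samples) / len(samples)


def snr_from_tone(speech, tone):
    """SNR in dB between an extracted tone component and the residual noise.

    Assumption (not from the claims): noise data = speech - tone, sample by sample.
    """
    if len(speech) != len(tone):
        raise ValueError("speech and tone must have the same length")
    noise = [s - t for s, t in zip(speech, tone)]
    p_signal = mean_power(tone)
    p_noise = mean_power(noise)
    if p_noise == 0.0:
        return math.inf  # no residual noise at all
    return 10.0 * math.log10(p_signal / p_noise)


# Synthetic example: a 200 Hz "tone" plus a small alternating disturbance.
fs = 8000
n = 800  # exactly 20 periods of the 200 Hz tone
tone = [math.sin(2 * math.pi * 200 * k / fs) for k in range(n)]
disturbance = [0.1 * (-1) ** k for k in range(n)]
speech = [t + d for t, d in zip(tone, disturbance)]
print(round(snr_from_tone(speech, tone), 1))  # → 17.0
```

Here the tone power is 0.5 and the disturbance power is 0.01, so the ratio is 50, or about 17 dB; an utterance is then kept only if this value exceeds the preset threshold.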

9. A server for performing a speech data recognition for distinguishing regional accents in received speech data, the server comprising:
   - a processor, a memory, and a computer program, wherein the computer program is stored in the memory, wherein the computer program is executed by the processor, and wherein the computer program comprises instructions for:
     - calculating a signal-to-noise ratio of the received speech data, wherein calculating the signal-to-noise ratio of the received speech data comprises:
       - extracting a fundamental tone data of the received speech data by using a fundamental tone extracting algorithm;
       - obtaining a noise data of the received speech data based on the extracted fundamental tone data; and
       - calculating the signal-to-noise ratio of the received speech data by determining a ratio between signal power of the extracted fundamental tone data and signal power of the noise data in the received speech data;
     - selecting a portion of the received speech data having a signal-to-noise ratio greater than a preset threshold;
     - calculating a speech recognition confidence of the selected portion of the received speech data;
     - screening a regional speech data from the selected portion of the speech data based on the speech recognition confidence, wherein the screened regional speech data has a speech recognition confidence between about 30% and about 80%; and
     - determining a region to which the screened regional speech data belongs based on a regional attribute of the screened regional speech data.
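Claims 2 and 6 tie the regional attribute to the location of the source IP address or of the source user. A minimal sketch of that final step using Python's standard `ipaddress` module, not the patent's implementation: the network-to-region table (built from the RFC 5737 documentation ranges), the region names, and the rule of preferring the IP-derived location over the user's registered location are all hypothetical. A real deployment would consult an IP geolocation database instead of a hard-coded table.

```python
import ipaddress

# Hypothetical lookup table: documentation networks mapped to made-up regions.
REGION_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "Region A"),
    (ipaddress.ip_network("198.51.100.0/24"), "Region B"),
    (ipaddress.ip_network("203.0.113.0/24"), "Region C"),
]


def region_for_ip(source_ip, table=REGION_TABLE, default=None):
    """Return the region whose network contains source_ip, or default."""
    addr = ipaddress.ip_address(source_ip)
    for network, region in table:
        if addr in network:  # False automatically for IPv4/IPv6 mismatches
            return region
    return default


def region_for_utterance(source_ip, user_region=None):
    """Hypothetical rule: use the IP-derived location if one is found,
    otherwise fall back to the source user's registered location."""
    region = region_for_ip(source_ip)
    return region if region is not None else user_region


print(region_for_utterance("198.51.100.7"))          # → Region B
print(region_for_utterance("10.0.0.1", "Region D"))  # → Region D
```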


Current Assignee
Baidu Online Network Technology (Beijing) Co., Ltd (Baidu Incorporated)
Original Assignee
Baidu Online Network Technology (Beijing) Co., Ltd (Baidu Incorporated)
Inventors
Su, Dan; Yin, Zhao
Primary Examiner(s)
Pham, Thierry L

Application Number
US14/896,368
Publication Number
US 20160284344A1
Time in Patent Office
1,195 Days
Field of Search
704/226, 704/233
CPC Class Codes

G10L 15/005   Language recognition

G10L 15/01   Assessment or evaluation of...

G10L 15/02   Feature extraction for spee...

G10L 15/06   Creation of reference templ...

G10L 15/07   to the speaker

G10L 25/84   for discriminating voice fr...

G10L 25/90   Pitch determination of spee...
