Speech data recognition method, apparatus, and server for distinguishing regional accent
First Claim
1. A speech data recognition method for distinguishing regional accents, comprising:
receiving speech data from a user device, wherein the user device comprises a processor and a memory;
calculating a signal-to-noise ratio of the received speech data, wherein calculating the signal-to-noise ratio of the received speech data comprises:
extracting a fundamental tone data of the received speech data by using a fundamental tone extracting algorithm;
obtaining a noise data of the received speech data based on the extracted fundamental tone data; and
calculating the signal-to-noise ratio of the received speech data by determining a ratio between signal power of the extracted fundamental tone data and signal power of the noise data in the received speech data;
selecting a portion of the received speech data having a signal-to-noise ratio greater than a preset threshold;
calculating a speech recognition confidence of the selected portion of the received speech data;
screening a regional speech data from the selected portion of the speech data based on the speech recognition confidence, wherein the screened regional speech data has a speech recognition confidence between about 30% and about 80%; and
determining a region to which the screened regional speech data belongs based on a regional attribute of the screened regional speech data.
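The signal-to-noise steps recited above can be illustrated with a minimal sketch. The claim does not name a particular fundamental tone extracting algorithm, so everything below is an assumption made for illustration: the pitch is estimated from an autocorrelation peak, the "fundamental tone data" is modelled as a single least-squares sinusoid at that pitch, and the "noise data" is taken to be whatever remains after subtracting it. The function names, the 80 to 400 Hz search range, and the synthetic test signal are hypothetical.

```python
import math


def estimate_f0(samples, sample_rate, f0_min=80.0, f0_max=400.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak.

    Assumption: a plain autocorrelation search stands in for the unspecified
    "fundamental tone extracting algorithm".
    """
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), len(samples) - 1)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def fundamental_component(samples, sample_rate, f0):
    """Least-squares fit of a*sin + b*cos at f0; returns the fitted samples."""
    w = 2.0 * math.pi * f0 / sample_rate
    sin_b = [math.sin(w * n) for n in range(len(samples))]
    cos_b = [math.cos(w * n) for n in range(len(samples))]
    ss = sum(s * s for s in sin_b)
    cc = sum(c * c for c in cos_b)
    sc = sum(s * c for s, c in zip(sin_b, cos_b))
    xs = sum(x * s for x, s in zip(samples, sin_b))
    xc = sum(x * c for x, c in zip(samples, cos_b))
    det = ss * cc - sc * sc  # 2x2 normal equations
    a = (xs * cc - xc * sc) / det
    b = (xc * ss - xs * sc) / det
    return [a * s + b * c for s, c in zip(sin_b, cos_b)]


def power(samples):
    """Mean squared amplitude."""
    return sum(x * x for x in samples) / len(samples)


def signal_to_noise_ratio(samples, sample_rate):
    """Ratio of fundamental-tone power to residual (noise) power."""
    f0 = estimate_f0(samples, sample_rate)
    tone = fundamental_component(samples, sample_rate, f0)
    noise = [x - t for x, t in zip(samples, tone)]
    noise_power = power(noise)
    if noise_power == 0.0:
        return float("inf")
    return power(tone) / noise_power


# Synthetic input: a 200 Hz tone plus a weaker 1010 Hz interferer.
SAMPLE_RATE = 8000
example = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
           + 0.1 * math.sin(2 * math.pi * 1010 * n / SAMPLE_RATE)
           for n in range(800)]
ratio = signal_to_noise_ratio(example, SAMPLE_RATE)
```

A caller would then keep only the speech data whose ratio exceeds the preset threshold. A single sinusoid is a crude stand-in for voiced speech, which also carries harmonics of the fundamental, so a practical system would likely use a richer model of the tonal component.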
Abstract
A speech data recognition method, apparatus, and server are provided for distinguishing regional accents. The speech data recognition method includes: calculating a speech recognition confidence and/or a signal-to-noise ratio of the speech data, and screening a regional speech data from the speech data based on the speech recognition confidence and/or the signal-to-noise ratio of the speech data; and determining a region to which the regional speech data belongs based on a regional attribute of the regional speech data. The regional speech data are automatically recognized from the mass speech data by calculating the speech recognition confidence, the signal-to-noise ratio of the speech data, or the combination thereof, thereby avoiding manual labeling of the speech data and enhancing the efficiency of the speech data processing.
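The screening and region-determination flow summarized above can be sketched as follows. This is an illustrative sketch only: it follows the ordering of the first claim (signal-to-noise filter first, then the confidence band of about 30% to about 80%), it assumes the signal-to-noise ratio and recognition confidence have already been computed, and it treats the confidence bounds as inclusive. The `Utterance` record, its field names, the default threshold of 10.0, and the use of a plain string as the "regional attribute" are all hypothetical; the text here does not say what the regional attribute is or how it is obtained.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Utterance:
    utterance_id: str
    snr: float               # linear signal-to-noise ratio (assumed precomputed)
    confidence: float        # recognition confidence in [0, 1] (assumed precomputed)
    region_attribute: str    # hypothetical regional attribute of the utterance


def screen_regional_speech(utterances, snr_threshold=10.0,
                           conf_low=0.30, conf_high=0.80):
    """Group the IDs of screened utterances by their regional attribute.

    An utterance is kept only if its SNR exceeds the preset threshold and
    its confidence lies within [conf_low, conf_high].
    """
    by_region = defaultdict(list)
    for u in utterances:
        if u.snr <= snr_threshold:
            continue  # not in the selected portion
        if not (conf_low <= u.confidence <= conf_high):
            continue  # outside the screening band
        by_region[u.region_attribute].append(u.utterance_id)
    return dict(by_region)


sample = [
    Utterance("u1", snr=25.0, confidence=0.55, region_attribute="region-A"),
    Utterance("u2", snr=4.0, confidence=0.60, region_attribute="region-A"),
    Utterance("u3", snr=30.0, confidence=0.95, region_attribute="region-B"),
    Utterance("u4", snr=18.0, confidence=0.40, region_attribute="region-B"),
    Utterance("u5", snr=12.0, confidence=0.10, region_attribute="region-A"),
]
screened = screen_regional_speech(sample)
```

In this sample, `u2` is dropped by the signal-to-noise filter, while `u3` and `u5` fall outside the confidence band, leaving one utterance per region.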
9 Claims
1. A speech data recognition method for distinguishing regional accents, comprising:
receiving speech data from a user device, wherein the user device comprises a processor and a memory;
calculating a signal-to-noise ratio of the received speech data, wherein calculating the signal-to-noise ratio of the received speech data comprises:
extracting a fundamental tone data of the received speech data by using a fundamental tone extracting algorithm;
obtaining a noise data of the received speech data based on the extracted fundamental tone data; and
calculating the signal-to-noise ratio of the received speech data by determining a ratio between signal power of the extracted fundamental tone data and signal power of the noise data in the received speech data;
selecting a portion of the received speech data having a signal-to-noise ratio greater than a preset threshold;
calculating a speech recognition confidence of the selected portion of the received speech data;
screening a regional speech data from the selected portion of the speech data based on the speech recognition confidence, wherein the screened regional speech data has a speech recognition confidence between about 30% and about 80%; and
determining a region to which the screened regional speech data belongs based on a regional attribute of the screened regional speech data.
Dependent claims: 2, 3, 4
5. A speech data recognition apparatus for distinguishing regional accents, comprising:
one or more hardware processors and a memory, the one or more hardware processors configured to:
receive speech data from a user device;
extract a fundamental tone data of the received speech data by using a fundamental tone extracting algorithm;
obtain a noise data of the received speech data based on the extracted fundamental tone data;
determine a ratio between signal power of the extracted fundamental tone data and signal power of the noise data in the received speech data to calculate a signal-to-noise ratio of the received speech data;
select a portion of the received speech data having a signal-to-noise ratio greater than a preset threshold;
calculate a speech recognition confidence of the selected portion of the received speech data;
screen a regional speech data from the selected portion of the speech data based on the speech recognition confidence, wherein the screened regional speech data has a speech recognition confidence between about 30% and about 80%; and
determine a region to which the screened regional speech data belongs based on a regional attribute of the screened regional speech data.
Dependent claims: 6, 7, 8
9. A server for performing a speech data recognition for distinguishing regional accents in received speech data, the server comprising:
a processor, a memory, and a computer program, wherein the computer program is stored in the memory, wherein the computer program is executed by the processor, and wherein the computer program comprises instructions for:
calculating a signal-to-noise ratio of the received speech data, wherein calculating the signal-to-noise ratio of the received speech data comprises:
extracting a fundamental tone data of the received speech data by using a fundamental tone extracting algorithm;
obtaining a noise data of the received speech data based on the extracted fundamental tone data; and
calculating the signal-to-noise ratio of the received speech data by determining a ratio between signal power of the extracted fundamental tone data and signal power of the noise data in the received speech data;
selecting a portion of the received speech data having a signal-to-noise ratio greater than a preset threshold;
calculating a speech recognition confidence of the selected portion of the received speech data;
screening a regional speech data from the selected portion of the speech data based on the speech recognition confidence, wherein the screened regional speech data has a speech recognition confidence between about 30% and about 80%; and
determining a region to which the screened regional speech data belongs based on a regional attribute of the screened regional speech data.
Specification