Voice recognition terminal, voice recognition server, and voice recognition method for performing personalized voice recognition
Abstract
A voice recognition terminal, a voice recognition server, and a voice recognition method for performing personalized voice recognition. The voice recognition terminal includes a feature extraction unit for extracting feature data from an input voice signal, an acoustic score calculation unit for calculating acoustic model scores using the feature data, and a communication unit for transmitting the acoustic model scores and state information to a voice recognition server in units of one or more frames, and receiving transcription data from the voice recognition server, wherein the transcription data is recognized using a calculated path of a language network when the voice recognition server calculates the path of the language network using the acoustic model scores.
7 Claims
1. A voice recognition terminal, comprising:
one or more units configured and executed by a processor using an algorithm associated with at least one non-transitory storage device, the one or more units comprising:
a feature extraction unit for extracting feature data from an input voice signal;
an acoustic score calculation unit for calculating acoustic model scores using the feature data;
a data selection unit for selecting acoustic model scores to be transmitted to a voice recognition server;
a communication unit for transmitting the acoustic model scores and state information to the voice recognition server in units of one or more frames, and receiving transcription data from the voice recognition server;
a storage unit for matching the extracted feature data with the transcription data received from the voice recognition server, and storing a result of the matching as adaptation data; and
an acoustic model adaptation unit for performing adaptation of an acoustic model using the stored adaptation data,
wherein the adaptation of the acoustic model is performed by the processor, based on the transcription data received from the voice recognition server, in response to detection of at least one of a preset time, a time during which the voice signal is not input, and a time during which communication with the voice recognition server is not performed,
wherein the transcription data is recognized using a calculated path of a language network when the voice recognition server calculates the path of the language network using the acoustic model scores,
wherein the data selection unit selects n-best state scores of a last hidden layer from among the calculated acoustic model scores, the last hidden layer and multiple other hidden layers being configured between an input layer and an output layer, and a number of states corresponding to the last hidden layer being less than a number of states corresponding to the output layer,
wherein the acoustic model of the voice recognition terminal includes hidden layers up to the last hidden layer, and the voice recognition server includes a model corresponding to a final output layer, and
wherein the voice recognition server calculates a final acoustic model score by applying only the n-best state scores of the last hidden layer, received from the voice recognition terminal, to the model corresponding to the final output layer.
Dependent claims: 2, 3, 4
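The terminal-side computation in claim 1 (forward pass up to the last hidden layer, then selection of the n-best state scores to transmit) can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation: the function names, the sigmoid hidden units, and the plain-list weight layout are all assumptions made for illustration.

```python
import math

def last_hidden_activations(features, weights, biases):
    """Forward pass through the terminal's hidden layers only (sigmoid units).

    weights[k] is the weight matrix of hidden layer k (rows of input weights);
    biases[k] is its bias vector. No output layer exists on the terminal.
    """
    h = features
    for W, b in zip(weights, biases):
        h = [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(row, h)) + bi)))
             for row, bi in zip(W, b)]
    return h

def select_n_best(scores, n):
    """Return (state_index, score) pairs for the n highest last-hidden-layer
    scores; the indices serve as the 'state information' sent per frame."""
    ranked = sorted(enumerate(scores), key=lambda p: p[1], reverse=True)
    return ranked[:n]
```

Transmitting only n (index, score) pairs per frame keeps the uplink payload small: the claim already requires the last hidden layer to have fewer states than the output layer, and the n-best selection reduces the payload further.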
5. A voice recognition server including a processor, the server comprising:
a reception unit for receiving, from a voice recognition terminal that extracts feature data from a voice signal and calculates acoustic model scores, both state information and the acoustic model scores, clustered into units of one or more frames;
a voice recognition unit for generating transcription data by applying the received acoustic model scores to a large-capacity language network; and
a transmission unit for transmitting the transcription data, generated as a result of voice recognition, to the voice recognition terminal,
wherein the voice recognition unit calculates, by the processor, a final acoustic model score by applying n-best state scores of a last hidden layer, received from the voice recognition terminal, to a model corresponding to a final output layer, and performs voice recognition using the calculated final acoustic model score, the last hidden layer and multiple other hidden layers being configured between an input layer and an output layer, and a number of states corresponding to the last hidden layer being less than a number of states corresponding to the output layer,
wherein the acoustic model of the voice recognition terminal includes hidden layers up to the last hidden layer, and the voice recognition server includes the model corresponding to the final output layer, and
wherein the voice recognition server calculates the final acoustic model score by applying only the n-best state scores of the last hidden layer, received from the voice recognition terminal, to the model corresponding to the final output layer.
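The server-side step in claim 5 applies the final output layer only to the n-best state scores received from the terminal; states that were not transmitted contribute nothing. A minimal sketch, assuming untransmitted hidden states are treated as zero activation and the per-state posteriors come from a softmax (both assumptions for illustration, not taken from the patent):

```python
import math

def final_acoustic_scores(n_best, output_weights, output_biases):
    """Compute final acoustic model scores for one frame.

    n_best: list of (hidden_state_index, score) pairs from the terminal.
    output_weights[j][i]: weight from hidden state i to output state j.
    Untransmitted hidden states are implicitly zero (sparse product).
    """
    logits = []
    for row, b in zip(output_weights, output_biases):
        logits.append(sum(row[i] * s for i, s in n_best) + b)
    # Softmax over output states yields per-state acoustic posteriors,
    # which the voice recognition unit can feed to the language network.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]
```

Because the sum runs only over the received n-best pairs, the cost per output state is O(n) rather than O(size of last hidden layer).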
6. A computer-implemented voice recognition method using a voice recognition terminal, comprising:
extracting feature data from an input voice signal;
calculating acoustic model scores using the extracted feature data;
selecting acoustic model scores to be transmitted to a voice recognition server;
transmitting the selected acoustic model scores and state information to the voice recognition server in units of one or more frames;
matching the extracted feature data with transcription data received from the voice recognition server, and storing a result of the matching as adaptation data;
performing adaptation of an acoustic model using the stored adaptation data, wherein the adaptation of the acoustic model is performed by a processor, based on the transcription data received from the voice recognition server, in response to detection of at least one of a preset time, a time during which the voice signal is not input, and a time during which communication with the voice recognition server is not performed;
selecting n-best state scores of a last hidden layer from among the calculated acoustic model scores, the last hidden layer and multiple other hidden layers being configured between an input layer and an output layer, and a number of states corresponding to the last hidden layer being less than a number of states corresponding to the output layer;
receiving the transcription data from the voice recognition server, wherein the transcription data is recognized using a calculated path of a language network when the voice recognition server calculates the path of the language network using the acoustic model scores, and wherein the acoustic model of the voice recognition terminal includes hidden layers up to the last hidden layer, and the voice recognition server includes a model corresponding to a final output layer; and
calculating a final acoustic model score by applying only the n-best state scores of the last hidden layer, received from the voice recognition terminal, to the model corresponding to the final output layer.
Dependent claims: 7
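The adaptation step of the method claim fires only when an idle condition is detected: a preset time is reached, no voice is being input, or no server communication is in progress. The sketch below illustrates that trigger logic plus a deliberately toy "adaptation" over the stored (feature, transcription) pairs; the function names, the bias-nudging update, and the dictionary model are hypothetical stand-ins, not the patent's adaptation algorithm.

```python
def should_adapt(now, preset_time, voice_active, server_connected):
    """True if any of the claimed trigger conditions holds:
    the preset time has been reached, no voice signal is being input,
    or communication with the server is not being performed."""
    return now >= preset_time or not voice_active or not server_connected

def adapt(model, adaptation_data, lr=0.01):
    """Toy adaptation: nudge a per-state bias toward each transcribed state.

    adaptation_data: list of (feature_vector, transcribed_state) pairs,
    i.e. the stored matches of extracted features with server transcriptions.
    """
    for _features, state in adaptation_data:
        model[state] = model.get(state, 0.0) + lr
    return model
```

Running adaptation only during idle windows means the personalization work never competes with live recognition for the terminal's processor.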
Specification