Method and device for training acoustic model, computer device and storage medium
First Claim
1. A method for training an acoustic model, comprising:
obtaining supervised speech data and unsupervised speech data, wherein the supervised speech data is speech data with manual annotation and the unsupervised speech data is speech data with machine annotation;
extracting speech features from the supervised speech data, and extracting speech features from the unsupervised speech data; and
performing a supervised learning task on the speech features of the supervised speech data, and performing an unsupervised learning task on the speech features of the unsupervised speech data, by using a deep learning network, to train and obtain the acoustic model;
wherein the deep learning network comprises an input layer, at least one hidden layer and an output layer;
wherein the input layer is shared by the supervised learning task and the unsupervised learning task, such that the supervised learning task and the unsupervised learning task are performed in parallel; and
after training the model, a final acoustic model is that obtained by retaining all the parameters of the model, to retain both outputs of the supervised learning task and outputs of the unsupervised learning task in the reasoning phase, and merging the outputs as a final output.
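The structure recited in the claim can be illustrated with a minimal sketch. It assumes one hidden layer shared by both tasks, a separate softmax output head per task, and merging by averaging the two posterior vectors. The claim only requires that the input layer is shared and that the outputs are merged, so the shared hidden layer, the averaging rule, and all names (`TwoTaskAcousticNet`, `infer`, the layer sizes) are hypothetical choices made for illustration. Training is omitted.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (an arbitrary choice)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class TwoTaskAcousticNet:
    """Hypothetical network: shared layers feeding two task-specific heads."""

    def __init__(self, n_features, n_hidden, n_states, seed=0):
        rng = random.Random(seed)
        # Shared by both tasks (assumption: the hidden layer is shared too).
        self.shared = init_layer(n_features, n_hidden, rng)
        # One output head per learning task.
        self.supervised_head = init_layer(n_hidden, n_states, rng)
        self.unsupervised_head = init_layer(n_hidden, n_states, rng)

    def forward(self, features):
        """Run both tasks on the same features; return both posteriors."""
        hidden = relu(dense(features, *self.shared))
        sup = softmax(dense(hidden, *self.supervised_head))
        unsup = softmax(dense(hidden, *self.unsupervised_head))
        return sup, unsup

    def infer(self, features):
        """Keep both heads at inference and merge them by averaging
        (the averaging rule is an assumption, not taken from the claim)."""
        sup, unsup = self.forward(features)
        return [(a + b) / 2 for a, b in zip(sup, unsup)]
```

Because all parameters are retained, `infer` evaluates both heads for every frame; a weighted average or another merge rule could be substituted without changing the rest of the sketch.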
Abstract
Embodiments of the present disclosure provide a method and a device for training an acoustic model, a computer device, and a storage medium. The method includes: obtaining supervised speech data and unsupervised speech data, in which the supervised speech data is speech data with manual annotation and the unsupervised speech data is speech data with machine annotation; extracting speech features from the supervised speech data and the unsupervised speech data; and performing multi-task learning, comprising a supervised learning task and an unsupervised learning task, on the speech features of the supervised speech data and the unsupervised speech data by using a deep learning network, to train and obtain the acoustic model.
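One way to picture the multi-task learning described in the abstract is as a single training objective that combines a loss on the manually annotated data with a loss on the machine-annotated data. The sketch below assumes a per-frame cross-entropy loss for each task and a weighted sum of the two task losses. The abstract does not state the loss function or how the tasks are weighted, so `multitask_loss`, `unsup_weight`, and the data layout are hypothetical.

```python
import math


def cross_entropy(posteriors, label):
    """Negative log-probability assigned to the annotated class."""
    return -math.log(max(posteriors[label], 1e-12))


def mean_loss(batch):
    """Average cross-entropy over (posteriors, label) pairs; 0.0 if empty."""
    if not batch:
        return 0.0
    return sum(cross_entropy(p, y) for p, y in batch) / len(batch)


def multitask_loss(sup_batch, unsup_batch, unsup_weight=0.5):
    """Combine the two task losses into one training objective.

    sup_batch:   (posteriors from the supervised head, manual label) pairs
    unsup_batch: (posteriors from the unsupervised head, machine label) pairs
    unsup_weight: assumed down-weighting of the noisier machine labels
    """
    return mean_loss(sup_batch) + unsup_weight * mean_loss(unsup_batch)


# Example: one manually annotated frame and one machine-annotated frame.
loss = multitask_loss(
    sup_batch=[([0.5, 0.5], 0)],
    unsup_batch=[([0.25, 0.75], 1)],
    unsup_weight=0.5,
)
```

Minimizing such a combined objective updates the shared layers with gradients from both tasks at once, which is one common reading of performing the two tasks in parallel.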
52 Citations
20 Claims
1. A method for training an acoustic model, comprising:
obtaining supervised speech data and unsupervised speech data, wherein the supervised speech data is speech data with manual annotation and the unsupervised speech data is speech data with machine annotation;
extracting speech features from the supervised speech data, and extracting speech features from the unsupervised speech data; and
performing a supervised learning task on the speech features of the supervised speech data, and performing an unsupervised learning task on the speech features of the unsupervised speech data, by using a deep learning network, to train and obtain the acoustic model;
wherein the deep learning network comprises an input layer, at least one hidden layer and an output layer;
wherein the input layer is shared by the supervised learning task and the unsupervised learning task, such that the supervised learning task and the unsupervised learning task are performed in parallel; and
after training the model, a final acoustic model is that obtained by retaining all the parameters of the model, to retain both outputs of the supervised learning task and outputs of the unsupervised learning task in the reasoning phase, and merging the outputs as a final output.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer device, comprising:
one or more processors;
a storage device, configured to store one or more programs;
wherein the one or more processors are configured to read the one or more programs from the storage device to perform acts of:
obtaining supervised speech data and unsupervised speech data, wherein the supervised speech data is speech data with manual annotation and the unsupervised speech data is speech data with machine annotation;
extracting speech features from the supervised speech data, and extracting speech features from the unsupervised speech data; and
performing a supervised learning task on the speech features of the supervised speech data, and performing an unsupervised learning task on the speech features of the unsupervised speech data by using a deep learning network, to train and obtain the acoustic model;
wherein the deep learning network comprises an input layer, at least one hidden layer and an output layer;
wherein the input layer is shared by the supervised learning task and the unsupervised learning task, such that the supervised learning task and the unsupervised learning task are performed in parallel; and
after training the model, a final acoustic model is that obtained by retaining all the parameters of the model, to retain both outputs of the supervised learning task and outputs of the unsupervised learning task in the reasoning phase, and merging the outputs as a final output.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory computer readable storage medium, configured to store computer instructions, wherein when the instructions are executed by a processor, a method for training an acoustic model is implemented and the method comprises:
obtaining supervised speech data and unsupervised speech data, wherein the supervised speech data is speech data with manual annotation and the unsupervised speech data is speech data with machine annotation;
extracting speech features from the supervised speech data and extracting speech features from the unsupervised speech data; and
performing a supervised learning task on the speech features of the supervised speech data, and performing an unsupervised learning task on the speech features of the unsupervised speech data by using a deep learning network, to train and obtain the acoustic model;
wherein the deep learning network comprises an input layer, at least one hidden layer and an output layer;
wherein the input layer is shared by the supervised learning task and the unsupervised learning task, such that the supervised learning task and the unsupervised learning task are performed in parallel; and
after training the model, a final acoustic model is that obtained by retaining all the parameters of the model, to retain both outputs of the supervised learning task and outputs of the unsupervised learning task in the reasoning phase, and merging the outputs as a final output.
Dependent claims: 16, 17, 18, 19, 20
Specification