MAPPING BETWEEN SPEECH SIGNAL AND TRANSCRIPT
Abstract
A method, a computer program product, and a computer system for mapping between a speech signal and a transcript of the speech signal. The computer system segments the speech signal to obtain one or more segmented speech signals and the transcript of the speech signal to obtain one or more segmented transcripts of the speech signal. The computer system generates estimated phone sequences and reference phone sequences, calculates costs of correspondences between the estimated phone sequences and the reference phone sequences, determines a series of the estimated phone sequences with a smallest cost, selects a partial series of the estimated phone sequences from the series of the estimated phone sequences, and generates mapping data which includes the partial series of the estimated phone sequences and a corresponding series of the reference phone sequences.
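As an illustration of what a cost of correspondence between an estimated phone sequence and a reference phone sequence might look like, the following minimal Python sketch uses a length-normalized Levenshtein (edit) distance over phone symbols. The abstract does not specify the cost function; the edit-distance choice, the function name phone_edit_cost, and the ARPAbet-style phone labels are assumptions made only for this example.

```python
from typing import Sequence


def phone_edit_cost(estimated: Sequence[str], reference: Sequence[str]) -> float:
    """Length-normalized Levenshtein distance between two phone sequences.

    Hypothetical cost function: 0.0 means identical sequences, 1.0 means
    every position had to be substituted, inserted, or deleted.
    """
    if not estimated and not reference:
        return 0.0
    # prev[j] holds the edit distance between the estimated phones seen so
    # far (excluding the current one) and the first j reference phones.
    prev = list(range(len(reference) + 1))
    for i, est_phone in enumerate(estimated, start=1):
        curr = [i]
        for j, ref_phone in enumerate(reference, start=1):
            substitution = prev[j - 1] + (est_phone != ref_phone)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / max(len(estimated), len(reference))


# One substituted phone out of four (illustrative labels only).
estimated = ["HH", "AH", "L", "OW"]
reference = ["HH", "EH", "L", "OW"]
print(phone_edit_cost(estimated, reference))  # 0.25
```

Normalizing by the longer sequence keeps costs comparable across segments of different lengths, which matters if costs are later summed or thresholded.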
18 Claims
1. A computer-implemented method for mapping between a speech signal and a transcript of the speech signal, the method comprising:
obtaining the speech signal and the transcript of the speech signal;
segmenting the speech signal to obtain one or more segmented speech signals;
segmenting the transcript of the speech signal to obtain one or more segmented transcripts of the speech signal;
performing automatic speech recognition of the one or more segmented speech signals to obtain recognized texts;
converting the recognized texts into estimated phone sequences;
converting the one or more segmented transcripts of the speech signal into reference phone sequences;
calculating costs of correspondences between the estimated phone sequences and the reference phone sequences;
determining a series of the estimated phone sequences, the series of the estimated phone sequences being with a smallest cost;
selecting a partial series of the estimated phone sequences, from the series of the estimated phone sequences; and
generating mapping data which includes the partial series of the estimated phone sequences and a corresponding series of the reference phone sequences, wherein the corresponding series corresponds to the partial series of the estimated phone sequences.
Dependent claims: 2, 3, 4, 5, 6
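The step of determining a series with a smallest cost can be pictured as a dynamic-programming search over segment pairs. The sketch below is one possible reading, not the claimed procedure: it assumes a precomputed matrix of pairwise costs, a monotonic (order-preserving) pairing of estimated and reference segments, and a fixed penalty for leaving a segment unpaired. The names min_cost_series and skip_penalty are hypothetical.

```python
from typing import List, Tuple


def min_cost_series(
    costs: List[List[float]], skip_penalty: float = 1.0
) -> Tuple[float, List[Tuple[int, int]]]:
    """Find the order-preserving pairing of segments with the smallest total cost.

    costs[i][j] is the cost of pairing estimated segment i with reference
    segment j. Unpaired segments on either side cost skip_penalty each
    (an assumption of this sketch).
    """
    m = len(costs)
    n = len(costs[0]) if m else 0
    # best[i][j]: smallest cost of aligning the first i estimated segments
    # with the first j reference segments.
    best = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        best[i][0] = i * skip_penalty
    for j in range(1, n + 1):
        best[0][j] = j * skip_penalty
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best[i][j] = min(
                best[i - 1][j - 1] + costs[i - 1][j - 1],  # pair i-1 with j-1
                best[i - 1][j] + skip_penalty,             # skip estimated segment
                best[i][j - 1] + skip_penalty,             # skip reference segment
            )
    # Backtrack from the end to recover which segments were paired.
    pairs: List[Tuple[int, int]] = []
    i, j = m, n
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] + costs[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j] + skip_penalty:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best[m][n], pairs


# Two estimated segments against three reference segments (made-up costs).
total, pairs = min_cost_series([[0.1, 0.9, 0.8], [0.9, 0.2, 0.7]])
print(pairs)  # [(0, 0), (1, 1)]
```

In this toy input the third reference segment is left unpaired, which is the kind of situation that arises when the transcript contains text the recognizer did not cover.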
7. A computer program product for mapping between a speech signal and a transcript of the speech signal, the computer program product comprising a computer readable storage medium having program code embodied therewith, the program code executable to:
obtain the speech signal and the transcript of the speech signal;
segment the speech signal to obtain one or more segmented speech signals;
segment the transcript of the speech signal to obtain one or more segmented transcripts of the speech signal;
perform automatic speech recognition of the one or more segmented speech signals to obtain recognized texts;
convert the recognized texts into estimated phone sequences;
convert the one or more segmented transcripts of the speech signal into reference phone sequences;
calculate costs of correspondences between the estimated phone sequences and the reference phone sequences;
determine a series of the estimated phone sequences, the series of the estimated phone sequences being with a smallest cost;
select a partial series of the estimated phone sequences, from the series of the estimated phone sequences; and
generate mapping data which includes the partial series of the estimated phone sequences and a corresponding series of the reference phone sequences, wherein the corresponding series corresponds to the partial series of the estimated phone sequences.
Dependent claims: 8, 9, 10, 11, 12
13. A computer system for mapping between a speech signal and a transcript of the speech signal, the computer system comprising:
one or more processors, one or more computer readable tangible storage devices, and program instructions stored on at least one of the one or more computer readable tangible storage devices for execution by at least one of the one or more processors, the program instructions executable to:
obtain the speech signal and the transcript of the speech signal;
segment the speech signal to obtain one or more segmented speech signals;
segment the transcript of the speech signal to obtain one or more segmented transcripts of the speech signal;
perform automatic speech recognition of the one or more segmented speech signals to obtain recognized texts;
convert the recognized texts into estimated phone sequences;
convert the one or more segmented transcripts of the speech signal into reference phone sequences;
calculate costs of correspondences between the estimated phone sequences and the reference phone sequences;
determine a series of the estimated phone sequences, the series of the estimated phone sequences being with a smallest cost;
select a partial series of the estimated phone sequences, from the series of the estimated phone sequences; and
generate mapping data which includes the partial series of the estimated phone sequences and a corresponding series of the reference phone sequences, wherein the corresponding series corresponds to the partial series of the estimated phone sequences.
Dependent claims: 14, 15, 16, 17, 18
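The final two steps, selecting a partial series and generating mapping data, might be sketched as follows. The independent claims do not say how the partial series is chosen, so this example simply assumes that pairs whose cost is at or below a threshold are kept. The MappingEntry record, the build_mapping function, and the max_cost parameter are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class MappingEntry:
    """One record of mapping data: an estimated phone sequence and its reference."""
    estimated_phones: Tuple[str, ...]
    reference_phones: Tuple[str, ...]
    cost: float


def build_mapping(
    aligned: Sequence[Tuple[Sequence[str], Sequence[str], float]],
    max_cost: float = 0.3,
) -> List[MappingEntry]:
    """Keep the aligned pairs whose cost does not exceed max_cost.

    `aligned` is the smallest-cost series, given as
    (estimated phones, reference phones, cost) triples in order.
    The threshold rule is an assumption of this sketch.
    """
    return [
        MappingEntry(tuple(est), tuple(ref), cost)
        for est, ref, cost in aligned
        if cost <= max_cost
    ]


# Made-up series: the second pair is a poor match and is dropped.
series = [
    (["HH", "AH", "L", "OW"], ["HH", "EH", "L", "OW"], 0.25),
    (["W", "ER", "D"], ["W", "ER", "L", "D", "Z"], 0.4),
]
mapping = build_mapping(series)
print(len(mapping))  # 1
```

Each retained entry carries both phone sequences, so the mapping data contains the partial series together with its corresponding reference series.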
Specification