Synchronization of an input text of a speech with a recording of the speech
First Claim
1. A method for synchronizing words in an input text of a speech with a recording of the speech, comprising:
performing, by a processor of a computer system, speech recognition of input speech data representing the speech, by comparing the input speech data with pronunciation speech data associated with the input text, to generate a recognition text comprising recognized words of the input text;
determining, by the processor of the computer system, by comparing the input text with the recognition text, an erroneous recognition text comprising words of the input text not matching respective words of the recognition text;
generating, by the processor of the computer system, synthetic speech data corresponding to the erroneous recognition text;
computing, by the processor of the computer system, from the input speech data to which each word of the synthetic speech data corresponds, ratio data comprising a ratio of a pronunciation time in the input speech data of each word of the erroneous recognition text to a pronunciation time in the input speech data of each other word of the erroneous recognition text; and
determining, by the processor of the computer system, based on the computed ratio data, an association between each word of the erroneous recognition text and a time to reproduce the input speech data corresponding to said each word of the erroneous recognition text.
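The second step of the claim (comparing the input text with the recognition text to find the words that do not match) can be illustrated with a short sketch. The claim does not name a comparison algorithm; using Python's standard `difflib.SequenceMatcher` on word lists is an assumption made here for illustration, and the function name `erroneous_spans` and the sample sentences are hypothetical.

```python
from difflib import SequenceMatcher


def erroneous_spans(input_words, recognized_words):
    """Return (start, end) index ranges of input_words that have no
    matching counterpart in recognized_words.

    Assumption: a word-level sequence alignment stands in for the
    comparison step, which the claim leaves unspecified.
    """
    matcher = SequenceMatcher(a=input_words, b=recognized_words, autojunk=False)
    spans = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # 'replace' and 'delete' mark input words that the recognizer
        # got wrong or missed entirely.
        if tag in ("replace", "delete"):
            spans.append((i1, i2))
    return spans


# Hypothetical example data.
input_words = "the quick brown fox jumps over the lazy dog".split()
recognized = "the quick crown box jumps over the lazy dog".split()

spans = erroneous_spans(input_words, recognized)
erroneous_text = [input_words[start:end] for start, end in spans]
print(spans)           # [(2, 4)]
print(erroneous_text)  # [['brown', 'fox']]
```

Each returned span is a run of input-text words bounded by correctly recognized words, which is the unit the later steps (synthetic speech generation and ratio computation) would operate on.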
Abstract
A method and system for synchronizing words in an input text of a speech with a continuous recording of the speech. A received input text includes previously recorded content of the speech to be reproduced. A synthetic speech corresponding to the received input text is generated. Ratio data including a ratio between the respective pronunciation times of words included in the received text in the generated synthetic speech is computed. The ratio data is used to determine an association between erroneously recognized words of the received text and a time to reproduce each erroneously recognized word. The association is outputted in a recording medium and/or displayed on a display device.
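As a rough illustration of how ratio data could map erroneously recognized words to reproduction times, the sketch below divides a stretch of the recording among the words in proportion to their pronunciation times in the synthetic speech. It is a minimal sketch under stated assumptions, not the patented implementation: the function name `allocate_times`, the millisecond durations, and the idea that the stretch is bounded by the timestamps of neighboring correctly recognized words are all hypothetical.

```python
def allocate_times(words, synthetic_durations_ms, span_start_ms, span_end_ms):
    """Split the recording interval [span_start_ms, span_end_ms] among
    `words` in proportion to their synthetic-speech durations.

    Assumption: the interval boundaries are already known, for example
    from the timestamps of adjacent correctly recognized words.
    """
    if len(words) != len(synthetic_durations_ms):
        raise ValueError("need exactly one synthetic duration per word")
    total = sum(synthetic_durations_ms)
    if total <= 0:
        raise ValueError("synthetic durations must sum to a positive value")

    span = span_end_ms - span_start_ms
    timings = []
    elapsed = 0
    for word, duration in zip(words, synthetic_durations_ms):
        # The ratio elapsed / total is the share of the interval that
        # has been consumed before (start) and after (end) this word.
        start = span_start_ms + span * elapsed / total
        elapsed += duration
        end = span_start_ms + span * elapsed / total
        timings.append((word, start, end))
    return timings


# Hypothetical example: two misrecognized words occupy 2000-3000 ms of
# the recording and last 300 ms and 200 ms in the synthetic speech.
timings = allocate_times(["brown", "fox"], [300, 200], 2000, 3000)
for word, start, end in timings:
    print(word, start, end)
```

Only the ratio between the synthetic durations matters here, so the synthetic voice does not need to speak at the same rate as the recorded speaker.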
21 Claims
1. A method for synchronizing words in an input text of a speech with a recording of the speech, comprising:
performing, by a processor of a computer system, speech recognition of input speech data representing the speech, by comparing the input speech data with pronunciation speech data associated with the input text, to generate a recognition text comprising recognized words of the input text;
determining, by the processor of the computer system, by comparing the input text with the recognition text, an erroneous recognition text comprising words of the input text not matching respective words of the recognition text;
generating, by the processor of the computer system, synthetic speech data corresponding to the erroneous recognition text;
computing, by the processor of the computer system, from the input speech data to which each word of the synthetic speech data corresponds, ratio data comprising a ratio of a pronunciation time in the input speech data of each word of the erroneous recognition text to a pronunciation time in the input speech data of each other word of the erroneous recognition text; and
determining, by the processor of the computer system, based on the computed ratio data, an association between each word of the erroneous recognition text and a time to reproduce the input speech data corresponding to said each word of the erroneous recognition text.
8. A computer program product, comprising a computer-readable storage device having a computer-readable program code stored therein, said computer-readable program code containing instructions that, when executed by a processor of a computer system, implement a method for synchronizing words in an input text of a speech with a recording of the speech, said method comprising:
performing speech recognition of input speech data representing the speech, by comparing the input speech data with pronunciation speech data associated with the input text, to generate a recognition text comprising recognized words of the input text;
determining, by comparing the input text with the recognition text, an erroneous recognition text comprising words of the input text not matching respective words of the recognition text;
generating synthetic speech data corresponding to the erroneous recognition text;
computing, from the input speech data to which each word of the synthetic speech data corresponds, ratio data comprising a ratio of a pronunciation time in the input speech data of each word of the erroneous recognition text to a pronunciation time in the input speech data of each other word of the erroneous recognition text; and
determining, based on the computed ratio data, an association between each word of the erroneous recognition text and a time to reproduce the input speech data corresponding to said each word of the erroneous recognition text.
14. The computer program product of claim 8, further comprising:
performing speech recognition of the input speech data corresponding to the erroneous recognition text to generate a second recognition text, determining, by comparing the second recognition text with the erroneous recognition text, a second erroneous recognition text, and computing the ratio data based at least in part on the second erroneous recognition text.
15. A computer system comprising:
a processor and a computer-readable memory unit coupled to the processor, said memory unit containing instructions that, when executed by the processor, implement a method for synchronizing words in an input text of a speech with a recording of the speech, said method comprising:
performing speech recognition of input speech data representing the speech, by comparing the input speech data with pronunciation speech data associated with the input text, to generate a recognition text comprising recognized words of the input text;
determining, by comparing the input text with the recognition text, an erroneous recognition text comprising words of the input text not matching respective words of the recognition text;
generating synthetic speech data corresponding to the erroneous recognition text;
computing, from the input speech data to which each word of the synthetic speech data corresponds, ratio data comprising a ratio of a pronunciation time in the input speech data of each word of the erroneous recognition text to a pronunciation time in the input speech data of each other word of the erroneous recognition text; and
determining, based on the computed ratio data, an association between each word of the erroneous recognition text and a time to reproduce the input speech data corresponding to said each word of the erroneous recognition text.
Specification