Apparatus, method, and computer program product for correcting speech recognition error
First Claim
1. An apparatus for correcting a character string in a text, the apparatus comprising:
a hardware processor electrically coupled to a memory, and configured to:
acquire a first audio of a first speech of a first speaker;
convert the first audio to a first text;
output a first caption image of the first text;
acquire a second audio of a second speech of a second speaker for correcting a character string that is included in the first text;
convert the second audio to a second text;
search the first text for one or more similar blocks that are similar to the second text, wherein each block of the one or more similar blocks comprises at least a portion of a character string that matches the second text;
calculate, for each of the one or more similar blocks, a measure of text similarity between the similar block and the second text;
calculate, for each of the one or more similar blocks, a measure of acoustic similarity by performing acoustic matching between comparison sections of the first audio corresponding to the one or more similar blocks and the second audio;
determine an estimated similarity for each of the one or more similar blocks by at least combining the measure of text similarity and the measure of acoustic similarity in a comparison section that corresponds to the similar block;
determine that a character string in a similar block of the one or more similar blocks is the character string to be corrected, wherein the estimated similarity associated with the similar block is equal to or more than a threshold; and
output a second caption image indicating the first text, the second text, and a position of the character string to be corrected in the first text, and indicating that the character string to be corrected is to be replaced with the second text.
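The last two determining steps of the claim can be illustrated with a minimal sketch. The claim only requires "at least combining" the two measures and comparing the result against a threshold; the weighted sum, the default weight and threshold values, the `SimilarBlock` type, and the rule of picking the highest-scoring qualifying block are all assumptions made for illustration, not details taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SimilarBlock:
    """A candidate span of the first text (all field names are hypothetical)."""
    start: int                  # character offset of the block in the first text
    end: int                    # exclusive end offset
    text_similarity: float      # assumed to lie in [0, 1]
    acoustic_similarity: float  # assumed to lie in [0, 1]


def estimated_similarity(block: SimilarBlock, text_weight: float = 0.5) -> float:
    """Combine the two measures with a weighted sum (an assumed combination rule)."""
    return (text_weight * block.text_similarity
            + (1.0 - text_weight) * block.acoustic_similarity)


def pick_block_to_correct(blocks: List[SimilarBlock],
                          threshold: float = 0.6,
                          text_weight: float = 0.5) -> Optional[SimilarBlock]:
    """Return the best block whose estimated similarity is >= threshold, else None."""
    qualifying = [b for b in blocks
                  if estimated_similarity(b, text_weight) >= threshold]
    if not qualifying:
        return None
    # Assumption: when several blocks qualify, prefer the highest combined score.
    return max(qualifying, key=lambda b: estimated_similarity(b, text_weight))


if __name__ == "__main__":
    candidates = [
        SimilarBlock(0, 5, text_similarity=0.4, acoustic_similarity=0.5),
        SimilarBlock(10, 18, text_similarity=0.8, acoustic_similarity=0.7),
    ]
    print(pick_block_to_correct(candidates))
```

The comparison uses `>=` to mirror the claim wording "equal to or more than a threshold". Returning `None` when nothing qualifies leaves the caller free to treat the second speech as something other than a correction.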
Abstract
According to an embodiment, an apparatus for correcting a character string in a text includes a first converter, a first output unit, a second converter, an estimation unit, and a second output unit. The first converter recognizes a first speech of a first speaker, and converts the first speech to a first text. The first output unit outputs a first caption image indicating the first text. The second converter recognizes a second speech of a second speaker for correcting a character string to be corrected in the first text, and converts the second speech to a second text. The estimation unit estimates the character string to be corrected, based on text matching between the first text and the second text. The second output unit outputs a second caption image indicating that the character string to be corrected is to be replaced with the second text.
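As a rough illustration of the text matching performed by the estimation unit, the sketch below slides a word window over the first text and scores each window against the second text. The sliding-window strategy, the use of Python's `difflib.SequenceMatcher` ratio as the text-similarity measure, and the names `find_similar_blocks` and `min_ratio` are assumptions for illustration; the abstract does not specify how the matching is carried out.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def find_similar_blocks(first_text: str, second_text: str,
                        min_ratio: float = 0.5) -> List[Tuple[int, int, float]]:
    """Score word windows of first_text against second_text.

    Returns (start_word, end_word, ratio) tuples, best match first.
    The window width equals the number of words in second_text (an assumption).
    """
    words = first_text.split()
    if not words:
        return []
    width = max(1, len(second_text.split()))
    results = []
    for start in range(0, max(1, len(words) - width + 1)):
        window = " ".join(words[start:start + width])
        # ratio() is 2*M/T, where M counts matched characters and T is the
        # combined length of both strings; it lies in [0, 1].
        ratio = SequenceMatcher(None, window.lower(), second_text.lower()).ratio()
        if ratio >= min_ratio:
            results.append((start, start + width, ratio))
    results.sort(key=lambda r: r[2], reverse=True)
    return results


if __name__ == "__main__":
    # A misrecognized "whether" in the first text, corrected by saying "weather".
    for block in find_similar_blocks("the whether is nice today", "weather"):
        print(block)
```

In this example the window containing "whether" ranks first, while "the" also passes the assumed `min_ratio` because its letters occur inside "weather". That kind of ambiguity between candidate blocks is what the acoustic measure recited in the claims helps to resolve.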
19 Citations
11 Claims
10. A method for correcting a character string in a text, the method comprising:
acquiring a first audio of a first speech of a first speaker;
converting the first audio to a first text;
outputting a first caption image of the first text;
acquiring a second audio of a second speech of a second speaker for correcting a character string that is included in the first text;
converting the second audio to a second text;
searching the first text for one or more similar blocks that are similar to the second text, wherein each block of the one or more similar blocks comprises at least a portion of a character string that matches the second text;
calculating, for each of the one or more similar blocks, a measure of text similarity between the similar block and the second text;
calculating, for each of the one or more similar blocks, a measure of acoustic similarity by performing acoustic matching between comparison sections of the first audio corresponding to the one or more similar blocks and the second audio;
determining an estimated similarity for each of the one or more similar blocks by at least combining the measure of text similarity and the measure of acoustic similarity in a comparison section that corresponds to the similar block;
determining that a character string in a similar block of the one or more similar blocks is the character string to be corrected, wherein the estimated similarity associated with the similar block is equal to or more than a threshold; and
outputting a second caption image indicating the first text, the second text, and a position of the character string to be corrected in the first text, and indicating that the character string to be corrected is to be replaced with the second text.
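The acoustic matching step of the method can be sketched as follows. The claim does not name a matching algorithm; dynamic time warping (DTW) over per-frame feature vectors is swapped in here purely as a familiar stand-in. The feature representation, the length normalization, the mapping from distance to a similarity score, and the function names are all assumptions.

```python
from typing import Sequence


def dtw_distance(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Dynamic time warping distance between two feature-vector sequences."""
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j],      # frame of a repeated
                                 cost[i][j - 1],      # frame of b repeated
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def acoustic_similarity(section: Sequence[Sequence[float]],
                        correction: Sequence[Sequence[float]]) -> float:
    """Map the length-normalized DTW distance to (0, 1]; 1.0 means a perfect match."""
    dist = dtw_distance(section, correction) / (len(section) + len(correction))
    return 1.0 / (1.0 + dist)


if __name__ == "__main__":
    section = [[0.0], [1.0], [2.0]]             # comparison section of the first audio
    slower = [[0.0], [0.0], [1.0], [2.0]]       # same content, spoken more slowly
    unrelated = [[5.0], [5.0], [5.0]]
    print(acoustic_similarity(section, slower))
    print(acoustic_similarity(section, unrelated))
```

DTW is chosen for the sketch because the second speaker will rarely repeat the phrase at the same tempo as the first, and the warping path absorbs that difference. A production recognizer would more likely compare phoneme or model-based representations.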
11. A computer program product having a non-transitory computer readable medium including programmed instructions that cause a computer to execute:
acquiring a first audio of a first speech of a first speaker;
converting the first audio to a first text;
outputting a first caption image of the first text;
acquiring a second audio of a second speech of a second speaker for correcting a character string that is included in the first speech;
converting the second audio to a second text;
searching the first text for one or more similar blocks that are similar to the second text, wherein each block of the one or more similar blocks comprises at least a portion of a character string that matches or is close to the second text;
calculating, for each of the one or more similar blocks, a measure of text similarity between the similar block and the second text;
calculating, for each of the one or more similar blocks, a measure of acoustic similarity by performing acoustic matching between comparison sections of the first audio corresponding to the one or more similar blocks and the second audio;
determining an estimated similarity for each of the one or more similar blocks by at least combining the measure of text similarity and the measure of acoustic similarity in a comparison section that corresponds to the similar block;
determining that a character string in a similar block of the one or more similar blocks is the character string to be corrected, wherein the estimated similarity associated with the similar block is equal to or more than a threshold; and
outputting a second caption image indicating the first text, the second text, and a position of the character string to be corrected in the first text, and indicating that the character string to be corrected is to be replaced with the second text.
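The final outputting step requires a caption that shows the first text, the second text, the position of the string to be corrected, and the fact that it is to be replaced. The sketch below produces a plain-text stand-in for such a caption; the bracket-and-arrow markup and the function name `render_correction_caption` are hypothetical, and an actual caption image would presumably use visual styling that the claims do not specify.

```python
def render_correction_caption(first_text: str, start: int, end: int,
                              second_text: str) -> str:
    """Mark first_text[start:end] as replaced by second_text in a text caption.

    The "[old -> new]" notation is an assumed, illustrative format.
    """
    if not (0 <= start < end <= len(first_text)):
        raise ValueError("start/end must delimit a non-empty span inside first_text")
    target = first_text[start:end]
    return f"{first_text[:start]}[{target} -> {second_text}]{first_text[end:]}"


if __name__ == "__main__":
    # Characters 4..11 cover the misrecognized word "whether".
    print(render_correction_caption("the whether is nice", 4, 11, "weather"))
    # prints: the [whether -> weather] is nice
```

Keeping the original string visible next to its replacement, rather than silently overwriting it, lets a reader see both what was first displayed and what it is being corrected to.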
Specification